Comparison npstat vs Grenedalf: strong correlations but systematic biases #41

@elombaert

Having previously used npstat for calculating theta pi, theta Watterson, and Tajima’s D, but being very interested in Grenedalf’s greater flexibility, I conducted some empirical comparative tests between the two methods.

I performed these tests on 8 populations from 4 insect species (2 populations per species), with genome sizes ranging from 250 Mb to 2.5 Gb. Each population is a pooled sample of 40 to 50 individuals, sequenced to an average depth of approximately 100×. All analyses were window-based, using windows of 100,000 bases.

The general trends I observe are as follows:
• A very high correlation between the two methods for all three statistics (generally r > 0.95), except for one of the four species (details below).
• A systematic bias: npstat tends to yield higher values for theta pi, whereas Grenedalf tends to yield higher values for theta Watterson; no clear bias is observed for Tajima’s D. This trend is consistent across the 8 populations from the 4 species.
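
For reference, the kind of per-window comparison summarised above can be sketched as follows. This is only a minimal illustration, not the exact script I used: the function name `compare_windows` and the toy values are hypothetical, and it assumes the per-window estimates from both tools have already been parsed and aligned so that position i refers to the same window in both arrays.

```python
import numpy as np

def compare_windows(a, b):
    """Compare paired per-window estimates from two tools.

    a, b: sequences of equal length, one value per window.
    NaN marks a window for which a tool reported no value.
    (Hypothetical helper; parsing of tool output is not shown.)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("inputs must have one value per window, in the same order")

    # Keep only windows where both tools produced a finite value.
    keep = np.isfinite(a) & np.isfinite(b)
    a, b = a[keep], b[keep]
    diff = a - b

    return {
        "n_windows": int(keep.sum()),
        "pearson_r": float(np.corrcoef(a, b)[0, 1]),  # strength of linear agreement
        "mean_diff": float(diff.mean()),              # sign shows direction of bias
        "frac_a_higher": float((diff > 0).mean()),    # share of windows where a > b
    }

# Toy values for illustration only (not real data).
toy_a = [0.010, 0.012, float("nan"), 0.015, 0.020]
toy_b = [0.009, 0.011, 0.013, 0.014, 0.019]
result = compare_windows(toy_a, toy_b)
print(result)
```

A high `pearson_r` together with a `mean_diff` that is consistently positive (or negative) across populations is the pattern described in the two points above: the tools rank windows almost identically but differ by a systematic offset.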

The exception among the four species is the one with the largest genome (2.5 Gb, with 10 chromosomes). In this case, the correlations for theta pi and theta Watterson are much lower, between 0.6 and 0.7. However, the correlations for Tajima’s D remain very high (>0.97).
Notably, for this species only, I also tested larger window sizes (500,000 and 1,000,000 bases), but no improvement was observed (the figures were very similar, and there was even a slight decrease in correlations).

Have you ever observed this kind of trend? Do you have any idea what might be causing the systematic biases in theta estimates?
Overall, I am unsure which of the two tools yields the more reliable results, and I would be very interested in any feedback or comparative evaluations.

Thanks in advance to the community!
