Hi Brian,
When using vcfR2genlight() with a very small dataset, the function may set different ploidy for each sample. This behavior comes from the automatic determination of the ploidy when creating an object of genlight class (see ploidy section from the genlight-class documentation). Below a reproducible example and a solution.
The reproducible example
library(adegenet)
data(vcfR_test)
# Conversion
gl <- vcfR2genlight(vcfR_test)
# Ploidy of each sample (all should be 2)
ploidy(gl)
#NA00001 NA00002 NA00003
# 1 1 2
# The genotype data with only biallelic sites
vcfR_test[ !grepl(",", vcfR_test@fix[,5]), ]@gt
# FORMAT NA00001 NA00002 NA00003
#[1,] "GT:GQ:DP:HQ" "0|0:48:1:51,51" "1|0:48:8:51,51" "1/1:43:5:.,."
#[2,] "GT:GQ:DP:HQ" "0|0:49:3:58,50" "0|1:3:5:65,3" "0/0:41:3"
#[3,] "GT:GQ:DP:HQ" "0|0:54:7:56,60" "0|0:48:4:51,51" "0/0:61:2"
When vcfR2genlight() translates the genotypes in binary code, the maximum allele code will be 0 (0|0), 1 (1|0 or 0|1), and 2 (1/1) for the first, second and third sample, respectively. So genlight will determine that the two first samples are haploid and the second is diploid.
The solution
The ploidy can be indicated when creating a genlight object. So the solution is to add this argument in the vcfR_conversion.R file:
# Line 145 should be:
x <- new('genlight', t(x), n.cores=n.cores, ploidy=2)
But I think it would be better to have the ploidy as an argument of the vcfR2genlight(). This would allow:
- someone who has "diploidized" the data to specify the ploidy of the organism (this can sometimes be the case when you work on malaria).
- someone who has samples with different ploidy (like in yeast).
If the ploidy is an argument, a warning message about mixed ploidy in the resulting genlight object could be a plus.
Comment
A more elegant way could be to determine the ploidy from the data. I wonder if it would be worth writing a general function to estimate the ploidy from the VCF because this has been raised several times (#106, #117, #121).
Fred
Hi Brian,
When using
vcfR2genlight()with a very small dataset, the function may set different ploidy for each sample. This behavior comes from the automatic determination of the ploidy when creating an object of genlight class (see ploidy section from the genlight-class documentation). Below a reproducible example and a solution.The reproducible example
When
vcfR2genlight()translates the genotypes in binary code, the maximum allele code will be 0 (0|0), 1 (1|0or0|1), and 2 (1/1) for the first, second and third sample, respectively. So genlight will determine that the two first samples are haploid and the second is diploid.The solution
The ploidy can be indicated when creating a genlight object. So the solution is to add this argument in the
vcfR_conversion.Rfile:But I think it would be better to have the
ploidyas an argument of thevcfR2genlight(). This would allow:If the
ploidyis an argument, a warning message about mixed ploidy in the resulting genlight object could be a plus.Comment
A more elegant way could be to determine the ploidy from the data. I wonder if it would be worth writing a general function to estimate the ploidy from the VCF because this has been raised several times (#106, #117, #121).
Fred