Sunday, April 8, 2018

1000 Genomes VCF files might not have all SNPs

I've been trying to determine the co-occurrence of a handful of SNPs using a 1000 Genomes VCF file, but one of the SNPs seems to be absent. Its rsID doesn't appear in the VCF, nor can I find it by position. It's a pretty common SNP if I'm reading its NCBI page right, so I'd expect it to co-occur with the others, all of which I can find in the VCF.
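For reference, here is roughly the kind of check I mean, as a minimal sketch rather than a polished tool. It scans a VCF (plain or gzip/bgzip-compressed) and reports any record that matches either by rsID or by chromosome and position. The function names `open_vcf` and `find_variant` are my own, and the sketch assumes the standard tab-separated VCF layout with CHROM, POS, and ID as the first three columns.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for text reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def find_variant(path, rsid=None, chrom=None, pos=None):
    """Return the data lines that match by rsID or by CHROM and POS.

    A record matches if its ID column contains `rsid`, or if its CHROM
    and POS columns equal `chrom` and `pos` (POS is 1-based in VCF).
    """
    hits = []
    with open_vcf(path) as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            fields = line.rstrip("\n").split("\t", 5)
            if len(fields) < 3:
                continue  # not a usable data line
            line_chrom, line_pos, line_ids = fields[0], fields[1], fields[2]
            # The ID column may hold several identifiers separated by ';'.
            by_id = rsid is not None and rsid in line_ids.split(";")
            by_pos = (
                chrom is not None
                and pos is not None
                and line_chrom == str(chrom)
                and int(line_pos) == int(pos)
            )
            if by_id or by_pos:
                hits.append(line.rstrip("\n"))
    return hits
```

Calling something like `find_variant("chr1.vcf.gz", rsid="rs0000000", chrom="1", pos=12345)` (file name, rsID, and position are all placeholders here) returns an empty list when the SNP is missing under both lookups. One thing worth double-checking before trusting a miss by position: the coordinate has to be on the same assembly as the VCF, and the Phase 3 calls are on GRCh37.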

Per the IGSR site, a couple million variants were culled during production of the Phase 3 data set (the one I'm using) for a handful of reasons, including quality control. It's possible that this SNP didn't quite make the cut because its calls were too uncertain.
