A Hierarchical Clustering Method of Selecting Kernel SNP to Unify Informative SNP and Tag SNP.

IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM

PubMedID: 26357082

Bo Liao, Xiong Li, Lijun Cai, Zhi Cao, Haowen Chen. A Hierarchical Clustering Method of Selecting Kernel SNP to Unify Informative SNP and Tag SNP. IEEE/ACM Trans Comput Biol Bioinform. 2015;12(1):113-22.
Various strategies can be used to select representative single nucleotide polymorphisms (SNPs) from a large set of SNPs, such as tag SNPs for haplotype coverage and informative SNPs for haplotype reconstruction. Representative SNPs not only reduce the cost of genotyping but also narrow the combinatorial space in epistasis analysis. This study explores the capacity of kernel SNPs to unify informative SNPs and tag SNPs, with the aim of minimizing inconsistencies in further studies. The correlation between multiple SNPs is formalized using multi-information measures. Extending this correlation, a distance formula that measures the similarity between clusters is designed and used to conduct hierarchical clustering. The clustering accounts for both information gain and haplotype diversity, which allows the proposed approach to achieve unification. Kernel SNPs are then selected from every cluster through a top-rank or backward-elimination scheme. Using these kernel SNPs, extensive experimental comparisons are conducted against informative SNPs on haplotype reconstruction accuracy and against tag SNPs on haplotype coverage. Results indicate that the kernel SNP can practically unify informative SNP and tag SNP and is therefore adaptable to various applications.
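
The abstract does not give the paper's formulas, so the following Python sketch only illustrates the general idea of a multi-information measure over a set of SNPs. It assumes the standard definition of multi-information (also called total correlation): the sum of the marginal entropies minus the joint entropy. The function names `entropy` and `multi_information` and the toy haplotype data are hypothetical and are not taken from the paper; the authors' actual correlation measure and cluster distance may differ.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def multi_information(haplotypes, snps):
    """Multi-information (total correlation) of the SNP columns in `snps`.

    Assumed definition: sum of marginal entropies minus joint entropy.
    A value of 0 means the selected SNPs are empirically independent.
    """
    marginal_sum = sum(entropy([h[i] for h in haplotypes]) for i in snps)
    joint = entropy([tuple(h[i] for i in snps) for h in haplotypes])
    return marginal_sum - joint


# Toy data (hypothetical): rows are haplotypes, columns are biallelic SNPs.
# SNP 0 and SNP 1 are perfectly correlated; SNP 2 is independent of both.
haplotypes = [
    (0, 0, 0),
    (0, 0, 1),
    (1, 1, 0),
    (1, 1, 1),
]

correlated = multi_information(haplotypes, [0, 1])
independent = multi_information(haplotypes, [0, 2])
```

On this toy data the perfectly correlated pair shares one full bit of information, while the independent pair shares none. A clustering step in this spirit would place SNP 0 and SNP 1 in the same cluster, from which a single kernel SNP could then be chosen.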