A comparison of the number of SNPs and mutations with synonymous (Ks) and nonsynonymous (Ka) substitution rates in human immunome
MESUE, NICHOLAS (2009)
Bioinformatics
Faculty of Medicine
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Approval date
2009-10-01
Permanent address of the publication
https://urn.fi/urn:nbn:fi:uta-1-20102
Abstract
Background
Changes that occur in the nucleotide sequence of a gene are known as mutations. Mutations in general, and single nucleotide polymorphisms (SNPs) in particular, are major driving forces of genetic variation and evolution, as well as of genetic diseases, in humans and other organisms. Understanding the evolutionary pattern of genes and proteins related to the human immune system (the human immunome) is of prime importance because of the fundamental role they play in preventing pathogens from invading host organisms. The nonsynonymous (Ka) and synonymous (Ks) substitution rates offer insight into the evolution of the human immunome: a Ka/Ks ratio below 1 is generally taken to indicate purifying selection, and a ratio above 1 positive selection. Because knowledge of mutations is growing rapidly, estimating these rates is essential for understanding the human immunome.
Methods
I collected four datasets: gene2RefSeq and HomoloGene from the EntrezGene database, and SNPs and mutations from the Immunome Knowledgebase (IKB). In addition, I used a data file of 874 human immunome genes collected from IKB. Using Perl/BioPerl modules, I downloaded GenBank files for both human and mouse orthologs, extracted the coding sequences, compared them with the standard GenBank records, translated them, generated cDNA sequences using the protein sequences as a guide, aligned them globally, and then estimated the Ka and Ks rates for each ortholog pair.
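The abstract does not name the alignment tool. The step of generating cDNA alignments with the protein sequences as a guide can be sketched as a back-translation: each residue of a globally aligned protein is replaced by its codon from the coding sequence, and each alignment gap by three gap characters. This is a minimal sketch under stated assumptions: the function names are hypothetical, the protein alignment is assumed to come from a separate global aligner (for example, a Needleman-Wunsch implementation), and Python is used only for illustration, whereas the thesis used Perl/BioPerl.

```python
def back_translate(aligned_protein, cds, gap="-"):
    """Thread a coding sequence onto its aligned protein sequence (hypothetical helper).

    Each residue is replaced by the next codon of the CDS and each gap by
    three gap characters, so the nucleotide alignment stays in codon frame.
    The codon/residue correspondence is not checked here.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out = []
    k = 0  # index of the next unused codon
    for residue in aligned_protein:
        if residue == gap:
            out.append(gap * 3)
        elif k < len(codons):
            out.append(codons[k])
            k += 1
        else:
            raise ValueError("aligned protein has more residues than the CDS has codons")
    return "".join(out)


def codon_alignment(prot_a, cds_a, prot_b, cds_b):
    """Build a pairwise codon alignment from a pairwise protein alignment."""
    if len(prot_a) != len(prot_b):
        raise ValueError("aligned protein sequences must have equal length")
    return back_translate(prot_a, cds_a), back_translate(prot_b, cds_b)
```

For example, `codon_alignment("MAK", "ATGGCTAAATGA", "M-K", "ATGAAGTGA")` returns `("ATGGCTAAA", "ATG---AAG")`: the gap opposite alanine becomes a codon-sized gap, and the trailing stop codon, which has no residue in the protein, is dropped.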
Results
Across 755 human immunome genes, the mean nonsynonymous substitution rate (Ka) was 0.178 (0.158), the mean synonymous substitution rate (Ks) was 0.685 (0.169), the mean Ka/Ks was 0.394 (0.488), and the mean Z-score was -13.15 (7.873). Most SNPs (123,265; 80.04%) occurred in intronic regions. Missense mutations were the most frequent mutation type (1,878; 46.074%). The strongest correlation was observed between the Z-score and the number of coding synonymous SNPs (r = -0.47, p < 2.2e-16). The number of SNPs was also associated with Ks (r = -0.116, p = 0.001) and with the Z-score (r = -0.37, p < 2.2e-16).
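The abstract reports per-ortholog Ka and Ks values but does not state which estimator produced them. As an assumption for illustration only, the sketch below applies a Nei-Gojobori-style counting method with a Jukes-Cantor correction to a codon-aligned human-mouse pair. It simplifies the method by counting changes to stop codons as nonsynonymous when computing sites; codons differing at several positions are averaged over all mutational pathways that avoid stop codons. All function names are hypothetical, and this is not presented as the thesis's own procedure.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites of one codon: for each position, the fraction of the
    three possible point changes that keep the amino acid. Changes to stop
    codons count as nonsynonymous (a simplification)."""
    aa = CODE[codon]
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    s += 1 / 3
    return s


def _walk(c1, c2, order):
    """Change c1 into c2 one position at a time, in the given order.
    Returns (synonymous steps, nonsynonymous steps, passes_through_stop)."""
    sd = nd = 0
    hits_stop = False
    current = c1
    for pos in order:
        nxt = current[:pos] + c2[pos] + current[pos + 1:]
        if CODE[nxt] == "*":
            hits_stop = True
        if CODE[nxt] == CODE[current]:
            sd += 1
        else:
            nd += 1
        current = nxt
    return sd, nd, hits_stop


def count_differences(c1, c2):
    """Synonymous and nonsynonymous differences between two sense codons,
    averaged over all mutational pathways that avoid stop codons."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    paths = [_walk(c1, c2, order) for order in permutations(diff)]
    usable = [p for p in paths if not p[2]] or paths
    sd = sum(p[0] for p in usable) / len(usable)
    nd = sum(p[1] for p in usable) / len(usable)
    return sd, nd


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits; undefined for p >= 0.75."""
    if p >= 0.75:
        return float("inf")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Estimate (Ka, Ks, Ka/Ks) for two codon-aligned coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and equally long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Skip codons with gaps or ambiguous bases, and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = count_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no usable synonymous or nonsynonymous sites")
    ka, ks = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    return ka, ks, (ka / ks if ks > 0 else float("nan"))
```

A single synonymous change (CTG to CTA, both leucine) raises Ks while leaving Ka at zero, which is the pattern behind a Ka/Ks ratio well below 1.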
Conclusion
Taken together, the Ka, Ks and Ka/Ks estimates indicate that human immunome genes are highly conserved at the protein level. Fewer than 3.3% of these genes are evolving quickly, which suggests possible adaptation in this small subset. The negative correlation between the Z-score and the number of coding synonymous SNPs is only moderate in strength but highly significant, which suggests a biological relationship between these variables that merits further investigation and interpretation.
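The abstract does not define "evolving quickly" or name the correlation coefficient. Assuming the conventional Ka/Ks > 1 cut-off and Pearson's r, the quantities behind these conclusions can be sketched as follows; the function names and the threshold default are hypothetical, and the t statistic is the standard test of zero correlation with n - 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def fraction_fast_evolving(ratios, threshold=1.0):
    """Share of genes whose Ka/Ks exceeds the (assumed) threshold."""
    return sum(1 for v in ratios if v > threshold) / len(ratios)
```

If all 755 genes entered a correlation (the abstract does not say), r = -0.47 corresponds to t of about -14.6 on 753 degrees of freedom, far beyond conventional significance thresholds and consistent with the reported p < 2.2e-16. This illustrates how a correlation of moderate strength can still be highly significant in a large sample.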
Keywords: mutations, single nucleotide polymorphisms, synonymous substitutions, nonsynonymous substitutions, evolution, human immunome