Codon usage bias of the overlapping genes in microbial genomes
NAIR, PREETHY (2010)
Bioinformatics
Faculty of Medicine
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2010-11-02
The permanent address of the publication is
https://urn.fi/urn:nbn:fi:uta-1-20961
Abstract
Background and aims: Overlapping genes are adjacent genes whose coding sequences overlap partially or completely. They are abundant in viruses and are also present in archaea, prokaryotes and eukaryotes. Earlier studies have shown that overlapping genes play substantial roles in genome size minimization, regulation of gene expression and the slowing of evolution, since a modification in the overlapping region may have a deleterious effect on both genes. The present study aimed to determine whether codon usage in the overlapping parts of genomes differs from that in the other parts and, if so, to investigate trends in the codon usage bias patterns at the phylum level of taxonomy.
Methods: GenBank identifiers for the completely sequenced microbial genomes were collected from the NCBI FTP site and filtered to retain those for which codon usage tables were available in the CUTG database. Perl and R scripts were written for data generation, result formatting and analysis. Genomic overlaps were identified, and the codon usage of the overlapping parts was estimated and compared with that of the normal parts. Because the codon usage of the genomic overlaps in the analysed genomes showed a significant bias, trends in the bias were investigated at the phylum level of taxonomy by principal component analysis, self-organizing maps, correspondence analysis and heat map visualisations.
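The overlap identification and codon counting steps described in the Methods can be illustrated with a minimal Python sketch. It is not the thesis's Perl or R code: the `Gene` record, the 1-based inclusive coordinates, the restriction to neighbouring genes after sorting, and the choice of reading frame for counting are all assumptions made for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def find_overlaps(genes):
    """Return (gene_a, gene_b, start, end, same_strand) for each overlapping
    pair of neighbouring genes. Only neighbours in sorted order are compared,
    which is a simplification: a gene spanning several others is not fully
    reported."""
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    overlaps = []
    for a, b in zip(ordered, ordered[1:]):
        lo, hi = max(a.start, b.start), min(a.end, b.end)
        if lo <= hi:  # the two coordinate ranges share at least one base
            overlaps.append((a.name, b.name, lo, hi, a.strand == b.strand))
    return overlaps


def count_codons(seq):
    """Count codons in seq, read in frame from its first base; trailing bases
    that do not complete a codon are ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


genes = [Gene("a", 1, 10, "+"), Gene("b", 7, 20, "+"), Gene("c", 25, 40, "-")]
overlaps = find_overlaps(genes)
codons = count_codons("ATGTGATGAAT")
```

Here `overlaps` contains the single same-strand overlap between `a` and `b` (positions 7 to 10), and `codons` counts ATG once and TGA twice.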
Results: Overlapping genes are present in all of the analysed microbes. Overlaps longer than nine base pairs are same-strand overlaps. Codon usage in the overlapping parts of all of the examined genomes showed a significant bias (p-value < 2.2e-16 from the χ2 goodness-of-fit tests) compared with the overall codon usage of the respective species in the CUTG database. The most overrepresented codon was TGA (in 73% of the analysed species). Analysis of the codon usage bias patterns by self-organizing map clustering followed by heat map visualisation shows that the clusters (codon usage bias patterns) identified by the self-organizing map are very well associated with most of the phyla analysed.
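The χ2 goodness-of-fit comparison reported in the Results can be sketched as follows. This is an illustrative Python version under stated assumptions, not the analysis used in the thesis: the function name `chi_square_gof`, the input format (observed codon counts and genome-wide relative frequencies) and the toy numbers are hypothetical. Only the test statistic and degrees of freedom are computed; obtaining a p-value would additionally need the χ2 distribution, for example from a statistics library.

```python
def chi_square_gof(observed, expected_freq):
    """Chi-square goodness-of-fit statistic for observed codon counts against
    expected relative frequencies. Returns (statistic, degrees_of_freedom)."""
    # Keep only categories with a positive expected frequency.
    categories = [c for c, p in expected_freq.items() if p > 0]
    norm = sum(expected_freq[c] for c in categories)
    total = sum(observed.get(c, 0) for c in categories)
    statistic = 0.0
    for c in categories:
        # Expected count if the overlaps followed the genome-wide usage.
        exp = total * expected_freq[c] / norm
        obs = observed.get(c, 0)
        statistic += (obs - exp) ** 2 / exp
    return statistic, len(categories) - 1


# Toy stop codon counts (hypothetical, not data from the thesis).
observed = {"TGA": 30, "TAA": 10, "TAG": 10}
expected = {"TGA": 1 / 3, "TAA": 1 / 3, "TAG": 1 / 3}
stat, df = chi_square_gof(observed, expected)
```

With these toy counts each expected count is 50/3, so the statistic is 16 with 2 degrees of freedom.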
Conclusions: Genomic overlaps are present in all of the analysed species. Same-strand overlaps are more abundant than opposite-strand overlaps. Codon usage in the genomic overlaps differs significantly from that in the normal parts of the genomes of the analysed species. A well-defined trend in the stop codon usage bias pattern of the overlapping parts at the phylum level of taxonomy is evident from the self-organizing map and heat map visualisations. This finding provides evidence for a role of genomic overlaps in translational efficiency and in the regulation of gene expression through transcriptional and translational coupling.
Keywords: overlapping gene, codon usage