Bioinformatic analysis of next-generation sequencing data
RANTAPERO, TOMMI (2012)
Bioinformatiikka - Bioinformatics
Biolääketieteellisen teknologian yksikkö - Institute of Biomedical Technology
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2012-06-11
The permanent address of the publication is
https://urn.fi/urn:nbn:fi:uta-1-22740
Abstract
Background and aims: In a recent linkage study of 69 Finnish hereditary prostate cancer (HPC) families, a novel prostate cancer susceptibility locus, 2q37.3, was found (Cropp et al. 2011). In addition, a signal from 17q21-22, found in a previous study, was confirmed. To study these loci further, the families showing the strongest linkage were selected for targeted high-throughput sequencing at FIMM (Institute for Molecular Medicine Finland). The aim of this study was to use bioinformatics methods to assess the variant data produced by the FIMM high-throughput sequencing pipeline in order to find candidate variants that may predispose to prostate cancer.
Methods: The variants were annotated using an in-house Python program and a local database constructed from resources including annotation tracks from the UCSC Genome Browser, Ensembl, microRNA.org and VISTA. To evaluate the pathogenicity of the variants, three tolerance predictors were used: MutationTaster, PolyPhen-2 and PON-P. These results were used to construct a list of candidate genes and variants. To find prostate cancer-associated genes, two databases, DDPC and COSMIC, were used. To study the relationship between the prostate cancer-associated genes and the candidate genes, a Gene Ontology and pathway enrichment analysis was conducted for the prostate cancer gene set using WebGestalt2.
Results: Pathogenicity prediction identified 155 variants predicted to be pathogenic. These variants were distributed across 101 genes, four of which have been associated with prostate cancer in previous research.
Conclusion: Bioinformatics methods appear to be efficient in prioritizing variants for experimental validation. In addition, these methods can provide insights into how the pathogenic variants may predispose to cancer.
Keywords: Prostate cancer, next-generation sequencing, bioinformatics, variant data analysis
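
The prioritization step described in the Methods (combining the output of three tolerance predictors into a list of candidate genes and variants, then checking those genes against known prostate cancer genes) can be illustrated with a minimal Python sketch. The abstract does not state how the three predictions were combined, so the majority-vote rule below is an assumption, as are all field names, labels, function names and example records; none of them come from the thesis or from the tools' actual output formats.

```python
from collections import defaultdict

# Hypothetical predictor keys; the real tools report richer, tool-specific output.
PREDICTORS = ("mutation_taster", "polyphen2", "pon_p")


def is_candidate(calls, min_votes=2):
    """Return True if at least `min_votes` predictors call the variant pathogenic.

    The majority-vote rule is an assumption made for this sketch.
    """
    votes = sum(1 for name in PREDICTORS if calls.get(name) == "pathogenic")
    return votes >= min_votes


def candidate_genes(variants, min_votes=2):
    """Group the IDs of candidate variants by gene symbol."""
    by_gene = defaultdict(list)
    for variant in variants:
        if is_candidate(variant["calls"], min_votes):
            by_gene[variant["gene"]].append(variant["id"])
    return dict(by_gene)


def known_overlap(by_gene, known_genes):
    """Return candidate genes that also appear in a known disease-gene set."""
    return sorted(set(by_gene) & set(known_genes))


# Made-up records purely for illustration (not data from the study).
variants = [
    {"id": "var1", "gene": "GENE_A",
     "calls": {"mutation_taster": "pathogenic", "polyphen2": "pathogenic", "pon_p": "neutral"}},
    {"id": "var2", "gene": "GENE_A",
     "calls": {"mutation_taster": "neutral", "polyphen2": "pathogenic", "pon_p": "neutral"}},
    {"id": "var3", "gene": "GENE_B",
     "calls": {"mutation_taster": "pathogenic", "polyphen2": "pathogenic", "pon_p": "pathogenic"}},
]

genes = candidate_genes(variants)
# Stand-in for a gene list exported from a resource such as DDPC or COSMIC.
overlap = known_overlap(genes, known_genes={"GENE_B", "GENE_C"})
```

Raising `min_votes` to 3 would require unanimous agreement among the predictors, which trades sensitivity for a shorter candidate list.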