Development of a tool for copy number analysis of cancer genomes using high throughput sequencing data
Afyounian, Ebrahim (2015)
Master's Degree Programme in Bioinformatics
BioMediTech
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2015-06-25
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:uta-201507302190
Abstract
Genomic copy number alterations (CNA) and loss of heterozygosity (LOH) are two types of genomic instability associated with cancer. Acquiring them affects the expression levels of oncogenes and tumor suppressor genes, so accurate detection of these abnormalities is a crucial step in identifying novel oncogenes and tumor suppressor genes. Whole-genome sequencing of tumor tissue has opened new opportunities for detecting and characterizing such aberrations in tumor samples.
In this work, a fast tool for the identification of CNAs and copy-neutral LOH in tumor samples using whole-genome sequencing data was developed. The tool segments the genome by analyzing the read-depth and B-allele fraction profiles using a double sliding window method. It requires a matched normal sample to correct for biases such as GC content and mappability and to discriminate somatic from germline events. The tool was evaluated on both simulated and real whole-genome sequencing data against competing, state-of-the-art tools to demonstrate its accuracy. Written in the Python programming language, it performs segmentation of a whole genome in less than two minutes.
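The abstract does not spell out the segmentation procedure, so the following is only an illustrative sketch of one common way a double sliding window scan over a read-depth profile can work, not the thesis implementation. It assumes the genome has already been binned into per-bin read counts for the tumor and the matched normal. The function names (`log_ratio`, `find_breakpoints`, `segments`), the window size, the threshold, and the use of window means are all hypothetical choices made for the example.

```python
import math
from statistics import fmean


def log_ratio(tumor_depth, normal_depth, pseudocount=1.0):
    """Per-bin log2(tumor / normal).

    Dividing by the matched normal cancels biases that affect both
    samples alike (e.g. GC content, mappability). The pseudocount is an
    assumption of this sketch to avoid division by zero.
    """
    return [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumor_depth, normal_depth)
    ]


def find_breakpoints(values, window=5, threshold=0.5):
    """Scan two adjacent windows along the profile.

    For each position i, compare the mean of the `window` bins to the
    left (values[i-window:i]) with the mean of the `window` bins to the
    right (values[i:i+window]). A breakpoint is called at i when the
    absolute difference exceeds `threshold` and is the largest score in
    its neighbourhood.
    """
    scores = {}
    for i in range(window, len(values) - window + 1):
        left = fmean(values[i - window:i])
        right = fmean(values[i:i + window])
        scores[i] = abs(right - left)

    breakpoints = []
    for i, score in scores.items():
        if score < threshold:
            continue
        # Keep only the local maximum so one level change yields one call.
        neighbours = [scores.get(j, 0.0) for j in range(i - window + 1, i + window)]
        far_enough = not breakpoints or i - breakpoints[-1] >= window
        if score >= max(neighbours) and far_enough:
            breakpoints.append(i)
    return breakpoints


def segments(values, breakpoints):
    """Return (start, end, mean log-ratio) for each half-open segment."""
    edges = [0] + list(breakpoints) + [len(values)]
    return [(a, b, fmean(values[a:b])) for a, b in zip(edges, edges[1:])]


# Toy data: 20 bins at normal depth followed by 20 bins with doubled depth.
tumor = [30] * 20 + [60] * 20
normal = [30] * 40
profile = log_ratio(tumor, normal)
calls = find_breakpoints(profile)
segs = segments(profile, calls)
```

On this noise-free toy profile the scan places a single boundary where the depth changes. Real data is noisy, so a practical tool would need a more robust statistic and a principled threshold. A B-allele fraction track could in principle be scanned the same way, which is again an assumption rather than something the abstract states.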