Preprocessing and analysis of single-cell RNA-sequencing data
Pekkarinen, Meeri (2018)
Degree Programme in Biotechnology
Faculty of Medicine and Life Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Acceptance date
2018-05-21
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:uta-201806272091
Abstract
In recent years, single-cell measurement technologies have advanced greatly and offer a new approach to studying biological problems. Whereas traditional bulk RNA sequencing yields a computationally averaged transcriptome that represents the whole biopsy, scRNA-sequencing records the transcriptional differences between cell types. However, the method is also more sensitive to various sources of biological and technical noise, and the field still calls for further research and established practices. Data normalization and quality control in particular require novel methods, since many of the tools originally developed for bulk RNA-seq data rely on assumptions that are not valid for scRNA-seq data.
The goal of this thesis was to gain insight into RNA-seq analysis and, in particular, to find the optimal ways to preprocess the data and assess its quality. The relevant tools were chosen and then tested in the computational part of the thesis. The final and most important deliverable was an analysis pipeline constructed by combining the best approaches and the necessary quality metrics.
A three-step pipeline utilizing various command-line tools and Bioconductor R packages was implemented. This pipeline performs preprocessing, transcript quantification, filtering, normalization and simple downstream steps. Most importantly, it produces both visual and statistical information for estimating various general features and quality properties of the data. The pipeline was successfully used to process two public scRNA-seq datasets.
The work was done for Genevia Technologies Oy in Tampere between October 2017 and May 2018. The ultimate goal was to develop a generalized pipeline that would be useful to the company. However, diverse analytical and technical issues make this task very challenging, and a few pitfalls remain unresolved. One major reason is that no best practices have yet been established. Regardless, the information provided by the pipeline should be helpful for picking suitable tools and thresholds for more sophisticated methods. Future development in the field will most certainly reveal biological information that cannot be discerned by current tools.
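As a rough illustration of the filtering and normalization steps mentioned in the abstract, the following is a minimal sketch in Python. It is not the thesis pipeline, which was built from command-line tools and Bioconductor R packages. The function names, the thresholds and the toy count matrix are all hypothetical, and the normalization shown is a simple library-size scaling followed by a log transform, chosen only because it is easy to follow.

```python
import math


def filter_cells(counts, min_counts=500, min_genes=200):
    """Return the indices of cells that pass two basic quality thresholds.

    counts: list of per-cell lists of gene counts (cells x genes).
    min_counts, min_genes: hypothetical thresholds, not values from the thesis.
    """
    kept = []
    for i, cell in enumerate(counts):
        total = sum(cell)                         # library size of the cell
        detected = sum(1 for c in cell if c > 0)  # genes with at least one count
        if total >= min_counts and detected >= min_genes:
            kept.append(i)
    return kept


def normalize_cell(cell, scale=10_000):
    """Scale one cell to a common library size, then apply log(1 + x)."""
    total = sum(cell)
    if total == 0:
        raise ValueError("cannot normalize a cell with zero total counts")
    return [math.log1p(c / total * scale) for c in cell]


# Toy data: three cells, four genes (made-up numbers for illustration only).
counts = [
    [10, 0, 5, 5],     # total 20, 3 genes detected
    [1, 0, 0, 0],      # total 1, 1 gene detected: a low-quality cell
    [40, 20, 10, 10],  # total 80, 4 genes detected
]

kept = filter_cells(counts, min_counts=10, min_genes=2)
normalized = [normalize_cell(counts[i]) for i in kept]
```

In practice the thresholds would be chosen from the quality metrics and plots that a pipeline like the one described produces, rather than fixed in advance.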