Spatial chromatin accessibility: a computational method for single-cell ATAC-seq and spatial transcriptomics data integration
Kiviaho, Antti (2023)
Master's Programme in Biomedical Sciences and Engineering
Faculty of Medicine and Health Technology
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Acceptance date
2023-01-31
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202301111339
Abstract
Cell fate is largely determined by the intricate interplay between chromatin compaction and the regulation of gene expression. Expression can be suppressed by packing the chromatin tightly, or enhanced by loosening it at key regulatory regions so that the DNA becomes accessible. Chromatin accessibility has been shown to predict gene expression, yielding valuable insight into many phenomena in molecular biology.
Modern sequencing technologies have made gene expression and chromatin accessibility information easier to obtain. Single-cell RNA sequencing (scRNA-seq) assays have become a widely popular tool since their introduction a decade ago. Likewise, the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) has emerged as a method for probing the chromatin accessibility profiles of individual cells. Methodological advances have also enabled researchers to combine the two, either by computationally modeling their relationship or by performing both assays simultaneously on the same cell.
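As an illustration of what computationally modeling this relationship can involve, a common heuristic collapses a cell's ATAC-seq signal into a per-gene "activity" score by summing the counts of accessible peaks that overlap each gene body plus a promoter window. The following is a minimal, hypothetical sketch in plain Python: the peak and gene tuple layouts, the function name `gene_activity`, and the 2 kb upstream window are assumptions for illustration, not details taken from the thesis.

```python
from typing import Dict, Tuple

# Hypothetical layouts: a peak is (chromosome, start, end) and a gene is
# (chromosome, start, end, strand), with half-open [start, end) coordinates.
Peak = Tuple[str, int, int]
Gene = Tuple[str, int, int, str]


def gene_activity(
    peak_counts: Dict[Peak, int],
    genes: Dict[str, Gene],
    upstream: int = 2000,  # assumed promoter window, not taken from the thesis
) -> Dict[str, int]:
    """Sum one cell's peak counts over each gene body extended upstream of the TSS."""
    scores: Dict[str, int] = {}
    for name, (chrom, start, end, strand) in genes.items():
        # Upstream means lower coordinates on the '+' strand (TSS = start)
        # and higher coordinates on the '-' strand (TSS = end).
        if strand == "+":
            lo, hi = max(0, start - upstream), end
        else:
            lo, hi = start, end + upstream
        # Count every peak on the same chromosome that overlaps [lo, hi).
        scores[name] = sum(
            count
            for (p_chrom, p_start, p_end), count in peak_counts.items()
            if p_chrom == chrom and p_start < hi and p_end > lo
        )
    return scores


if __name__ == "__main__":
    cell = {("chr1", 900, 1100): 3, ("chr1", 5000, 5200): 1, ("chr2", 100, 300): 4}
    genes = {"GeneA": ("chr1", 2000, 4000, "+"), "GeneB": ("chr1", 4500, 6000, "-")}
    print(gene_activity(cell, genes))  # {'GeneA': 3, 'GeneB': 1}
```

The linear scan over all peaks keeps the sketch short; with genome-scale peak sets, an interval index would be the more practical choice.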
The spatial transcriptomics (ST) assay is a more recent development. Like scRNA-seq, it measures gene expression by capturing and sequencing gene transcripts, but it also incorporates a sequenceable spatial barcode into each captured molecule. This makes it possible to link gene expression back to a location in the tissue. Although ST data come with the benefit of spatial information, they lack many of the qualities that make scRNA-seq data so valuable, most notably single-cell resolution, since each measurement spot can capture transcripts from several cells. This has motivated efforts to combine information from both data types, resulting in more accurate descriptions of the underlying biology.
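The role of the spatial barcode can be sketched as a lookup step: each sequenced read carries a spatial barcode, a unique molecular identifier (UMI), and the gene it maps to, and the barcode is resolved to a spot coordinate on the capture array. This minimal Python sketch assumes a simplified read layout and a hypothetical `barcode_to_xy` lookup table; real processing pipelines typically also correct sequencing errors in barcodes and UMIs, which is omitted here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical simplified read layout: (spatial barcode, UMI, gene).
Read = Tuple[str, str, str]
XY = Tuple[int, int]


def spot_counts(
    reads: Iterable[Read],
    barcode_to_xy: Dict[str, XY],
) -> Dict[XY, Dict[str, int]]:
    """Build a spot-by-gene count table, counting each UMI once per spot and gene."""
    umis: Dict[XY, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for barcode, umi, gene in reads:
        xy = barcode_to_xy.get(barcode)
        if xy is None:  # barcode not on the array's known list: discard the read
            continue
        umis[xy][gene].add(umi)  # PCR duplicates share a UMI and collapse here
    return {xy: {g: len(u) for g, u in genes.items()} for xy, genes in umis.items()}


if __name__ == "__main__":
    reads = [
        ("AAAC", "u1", "Actb"),
        ("AAAC", "u1", "Actb"),  # duplicate molecule
        ("AAAC", "u2", "Actb"),
        ("GGTT", "u3", "Gfap"),
        ("NNNN", "u4", "Actb"),  # unknown barcode
    ]
    print(spot_counts(reads, {"AAAC": (0, 0), "GGTT": (0, 1)}))
    # {(0, 0): {'Actb': 2}, (0, 1): {'Gfap': 1}}
```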
Existing integration tools can connect scRNA-seq data to scATAC-seq data, and scRNA-seq data to ST data, but no previous efforts have integrated scATAC-seq and ST data. This thesis presents a method for integrating these two data modalities to infer chromatin accessibility in a spatial context. The method draws on a wide range of computational tools developed for biomolecular data integration. Its viability is tested on both synthetic and real ST data from mouse and human tissues. The method works out of the box with appropriate data, but its results should be interpreted with care.
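To make the integration problem concrete, one simple way to picture it is as a proportion-weighted mixture. This assumes that cell-type proportions have already been estimated for each ST spot (for example with a scRNA-seq-based deconvolution tool) and that the scATAC-seq cells carry matching cell-type labels. The estimated accessibility of peak j in spot s is then the sum, over cell types k, of the proportion of k in s times the mean accessibility of peak j in k. This is an illustrative sketch under those assumptions, not the method presented in the thesis; the function name and input layout are hypothetical.

```python
from typing import Dict

# Hypothetical inputs:
#   spot_props[spot][cell_type]    -> estimated proportion of that cell type in the spot
#   type_profiles[cell_type][peak] -> mean accessibility of the peak in that cell type
Props = Dict[str, Dict[str, float]]
Profiles = Dict[str, Dict[str, float]]


def spatial_accessibility(spot_props: Props, type_profiles: Profiles) -> Dict[str, Dict[str, float]]:
    """Estimate each spot's peak accessibility as a proportion-weighted mixture
    of cell-type mean profiles (illustrative only, not the thesis's method)."""
    peaks = sorted({peak for profile in type_profiles.values() for peak in profile})
    estimates: Dict[str, Dict[str, float]] = {}
    for spot, props in spot_props.items():
        missing = set(props) - set(type_profiles)
        if missing:
            raise KeyError(f"no scATAC-seq profile for cell types {sorted(missing)}")
        total = sum(props.values())
        if total <= 0:
            raise ValueError(f"spot {spot!r} has no positive cell-type proportions")
        # Proportions are renormalized to sum to 1; a peak absent from a
        # cell type's profile is treated as inaccessible (0.0).
        estimates[spot] = {
            peak: sum((w / total) * type_profiles[ct].get(peak, 0.0) for ct, w in props.items())
            for peak in peaks
        }
    return estimates


if __name__ == "__main__":
    props = {"s1": {"neuron": 0.5, "astro": 0.5}, "s2": {"astro": 2.0}}
    profiles = {"neuron": {"peakA": 1.0}, "astro": {"peakB": 0.5}}
    print(spatial_accessibility(props, profiles))
    # {'s1': {'peakA': 0.5, 'peakB': 0.25}, 's2': {'peakA': 0.0, 'peakB': 0.5}}
```

A linear mixture like this ignores spot-specific effects such as capture efficiency, which is one reason results from any such integration warrant careful interpretation.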