Introducing the Y-chromosomal Ancestral-like Reference Sequence: Improving the Capture of Human Evolutionary Information
Köksal, Zehra; Preussner, Annina; Leinonen, Jaakko; Tukiainen, Taru (2025)
MOLECULAR BIOLOGY AND EVOLUTION
msaf222
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2025102210045
Description
Peer reviewed
Abstract
Reference sequences are essential for reproducible genetic analyses but are often chosen without regard to evolutionary relevance within the analyzed species. The human Y chromosome is widely used in evolutionary studies, yet current references represent evolutionarily young sequences, which can lead to misleading variant calls. To address this issue, we constructed a Y-chromosomal ancestral-like reference sequence to improve the detection of evolutionarily informative variants on the Y chromosome. The sequence was constructed by applying a weighted maximum parsimony approach to human and primate Y chromosome sequences. To benchmark its performance, 40 Y chromosome short-read sequences from diverse haplogroups were aligned to the Y-chromosomal ancestral-like reference sequence and to existing references (GRCh37, GRCh38, and T2T-CHM13). Overall, the Y-chromosomal ancestral-like reference sequence yielded the highest and most consistent number of SNPs per sample (mean = 1,400; SD = 77), while the other references yielded fewer variants on average (mean = 866 to 968) and showed greater variability across samples (SD = 457 to 531), depending on each sample's phylogenetic distance from the reference. Additionally, alignments to the Y-chromosomal ancestral-like reference sequence resulted in calls consisting solely of SNPs with evolutionarily derived alleles, whereas with the other references, on average 46% of the called SNPs carried ancestral alleles. This study demonstrates how the existing reference sequences fail to capture the full range of evolutionary information on the Y chromosome. The Y-chromosomal ancestral-like reference sequence improves the capture of evolutionary information on the Y chromosome, making it a valuable resource for various evolutionary applications, such as TMRCA estimations and phylogenetic analyses.
Finally, alongside the Y-chromosomal ancestral-like reference sequence, we provide a publicly available tool, polaryzer, to annotate variants as ancestral or derived in pre-aligned Y chromosome data.
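The distinction between ancestral and derived alleles described in the abstract can be illustrated with a minimal sketch. This is not polaryzer's interface or algorithm; the function name `classify_alt_allele`, the toy positions, and the assumed ancestral alleles are all hypothetical and serve only to show why a SNP called against an evolutionarily young reference may carry the ancestral allele.

```python
def classify_alt_allele(ref: str, alt: str, ancestral: str) -> str:
    """Label the sample's (ALT) allele relative to an assumed ancestral allele.

    Hypothetical helper for illustration; not part of any published tool.
    """
    ref, alt, ancestral = ref.upper(), alt.upper(), ancestral.upper()
    if alt == ancestral:
        # The reference carries the derived allele, so the sample only
        # looks "variant" because the reference itself has mutated.
        return "ancestral"
    if ref == ancestral:
        # The sample carries a mutation relative to the ancestral state.
        return "derived"
    # Neither allele matches the assumed ancestral state.
    return "unknown"


# Toy SNP calls: (position, REF, ALT, assumed ancestral allele).
# All values are made up for this example.
toy_calls = [
    (100, "A", "G", "A"),
    (200, "T", "C", "C"),
    (300, "G", "T", "C"),
]

labels = {pos: classify_alt_allele(ref, alt, anc) for pos, ref, alt, anc in toy_calls}
print(labels)
```

Under these assumptions, a reference that already matches the ancestral state at every site can never produce the "ancestral" label for a called SNP, which is consistent with the abstract's report that alignments to the ancestral-like reference yielded only derived-allele SNPs.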
Collections
- TUNICRIS publications [22385]
