Sample Size Determination for Microbiome Differential Abundance Analysis: Frequentist and Bayesian Approaches
Herath Mudiyanselage, Piyumika Pamala Kumari Herath Samarasekara (2025)
Matematiikan ja tilastollisen data-analyysin maisteriohjelma - Master's Programme in Mathematics and Statistical Data Analytics
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
Date of approval
2025-12-29
Permanent address of the publication
https://urn.fi/URN:NBN:fi:tuni-2025122312097
Abstract
Microbiome differential abundance analysis (DAA) plays a pivotal role in discovering microbial biomarkers of health and disease. Determining an adequate sample size for such studies is challenging because microbiome data are high-dimensional, sparse, overdispersed, and compositional. Existing frequentist approaches, such as powmic, provide simulation-based solutions tailored to these properties but offer limited means to incorporate prior information, which is a drawback in settings where data collection is difficult or costly.
This thesis proposes a Bayesian method for sample size determination in microbiome DAA. The method adapts established simulation-based Bayesian design principles: taxon abundances are modeled on an approximately normal scale after an appropriate transformation, and a power prior enables optional borrowing of information from historical studies. To ensure a fair comparison between analytical paradigms, the proposed Bayesian approach is evaluated alongside powmic, modified to integrate MaAsLin2, an established microbiome DAA method.
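The abstract does not spell out the model, so the following is only a minimal sketch of simulation-based Bayesian sample size determination with a power prior, not the thesis's implementation. It assumes a single taxon whose transformed abundance is normal with a known, common standard deviation sigma in two groups of n samples each, a flat initial prior, a fixed power-prior weight a0 in [0, 1] applied to a historical group difference d0 estimated from n0 samples per group, and a hypothetical decision rule that declares a difference when the posterior probability of a positive (or negative) effect exceeds 0.975. Under these assumptions the difference in group means is sufficient, so only it is simulated. The function names are hypothetical.

```python
import math

import numpy as np


def normal_cdf(x):
    """Standard normal CDF via math.erf, applied elementwise to an array."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))


def posterior_delta(d, n, d0, n0, a0, sigma):
    """Posterior mean and sd of the group difference delta (hypothetical helper).

    Assumes known sigma and a flat initial prior, so the power prior turns the
    historical estimate d0 into a normal prior whose precision is scaled by a0.
    """
    w_cur = n / (2.0 * sigma**2)          # precision of the current estimate d
    w_hist = a0 * n0 / (2.0 * sigma**2)   # discounted precision of historical d0
    prec = w_cur + w_hist
    mean = (w_cur * d + w_hist * d0) / prec
    return mean, 1.0 / np.sqrt(prec)


def simulated_rejection_rate(n, delta, d0, n0, a0, sigma=1.0,
                             threshold=0.975, n_sim=4000, seed=1):
    """Fraction of simulated studies declaring a difference.

    With delta != 0 this estimates power; with delta == 0, the type I error rate.
    """
    rng = np.random.default_rng(seed)
    # Sampling distribution of the difference in group means: N(delta, 2 sigma^2 / n).
    d = rng.normal(delta, sigma * np.sqrt(2.0 / n), size=n_sim)
    mean, sd = posterior_delta(d, n, d0, n0, a0, sigma)
    p_pos = normal_cdf(mean / sd)  # posterior P(delta > 0 | data)
    reject = (p_pos > threshold) | (p_pos < 1.0 - threshold)
    return float(reject.mean())


def smallest_n(target_power, delta, d0, n0, a0, sigma=1.0,
               n_grid=range(5, 201, 5)):
    """Smallest per-group n on the grid whose simulated power reaches the target."""
    for n in n_grid:
        if simulated_rejection_rate(n, delta, d0, n0, a0, sigma) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # a0 = 0 ignores the historical study; a0 = 0.5 borrows half its information.
    for a0 in (0.0, 0.5):
        print(a0, smallest_n(0.8, delta=0.5, d0=0.5, n0=30, a0=a0))
```

With a0 = 0 the rule reduces to a two-sided z-test at the 5% level, which gives a frequentist reference point. Raising a0 increases the effective sample size, but it also pulls the posterior toward d0, so borrowing from a historical study whose effect differs from the current one can bias the decisions.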
Across four diverse microbiome datasets, the Bayesian method consistently outperformed the frequentist benchmark, demonstrating higher power for a given sample size while maintaining comparable or lower type I error rates. The improved efficiency reflects the Bayesian method’s ability to incorporate prior information and explicitly model uncertainty through posterior distributions.
Despite higher computational costs, the proposed Bayesian method demonstrates strong potential as a flexible and data-efficient tool for microbiome study planning. This work expands the methodological toolkit available for microbiome research and lays the foundation for future efforts to harmonize Bayesian and frequentist approaches in power and sample size estimation.
