Topic Modeling Applied to Publications In Economics
Nguyen, Quang (2025)
Tieto- ja sähkötekniikan kandidaattiohjelma - Bachelor's Programme in Computing and Electrical Engineering
Tekniikan ja luonnontieteiden tiedekunta - Faculty of Engineering and Natural Sciences
Date of acceptance
2025-05-28
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202505185717
Abstract
In recent years, interest in analyzing text data from different scientific fields has grown. Significant advances in artificial intelligence (AI), especially in natural language processing (NLP), have made it possible to systematically analyze large volumes of scientific publications and categorize them into distinct thematic clusters. Among these advances, topic modeling has become a crucial tool for exploring textual datasets. It identifies latent thematic structures within large text corpora, enabling researchers to analyze trends, content, and context. Among the various topic modeling techniques, Latent Dirichlet Allocation (LDA) remains one of the most widely used because of its probabilistic foundation, interpretability, and robustness.
This thesis applies LDA to a curated dataset of economics publications, consisting of titles and abstracts drawn from peer-reviewed journals and preprint repositories. The main aim of the study is to identify the dominant research themes within economics. To ensure meaningful and reliable topic discovery, the analysis integrates domain-specific preprocessing techniques and coherence-based model optimization. The findings provide insights into thematic priorities within the discipline, reveal underexplored research areas, and highlight methodological considerations for applying topic modeling to economic texts.
Collections
- Bachelor's theses [10747]