Enhancing educational AI educhat with fine-tuned open-source language models : comparative study of parameter-efficient fine-tuning techniques and chain-of-thought reasoning
Sadiq, Junaid (2025)
Tietojenkäsittelyopin maisteriohjelma - Master's Programme in Computer Science
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
Hyväksymispäivämäärä - Date of approval
2025-07-04
Julkaisun pysyvä osoite on - The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202507047562
Tiivistelmä - Abstract
Large language models have brought new capabilities to natural language processing (NLP). These transformer-based models, pre-trained on vast amounts of text data, can generate contextually relevant and coherent text by capturing the nuances of complex human language patterns. Although these pre-trained models and their downstream applications, such as chatbots, perform well in general assistance roles, using them in academic environments without relevant context and specialized training can lead to uncertain and generic responses. Such models typically lack specialized knowledge in specific fields, struggle with intricate reasoning tasks, and demand significant computational resources. While some commercial chatbots (such as Klarna’s AI assistant) demonstrate contextual understanding through extensive post-training, chatbots currently deployed in educational settings fail to provide adequate subject-matter comprehension. This work examines parameter-efficient fine-tuning approaches for developing resource-efficient, context-aware assistants for academic settings.
This thesis addresses three questions: how well domain-specific synthetic datasets can be created from university course materials using various LLMs; what performance differences emerge when fine-tuning models of different scales with educational data; and how effective Group Relative Policy Optimization (GRPO) is at turning a standard model into a chain-of-thought reasoning model, and what effect this has on answer quality in problem-solving tasks.
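As a rough illustration of the GRPO idea mentioned above, the sketch below shows the group-relative advantage computation at its core: a group of sampled responses to one prompt is scored, and each response's advantage is its reward normalised against the group, with no separate value network. The reward values and group size here are illustrative assumptions, not data from the thesis.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalise rewards within one sampled group of responses,
    as in Group Relative Policy Optimization (GRPO)."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 sampled answers to one question, scored by a rule-based
# reward (e.g. correct final answer, visible step-by-step reasoning).
rewards = [1.0, 0.0, 0.5, 0.5]
adv = group_relative_advantages(rewards)

# Above-average answers get positive advantage, below-average negative;
# these advantages then weight the policy-gradient update.
assert adv[0] > 0 and adv[1] < 0
```

Because the baseline is the group mean rather than a learned critic, this setup is comparatively cheap, which is one reason GRPO suits resource-constrained educational deployments.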
A synthetic dataset was successfully generated using Qwen3:32B with Augmentoolkit, demonstrating the scalability of this technique to additional university course materials. Parameter-efficient fine-tuning using LoRA with quantization revealed architecture-dependent effectiveness: adaptation succeeded for compatible models, while certain architectures required specialized techniques. Chain-of-thought reasoning enhancement through GRPO showed significant improvements in step-by-step transparency and educational explanation quality across most evaluated models, though effectiveness varied with architecture and configuration requirements. The research provides a validated framework for educational AI development with practical applications for Tampere University’s Kiran AI assistant, demonstrating that selective application based on model-specific assessment is crucial for successful deployment.
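To make the LoRA technique mentioned above concrete, the following self-contained sketch shows the low-rank update it trains: the frozen weight matrix W is augmented with a scaled product of two small matrices, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. All dimensions, the scaling, and the initialisation scheme below are illustrative assumptions, not values from the thesis, and quantization is omitted.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=4):
    """Linear layer with a LoRA adapter: x @ (W + (alpha/r) * B @ A).T

    W: frozen pre-trained weights, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r), initialised to zero
    """
    scaling = alpha / r
    return x @ W.T + scaling * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.normal(size=(d_out, d_in))      # frozen base weights
A = rng.normal(size=(r, d_in)) * 0.01   # trainable
B = np.zeros((d_out, r))                # zero init: no change at step 0

x = rng.normal(size=(1, d_in))
# With B = 0 the adapted layer reproduces the frozen layer exactly,
# so fine-tuning starts from the pre-trained behaviour.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)

# Trainable parameter count: r*(d_in + d_out) vs the full d_in*d_out.
print(r * (d_in + d_out), "trainable vs", d_in * d_out, "full")
```

The parameter saving grows with matrix size, which is why combining LoRA with weight quantization makes fine-tuning of multi-billion-parameter models feasible on modest hardware.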
This work provides comprehensive steps for adapting general-purpose LLMs into specialized educational assistants through parameter-efficient fine-tuning and reasoning enhancement. The findings demonstrate how domain-specific datasets combined with targeted fine-tuning techniques can improve contextual understanding and reasoning capabilities in academic settings, with practical applications for educational AI systems at Tampere University.
