Adaptive Progressive Fine-Tuning of VLMs for Long-Tailed Multimodal Retrieval
Alijani, Farid; Late, Elina; Kumpulainen, Sanna (2025)
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202602252776
Description
Peer reviewed
Abstract
Adapting large vision-language models (VLMs) to specialized, long-tailed domains requires a careful balance between task performance and the preservation of pretrained knowledge. Although full-parameter fine-tuning is powerful, it is resource-intensive and can easily overfit on imbalanced data. We propose Adaptive Progressive Fine-Tuning (APFT), a strategy that automates this balancing process. APFT employs staged layer unfreezing guided by an event-triggered mechanism: instead of relying on a fixed schedule, phase transitions are initiated automatically based on real-time training stability metrics such as loss volatility and performance plateaus. Upon each transition, the cosine annealing scheduler is re-initialized and weight decay is adaptively increased to regularize the newly trainable parameters. Experiments on the long-tailed HISTORY-X4 archival dataset indicate that APFT significantly outperforms all baselines, including full fine-tuning and LoRA. The advantage is most pronounced on tail labels, where APFT achieves a 19.9% relative improvement in text-to-image mAP@10 over the strongest baseline, demonstrating its ability to adapt effectively to new domains while preserving foundational knowledge.
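The event-triggered schedule described in the abstract can be illustrated with a small, framework-free sketch. This is not the authors' implementation: the class name `StagedUnfreezeController`, every threshold and default value, the use of the loss standard deviation as the volatility measure, the metric range as the plateau test, and the rule that both conditions must hold at once are all assumptions made for illustration. The abstract names only the ingredients (loss volatility, performance plateaus, cosine annealing restarts, increased weight decay).

```python
import math
from collections import deque
from statistics import pstdev


class StagedUnfreezeController:
    """Illustrative event-triggered phase controller (hypothetical, not APFT itself)."""

    def __init__(self, num_stages, base_lr=1e-4, base_weight_decay=0.01,
                 decay_growth=1.5, window=5, volatility_tol=0.02,
                 plateau_tol=1e-3, steps_per_cycle=100):
        # All defaults below are assumed values chosen for the example.
        self.num_stages = num_stages
        self.base_lr = base_lr
        self.weight_decay = base_weight_decay
        self.decay_growth = decay_growth
        self.volatility_tol = volatility_tol
        self.plateau_tol = plateau_tol
        self.steps_per_cycle = steps_per_cycle
        self.stage = 0        # index of the current unfreezing stage
        self.cycle_step = 0   # position inside the current cosine cycle
        self.losses = deque(maxlen=window)
        self.metrics = deque(maxlen=window)

    def learning_rate(self):
        # Cosine annealing from base_lr towards 0 within the current cycle.
        progress = min(self.cycle_step / self.steps_per_cycle, 1.0)
        return 0.5 * self.base_lr * (1.0 + math.cos(math.pi * progress))

    def should_transition(self):
        if self.stage >= self.num_stages - 1:
            return False  # every layer group is already trainable
        if len(self.losses) < self.losses.maxlen:
            return False  # not enough history since the last transition
        # Assumed trigger: low loss volatility AND a flat validation metric.
        stable = pstdev(self.losses) < self.volatility_tol
        plateaued = max(self.metrics) - min(self.metrics) < self.plateau_tol
        return stable and plateaued

    def update(self, loss, val_metric):
        """Record one evaluation step; return True if a new stage begins."""
        self.losses.append(loss)
        self.metrics.append(val_metric)
        self.cycle_step += 1
        if self.should_transition():
            self.stage += 1                        # unfreeze the next layer group
            self.cycle_step = 0                    # restart the cosine cycle
            self.weight_decay *= self.decay_growth # stronger regularization
            self.losses.clear()
            self.metrics.clear()
            return True
        return False
```

In a real training loop, a `True` return from `update` would be the point at which the next group of layers is marked trainable and the optimizer is given the values of `learning_rate()` and `weight_decay`. How layers are grouped and in what order they are unfrozen is not stated in the abstract and is left open here.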
Collections
- TUNICRIS publications [24216]
