Trepo

Hybrid deep learning approach for multi-label classification problem: genre prediction

Ünal, Fatıma Zehra; Güzel, Mehmet Serdar; Ünal, Metehan; Ekinci, Fatih; Aşuroğlu, Tunç; Açıcı, Koray (2026-03)

 
Open file
Hybrid_deep_learning_approach_for_multi-label_classification_problem.pdf (2.074 MB)

Neural Computing and Applications
129
doi:10.1007/s00521-026-11852-3
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:tuni-202604083764

Description

Peer reviewed
Abstract
The primary aim is to develop a model that achieves high accuracy on multi-label classification problems by training, testing, and analyzing deep learning models that use both image and text data. In this paper, we propose a novel hybrid model that jointly considers textual and visual data for movie genre prediction, a representative example of a multi-label classification problem. In the proposed model, textual features are extracted from movie summaries using the DistilBERT model, while visual features are extracted from movie posters using the ConvNeXt deep learning model. The features from the two modalities are then combined using the XGBoost machine learning algorithm to perform genre prediction. By integrating information from different modalities through late fusion, this approach aims to achieve higher accuracy and better generalizability in movie genre classification. The ConvNeXt architecture was adapted to the problem using transfer learning and fine-tuning. To obtain the best performance from the DistilBERT model, the token length and threshold hyperparameters were optimized, and the configuration with a token length of 256 and a threshold of 0.5 was used in the hybrid model. Furthermore, to maximize the overall performance of the hybrid model, it was optimized using grid search. All three models (text-based, image-based, and hybrid) were trained and tested on a dataset obtained from the IMDB website. The models were evaluated using Hamming loss, precision, and F1 score. Experimental results showed that, overall, the text-based model outperformed the image-based model, while the proposed hybrid model outperformed both individual models. This demonstrates that textual and visual features complement each other and improve overall performance.
This paper presents an original and effective study of multi-label movie genre classification that combines computer vision, natural language processing, and machine learning.
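The abstract mentions thresholding per-genre probabilities at 0.5 and scoring predictions with Hamming loss, precision, and F1. Below is a minimal Python sketch of those evaluation steps. It assumes that each model outputs one probability per genre, that a genre is assigned when its probability is at or above the threshold, and that precision and F1 are micro-averaged (the abstract does not state which averaging was used). The helper `late_fusion_features` is hypothetical. It only shows one common late-fusion input, the two models' per-genre outputs concatenated into one row per movie, and it does not show XGBoost training or the authors' actual fusion features. The toy numbers are illustrative and are not results from the paper.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def binarize(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> Matrix:
    """Assign every genre whose probability is at or above the threshold (assumed rule)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def late_fusion_features(text_probs: Sequence[Sequence[float]],
                         image_probs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical late-fusion input: concatenate the text and image models'
    per-genre outputs into one feature row per movie. A meta-classifier such
    as XGBoost would then be trained on these rows (not shown here)."""
    return [list(t) + list(i) for t, i in zip(text_probs, image_probs)]


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (movie, genre) positions that are predicted wrongly."""
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    total = sum(len(tr) for tr in y_true)
    return wrong / total


def micro_counts(y_true: Matrix, y_pred: Matrix) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for tr, pr in zip(y_true, y_pred):
        for t, p in zip(tr, pr):
            if p and t:
                tp += 1
            elif p:  # predicted but not in the ground truth
                fp += 1
            elif t:  # in the ground truth but not predicted
                fn += 1
    return tp, fp, fn


def micro_precision(y_true: Matrix, y_pred: Matrix) -> float:
    tp, fp, _ = micro_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    # Micro F1 = 2TP / (2TP + FP + FN), i.e. the harmonic mean of
    # micro precision and micro recall.
    tp, fp, fn = micro_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data for three hypothetical genres: Action, Comedy, Drama.
    y_true = [[1, 0, 0], [0, 1, 1]]
    probs = [[0.9, 0.2, 0.6], [0.1, 0.7, 0.4]]  # per-genre model outputs
    y_pred = binarize(probs, threshold=0.5)
    print(y_pred)                                        # [[1, 0, 1], [0, 1, 0]]
    print(round(hamming_loss(y_true, y_pred), 4))        # 0.3333
    print(round(micro_precision(y_true, y_pred), 4))     # 0.6667
    print(round(micro_f1(y_true, y_pred), 4))            # 0.6667
```

Raising or lowering `threshold` trades precision against recall. According to the abstract, 0.5 was the value selected for the DistilBERT model after tuning.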
Collections
  • TUNICRIS publications [24447]
Kalevantie 5
P.O. Box 617
33014 Tampere University
oa[@]tuni.fi | Privacy | Accessibility statement