Physical Color Calibration of Digital Pathology Scanners for Robust Artificial Intelligence–Assisted Cancer Diagnosis
Ji, Xiaoyi; Salmon, Richard; Mulliqi, Nita; Khan, Umair; Wang, Yinxi; Blilie, Anders; Olsson, Henrik; Pedersen, Bodil Ginnerup; Sørensen, Karina Dalsgaard; Ulhøi, Benedicte Parm; Kjosavik, Svein R.; Janssen, Emilius A.M.; Rantalainen, Mattias; Egevad, Lars; Ruusuvuori, Pekka; Eklund, Martin; Kartasalo, Kimmo (2025-05)
05 / 2025
Modern Pathology
100715
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202503212940
Description
Peer reviewed
Abstract
The potential of artificial intelligence (AI) in digital pathology is limited by technical inconsistencies in the production of whole slide images (WSIs). These inconsistencies degrade AI performance and hinder widespread clinical application, because fine-tuning algorithms for each site is impractical. Changes in the imaging workflow can also compromise diagnostic accuracy and patient safety. Physical color calibration of scanners, relying on a biomaterial-based calibrant slide and a spectrophotometric reference measurement, has been proposed for standardizing WSI appearance, but its impact on AI performance has not been investigated. We evaluated whether physical color calibration can enable robust AI performance. We trained fully supervised and foundation model–based AI systems for detecting and Gleason grading prostate cancer using WSIs of prostate biopsies from the STHLM3 clinical trial (n = 3651) and evaluated their performance in 3 external cohorts (n = 1161) with and without calibration. With physical color calibration, the fully supervised system's concordance with pathologists' grading (Cohen linearly weighted κ) improved from 0.439 to 0.619 in the Stavanger University Hospital cohort (n = 860), from 0.354 to 0.738 in the Karolinska University Hospital cohort (n = 229), and from 0.423 to 0.452 in the Aarhus University Hospital cohort (n = 72). The foundation model's concordance improved as follows: from 0.739 to 0.760 (Karolinska), from 0.424 to 0.459 (Aarhus), and from 0.547 to 0.670 (Stavanger). This study demonstrated that physical color calibration provides a potential solution to the variation introduced by different scanners, making AI-based cancer diagnostics more reliable and applicable in diverse clinical settings.
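The concordance figures in the abstract are Cohen's linearly weighted κ. The Python sketch below illustrates the textbook form of that statistic; it is not the authors' evaluation code. The function name `linear_weighted_kappa`, the `categories` argument, and the toy labels are all hypothetical, and the record does not state which grade categories the study used.

```python
def linear_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    `categories` is the full ordered list of possible grades (an assumption:
    the caller supplies it, since the source does not specify the grade set).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two ordered categories are required")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Confusion matrix of counts: rows are rater A, columns are rater B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed_disagreement += weight * observed[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Toy labels invented for illustration only; not data from the study.
    pathologist = [0, 1, 2, 3, 3, 1]
    ai_system = [0, 1, 3, 3, 2, 1]
    print(linear_weighted_kappa(pathologist, ai_system, categories=[0, 1, 2, 3]))
```

With linear weights, a disagreement by one grade is penalized less than a disagreement by several grades, which is why this statistic suits ordinal scales such as cancer grades.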
Collections
- TUNICRIS publications [20043]