MedGemma Technical Report

Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang

arXiv preprint, Jul 7 2025
Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment face challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerating the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. For out-of-distribution tasks, MedGemma achieves 2.6-10% improvement on medical multimodal question answering, 15.5-18.1% improvement on chest X-ray finding classification, and 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching comparable performance to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and as an encoder achieves comparable or better performance than specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with potential to significantly accelerate medical research and development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.

Deep learning-based time-of-flight (ToF) enhancement of non-ToF PET scans for different radiotracers.

Mehranian A, Wollenweber SD, Bradley KM, Fielding PA, Huellner M, Iagaru A, Dedja M, Colwell T, Kotasidis F, Johnsen R, Jansen FP, McGowan DR

PubMed paper, Jul 1 2025
To evaluate a deep learning-based time-of-flight (DLToF) model trained to enhance the image quality of non-ToF PET images for different tracers, reconstructed using the BSREM algorithm, towards ToF images. A 3D residual U-Net model was trained using 8 different tracers (FDG: 75% and non-FDG: 25%) from 11 sites in the US, Europe and Asia. A total of 309 training and 33 validation datasets scanned on GE Discovery MI (DMI) ToF scanners were used for development of DLToF models of three strengths: low (L), medium (M) and high (H). The training and validation pairs consisted of target ToF and input non-ToF BSREM reconstructions using site-preferred regularisation parameters (beta values). The contrast and noise properties of each model were defined by adjusting the beta value of target ToF images. A total of 60 DMI datasets, consisting of a set of 4 tracers (¹⁸F-FDG, ¹⁸F-PSMA, ⁶⁸Ga-PSMA, ⁶⁸Ga-DOTATATE) and 15 exams each, were collected for testing and quantitative analysis of the models based on standardized uptake value (SUV) in regions of interest (ROI) placed in lesions, lungs and liver. Each dataset includes 5 image series: ToF and non-ToF BSREM and three DLToF images. The image series (300 in total) were blindly scored on a 5-point Likert scale by 4 readers based on lesion detectability, diagnostic confidence, and image noise/quality. In lesion SUVmax quantification with respect to ToF BSREM, DLToF-H achieved the best results among the three models by reducing the non-ToF BSREM errors from -39% to -6% for ¹⁸F-FDG (38 lesions); from -42% to -7% for ¹⁸F-PSMA (35 lesions); from -34% to -4% for ⁶⁸Ga-PSMA (23 lesions) and from -34% to -12% for ⁶⁸Ga-DOTATATE (32 lesions). Quantification results in liver and lung also showed ToF-like performance of DLToF models.
Clinical reader results showed that DLToF-H improved lesion detectability on average for all four radiotracers, whereas DLToF-L achieved the highest scores for image quality (noise level). DLToF-M, however, offered the best trade-off between lesion detection and noise level and hence achieved the highest score for diagnostic confidence on average for all radiotracers. This study demonstrated that the DLToF models are suitable for both FDG and non-FDG tracers and could be utilized for digital BGO PET/CT scanners to provide image quality and lesion detectability comparable to ToF.
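
The lesion errors quoted in this abstract are relative differences in SUVmax with respect to the ToF BSREM reference. A minimal sketch of that calculation follows, assuming the conventional signed percent-error definition (the abstract does not spell out the formula). The function name and the SUV values are hypothetical and chosen only for illustration.

```python
def percent_error(value: float, reference: float) -> float:
    """Signed relative error of `value` with respect to `reference`, in percent."""
    if reference == 0:
        raise ValueError("reference SUV must be non-zero")
    return 100.0 * (value - reference) / reference


# Hypothetical SUVmax values for a single lesion (not taken from the study).
suv_tof = 10.0      # ToF BSREM reference
suv_non_tof = 6.1   # non-ToF BSREM input
suv_dltof_h = 9.4   # DLToF-H output

err_non_tof = percent_error(suv_non_tof, suv_tof)
err_dltof_h = percent_error(suv_dltof_h, suv_tof)
print(f"non-ToF: {err_non_tof:.0f}%, DLToF-H: {err_dltof_h:.0f}%")
```

A negative value means the series underestimates SUVmax relative to ToF, which is the direction of all the errors reported here.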

Deep Learning-Based Automated Detection of the Middle Cerebral Artery in Transcranial Doppler Ultrasound Examinations.

Lee H, Shi W, Mukaddim RA, Brunelle E, Palisetti A, Imaduddin SM, Rajendram P, Incontri D, Lioutas VA, Heldt T, Raju BI

PubMed paper, Jun 28 2025
Transcranial Doppler (TCD) ultrasound has significant clinical value for assessing cerebral hemodynamics, but its reliance on operator expertise limits broader clinical adoption. In this work, we present a lightweight real-time deep learning-based approach capable of automatically identifying the middle cerebral artery (MCA) in TCD Color Doppler images. Two state-of-the-art object detection models, YOLOv10 and Real-Time Detection Transformers (RT-DETR), were investigated for automated MCA detection in real time. TCD Color Doppler data (41 subjects; 365 videos; 61,611 frames) were collected from neurologically healthy individuals (n = 31) and stroke patients (n = 10). MCA bounding box annotations were performed by clinical experts on all frames. Model training consisted of pretraining on a large abdominal ultrasound dataset followed by fine-tuning on the acquired TCD data. Detection performance at the instance and frame levels, and inference speed were assessed through four-fold cross-validation. Inter-rater agreement between the model and two human expert readers was assessed using distance between bounding boxes, and inter-rater variability was quantified using the individual equivalence coefficient (IEC) metric. Both YOLOv10 and RT-DETR models showed comparable frame-level accuracy for MCA presence, with F1 scores of 0.884 ± 0.023 and 0.884 ± 0.019 respectively. YOLOv10 outperformed RT-DETR for instance-level localization accuracy (AP: 0.817 vs. 0.780) and had considerably faster inference speed on a desktop CPU (11.6 ms vs. 91.14 ms). Furthermore, YOLOv10 showed an average inference time of 36 ms per frame on a tablet device. The IEC was -1.08 with 95% confidence interval: [-1.45, -0.19], showing that the AI predictions deviated less from each reader than the readers' annotations deviated from each other.
Real-time automated detection of the MCA is feasible and can be implemented on mobile platforms, potentially enabling wider clinical adoption by less-trained operators in point-of-care settings.
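
The frame-level F1 score reported in this abstract summarizes how well per-frame MCA presence predictions match the expert labels. The sketch below shows the standard F1 computation on boolean per-frame labels. It is not the authors' evaluation code; the function name and the toy labels are hypothetical.

```python
from typing import Sequence


def frame_level_f1(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """F1 score for per-frame MCA presence (True = MCA visible in the frame)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    # With no positives in either sequence, F1 is undefined; return 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


# Toy labels for eight frames (hypothetical, not study data).
truth = [True, True, True, False, False, True, True, False]
preds = [True, True, False, False, True, True, True, False]
f1 = frame_level_f1(truth, preds)
print(f"frame-level F1 = {f1:.3f}")
```

In a cross-validated setting such as the four-fold scheme described here, a score like this would typically be computed per fold and then averaged, which is one plausible reading of the mean ± spread figures in the abstract.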

Are presentations of thoracic CT performed on admission to the ICU associated with mortality at day-90 in COVID-19 related ARDS?

Le Corre A, Maamar A, Lederlin M, Terzi N, Tadié JM, Gacouin A

PubMed paper, Jun 5 2025
Computed tomography (CT) analysis of lung morphology has significantly advanced our understanding of acute respiratory distress syndrome (ARDS). During the Coronavirus Disease 2019 (COVID-19) pandemic, CT imaging was widely utilized to evaluate lung injury and was suggested as a tool for predicting patient outcomes. However, data specifically focused on patients with ARDS admitted to intensive care units (ICUs) remain limited. This retrospective study analyzed patients admitted to ICUs between March 2020 and November 2022 with moderate to severe COVID-19 ARDS. All CT scans performed within 48 h of ICU admission were independently reviewed by three experts. Lung injury severity was quantified using the CT Severity Score (CT-SS; range 0-25). Patients were categorized as having severe disease (CT-SS ≥ 18) or non-severe disease (CT-SS < 18). The primary outcome was all-cause mortality at 90 days. Secondary outcomes included ICU mortality and medical complications during the ICU stay. Additionally, we evaluated a computer-assisted CT-score assessment using artificial intelligence software (CT Pneumonia Analysis®, SIEMENS Healthcare) to explore the feasibility of automated measurement and routine implementation. A total of 215 patients with moderate to severe COVID-19 ARDS were included. The median CT-SS at admission was 18/25 [interquartile range, 15-21]. Among them, 120 patients (56%) had a severe CT-SS (≥ 18), while 95 patients (44%) had a non-severe CT-SS (< 18). The 90-day mortality rates were 20.8% for the severe group and 15.8% for the non-severe group (p = 0.35). No significant association was observed between CT-SS severity and patient outcomes. In patients with moderate to severe COVID-19 ARDS, systematic CT assessment of lung parenchymal injury was not a reliable predictor of 90-day mortality or ICU-related complications.
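
The mortality comparison in this abstract can be checked approximately with a 2x2 test. The sketch below makes two assumptions that the abstract does not state: the death counts (25 of 120 and 15 of 95) are back-calculated from the reported percentages, and the test is a Pearson chi-square without continuity correction. The function name is hypothetical.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Assumed counts inferred from the reported rates (not given in the abstract):
# severe CT-SS: 25 deaths, 95 survivors; non-severe CT-SS: 15 deaths, 80 survivors.
stat, p_value = chi2_2x2(25, 95, 15, 80)
print(f"chi2 = {stat:.2f}, p = {p_value:.2f}")
```

Under these assumptions the p-value lands close to the reported 0.35, which is consistent with the conclusion that CT-SS severity was not associated with 90-day mortality.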

Accelerated High-resolution T1- and T2-weighted Breast MRI with Deep Learning Super-resolution Reconstruction.

Mesropyan N, Katemann C, Leutner C, Sommer A, Isaak A, Weber OM, Peeters JM, Dell T, Bischoff L, Kuetting D, Pieper CC, Lakghomi A, Luetkens JA

PubMed paper, Jun 1 2025
To assess the performance of an industry-developed deep learning (DL) algorithm to reconstruct low-resolution Cartesian T1-weighted dynamic contrast-enhanced (T1w) and T2-weighted turbo-spin-echo (T2w) sequences and compare them to standard sequences. Female patients with indications for breast MRI were included in this prospective study. The study protocol at 1.5 Tesla MRI included T1w and T2w. Both sequences were acquired in standard resolution (T1_S and T2_S) and in low resolution with subsequent DL reconstruction (T1_DL and T2_DL). For DL reconstruction, two convolutional networks were used: (1) Adaptive-CS-Net for denoising with compressed sensing, and (2) Precise-Image-Net for resolution upscaling of previously downscaled images. Overall image quality was assessed using a 5-point Likert scale (from 1=non-diagnostic to 5=excellent). Apparent signal-to-noise (aSNR) and contrast-to-noise (aCNR) ratios were calculated. Breast Imaging Reporting and Data System (BI-RADS) agreement between different sequence types was assessed. A total of 47 patients were included (mean age, 58±11 years). Acquisition times for T1_DL and T2_DL were reduced by 51% (44 vs. 90 s per dynamic phase) and 46% (102 vs. 192 s), respectively. T1_DL and T2_DL showed higher overall image quality (e.g., 4 [IQR, 4-4] for T1_S vs. 5 [IQR, 5-5] for T1_DL, P<0.001). Both T1_DL and T2_DL revealed higher aSNR and aCNR than T1_S and T2_S (e.g., aSNR: 32.35±10.23 for T2_S vs. 27.88±6.86 for T2_DL, P=0.014). Cohen's kappa agreement by BI-RADS assessment was excellent (0.962, P<0.001). DL for denoising and resolution upscaling reduces acquisition time and improves image quality for T1w and T2w breast MRI.
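
Agreement between BI-RADS assessments on the standard and DL sequences is summarized here with Cohen's kappa. The sketch below computes the unweighted form of the statistic. Whether the study used a weighted or unweighted variant is not stated, so this is an assumption, and the ratings and function name are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two paired sets of categorical ratings."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(r1)
    # Observed agreement: fraction of cases where both ratings match.
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from the marginal category frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both sets use one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical BI-RADS categories for ten exams (not study data).
birads_standard = [1, 2, 2, 3, 4, 4, 5, 2, 1, 3]
birads_dl = [1, 2, 2, 3, 4, 4, 5, 2, 1, 4]
kappa = cohens_kappa(birads_standard, birads_dl)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so a value near 1 indicates that the two sequence types lead to nearly identical BI-RADS categories.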

Deep learning-enhanced zero echo time MRI for glenohumeral assessment in shoulder instability: a comparative study with CT.

Carretero-Gómez L, Fung M, Wiesinger F, Carl M, McKinnon G, de Arcos J, Mandava S, Arauz S, Sánchez-Lacalle E, Nagrani S, López-Alcorocho JM, Rodríguez-Íñigo E, Malpica N, Padrón M

PubMed paper, Jun 1 2025
To evaluate image quality and lesion conspicuity of zero echo time (ZTE) MRI reconstructed with a deep learning (DL)-based algorithm versus conventional reconstruction and to assess DL ZTE performance against CT for bone loss measurements in shoulder instability. Forty-four patients (9 females; 33.5 ± 15.65 years) with symptomatic anterior glenohumeral instability and no previous shoulder surgery underwent ZTE MRI and CT on the same day. ZTE images were reconstructed with conventional and DL methods and post-processed for CT-like contrast. Two musculoskeletal radiologists, blinded to the reconstruction method, independently evaluated 20 randomized MR ZTE datasets with and without DL-enhancement for perceived signal-to-noise ratio, resolution, and lesion conspicuity at humerus and glenoid using a 4-point Likert scale. Inter-reader reliability was assessed using weighted Cohen's kappa (K). An ordinal logistic regression model analyzed Likert scores, with the reconstruction method (DL-enhanced vs. conventional) as the predictor. Glenoid track (GT) and Hill-Sachs interval (HSI) measurements were performed by another radiologist on both DL ZTE and CT datasets. Intermodal agreement was assessed through intraclass correlation coefficients (ICCs) and Bland-Altman analysis. DL ZTE MR bone images scored higher than conventional ZTE across all items, with significantly improved perceived resolution (odds ratio (OR) = 7.67, p = 0.01) and glenoid lesion conspicuity (OR = 25.12, p = 0.01), with substantial inter-rater agreement (K = 0.61 (0.38-0.83) to 0.77 (0.58-0.95)). Inter-modality assessment showed almost perfect agreement between DL ZTE MR and CT for all bone measurements (overall ICC = 0.99 (0.97-0.99)), with mean differences of 0.08 (-0.80 to 0.96) mm for GT and -0.07 (-1.24 to 1.10) mm for HSI.
DL-based reconstruction enhances ZTE MRI quality for glenohumeral assessment, offering osseous evaluation and quantification equivalent to gold-standard CT, potentially simplifying preoperative workflow, and reducing CT radiation exposure.
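
The intermodal comparison in this abstract relies on Bland-Altman analysis, which summarizes paired measurements by their mean difference (bias) and limits of agreement. The sketch below uses the common bias ± 1.96 SD convention, which is an assumption since the abstract does not state how its ranges were derived. The function name and the measurements are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical glenoid track measurements in mm (not study data).
gt_mri = [18.2, 20.1, 17.5, 19.8, 21.0]
gt_ct = [18.0, 20.4, 17.3, 19.9, 20.7]
bias, lower, upper = bland_altman(gt_mri, gt_ct)
print(f"bias = {bias:.2f} mm, limits of agreement = [{lower:.2f}, {upper:.2f}] mm")
```

A bias near zero with narrow limits indicates that the two modalities can be used interchangeably for the measurement in question.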

Exploring interpretable echo analysis using self-supervised parcels.

Majchrowska S, Hildeman A, Mokhtari R, Diethe T, Teare P

PubMed paper, May 17 2025
The application of AI for predicting critical heart failure endpoints using echocardiography is a promising avenue to improve patient care and treatment planning. However, fully supervised training of deep learning models in medical imaging requires a substantial amount of labelled data, posing significant challenges due to the need for skilled medical professionals to annotate image sequences. Our study addresses this limitation by exploring the potential of self-supervised learning, emphasising interpretability, robustness, and safety as crucial factors in cardiac imaging analysis. We leverage self-supervised learning on a large unlabelled dataset, facilitating the discovery of features applicable to various downstream tasks. The backbone model not only generates informative features for training smaller models using simple techniques but also produces features that are interpretable by humans. The study employs a modified Self-supervised Transformer with Energy-based Graph Optimisation (STEGO) network on top of self-DIstillation with NO labels (DINO) as a backbone model, pre-trained on diverse medical and non-medical data. This approach facilitates the generation of self-segmented outputs, termed "parcels", which identify distinct anatomical sub-regions of the heart. Our findings highlight the robustness of these self-learned parcels across diverse patient profiles and phases of the cardiac cycle. Moreover, these parcels offer high interpretability and effectively encapsulate clinically relevant cardiac substructures. We conduct a comprehensive evaluation of the proposed self-supervised approach on publicly available datasets, demonstrating its adaptability to a wide range of requirements.
Our results underscore the potential of self-supervised learning to address labelled data scarcity in medical imaging, offering a path to improve cardiac imaging analysis and enhance the efficiency and interpretability of diagnostic procedures, thus positively impacting patient care and clinical decision-making.