Precision Diagnosis and Treatment Monitoring of Glioma via PET Radiomics.

Zhou C, Ji P, Gong B, Kou Y, Fan Z, Wang L

PubMed · Jul 17 2025
Glioma, the most common primary intracranial tumor, poses significant challenges to precision diagnosis and treatment due to its heterogeneity and invasiveness. With the introduction of the 2021 WHO classification standard based on molecular biomarkers, the role of imaging in non-invasive subtyping and therapeutic monitoring of gliomas has become increasingly crucial. While conventional MRI shows limitations in assessing metabolic status and differentiating tumor recurrence, positron emission tomography (PET) combined with radiomics and artificial intelligence technologies offers a novel paradigm for precise diagnosis and treatment monitoring through quantitative extraction of multimodal imaging features (e.g., intensity, texture, dynamic parameters). This review systematically summarizes the technical workflow of PET radiomics (including tracer selection, image segmentation, feature extraction, and model construction) and its applications in predicting molecular subtypes (such as IDH mutation and MGMT methylation), distinguishing recurrence from treatment-related changes, and prognostic stratification. Studies demonstrate that amino acid tracers (e.g., ¹⁸F-FET, ¹¹C-MET) combined with multimodal radiomics models significantly outperform traditional parametric analysis in diagnostic efficacy. Nevertheless, current research still faces challenges including data heterogeneity, insufficient model interpretability, and lack of clinical validation. Future advancements require multicenter standardized protocols, open-source algorithm frameworks, and multi-omics integration to facilitate the clinical translation of PET radiomics from research to practice.
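
As a concrete illustration of the workflow summarized above (segmentation → feature extraction → model construction), the sketch below chains pyradiomics feature extraction with a scikit-learn classifier for a molecular label such as IDH status. The file names, the case list, and the choice of classifier are hypothetical placeholders, not the pipeline of any study cited in this review.

```python
# Hedged sketch of a generic PET-radiomics pipeline: extract intensity/texture
# features from a PET volume and its tumour mask with pyradiomics, then fit a
# simple classifier for a binary molecular label (e.g. IDH-mutant vs wild-type).
import numpy as np
from radiomics import featureextractor            # pip install pyradiomics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("firstorder")  # intensity statistics
extractor.enableFeatureClassByName("glcm")        # texture features

def pet_features(image_path, mask_path):
    """Radiomic features for one PET volume + tumour segmentation (NIfTI paths)."""
    result = extractor.execute(image_path, mask_path)
    # keep the numeric radiomic features, drop the diagnostic metadata entries
    return {k: float(v) for k, v in result.items() if not k.startswith("diagnostics")}

def fit_molecular_model(cases):
    """cases: list of (pet_path, mask_path, label) tuples -- hypothetical inputs."""
    rows = [pet_features(img, msk) for img, msk, _ in cases]
    names = sorted(rows[0])
    X = np.array([[r[k] for k in names] for r in rows])
    y = np.array([label for _, _, label in cases])
    model = LogisticRegression(max_iter=1000)
    print("cross-validated AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
    return model.fit(X, y)
```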

DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model

Han Zhang, Xiangde Luo, Yong Chen, Kang Li

arXiv preprint · Jul 17 2025
Annotation variability remains a substantial challenge in medical image segmentation, stemming from ambiguous imaging boundaries and diverse clinical expertise. Traditional deep learning methods that produce a single deterministic segmentation prediction often fail to capture these annotator biases. Although recent studies have explored multi-rater segmentation, existing methods typically focus on a single perspective, either generating a probabilistic "gold standard" consensus or preserving expert-specific preferences, and thus struggle to provide an omni view that covers both. In this study, we propose DiffOSeg, a two-stage diffusion-based framework which aims to simultaneously achieve both consensus-driven (combining all experts' opinions) and preference-driven (reflecting experts' individual assessments) segmentation. Stage I establishes population consensus through a probabilistic consensus strategy, while Stage II captures expert-specific preferences via adaptive prompts. Demonstrated on two public datasets (LIDC-IDRI and NPC-170), our model outperforms existing state-of-the-art methods across all evaluated metrics. Source code is available at https://github.com/string-ellipses/DiffOSeg.
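
The consensus-driven idea of Stage I can be illustrated with a toy sketch: average the experts' binary masks into a per-pixel probability map and, optionally, sample hard masks from it during training. This is only a generic illustration of probabilistic consensus, assuming binary 2D masks; DiffOSeg's actual strategy may differ in detail.

```python
# Toy illustration of a probabilistic consensus over multiple expert masks
# (the general idea behind consensus-driven multi-rater segmentation).
import numpy as np

def probabilistic_consensus(expert_masks):
    """expert_masks: list of binary (H, W) arrays, one per annotator."""
    stack = np.stack(expert_masks).astype(np.float32)
    return stack.mean(axis=0)                      # per-pixel P(foreground)

def sample_consensus_target(consensus, rng=None):
    """Sample a hard mask from the consensus map so training sees the full
    spread of annotator variability rather than a single majority vote."""
    rng = rng if rng is not None else np.random.default_rng()
    return (rng.random(consensus.shape) < consensus).astype(np.uint8)

# example with three disagreeing annotators on a 4x4 patch
masks = [np.random.default_rng(s).integers(0, 2, (4, 4)) for s in range(3)]
print(probabilistic_consensus(masks))
```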

Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images

Zahra TehraniNasab, Amar Kumar, Tal Arbel

arXiv preprint · Jul 17 2025
Medical image synthesis presents unique challenges due to the inherent complexity and high-resolution details required in clinical contexts. Traditional generative architectures such as Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs) have shown great promise for high-resolution image generation but struggle with preserving fine-grained details that are key for accurate diagnosis. To address this issue, we introduce Pixel Perfect MegaMed, the first vision-language foundation model to synthesize images at a resolution of 1024x1024. Our method deploys a multi-scale transformer architecture designed specifically for ultra-high resolution medical image generation, enabling the preservation of both global anatomical context and local image-level details. By leveraging vision-language alignment techniques tailored to medical terminology and imaging modalities, Pixel Perfect MegaMed bridges the gap between textual descriptions and visual representations at unprecedented resolution levels. We apply our model to the CheXpert dataset and demonstrate its ability to generate clinically faithful chest X-rays from text prompts. Beyond visual quality, these high-resolution synthetic images prove valuable for downstream tasks such as classification, showing measurable performance gains when used for data augmentation, particularly in low-data regimes. Our code is accessible through the project website: https://tehraninasab.github.io/pixelperfect-megamed.

Insights into a radiology-specialised multimodal large language model with sparse autoencoders

Kenza Bouzid, Shruthi Bannur, Felix Meissen, Daniel Coelho de Castro, Anton Schwaighofer, Javier Alvarez-Valle, Stephanie L. Hyland

arXiv preprint · Jul 17 2025
Interpretability can improve the safety, transparency and trust of AI models, which is especially important in healthcare applications where decisions often carry significant consequences. Mechanistic interpretability, particularly through the use of sparse autoencoders (SAEs), offers a promising approach for uncovering human-interpretable features within large transformer-based models. In this study, we apply Matryoshka-SAE to the radiology-specialised multimodal large language model, MAIRA-2, to interpret its internal representations. Using large-scale automated interpretability of the SAE features, we identify a range of clinically relevant concepts - including medical devices (e.g., line and tube placements, pacemaker presence), pathologies such as pleural effusion and cardiomegaly, longitudinal changes and textual features. We further examine the influence of these features on model behaviour through steering, demonstrating directional control over generations with mixed success. Our results reveal practical and methodological challenges, yet they offer initial insights into the internal concepts learned by MAIRA-2 - marking a step toward deeper mechanistic understanding and interpretability of a radiology-adapted multimodal large language model, and paving the way for improved model transparency. We release the trained SAEs and interpretations: https://huggingface.co/microsoft/maira-2-sae.
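
For readers unfamiliar with SAEs, the sketch below shows the generic recipe (an overcomplete dictionary trained with a reconstruction plus L1 sparsity objective) on hypothetical residual-stream activations. The Matryoshka-SAE used here adds nested dictionary sizes and other refinements not reproduced in this minimal PyTorch version.

```python
# Minimal sparse autoencoder in PyTorch: an overcomplete dictionary trained
# with reconstruction + L1 sparsity on transformer activations. The actual
# Matryoshka-SAE adds nested dictionary sizes not shown here.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, activations: torch.Tensor):
        codes = torch.relu(self.encoder(activations))   # sparse feature codes
        return self.decoder(codes), codes

def sae_loss(recon, codes, activations, l1_coeff=1e-3):
    # reconstruction error plus an L1 penalty that keeps few features active
    return ((recon - activations) ** 2).mean() + l1_coeff * codes.abs().mean()

# hypothetical residual-stream activations of shape (batch, d_model)
sae = SparseAutoencoder(d_model=4096, d_dict=65536)
acts = torch.randn(8, 4096)
recon, codes = sae(acts)
print(sae_loss(recon, codes, acts))
```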

CytoSAE: Interpretable Cell Embeddings for Hematology

Muhammed Furkan Dasdelen, Hyesu Lim, Michele Buck, Katharina S. Götze, Carsten Marr, Steffen Schneider

arXiv preprint · Jul 16 2025
Sparse autoencoders (SAEs) have emerged as a promising tool for mechanistic interpretability of transformer-based foundation models. Very recently, SAEs were also adopted for the visual domain, enabling the discovery of visual concepts and their patch-wise attribution to tokens in the transformer model. While a growing number of foundation models have emerged for medical imaging, tools for explaining their inferences are still lacking. In this work, we show the applicability of SAEs for hematology. We propose CytoSAE, a sparse autoencoder trained on over 40,000 peripheral blood single-cell images. CytoSAE generalizes to diverse and out-of-domain datasets, including bone marrow cytology, where it identifies morphologically relevant concepts that we validated with medical experts. Furthermore, we demonstrate scenarios in which CytoSAE can generate patient-specific and disease-specific concepts, enabling the detection of pathognomonic cells and localized cellular abnormalities at the patch level. We quantify the effect of concepts on a patient-level AML subtype classification task and show that CytoSAE concepts reach performance comparable to the state-of-the-art, while offering explainability on the sub-cellular level. Source code and model weights are available at https://github.com/dynamical-inference/cytosae.
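
The patch-level localization described above can be pictured as reading out one SAE feature ("concept") across a ViT's patch tokens and reshaping it into a heatmap. The sketch below assumes a trained SAE encoder and patch embeddings with the class token already removed; shapes and the concept index are illustrative only.

```python
# Sketch of patch-level concept attribution: read one SAE feature across the
# ViT patch tokens of an image and reshape it into a heatmap. Shapes and the
# concept index are illustrative; patch_tokens excludes the class token.
import torch

def concept_heatmap(sae_encode, patch_tokens, concept_idx, grid_hw=(14, 14)):
    """
    sae_encode:   callable mapping (n_patches, d_model) -> (n_patches, d_dict)
    patch_tokens: patch embeddings for one image, shape (n_patches, d_model)
    concept_idx:  index of the SAE feature ("concept") to visualise
    grid_hw:      patch grid of the backbone, with rows * cols == n_patches
    """
    with torch.no_grad():
        codes = sae_encode(patch_tokens)             # sparse codes per patch
    heat = codes[:, concept_idx].reshape(grid_hw)    # activation per patch
    return heat / (heat.max() + 1e-8)                # normalised heatmap
```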

Generate to Ground: Multimodal Text Conditioning Boosts Phrase Grounding in Medical Vision-Language Models

Felix Nützel, Mischa Dombrowski, Bernhard Kainz

arXiv preprint · Jul 16 2025
Phrase grounding, i.e., mapping natural language phrases to specific image regions, holds significant potential for disease localization in medical imaging through clinical reports. While current state-of-the-art methods rely on discriminative, self-supervised contrastive models, we demonstrate that generative text-to-image diffusion models, leveraging cross-attention maps, can achieve superior zero-shot phrase grounding performance. Contrary to prior assumptions, we show that fine-tuning diffusion models with a frozen, domain-specific language model, such as CXR-BERT, substantially outperforms domain-agnostic counterparts. This setup achieves remarkable improvements, with mIoU scores doubling those of current discriminative methods. These findings highlight the underexplored potential of generative models for phrase grounding tasks. To further enhance performance, we introduce Bimodal Bias Merging (BBM), a novel post-processing technique that aligns text and image biases to identify regions of high certainty. BBM refines cross-attention maps, achieving even greater localization accuracy. Our results establish generative approaches as a more effective paradigm for phrase grounding in the medical imaging domain, paving the way for more robust and interpretable applications in clinical practice. The source code and model weights are available at https://github.com/Felix-012/generate_to_ground.
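
Conceptually, grounding a phrase with a diffusion model amounts to aggregating the cross-attention maps of that phrase's tokens into a heatmap and thresholding it for IoU scoring. The sketch below assumes the maps have already been averaged over layers, heads, and timesteps, and it does not reproduce the BBM bias-merging step.

```python
# Sketch of turning per-token cross-attention maps into a phrase-grounding
# heatmap: average the maps of the phrase's tokens, normalise, and threshold.
# The maps are assumed to be pre-aggregated; BBM is not reproduced here.
import torch

def phrase_heatmap(cross_attn, phrase_token_ids):
    """
    cross_attn:       (n_text_tokens, H, W) cross-attention maps, already
                      averaged over layers, heads and diffusion timesteps
    phrase_token_ids: indices of the tokens that make up the grounded phrase
    """
    heat = cross_attn[phrase_token_ids].mean(dim=0)
    return (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)

def grounding_mask(heat, threshold=0.5):
    return (heat >= threshold).to(torch.uint8)       # binary region for IoU
```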

Benchmarking and Explaining Deep Learning Cortical Lesion MRI Segmentation in Multiple Sclerosis

Nataliia Molchanova, Alessandro Cagol, Mario Ocampo-Pineda, Po-Jui Lu, Matthias Weigel, Xinjie Chen, Erin Beck, Charidimos Tsagkas, Daniel Reich, Colin Vanden Bulcke, Anna Stolting, Serena Borrelli, Pietro Maggi, Adrien Depeursinge, Cristina Granziera, Henning Mueller, Pedro M. Gordaliza, Meritxell Bach Cuadra

arXiv preprint · Jul 16 2025
Cortical lesions (CLs) have emerged as valuable biomarkers in multiple sclerosis (MS), offering high diagnostic specificity and prognostic relevance. However, their routine clinical integration remains limited due to subtle magnetic resonance imaging (MRI) appearance, challenges in expert annotation, and a lack of standardized automated methods. We propose a comprehensive multi-centric benchmark of CL detection and segmentation in MRI. A total of 656 MRI scans, including clinical trial and research data from four institutions, were acquired at 3T and 7T using MP2RAGE and MPRAGE sequences with expert-consensus annotations. We rely on the self-configuring nnU-Net framework, designed for medical imaging segmentation, and propose adaptations tailored to improve CL detection. We evaluated model generalization through out-of-distribution testing, demonstrating strong lesion detection capabilities with F1-scores of 0.64 in-domain and 0.50 out-of-domain. We also analyze internal model features and model errors for a better understanding of AI decision-making. Our study examines how data variability, lesion ambiguity, and protocol differences impact model performance, offering recommendations for addressing these barriers to clinical adoption. To reinforce reproducibility, the implementation and models will be publicly accessible and ready to use at https://github.com/Medical-Image-Analysis-Laboratory/ and https://doi.org/10.5281/zenodo.15911797.
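
Lesion-wise detection scores such as the F1 values above are commonly computed by matching connected components between the predicted and reference masks, as in the sketch below; the exact overlap criterion used in this benchmark may differ.

```python
# Sketch of a lesion-wise F1 score via connected components: a predicted
# lesion counts as a true positive if it overlaps any reference lesion.
import numpy as np
from scipy import ndimage

def lesion_f1(pred_mask, gt_mask):
    """pred_mask, gt_mask: binary 3D numpy arrays."""
    pred_lbl, n_pred = ndimage.label(pred_mask)
    gt_lbl, n_gt = ndimage.label(gt_mask)
    tp = sum(1 for i in range(1, n_pred + 1) if gt_mask[pred_lbl == i].any())
    fp = n_pred - tp
    fn = sum(1 for j in range(1, n_gt + 1) if not pred_mask[gt_lbl == j].any())
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
```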

Non-invasive liver fibrosis screening on CT images using radiomics.

Yoo JJ, Namdar K, Carey S, Fischer SE, McIntosh C, Khalvati F, Rogalla P

PubMed · Jul 15 2025
This study aimed to develop a radiomics machine learning model for detecting liver fibrosis on CT images of the liver. With Ethics Board approval, 169 patients (68 women, 101 men; mean age, 51.2 years ± 14.7 [SD]) underwent an ultrasound-guided liver biopsy with simultaneous CT acquisitions without and following intravenous contrast material administration. Radiomic features were extracted from two regions of interest (ROIs) on the CT images, one placed at the biopsy site and another distant from the biopsy site. A development cohort, which was split further into training and validation cohorts across 100 trials, was used to determine the optimal combinations of contrast, normalization, machine learning model, and radiomic features for liver fibrosis detection based on their area under the receiver operating characteristic curve (AUC) on the validation cohort. The optimal combinations were then used to develop one final liver fibrosis model, which was evaluated on a test cohort. When averaging the AUC across all combinations, non-contrast enhanced (NC) CT (AUC, 0.6100; 95% CI: 0.5897, 0.6303) outperformed contrast-enhanced CT (AUC, 0.5680; 95% CI: 0.5471, 0.5890). The most effective model was found to be a logistic regression model with input features of maximum, energy, kurtosis, skewness, and small area high gray level emphasis extracted from NC CT normalized using gamma correction with γ = 1.5 (AUC, 0.7833; 95% CI: 0.7821, 0.7845). The presented radiomics-based logistic regression model holds promise as a non-invasive detection tool for subclinical, asymptomatic liver fibrosis. The model may serve as an opportunistic liver fibrosis screening tool when operated in the background during routine CT examinations covering the liver parenchyma. The final liver fibrosis detection model is made publicly available at: https://github.com/IMICSLab/RadiomicsLiverFibrosisDetection.
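
The reported winning configuration (gamma-normalized NC CT plus logistic regression on five radiomic features) can be sketched as follows. The intensity rescaling and the way the feature matrix is assembled are simplifying assumptions, not the study's released code.

```python
# Sketch of the reported configuration: gamma-correct the ROI intensities
# (gamma = 1.5) and fit logistic regression on five radiomic features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def gamma_normalise(roi, gamma=1.5):
    """Rescale ROI voxel intensities to [0, 1], then apply gamma correction."""
    scaled = (roi - roi.min()) / (roi.max() - roi.min() + 1e-8)
    return scaled ** gamma

# X: one row per patient with [maximum, energy, kurtosis, skewness,
#    small-area high gray-level emphasis]; y: biopsy-proven fibrosis (0/1)
def train_fibrosis_model(X, y):
    return LogisticRegression(max_iter=1000).fit(X, y)
```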

Latent Space Consistency for Sparse-View CT Reconstruction

Duoyou Chen, Yunqing Chen, Can Zhang, Zhou Wang, Cheng Chen, Ruoxiu Xiao

arXiv preprint · Jul 15 2025
Computed Tomography (CT) is a widely utilized imaging modality in clinical settings. Using densely acquired rotational X-ray arrays, CT can capture 3D spatial features. However, it is confronted with challenges such as long acquisition times and high radiation exposure. CT reconstruction methods based on sparse-view X-ray images have garnered substantial attention from researchers as they present a means to mitigate costs and risks. In recent years, diffusion models, particularly the Latent Diffusion Model (LDM), have demonstrated promising potential in the domain of 3D CT reconstruction. Nonetheless, due to the substantial differences between the 2D latent representation of X-ray modalities and the 3D latent representation of CT modalities, the vanilla LDM is incapable of achieving effective alignment within the latent space. To address this issue, we propose the Consistent Latent Space Diffusion Model (CLS-DM), which incorporates cross-modal feature contrastive learning to efficiently extract latent 3D information from 2D X-ray images and achieve latent space alignment between modalities. Experimental results indicate that CLS-DM outperforms classical and state-of-the-art generative models in terms of standard voxel-level metrics (PSNR, SSIM) on the LIDC-IDRI and CTSpine1K datasets. This methodology not only aids in enhancing the effectiveness and economic viability of sparse-view CT reconstruction but can also be generalized to other cross-modal transformation tasks, such as text-to-image synthesis. We have made our code publicly available at https://anonymous.4open.science/r/CLS-DM-50D6/ to facilitate further research and applications in other domains.
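
Cross-modal latent alignment of the kind described above is often implemented with a symmetric InfoNCE-style contrastive loss between matched X-ray and CT latents, as in the sketch below; CLS-DM's actual objective and architecture may differ.

```python
# Sketch of a symmetric InfoNCE-style contrastive loss for aligning 2D X-ray
# latents with 3D CT latents (flattened to vectors).
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(xray_latents, ct_latents, temperature=0.07):
    """xray_latents, ct_latents: matched pairs, both of shape (batch, d)."""
    x = F.normalize(xray_latents, dim=-1)
    c = F.normalize(ct_latents, dim=-1)
    logits = x @ c.t() / temperature                 # pairwise similarities
    targets = torch.arange(x.size(0), device=x.device)
    # pull matched X-ray/CT pairs together, push mismatched pairs apart
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```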