Latest Papers on Radiology AI. Tags: Mixed Modality

Fine-tuning of language models for automated structuring of medical exam reports to improve patient screening and analysis.

Elvas LB, Santos R, Ferreira JC

•papers•Jul 4 2025

The analysis of medical imaging reports is labour-intensive but crucial for accurate diagnosis and effective patient screening. Often presented as unstructured text, these reports require systematic organisation for efficient interpretation. This study applies Natural Language Processing (NLP) techniques tailored for European Portuguese to automate the analysis of cardiology reports, streamlining patient screening. Using a methodology involving tokenization, part-of-speech tagging and manual annotation, the MediAlbertina PT-PT language model was fine-tuned, achieving 96.13% accuracy in entity recognition. The system enables rapid identification of conditions such as aortic stenosis through an interactive interface, substantially reducing the time and effort required for manual review. It also facilitates patient monitoring and disease quantification, optimising healthcare resource allocation. This research highlights the potential of NLP tools in Portuguese healthcare contexts, demonstrating their applicability to medical report analysis and their broader relevance in improving efficiency and decision-making in diverse clinical environments.

Mixed Modality LLM Radiology Report Cardiac Methodology In Silico Academic Lab GenAI

A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation.

Jin H, Che H, He S, Chen H

•papers•Jul 3 2025

Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) performance in clinical efficacy is unsatisfactory, especially for describing lesion attributes; 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address these challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities, but also provides the basis for its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of the diagnostic process for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses and generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving the working efficiency of radiologists. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) an evaluation tool for assessing the accuracy of reports in describing lesion location and severity; 3) extensive experiments to demonstrate the effectiveness of CoD, where it outperforms both specialist and generalist models consistently on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.

Mixed Modality Report Generation Whole Body Methodology In Silico Academic Lab Open Dataset Open Code

Outcome prediction and individualized treatment effect estimation in patients with large vessel occlusion stroke

Lisa Herzog, Pascal Bühler, Ezequiel de la Rosa, Beate Sick, Susanne Wegener

•preprint•Jul 3 2025

Mechanical thrombectomy has become the standard of care in patients with stroke due to large vessel occlusion (LVO). However, only 50% of successfully treated patients show a favorable outcome. We developed and evaluated interpretable deep learning models to predict functional outcomes in terms of the modified Rankin Scale score alongside individualized treatment effects (ITEs) using data from 449 LVO stroke patients from a randomized clinical trial. Besides clinical variables, we considered non-contrast CT (NCCT) and angiography (CTA) scans which were integrated using novel foundation models to make use of advanced imaging information. Clinical variables had good predictive power for binary functional outcome prediction (AUC of 0.719 [0.666, 0.774]), which could be slightly improved by adding CTA imaging (AUC of 0.737 [0.687, 0.795]). Adding NCCT scans or a combination of NCCT and CTA scans to clinical features yielded no improvement. The most important clinical predictor for functional outcome was pre-stroke disability. While estimated ITEs were well calibrated to the average treatment effect, discriminatory ability was limited, as indicated by a C-for-Benefit statistic of around 0.55 in all models. In summary, the models allowed us to jointly integrate CT imaging and clinical features while achieving state-of-the-art prediction performance and ITE estimates. Further research is needed, particularly to improve ITE estimation.

Mixed Modality Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA
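
The AUC values in the abstract above are reported with bracketed intervals, but the abstract does not say how those intervals were computed. As a rough illustration only, the sketch below computes AUC as the fraction of positive/negative pairs ranked correctly and attaches a percentile bootstrap interval. The bootstrap method, the function names, and the toy data are all assumptions and are not taken from the paper.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if len(set(resampled_labels)) < 2:
            continue  # AUC is undefined when a resample contains a single class
        stats.append(auc(resampled_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data, purely illustrative: 1 = favorable outcome, 0 = unfavorable.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))
print(bootstrap_ci(toy_labels, toy_scores))
```

With only four cases the interval is very wide; the point is the mechanics, not the numbers.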

MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention

Zunhui Xia, Hongxing Li, Libin Lan

•preprint•Jul 3 2025

Medical image recognition serves as a key way to aid in clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective in handling various medical recognition tasks. However, these methods encounter two primary challenges. First, they are often task-specific and architecture-tailored, limiting their general applicability. Second, they usually either adopt full attention to model long-range dependencies, resulting in high computational costs, or rely on handcrafted sparse attention, potentially leading to suboptimal performance. To tackle these issues, we present MedFormer, an efficient medical vision transformer with two key ideas. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense prediction tasks such as semantic segmentation and lesion detection. This structure facilitates hierarchical feature representation while reducing the computation load of feature maps, highly beneficial for boosting performance. Second, it introduces a novel Dual Sparse Selection Attention (DSSA) with content awareness to improve computational efficiency and robustness against noise while maintaining high performance. As the core building technique of MedFormer, DSSA is explicitly designed to attend to the most relevant content. In addition, a detailed theoretical analysis has been conducted, demonstrating that MedFormer has superior generality and efficiency in comparison to existing medical vision transformers. Extensive experiments on a variety of imaging modality datasets consistently show that MedFormer is highly effective in enhancing performance across all three above-mentioned medical image recognition tasks. The code is available at https://github.com/XiaZunhui/MedFormer.

Mixed Modality Classification Methodology In Silico Academic Lab Open Code Benchmark SOTA
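
The abstract above contrasts full attention with sparse attention that attends only to the most relevant content. The sketch below shows a generic top-k sparse attention step for a single query, where only the k highest-scoring keys receive any weight. It is not the paper's DSSA, whose details are not given in the abstract; the function name and toy vectors are hypothetical.

```python
import math

def topk_sparse_attention(query, keys, values, k):
    """Attend to only the k keys with the highest scaled dot-product score."""
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    # Keep the indices of the k best-matching keys; all others get zero weight.
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the kept scores only (max subtracted for numerical stability).
    top = max(scores[i] for i in keep)
    weights = {i: math.exp(scores[i] - top) for i in keep}
    total = sum(weights.values())
    dim = len(values[0])
    return [sum(weights[i] / total * values[i][j] for i in keep) for j in range(dim)]

# Toy example: three key/value pairs, attend to the two most relevant.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(topk_sparse_attention(query, keys, values, k=2))
```

Because the third key scores lowest against the query, its value never contributes when k is 2, which is where the computational saving over full attention comes from.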

Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs

Francesco Di Salvo, Hanh Huyen My Nguyen, Christian Ledig

•preprint•Jul 3 2025

Deep Learning (DL) has revolutionized medical imaging, yet its adoption is constrained by data scarcity and privacy regulations, limiting access to diverse datasets. Federated Learning (FL) enables decentralized training but suffers from high communication costs and is often restricted to a single downstream task, reducing flexibility. We propose a data-sharing method via Differentially Private (DP) generative models. By adopting foundation models, we extract compact, informative embeddings, reducing redundancy and lowering computational overhead. Clients collaboratively train a Differentially Private Conditional Variational Autoencoder (DP-CVAE) to model a global, privacy-aware data distribution, supporting diverse downstream tasks. Our approach, validated across multiple feature extractors, enhances privacy, scalability, and efficiency, outperforming traditional FL classifiers while ensuring differential privacy. Additionally, DP-CVAE produces higher-fidelity embeddings than DP-CGAN while requiring 5× fewer parameters.

Mixed Modality Image Synthesis Whole Body Methodology In Silico Academic Lab Open Code Ethics
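
The abstract above does not state which mechanism makes the CVAE training differentially private. A common choice for neural networks is DP-SGD, where each example's gradient is clipped to a fixed L2 norm and Gaussian noise is added to the sum before averaging. The sketch below shows that single aggregation step under the assumption that something similar is used; the function names, parameter names, and toy gradients are hypothetical.

```python
import math
import random

def clip_l2(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]

def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_l2(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    # Noise standard deviation is tied to the clipping bound (the sensitivity).
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(clipped) for x in noisy]

# Toy per-example gradients, purely illustrative.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.6, 0.8], [-0.3, 0.1]]
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single example can move the update, and that bound is what lets the added noise translate into a formal privacy guarantee. Accounting for the total privacy budget over many steps is a separate calculation not shown here.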

Joint Shape Reconstruction and Registration via a Shared Hybrid Diffeomorphic Flow.

Shi H, Wang P, Zhang S, Zhao X, Yang B, Zhang C

•papers•Jul 3 2025

Deep implicit functions (DIFs) effectively represent shapes by using a neural network to map 3D spatial coordinates to scalar values that encode the shape's geometry, but it is difficult to establish correspondences between shapes directly, limiting their use in medical image registration. Recently presented deformation field-based methods achieve implicit template learning via template field learning with DIFs and deformation field learning, establishing shape correspondence through deformation fields. Although these approaches enable joint learning of shape representation and shape correspondence, the decoupled optimization of the template field and deformation field, caused by the absence of deformation annotations, leads to a relatively accurate template field but an underoptimized deformation field. In this paper, we propose a novel implicit template learning framework via a shared hybrid diffeomorphic flow (SHDF), which enables shared optimization for deformation and template, contributing to better deformations and shape representation. Specifically, we formulate the signed distance function (SDF, a type of DIF) as a one-dimensional (1D) integral, unifying dimensions to match the form used in solving the ordinary differential equation (ODE) for deformation field learning. The SDF in 1D integral form is then integrated seamlessly into the deformation field learning. Using a recurrent learning strategy, we frame shape representations and deformations as solving different initial value problems of the same ODE. We also introduce a global smoothness regularization to handle local optima due to limited outside-of-shape data. Experiments on medical datasets show that SHDF outperforms state-of-the-art methods in shape representation and registration.

Mixed Modality Registration Methodology In Silico Academic Lab
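
The abstract above relies on the signed distance function (SDF), which maps a 3D coordinate to a scalar encoding the shape's geometry. As a minimal sketch of that mapping, the analytic SDF of a sphere is shown below: negative inside, zero on the surface, positive outside. A deep implicit function would replace this closed-form expression with a trained network. The sphere and the function name are illustrative choices, not part of the paper.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from a 3D point to the surface of a sphere."""
    return math.dist(point, center) - radius

# The sign tells which side of the surface the point lies on.
for p in [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]:
    print(p, sphere_sdf(p))
```

The shape itself is the zero level set of this function, which is why a network that predicts signed distances can represent a surface without storing a mesh.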

Recent Advances in Applying Machine Learning to Proton Radiotherapy.

Wildman VL, Wynne J, Momin S, Kesarwala AH, Yang X

•papers•Jul 3 2025

In radiation oncology, precision and timeliness of both planning and treatment are paramount values of patient care. Machine learning has increasingly been applied to various aspects of photon radiotherapy to reduce manual error and improve the efficiency of clinical decision making; however, applications to proton therapy remain an emerging field in comparison. This systematic review aims to comprehensively cover all current and potential applications of machine learning to the proton therapy clinical workflow, an area that has not been extensively explored in the literature. PubMed and Embase were utilized to identify studies pertinent to machine learning in proton therapy between 2019 and 2024. An initial search on PubMed was made with the search strategy "'proton therapy', 'machine learning', 'deep learning'". A subsequent search on Embase was made with "("proton therapy") AND ("machine learning" OR "deep learning")". In total, 38 relevant studies have been summarized and incorporated. It is observed that U-Net architectures are prevalent in the patient pre-screening process, while convolutional neural networks play an important role in dose and range prediction. Both image quality improvement and transformation between modalities to decrease extraneous radiation are popular targets of various models. To adaptively improve treatments, advanced architectures such as general deep inception or deep cascaded convolution neural networks improve online dose verification and range monitoring. With the rising clinical usage of proton therapy, machine learning models have been increasingly proposed to facilitate both treatment and discovery. Significantly improving patient screening, planning, image quality, and dose and range calculation, machine learning is advancing the precision and personalization of proton therapy.

Mixed Modality Image Synthesis Review In Silico Academic Lab Benchmark SOTA

A deep active learning framework for mitotic figure detection with minimal manual annotation and labelling.

Liu E, Lin A, Kakodkar P, Zhao Y, Wang B, Ling C, Zhang Q

•papers•Jul 3 2025

Accurately and efficiently identifying mitotic figures (MFs) is crucial for diagnosing and grading various cancers, including glioblastoma (GBM), a highly aggressive brain tumour requiring precise and timely intervention. Traditional manual counting of MFs in whole slide images (WSIs) is labour-intensive and prone to interobserver variability. Our study introduces a deep active learning framework that addresses these challenges with minimal human intervention. We utilized a dataset of GBM WSIs from The Cancer Genome Atlas (TCGA). Our framework integrates convolutional neural networks (CNNs) with an active learning strategy. Initially, a CNN is trained on a small, annotated dataset. The framework then identifies uncertain samples from the unlabelled data pool, which are subsequently reviewed by experts. These ambiguous cases are verified and used for model retraining. This iterative process continues until the model achieves satisfactory performance. Our approach achieved 81.75% precision and 82.48% recall for MF detection. For MF subclass classification, it attained an accuracy of 84.1%. Furthermore, this approach significantly reduced annotation time - approximately 900 min across 66 WSIs - cutting the effort nearly in half compared to traditional methods. Our deep active learning framework demonstrates a substantial improvement in both efficiency and accuracy for MF detection and classification in GBM WSIs. By reducing reliance on large annotated datasets, it minimizes manual effort while maintaining high performance. This methodology can be generalized to other medical imaging tasks, supporting broader applications in the healthcare domain.

Mixed Modality Detection Neurological Methodology In Silico Academic Lab Benchmark SOTA
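
The active learning loop described above hinges on picking the unlabelled samples the model is least sure about and sending those to experts. The abstract does not specify the uncertainty measure, so the sketch below assumes one common choice, the entropy of the predicted probability. The function names and the toy probabilities are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_uncertain(probabilities, k):
    """Return indices of the k unlabelled samples with the highest predictive entropy."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: binary_entropy(probabilities[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical model outputs: P(mitotic figure) for six unlabelled patches.
pool_probs = [0.02, 0.48, 0.91, 0.55, 0.70, 0.99]
to_review = select_uncertain(pool_probs, k=2)
print(to_review)  # [1, 3]: the two predictions closest to 0.5
```

In a full loop, the selected patches would be labelled by an expert, added to the training set, and the model retrained before the next round of selection.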

Large language model trained on clinical oncology data predicts cancer progression.

Zhu M, Lin H, Jiang J, Jinia AJ, Jee J, Pichotta K, Waters M, Rose D, Schultz N, Chalise S, Valleru L, Morin O, Moran J, Deasy JO, Pilai S, Nichols C, Riely G, Braunstein LZ, Li A

•papers•Jul 2 2025

Subspecialty knowledge barriers have limited the adoption of large language models (LLMs) in oncology. We introduce Woollie, an open-source, oncology-specific LLM trained on real-world data from Memorial Sloan Kettering Cancer Center (MSK) across lung, breast, prostate, pancreatic, and colorectal cancers, with external validation using University of California, San Francisco (UCSF) data. Woollie surpasses ChatGPT in medical benchmarks and excels in eight non-medical benchmarks. Analyzing 39,319 radiology impression notes from 4002 patients, it achieved an overall area under the receiver operating characteristic curve (AUROC) of 0.97 for cancer progression prediction on MSK data, including a notable 0.98 AUROC for pancreatic cancer. On UCSF data, it achieved an overall AUROC of 0.88, excelling in lung cancer detection with an AUROC of 0.95. As the first oncology-specific LLM validated across institutions, Woollie demonstrates high accuracy and consistency across cancer types, underscoring its potential to enhance cancer progression analysis.

Mixed Modality LLM Radiology Report Retrospective Clinical In Silico Academic Lab Open Code Benchmark SOTA GenAI

[AI-based applications in medical image computing].

Kepp T, Uzunova H, Ehrhardt J, Handels H

•papers•Jul 2 2025

The processing of medical images plays a central role in modern diagnostics and therapy. Automated processing and analysis of medical images can efficiently accelerate clinical workflows and open new opportunities for improved patient care. However, the high variability, complexity, and varying quality of medical image data pose significant challenges. In recent years, the greatest progress in medical image analysis has been achieved through artificial intelligence (AI), particularly by using deep neural networks in the context of deep learning. These methods are successfully applied in medical image analysis, including segmentation, registration, and image synthesis. AI-based segmentation allows for the precise delineation of organs, tissues, or pathological changes. The application of AI-based image registration supports the accelerated creation of 3D planning models for complex surgeries by aligning relevant anatomical structures from different imaging modalities (e.g., CT, MRI, and PET) or time points. Generative AI methods can be used to generate additional image data for the improved training of AI models, thereby expanding the potential applications of deep learning methods in medicine. Examples from radiology, ophthalmology, dermatology, and surgery are described to illustrate their practical relevance and the potential of AI in image-based diagnostics and therapy.

Mixed Modality Segmentation Review Concept Academic Lab GenAI
