Page 200 of 3623611 results

Bridging Vision and Language: Optimal Transport-Driven Radiology Report Generation via LLMs

Haifeng Zhao, Yufei Zhang, Leilei Ma, Shuo Xu, Dengdi Sun

arXiv preprint · Jul 5 2025
Radiology report generation is a significant application of medical AI and has achieved impressive results. Concurrently, large language models (LLMs) have demonstrated remarkable performance across various domains. However, empirical validation indicates that general LLMs tend to focus more on linguistic fluency than on clinical effectiveness, and they lack the ability to effectively capture the relationship between X-ray images and their corresponding texts, resulting in poor clinical practicability. To address these challenges, we propose Optimal Transport-Driven Radiology Report Generation (OTDRG), a novel framework that leverages Optimal Transport (OT) to align image features with disease labels extracted from reports, effectively bridging the cross-modal gap. The core component of OTDRG is Alignment & Fine-Tuning, in which OT uses encoded label features and image visual features to minimize cross-modal distances, and then image and text features are integrated for LLM fine-tuning. Additionally, we design a novel disease prediction module to predict the disease labels contained in X-ray images during validation and testing. Evaluated on the MIMIC-CXR and IU X-Ray datasets, OTDRG achieves state-of-the-art performance in both natural language generation (NLG) and clinical efficacy (CE) metrics, delivering reports that are not only linguistically coherent but also clinically accurate.
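
The abstract does not spell out the exact OT formulation, but entropy-regularized OT solved by Sinkhorn iterations is the standard way to align two feature sets like this. Below is a minimal sketch of image-token-to-disease-label alignment; the feature dimensions, the cosine cost, and the uniform marginals are illustrative assumptions, not details from the paper:

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT plan between two uniform marginals (Sinkhorn)."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / eps)             # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)               # scale columns toward marginal b
        u = a / (K @ v)                 # scale rows toward marginal a
    return u[:, None] * K * v[None, :]  # transport plan P

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(6, 16))    # 6 visual tokens (hypothetical dims)
lbl_feats = rng.normal(size=(4, 16))    # 4 disease-label embeddings

# cosine-distance cost matrix between the two feature sets
unit = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
cost = 1.0 - unit(img_feats) @ unit(lbl_feats).T
P = sinkhorn(cost)
ot_distance = float((P * cost).sum())   # the alignment loss to minimize
```

Minimizing `ot_distance` with respect to the feature encoders is what pulls the two modalities together; the plan `P` itself shows which label each visual token is matched to.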

EdgeSRIE: A hybrid deep learning framework for real-time speckle reduction and image enhancement on portable ultrasound systems

Hyunwoo Cho, Jongsoo Lee, Jinbum Kang, Yangmo Yoo

arXiv preprint · Jul 5 2025
Speckle patterns in ultrasound images often obscure anatomical details, leading to diagnostic uncertainty. Recently, various deep learning (DL)-based techniques have been introduced to effectively suppress speckle; however, their high computational costs pose challenges for low-resource devices, such as portable ultrasound systems. To address this issue, we introduce EdgeSRIE, a lightweight hybrid DL framework for real-time speckle reduction and image enhancement in portable ultrasound imaging. The proposed framework consists of two main branches: an unsupervised despeckling branch, which is trained by minimizing a loss function between speckled images, and a deblurring branch, which restores blurred images to sharp images. For hardware implementation, the trained network is quantized to 8-bit integer precision and deployed on a low-resource system-on-chip (SoC) with limited power consumption. In performance evaluations with phantom and in vivo analyses, EdgeSRIE achieved the highest contrast-to-noise ratio (CNR) and average gradient magnitude (AGM) compared with the other baselines (two rule-based methods and four other DL-based methods). Furthermore, EdgeSRIE enabled real-time inference at over 60 frames per second while satisfying computational requirements (< 20K parameters) on actual portable ultrasound hardware. These results demonstrate the feasibility of EdgeSRIE for real-time, high-quality ultrasound imaging in resource-limited environments.
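
The 8-bit quantization step mentioned above can be sketched in its simplest symmetric per-tensor form. EdgeSRIE's actual quantization scheme is not described here, so this is a generic illustration of how float weights map to int8 with a single scale factor:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
# a hypothetical 3x3 conv kernel with 16 input / 16 output channels
weights = rng.normal(scale=0.05, size=(3, 3, 16, 16)).astype(np.float32)
q, s = quantize_int8(weights)
w_hat = dequantize(q, s)
max_err = float(np.abs(weights - w_hat).max())  # bounded by scale / 2
```

The per-element reconstruction error is at most half a quantization step, which is why int8 deployment usually costs little accuracy while cutting memory and compute on an SoC.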

Impact of super-resolution deep learning-based reconstruction for hippocampal MRI: A volunteer and phantom study.

Takada S, Nakaura T, Yoshida N, Uetani H, Shiraishi K, Kobayashi N, Matsuo K, Morita K, Nagayama Y, Kidoh M, Yamashita Y, Takayanagi R, Hirai T

PubMed · Jul 5 2025
To evaluate the effects of super-resolution deep learning-based reconstruction (SR-DLR) on thin-slice T2-weighted hippocampal MR image quality using 3 T MRI, in both human volunteers and phantoms. Thirteen healthy volunteers underwent hippocampal MRI at standard and high resolutions. Original (standard-resolution; StR) images were reconstructed with and without deep learning-based reconstruction (DLR) (matrix = 320 × 320), and with SR-DLR (matrix = 960 × 960). High-resolution (HR) images were also reconstructed with and without DLR (matrix = 960 × 960). Contrast, contrast-to-noise ratio (CNR), and septum slope were analyzed. Two radiologists evaluated the images for noise, contrast, artifacts, sharpness, and overall quality. Quantitative and qualitative results are reported as medians and interquartile ranges (IQR). Comparisons used the Wilcoxon signed-rank test with Holm correction. We also scanned an American College of Radiology (ACR) phantom to evaluate the ability of our SR-DLR approach to reduce artifacts induced by zero-padding interpolation (ZIP). SR-DLR exhibited contrast comparable to that of the original images and significantly higher than that of the HR images. Its slope was comparable to that of the HR images but significantly steeper than that of the StR images (p < 0.01). Furthermore, the CNR of SR-DLR (10.53; IQR: 10.08, 11.69) was significantly superior to that of StR images without DLR (7.5; IQR: 6.4, 8.37), StR images with DLR (8.73; IQR: 7.68, 9.0), HR images without DLR (2.24; IQR: 1.43, 2.38), and HR images with DLR (4.84; IQR: 2.99, 5.43) (p < 0.05). In the phantom study, artifacts induced by ZIP were scarcely observed when using SR-DLR. SR-DLR for hippocampal MRI potentially improves image quality beyond that of actual HR images while reducing acquisition time.
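
The Holm correction used for the pairwise comparisons above is a simple step-down procedure over the family of raw p-values. A self-contained sketch (the p-values below are hypothetical, not the study's):

```python
def holm_correction(p_values, alpha=0.05):
    """Holm step-down: return adjusted p-values and reject/retain decisions."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * p_values[i])  # multiplier shrinks down the list
        running_max = max(running_max, adj)       # enforce monotone adjusted p-values
        adjusted[i] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject

# hypothetical raw p-values from four paired Wilcoxon comparisons
raw = [0.001, 0.02, 0.04, 0.30]
adj, rej = holm_correction(raw)
```

Note how a raw p = 0.04 that would pass an uncorrected 0.05 threshold is no longer significant once Holm-adjusted; this is exactly why the correction matters when many reconstruction pairs are compared.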

Unveiling genetic architecture of white matter microstructure through unsupervised deep representation learning of fractional anisotropy images.

Zhao X, Xie Z, He W, Fornage M, Zhi D

PubMed · Jul 5 2025
Fractional anisotropy (FA) derived from diffusion MRI is a widely used marker of white matter (WM) integrity. However, conventional FA-based genetic studies focus on phenotypes representing tract- or atlas-defined averages, which may oversimplify spatial patterns of WM integrity and thus limit genetic discovery. Here, we propose a deep learning-based framework, termed unsupervised deep representation of white matter (UDR-WM), to extract brain-wide FA features, referred to as UDIP-FAs, that capture distributed microstructural variation without prior anatomical assumptions. UDIP-FAs exhibit enhanced sensitivity to aging and substantially higher SNP-based heritability compared to traditional FA phenotypes (P < 2.20e-16, Mann-Whitney U test; mean h² = 50.81%). Through multivariate GWAS, we identified 939 significant lead SNPs in 586 loci, mapped to 3480 genes, dubbed UDIP-FA-related genes (UFAGs). UFAGs are overexpressed in glial cells, particularly in astrocytes and oligodendrocytes (Bonferroni-corrected P < 2e-6, Wald test), and show strong overlap with risk gene sets for schizophrenia and Parkinson disease (Bonferroni-corrected P < 7.06e-3, Fisher exact test). UDIP-FAs are genetically correlated with multiple brain disorders and cognitive traits, including fluid intelligence and reaction time, and are associated with polygenic risk for bone mineral density. Network analyses reveal that UFAGs form disease-enriched modules across protein-protein interaction and co-expression networks, implicating core pathways in myelination and axonal structure. Notably, several UFAGs, including ACHE and ALDH2, are targets of existing neuropsychiatric drugs. Together, our findings establish UDIP-FAs as biologically and clinically informative brain phenotypes, enabling high-resolution dissection of white matter genetic architecture and its genetic links to complex brain traits.
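
The heritability comparison above rests on a Mann-Whitney U test between two sets of h² estimates. A minimal normal-approximation version (the heritability values below are hypothetical placeholders, not the study's estimates):

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation (few/no ties)."""
    nx, ny = len(x), len(y)
    # U = number of (x_i, y_j) pairs with x_i > y_j (ties count 0.5)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
            for xi in x for yj in y)
    mu = nx * ny / 2.0
    sigma = math.sqrt(nx * ny * (nx + ny + 1) / 12.0)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p

# hypothetical SNP-heritability estimates (%) for deep vs traditional FA phenotypes
udip = [48.0, 52.5, 55.1, 49.7, 51.2, 53.3]
trad = [20.1, 25.4, 22.8, 18.9, 24.0, 21.5]
u_stat, p_val = mann_whitney_u(udip, trad)
```

With every value in the first group exceeding every value in the second, U hits its maximum (nx·ny) and the test rejects, which is the qualitative pattern the abstract reports at far larger scale.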

DHR-Net: Dynamic Harmonized registration network for multimodal medical images.

Yang X, Li D, Chen S, Deng L, Wang J, Huang S

PubMed · Jul 5 2025
Deep learning has driven remarkable advancements in medical image registration, and deep neural network-based methods for non-rigid deformation field generation achieve high accuracy in single-modality scenarios. Multi-modal medical image registration, however, still faces critical challenges. To address the insufficient anatomical consistency and unstable deformation-field optimization of existing methods in cross-modal registration tasks, this paper proposes an end-to-end medical image registration method based on a Dynamic Harmonized Registration framework (DHR-Net). DHR-Net employs a cascaded two-stage architecture comprising a translation network and a registration network that operate in sequential processing phases. Furthermore, we propose a loss function based on the Noise Contrastive Estimation framework, which enhances anatomical consistency in cross-modal translation by maximizing mutual information between input and transformed image patches. This loss function incorporates a dynamic temperature adjustment mechanism that progressively optimizes feature-contrast constraints during training to improve high-frequency detail preservation, thereby better constraining the topological structure of target images. Experiments on the M&M Heart Dataset demonstrate that DHR-Net outperforms existing methods in registration accuracy, deformation field smoothness, and cross-modal robustness. The framework significantly enhances the registration quality of cardiac images while preserving anatomical structures, exhibiting promising potential for clinical applications.
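
The patch-contrastive loss described above is an InfoNCE-style objective; the paper's exact dynamic temperature mechanism is not given, so the linear anneal below is an assumption standing in for it (matched rows of the two feature matrices are the positive pairs):

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature):
    """InfoNCE over patch features: row i of z_a pairs with row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature           # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # NLL of the positive pairs

def temperature_schedule(step, total_steps, t_start=0.5, t_end=0.07):
    """Hypothetical linear anneal: looser contrast early, sharper late."""
    frac = step / max(1, total_steps)
    return t_start + (t_end - t_start) * frac

rng = np.random.default_rng(2)
patches_in = rng.normal(size=(8, 32))                        # input-image patches
patches_out = patches_in + 0.1 * rng.normal(size=(8, 32))    # translated-image patches
loss_early = info_nce_loss(patches_in, patches_out, temperature_schedule(0, 100))
loss_late = info_nce_loss(patches_in, patches_out, temperature_schedule(100, 100))
```

As the temperature drops, the softmax sharpens around the (correct) matched patch, so the same features yield a lower loss late in the schedule; scheduling this sharpening is the kind of progressive constraint tightening the abstract describes.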

Performance of open-source and proprietary large language models in generating patient-friendly radiology chest CT reports.

Prucker P, Busch F, Dorfner F, Mertens CJ, Bayerl N, Makowski MR, Bressem KK, Adams LC

PubMed · Jul 5 2025
Large Language Models (LLMs) show promise for generating patient-friendly radiology reports, but the performance of open-source versus proprietary LLMs needs assessment. To compare open-source and proprietary LLMs in generating patient-friendly radiology reports from chest CTs using quantitative readability metrics and qualitative assessments by radiologists. Fifty chest CT reports were processed by seven LLMs: three open-source models (Llama-3-70b, Mistral-7b, Mixtral-8x7b) and four proprietary models (GPT-4, GPT-3.5-Turbo, Claude-3-Opus, Gemini-Ultra). Simplification was evaluated using five quantitative readability metrics. Three radiologists rated patient-friendliness on a five-point Likert scale across five criteria. Content and coherence errors were counted. Inter-rater reliability and differences among models were statistically assessed. Inter-rater reliability was substantial to near-perfect (κ = 0.76-0.86). Qualitatively, Llama-3-70b was non-inferior to leading proprietary models in 4/5 categories. GPT-3.5-Turbo showed the best overall readability, outperforming GPT-4 in two metrics. Llama-3-70b outperformed GPT-3.5-Turbo on the CLI (p = 0.006). Claude-3-Opus and Gemini-Ultra scored lower on readability but were rated highly in qualitative assessments. Claude-3-Opus maintained perfect factual accuracy. Claude-3-Opus and GPT-4 outperformed Llama-3-70b in emotional sensitivity (90.0% vs 46.0%, p < 0.001). Llama-3-70b shows strong potential in generating quality, patient-friendly radiology reports, challenging proprietary models. With further adaptation, open-source LLMs could advance patient-friendly reporting technology.
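
One of the readability metrics named above, the CLI (Coleman-Liau index), is easy to compute from letter and sentence counts alone. A sketch with two illustrative snippets (invented here, not from the study's reports):

```python
def coleman_liau_index(text):
    """Coleman-Liau index: 0.0588*L - 0.296*S - 15.8, where
    L = letters per 100 words and S = sentences per 100 words."""
    words = text.split()
    letters = sum(c.isalpha() for c in text)
    sentences = max(1, sum(text.count(p) for p in ".!?"))
    L = 100.0 * letters / len(words)
    S = 100.0 * sentences / len(words)
    return 0.0588 * L - 0.296 * S - 15.8

technical = ("The pulmonary parenchyma demonstrates bilateral reticulonodular "
             "opacities without pleural effusion.")
simplified = "Your lungs show some small spots. There is no fluid around them."
```

Because CLI rewards short words and short sentences, the simplified version scores a far lower (easier) grade level than the original clinical phrasing, which is exactly the axis on which the LLM outputs were compared.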

PGMI assessment in mammography: AI software versus human readers.

Santner T, Ruppert C, Gianolini S, Stalheim JG, Frei S, Hondl M, Fröhlich V, Hofvind S, Widmann G

PubMed · Jul 5 2025
The aim of this study was to evaluate human inter-reader agreement of parameters included in PGMI (perfect-good-moderate-inadequate) classification of screening mammograms and explore the role of artificial intelligence (AI) as an alternative reader. Five radiographers from three European countries independently performed a PGMI assessment of 520 anonymized mammography screening examinations randomly selected from representative subsets from 13 imaging centres within two European countries. As a sixth reader, dedicated AI software was used. Accuracy, Cohen's kappa, and confusion matrices were calculated to compare the predictions of the software against the individual assessments of the readers, as well as potential discrepancies between them. A questionnaire and a personality test were used to better understand the decision-making processes of the human readers. Significant inter-reader variability among human readers with poor to moderate agreement (κ = -0.018 to κ = 0.41) was observed, with some showing more homogeneous interpretations of single features and overall quality than others. In comparison, the software surpassed human inter-reader agreement in detecting glandular tissue cuts, mammilla deviation, pectoral muscle detection, and pectoral angle measurement, while the remaining features and overall image quality exhibited performance comparable to human assessment. Notably, human inter-reader disagreement in PGMI assessment of mammography is considerable. AI software may already reliably categorize quality. Its potential for standardization and immediate feedback to achieve and monitor high levels of quality in screening programs needs further attention and should be included in future approaches. AI has promising potential for automated assessment of diagnostic image quality. Faster, more representative and more objective feedback may support radiographers in their quality management processes.
Direct transformation of common PGMI workflows into an AI algorithm could be challenging.
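
The agreement statistic reported throughout this study is Cohen's kappa, which discounts the agreement two readers would reach by chance. A self-contained sketch (the ten PGMI labels below are hypothetical, not the study's data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    # chance agreement from each rater's marginal label frequencies
    expected = sum(freq_a[l] * freq_b[l] for l in labels) / n ** 2
    return (observed - expected) / (1.0 - expected)

# hypothetical PGMI labels from two readers over ten mammograms
reader_1 = ["P", "G", "G", "M", "I", "P", "G", "M", "G", "P"]
reader_2 = ["P", "G", "M", "M", "I", "G", "G", "M", "G", "P"]
kappa = cohens_kappa(reader_1, reader_2)
```

Here 8/10 raw agreement shrinks to κ ≈ 0.72 once chance is removed, which illustrates why the κ range of -0.018 to 0.41 reported above indicates genuinely poor to moderate consistency.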

Explainable machine learning for post-PRK surgery follow-up

Soubeiran, C., Vilbert, M., Memmi, B., Georgeon, C., Borderie, V., Chessel, A., Plamann, K.

medRxiv preprint · Jul 5 2025
Photorefractive keratectomy (PRK) is a widely used laser-assisted refractive surgical technique. In some cases, it leads to temporary subepithelial inflammation or fibrosis linked to visual haze. To our knowledge, there are no physics-based, quantitative tools to monitor these symptoms. Here we present a comprehensive machine learning-based algorithm for the detection of fibrosis based on spectral-domain optical coherence tomography images recorded in vivo on standard clinical devices. Because of the rarity of these phenomena, we trained the model on corneas presenting Fuchs dystrophy, which causes similar, but permanent, fibrosis symptoms, and applied it to images from patients who had undergone PRK surgery. Our study shows that the model output (probability of Fuchs dystrophy classification) provides a quantified and explainable indicator of corneal healing for post-operative follow-up.

A comparative study of machine learning models for predicting neoadjuvant chemoradiotherapy response in rectal cancer patients using radiomics and clinical features.

Ozdemir G, Tulu CN, Isik O, Olmez T, Sozutek A, Seker A

PubMed · Jul 4 2025
Neoadjuvant chemoradiotherapy (nCRT) followed by total mesorectal excision is the standard treatment for locally advanced rectal cancer. However, the response to nCRT varies significantly among patients, making it crucial to identify those unlikely to benefit to avoid unnecessary toxicities. Radiomics, a technique for extracting quantitative features from medical images like computed tomography (CT), offers a promising noninvasive approach to analyze disease characteristics and potentially improve treatment decision-making. This retrospective cohort study aimed to compare the performance of various machine learning models in predicting the response to nCRT in rectal cancer based on medical data, including radiomic features extracted from CT, and to investigate the contribution of radiomics to these models. Participants who had completed a long course of nCRT before undergoing surgery were retrospectively enrolled. The patients were categorized into 2 groups: nonresponders and responders based on pathological assessment using the Ryan tumor regression grade. Pretreatment contrast-enhanced CT scans were used to extract 101 radiomic features using the PyRadiomics library. Clinical data, including age, gender, tumor grade, presence of colostomy, carcinoembryonic antigen level, constipation status, albumin, and hemoglobin levels, were also collected. Fifteen machine learning models were trained and evaluated using 10-fold cross-validation on a training set (n = 112 patients). The performance of the trained models was then assessed on an internal test set (n = 35 patients) and an external test set (n = 40 patients) using accuracy, area under the ROC curve (AUC), recall, precision, and F1-score. Among the models, the gradient boosting classifier showed the best training performance (accuracy: 0.92, AUC: 0.95, recall: 0.96, precision: 0.93, F1-score: 0.94). 
On the internal test set, the extra trees classifier (ETC) achieved an accuracy of 0.84, AUC of 0.90, recall of 0.92, precision of 0.87, and F1-score of 0.90. In the external validation, the ETC model yielded an accuracy of 0.75, AUC of 0.79, recall of 0.91, precision of 0.76, and F1-score of 0.83. Patient-specific clinical biomarkers, particularly tumor grade, were more influential than radiomic features in the ETC model, which consistently showed strong performance in predicting nCRT response. The model's external validation performance suggests potential for generalization.
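
The 10-fold cross-validation protocol above depends on stratified splitting so that responder proportions stay stable across folds. A minimal round-robin splitter (the labels and the 70/42 class split are hypothetical, loosely mirroring the 112-patient training set):

```python
import numpy as np

def stratified_kfold_indices(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs preserving per-class proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for i, j in enumerate(idx):      # deal each class out round-robin
            folds[i % k].append(j)
    for f in range(k):
        test = np.array(folds[f])
        train = np.array([i for g in range(k) if g != f for i in folds[g]])
        yield train, test

# hypothetical responder (1) / non-responder (0) labels for 112 patients
y = np.array([1] * 70 + [0] * 42)
splits = list(stratified_kfold_indices(y, k=10))
```

Without stratification, a small minority class (here, non-responders) can vanish from some folds entirely, which inflates variance in the reported accuracy and AUC.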

Ultrasound Imaging and Machine Learning to Detect Missing Hand Motions for Individuals Receiving Targeted Muscle Reinnervation for Nerve-Pain Prevention.

Moukarzel ARE, Fitzgerald J, Battraw M, Pereira C, Li A, Marasco P, Joiner WM, Schofield J

PubMed · Jul 4 2025
Targeted muscle reinnervation (TMR) was initially developed as a technique for bionic prosthetic control but has since become a widely adopted strategy for managing pain and preventing neuroma formation after amputation. This shift in TMR's motivation has influenced surgical approaches in ways that may challenge conventional electromyography (EMG)-based prosthetic control. The primary goal is often simply to reinnervate nerves to accessible muscles. This contrasts with the earlier, more complex TMR surgeries, which optimized EMG signal detection by carefully selecting target muscles near the skin's surface and manipulating residual anatomy to electrically isolate muscle activity. Consequently, modern TMR surgeries can involve less consideration of factors such as the depth of the reinnervated muscles or electrical crosstalk between closely located reinnervated muscles, all of which can impair the effectiveness of conventional prosthetic control systems. We recruited 4 participants with TMR, varying levels of upper limb loss, and diverse sets of reinnervated muscles. Participants attempted movements with their missing hands, and we used a muscle activity measurement technique that employs ultrasound imaging and machine learning (sonomyography) to classify the resulting muscle movements. We found that attempted missing-hand movements produced unique patterns of deformation in the reinnervated muscles, and, applying a K-nearest neighbors machine learning algorithm, we could predict 4-10 hand movements for each participant with 83.3-99.4% accuracy. Our findings suggest that despite the shifting motivations for performing TMR surgery, this new generation of the surgical procedure not only offers prophylactic benefits but also retains promising opportunities for bionic prosthetic control.
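
The classification step named above (K-nearest neighbors over ultrasound-derived muscle-deformation features) can be sketched generically; the synthetic clusters below stand in for per-motion deformation patterns, since the study's features and participant data are not available here:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test sample by majority vote among its k nearest neighbors."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

rng = np.random.default_rng(3)
# hypothetical per-frame deformation features: one cluster per attempted hand motion
centers = rng.normal(size=(4, 12)) * 4.0        # 4 motions, 12-dim features
X = np.vstack([c + rng.normal(size=(30, 12)) for c in centers])
y = np.repeat(np.arange(4), 30)
shuffle = rng.permutation(len(y))
X, y = X[shuffle], y[shuffle]
X_tr, y_tr, X_te, y_te = X[:90], y[:90], X[90:], y[90:]
acc = float((knn_predict(X_tr, y_tr, X_te) == y_te).mean())
```

KNN needs no training beyond storing examples, which makes it a pragmatic fit for small per-participant datasets like the 4-10 motion classes reported here.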
