
LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning.

Che H, Jin H, Gu Z, Lin Y, Jin C, Chen H

PubMed · Jul 21, 2025
Large Language Models (LLMs) have demonstrated significant potential in Medical Report Generation (MRG), yet their development requires large amounts of medical image-report pairs, which are commonly scattered across multiple centers. Centralizing these data is exceptionally challenging due to privacy regulations, thereby impeding model development and broader adoption of LLM-driven MRG models. To address this challenge, we present FedMRG, the first framework that leverages Federated Learning (FL) to enable privacy-preserving, multi-center development of LLM-driven MRG models, specifically designed to overcome the critical challenge of communication-efficient LLM training under multi-modal data heterogeneity. First, our framework tackles the fundamental challenge of communication overhead in federated LLM tuning by employing low-rank factorization to efficiently decompose parameter updates, significantly reducing gradient transmission costs and making LLM-driven MRG feasible in bandwidth-constrained FL settings. Furthermore, we observe dual heterogeneity in MRG under the FL scenario: varying image characteristics across medical centers, as well as diverse reporting styles and terminology preferences. To address this data heterogeneity, we further enhance FedMRG with (1) client-aware contrastive learning in the MRG encoder, coupled with diagnosis-driven prompts, which capture both globally generalizable and locally distinctive features while maintaining diagnostic accuracy; and (2) a dual-adapter mutual boosting mechanism in the MRG decoder that harmonizes generic and specialized adapters to address variations in reporting styles and terminology. Through extensive evaluation on our established FL-MRG benchmark, we demonstrate the generalizability and adaptability of FedMRG, underscoring its potential in harnessing multi-center data and generating clinically accurate reports while maintaining communication efficiency.
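The communication-saving idea behind low-rank factorization of parameter updates can be sketched in a few lines of NumPy; the dimensions, rank, and variable names below are illustrative assumptions, not details from the FedMRG paper:

```python
import numpy as np

# Sketch: instead of transmitting a full d_out x d_in parameter update,
# a client sends two low-rank factors A (d_out x r) and B (r x d_in).
d_out, d_in, rank = 512, 512, 8
rng = np.random.default_rng(0)

A = rng.standard_normal((d_out, rank))
B = rng.standard_normal((rank, d_in))
delta_W = A @ B  # update reconstructed on the server

full_params = d_out * d_in                 # values in the dense update
factored_params = rank * (d_out + d_in)    # values actually transmitted
compression = full_params / factored_params
print(f"transmitted {factored_params} vs {full_params} values "
      f"({compression:.1f}x reduction)")
```

For these toy dimensions the factored update is 32x smaller than the dense one, which is the mechanism that makes federated LLM tuning feasible under tight bandwidth.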

Noninvasive Deep Learning System for Preoperative Diagnosis of Follicular-Like Thyroid Neoplasms Using Ultrasound Images: A Multicenter, Retrospective Study.

Shen H, Huang Y, Yan W, Zhang C, Liang T, Yang D, Feng X, Liu S, Wang Y, Cao W, Cheng Y, Chen H, Ni Q, Wang F, You J, Jin Z, He W, Sun J, Yang D, Liu L, Cao B, Zhang X, Li Y, Pei S, Zhang S, Zhang B

PubMed · Jul 21, 2025
To propose a deep learning (DL) system for the preoperative diagnosis of follicular-like thyroid neoplasms (FNs) using routine ultrasound images. Preoperative diagnosis of malignancy in nodules suspicious for an FN remains challenging. Ultrasound, fine-needle aspiration cytology, and intraoperative frozen section pathology cannot unambiguously distinguish between benign and malignant FNs, leading to unnecessary biopsies and operations in benign nodules. This multicenter, retrospective study included 3634 patients from 11 centers who underwent ultrasound and received a definite diagnosis of FN, comprising thyroid follicular adenoma (n=1748), follicular carcinoma (n=299), and follicular variant of papillary thyroid carcinoma (n=1587). Four DL models (Inception-v3, ResNet50, Inception-ResNet-v2, and DenseNet161) were constructed on a training set (n=2587, 6178 images) and verified on an internal validation set (n=648, 1633 images) and an external validation set (n=399, 847 images). The diagnostic efficacy of the DL models was evaluated against the ACR TI-RADS in terms of area under the curve (AUC), sensitivity, specificity, and unnecessary biopsy rate. When externally validated, the four DL models yielded robust and comparable performance, with AUCs of 82.2%-85.2%, sensitivities of 69.6%-76.0%, and specificities of 84.1%-89.2%, outperforming the ACR TI-RADS. Compared to ACR TI-RADS, the DL models showed a higher biopsy rate of malignancy (71.6%-79.9% vs 37.7%, P<0.001) and a significantly lower unnecessary FNAB rate (8.5%-12.8% vs 40.7%, P<0.001). This study provides a noninvasive DL tool for accurate preoperative diagnosis of FNs, showing better performance than ACR TI-RADS and reducing unnecessary invasive interventions.
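The AUC, sensitivity, and specificity used above to compare the DL models with ACR TI-RADS are standard metrics; a minimal sketch (the scores, labels, and counts below are invented for demonstration, not study data):

```python
def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity(tp, fn):
    return tp / (tp + fn)  # true-positive rate among malignant nodules

def specificity(tn, fp):
    return tn / (tn + fp)  # true-negative rate among benign nodules
```

Tied scores count as half a win in the rank-based AUC, which is why it equals the area under the ROC curve without any thresholding.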

Ultra-low dose imaging in a standard axial field-of-view PET.

Lima T, Gomes CV, Fargier P, Strobel K, Leimgruber A

PubMed · Jul 21, 2025
Though ultra-low dose (ULD) imaging offers notable benefits, its widespread clinical adoption faces challenges. Long-axial field-of-view (LAFOV) PET/CT systems are expensive and scarce, while artificial intelligence (AI) shows great potential but remains largely limited to specific systems and is not yet widely used in clinical practice. However, integrating AI techniques and technological advancements into ULD imaging is helping bridge the gap between standard axial field-of-view (SAFOV) and LAFOV PET/CT systems. This paper offers an initial evaluation of ULD capabilities using one of the latest SAFOV PET/CT devices. A patient injected with 16.4 MBq <sup>18</sup>F-FDG underwent a local protocol consisting of a dynamic acquisition (first 30 min) of the abdominal section and a static whole body 74 min post-injection on a GE Omni PET/CT. From the acquired images we computed the dosimetry and compared clinical output from kidney function and brain uptake to a kidney model and normal databases, respectively. The effective PET dose for this patient was 0.27 ± 0.01 mSv, and the absorbed doses were 0.56 mGy, 0.89 mGy, and 0.20 mGy to the brain, heart, and kidneys, respectively. The recorded kidney concentration closely followed the kidney model, matching the increase and decrease in activity concentration over time. Normal z-scores were observed for brain uptake, indicating typical brain function and activity patterns consistent with healthy individuals. The signal-to-noise ratio obtained in this study (13.1) was comparable to reported LAFOV values. This study shows promising ultra-low-dose imaging capabilities in SAFOV PET devices, previously deemed unattainable with SAFOV PET imaging.
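Comparing a patient's brain uptake against a normal database reduces to a z-score; a minimal sketch (the reference mean, SD, and uptake value are placeholders, not from the study):

```python
def z_score(value, ref_mean, ref_sd):
    """Standardized deviation of a patient's regional uptake from the
    normal-database mean; |z| < 2 is a common threshold for 'typical'."""
    return (value - ref_mean) / ref_sd

# Hypothetical regional uptake against a hypothetical normal database
z = z_score(6.2, ref_mean=6.0, ref_sd=0.5)
print(f"z = {z:.2f}")  # within normal limits if |z| < 2
```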

Trueness of artificial intelligence-based, manual, and global thresholding segmentation protocols for human mandibles.

Hernandez AKT, Dutra V, Chu TG, Yang CC, Lin WS

PubMed · Jul 21, 2025
To compare the trueness of artificial intelligence (AI)-based, manual, and global thresholding segmentation protocols by superimposing the resulting segmented 3D models onto reference gold standard surface scan models. Twelve dry human mandibles were used. A cone beam computed tomography (CBCT) scanner was used to scan the mandibles, and the acquired digital imaging and communications in medicine (DICOM) files were segmented using three protocols: global thresholding, manual, and AI-based segmentation (Diagnocat; Diagnocat, San Francisco, CA). The segmented files were exported as study 3D models. A structured light surface scanner (GoSCAN Spark; Creaform 3D, Levis, Canada) was used to scan all mandibles, and the resulting reference 3D models were exported. The study 3D models were compared with the respective reference 3D models using mesh comparison software (Geomagic Design X; 3D Systems Inc, Rock Hill, SC). Root mean square (RMS) error values were recorded to measure the magnitude of deviation (trueness), and color maps were obtained to visualize the differences. Differences in RMS among the three segmentation methods were compared using repeated-measures analysis of variance (ANOVA), with a two-sided 5% significance level for all tests. AI-based segmentations had significantly higher RMS values than manual segmentations for the entire mandible (p < 0.001), alveolar process (p < 0.001), and body of the mandible (p < 0.001). AI-based segmentations had significantly lower RMS values than manual segmentations for the condyles (p = 0.018) and ramus (p = 0.013). No significant differences were found between the AI-based and manual segmentations for the coronoid process (p = 0.275), symphysis (p = 0.346), and angle of the mandible (p = 0.344).
Global thresholding had significantly higher RMS values than manual segmentations for the alveolar process (p < 0.001), angle of the mandible (p < 0.001), body of the mandible (p < 0.001), condyles (p < 0.001), coronoid process (p = 0.002), entire mandible (p < 0.001), ramus (p < 0.001), and symphysis (p < 0.001). Global thresholding also had significantly higher RMS values than AI-based segmentation for the alveolar process (p = 0.002), angle of the mandible (p < 0.001), body of the mandible (p < 0.001), condyles (p < 0.001), coronoid process (p = 0.017), entire mandible (p < 0.001), ramus (p < 0.001), and symphysis (p < 0.001). AI-based segmentations produced lower RMS values, indicating truer 3D models, than global thresholding, and showed no significant differences from manual segmentation in some areas. Thus, AI-based segmentation offers a level of segmentation trueness acceptable as an alternative to manual or global thresholding segmentation protocols.
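The trueness metric above, RMS deviation between a segmented 3D model and its reference surface scan, can be sketched as follows for corresponding surface points (the point-correspondence step performed by the mesh comparison software is assumed to have been done already):

```python
import numpy as np

def rms_deviation(study_pts, reference_pts):
    """RMS of point-to-point distances between a study 3D model and the
    reference surface-scan model; lower values indicate a truer segmentation."""
    study_pts = np.asarray(study_pts, dtype=float)
    reference_pts = np.asarray(reference_pts, dtype=float)
    distances = np.linalg.norm(study_pts - reference_pts, axis=1)
    return float(np.sqrt(np.mean(distances ** 2)))
```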

The safety and accuracy of radiation-free spinal navigation using a short, scoliosis-specific BoneMRI protocol, compared to CT.

Lafranca PPG, Rommelspacher Y, Walter SG, Muijs SPJ, van der Velden TA, Shcherbakova YM, Castelein RM, Ito K, Seevinck PR, Schlösser TPC

PubMed · Jul 21, 2025
Spinal navigation systems require pre- and/or intra-operative 3D imaging, which exposes young patients to harmful radiation. We assessed a scoliosis-specific MRI protocol that provides T2-weighted MRI and AI-generated synthetic-CT (sCT) scans through deep learning algorithms. This study aims to compare MRI-based synthetic-CT spinal navigation with CT for safety and accuracy of pedicle screw planning and placement at thoracic and lumbar levels. The spines of 5 cadavers were scanned with thin-slice CT and the scoliosis-specific MRI protocol (to create sCT). Preoperatively, screw trajectories were planned on both CT and sCT. Subsequently, four spine surgeons performed surface-matched, navigated placement of 2.5 mm k-wires in all pedicles from T3 to L5. Randomization for CT/sCT, surgeon, and side was performed (1:1 ratio). On postoperative CT scans, virtual screws were simulated over the k-wires. Maximum angulation, distance between planned and postoperative screw positions, and medial breach rate (Gertzbein-Robbins classification) were assessed. 140 k-wires were inserted; 3 were excluded. There were no pedicle breaches > 2 mm. Of the sCT-guided screws, 59 were grade A and 10 grade B. Of the CT-guided screws, 47 were grade A and 21 grade B (p = 0.022). The average distance (± SD) between intraoperative and postoperative screw positions was 2.3 ± 1.5 mm for sCT-guided screws and 2.4 ± 1.8 mm for CT (p = 0.78); the average maximum angulation (± SD) was 3.8 ± 2.5° for sCT and 3.9 ± 2.9° for CT (p = 0.75). MRI-based, AI-generated synthetic-CT spinal navigation allows safe and accurate planning and placement of thoracic and lumbar pedicle screws in a cadaveric model, without significant differences in distance and angulation between planned and postoperative screw positions compared to CT.
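The angulation outcome above is simply the angle between the planned and postoperative screw axes; a minimal sketch (direction vectors are illustrative, not study measurements):

```python
import numpy as np

def angulation_deg(planned_dir, actual_dir):
    """Angle in degrees between a planned screw axis and the
    postoperative screw axis, both given as 3D direction vectors."""
    a = np.asarray(planned_dir, dtype=float)
    b = np.asarray(actual_dir, dtype=float)
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # clip guards against floating-point values just outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))
```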

PXseg: automatic tooth segmentation, numbering and abnormal morphology detection based on CBCT and panoramic radiographs.

Wang R, Cheng F, Dai G, Zhang J, Fan C, Yu J, Li J, Jiang F

PubMed · Jul 21, 2025
PXseg, a novel approach for tooth segmentation, numbering, and abnormal morphology detection in panoramic X-ray (PX) images, was designed and improved by optimizing annotation and applying pre-training. CT-derived PXs (ctPXs) generated from multicenter cone beam computed tomography (CBCT) scans with accurate 3D labels were used for pre-training, while conventional PXs (cPXs) with 2D labels were used for training. Visual and statistical analyses were conducted on the internal dataset to assess the segmentation and numbering performance of PXseg against the same model without ctPX pre-training, while the accuracy of PXseg in detecting abnormal teeth was evaluated on an external dataset of cPXs with complex dental diseases. In addition, a diagnostic test compared diagnostic efficiency with and without PXseg's assistance. The DSC and F1-score of PXseg in tooth segmentation reached 0.882 and 0.902, increases of 4.6% and 4.0% over the model without pre-training. For tooth numbering, the F1-score of PXseg reached 0.943, an increase of 2.2%. Building on the improved segmentation, the accuracy of abnormal tooth morphology detection exceeded 0.957, 4.3% higher than without pre-training. A website was constructed to assist in PX interpretation, and diagnostic efficiency was greatly enhanced with the assistance of PXseg. The application of accurate labels from ctPXs strengthened PXseg's pre-training and improved the training effect, yielding gains in tooth segmentation, numbering, and abnormal morphology detection. The rapid and accurate results provided by PXseg streamline the workflow of PX diagnosis, indicating significant potential for clinical application.
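The DSC and F1-score reported for PXseg are standard overlap and detection metrics; a toy sketch (the pixel sets and tooth counts are invented for illustration):

```python
def dice(pred, truth):
    """Dice similarity coefficient between predicted and ground-truth
    pixel (or voxel) sets; 1.0 means perfect overlap."""
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def f1(tp, fp, fn):
    """F1-score from true-positive, false-positive, and false-negative
    counts, e.g. for tooth numbering."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```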

Advances in IPMN imaging: deep learning-enhanced HASTE improves lesion assessment.

Kolck J, Pivetta F, Hosse C, Cao H, Fehrenbach U, Malinka T, Wagner M, Walter-Rittel T, Geisel D

PubMed · Jul 21, 2025
The prevalence of asymptomatic pancreatic cysts is increasing due to advances in imaging techniques. Among these, intraductal papillary mucinous neoplasms (IPMNs) are most common, with potential for malignant transformation, often necessitating close follow-up. This study evaluates novel MRI techniques for the assessment of IPMN. From May to December 2023, 59 patients undergoing abdominal MRI were retrospectively enrolled. Examinations were conducted on 3-Tesla scanners using a Deep-Learning Accelerated Half-Fourier Single-Shot Turbo Spin-Echo (HASTE<sub>DL</sub>) and a standard HASTE (HASTE<sub>S</sub>) sequence. Two readers assessed minimum detectable lesion size and lesion-to-parenchyma contrast quantitatively, and qualitative assessments focused on image quality. Statistical analyses included the Wilcoxon signed-rank and chi-squared tests. HASTE<sub>DL</sub> demonstrated superior overall image quality (p < 0.001), with higher sharpness and contrast ratings (p < 0.001 and p = 0.112, respectively). HASTE<sub>DL</sub> showed enhanced conspicuity of IPMN (p < 0.001) and lymph nodes (p < 0.001), with more frequent visualization of IPMN communication with the pancreatic duct (p < 0.001). Visualization of complex features (dilated pancreatic duct, septa, and mural nodules) was superior with HASTE<sub>DL</sub> (p < 0.001). The minimum detectable cyst size was significantly smaller for HASTE<sub>DL</sub> (4.17 mm ± 3.00 vs. 5.51 mm ± 4.75; p < 0.001). Inter-reader agreement was excellent for HASTE<sub>DL</sub> (κ = 0.936) and slightly lower for HASTE<sub>S</sub> (κ = 0.885). HASTE<sub>DL</sub> in IPMN imaging provides superior image quality at significantly reduced scan times. Given the increasing prevalence of IPMN and the ensuing clinical need for fast and precise imaging, HASTE<sub>DL</sub> improves the availability and quality of patient care. Question Are there advantages of deep-learning-accelerated MRI in imaging and assessing intraductal papillary mucinous neoplasms (IPMN)?
Findings Deep-Learning Accelerated Half-Fourier Single-Shot Turbo Spin-Echo (HASTE<sub>DL</sub>) demonstrated superior image quality, improved conspicuity of "worrisome features," and detection of smaller cysts, with significantly reduced scan times. Clinical relevance HASTE<sub>DL</sub> provides faster, high-quality MRI imaging, enabling improved diagnostic accuracy and timely risk stratification for IPMN, potentially enhancing patient care and addressing the growing clinical demand for efficient imaging of IPMN.
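Lesion-to-parenchyma contrast, assessed quantitatively above, is commonly defined as a normalized signal-intensity difference; a sketch under that assumption (the study does not state its exact formula, and the intensities below are invented):

```python
def lesion_contrast(s_lesion, s_parenchyma):
    """One common definition of lesion-to-parenchyma contrast: the
    normalized difference of mean signal intensities, ranging from -1 to 1."""
    return (s_lesion - s_parenchyma) / (s_lesion + s_parenchyma)

# Hypothetical mean ROI intensities for a cyst and adjacent pancreas
print(lesion_contrast(300.0, 100.0))
```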

Deep learning using nasal endoscopy and T2-weighted MRI for prediction of sinonasal inverted papilloma-associated squamous cell carcinoma: an exploratory study.

Ren J, Ren Z, Zhang D, Yuan Y, Qi M

PubMed · Jul 21, 2025
Detecting malignant transformation of sinonasal inverted papilloma (SIP) into squamous cell carcinoma (SIP-SCC) before surgery is a clinical need. We aimed to explore the value of deep learning (DL) that leverages nasal endoscopy and T2-weighted magnetic resonance imaging (T2W-MRI) for automated tumor segmentation and differentiation between SIP and SIP-SCC. We conducted a retrospective analysis of 174 patients diagnosed with SIPs, who were divided into a training cohort (n = 121) and a testing cohort (n = 53). Three DL architectures were used to train automated segmentation models for endoscopic and T2W-MRI images. DL scores predicting SIP-SCC were generated using DenseNet121 from both modalities and combined to create a dual-modality DL nomogram. The diagnostic performance of the DL models was assessed alongside two radiologists, evaluated through the area under the receiver operating characteristic curve (AUROC), with comparisons made using the DeLong method. In the testing cohort, FCN_ResNet101 and VNet exhibited superior performance in automated segmentation, achieving mean Dice similarity coefficients of 0.95 ± 0.03 for endoscopy and 0.93 ± 0.02 for T2W-MRI, respectively. The dual-modality DL nomogram based on automated segmentation demonstrated the highest predictive performance for SIP-SCC (AUROC 0.865), outperforming the radiology resident (AUROC 0.672, p = 0.071) and the attending radiologist (AUROC 0.707, p = 0.066) with a trend toward significance. Notably, both radiologists improved their diagnostic performance with the assistance of the DL nomogram (AUROCs 0.734 and 0.834). The DL framework integrating endoscopy and T2W-MRI thus offers a fully automated predictive tool for SIP-SCC, potentially improving decision-making for patients with suspicious SIP.
Detecting the transformation of SIP into SIP-SCC before surgery is both critical and challenging. Endoscopy and T2W-MRI were integrated using DL for predicting SIP-SCC. The dual-modality DL nomogram outperformed two radiologists. The nomogram may improve decision-making for patients with suspicious SIP.
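A dual-modality nomogram combines the two per-modality DL scores into a single probability, typically via a logistic link; a hedged sketch with placeholder weights (the paper's fitted coefficients are not given here, and in practice they would be estimated by logistic regression on the training cohort):

```python
import math

def nomogram_probability(endoscopy_score, mri_score,
                         w_endo=1.0, w_mri=1.0, bias=0.0):
    """Combine an endoscopy DL score and a T2W-MRI DL score into a
    SIP-SCC probability via a logistic (sigmoid) link; the weights and
    bias here are illustrative placeholders."""
    z = w_endo * endoscopy_score + w_mri * mri_score + bias
    return 1.0 / (1.0 + math.exp(-z))
```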

Establishment of AI-assisted diagnosis of the infraorbital posterior ethmoid cells based on deep learning.

Ni T, Qian X, Zeng Q, Ma Y, Xie Z, Dai Y, Che Z

PubMed · Jul 21, 2025
To construct an artificial intelligence (AI)-assisted model for identifying the infraorbital posterior ethmoid cells (IPECs) based on deep learning of sagittal CT images. Sagittal CT images of 277 samples with and 142 samples without IPECs were retrospectively collected. An experienced radiologist in this field selected the sagittal CT image that best showed the IPECs. The images were randomly assigned to the training and test sets, with 541 sides in the training set and 97 sides in the test set. The training set was used to perform five-fold cross-validation, and the results of each fold were used to predict the test set. The model was built using nnUNet, and its performance was evaluated using Dice and standard classification metrics. The model achieved a Dice coefficient of 0.900 in the training set and 0.891 in the test set. Precision was 0.965 for the training set and 1.000 for the test set, while sensitivity was 0.981 and 0.967, respectively. A comparison of diagnostic efficacy between manual outlining by a less experienced radiologist and AI-assisted outlining showed a significant improvement in detection efficiency (P < 0.05). The AI model correctly identified and outlined all IPECs, including 12 sides that the radiologist had portrayed inadequately. AI models can help radiologists identify IPECs, which can further prompt relevant clinical interventions.
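The five-fold cross-validation used on the training set can be sketched as follows (the interleaved split is an assumption for illustration; any disjoint partition into five folds works the same way):

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, val_indices) pairs for k-fold
    cross-validation; every sample appears in exactly one
    validation fold and in the training set of the other k-1 folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, val
```

Each of the five models trained this way can then be applied to the held-out test set, as described above.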

Artificial intelligence-generated apparent diffusion coefficient (AI-ADC) maps for prostate gland assessment: a multi-reader study.

Ozyoruk KB, Harmon SA, Yilmaz EC, Huang EP, Gelikman DG, Gaur S, Giganti F, Law YM, Margolis DJ, Jadda PK, Raavi S, Gurram S, Wood BJ, Pinto PA, Choyke PL, Turkbey B

PubMed · Jul 21, 2025
To compare the quality of AI-ADC maps and standard ADC maps in a multi-reader study. This multi-reader study included 74 consecutive patients (median age = 66 years [IQR = 57.25-71.75 years]; median PSA = 4.30 ng/mL [IQR = 1.33-7.75 ng/mL]) with suspected or confirmed PCa who underwent mpMRI between October 2023 and January 2024. The study was conducted in two rounds separated by a 4-week wash-out period. In each round, four readers evaluated T2W-MRI with standard or AI-generated ADC (AI-ADC) maps. Fleiss' kappa and quadratic-weighted Cohen's kappa statistics were used to assess inter-reader agreement. Linear mixed-effects models were employed to compare the quality evaluations of standard versus AI-ADC maps. AI-ADC maps exhibited significantly better image quality than standard ADC maps, with higher ratings for windowing ease (β = 0.67 [95% CI 0.30-1.04], p < 0.05) and prostate boundary delineation (β = 1.38 [95% CI 1.03-1.73], p < 0.001), and reductions in distortion (β = 1.68 [95% CI 1.30-2.05], p < 0.001) and noise (β = 0.56 [95% CI 0.24-0.88], p < 0.001). AI-ADC maps reduced reacquisition requirements for all readers (β = 2.23 [95% CI 1.69-2.76], p < 0.001), supporting potential workflow efficiency gains. No differences were observed between AI-ADC and standard ADC maps in inter-reader agreement. Our multi-reader study demonstrated that AI-ADC maps improved prostate boundary delineation and had lower image noise, fewer distortions, and higher overall image quality compared to standard ADC maps. Question Can we synthesize apparent diffusion coefficient (ADC) maps with AI to achieve higher quality maps? Findings On average, readers rated quality factors of AI-ADC maps higher than ADC maps in 34.80% of cases, compared to 5.07% for ADC (p < 0.01). Clinical relevance AI-ADC maps may serve as a reliable diagnostic support tool thanks to their high quality, particularly when the acquired ADC maps include artifacts.
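Quadratic-weighted Cohen's kappa, used above for inter-reader agreement on ordinal quality ratings, penalizes disagreements by the squared distance between rating categories; a minimal sketch (the rating vectors are invented):

```python
import numpy as np

def quadratic_weighted_kappa(ratings_a, ratings_b, n_cats):
    """Cohen's kappa with quadratic weights for two raters' ordinal
    ratings coded 0..n_cats-1; 1.0 means perfect agreement."""
    observed = np.zeros((n_cats, n_cats))
    for a, b in zip(ratings_a, ratings_b):
        observed[a, b] += 1
    observed /= observed.sum()
    # expected agreement matrix under rater independence
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # quadratic disagreement weights, normalized to [0, 1]
    weights = np.array([[(i - j) ** 2 for j in range(n_cats)]
                        for i in range(n_cats)]) / (n_cats - 1) ** 2
    return 1 - (weights * observed).sum() / (weights * expected).sum()
```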