Latest Papers on Radiology AI. Tags: In Silico, Order: Best Match, Limit: 10.

From dictation to diagnosis: enhancing radiology reporting with integrated speech recognition in multimodal large language models.

Gertz RJ, Beste NC, Dratsch T, Lennartz S, Bremm J, Iuga AI, Bunck AC, Laukamp KR, Schönfeld M, Kottlors J

•papers•Aug 15 2025

This study evaluates the efficiency, accuracy, and cost-effectiveness of radiology reporting using audio multimodal large language models (LLMs) compared to conventional reporting with speech recognition software. We hypothesized that providing minimal audio input would enable a multimodal LLM to generate complete radiological reports. A total of 480 reports from 80 retrospective multimodal imaging studies were reported by two board-certified radiologists using three workflows: conventional workflow (C-WF) with speech recognition software to generate findings and impressions separately and LLM-based workflow (LLM-WF) using the state-of-the-art LLMs GPT-4o and Claude Sonnet 3.5. Outcome measures included reporting time, corrections and personnel cost per report. Two radiologists assessed formal structure and report quality. Statistical analysis used ANOVA and Tukey's post hoc tests (p < 0.05). LLM-WF significantly reduced reporting time (GPT-4o/Sonnet 3.5: 38.9 s ± 22.7 s vs. C-WF: 88.0 s ± 60.9 s, p < 0.01), required fewer corrections (GPT-4o: 1.0 ± 1.1, Sonnet 3.5: 0.9 ± 1.0 vs. C-WF: 2.4 ± 2.5, p < 0.01), and lowered costs (GPT-4o: $2.3 ± $1.4, Sonnet 3.5: $2.4 ± $1.4 vs. C-WF: $3.0 ± $2.1, p < 0.01). Reports generated with Sonnet 3.5 were rated highest in quality, while GPT-4o and conventional reports showed no difference. Multimodal LLMs can generate high-quality radiology reports based solely on minimal audio input, with greater speed, fewer corrections, and reduced costs compared to conventional speech-based workflows. However, future implementation may involve licensing costs, and generalizability to broader clinical contexts warrants further evaluation.
Question: Comparing time, accuracy, cost, and report quality of reporting using audio input functionality of GPT-4o and Claude Sonnet 3.5 to conventional reporting with speech recognition.
Findings: Large language models enable radiological reporting via minimal audio input, reducing turnaround time and costs without quality loss compared to conventional reporting with speech recognition.
Clinical relevance: Large language model-based reporting from minimal audio input has the potential to improve efficiency and report quality, supporting more streamlined workflows in clinical radiology.

Mixed Modality LLM Radiology Report Retrospective Clinical In Silico Academic Lab GenAI

A Convergent Generalized Krylov Subspace Method for Compressed Sensing MRI Reconstruction with Gradient-Driven Denoisers

Tao Hong, Umberto Villa, Jeffrey A. Fessler

•preprint•Aug 15 2025

Model-based reconstruction plays a key role in compressed sensing (CS) MRI, as it incorporates effective image regularizers to improve the quality of reconstruction. The Plug-and-Play and Regularization-by-Denoising frameworks leverage advanced denoisers (e.g., convolutional neural network (CNN)-based denoisers) and have demonstrated strong empirical performance. However, their theoretical guarantees remain limited, as practical CNNs often violate key assumptions. In contrast, gradient-driven denoisers achieve competitive performance, and the required assumptions for theoretical analysis are easily satisfied. However, solving the associated optimization problem remains computationally demanding. To address this challenge, we propose a generalized Krylov subspace method (GKSM) to solve the optimization problem efficiently. Moreover, we also establish rigorous convergence guarantees for GKSM in nonconvex settings. Numerical experiments on CS MRI reconstruction with spiral and radial acquisitions validate both the computational efficiency of GKSM and the accuracy of the theoretical predictions. The proposed optimization method is applicable to any linear inverse problem.

MRI Reconstruction Methodology In Silico Reproducibility

A Contrast-Agnostic Method for Ultra-High Resolution Claustrum Segmentation.

Mauri C, Fritz R, Mora J, Billot B, Iglesias JE, Van Leemput K, Augustinack J, Greve DN

•papers•Aug 15 2025

The claustrum is a band-like gray matter structure located between putamen and insula whose exact functions are still actively researched. Its sheet-like structure makes it barely visible in in vivo magnetic resonance imaging (MRI) scans at typical resolutions, and neuroimaging tools for its study, including methods for automatic segmentation, are currently very limited. In this paper, we propose a contrast- and resolution-agnostic method for claustrum segmentation at ultra-high resolution (0.35 mm isotropic); the method is based on the SynthSeg segmentation framework, which leverages the use of synthetic training intensity images to achieve excellent generalization. In particular, SynthSeg requires only label maps to be trained, since corresponding intensity images are synthesized on the fly with random contrast and resolution. We trained a deep learning network for automatic claustrum segmentation, using claustrum manual labels obtained from 18 ultra-high resolution MRI scans (mostly ex vivo). We demonstrated the method to work on these 18 high resolution cases (Dice score = 0.632, mean surface distance = 0.458 mm, and volumetric similarity = 0.867 using 6-fold cross validation (CV)), and also on in vivo T1-weighted MRI scans at typical resolutions (≈1 mm isotropic). We also demonstrated that the method is robust in a test-retest setting and when applied to multimodal imaging (T2-weighted, proton density, and quantitative T1 scans). To the best of our knowledge this is the first accurate method for automatic ultra-high resolution claustrum segmentation, which is robust against changes in contrast and resolution. The method is released at https://github.com/chiara-mauri/claustrum_segmentation and as part of the neuroimaging package FreeSurfer.
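The overlap metrics quoted above can be illustrated with a minimal sketch. The abstract does not define its metrics, so this assumes the common definitions Dice = 2|A∩B| / (|A| + |B|) and volumetric similarity = 1 − ||A| − |B|| / (|A| + |B|). The function names and the toy masks are hypothetical and are not taken from the released code.

```python
def dice_score(pred, ref):
    """Dice overlap of two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


def volumetric_similarity(pred, ref):
    """Compares segmented volumes only; voxel positions are ignored."""
    vol_pred = sum(1 for p in pred if p)
    vol_ref = sum(1 for r in ref if r)
    total = vol_pred + vol_ref
    return 1.0 if total == 0 else 1.0 - abs(vol_pred - vol_ref) / total


# Toy masks (hypothetical): equal volume, but shifted by one voxel.
pred = [0, 1, 1, 1, 0, 0, 1, 0]
ref = [0, 0, 1, 1, 1, 0, 1, 0]
dice = dice_score(pred, ref)
vs = volumetric_similarity(pred, ref)
```

In this toy case the two masks have the same volume, so volumetric similarity is perfect while Dice is penalized by the one-voxel shift. Under these assumed definitions, that is one reason a thin, sheet-like structure can show a noticeably lower Dice score than volumetric similarity.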

MRI Segmentation Neurological Methodology In Silico Academic Lab Open Code

Comprehensive analysis of [18F]MFBG biodistribution normal patterns and variability in pediatric patients with neuroblastoma.

Wang P, Chen X, Yan X, Yan J, Yang S, Mao J, Li F, Su X

•papers•Aug 15 2025

[18F]-meta-fluorobenzylguanidine ([18F]MFBG) PET/CT is a promising imaging modality for neural crest-derived tumors, particularly neuroblastoma. Accurate interpretation necessitates an understanding of normal biodistribution and variations in physiological uptake. This study aimed to systematically characterize the physiological distribution and variability of [18F]MFBG uptake in pediatric patients to enhance clinical interpretation and differentiate normal from pathological uptake. We retrospectively analyzed [18F]MFBG PET/CT scans from 169 pediatric neuroblastoma patients, including 20 in confirmed remission, for detailed biodistribution analysis. Organ uptake was quantified using both manual segmentation and deep learning (DL)-based automatic segmentation methods. Patterns of physiological uptake variants were categorized and illustrated using representative cases. [18F]MFBG demonstrated consistent physiological uptake in the salivary glands (SUVmax 9.8 ± 3.3), myocardium (7.1 ± 1.7), and adrenal glands (4.6 ± 0.9), with low activity in bone (0.6 ± 0.2) and muscle (0.8 ± 0.2). DL-based analysis confirmed uniform, mild uptake across vertebral and peripheral skeletal structures (SUVmean 0.47 ± 0.08). Three physiological liver uptake patterns were identified: uniform (43%), left-lobe predominant (31%), and marginal (26%). Asymmetric uptake in the pancreatic head, transient brown adipose tissue activity, gallbladder excretion, and symmetric epiphyseal uptake were also recorded. These variants were not associated with structural abnormalities or clinical recurrence and showed distinct patterns from pathological lesions. This study establishes a reference for normal [18F]MFBG biodistribution and physiological variants in children. Understanding these patterns is essential for accurate image interpretation and the avoidance of diagnostic pitfalls in pediatric neuroblastoma patients.

PET Segmentation Retrospective Clinical In Silico Academic Lab

Machine learning based differential diagnosis of schizophrenia, major depression disorder and bipolar disorder using structural magnetic resonance imaging.

Cao P, Li R, Li Y, Dong Y, Tang Y, Xu G, Si Q, Chen C, Chen L, Liu W, Yao Y, Sui Y, Zhang J

•papers•Aug 15 2025

Cortical morphological abnormalities in schizophrenia (SCZ), major depressive disorder (MDD), and bipolar disorder (BD) have been identified in past research. However, their potential as objective biomarkers to differentiate these disorders remains uncertain. Machine learning models may offer a novel diagnostic tool. Structural MRI (sMRI) scans of 220 patients with SCZ, 220 with MDD, 220 with BD, and 220 healthy controls were obtained using a 3T scanner. Volume, thickness, surface area, and mean curvature of 68 cortical regions were extracted using FreeSurfer. The resulting 272 features (4 measures × 68 regions) underwent 3 feature selection techniques to isolate important variables for model construction. The selected features were then fed into 3 classifiers. After model evaluation and hyperparameter tuning, the best-performing model was identified, along with the most significant brain measures. The univariate feature selection-Naive Bayes model achieved the best performance, with an accuracy of 0.66, a macro-average AUC of 0.86, sensitivities ranging from 0.58 to 0.86, and specificities ranging from 0.81 to 0.93. Key features included thickness of right isthmus-cingulate cortex, area of left inferior temporal gyrus, thickness of right superior temporal gyrus, mean curvature of right pars orbitalis, thickness of left transverse temporal cortex, volume of left caudal anterior-cingulate cortex, area of right banks superior temporal sulcus, and thickness of right temporal pole. The machine learning model based on sMRI data shows promise for aiding in the differential diagnosis of SCZ, MDD, and BD. Cortical features from the cingulate and temporal lobes may highlight distinct biological mechanisms underlying each disorder.
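For a four-class problem like this one, per-class sensitivity and specificity are usually computed one-vs-rest from a confusion matrix. The sketch below shows that arithmetic. The function name is hypothetical and the confusion matrix is a made-up toy example, not data from the study.

```python
def one_vs_rest_metrics(confusion, labels):
    """confusion[i][j] = cases with true class i predicted as class j.

    Assumes every class has at least one true case and one true non-case.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    per_class = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # class k predicted as another class
        fp = sum(row[k] for row in confusion) - tp  # other classes predicted as k
        tn = total - tp - fn - fp
        per_class[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return correct / total, per_class


# Toy counts (hypothetical, 10 cases per class); rows are true classes.
labels = ["SCZ", "MDD", "BD", "HC"]
toy_confusion = [
    [6, 2, 1, 1],
    [1, 7, 1, 1],
    [2, 1, 6, 1],
    [0, 1, 1, 8],
]
accuracy, per_class = one_vs_rest_metrics(toy_confusion, labels)
```

With four balanced classes, chance accuracy is 0.25. In addition, each class's "rest" group is three times larger than the class itself, so one-vs-rest specificity tends to come out higher than sensitivity.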

MRI Classification Neurological Retrospective Clinical In Silico

Noninvasive prediction of microsatellite instability in stage II/III rectal cancer using dynamic contrast-enhanced magnetic resonance imaging radiomics.

Zheng CY, Zhang JM, Lin QS, Lian T, Shi LP, Chen JY, Cai YL

•papers•Aug 15 2025

Colorectal cancer stands among the most prevalent digestive system malignancies. The microsatellite instability (MSI) profile plays a crucial role in determining patient outcomes and therapy responsiveness. Traditional MSI evaluation methods require invasive tissue sampling, are lengthy, and can be compromised by intratumoral heterogeneity. This study aimed to establish a non-invasive technique utilizing dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) radiomics and machine learning algorithms to determine MSI status in patients with intermediate-stage rectal cancer. This retrospective analysis examined 120 individuals diagnosed with stage II/III rectal cancer [30 MSI-high (MSI-H) and 90 microsatellite stability (MSS)/MSI-low (MSI-L) cases]. We extracted comprehensive radiomics signatures from DCE-MRI scans, encompassing textural parameters that reflect tumor heterogeneity, shape-based metrics, and histogram-derived statistical values. Least absolute shrinkage and selection operator regression facilitated feature selection, while predictive frameworks were developed using various classification algorithms (logistic regression, support vector machine, and random forest). Performance assessment utilized separate training and validation cohorts. Our investigation uncovered distinctive imaging characteristics between MSI-H and MSS/MSI-L neoplasms. MSI-H tumors exhibited significantly elevated entropy values (7.84 ± 0.92 vs 6.39 ± 0.83, P = 0.004), enhanced surface-to-volume proportions (0.72 ± 0.14 vs 0.58 ± 0.11, P = 0.008), and heightened signal intensity variation (3642 ± 782 vs 2815 ± 645, P = 0.007). The random forest model demonstrated superior classification capability with areas under the curve (AUCs) of 0.891 and 0.896 across training and validation datasets, respectively. An integrated approach combining radiomics with clinical parameters further enhanced performance metrics (AUC 0.923 and 0.914), achieving 88.5% sensitivity alongside 87.2% specificity. DCE-MRI radiomics features interpreted through machine learning frameworks offer an effective strategy for MSI status assessment in intermediate-stage rectal cancer.
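A minimal sketch can show why an entropy feature tracks heterogeneity. It assumes the usual first-order definition, Shannon entropy of the discretized intensity histogram in bits. The abstract does not state which entropy variant or bin width was used, so the function name, the `bin_width` parameter, and the toy intensities are all hypothetical.

```python
import math
from collections import Counter


def first_order_entropy(intensities, bin_width=1.0):
    """Shannon entropy (bits) of intensities discretized into fixed-width bins."""
    bins = Counter(math.floor(v / bin_width) for v in intensities)
    n = len(intensities)
    return sum(-(count / n) * math.log2(count / n) for count in bins.values())


# Toy regions (hypothetical): one uniform, one spread evenly over four bins.
homogeneous = first_order_entropy([5.0] * 8, bin_width=10.0)
heterogeneous = first_order_entropy(
    [10, 20, 30, 40, 10, 20, 30, 40], bin_width=10.0
)
```

A region whose voxels all fall in one bin has zero entropy, and spreading the same number of voxels evenly over four bins gives 2 bits. A higher value therefore indicates a broader, more mixed intensity distribution.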

MRI Classification Abdominal Retrospective Clinical In Silico Academic Lab

AI-Driven Integrated System for Burn Depth Prediction With Electronic Medical Records: Algorithm Development and Validation.

Rahman MM, Masry ME, Gnyawali SC, Xue Y, Gordillo G, Wachs JP

•papers•Aug 15 2025

Burn injuries represent a significant clinical challenge due to the complexity of accurately assessing burn depth, which directly influences the course of treatment and patient outcomes. Traditional diagnostic methods primarily rely on visual inspection by experienced burn surgeons. Studies report diagnostic accuracies of around 76% for experts, dropping to nearly 50% for less experienced clinicians. Such inaccuracies can result in suboptimal clinical decisions, such as delaying vital surgical interventions in severe cases or initiating unnecessary treatments for superficial burns. This diagnostic variability not only compromises patient care but also strains health care resources and increases the likelihood of adverse outcomes. Hence, a more consistent and precise approach to burn classification is urgently needed. The objective is to determine whether a multimodal integrated artificial intelligence (AI) system for accurate classification of burn depth can preserve diagnostic accuracy and provide an important resource when used as part of the electronic medical record (EMR). This study used a novel multimodal AI system, integrating digital photographs and ultrasound tissue Doppler imaging (TDI) data to accurately assess burn depth. These imaging modalities were accessed and processed through an EMR system, enabling real-time data retrieval and AI-assisted evaluation. TDI was instrumental in evaluating the biomechanical properties of subcutaneous tissues, using color-coded images to identify burn-induced changes in tissue stiffness and elasticity. The collected imaging data were uploaded to the EMR system (DrChrono), where they were processed by a vision-language model built on GPT-4 architecture. This model received expert-formulated prompts describing how to interpret both digital and TDI images, guiding the AI in making explainable classifications.
This study evaluated whether a multimodal AI classifier, designed to identify first-, second-, and third-degree burns, could be effectively applied to imaging data stored within an EMR system. The classifier achieved an overall accuracy of 84.38%, significantly surpassing human performance benchmarks typically cited in the literature. This highlights the potential of the AI model to serve as a robust clinical decision support tool, especially in settings lacking highly specialized expertise. In addition to accuracy, the classifier demonstrated strong performance across multiple evaluation metrics. The classifier's ability to distinguish between burn severities was further validated by the area under the receiver operating characteristic curve: 0.97 for first-degree, 0.96 for second-degree, and a perfect 1.00 for third-degree burns, each with narrow 95% CIs. The storage of multimodal imaging data within the EMR, along with the ability for post hoc analysis by AI algorithms, offers significant advancements in burn care, enabling real-time burn depth prediction on currently available data. Using digital photos for superficial burns, which are easily diagnosed through physical examination, reduces reliance on TDI, while TDI helps distinguish deep second- and third-degree burns, enhancing diagnostic efficiency.

Mixed Modality Classification Retrospective Clinical In Silico Academic Lab Benchmark SOTA GenAI

Fine-Tuned Large Language Model for Extracting Pretreatment Pancreatic Cancer According to Computed Tomography Radiology Reports.

Hirakawa H, Yasaka K, Nomura T, Tsujimoto R, Sonoda Y, Kiryu S, Abe O

•papers•Aug 15 2025

This study aimed to examine the performance of a fine-tuned large language model (LLM) in extracting pretreatment pancreatic cancer according to computed tomography (CT) radiology reports and to compare it with that of readers. This retrospective study included 2690, 886, and 378 CT reports for the training, validation, and test datasets, respectively. Clinical indication, image finding, and imaging diagnosis sections of the radiology report (used as input data) were reviewed and categorized into groups 0 (no pancreatic cancer), 1 (after treatment for pancreatic cancer), and 2 (pretreatment pancreatic cancer present) (used as reference data). A pre-trained Bidirectional Encoder Representations from Transformers (BERT) Japanese model was fine-tuned with the training and validation datasets. Group 1 data were undersampled and group 2 data were oversampled in the training dataset due to group imbalance. The model that performed best on the validation set was subsequently assessed using the test dataset. Additionally, three readers (readers 1, 2, and 3) classified the reports in the test dataset. The fine-tuned LLM and readers 1, 2, and 3 demonstrated an overall accuracy of 0.942, 0.984, 0.979, and 0.947; sensitivity for differentiating groups 0/1/2 of 0.944/0.960/0.921, 0.976/1.000/0.976, 0.984/0.984/0.968, and 1.000/1.000/0.841; and total time required for classification of 49 s, 2689 s, 3496 s, and 4887 s, respectively. The fine-tuned LLM effectively extracted patients with pretreatment pancreatic cancer according to CT radiology reports, and its performance was comparable to that of readers in a shorter time.
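The rebalancing step can be sketched as random resampling of the training set per group. The abstract states only that group 1 was undersampled and group 2 oversampled. The random strategy, the target counts, the function name `rebalance`, and the toy data below are all assumptions for illustration.

```python
import random
from collections import Counter


def rebalance(samples, targets, seed=0):
    """Resample (text, group) pairs so each group in `targets` hits its count.

    Groups above target are undersampled without replacement; groups below
    target keep all items and gain random duplicates. Other groups are kept.
    """
    rng = random.Random(seed)
    by_group = {}
    for item in samples:
        by_group.setdefault(item[1], []).append(item)
    balanced = []
    for group, items in by_group.items():
        want = targets.get(group, len(items))
        if want <= len(items):
            balanced.extend(rng.sample(items, want))
        else:
            balanced.extend(items)
            balanced.extend(rng.choices(items, k=want - len(items)))
    rng.shuffle(balanced)
    return balanced


# Toy training set (hypothetical): 6 / 10 / 2 reports in groups 0 / 1 / 2.
train = (
    [(f"report-a{i}", 0) for i in range(6)]
    + [(f"report-b{i}", 1) for i in range(10)]
    + [(f"report-c{i}", 2) for i in range(2)]
)
balanced = rebalance(train, targets={1: 6, 2: 6})
group_counts = Counter(group for _, group in balanced)
```

Resampling of this kind would normally be applied to the training split only, so the validation and test sets keep their original group proportions. That matches the abstract, which mentions rebalancing only for the training dataset.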

CT LLM Radiology Report Abdominal Retrospective Clinical In Silico Academic Lab GenAI

Enhancing Diagnostic Accuracy of Fresh Vertebral Compression Fractures With Deep Learning Models.

Li KY, Ye HB, Zhang YL, Huang JW, Li HL, Tian NF

•papers•Aug 15 2025

Retrospective study. The study aimed to develop and validate a deep learning model based on X-ray images to accurately diagnose fresh thoracolumbar vertebral compression fractures. In clinical practice, diagnosing fresh vertebral compression fractures often requires MRI. However, due to the scarcity of MRI resources and the high time and economic costs involved, some patients may not receive timely diagnosis and treatment. Using a deep learning model combined with X-rays for diagnostic assistance could potentially serve as an alternative to MRI. In this study, X-ray images of suspected thoracolumbar vertebral compression fractures were collected from the municipal shared database between December 2012 and February 2024. Deep learning models were constructed using the EfficientNet, MobileNet, and MnasNet frameworks, respectively. We conducted a preliminary evaluation of the deep learning models using the validation set. The diagnostic performance of the models was evaluated using metrics such as AUC value, accuracy, sensitivity, specificity, F1 score, precision, and ROC curve. Finally, the deep learning models were compared with evaluations from two spine surgeons of different experience levels on the control set. This study included a total of 3025 lateral X-ray images from 2224 patients. The data set was divided into a training set of 2388 cases, a validation set of 482 cases, and a control set of 155 cases. In the validation set, the three DL models had accuracies of 83.0%, 82.4%, and 82.2%, respectively. The AUC values were 0.861, 0.852, and 0.865, respectively. In the control set, the accuracies of the three DL models were 78.1%, 78.1%, and 80.7%, respectively, all higher than those of the spine surgeons and significantly higher than that of the junior spine surgeon. This study developed deep learning models for detecting fresh vertebral compression fractures, demonstrating high accuracy.

X-Ray Detection Musculoskeletal Retrospective Clinical In Silico

A novel interpreted deep network for Alzheimer's disease prediction based on inverted self attention and vision transformer.

Ibrar W, Khan MA, Hamza A, Rubab S, Alqahtani O, Alouane MT, Teng S, Nam Y

•papers•Aug 15 2025

Worldwide, Alzheimer's disease (AD) is the most common cause of dementia. AD causes memory loss and impaired mental function in aging people, which places a significant burden on patients as well as on society. So far, there is no treatment that can cure AD; however, early diagnosis can help slow the disease. Deep learning has shown substantial success in diagnosing AD. However, challenges remain due to limited data, improper model selection, and extraction of irrelevant features. In this work, we propose a fully automated framework based on the fusion of a vision transformer and a novel inverted residual bottleneck with self-attention (IRBwSA) for AD diagnosis. In the first step, data augmentation was performed to balance the selected dataset. After that, the vision model was designed and modified according to the dataset. Similarly, a new inverted bottleneck self-attention model was developed. The designed models were trained on the augmented dataset, and the extracted features were fused using a novel search-based approach. Moreover, the designed models were interpreted using an explainable artificial intelligence technique named LIME. The fused features were finally classified using a shallow wide neural network and other classifiers. The experimental process was conducted on an augmented MRI dataset, and 96.1% accuracy and 96.05% precision rate were obtained. Comparison with a few recent techniques shows the proposed framework's better performance.

MRI Classification Neurological Methodology In Silico Academic Lab Benchmark SOTA

