A Pan-Organ Vision-Language Model for Generalizable 3D CT Representations.

Beeche C, Kim J, Tavolinejad H, Zhao B, Sharma R, Duda J, Gee J, Dako F, Verma A, Morse C, Hou B, Shen L, Sagreiya H, Davatzikos C, Damrauer S, Ritchie MD, Rader D, Long Q, Chen T, Kahn CE, Chirinos J, Witschey WR

PubMed · Jul 3, 2025
Generalizable foundation models for computed tomographic (CT) medical imaging data are emerging AI tools anticipated to vastly improve clinical workflow efficiency. However, existing models are typically trained within narrow imaging contexts, including limited anatomical coverage, contrast settings, and clinical indications. These constraints reduce their ability to generalize across the broad spectrum of real-world presentations encountered in volumetric CT imaging data. We introduce Percival, a vision-language foundation model trained on over 400,000 CT volumes and paired radiology reports from more than 50,000 participants enrolled in the Penn Medicine BioBank. Percival employs a dual-encoder architecture with a transformer-based image encoder and a BERT-style language encoder, aligned via symmetric contrastive learning. Percival was validated on imaging data from over 20,000 participants, encompassing over 100,000 CT volumes. In image-text recall tasks, Percival outperforms models trained on limited anatomical windows. To assess Percival's clinical knowledge, we evaluated its biologic, phenotypic, and prognostic relevance using laboratory-wide and phenome-wide association studies and survival analyses, uncovering a rich latent structure aligned with physiological measurements and disease phenotypes.
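
The symmetric contrastive alignment described above is a CLIP-style objective: matched image-report pairs are pulled together, and mismatched pairs pushed apart, in both directions of a batch similarity matrix. A minimal PyTorch sketch of such a loss follows; the function name, embedding dimension, and temperature are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE over a batch of paired embeddings.
    image_emb, text_emb: (batch, dim) encoder outputs. The temperature
    and dimensions are illustrative assumptions."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matched pairs sit on the diagonal; cross-entropy in both directions.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random stand-in embeddings:
print(symmetric_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512)))
```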

Outcome prediction and individualized treatment effect estimation in patients with large vessel occlusion stroke

Lisa Herzog, Pascal Bühler, Ezequiel de la Rosa, Beate Sick, Susanne Wegener

arXiv preprint · Jul 3, 2025
Mechanical thrombectomy has become the standard of care in patients with stroke due to large vessel occlusion (LVO). However, only 50% of successfully treated patients show a favorable outcome. We developed and evaluated interpretable deep learning models to predict functional outcomes in terms of the modified Rankin Scale score, alongside individualized treatment effects (ITEs), using data from 449 LVO stroke patients from a randomized clinical trial. Besides clinical variables, we considered non-contrast CT (NCCT) and CT angiography (CTA) scans, which were integrated using novel foundation models to make use of advanced imaging information. Clinical variables had good predictive power for binary functional outcome prediction (AUC of 0.719 [0.666, 0.774]), which was slightly improved by adding CTA imaging (AUC of 0.737 [0.687, 0.795]). Adding NCCT scans, or a combination of NCCT and CTA scans, to clinical features yielded no improvement. The most important clinical predictor of functional outcome was pre-stroke disability. While estimated ITEs were well calibrated to the average treatment effect, discriminatory ability was limited, as indicated by a C-for-Benefit statistic of around 0.55 in all models. In summary, the models allowed us to jointly integrate CT imaging and clinical features while achieving state-of-the-art prediction performance and ITE estimates. Yet, further research is needed, particularly to improve ITE estimation.
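
ITE estimation from randomized-trial data is often done with a two-model ("T-learner") scheme: fit one outcome model per treatment arm and take the per-patient difference in predicted risk. The sketch below illustrates that general idea on synthetic stand-in data; it is not the authors' architecture, and the variable names and logistic model are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(449, 5))            # stand-in clinical/imaging features
treated = rng.integers(0, 2, size=449)   # randomized treatment assignment
y = rng.integers(0, 2, size=449)         # binary favorable outcome (mRS-based)

# Fit one outcome model per trial arm.
model_treated = LogisticRegression().fit(X[treated == 1], y[treated == 1])
model_control = LogisticRegression().fit(X[treated == 0], y[treated == 0])

# ITE estimate: predicted probability of favorable outcome under treatment
# minus that under control, for each patient.
ite = model_treated.predict_proba(X)[:, 1] - model_control.predict_proba(X)[:, 1]
print(ite[:5])
```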

PiCME: Pipeline for Contrastive Modality Evaluation and Encoding in the MIMIC Dataset

Michal Golovanevsky, Pranav Mahableshwarkar, Carsten Eickhoff, Ritambhara Singh

arXiv preprint · Jul 3, 2025
Multimodal deep learning holds promise for improving clinical prediction by integrating diverse patient data, including text, imaging, time-series, and structured demographics. Contrastive learning facilitates this integration by producing a unified representation that can be reused across tasks, reducing the need for separate models or encoders. Although contrastive learning has seen success in vision-language domains, its use in clinical settings remains largely limited to image and text pairs. We propose the Pipeline for Contrastive Modality Evaluation and Encoding (PiCME), which systematically assesses five clinical data types from MIMIC: discharge summaries, radiology reports, chest X-rays, demographics, and time-series. We pre-train contrastive models on all 26 combinations of two to five modalities and evaluate their utility on in-hospital mortality and phenotype prediction. To address performance plateaus with more modalities, we introduce a Modality-Gated LSTM that weights each modality according to its contrastively learned importance. Our results show that contrastive models remain competitive with supervised baselines, particularly in three-modality settings. Performance declines beyond three modalities, a drop that supervised models also fail to recover. The Modality-Gated LSTM mitigates this drop, improving AUROC from 73.19% to 76.93% and AUPRC from 51.27% to 62.26% in the five-modality setting. We also compare contrastively learned modality importance scores with attribution scores and evaluate generalization across demographic subgroups, highlighting strengths in interpretability and fairness. PiCME is the first to scale contrastive learning across all modality combinations in MIMIC, offering guidance for modality selection, training strategies, and equitable clinical prediction.
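
A gated fusion of this kind can be pictured as scaling each modality embedding by a learned importance gate before an LSTM fuses the sequence. The PyTorch sketch below is a hedged reconstruction of the general idea; the gating mechanism, dimensions, and initialization are illustrative, not the paper's exact Modality-Gated LSTM.

```python
import torch
import torch.nn as nn

class ModalityGatedLSTM(nn.Module):
    """Sketch: scale each modality embedding by a learned gate, then fuse
    the gated sequence with an LSTM. The details are assumptions; the
    gates could instead be set from contrastively learned importance."""
    def __init__(self, n_modalities, dim, hidden):
        super().__init__()
        self.gate_logits = nn.Parameter(torch.zeros(n_modalities))
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, mods):  # mods: (batch, n_modalities, dim)
        gates = torch.sigmoid(self.gate_logits)   # one weight per modality
        gated = mods * gates.view(1, -1, 1)       # down-weight weak modalities
        _, (h, _) = self.lstm(gated)              # fuse across modalities
        return self.head(h[-1])                   # e.g., a mortality logit

model = ModalityGatedLSTM(n_modalities=5, dim=128, hidden=64)
print(model(torch.randn(4, 5, 128)).shape)  # torch.Size([4, 1])
```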

Quantification of Optical Coherence Tomography Features in >3500 Patients with Inherited Retinal Disease Reveals Novel Genotype-Phenotype Associations

Woof, W. A., de Guimaraes, T. A. C., Al-Khuzaei, S., Daich Varela, M., Shah, M., Naik, G., Sen, S., Bagga, P., Chan, Y. W., Mendes, B. S., Lin, S., Ghoshal, B., Liefers, B., Fu, D. J., Georgiou, M., da Silva, A. S., Nguyen, Q., Liu, Y., Fujinami-Yokokawa, Y., Sumodhee, D., Furman, J., Patel, P. J., Moghul, I., Moosajee, M., Sallum, J., De Silva, S. R., Lorenz, B., Herrmann, P., Holz, F. G., Fujinami, K., Webster, A. R., Mahroo, O. A., Downes, S. M., Madhusudhan, S., Balaskas, K., Michaelides, M., Pontikos, N.

medRxiv preprint · Jul 3, 2025
Purpose: To quantify spectral-domain optical coherence tomography (SD-OCT) images cross-sectionally and longitudinally in a large cohort of molecularly characterized patients with inherited retinal disease (IRDs) from the UK. Design: Retrospective study of imaging data. Participants: Patients with a clinical and molecularly confirmed diagnosis of IRD who underwent macular SD-OCT imaging at Moorfields Eye Hospital (MEH) between 2011 and 2019. We retrospectively identified 4,240 IRD patients from the MEH database (198 distinct IRD genes), including 69,664 SD-OCT macular volumes. Methods: Eight features of interest were defined: retina, fovea, intraretinal cystic spaces (ICS), subretinal fluid (SRF), subretinal hyper-reflective material (SHRM), pigment epithelium detachment (PED), ellipsoid zone loss (EZ-loss), and retinal pigment epithelium loss (RPE-loss). Manual annotations of five b-scans per SD-OCT volume were performed for the retinal features by four graders following a defined grading protocol. A total of 1,749 b-scans from 360 SD-OCT volumes across 275 patients were annotated for the eight retinal features for training and testing of a neural-network-based segmentation model, AIRDetect-OCT, which was then applied to the entire imaging dataset. Main Outcome Measures: Performance of AIRDetect-OCT, compared with inter-grader agreement, was evaluated using the Dice score on a held-out dataset. Feature prevalence, volume, and area were analysed cross-sectionally and longitudinally. Results: The inter-grader Dice score for manual segmentation was ≥90% for retina, ICS, SRF, SHRM, and PED, and >77% for both EZ-loss and RPE-loss. Model-grader agreement was >80% for segmentation of retina, ICS, SRF, SHRM, and PED, and >68% for both EZ-loss and RPE-loss. Automatic segmentation was applied to 272,168 b-scans across 7,405 SD-OCT volumes from 3,534 patients encompassing 176 unique genes. Accounting for age, male patients exhibited significantly more EZ-loss (19.6 mm² vs 17.9 mm², p<2.8×10⁻⁴) and RPE-loss (7.79 mm² vs 6.15 mm², p<3.2×10⁻⁶) than females. RPE-loss was significantly higher in Asian patients than in other ethnicities (9.37 mm² vs 7.29 mm², p<0.03). ICS average total volume was largest in RS1 (0.47 mm³) and NR2E3 (0.25 mm³), SRF in BEST1 (0.21 mm³), and PED in EFEMP1 (0.34 mm³). BEST1 and PROM1 showed significantly different patterns of EZ-loss (p<10⁻⁴) and RPE-loss (p<0.02) comparing the dominant to the recessive forms. Sectoral analysis revealed significantly increased EZ-loss in the inferior quadrant compared to the superior quadrant for RHO (Δ=-0.414 mm², p=0.036) and EYS (Δ=-0.908 mm², p=1.5×10⁻⁴). In ABCA4 retinopathy, more severe genotypes (group A) were associated with faster progression of EZ-loss (2.80±0.62 mm²/yr), whilst the p.(Gly1961Glu) variant (group D) was associated with slower progression (0.56±0.18 mm²/yr). There were also sex differences within groups, with males in group A experiencing significantly faster rates of progression of RPE-loss (2.48±1.40 mm²/yr vs 0.87±0.62 mm²/yr, p=0.047), but lower rates in groups B, C, and D. Conclusions: AIRDetect-OCT, a novel deep learning algorithm, enables large-scale OCT feature quantification in IRD patients, uncovering cross-sectional and longitudinal phenotype correlations with demographic and genotypic parameters.
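
The Dice score used above to quantify grader-grader and model-grader agreement is twice the mask overlap divided by the total mask area. A short sketch of the metric on binary masks, assuming NumPy arrays (the toy masks are illustrative):

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient between two binary segmentation masks.
    Inputs are 0/1 arrays of the same shape; eps guards empty masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example on a 4x4 mask:
a = np.array([[1, 1, 0, 0]] * 4)
b = np.array([[1, 0, 0, 0]] * 4)
print(dice_score(a, b))  # ~0.667
```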

Radiology report generation using automatic keyword adaptation, frequency-based multi-label classification and text-to-text large language models.

He Z, Wong ANN, Yoo JS

PubMed · Jul 3, 2025
Radiology reports are essential in medical imaging, providing critical insights for diagnosis, treatment, and patient management by bridging the gap between radiologists and referring physicians. However, manual generation of radiology reports is time-consuming and labor-intensive, leading to inefficiencies and delays in clinical workflows, particularly as case volumes increase. Although deep learning approaches have shown promise in automating radiology report generation, existing methods, particularly those based on the encoder-decoder framework, suffer from significant limitations. These include a lack of explainability, due to the black-box features generated by the encoder, and limited adaptability to diverse clinical settings. In this study, we address these challenges by proposing a novel deep learning framework for radiology report generation that enhances explainability, accuracy, and adaptability. Our approach replaces traditional black-box computer vision features with transparent keyword lists, improving the interpretability of the feature extraction process. To generate these keyword lists, we apply a multi-label classification technique, which is further enhanced by an automatic keyword adaptation mechanism. This adaptation dynamically configures the multi-label classification to better fit specific clinical environments, reducing reliance on manually curated reference keyword lists and improving model adaptability across diverse datasets. We also introduce a frequency-based multi-label classification strategy to address keyword imbalance, ensuring that rare but clinically significant terms are accurately identified. Finally, we leverage a pre-trained text-to-text large language model (LLM) to generate human-like, clinically relevant radiology reports from the extracted keyword lists, ensuring linguistic quality and clinical coherence. We evaluate our method using two public datasets, IU-XRay and MIMIC-CXR, demonstrating superior performance over state-of-the-art methods. Our framework not only improves the accuracy and reliability of radiology report generation but also enhances the explainability of the process, fostering greater trust in and adoption of AI-driven solutions in clinical practice. Comprehensive ablation studies confirm the robustness and effectiveness of each component, highlighting the significant contributions of our framework to advancing automated radiology reporting. In conclusion, we developed a novel deep-learning-based radiology report generation method that prepares high-quality, explainable radiology reports for chest X-ray images using multi-label classification and a text-to-text large language model. Our method addresses the lack of explainability in the current workflow and provides a clear, flexible automated pipeline that can reduce the workload of radiologists and support further applications in human-AI interactive communication.
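
The keyword-to-report pipeline can be pictured as: per-keyword sigmoid probabilities from a multi-label classifier, frequency-aware thresholds that keep rare but significant terms, and a prompt assembled for the text-to-text LLM. The sketch below is a hedged illustration of that flow; the keyword list, thresholds, logits, and prompt format are all assumptions, not the authors' configuration.

```python
import torch

# Hypothetical keyword vocabulary and frequency-aware thresholds
# (rarer labels get lower thresholds so they are not drowned out).
KEYWORDS = ["cardiomegaly", "pleural effusion", "atelectasis", "pneumothorax"]
THRESHOLDS = torch.tensor([0.5, 0.5, 0.4, 0.3])

logits = torch.tensor([2.1, -1.3, 0.2, -0.4])   # classifier output for one image
probs = torch.sigmoid(logits)
selected = [kw for kw, p, t in zip(KEYWORDS, probs, THRESHOLDS) if p > t]

prompt = "Write a chest X-ray report covering: " + ", ".join(selected)
print(prompt)
# The prompt would then be passed to a pre-trained text-to-text model
# (e.g., a T5-style generator) to produce the final report.
```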

A deep active learning framework for mitotic figure detection with minimal manual annotation and labelling.

Liu E, Lin A, Kakodkar P, Zhao Y, Wang B, Ling C, Zhang Q

PubMed · Jul 3, 2025
Accurately and efficiently identifying mitotic figures (MFs) is crucial for diagnosing and grading various cancers, including glioblastoma (GBM), a highly aggressive brain tumour requiring precise and timely intervention. Traditional manual counting of MFs in whole slide images (WSIs) is labour-intensive and prone to interobserver variability. Our study introduces a deep active learning framework that addresses these challenges with minimal human intervention. We utilized a dataset of GBM WSIs from The Cancer Genome Atlas (TCGA). Our framework integrates convolutional neural networks (CNNs) with an active learning strategy. Initially, a CNN is trained on a small, annotated dataset. The framework then identifies uncertain samples from the unlabelled data pool, which are subsequently reviewed by experts. These ambiguous cases are verified and used for model retraining. This iterative process continues until the model achieves satisfactory performance. Our approach achieved 81.75% precision and 82.48% recall for MF detection. For MF subclass classification, it attained an accuracy of 84.1%. Furthermore, this approach significantly reduced annotation time (approximately 900 minutes across 66 WSIs), cutting the effort nearly in half compared to traditional methods. Our deep active learning framework demonstrates a substantial improvement in both efficiency and accuracy for MF detection and classification in GBM WSIs. By reducing reliance on large annotated datasets, it minimizes manual effort while maintaining high performance. This methodology can be generalized to other medical imaging tasks, supporting broader applications in the healthcare domain.
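
The iterative loop described above is classic uncertainty sampling: train on the labelled seed set, query the pool samples whose predicted probability is closest to 0.5, have them reviewed, retrain. A minimal sketch on synthetic data follows, with an oracle standing in for the expert graders; the model, query size, and round count are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 16))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)  # oracle labels

labelled = list(range(20))        # small annotated seed set
pool = list(range(20, 1000))      # unlabelled pool

for round_ in range(5):
    clf = LogisticRegression().fit(X[labelled], y[labelled])
    probs = clf.predict_proba(X[pool])[:, 1]
    uncertainty = -np.abs(probs - 0.5)      # closest to 0.5 = most uncertain
    query = np.argsort(uncertainty)[-10:]   # 10 most ambiguous cases
    # In practice these cases go to expert review; here the oracle labels them.
    labelled += [pool[i] for i in query]
    pool = [s for j, s in enumerate(pool) if j not in set(query)]
    print(f"round {round_}: {len(labelled)} labelled, acc={clf.score(X, y):.3f}")
```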

Interpretable and generalizable deep learning model for preoperative assessment of microvascular invasion and outcome in hepatocellular carcinoma based on MRI: a multicenter study.

Dong X, Jia X, Zhang W, Zhang J, Xu H, Xu L, Ma C, Hu H, Luo J, Zhang J, Wang Z, Ji W, Yang D, Yang Z

PubMed · Jul 3, 2025
This study aimed to develop an interpretable, domain-generalizable deep learning model for microvascular invasion (MVI) assessment in hepatocellular carcinoma (HCC). Utilizing a retrospective dataset of 546 HCC patients from five centers, we developed and validated a clinical-radiological model and deep learning models for MVI prediction. The models were developed on a dataset of 263 cases drawn from three centers, internally validated on a set of 66 patients, and externally tested on two independent sets. An adversarial network-based deep learning (AD-DL) model was developed to learn domain-invariant features from the multiple centers within the training set. The area under the receiver operating characteristic curve (AUC) was calculated against pathological MVI status. With the best-performing model, early recurrence-free survival (ERFS) stratification was validated on the external test set using the log-rank test, and differentially expressed genes (DEGs) associated with MVI status were examined in an RNA-sequencing analysis of The Cancer Imaging Archive. The AD-DL model demonstrated the highest diagnostic performance and generalizability, with an AUC of 0.793 in the internal test set, 0.801 in external test set 1, and 0.773 in external test set 2. The model's prediction of MVI status also demonstrated a significant correlation with ERFS (p = 0.048). DEGs associated with MVI status were primarily enriched in metabolic processes, the Wnt signaling pathway, and the epithelial-mesenchymal transition process. The AD-DL model allows preoperative MVI prediction and ERFS stratification in HCC patients, with good generalizability and biological interpretability. The adversarial network-based deep learning model predicts MVI status well in HCC patients and demonstrates good generalizability. By integrating bioinformatics analysis of the model's predictions, it achieves biological interpretability, facilitating its clinical translation. Current MVI assessment models for HCC lack interpretability and generalizability. The adversarial network-based model's performance surpassed that of the clinical-radiological and squeeze-and-excitation network-based models. Biological function analysis was employed to enhance the interpretability and clinical translatability of the adversarial network-based model.
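
Adversarial learning of domain-invariant features is commonly implemented with a gradient reversal layer (as in DANN): a center classifier is trained on the shared features, but its gradient is negated on the way back into the feature extractor, so features that reveal the source center are penalized. The PyTorch sketch below shows that standard construction; the paper's exact adversarial setup may differ, and all module names and sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity on the forward pass,
    negated (scaled) gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

feature_net = nn.Linear(32, 16)   # stand-in for the MRI feature extractor
mvi_head = nn.Linear(16, 1)       # predicts MVI status (the task)
center_head = nn.Linear(16, 3)    # predicts which of 3 training centers

x = torch.randn(8, 32)
feats = feature_net(x)
mvi_logit = mvi_head(feats)
center_logit = center_head(GradReverse.apply(feats, 1.0))

task_loss = F.binary_cross_entropy_with_logits(
    mvi_logit, torch.randint(0, 2, (8, 1)).float())
domain_loss = F.cross_entropy(center_logit, torch.randint(0, 3, (8,)))
# The reversed gradient makes minimizing domain_loss push feature_net
# toward center-invariant features while mvi_head learns the task.
(task_loss + domain_loss).backward()
print(float(task_loss), float(domain_loss))
```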

Can Whole-Thyroid-Based CT Radiomics Model Achieve the Performance of Lesion-Based Model in Predicting the Thyroid Nodules Malignancy? - A Comparative Study.

Yuan W, Wu J, Mai W, Li H, Li Z

PubMed · Jul 3, 2025
Machine learning is now extensively implemented in medical imaging for preoperative risk stratification and post-therapeutic outcome assessment, enhancing clinical decision-making. Numerous studies have focused on predicting whether thyroid nodules are benign or malignant using a nodule-based approach, which is time-consuming and inefficient and overlooks the impact of the peritumoral region. To evaluate the effectiveness of using the whole thyroid as the region of interest (ROI) in differentiating between benign and malignant thyroid nodules, we explored the potential application value of the entire thyroid. This study enrolled 1121 patients with thyroid nodules between February 2017 and May 2023. All participants underwent contrast-enhanced CT scans prior to surgical intervention. Radiomics features were extracted from arterial-phase images, and feature dimensionality reduction was performed using the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm. Four machine learning models were trained on the selected features within the training cohort and subsequently evaluated on the independent validation cohort. The diagnostic performance of whole-thyroid versus nodule-based radiomics models was compared through receiver operating characteristic (ROC) curve analysis and area under the curve (AUC) metrics. The nodule-based logistic regression model achieved an AUC of 0.81 in the validation set, with sensitivity, specificity, and accuracy of 78.6%, 69.4%, and 75.6%, respectively. The whole-thyroid-based random forest model attained an AUC of 0.80, with sensitivity, specificity, and accuracy of 90.0%, 51.9%, and 80.1%, respectively. The AUC advantage ratios of the logistic regression (LR), decision tree (DT), random forest (RF), and support vector machine (SVM) models were approximately -2.47%, 0.00%, -4.76%, and -4.94%, respectively. The DeLong test showed no significant differences among the four machine learning models regardless of whether the region of interest was defined by the thyroid primary lesion or the whole thyroid. There was no significant difference in distinguishing between benign and malignant thyroid nodules using either a nodule-based or whole-thyroid-based strategy for ROI outlining. We hypothesize that the whole-thyroid approach provides enhanced diagnostic capability for detecting papillary thyroid carcinomas (PTCs) with ill-defined margins.
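
The workflow summarized above (LASSO-based feature selection followed by training several classifiers and comparing validation AUCs) can be sketched roughly as below, on synthetic stand-in data; the feature counts, the two models shown, and the split are assumptions, not the study's protocol.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(1121, 100))                 # stand-in radiomics features
y = (X[:, :5].sum(axis=1) + rng.normal(size=1121) > 0).astype(int)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=7)

lasso = LassoCV(cv=5).fit(X_tr, y_tr)            # shrink irrelevant coefficients
keep = np.flatnonzero(lasso.coef_)               # indices of retained features
print(f"{keep.size} features retained")

for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                  ("RF", RandomForestClassifier(random_state=7))]:
    clf.fit(X_tr[:, keep], y_tr)
    auc = roc_auc_score(y_va, clf.predict_proba(X_va[:, keep])[:, 1])
    print(f"{name}: AUC={auc:.3f}")
```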

Group-derived and individual disconnection in stroke: recovery prediction and deep graph learning

Bey, P., Dhindsa, K., Rackoll, T., Feldheim, J., Bönstrup, M., Thomalla, G., Schulz, R., Cheng, B., Gerloff, C., Endres, M., Nave, A. H., Ritter, P.

medRxiv preprint · Jul 3, 2025
Recent advances in the treatment of acute ischemic stroke contribute to improved patient outcomes, yet the mechanisms driving long-term disease trajectory are not well understood. Current trends in the literature emphasize the distributed disruptive impact of stroke lesions on brain network organization. While most studies use population-derived data to investigate lesion interference with healthy tissue, the potential for individualized treatment strategies remains underexplored due to a lack of availability and effective utilization of the necessary clinical imaging data. To validate the potential for individualized patient evaluation, we explored and compared the differential information in network models based on normative and individual data. We further present a novel deep learning approach providing usable and accurate estimates of individual stroke impact from minimal imaging data, thus bridging the data gap hindering individualized treatment planning. We created normative and individual disconnectomes for each of 78 patients (mean age 65.1 years, 32 females) from two independent cohort studies. MRI data and the Barthel Index, as a measure of activities of daily living, were collected in the acute and early subacute phase after stroke (baseline) and at three months post stroke incident. Disconnectomes were subsequently described using 12 network metrics, including clustering coefficient and transitivity. Metrics were first compared between disconnectomes and then utilized as features in a classifier to predict a patient's disease trajectory, as defined by the three-month Barthel Index. We then developed a deep learning architecture based on graph convolution and trained it to predict properties of the individual disconnectomes from the normative disconnectomes. The two disconnectome types showed statistically significant differences in topology and predictive power. Normative disconnectomes included a significantly larger number of connections (N=604 for normative versus N=210 for individual), and agreement between network properties ranged from r²=0.01 for clustering coefficient to r²=0.8 for assortativity, highlighting the impact of disconnectome choice on subsequent analysis. For predicting patient deficit severity, individual data achieved an AUC of 0.94 compared with 0.85 for normative-based features. Our deep learning estimates showed high correlation with individual features (mean r²=0.94) and comparable performance, with an AUC of 0.93. We showed that normative-data-based analysis of stroke disconnections provides limited information regarding patient recovery, whereas individual data provided higher prognostic precision. We presented a novel approach that curbs the need for individual data while retaining most of the differential information encoding an individual patient's disease trajectory.
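
Describing a disconnectome with network metrics like those named above can be done directly with networkx. The toy graph below is purely illustrative; only the metric names (clustering coefficient, transitivity, assortativity) come from the abstract.

```python
import networkx as nx

# Build a small stand-in graph from surviving connections.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])

metrics = {
    "avg_clustering": nx.average_clustering(G),
    "transitivity": nx.transitivity(G),
    "assortativity": nx.degree_assortativity_coefficient(G),
}
print(metrics)
# Per-patient metric vectors like this would then serve as classifier
# features for predicting the three-month Barthel Index trajectory.
```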

De-speckling of medical ultrasound image using metric-optimized knowledge distillation.

Khalifa M, Hamza HM, Hosny KM

PubMed · Jul 3, 2025
Ultrasound imaging provides real-time views of internal organs, which are essential for accurate diagnosis and treatment. However, speckle noise, caused by wave interactions with tissues, creates a grainy texture that hides crucial details. This noise varies with image intensity, which limits the effectiveness of traditional denoising methods. We introduce the Metric-Optimized Knowledge Distillation (MK) model, a deep-learning approach that utilizes knowledge distillation (KD) for denoising ultrasound images. Our method transfers knowledge from a high-performing teacher network to a smaller student network designed for this task. By leveraging KD, the model removes speckle noise while preserving the key anatomical details needed for accurate diagnosis. A key innovation of our approach is the metric-guided training strategy: the evaluation metrics used to assess the model are computed repeatedly during training and incorporated into the loss function, enabling the model to optimally reduce noise and enhance image quality. We evaluate the proposed method against state-of-the-art despeckling techniques, including DnCNN and other recent models. The results demonstrate that our approach achieves superior noise reduction and image-quality preservation, making it a valuable tool for enhancing the diagnostic utility of ultrasound images.
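
A metric-optimized distillation loss of this kind can be sketched as a weighted combination of a distillation term (student matches teacher output) and a differentiable image-quality term (here a PSNR surrogate). This is an assumption-laden illustration, not the MK model's actual loss; the weighting, metric choice, and tensor shapes are invented for the example.

```python
import torch
import torch.nn.functional as F

def mk_style_loss(student_out, teacher_out, clean, alpha=0.5):
    """Sketch of a metric-guided distillation loss: pull the student toward
    the teacher's denoised output while maximizing a PSNR surrogate against
    the clean reference. alpha and the metric choice are assumptions."""
    distill = F.mse_loss(student_out, teacher_out)
    mse = F.mse_loss(student_out, clean)
    psnr = 10.0 * torch.log10(1.0 / (mse + 1e-8))  # images scaled to [0, 1]
    return alpha * distill - (1.0 - alpha) * psnr  # minus: maximize PSNR

# Toy tensors standing in for student/teacher outputs and a clean image.
student = torch.rand(1, 1, 64, 64, requires_grad=True)
teacher = torch.rand(1, 1, 64, 64)
clean = torch.rand(1, 1, 64, 64)
loss = mk_style_loss(student, teacher, clean)
loss.backward()
print(float(loss))
```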