Latest Papers on Radiology AI. Tags: Mixed Modality

Intraoperative 2D/3D Registration via Spherical Similarity Learning and Inference-Time Differentiable Levenberg-Marquardt Optimization

Minheng Chen, Youyong Kong

•preprint•Sep 8 2025

Intraoperative 2D/3D registration aligns preoperative 3D volumes with real-time 2D radiographs, enabling accurate localization of instruments and implants. A recent fully differentiable similarity learning framework approximates geodesic distances on SE(3), expanding the capture range of registration and mitigating the effects of substantial disturbances, but existing Euclidean approximations distort manifold structure and slow convergence. To address these limitations, we explore similarity learning in non-Euclidean spherical feature spaces to better capture and fit complex manifold structure. We extract feature embeddings using a CNN-Transformer encoder, project them into spherical space, and approximate their geodesic distances with Riemannian distances in the bi-invariant SO(4) space. This enables a more expressive and geometrically consistent deep similarity metric, enhancing the ability to distinguish subtle pose differences. During inference, we replace gradient descent with fully differentiable Levenberg-Marquardt optimization to accelerate convergence. Experiments on real and synthetic datasets show superior accuracy in both patient-specific and patient-agnostic scenarios.
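To make the idea of comparing embeddings on a sphere concrete, here is a minimal sketch of a great-circle (geodesic) distance between two feature vectors after projection onto the unit hypersphere. It is a simplification for illustration only: the abstract describes Riemannian distances in SO(4), which this sketch does not implement, and the function names and example vectors are assumptions.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot project the zero vector onto the sphere")
    return [x / norm for x in v]

def geodesic_distance(u, v):
    """Great-circle distance (in radians) between two embeddings."""
    u_hat, v_hat = normalize(u), normalize(v)
    cos_angle = sum(a * b for a, b in zip(u_hat, v_hat))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle)

# Hypothetical 4-D embeddings of two poses (values are illustrative only).
a = [1.0, 0.0, 0.0, 0.0]
b = [0.0, 2.0, 0.0, 0.0]
print(geodesic_distance(a, b))  # orthogonal directions: a quarter circle
```

Unlike a Euclidean distance on the raw vectors, this distance depends only on direction, so rescaling an embedding leaves it unchanged.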

Tags: Mixed Modality, Registration, Methodology, In Silico, Academic Lab

New imaging techniques and trends in radiology.

Kantarcı M, Aydın S, Oğul H, Kızılgöz V

•papers•Sep 8 2025

Radiology is a field of medicine inherently intertwined with technology. Image acquisition in ultrasound (US), computed tomography (CT), and magnetic resonance imaging (MRI) depends heavily on technology. Radiation dose reduction does not apply to US and MRI, but technological advances have made it possible in CT, with ongoing studies aimed at further optimization. Advances in each modality are steadily improving image resolution and diagnostic quality. Additionally, technological progress has significantly shortened acquisition times for CT and MRI. Artificial intelligence (AI), which is becoming increasingly widespread worldwide, has also been incorporated into radiology. In US examinations, AI can produce more accurate and reproducible results. Machine learning offers great potential for improving image quality, creating more distinct and useful images, and even developing new US imaging modalities. Furthermore, AI technologies are increasingly prevalent in CT and MRI for image evaluation, image generation, and enhanced image quality.

Tags: Mixed Modality, Image Synthesis, Whole Body, Review, Concept, GenAI

Deep learning for named entity recognition in Turkish radiology reports.

Abdullahi AA, Ganiz MC, Koç U, Gökhan MB, Aydın C, Özdemir AB

•papers•Sep 8 2025

This research aims to improve the accuracy and efficiency of information extraction from radiology reports by developing and evaluating a deep learning framework for named entity recognition (NER). We used a synthetic dataset of 1,056 Turkish radiology reports created and labeled by the radiologists in our research team. Due to privacy concerns, actual patient data could not be used; however, the synthetic reports closely mimic genuine reports in structure and content. We employed the four-stage DYGIE++ model for the experiments. First, we performed token encoding using four bidirectional encoder representations from transformers (BERT) models: BERTurk, BioBERTurk, PubMedBERT, and XLM-RoBERTa. Second, we introduced adaptive span enumeration, considering the word count of a sentence in Turkish. Third, we adopted span graph propagation to generate a multidirectional graph crucial for coreference resolution. Finally, we used a two-layered feed-forward neural network to classify the named entity. The experiments conducted on the labeled dataset demonstrate the approach's effectiveness. The study achieved an F1 score of 80.1 for the NER task, with the BioBERTurk model, which is pre-trained on Turkish Wikipedia, radiology reports, and biomedical texts, proving to be the most effective of the four BERT models used in the experiment. We show how different dataset labels affect the model's performance. The results demonstrate the model's ability to handle the intricacies of Turkish radiology reports, providing a detailed analysis of precision, recall, and F1 scores for each label. Additionally, this study compares its findings with related research in other languages. Our approach provides clinicians with more precise and comprehensive insights to improve patient care by extracting relevant information from radiology reports. This innovation in information extraction streamlines the diagnostic process and helps expedite patient treatment decisions.
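Since the abstract reports precision, recall, and F1 for entity spans, the following sketch shows how exact-match span scores are commonly computed. The label names and spans are invented for illustration and do not come from the paper's dataset; the study's own scoring protocol may differ in detail.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical annotations: token offsets and labels are illustrative only.
gold_spans = {(0, 1, "ANATOMY"), (3, 4, "FINDING"), (6, 6, "SIZE")}
pred_spans = {(0, 1, "ANATOMY"), (3, 4, "ANATOMY"), (6, 6, "SIZE"), (8, 9, "FINDING")}

precision, recall, f1 = span_prf(gold_spans, pred_spans)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

A span with the right boundaries but the wrong label counts as both a false positive and a false negative here, which is why per-label breakdowns like the one the abstract mentions are informative.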

Tags: Mixed Modality, LLM, Radiology Report, Methodology, In Silico, Academic Lab, Open Dataset

Prognostic Utility of a Deep Learning Radiomics Nomogram Integrating Ultrasound and Multi-Sequence MRI in Triple-Negative Breast Cancer Treated with Neoadjuvant Chemotherapy.

Cheng C, Peng X, Sang K, Zhao H, Wu D, Li H, Wang Y, Wang W, Xu F, Zhao J

•papers•Sep 8 2025

The aim of this study is to evaluate the prognostic performance of a deep learning radiomics nomogram (DLRN) that integrates clinical parameters with deep learning radiomics features derived from ultrasound and multi-sequence magnetic resonance imaging (MRI) for predicting survival, recurrence, and metastasis in patients diagnosed with triple-negative breast cancer (TNBC) undergoing neoadjuvant chemotherapy (NAC). This retrospective, multicenter study included 103 patients with histopathologically confirmed TNBC across four institutions. The training group comprised 72 cases from the First People's Hospital of Lianyungang, while the validation group included 31 cases from three external centers. Clinical and follow-up data were collected to assess prognostic outcomes. Radiomics features were extracted from two-dimensional ultrasound and three-dimensional MRI images following image segmentation. A DLRN model was developed, and its prognostic performance was evaluated using the concordance index (C-index) in comparison with alternative modeling approaches. Risk stratification for postoperative recurrence was subsequently performed, and recurrence and metastasis rates were compared between low- and high-risk groups. The DLRN model demonstrated strong predictive capability for disease-free survival (DFS) (C-index: 0.859-0.887) and moderate performance for overall survival (OS) (C-index: 0.800-0.811). For DFS prediction, the DLRN model outperformed other models, whereas its performance in predicting OS was slightly lower than that of the combined MRI + US radiomics model. The 3-year recurrence and metastasis rates were significantly lower in the low-risk group than in the high-risk group (21.43-35.71% vs 77.27-82.35%). The preoperative DLRN model, integrating ultrasound and multi-sequence MRI, shows promise as a prognostic tool for recurrence, metastasis, and survival outcomes in patients with TNBC undergoing NAC. The derived risk score may facilitate individualized prognostic evaluation and aid in preoperative risk stratification within clinical settings.
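The C-index used above measures how often a model ranks patients correctly by risk. A minimal sketch of Harrell's concordance index follows, assuming higher scores mean higher risk; the follow-up times, event flags, and risk scores are made up, and the study's exact estimator is not stated in the abstract.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event and
    a strictly shorter follow-up time than subject j.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical cohort: months of follow-up, event flag (1 = recurrence), risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risks))  # 4 of 5 comparable pairs agree
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the 0.80 to 0.89 range reported above.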

Tags: Mixed Modality, Classification, Breast, Retrospective, Clinical, In Silico, Academic Lab

Barlow-Swin: Toward a novel siamese-based segmentation architecture using Swin-Transformers

Morteza Kiani Haftlang, Mohammadhossein Malmir, Foroutan Parand, Umberto Michelucci, Safouane El Ghazouali

•preprint•Sep 8 2025

Medical image segmentation is a critical task in clinical workflows, particularly for the detection and delineation of pathological regions. While convolutional architectures like U-Net have become standard for such tasks, their limited receptive field restricts global context modeling. Recent efforts integrating transformers have addressed this, but often result in deep, computationally expensive models unsuitable for real-time use. In this work, we present a novel end-to-end lightweight architecture designed specifically for real-time binary medical image segmentation. Our model combines a Swin Transformer-like encoder with a U-Net-like decoder, connected via skip pathways to preserve spatial detail while capturing contextual information. Unlike existing designs such as Swin Transformer or U-Net, our architecture is significantly shallower and competitively efficient. To improve the encoder's ability to learn meaningful features without relying on large amounts of labeled data, we first train it using Barlow Twins, a self-supervised learning method that helps the model focus on important patterns by reducing unnecessary repetition in the learned features. After this pretraining, we fine-tune the entire model for our specific task. Experiments on benchmark binary segmentation tasks demonstrate that our model achieves competitive accuracy with substantially reduced parameter count and faster inference, positioning it as a practical alternative for deployment in real-time and resource-limited clinical environments. The code for our method is available on GitHub: https://github.com/mkianih/Barlow-Swin.
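The redundancy-reduction idea behind Barlow Twins can be sketched in a few lines: the cross-correlation matrix between two views' embeddings is pushed toward the identity. This is a plain-Python illustration of the generic objective, not the repository's implementation; the off-diagonal weight of 0.005 and the toy embeddings are assumptions.

```python
import math

def standardize(batch):
    """Return features as columns with zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    columns = []
    for k in range(d):
        col = [row[k] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        columns.append([(x - mean) / (std + 1e-12) for x in col])
    return columns

def barlow_twins_loss(z_a, z_b, lambda_offdiag=0.005):
    """Sum of (1 - C_ii)^2 plus weighted C_ij^2 for i != j."""
    a, b = standardize(z_a), standardize(z_b)
    n, d = len(z_a), len(a)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[i][s] * b[j][s] for s in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2  # invariance term
            else:
                loss += lambda_offdiag * c_ij ** 2  # redundancy term
    return loss

# Toy embeddings for two augmented views (batch of 4, 2 features each).
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
view_b = [[0.0, 1.0], [1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]]  # features swapped
print(barlow_twins_loss(view_a, view_a))  # near zero: views agree, features decorrelated
print(barlow_twins_loss(view_a, view_b))  # large: matching features disagree
```

The diagonal term rewards embeddings that are invariant to augmentation, while the off-diagonal term penalizes features that duplicate each other.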

Tags: Mixed Modality, Segmentation, Methodology, In Silico, Academic Lab, Open Code

DG-TTA: Out-of-Domain Medical Image Segmentation Through Augmentation, Descriptor-Driven Domain Generalization, and Test-Time Adaptation.

Weihsbach C, Kruse CN, Bigalke A, Heinrich MP

•papers•Sep 8 2025

Applying pre-trained medical deep learning segmentation models to out-of-domain images often yields predictions of insufficient quality. In this study, we propose using a robust generalizing descriptor, along with augmentation, to enable domain-generalized pre-training and test-time adaptation, thereby achieving high-quality segmentation in unseen domains. Five different publicly available datasets, including 3D CT and MRI images, are used to evaluate segmentation performance in out-of-domain scenarios. The settings include abdominal, spine, and cardiac imaging. Domain-generalized pre-training on source data is used to obtain the best initial performance in the target domain. We introduce a combination of the generalizing SSC descriptor and GIN intensity augmentation for optimal generalization. Segmentation results are subsequently optimized at test time, where we propose adapting the pre-trained models for every unseen scan using a consistency scheme with the augmentation-descriptor combination. The proposed generalized pre-training and subsequent test-time adaptation improve model performance significantly in CT to MRI cross-domain prediction for abdominal (+46.2 and +28.2 Dice), spine (+72.9), and cardiac (+14.2 and +55.7 Dice) scenarios (p < 0.001). Our method enables the optimal, independent use of source and target data, successfully bridging domain gaps with a compact and efficient methodology.
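The gains above are expressed in Dice, the standard overlap measure for segmentation. A minimal sketch of the Dice coefficient on flattened binary masks is shown below; the masks are toy values, and returning 1.0 for two empty masks is a convention chosen here, not something taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient between two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # convention: two empty masks count as a perfect match
    return 2.0 * intersection / total

# Toy masks: 1 marks a foreground voxel.
predicted_mask = [1, 1, 0, 0, 1, 0]
reference_mask = [1, 0, 0, 1, 1, 0]
print(dice_score(predicted_mask, reference_mask))  # 2 * 2 / (3 + 3)
```

Multiplying the result by 100 gives the percentage-point scale that figures such as "+46.2 Dice" appear to use.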

Tags: Mixed Modality, Segmentation, Methodology, In Silico, Academic Lab, Benchmark, SOTA

MedSeqFT: Sequential Fine-tuning Foundation Models for 3D Medical Image Segmentation

Yiwen Ye, Yicheng Wu, Xiangde Luo, He Zhang, Ziyang Chen, Ting Dang, Yanning Zhang, Yong Xia

•preprint•Sep 7 2025

Foundation models have become a promising paradigm for advancing medical image analysis, particularly for segmentation tasks where downstream applications often emerge sequentially. Existing fine-tuning strategies, however, remain limited: parallel fine-tuning isolates tasks and fails to exploit shared knowledge, while multi-task fine-tuning requires simultaneous access to all datasets and struggles with incremental task integration. To address these challenges, we propose MedSeqFT, a sequential fine-tuning framework that progressively adapts pre-trained models to new tasks while refining their representational capacity. MedSeqFT introduces two core components: (1) Maximum Data Similarity (MDS) selection, which identifies downstream samples most representative of the original pre-training distribution to preserve general knowledge, and (2) Knowledge and Generalization Retention Fine-Tuning (K&G RFT), a LoRA-based knowledge distillation scheme that balances task-specific adaptation with the retention of pre-trained knowledge. Extensive experiments on two multi-task datasets covering ten 3D segmentation tasks demonstrate that MedSeqFT consistently outperforms state-of-the-art fine-tuning strategies, yielding substantial performance gains (e.g., an average Dice improvement of 3.0%). Furthermore, evaluations on two unseen tasks (COVID-19-20 and Kidney) verify that MedSeqFT enhances transferability, particularly for tumor segmentation. Visual analyses of loss landscapes and parameter variations further highlight the robustness of MedSeqFT. These results establish sequential fine-tuning as an effective, knowledge-retentive paradigm for adapting foundation models to evolving clinical tasks. Code will be released.
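For readers unfamiliar with LoRA, which K&G RFT builds on, the sketch below shows the generic low-rank update: a frozen weight matrix W is adjusted by a scaled product of two small matrices B and A. It illustrates only the standard LoRA formulation, not MedSeqFT's distillation scheme; the matrix sizes, `alpha`, and values are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_weight(w, lora_b, lora_a, alpha):
    """Effective weight W + (alpha / r) * B @ A, where r is the LoRA rank."""
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

# Frozen pre-trained weight (2 x 3) and a rank-1 adapter; values are illustrative.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_a = [[0.5, 0.0, -0.5]]   # r x d_in, trainable
lora_b_init = [[0.0], [0.0]]  # d_out x r, zero at initialization
lora_b_trained = [[1.0], [2.0]]

print(lora_weight(w, lora_b_init, lora_a, alpha=2.0))     # unchanged at init
print(lora_weight(w, lora_b_trained, lora_a, alpha=2.0))  # adapted weight
```

Because B starts at zero, the adapted model initially reproduces the pre-trained one exactly, and only the small A and B matrices need to be stored per task.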

Tags: Mixed Modality, Segmentation, Methodology, In Silico, Academic Lab, Benchmark, SOTA, Open Code

PM2: A new prompting multi-modal model paradigm for few-shot medical image classification.

Wang Z, Sun Q, Zhang B, Wang P, Zhang J, Zhang Q

•papers•Sep 6 2025

Few-shot learning has emerged as a key technological solution to address challenges such as limited data and the difficulty of acquiring annotations in medical image classification. However, relying solely on a single image modality is insufficient to capture conceptual categories. Therefore, medical image classification requires a comprehensive approach to capture conceptual category information that aids in the interpretation of image content. This study proposes a novel medical image classification paradigm based on a multi-modal foundation model, called PM2. In addition to the image modality, PM2 introduces supplementary text input (prompt) to further describe images or conceptual categories and facilitate cross-modal few-shot learning. We empirically studied five different prompting schemes under this new paradigm. Furthermore, linear probing in multi-modal models takes only the class token as input, ignoring the rich statistical information contained in high-level visual tokens. Therefore, we alternately perform linear classification on the feature distributions of visual tokens and the class token. To effectively extract statistical information, we use global covariance pooling with efficient matrix power normalization to aggregate the visual tokens. We then combine two classification heads: one for handling the image class token and prompt representations encoded by the text encoder, and the other for classifying the feature distributions of visual tokens. Experimental results on three datasets (breast cancer, brain tumor, and diabetic retinopathy) demonstrate that PM2 effectively improves the performance of medical image classification. Compared to existing multi-modal models, PM2 achieves state-of-the-art performance. Integrating text prompts as supplementary samples effectively enhances the model's performance. Additionally, by leveraging second-order features of visual tokens to enrich the category feature space and combining them with the class token, the model's representational capacity is significantly strengthened.
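The second-order aggregation described above can be illustrated with a small sketch: compute the covariance of the visual tokens, then normalize it with a matrix square root obtained by coupled Newton-Schulz iterations. The abstract does not specify the exponent or the iteration scheme, so the power of 1/2, the Newton-Schulz choice, and the toy tokens are all assumptions.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def covariance(tokens):
    """Covariance matrix (d x d) of n token vectors of dimension d."""
    n, d = len(tokens), len(tokens[0])
    means = [sum(t[k] for t in tokens) / n for k in range(d)]
    return [[sum((t[i] - means[i]) * (t[j] - means[j]) for t in tokens) / n
             for j in range(d)] for i in range(d)]

def newton_schulz_sqrt(cov, iterations=20):
    """Approximate the matrix square root of a positive definite matrix."""
    d = len(cov)
    trace = sum(cov[i][i] for i in range(d))
    # Pre-normalize by the trace so the iteration converges.
    y = [[cov[i][j] / trace for j in range(d)] for i in range(d)]
    z = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(iterations):
        zy = matmul(z, y)
        t = [[0.5 * ((3.0 if i == j else 0.0) - zy[i][j]) for j in range(d)]
             for i in range(d)]
        y, z = matmul(y, t), matmul(t, z)
    # Undo the trace normalization.
    root_trace = math.sqrt(trace)
    return [[root_trace * y[i][j] for j in range(d)] for i in range(d)]

# Toy visual tokens: 6 tokens with 2 features each (illustrative values).
tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
cov = covariance(tokens)
pooled = newton_schulz_sqrt(cov)
print(pooled)
```

The appeal of this iteration is that it uses only matrix multiplications, which makes it convenient on accelerators compared with an eigendecomposition.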

Tags: Mixed Modality, Classification, Methodology, In Silico, Benchmark, SOTA

Multi-task learning for classification and prediction of adolescent idiopathic scoliosis based on fringe-projection three-dimensional imaging.

Feng CK, Chen YJ, Dinh QT, Tran KT, Liu CY

•papers•Sep 6 2025

This study aims to address the limitations of radiographic imaging and single-task learning models in adolescent idiopathic scoliosis assessment by developing a noninvasive, radiation-free diagnostic framework. A multi-task deep learning model was trained using structured back surface data acquired via fringe projection three-dimensional imaging. The model was designed to simultaneously predict the Cobb angle, curve type (thoracic, lumbar, mixed, none), and curve direction (left, right, none) by learning shared morphological features. The multi-task model achieved a mean absolute error (MAE) of 2.9° and a root mean square error (RMSE) of 6.9° for Cobb angle prediction, outperforming the single-task baseline (5.4° MAE, 12.5° RMSE). It showed strong correlation with radiographic measurements (R = 0.96, R² = 0.91). For curve classification, it reached 89% sensitivity in lumbar and mixed types, and 80% and 75% sensitivity for right and left directions, respectively, with an 87% positive predictive value for right-sided curves. The proposed multi-task learning model demonstrates that jointly learning related clinical tasks allows for the extraction of more robust and clinically meaningful geometric features from surface data. It outperforms traditional single-task approaches in both accuracy and stability. This framework provides a safe, efficient, and non-invasive alternative to X-ray-based scoliosis assessment and has the potential to support real-time screening and long-term monitoring of adolescent idiopathic scoliosis in clinical practice.
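The gap between the reported MAE and RMSE suggests a few large errors, since RMSE weights large deviations more heavily. The sketch below computes both metrics on invented Cobb angle values to show that effect; none of the numbers come from the study.

```python
import math

def mean_absolute_error(predicted, measured):
    """Average absolute difference between predictions and references."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)

def root_mean_square_error(predicted, measured):
    """Square root of the average squared difference."""
    return math.sqrt(
        sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)
    )

# Hypothetical Cobb angles in degrees; the last case is a deliberate outlier.
predicted_angles = [12.0, 25.0, 40.0, 8.0]
radiographic_angles = [10.0, 27.0, 40.0, 18.0]

print(mean_absolute_error(predicted_angles, radiographic_angles))     # 3.5
print(root_mean_square_error(predicted_angles, radiographic_angles))  # sqrt(27)
```

In this toy case one 10° miss lifts RMSE well above MAE, the same pattern as the 2.9° versus 6.9° figures above.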

Tags: Mixed Modality, Classification, Musculoskeletal, Methodology, In Silico, Academic Lab

Interpretable Semi-federated Learning for Multimodal Cardiac Imaging and Risk Stratification: A Privacy-Preserving Framework.

Liu X, Li S, Zhu Q, Xu S, Jin Q

•papers•Sep 5 2025

The growing heterogeneity of cardiac patient data from hospitals and wearables calls for predictive models that are personalized, interpretable, and privacy-preserving. This study introduces PerFed-Cardio, a lightweight and interpretable semi-federated learning (Semi-FL) system for real-time cardiovascular risk stratification using multimodal data, including cardiac imaging, physiological signals, and electronic health records (EHR). In contrast to conventional federated learning, where all clients participate uniformly, our method uses a personalized Semi-FL approach in which high-capacity nodes (e.g., hospitals) perform full training, while edge devices (e.g., wearables) refine shared models via modality-specific subnetworks. Cardiac MRI and echocardiography images are analyzed with lightweight convolutional neural networks enhanced with local attention modules to highlight diagnostically significant regions. Physiological signals (e.g., ECG, activity) and EHR data are combined through attention-based fusion layers. Model transparency is achieved using Local Interpretable Model-agnostic Explanations (LIME) and Grad-CAM, which provide spatial and feature-level explanations for each prediction. Evaluations on real multimodal datasets from 123 patients across five simulated institutions show that PerFed-Cardio attains an AUC-ROC of 0.972 with an inference latency of 130 ms. The customized model calibration and targeted training reduce communication load by 28% while maintaining an F1-score above 92% under noisy conditions. These findings highlight PerFed-Cardio as a privacy-conscious, adaptive, and interpretable system for scalable cardiac risk assessment.
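As background for the federated setting, here is a sketch of the conventional aggregation step (FedAvg), in which a server averages client parameters weighted by local dataset size. It shows the uniform baseline that the abstract contrasts against, not PerFed-Cardio's semi-federated scheme; client counts and weights are invented.

```python
def federated_average(client_weights, client_sizes):
    """Average flat parameter vectors, weighting each client by its sample count."""
    if len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(n_params)
    ]

# Two hypothetical clients: a wearable with 1 sample and a hospital with 3.
client_weights = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]
print(federated_average(client_weights, client_sizes))  # [2.5, 3.5]
```

Only parameters leave each client in this scheme, never raw patient data, which is the privacy property federated approaches rely on.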

Tags: Mixed Modality, Classification, Cardiac, Methodology, In Silico, Ethics, GenAI
