Brain Stroke Detection and Classification Using CT Imaging with Transformer Models and Explainable AI

Shomukh Qari, Maha A. Thafar

arXiv preprint · Jul 13, 2025
Stroke is one of the leading causes of death globally, making early and accurate diagnosis essential for improving patient outcomes, particularly in emergency settings where timely intervention is critical. CT scans are the key imaging modality because of their speed, accessibility, and cost-effectiveness. This study proposed an artificial intelligence framework for multiclass stroke classification (ischemic, hemorrhagic, and no stroke) using CT scan images from a dataset provided by the Republic of Turkey's Ministry of Health. The proposed method adopted MaxViT, a state-of-the-art Vision Transformer, as the primary deep learning model for image-based stroke classification, with additional architectures evaluated for comparison (Vision Transformer, Transformer-in-Transformer, and ConvNeXt). To enhance model generalization and address class imbalance, we applied data augmentation techniques, including synthetic image generation. The MaxViT model trained with augmentation achieved the best performance, reaching an accuracy and F1-score of 98.00% and outperforming all other evaluated models and the baseline methods. The primary goal of this study was to distinguish between stroke types with high accuracy while addressing crucial issues of transparency and trust in artificial intelligence models. To achieve this, Explainable Artificial Intelligence (XAI), specifically Grad-CAM++, was integrated into the framework. Grad-CAM++ provides visual explanations of the model's decisions by highlighting stroke-relevant regions in the CT scans, establishing an accurate, interpretable, and clinically applicable solution for early stroke detection. This research contributes to the development of a trustworthy AI-assisted diagnostic tool for stroke, facilitating its integration into clinical practice and improving access to timely and optimal stroke diagnosis in emergency departments, thereby saving more lives.
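
As a rough illustration of the Grad-CAM++ step described above, here is a minimal PyTorch sketch that computes a Grad-CAM++-style heatmap from a convolutional feature map. The backbone, target layer, and input are placeholders (a torchvision ResNet-18 standing in for MaxViT); this is not the authors' implementation.

```python
# Minimal Grad-CAM++-style heatmap sketch (illustrative only; the model
# and target layer are stand-ins, not the paper's MaxViT setup).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="DEFAULT").eval()  # stand-in for MaxViT
target_layer = model.layer4                        # assumed target layer

acts, grads = {}, {}

def fwd_hook(module, inputs, output):
    acts["v"] = output
    output.register_hook(lambda g: grads.update(v=g))  # capture gradients

target_layer.register_forward_hook(fwd_hook)

def grad_cam_pp(x: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return an [H, W] heatmap in [0, 1] for a single input image."""
    score = model(x)[0, class_idx]
    model.zero_grad()
    score.backward()
    A, dA = acts["v"], grads["v"]
    # Grad-CAM++ pixel weights: alpha = dA^2 / (2*dA^2 + sum_hw(A * dA^3))
    den = 2 * dA.pow(2) + (A * dA.pow(3)).sum(dim=(2, 3), keepdim=True)
    alpha = dA.pow(2) / (den + 1e-8)
    w = (alpha * F.relu(dA)).sum(dim=(2, 3), keepdim=True)  # channel weights
    cam = F.relu((w * A).sum(dim=1, keepdim=True))          # weighted sum
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear")
    return ((cam - cam.min()) / (cam.max() - cam.min() + 1e-8))[0, 0]

heatmap = grad_cam_pp(torch.randn(1, 3, 224, 224), class_idx=1)
```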

Early breast cancer detection via infrared thermography using a CNN enhanced with particle swarm optimization.

Alzahrani RM, Sikkandar MY, Begum SS, Babetat AFS, Alhashim M, Alduraywish A, Prakash NB, Ng EYK

PubMed · Jul 13, 2025
Breast cancer remains the most prevalent cause of cancer-related mortality among women worldwide, with an estimated incidence exceeding 500,000 new cases annually. Timely diagnosis is vital for enhancing therapeutic outcomes and increasing survival probabilities. Although conventional diagnostic tools such as mammography are widely used and generally effective, they are often invasive, costly, and exhibit reduced efficacy in patients with dense breast tissue. Infrared thermography, by contrast, offers a non-invasive and economical alternative; however, its clinical adoption has been limited, largely due to difficulties in accurate thermal image interpretation and the suboptimal tuning of machine learning algorithms. To overcome these limitations, this study proposes an automated classification framework that employs convolutional neural networks (CNNs) for distinguishing between malignant and benign thermographic breast images. An Enhanced Particle Swarm Optimization (EPSO) algorithm is integrated to automatically fine-tune CNN hyperparameters, thereby minimizing manual effort and enhancing computational efficiency. The methodology also incorporates advanced image preprocessing techniques, including Mamdani fuzzy logic-based edge detection, Contrast-Limited Adaptive Histogram Equalization (CLAHE) for contrast enhancement, and median filtering for noise suppression, to bolster classification performance. The proposed model achieves a superior classification accuracy of 98.8%, significantly outperforming conventional CNN implementations in terms of both computational speed and predictive accuracy. These findings suggest that the developed system holds substantial potential for early, reliable, and cost-effective breast cancer screening in real-world clinical environments.
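
For readers unfamiliar with the optimization loop, the sketch below shows plain particle swarm optimization tuning two hypothetical CNN hyperparameters; the paper's Enhanced PSO variant and its real training-based fitness function are replaced here by a stubbed objective.

```python
# Plain PSO loop for CNN hyperparameter tuning. Assumptions: standard PSO
# (not the paper's "Enhanced" variant) and a synthetic fitness function
# standing in for actual model training and validation.
import numpy as np

rng = np.random.default_rng(0)

def fitness(params: np.ndarray) -> float:
    """Placeholder: would train a CNN with (log10_lr, dropout) and return
    validation accuracy. Here: a synthetic bowl-shaped objective."""
    log_lr, dropout = params
    return -((log_lr + 3.0) ** 2 + (dropout - 0.3) ** 2)

n_particles, n_iters, dim = 12, 30, 2
lo, hi = np.array([-5.0, 0.0]), np.array([-1.0, 0.6])  # search bounds
pos = rng.uniform(lo, hi, (n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients
for _ in range(n_iters):
    r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

print("best (log10_lr, dropout):", gbest)
```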

Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models

Anita Kriz, Elizabeth Laura Janes, Xing Shen, Tal Arbel

arXiv preprint · Jul 12, 2025
Multimodal large language models (MLLMs) hold considerable promise for applications in healthcare. However, their deployment in safety-critical settings is hindered by two key limitations: (i) sensitivity to prompt design, and (ii) a tendency to generate incorrect responses with high confidence. As clinicians may rely on a model's stated confidence to gauge the reliability of its predictions, it is especially important that when a model expresses high confidence, it is also highly accurate. We introduce Prompt4Trust, the first reinforcement learning (RL) framework for prompt augmentation targeting confidence calibration in MLLMs. A lightweight LLM is trained to produce context-aware auxiliary prompts that guide a downstream task MLLM to generate responses in which the expressed confidence more accurately reflects predictive accuracy. Unlike conventional calibration techniques, Prompt4Trust specifically prioritizes aspects of calibration most critical for safe and trustworthy clinical decision-making. Beyond improvements driven by this clinically motivated calibration objective, our proposed method also improves task accuracy, achieving state-of-the-art medical visual question answering (VQA) performance on the PMC-VQA benchmark, which is composed of multiple-choice questions spanning diverse medical imaging modalities. Moreover, our framework trained with a small downstream task MLLM showed promising zero-shot generalization to larger MLLMs in our experiments, suggesting the potential for scalable calibration without the associated computational costs. This work demonstrates the potential of automated yet human-aligned prompt engineering for improving the trustworthiness of MLLMs in safety-critical settings. Our codebase can be found at https://github.com/xingbpshen/prompt4trust.
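
Calibration in this setting is usually quantified with the expected calibration error (ECE), which bins predictions by stated confidence and compares per-bin accuracy against mean confidence. Here is a generic sketch of that metric, not code from the Prompt4Trust release.

```python
# Generic expected calibration error (ECE): bin predictions by stated
# confidence, then sum occupancy-weighted |accuracy - confidence| gaps.
# Illustrative only; not taken from the Prompt4Trust codebase.
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap  # weight gap by bin occupancy
    return ece

# Toy usage: an overconfident model yields a large ECE.
print(expected_calibration_error([0.9, 0.95, 0.9, 0.85], [1, 0, 0, 1]))
```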

Efficient needle guidance: multi-camera augmented reality navigation without patient-specific calibration.

Wei Y, Huang B, Zhao B, Lin Z, Zhou SZ

PubMed · Jul 12, 2025
Augmented reality (AR) technology holds significant promise for enhancing surgical navigation in needle-based procedures such as biopsies and ablations. However, most existing AR systems rely on patient-specific markers, which disrupt clinical workflows and require time-consuming preoperative calibrations, thereby hindering operational efficiency and precision. We developed a novel multi-camera AR navigation system that eliminates the need for patient-specific markers by utilizing ceiling-mounted markers mapped to fixed medical imaging devices. A hierarchical optimization framework integrates both marker mapping and multi-camera calibration. Deep learning techniques are employed to enhance marker detection and registration accuracy. Additionally, a vision-based pose compensation method is implemented to mitigate errors caused by patient movement, improving overall positional accuracy. Validation through phantom experiments and simulated clinical scenarios demonstrated an average puncture accuracy of 3.72 ± 1.21 mm. The system reduced needle placement time by 20 s compared to traditional marker-based methods. It also effectively corrected errors induced by patient movement, with a mean positional error of 0.38 pixels and an angular deviation of 0.51°. These results highlight the system's precision, adaptability, and reliability in realistic surgical conditions. This marker-free AR guidance system significantly streamlines surgical workflows while enhancing needle navigation accuracy. Its simplicity, cost-effectiveness, and adaptability make it an ideal solution for both high- and low-resource clinical environments, offering the potential for improved precision, reduced procedural time, and better patient outcomes.
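
The geometric core of such a system is chaining rigid transforms from the fixed ceiling marker, through the camera, into the imaging-device frame. Below is a minimal numpy sketch with invented poses; these are not the authors' calibration values.

```python
# Sketch of transform chaining for marker-based AR navigation: 4x4
# homogeneous poses composed from a ceiling marker to the camera to the
# imaging device. All frame names and poses are hypothetical.
import numpy as np

def pose(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def invert(T: np.ndarray) -> np.ndarray:
    R, t = T[:3, :3], T[:3, 3]
    return pose(R.T, -R.T @ t)

# Hypothetical fixed calibration: ceiling marker in the CT-scanner frame.
T_ct_marker = pose(np.eye(3), np.array([0.0, 1.2, 2.5]))
# Per-frame estimate: the marker as seen by the tracking camera.
T_cam_marker = pose(np.eye(3), np.array([0.1, -0.3, 1.8]))

# Camera pose in the CT frame: T_ct_cam = T_ct_marker @ inv(T_cam_marker).
T_ct_cam = T_ct_marker @ invert(T_cam_marker)

# Map a needle target planned in the CT frame into the camera frame
# for overlay rendering.
target_ct = np.array([0.05, 0.90, 2.40, 1.0])
target_cam = invert(T_ct_cam) @ target_ct
print(target_cam[:3])
```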

PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution

Sanyam Jain, Bruna Neves de Freitas, Andreas Basse-O'Connor, Alexandros Iosifidis, Ruben Pauwels

arXiv preprint · Jul 12, 2025
There has been increasing interest in the generation of high-quality, realistic synthetic medical images in recent years. Such synthetic datasets can mitigate the scarcity of public datasets for artificial intelligence research, and can also be used for educational purposes. In this paper, we propose a combination of diffusion-based generation (PanoDiff) and Super-Resolution (SR) for generating synthetic dental panoramic radiographs (PRs). The former generates a low-resolution (LR) seed of a PR (256 × 128) which is then processed by the SR model to yield a high-resolution (HR) PR of size 1024 × 512. For SR, we propose a state-of-the-art transformer that learns local-global relationships, resulting in sharper edges and textures. Experimental results demonstrate a Fréchet inception distance score of 40.69 between 7243 real and synthetic images (in HR). Inception scores were 2.55, 2.30, 2.90 and 2.98 for real HR, synthetic HR, real LR and synthetic LR images, respectively. Among a diverse group of six clinical experts, all evaluating a mixture of 100 synthetic and 100 real PRs in a time-limited observation, the average accuracy in distinguishing real from synthetic images was 68.5% (with 50% corresponding to random guessing).
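
The Fréchet inception distance reported above has a standard closed form over Inception feature statistics: FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1 S2)^(1/2)). A generic sketch with random stand-in features follows; this is not the paper's evaluation code.

```python
# Generic Frechet inception distance (FID) between two feature sets.
# Illustrative sketch with random features, not the paper's pipeline.
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # clip tiny numerical imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2 * covmean))

# Toy usage with random stand-in "Inception" features of dimension 64.
rng = np.random.default_rng(0)
print(fid(rng.normal(0, 1, (500, 64)), rng.normal(0.1, 1, (500, 64))))
```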

Characterizing aging-related genetic and physiological determinants of spinal curvature.

Wang FM, Ruby JG, Sethi A, Veras MA, Telis N, Melamud E

PubMed · Jul 12, 2025
Increased spinal curvature is one of the most recognizable aging traits in the human population. However, despite high prevalence, the etiology of this condition remains poorly understood. To gain better insight into the physiological, biochemical, and genetic risk factors involved, we developed a novel machine learning method to automatically derive thoracic kyphosis and lumbar lordosis angles from dual-energy X-ray absorptiometry (DXA) scans in the UK Biobank Imaging cohort. We carry out genome-wide association and epidemiological association studies to identify genetic and physiological risk factors for both traits. In 41,212 participants, we find that on average males and females gain 2.42° in kyphotic and 1.48° in lordotic angle per decade of life. Increased spinal curvature shows a strong association with decreased muscle mass and bone mineral density. Adiposity demonstrates opposing associations, with decreased kyphosis and increased lordosis. Using Mendelian randomization, we show that genes fundamental to the maintenance of musculoskeletal function (COL11A1, PTHLH, ETFA, TWIST1) and cellular homeostasis such as RNA transcription and DNA repair (RAD9A, MMS22L, HIF1A, RAB28) are likely involved in increased spinal curvature. Our findings reveal a complex interplay between genetics, musculoskeletal health, and age-related changes in spinal curvature, suggesting potential drivers of this universal aging trait.
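
A common way to derive such curvature angles automatically is to fit a smooth curve to spine landmarks and measure the angle between tangents at the segment boundaries. The sketch below illustrates that general idea with synthetic landmarks; it is not the authors' DXA pipeline.

```python
# Illustrative angle-from-centerline computation: fit a polynomial to
# synthetic spine landmark points, then measure the angle between the
# tangents at the top and bottom of the segment. A sketch of the general
# idea only, not the paper's machine learning method.
import numpy as np

# Hypothetical (x, y) centroids of thoracic vertebrae on a lateral scan.
y = np.linspace(0, 30, 12)  # cranio-caudal axis, cm
x = 0.02 * (y - 15) ** 2 + 0.1 * np.random.default_rng(1).normal(size=12)

coeffs = np.polyfit(y, x, deg=3)                        # smooth centerline
slope = np.polyval(np.polyder(coeffs), [y[0], y[-1]])   # endpoint tangents

# Kyphotic-style angle = angle between the two tangent directions.
theta = np.abs(np.arctan(slope[1]) - np.arctan(slope[0]))
print(f"curvature angle: {np.degrees(theta):.1f} degrees")
```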

Accurate and real-time brain tumour detection and classification using optimized YOLOv5 architecture.

Saranya M, Praveena R

PubMed · Jul 12, 2025
Brain tumours originate in the brain or its surrounding structures, such as the pituitary and pineal glands, and can be benign or malignant. While benign tumours may grow into neighbouring tissues, metastatic tumours occur when cancer from other organs spreads to the brain. Identification and staging of such tumours are critical, because nearly every aspect of managing a patient's disease depends on an accurate diagnosis and correct staging of the tumour. Image segmentation is highly valuable in medical imaging because it makes it possible to simulate surgical operations and to support disease diagnosis and anatomical and pathological analysis. To predict and classify brain tumours in MRI, this study proposes a combined classification and localization framework connecting a Fully Convolutional Neural Network (FCNN) and You Only Look Once version 5 (YOLOv5). The FCNN model is designed to classify images into four categories: benign (no tumour), glial, pituitary adenoma-related, and meningeal. It utilizes a derivative of Root Mean Square Propagation (RMSProp) optimization to boost the classification rate, and performance was evaluated with the standard measures of precision, recall, F1-score, specificity, and accuracy. The YOLOv5 architecture is then incorporated for more accurate detection of tumours, with the FCNN subsequently used to create segmentation masks of the tumours. The analysis shows that the suggested approach is more accurate than existing systems, achieving 98.80% average accuracy in the identification and categorization of brain tumours. This integration of detection and segmentation models offers an effective technique for enhancing the diagnostic performance of the system, adding value within the medical imaging field. On the basis of these findings, advances in deep learning architectures could improve tumour diagnosis while contributing to the fine-tuning of clinical management.
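
To make the detection-plus-classification chaining concrete, here is a hedged sketch that runs a pretrained YOLOv5 detector from torch.hub and hands each detected box to a stubbed classifier. The hub model, the image path, and the classify stub are illustrative assumptions, not the study's trained FCNN/YOLOv5 weights.

```python
# Sketch of chaining a classifier with YOLOv5 detection, in the spirit of
# the pipeline described above. The generic COCO-pretrained hub model and
# the classifier stub are assumptions for illustration only.
import torch

detector = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def classify(crop: torch.Tensor) -> str:
    """Placeholder for the FCNN tumour-type classifier."""
    return "glial"  # stub

img_path = "mri_slice.jpg"        # hypothetical input image
results = detector(img_path)      # run detection
for *xyxy, conf, cls in results.xyxy[0].tolist():
    x1, y1, x2, y2 = map(int, xyxy)
    crop = torch.zeros(3, max(y2 - y1, 1), max(x2 - x1, 1))  # stand-in crop
    print(f"box=({x1},{y1},{x2},{y2}) conf={conf:.2f} type={classify(crop)}")
```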

Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift

Behraj Khan, Tahir Syed

arXiv preprint · Jul 12, 2025
Foundation models like CLIP and SAM have transformed computer vision and medical imaging via low-shot transfer learning. However, deployment of these models is hindered by two key challenges: distribution shift between training and test data, and confidence misalignment that leads to overconfident incorrect predictions. These issues manifest differently in vision-language classification and medical segmentation tasks, yet existing solutions remain domain-specific. We propose StaRFM, a unified framework addressing both challenges. It introduces a Fisher information penalty (FIP), extended to 3D medical data via patch-wise regularization, to reduce covariate shift in CLIP and SAM embeddings. Additionally, a confidence misalignment penalty (CMP), reformulated for voxel-level predictions, calibrates uncertainty in segmentation tasks. We theoretically derive PAC-Bayes bounds showing that FIP controls generalization via the Fisher-Rao norm, while CMP minimizes calibration error through Brier score optimization. StaRFM shows consistent improvements, such as +3.5% accuracy and 28% lower ECE on 19 vision datasets (e.g., ImageNet, Office-Home), 84.7% DSC and 4.8 mm HD95 in medical segmentation (e.g., BraTS, ATLAS), and a 40% lower cross-domain performance gap compared to prior benchmark methods. The framework is plug-and-play, requiring minimal architectural changes for seamless integration with foundation models. Code and models will be released at https://anonymous.4open.science/r/StaRFM-C0CD/README.md
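
The Brier-score side of the calibration objective has a simple generic form. The sketch below combines cross-entropy, a Brier term, and a squared-gradient-norm penalty that we use as a crude stand-in for the Fisher information penalty; it is our own simplification, not the StaRFM formulation.

```python
# Calibration-aware loss sketch: cross-entropy + Brier term + a squared
# input-gradient-norm penalty standing in (by assumption) for an
# empirical-Fisher-style penalty. Not the StaRFM implementation.
import torch
import torch.nn.functional as F

def calibrated_loss(model, x, y, lam_fip=0.01, lam_brier=0.1):
    x = x.requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(y, probs.shape[1]).float()
    brier = ((probs - onehot) ** 2).sum(dim=1).mean()  # calibration term
    # Squared gradient norm of the loss w.r.t. the input: a crude
    # stand-in for a Fisher information penalty on embeddings.
    (g,) = torch.autograd.grad(ce, x, create_graph=True)
    fip = (g ** 2).sum(dim=tuple(range(1, g.ndim))).mean()
    return ce + lam_brier * brier + lam_fip * fip

# Toy usage with a linear "model" on random data.
model = torch.nn.Linear(16, 3)
loss = calibrated_loss(model, torch.randn(8, 16), torch.randint(0, 3, (8,)))
loss.backward()
```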

Establishing an AI-based diagnostic framework for pulmonary nodules in computed tomography.

Jia R, Liu B, Ali M

PubMed · Jul 12, 2025
Pulmonary nodules seen on computed tomography (CT) can be benign or malignant, and early detection is important for optimal management. Existing manual methods of identifying nodules have limitations, such as being time-consuming and error-prone. This study aims to develop an Artificial Intelligence (AI) diagnostic scheme that improves the performance of identifying and categorizing pulmonary nodules using CT scans. The proposed deep learning framework used convolutional neural networks, and the image database totaled 1,056 3D-DICOM CT images. The framework comprised preprocessing, lung segmentation, nodule detection, and classification. Nodule detection was done using the Retina-UNet model, while the extracted features were classified using a Support Vector Machine (SVM). Performance measures, including accuracy, sensitivity, specificity, and the AUROC, were used to evaluate the model during training and validation. Overall, the developed AI model achieved an AUROC of 0.9058. The diagnostic accuracy was 90.58%, with an overall positive predictive value of 89% and an overall negative predictive value of 86%. The pipeline handled the CT images effectively at the preprocessing stage, and the deep learning model performed well in detecting and classifying nodules. The new AI-based diagnostic framework increased diagnostic accuracy compared with the traditional approach. It also provides high reliability for detecting pulmonary nodules and classifying the lesions, thus minimizing intra-observer differences and improving clinical outcomes. Future advancements may include increasing the size of the annotated dataset and fine-tuning the model to address detection issues with non-solitary nodules.
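
The final SVM stage over extracted nodule features can be sketched with scikit-learn. The synthetic features below stand in for the detector's outputs; this is not the study's trained model or data.

```python
# Sketch of the SVM classification stage on extracted nodule features.
# Synthetic features and labels are used purely for illustration.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 32))  # stand-in nodule feature vectors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```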

Accuracy of large language models in generating differential diagnosis from clinical presentation and imaging findings in pediatric cases.

Jung J, Phillipi M, Tran B, Chen K, Chan N, Ho E, Sun S, Houshyar R

PubMed · Jul 12, 2025
Large language models (LLMs) have shown promise in assisting medical decision-making. However, there is limited literature exploring the diagnostic accuracy of LLMs in generating differential diagnoses from text-based image descriptions and clinical presentations in pediatric radiology. This study examined the performance of multiple proprietary LLMs in producing accurate differential diagnoses for text-based pediatric radiological cases without imaging. One hundred sixty-four cases were retrospectively selected from a pediatric radiology textbook and converted into two formats: (1) image description only, and (2) image description with clinical presentation. The ChatGPT-4V, Claude 3.5 Sonnet, and Gemini 1.5 Pro algorithms were given these inputs and tasked with providing a top 1 diagnosis and a top 3 differential diagnosis. Accuracy of responses was assessed by comparison with the original literature. Top 1 accuracy was defined as whether the top 1 diagnosis matched the textbook, and top 3 differential accuracy was defined as the number of diagnoses in the model-generated top 3 differential that matched any of the top 3 diagnoses in the textbook. McNemar's test, Cochran's Q test, the Friedman test, and the Wilcoxon signed-rank test were used to compare algorithms and to assess the impact of added clinical information. There was no significant difference in top 1 accuracy between ChatGPT-4V, Claude 3.5 Sonnet, and Gemini 1.5 Pro when only image descriptions were provided (56.1% [95% CI 48.4-63.5], 64.6% [95% CI 57.1-71.5], 61.6% [95% CI 54.0-68.7]; P = 0.11). Adding clinical presentation to image description significantly improved top 1 accuracy for ChatGPT-4V (64.0% [95% CI 56.4-71.0], P = 0.02) and Claude 3.5 Sonnet (80.5% [95% CI 73.8-85.8], P < 0.001). For image description and clinical presentation cases, Claude 3.5 Sonnet significantly outperformed both ChatGPT-4V and Gemini 1.5 Pro (P < 0.001). For top 3 differential accuracy, no significant differences were observed between ChatGPT-4V, Claude 3.5 Sonnet, and Gemini 1.5 Pro, regardless of whether the cases included only image descriptions (1.29 [95% CI 1.16-1.41], 1.35 [95% CI 1.23-1.48], 1.37 [95% CI 1.25-1.49]; P = 0.60) or both image descriptions and clinical presentations (1.33 [95% CI 1.20-1.45], 1.52 [95% CI 1.41-1.64], 1.48 [95% CI 1.36-1.59]; P = 0.72). Only Claude 3.5 Sonnet performed significantly better when clinical presentation was added (P < 0.001). Commercial LLMs performed similarly on pediatric radiology cases in providing top 1 accuracy and top 3 differential accuracy when only a text-based image description was used. Adding clinical presentation significantly improved top 1 accuracy for ChatGPT-4V and Claude 3.5 Sonnet, with Claude showing the largest improvement. Claude 3.5 Sonnet outperformed both ChatGPT-4V and Gemini 1.5 Pro in top 1 accuracy when both image and clinical data were provided. No significant differences were found in top 3 differential accuracy across models in any condition.
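
A generic scoring routine for this kind of top 1 / top 3 evaluation might look like the following; the case structure and string-matching rule are our own simplifications of the study's textbook-comparison protocol.

```python
# Generic top-1 / top-3 differential scoring in the style of the study's
# evaluation. Matching by normalized string equality is a simplifying
# assumption, not the authors' adjudication method.
def score_case(model_top3: list[str], truth_top3: list[str]) -> tuple[int, int]:
    norm = lambda s: s.strip().lower()
    m = [norm(d) for d in model_top3]
    t = {norm(d) for d in truth_top3}
    top1 = int(m[0] == norm(truth_top3[0]))
    top3_matches = sum(d in t for d in m[:3])  # 0..3 overlapping diagnoses
    return top1, top3_matches

# Hypothetical example case (invented diagnoses for illustration).
cases = [
    (["intussusception", "volvulus", "appendicitis"],
     ["intussusception", "Meckel diverticulum", "volvulus"]),
]
top1s, top3s = zip(*(score_case(m, t) for m, t in cases))
print(f"top-1 accuracy: {sum(top1s)/len(cases):.2f}, "
      f"mean top-3 matches: {sum(top3s)/len(cases):.2f}")
```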