
LUNETR: Language-Infused UNETR for precise pancreatic tumor segmentation in 3D medical images.

Shi Z, Zhang R, Wei X, Yu C, Xie H, Hu Z, Chen X, Zhang Y, Xie B, Luo Z, Peng W, Xie X, Li F, Long X, Li L, Hu L

PubMed · Jul 1 2025
The identification of early micro-lesions and adjacent blood vessels in CT scans plays a pivotal role in the clinical diagnosis of pancreatic cancer, considering its aggressive nature and high fatality rate. Despite the widespread application of deep learning methods for this task, several challenges persist: (1) the complex background environment in abdominal CT scans complicates the accurate localization of potential micro-tumors; (2) the subtle contrast between micro-lesions within pancreatic tissue and the surrounding tissues makes it challenging for models to capture these features accurately; and (3) tumors that invade adjacent blood vessels pose significant barriers to surgical procedures. To address these challenges, we propose LUNETR (Language-Infused UNETR), an advanced multimodal encoder model that combines textual and image information for precise medical image segmentation. The integration of an autoencoding language model with cross-attention enables our model to effectively leverage semantic associations between textual and image data, thereby facilitating precise localization of potential pancreatic micro-tumors. Additionally, we designed a Multi-scale Aggregation Attention (MSAA) module to comprehensively capture both spatial and channel characteristics of global multi-scale image data, enhancing the model's capacity to extract features from micro-lesions embedded within pancreatic tissue. Furthermore, to facilitate precise segmentation of pancreatic tumors and nearby blood vessels and to address the scarcity of multimodal medical datasets, we collaborated with Zhuzhou Central Hospital to construct a multimodal dataset comprising CT images and corresponding pathology reports from 135 pancreatic cancer patients. Our experiments show that LUNETR surpasses current state-of-the-art models, with the incorporation of the semantic encoder improving the average Dice score for pancreatic tumor segmentation by 2.23%. On the Medical Segmentation Decathlon (MSD) liver and lung cancer datasets, our model achieved average Dice score improvements of 4.31% and 3.67%, respectively, demonstrating the efficacy of LUNETR.
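
As a rough illustration of the fusion step this abstract describes, the sketch below shows image tokens attending to report-text embeddings via cross-attention; the module layout, dimensions, and residual wiring are placeholder assumptions, not the authors' LUNETR implementation.

```python
# Illustrative PyTorch sketch of text-image cross-attention fusion
# (assumed module layout; not the authors' code).
import torch
import torch.nn as nn

class TextImageCrossAttention(nn.Module):
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_tokens, text_tokens):
        # Image tokens query the report embeddings (keys/values = text),
        # injecting semantic cues about suspected lesion locations.
        fused, _ = self.attn(query=image_tokens, key=text_tokens, value=text_tokens)
        return self.norm(image_tokens + fused)  # residual + layer norm

fusion = TextImageCrossAttention()
img = torch.randn(2, 512, 256)  # (batch, image patches, dim)
txt = torch.randn(2, 64, 256)   # (batch, report tokens, dim)
out = fusion(img, txt)          # -> (2, 512, 256)
```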

Liver lesion segmentation in ultrasound: A benchmark and a baseline network.

Li J, Zhu L, Shen G, Zhao B, Hu Y, Zhang H, Wang W, Wang Q

PubMed · Jul 1 2025
Accurate liver lesion segmentation in ultrasound is a challenging task due to high speckle noise, ambiguous lesion boundaries, and inhomogeneous intensity distribution inside the lesion regions. We first collected and annotated a benchmark dataset for liver lesion segmentation in ultrasound. We then propose a novel convolutional neural network that learns dual self-attentive transformer features to boost liver lesion segmentation by leveraging the complementary information among non-local features encoded at different layers of the transformer architecture. To do so, we devise a dual self-attention refinement (DSR) module that synergistically utilizes self-attention and reverse self-attention mechanisms to extract complementary lesion characteristics between cascaded multi-layer feature maps, helping the model produce more accurate segmentation results. Moreover, we propose a False-Positive-Negative loss that enables our network to suppress non-liver-lesion noise at shallow transformer layers and to enhance target liver lesion details in the CNN features at deep transformer layers. Experimental results show that our network outperforms state-of-the-art methods both quantitatively and qualitatively.
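
A minimal sketch of the reverse-attention idea mentioned here: a coarse lesion map weights the features, and its complement re-examines ambiguous boundary regions. The 1x1 prediction head and the mixing weight are illustrative assumptions, not the paper's DSR module.

```python
# Toy dual-attention refinement: forward and reverse attention on one feature map.
import torch
import torch.nn as nn

class DualAttentionRefine(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pred = nn.Conv2d(channels, 1, kernel_size=1)  # coarse lesion-evidence map

    def forward(self, feat):
        attn = torch.sigmoid(self.pred(feat))  # self-attention: where the lesion likely is
        rev = 1.0 - attn                       # reverse attention: boundaries / background
        return feat * attn + 0.5 * feat * rev  # blend both cues (weight is illustrative)

refine = DualAttentionRefine(64)
out = refine(torch.randn(2, 64, 128, 128))     # -> (2, 64, 128, 128)
```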

A novel deep learning framework for retinal disease detection leveraging contextual and local features cues from retinal images.

Khan SD, Basalamah S, Lbath A

PubMed · Jul 1 2025
Retinal diseases are a serious global threat to human vision, and early identification is essential for effective prevention and treatment. Current diagnostic methods, however, rely on manual analysis of fundus images, which depends heavily on the expertise of ophthalmologists; this process is time-consuming and labor-intensive and can lead to missed diagnoses. With advances in computer vision, several automated models have been proposed to improve diagnostic accuracy for retinal diseases and medical imaging in general. These methods nevertheless struggle to detect specific diseases accurately because of issues inherent to fundus images, including inter-class similarities, intra-class variations, limited local information, insufficient contextual understanding, and class imbalances within datasets. To address these challenges, we propose a novel deep learning framework for accurate retinal disease classification. The framework consists of three main modules. The first is a Densely Connected Multidilated Convolutional Neural Network (DCM-CNN) that extracts global contextual information by integrating novel Causal Dilated Dense Convolutional Blocks (CDDCBs). The second, a Local-Patch-based Convolutional Neural Network (LP-CNN), uses the Class Activation Map (CAM) obtained from DCM-CNN to extract local, fine-grained information. To identify the correct class and minimize errors, a synergic network takes the feature maps of both DCM-CNN and LP-CNN and combines them in a fully connected fashion. The framework is evaluated through a comprehensive set of quantitative and qualitative experiments on two publicly available benchmark datasets, RFMiD and ODIR-5K. Our results demonstrate the effectiveness of the proposed framework, which achieves higher performance than reference methods on both datasets.
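
To make the CAM-to-local-patch step concrete, here is a hypothetical sketch of cropping the most activated region of a fundus image for the patch-level network; the function name, patch size, and peak-picking heuristic are assumptions for illustration.

```python
# Hypothetical sketch: crop the most CAM-activated region of a fundus image
# as the input patch for the local network.
import numpy as np

def crop_patch_from_cam(image, cam, patch_size=224):
    """image: (H, W, 3) fundus image; cam: (H, W) class activation map."""
    cy, cx = np.unravel_index(np.argmax(cam), cam.shape)  # peak activation
    half = patch_size // 2
    y0 = int(np.clip(cy - half, 0, image.shape[0] - patch_size))
    x0 = int(np.clip(cx - half, 0, image.shape[1] - patch_size))
    return image[y0:y0 + patch_size, x0:x0 + patch_size]

patch = crop_patch_from_cam(np.zeros((512, 512, 3)), np.random.rand(512, 512))
```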

Multi-modal and Multi-view Cervical Spondylosis Imaging Dataset.

Yu QS, Shan JY, Ma J, Gao G, Tao BZ, Qiao GY, Zhang JN, Wang T, Zhao YF, Qin XL, Yin YH

PubMed · Jul 1 2025
Multi-modal and multi-view imaging is essential for the diagnosis and assessment of cervical spondylosis. Deep learning models are increasingly being developed to assist in diagnosis and assessment, which can improve clinical management and open new directions for clinical research. To support the development and testing of such models, we have publicly released a multi-modal and multi-view imaging dataset of cervical spondylosis, named MMCSD. The dataset comprises MRI and CT images from 250 patients: axial bone- and soft-tissue-window CT scans, sagittal T1-weighted and T2-weighted MRI, and axial T2-weighted MRI. Because neck pain is one of the most common symptoms of cervical spondylosis, we used MMCSD to develop a deep learning model for predicting postoperative neck pain in patients with cervical spondylosis, thereby validating the dataset's usability. We hope that MMCSD will advance neural network models for cervical spondylosis and neck pain, further improving clinical diagnostic assessment and treatment decision-making for these conditions.

Lessons learned from RadiologyNET foundation models for transfer learning in medical radiology.

Napravnik M, Hržić F, Urschler M, Miletić D, Štajduhar I

PubMed · Jul 1 2025
Deep learning models require large amounts of annotated data, which are hard to obtain in the medical field because the annotation process is laborious and depends on expert knowledge. This data scarcity hinders a model's ability to generalise to unseen data, and foundation models pretrained on large datasets have recently been proposed as a promising solution. RadiologyNET is a custom medical dataset comprising 1,902,414 medical images covering various body parts and acquisition modalities. We used RadiologyNET to pretrain several popular architectures (ResNet18, ResNet34, ResNet50, VGG16, EfficientNetB3, EfficientNetB4, InceptionV3, DenseNet121, MobileNetV3Small, and MobileNetV3Large). We compared ImageNet and RadiologyNET foundation models against training from randomly initialised weights on several publicly available medical datasets: (i) segmentation (LUng Nodule Analysis Challenge), (ii) regression (RSNA Pediatric Bone Age Challenge), (iii) binary classification (GRAZPEDWRI-DX and COVID-19 datasets), and (iv) multiclass classification (Brain Tumor MRI dataset). Our results indicate that RadiologyNET-pretrained models generally perform similarly to ImageNet models, with some advantages in resource-limited settings; however, ImageNet-pretrained models remained competitive when fine-tuned on sufficient data. We also tested the impact of modality diversity on model performance, with results varying across tasks, highlighting the importance of aligning pretraining data with downstream applications. Based on our findings, we provide guidelines for using foundation models in medical applications and publicly release our RadiologyNET-pretrained models to support further research and development in the field. The models are available at https://github.com/AIlab-RITEH/RadiologyNET-TL-models.
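
The fine-tuning recipe being compared is the standard one; a minimal sketch follows, using torchvision's ImageNet ResNet50 weights as a stand-in (the released RadiologyNET checkpoints would load into the same architectures). The 4-class head and freezing policy are illustrative assumptions.

```python
# Sketch of the standard transfer-learning setup: pretrained backbone,
# new task head, optional backbone freezing.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 4)  # e.g., 4-class Brain Tumor MRI task

# Optionally freeze the pretrained backbone and train only the new head first.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```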

Deep Learning-Based Semantic Segmentation for Real-Time Kidney Imaging and Measurements with Augmented Reality-Assisted Ultrasound

Gijs Luijten, Roberto Maria Scardigno, Lisle Faray de Paiva, Peter Hoyer, Jens Kleesiek, Domenico Buongiorno, Vitoantonio Bevilacqua, Jan Egger

arXiv preprint · Jun 30 2025
Ultrasound (US) is widely accessible and radiation-free but has a steep learning curve due to its dynamic nature and non-standard imaging planes. Additionally, the constant need to shift focus between the US screen and the patient poses a challenge. To address these issues, we integrate deep learning (DL)-based semantic segmentation for real-time (RT) automated kidney volumetric measurements, which are essential for clinical assessment but are traditionally time-consuming and prone to fatigue. This automation allows clinicians to concentrate on image interpretation rather than manual measurements. Complementing DL, augmented reality (AR) enhances the usability of US by projecting the display directly into the clinician's field of view, improving ergonomics and reducing the cognitive load associated with screen-to-patient transitions. Two AR-DL-assisted US pipelines on HoloLens-2 are proposed: one streams directly via the application programming interface for a wireless setup, while the other supports any US device with video output for broader accessibility. We evaluate RT feasibility and accuracy using the Open Kidney Dataset and open-source segmentation models (nnU-Net, Segmenter, YOLO with MedSAM and LiteMedSAM). Our open-source GitHub pipeline includes model implementations, measurement algorithms, and a Wi-Fi-based streaming solution, enhancing US training and diagnostics, especially in point-of-care settings.
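
As one concrete example of the automated volumetric measurement this pipeline targets, the sketch below applies the clinical ellipsoid formula V = π/6 · L · W · D to a 2D kidney mask; the authors' repository contains their actual measurement algorithms, so treat the function and its bounding-box axis estimates as illustrative assumptions.

```python
# Illustrative kidney-volume estimate from a 2D segmentation mask via the
# clinical ellipsoid formula (bounding-box axes are a simplification).
import numpy as np

def ellipsoid_volume_ml(mask, px_spacing_cm, depth_cm):
    """mask: binary (H, W) kidney mask; px_spacing_cm: (row, col) size in cm."""
    ys, xs = np.nonzero(mask)
    length_cm = (ys.max() - ys.min()) * px_spacing_cm[0]  # in-plane long axis
    width_cm = (xs.max() - xs.min()) * px_spacing_cm[1]   # in-plane short axis
    return np.pi / 6.0 * length_cm * width_cm * depth_cm  # cm^3 == mL
```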

μ²Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation

Siyou Li, Pengyao Qin, Huanan Wu, Dong Nie, Arun J. Thirunavukarasu, Juntao Yu, Le Zhang

arXiv preprint · Jun 30 2025
Automated radiology report generation (RRG) aims to produce detailed textual reports from clinical imaging, such as computed tomography (CT) scans, to improve the accuracy and efficiency of diagnosis and the provision of management advice. RRG is complicated by two key challenges: (1) the inherent complexity of extracting relevant information from imaging data under resource constraints, and (2) the difficulty of objectively evaluating discrepancies between model-generated and expert-written reports. To address these challenges, we propose μ²LLM, a multiscale multimodal large language model for RRG tasks. The novel μ²Tokenizer, as an intermediate layer, integrates multi-modal features from the multiscale visual tokenizer and the text tokenizer, then enhances report-generation quality through direct preference optimization (DPO), guided by GREEN-RedLlama. Experimental results on four large CT image-report medical datasets demonstrate that our method outperforms existing approaches, highlighting the potential of our fine-tuned μ²LLMs on limited data for RRG tasks. For prompt engineering, we additionally introduce a five-stage, LLM-driven pipeline that converts routine CT reports into paired visual question-answer triples and citation-linked reasoning narratives, creating a scalable, high-quality supervisory corpus for explainable multimodal radiology LLMs. All code, datasets, and models will be publicly available in our official repository: https://github.com/Siyou-Li/u2Tokenizer
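
For readers unfamiliar with the DPO step mentioned above, here is a minimal sketch of the generic DPO objective: the policy is pushed to rank the preferred (e.g., GREEN-scored) report above the rejected one relative to a frozen reference model. Function names and the β value are assumptions; this is the standard loss, not the authors' training code.

```python
# Generic DPO objective over paired (chosen, rejected) reports.
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Inputs: summed token log-probs of whole reports under policy/reference."""
    chosen_margin = logp_chosen - ref_logp_chosen        # policy gain on preferred report
    rejected_margin = logp_rejected - ref_logp_rejected  # policy gain on rejected report
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```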

ToolCAP: Novel Tools to improve management of paediatric Community-Acquired Pneumonia - a randomized controlled trial - Statistical Analysis Plan

Cicconi, S., Glass, T., Du Toit, J., Bresser, M., Dhalla, F., Faye, P. M., Lal, L., Langet, H., Manji, K., Moser, A., Ndao, M. A., Palmer, M., Tine, J. A. D., Van Hoving, N., Keitel, K.

medRxiv preprint · Jun 30 2025
The ToolCAP cohort study is a prospective, observational, multi-site platform study designed to collect harmonized, high-quality clinical, imaging, and biological data on children with IMCI-defined pneumonia in low- and middle-income countries (LMICs). The primary objective is to inform the development and validation of diagnostic and prognostic tools, including lung ultrasound (LUS), point-of-care biomarkers, and AI-based models, to improve pneumonia diagnosis, management, and antimicrobial stewardship. This statistical analysis plan (SAP) outlines the analytic strategy for describing the study population, assessing the performance of candidate diagnostic tools, and enabling data sharing in support of secondary research questions and AI model development. Children under 12 years presenting with suspected pneumonia are enrolled within 24 hours of presentation and undergo clinical assessment, digital auscultation, LUS, and optional biological sampling. Follow-up occurs on Day 8 and Day 29 to assess outcomes including recovery, treatment response, and complications. The SAP details variable definitions, data management strategies, and pre-specified analyses, including descriptive summaries, sensitivity and specificity of diagnostic tools against clinical reference standards, and exploratory subgroup analyses.
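
As a toy illustration of the pre-specified diagnostic-accuracy analyses, the sketch below computes sensitivity and specificity of a candidate tool against the clinical reference standard, with exact (Clopper-Pearson) confidence intervals; variable names and the interval choice are assumptions, since the SAP itself defines the exact estimands.

```python
# Toy diagnostic-accuracy analysis: sensitivity/specificity with exact
# Clopper-Pearson 95% intervals.
import numpy as np
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

def sens_spec(test_pos, ref_pos):
    """test_pos, ref_pos: boolean arrays (tool result, reference diagnosis)."""
    tp = int(np.sum(test_pos & ref_pos))
    fn = int(np.sum(~test_pos & ref_pos))
    tn = int(np.sum(~test_pos & ~ref_pos))
    fp = int(np.sum(test_pos & ~ref_pos))
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return (sens, clopper_pearson(tp, tp + fn)), (spec, clopper_pearson(tn, tn + fp))
```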

In-silico CT simulations of deep learning generated heterogeneous phantoms.

Salinas CS, Magudia K, Sangal A, Ren L, Segars PW

PubMed · Jun 30 2025
Current virtual imaging phantoms primarily emphasize the geometric accuracy of anatomical structures. However, to enhance realism, it is also important to incorporate intra-organ detail: because biological tissues are heterogeneous in composition, virtual phantoms should include realistic intra-organ texture and material variation. We propose training two 3D Double U-Net conditional generative adversarial networks (3D DUC-GAN) to generate sixteen unique textures that encompass organs found within the torso. The model was trained on 378 CT image-segmentation pairs taken from a publicly available dataset, with 18 additional pairs reserved for testing. Textured phantoms were generated and imaged using DukeSim, a virtual CT simulation platform. Results showed that the deep learning model was able to synthesize realistic heterogeneous phantoms from a set of homogeneous phantoms. Compared with the original CT scans, these phantoms had a mean absolute difference of 46.15 ± 1.06 HU. The structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR) were 0.86 ± 0.004 and 28.62 ± 0.14, respectively, and the maximum mean discrepancy between the generated and actual distributions was 0.0016. These metrics marked improvements of 27%, 5.9%, 6.2%, and 28%, respectively, over current homogeneous texture methods. Phantoms that underwent a virtual CT scan also bore a closer visual resemblance to the true CT scans than those produced by the previous method. The resulting heterogeneous phantoms are a significant step toward more realistic in-silico trials, enabling simulation of imaging procedures with greater fidelity to true anatomical variation.
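
A brief sketch of how the reported slice-level comparison metrics can be computed for a generated-versus-reference pair using scikit-image; the HU data range and function names are assumptions, not the authors' evaluation code.

```python
# Slice-level comparison metrics: mean absolute difference (HU), SSIM, PSNR.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def compare_slices(generated_hu, reference_hu, hu_range=2000.0):
    mad = np.mean(np.abs(generated_hu - reference_hu))  # mean absolute difference (HU)
    ssim = structural_similarity(reference_hu, generated_hu, data_range=hu_range)
    psnr = peak_signal_noise_ratio(reference_hu, generated_hu, data_range=hu_range)
    return mad, ssim, psnr
```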

Towards 3D Semantic Image Synthesis for Medical Imaging

Wenwu Tang, Khaled Seyam, Bin Yang

arXiv preprint · Jun 30 2025
In the medical domain, acquiring large datasets is challenging due to both accessibility issues and stringent privacy regulations; data availability and privacy protection are therefore major obstacles to applying machine learning in medical imaging. To address this, our study proposes Med-LSDM (Latent Semantic Diffusion Model), which operates directly in the 3D domain and leverages de-identified semantic maps to generate synthetic data as a method of privacy preservation and data augmentation. Unlike many existing methods that focus on generating 2D slices, Med-LSDM is designed specifically for 3D semantic image synthesis, making it well suited for applications requiring full volumetric data. Med-LSDM incorporates a guiding mechanism that controls the 3D image generation process by applying a diffusion model within the latent space of a pre-trained VQ-GAN. By operating in the compressed latent space, the model significantly reduces computational complexity while still preserving critical 3D spatial details. Our approach demonstrates strong performance in 3D semantic medical image synthesis, achieving a 3D-FID score of 0.0054 on the conditional Duke Breast dataset and Dice scores (0.70964) similar to those of real images (0.71496). These results demonstrate that the synthetic data from our model exhibit a small domain gap relative to real data and are useful for data augmentation.
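
A conceptual sketch of the mechanism Med-LSDM builds on, semantic-map-conditioned diffusion in a VQ-GAN latent space: `denoiser` and `vqgan` are placeholder networks, and the latent shape and step count are assumptions, not the paper's configuration.

```python
# Conceptual reverse-diffusion sampling loop in a compressed 3D latent space.
import torch

@torch.no_grad()
def sample_volume(denoiser, vqgan, semantic_map, steps=50):
    z = torch.randn(1, 4, 16, 16, 16)      # 3D latent, far smaller than voxel space
    cond = vqgan.encode(semantic_map)      # compress the de-identified label map
    for t in reversed(range(steps)):
        t_batch = torch.full((1,), t, dtype=torch.long)
        z = denoiser(z, t_batch, cond)     # one reverse-diffusion update (placeholder net)
    return vqgan.decode(z)                 # decode latent back to a 3D volume
```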