
Exploring the Design Space of 3D MLLMs for CT Report Generation

Mohammed Baharoon, Jun Ma, Congyu Fang, Augustin Toma, Bo Wang

arXiv preprint · Jun 26 2025
Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods that improve performance on the GREEN score by up to 10%, achieving 2nd place in the MICCAI 2024 AMOS-MM challenge. Our results on 1,687 cases from the AMOS-MM dataset show that RRG is largely independent of LLM size under the same training protocol. We also show that a larger volume size does not always improve performance if the original ViT was pre-trained on a smaller volume size. Lastly, we show that using a segmentation mask along with the CT volume improves performance. The code is publicly available at https://github.com/bowang-lab/AMOS-MM-Solution
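For readers unfamiliar with the "projector" component in this design space, here is a minimal sketch: a small MLP that maps 3D ViT patch tokens into the LLM's embedding space so they can be prepended to text embeddings. The dimensions and two-layer design are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical projector sketch for a 3D MLLM; dims are assumptions.
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    def __init__(self, vit_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vit_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (batch, num_patches, vit_dim) from a 3D ViT over a CT volume
        return self.proj(visual_tokens)  # (batch, num_patches, llm_dim)

tokens = torch.randn(1, 512, 768)   # e.g. patch tokens for one CT volume
prefix = MLPProjector()(tokens)     # ready to prepend to LLM text embeddings
print(prefix.shape)                 # torch.Size([1, 512, 4096])
```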

Computed tomography-derived quantitative imaging biomarkers enable the prediction of disease manifestations and survival in patients with systemic sclerosis.

Sieren MM, Grasshoff H, Riemekasten G, Berkel L, Nensa F, Hosch R, Barkhausen J, Kloeckner R, Wegner F

PubMed · Jun 25 2025
Systemic sclerosis (SSc) is a complex inflammatory vasculopathy with diverse symptoms and variable disease progression. Despite its known impact on body composition (BC), clinical decision-making has yet to incorporate these biomarkers. This study aims to extract quantitative BC imaging biomarkers from CT scans to assess disease severity, define BC phenotypes, track changes over time, and predict survival. CT exams were extracted from a prospectively maintained cohort of 452 SSc patients; 128 patients with at least one CT exam were included. An artificial intelligence-based 3D body composition analysis (BCA) algorithm assessed muscle volume, different adipose tissue compartments, and bone mineral density. These parameters were analysed with regard to various clinical, laboratory, and functional parameters, as well as survival. Phenotypes were identified by performing K-means cluster analysis. Longitudinal evaluation of BCA changes employed regression analyses. A regression model using BCA parameters outperformed models based on Body Mass Index and clinical parameters in predicting survival (area under the curve [AUC] = 0.75). Longitudinal development of the cardiac marker enabled prediction of survival with an AUC of 0.82. Patients with altered BCA parameters had increased odds ratios for various complications, including interstitial lung disease (p<0.05). Two distinct BCA phenotypes were identified, showing significant differences in gastrointestinal disease manifestations (p<0.01). This study highlights several parameters with the potential to reshape clinical pathways for SSc patients. Quantitative BCA biomarkers offer a means to predict survival and individual disease manifestations, in part outperforming established parameters. These insights open new avenues for research into the mechanisms driving body composition changes in SSc and for developing enhanced disease management tools, ultimately leading to more personalised and effective patient care.
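As a concrete illustration of the phenotyping step, below is a minimal sketch of K-means clustering (k = 2) on standardized body-composition parameters. The feature names and data are placeholders, not the study's variables.

```python
# Minimal K-means phenotyping sketch; features and data are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# columns: muscle volume, subcutaneous fat, visceral fat, bone mineral density
bca = rng.normal(size=(128, 4))          # 128 patients, 4 BCA parameters

features = StandardScaler().fit_transform(bca)   # standardize before clustering
phenotype = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(phenotype))            # patients per BCA phenotype
```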

[Analysis of the global competitive landscape in artificial intelligence medical device research].

Chen J, Pan L, Long J, Yang N, Liu F, Lu Y, Ouyang Z

PubMed · Jun 25 2025
The objective of this study is to map the global scientific competitive landscape in the field of artificial intelligence (AI) medical devices using scientific data. A bibliometric analysis was conducted using the Web of Science Core Collection to examine global research trends in AI-based medical devices. As of the end of 2023, a total of 55,147 relevant publications were identified worldwide, with 76.6% published between 2018 and 2024. Research in this field has primarily focused on AI-assisted medical image and physiological signal analysis. At the national level, China (17,991 publications) and the United States (14,032 publications) lead in output. China has shown a rapid increase in publication volume, with its 2023 output exceeding twice that of the U.S.; however, the U.S. maintains a higher average citation count per paper (China: 16.29; U.S.: 35.99). At the institutional level, seven Chinese institutions and three U.S. institutions rank among the global top ten in publication volume. At the researcher level, prominent contributors include Acharya U Rajendra, Rueckert Daniel, and Tian Jie, who have extensively explored AI-assisted medical imaging. Some researchers have specialized in specific imaging applications, such as Yang Xiaofeng (AI-assisted precision radiotherapy for tumors) and Shen Dinggang (brain imaging analysis). Others, including Gao Xiaorong and Ming Dong, focus on AI-assisted physiological signal analysis. The results confirm the rapid global development of AI in the medical device field, with "AI + imaging" emerging as the most mature direction. China and the U.S. maintain absolute leadership in this area: China slightly leads in publication volume, while the U.S., having started earlier, demonstrates higher research quality. Both countries host a large number of active research teams in this domain.

How well do multimodal LLMs interpret CT scans? An auto-evaluation framework for analyses.

Zhu Q, Hou B, Mathai TS, Mukherjee P, Jin Q, Chen X, Wang Z, Cheng R, Summers RM, Lu Z

PubMed · Jun 25 2025
This study introduces a novel evaluation framework, GPTRadScore, to systematically assess the performance of multimodal large language models (MLLMs) in generating clinically accurate findings from CT imaging. Specifically, GPTRadScore leverages LLMs as an evaluation metric, aiming to provide a more accurate and clinically informed assessment than traditional language-specific methods. Using this framework, we evaluate the capability of several MLLMs, including GPT-4 with Vision (GPT-4V), Gemini Pro Vision, LLaVA-Med, and RadFM, to interpret findings in CT scans. This retrospective study uses a subset of the public DeepLesion dataset to evaluate the performance of several multimodal LLMs in describing findings in CT slices. GPTRadScore was developed to assess the generated descriptions (location, body part, and type) using GPT-4, alongside traditional metrics. RadFM was fine-tuned using a subset of the DeepLesion dataset with additional labeled examples targeting complex findings. After fine-tuning, performance was reassessed using GPTRadScore to measure accuracy improvements. Evaluations demonstrated a high correlation of GPTRadScore with clinician assessments, with Pearson's correlation coefficients of 0.87, 0.91, 0.75, 0.90, and 0.89. These results highlight its superiority over traditional metrics, such as BLEU, METEOR, and ROUGE, and indicate that GPTRadScore can serve as a reliable evaluation metric. Using GPTRadScore, it was observed that while GPT-4V and Gemini Pro Vision outperformed other models, significant areas for improvement remain, primarily due to limitations in the datasets used for training. Fine-tuning RadFM resulted in substantial accuracy gains: location accuracy increased from 3.41% to 12.8%, body part accuracy improved from 29.12% to 53%, and type accuracy rose from 9.24% to 30%. These findings reinforce the hypothesis that fine-tuning RadFM can significantly enhance its performance. GPT-4 effectively correlates with expert assessments, validating its use as a reliable metric for evaluating multimodal LLMs in radiological diagnostics. Additionally, the results underscore the efficacy of fine-tuning approaches in improving the descriptive accuracy of LLM-generated medical imaging findings.
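To make the LLM-as-judge idea concrete, here is a hedged sketch of how one might prompt GPT-4 to grade a generated finding against a reference along location, body part, and type. The prompt wording and scoring rubric are assumptions, not the paper's exact protocol.

```python
# Hedged LLM-as-judge sketch; the prompt and rubric are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def gpt_rad_score(generated: str, reference: str) -> str:
    prompt = (
        "You are a radiologist grading a model-generated CT finding.\n"
        f"Reference finding: {reference}\n"
        f"Generated finding: {generated}\n"
        "For each of (location, body part, type), answer correct/incorrect, "
        "then give an overall score between 0 and 1."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

print(gpt_rad_score("hypodense lesion in the right hepatic lobe",
                    "low-attenuation mass in the right lobe of the liver"))
```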

MS-IQA: A Multi-Scale Feature Fusion Network for PET/CT Image Quality Assessment

Siqiao Li, Chen Hui, Wei Zhang, Rui Liang, Chenyue Song, Feng Jiang, Haiqi Zhu, Zhixuan Li, Hong Huang, Xiang Li

arXiv preprint · Jun 25 2025
Positron Emission Tomography / Computed Tomography (PET/CT) plays a critical role in medical imaging, combining functional and anatomical information to aid accurate diagnosis. However, image quality degradation due to noise, compression, and other factors can lead to diagnostic uncertainty and increase the risk of misdiagnosis. When evaluating the quality of a PET/CT image, both low-level features, such as distortions, and high-level features, such as organ anatomical structures, affect the diagnostic value of the image. However, existing medical image quality assessment (IQA) methods are unable to account for both feature types simultaneously. In this work, we propose MS-IQA, a novel multi-scale feature fusion network for PET/CT IQA, which utilizes multi-scale features from various intermediate layers of ResNet and Swin Transformer, enhancing its ability to perceive both local and global information. In addition, a multi-scale feature fusion module is introduced to effectively combine high-level and low-level information through a dynamically weighted channel attention mechanism. Finally, to fill the gap in PET/CT IQA datasets, we construct PET-CT-IQA-DS, a dataset containing 2,700 varying-quality PET/CT images with quality scores assigned by radiologists. Experiments on our dataset and the publicly available LDCTIQAC2023 dataset demonstrate that our proposed model achieves superior performance over existing state-of-the-art methods on various IQA metrics. This work provides an accurate and efficient IQA method for PET/CT. Our code and dataset are available at https://github.com/MS-IQA/MS-IQA/.
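Below is a minimal sketch of fusing low- and high-level feature maps with a dynamically weighted channel attention (squeeze-and-excitation style), in the spirit of the fusion module described above. The channel sizes and SE design are assumptions, not the MS-IQA code.

```python
# Sketch of dynamically weighted channel attention fusion; dims are assumptions.
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # squeeze: global spatial average
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                      # per-channel weights in (0, 1)
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([low, high], dim=1)  # concatenate along channels
        return fused * self.se(fused)          # re-weight channels dynamically

low = torch.randn(2, 64, 32, 32)    # low-level (distortion-sensitive) features
high = torch.randn(2, 64, 32, 32)   # high-level (anatomy-aware) features
out = ChannelAttentionFusion(128)(low, high)
print(out.shape)                    # torch.Size([2, 128, 32, 32])
```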

Filling of incomplete sinograms from sparse PET detector configurations using a residual U-Net

Klara Leffler, Luigi Tommaso Luppino, Samuel Kuttner, Karin Söderkvist, Jan Axelsson

arXiv preprint · Jun 24 2025
Long axial field-of-view PET scanners offer an increased field-of-view and sensitivity compared to traditional PET scanners. However, a significant cost is associated with the densely packed photodetectors required for these extended-coverage systems, limiting clinical utilisation. To mitigate the cost limitations, alternative sparse system configurations have been proposed, allowing an extended field-of-view PET design with detector costs similar to a standard PET system, albeit at the expense of image quality. In this work, we propose a deep sinogram restoration network to fill in the missing sinogram data. Our method utilises a modified residual U-Net, trained on clinical PET scans from a GE Signa PET/MR, simulating the removal of 50% of the detectors in a chessboard pattern (retaining only 25% of all lines of response). The model successfully recovers missing counts, with a mean absolute error below two events per pixel, outperforming 2D interpolation in both the sinogram and reconstructed image domains. Notably, the predicted sinograms exhibit a smoothing effect, leading to reconstructed images lacking sharpness in finer details. Despite these limitations, the model demonstrates a substantial capacity to compensate for the undersampling caused by the sparse detector configuration. This proof-of-concept study suggests that sparse detector configurations, combined with deep learning techniques, offer a viable alternative to conventional PET scanner designs. This approach supports the development of cost-effective, total-body PET scanners, allowing a significant step forward in medical imaging technology.
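A small sketch of the undersampling setup may help: zero out a checkerboard pattern of sinogram bins (simulating removed detectors) and measure the mean absolute error a restoration model would need to beat. The toy sinogram shape, the naive mean-fill baseline, and the exact masking convention are assumptions, not the paper's simulation.

```python
# Toy chessboard undersampling and MAE baseline; all details are illustrative.
import numpy as np

rng = np.random.default_rng(0)
sino = rng.poisson(5.0, size=(180, 256)).astype(float)  # toy sinogram (angles x bins)

rows, cols = np.indices(sino.shape)
mask = (rows + cols) % 2 == 0          # chessboard: keep half of the entries
sparse = np.where(mask, sino, 0.0)

# Naive baseline: fill missing bins with the mean of kept bins, then score
# with mean absolute error over the missing entries only.
filled = sparse.copy()
filled[~mask] = sino[mask].mean()
mae = np.abs(filled[~mask] - sino[~mask]).mean()
print(f"MAE on missing bins: {mae:.2f} events per pixel")
```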

Recurrent Visual Feature Extraction and Stereo Attentions for CT Report Generation

Yuanhe Tian, Lei Mao, Yan Song

arXiv preprint · Jun 24 2025
Generating reports for computed tomography (CT) images is a challenging task that, while similar to existing medical image report generation tasks, has unique characteristics, such as the spatial encoding of multiple images and the alignment between the image volume and texts. Existing solutions typically use general 2D or 3D image processing techniques to extract features from a CT volume, first compressing the volume and then dividing the compressed CT slices into patches for visual encoding. These approaches do not explicitly account for the transformations among CT slices, nor do they effectively integrate multi-level image features, particularly those containing specific organ lesions, to instruct CT report generation (CTRG). Considering the strong correlation among consecutive slices in CT scans, in this paper we propose a large language model (LLM)-based CTRG method with recurrent visual feature extraction and stereo attentions for hierarchical feature modeling. Specifically, we use a vision Transformer to recurrently process each slice in a CT volume, and employ a set of attentions over the encoded slices from different perspectives to selectively obtain important visual information and align it with textual features, so as to better instruct an LLM for CTRG. Experimental results and further analysis on the benchmark M3D-Cap dataset show that our method outperforms strong baseline models and achieves state-of-the-art results, demonstrating its validity and effectiveness.
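A hedged sketch of the two ideas in this abstract follows: encode CT slices sequentially (recurrently) and then attend over the slice representations to pick out the visual information passed to the LLM. The GRU-over-slices, the linear stand-in for a per-slice ViT, and the single attention call are illustrative simplifications, not the paper's model.

```python
# Sketch of recurrent slice encoding + attention over slices; all assumptions.
import torch
import torch.nn as nn

slice_dim, hidden = 768, 768
encoder = nn.Linear(1024, slice_dim)       # stand-in for a per-slice ViT
recurrent = nn.GRU(slice_dim, hidden, batch_first=True)
attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)

slices = torch.randn(1, 96, 1024)          # one CT volume: 96 slice feature vectors
tokens = encoder(slices)                   # (1, 96, 768)
states, _ = recurrent(tokens)              # recurrent pass over consecutive slices
query = states.mean(dim=1, keepdim=True)   # a text-side query would go here
visual, _ = attn(query, states, states)    # attend over slices from one "perspective"
print(visual.shape)                        # torch.Size([1, 1, 768])
```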

Assessing accuracy and legitimacy of multimodal large language models on Japan Diagnostic Radiology Board Examination

Hirano, Y., Miki, S., Yamagishi, Y., Hanaoka, S., Nakao, T., Kikuchi, T., Nakamura, Y., Nomura, Y., Yoshikawa, T., Abe, O.

medRxiv preprint · Jun 23 2025
Purpose: To assess and compare the accuracy and legitimacy of multimodal large language models (LLMs) on the Japan Diagnostic Radiology Board Examination (JDRBE). Materials and methods: The dataset comprised questions from JDRBE 2021, 2023, and 2024, with ground-truth answers established through consensus among multiple board-certified diagnostic radiologists. Questions without associated images and those lacking unanimous agreement on answers were excluded. Eight LLMs were evaluated: GPT-4 Turbo, GPT-4o, GPT-4.5, GPT-4.1, o3, o4-mini, Claude 3.7 Sonnet, and Gemini 2.5 Pro. Each model was evaluated under two conditions: with image input (vision) and without (text-only). Performance differences between the conditions were assessed using McNemar's exact test. Two diagnostic radiologists (with 2 and 18 years of experience) independently rated the legitimacy of responses from four models (GPT-4 Turbo, Claude 3.7 Sonnet, o3, and Gemini 2.5 Pro) using a five-point Likert scale, blinded to model identity. Legitimacy scores were analyzed using Friedman's test, followed by pairwise Wilcoxon signed-rank tests with Holm correction. Results: The dataset included 233 questions. Under the vision condition, o3 achieved the highest accuracy at 72%, followed by o4-mini (70%) and Gemini 2.5 Pro (70%). Under the text-only condition, o3 topped the list with an accuracy of 67%. The addition of image input significantly improved the accuracy of two models (Gemini 2.5 Pro and GPT-4.5), but not the others. Both o3 and Gemini 2.5 Pro received significantly higher legitimacy scores than GPT-4 Turbo and Claude 3.7 Sonnet from both raters. Conclusion: Recent multimodal LLMs, particularly o3 and Gemini 2.5 Pro, have demonstrated remarkable progress on JDRBE questions, reflecting their rapid evolution in diagnostic radiology. Secondary abstract: Eight multimodal large language models were evaluated on the Japan Diagnostic Radiology Board Examination. OpenAI's o3 and Google DeepMind's Gemini 2.5 Pro achieved high accuracy rates (72% and 70%) and received good legitimacy scores from human raters, demonstrating steady progress.
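For the paired-accuracy comparison this study describes, a small sketch of McNemar's exact test is shown below, applied to one model's answers with versus without image input. The 2x2 counts are made up for illustration, not the study's data.

```python
# McNemar's exact test on paired right/wrong answers; counts are invented.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Rows: correct/incorrect with vision; columns: correct/incorrect text-only.
table = np.array([[130, 38],   # correct with vision
                  [22, 43]])   # incorrect with vision
result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
print(f"p-value: {result.pvalue:.4f}")
```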

Emergency radiology: roadmap for radiology departments.

Aydin S, Ece B, Cakmak V, Kocak B, Onur MR

PubMed · Jun 20 2025
Emergency radiology has evolved into a significant subspecialty over the past two decades, facing unique challenges including escalating imaging volumes, increasing study complexity, and heightened expectations from clinicians and patients. This review provides a comprehensive overview of the key requirements for an effective emergency radiology unit. Emergency radiologists play a crucial role in real-time decision-making by providing continuous 24/7 support, requiring expertise across various organ systems and close collaboration with emergency physicians and specialists. Beyond image interpretation, emergency radiologists are responsible for organizing staff schedules, planning equipment, determining imaging protocols, and establishing standardized reporting systems. Operational considerations in emergency radiology departments include efficient scheduling models such as circadian-based scheduling, strategic equipment organization with primary imaging modalities positioned near emergency departments, and effective imaging management through structured ordering systems and standardized protocols. Preparedness for mass casualty incidents requires a well-organized workflow process map detailing steps from patient transfer to image acquisition and interpretation, with clear task allocation and imaging pathways. Collaboration between emergency radiologists and physicians is essential, with accurate communication facilitated through various channels and structured reporting templates. Artificial intelligence has emerged as a transformative tool in emergency radiology, offering potential benefits in both interpretative domains (detecting intracranial hemorrhage, pulmonary embolism, acute ischemic stroke) and non-interpretative applications (triage systems, protocol assistance, quality control). Despite implementation challenges including clinician skepticism, financial considerations, and ethical issues, AI can enhance diagnostic accuracy and workflow optimization. Teleradiology provides solutions for staff shortages, particularly during off-hours, with hybrid models allowing radiologists to work both on-site and remotely. This review aims to guide stakeholders in establishing and maintaining efficient emergency radiology services to improve patient outcomes.

MRI-CORE: A Foundation Model for Magnetic Resonance Imaging

Haoyu Dong, Yuwen Chen, Hanxue Gu, Nicholas Konz, Yaqian Chen, Qihang Li, Maciej A. Mazurowski

arXiv preprint · Jun 13 2025
The widespread use of Magnetic Resonance Imaging (MRI) and the rise of deep learning have enabled the development of powerful predictive models for a wide range of diagnostic tasks in MRI, such as image classification or object segmentation. However, training models for specific new tasks often requires large amounts of labeled data, which is difficult to obtain due to high annotation costs and data privacy concerns. To circumvent this issue, we introduce MRI-CORE (MRI COmprehensive Representation Encoder), a vision foundation model pre-trained using more than 6 million slices from over 110,000 MRI volumes across 18 main body locations. Experiments on five diverse object segmentation tasks in MRI demonstrate that MRI-CORE can significantly improve segmentation performance in realistic scenarios with limited labeled data availability, achieving an average gain of 6.97% in 3D Dice coefficient using only 10 annotated slices per task. We further demonstrate new model capabilities in MRI, such as classification of image properties including body location, sequence type, and institution, and zero-shot segmentation. These results highlight the value of MRI-CORE as a generalist vision foundation model for MRI, potentially lowering the data annotation resource barriers for many applications.
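For reference, a minimal sketch of the 3D Dice coefficient used to report the segmentation gains above is shown below; the masks are random placeholders.

```python
# 3D Dice coefficient sketch; inputs are placeholder binary volumes.
import numpy as np

def dice_3d(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) over a full 3D binary volume."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float(2.0 * inter / (pred.sum() + target.sum() + eps))

rng = np.random.default_rng(0)
pred = rng.random((64, 128, 128)) > 0.5
target = rng.random((64, 128, 128)) > 0.5
print(f"3D Dice: {dice_3d(pred, target):.3f}")
```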