
Deep learning-based time-of-flight (ToF) enhancement of non-ToF PET scans for different radiotracers.

Mehranian A, Wollenweber SD, Bradley KM, Fielding PA, Huellner M, Iagaru A, Dedja M, Colwell T, Kotasidis F, Johnsen R, Jansen FP, McGowan DR

PubMed · Jul 1, 2025
To evaluate a deep learning-based time-of-flight (DLToF) model trained to enhance the image quality of non-ToF PET images, reconstructed using the BSREM algorithm, towards that of ToF images for different tracers. A 3D residual U-Net model was trained using 8 different tracers (FDG: 75%; non-FDG: 25%) from 11 sites in the US, Europe, and Asia. A total of 309 training and 33 validation datasets scanned on GE Discovery MI (DMI) ToF scanners were used to develop DLToF models of three strengths: low (L), medium (M), and high (H). The training and validation pairs consisted of target ToF and input non-ToF BSREM reconstructions using site-preferred regularisation parameters (beta values). The contrast and noise properties of each model were defined by adjusting the beta value of the target ToF images. A total of 60 DMI datasets, consisting of 4 tracers (¹⁸F-FDG, ¹⁸F-PSMA, ⁶⁸Ga-PSMA, ⁶⁸Ga-DOTATATE) with 15 exams each, were collected for testing and quantitative analysis of the models based on standardized uptake values (SUV) in regions of interest (ROIs) placed in lesions, lungs, and liver. Each dataset includes 5 image series: ToF and non-ToF BSREM, and three DLToF images. The image series (300 in total) were blind-scored on a 5-point Likert scale by 4 readers based on lesion detectability, diagnostic confidence, and image noise/quality. In lesion SUVmax quantification with respect to ToF BSREM, DLToF-H achieved the best results among the three models, reducing the non-ToF BSREM errors from -39% to -6% for ¹⁸F-FDG (38 lesions); from -42% to -7% for ¹⁸F-PSMA (35 lesions); from -34% to -4% for ⁶⁸Ga-PSMA (23 lesions); and from -34% to -12% for ⁶⁸Ga-DOTATATE (32 lesions). Quantification results in liver and lung also showed ToF-like performance of the DLToF models. Clinical reader scoring showed that DLToF-H improved lesion detectability on average for all four radiotracers, whereas DLToF-L achieved the highest scores for image quality (noise level). DLToF-M, however, offered the best trade-off between lesion detection and noise level, and hence achieved the highest score for diagnostic confidence on average across all radiotracers. This study demonstrated that the DLToF models are suitable for both FDG and non-FDG tracers and could be utilized on digital BGO PET/CT scanners to provide image quality and lesion detectability close to that of ToF systems.
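The abstract names a 3D residual U-Net but not its configuration. As a reading aid, below is a minimal PyTorch sketch of such a network mapping a non-ToF volume towards a ToF target; the depth, channel widths, and the global residual output are illustrative assumptions, not the authors' published architecture.

```python
import torch
import torch.nn as nn

class ResBlock3D(nn.Module):
    """Two 3x3x3 convolutions with a residual (skip) connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(ch, ch, 3, padding=1), nn.BatchNorm3d(ch), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, 3, padding=1), nn.BatchNorm3d(ch),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class ResUNet3D(nn.Module):
    """One encoder/decoder level only; a real model would be deeper."""
    def __init__(self, base=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv3d(1, base, 3, padding=1), ResBlock3D(base))
        self.down = nn.Sequential(nn.Conv3d(base, base * 2, 3, stride=2, padding=1),
                                  ResBlock3D(base * 2))
        self.up = nn.ConvTranspose3d(base * 2, base, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv3d(base * 2, base, 3, padding=1), ResBlock3D(base))
        self.head = nn.Conv3d(base, 1, 1)

    def forward(self, x):                       # x: non-ToF volume, shape (B, 1, D, H, W)
        e = self.enc(x)
        d = self.up(self.down(e))
        d = self.dec(torch.cat([e, d], dim=1))  # U-Net skip connection
        return x + self.head(d)                 # global residual: learn the ToF "correction"
```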

Robust and generalizable artificial intelligence for multi-organ segmentation in ultra-low-dose total-body PET imaging: a multi-center and cross-tracer study.

Wang H, Qiao X, Ding W, Chen G, Miao Y, Guo R, Zhu X, Cheng Z, Xu J, Li B, Huang Q

PubMed · Jul 1, 2025
Positron Emission Tomography (PET) is a powerful molecular imaging tool that visualizes radiotracer distribution to reveal physiological processes. Recent advances in total-body PET have enabled low-dose, CT-free imaging; however, accurate organ segmentation using PET-only data remains challenging. This study develops and validates a deep learning model for multi-organ PET segmentation across varied imaging conditions and tracers, addressing a critical need for fully PET-based quantitative analysis. This retrospective study employed a 3D deep learning-based model for automated multi-organ segmentation on PET images acquired under diverse conditions, including low-dose and non-attenuation-corrected scans. Using a dataset of 798 patients from multiple centers with varied tracers, model robustness and generalizability were evaluated via multi-center and cross-tracer tests. Ground-truth labels for 23 organs were generated from CT images, and segmentation accuracy was assessed using the Dice similarity coefficient (DSC). In the multi-center dataset from four institutions, the model achieved average DSC values of 0.834, 0.825, 0.819, and 0.816 across varying dose-reduction factors and correction conditions for FDG PET images. In the cross-tracer dataset, the model reached average DSC values of 0.737, 0.573, 0.830, 0.661, and 0.708 for DOTATATE, FAPI, FDG, Grazytracer, and PSMA, respectively. The proposed model demonstrated effective, fully PET-based multi-organ segmentation across a range of imaging conditions, centers, and tracers, achieving high robustness and generalizability. These findings underscore the model's potential to enhance clinical diagnostic workflows by supporting ultra-low-dose PET imaging. Trial registration: Not applicable. This is a retrospective study based on collected data, approved by the Research Ethics Committee of Ruijin Hospital, affiliated with Shanghai Jiao Tong University School of Medicine.
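For reference, the Dice similarity coefficient reported above is DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and ground-truth mask B. A minimal sketch, with illustrative array names:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)

# Per-organ scoring averaged over organs, as reported in the abstract
# (pred_masks / gt_masks are hypothetical dicts of binary volumes):
# mean_dsc = np.mean([dice(pred_masks[o], gt_masks[o]) for o in organs])
```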

Exploring the Design Space of 3D MLLMs for CT Report Generation

Mohammed Baharoon, Jun Ma, Congyu Fang, Augustin Toma, Bo Wang

arXiv preprint · Jun 26, 2025
Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods that improve performance on the GREEN score by up to 10%, achieving 2nd place in the MICCAI 2024 AMOS-MM challenge. Our results on the 1,687 cases from the AMOS-MM dataset show that RRG is largely independent of the size of the LLM under the same training protocol. We also show that a larger volume size does not always improve performance if the original ViT was pre-trained on a smaller volume size. Lastly, we show that using a segmentation mask along with the CT volume improves performance. The code is publicly available at https://github.com/bowang-lab/AMOS-MM-Solution
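One axis of the design space above is the projector bridging the vision encoder and the LLM. As an illustration only, here is a common MLP-style projector; the dimensions are assumptions, not the paper's chosen configuration.

```python
import torch.nn as nn

class MLPProjector(nn.Module):
    """Maps vision-encoder patch features into the LLM's embedding space."""
    def __init__(self, vit_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vit_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, visual_tokens):   # (batch, n_patches, vit_dim) from a 3D ViT
        return self.proj(visual_tokens) # (batch, n_patches, llm_dim), prepended to text tokens
```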

Computed tomography-derived quantitative imaging biomarkers enable the prediction of disease manifestations and survival in patients with systemic sclerosis.

Sieren MM, Grasshoff H, Riemekasten G, Berkel L, Nensa F, Hosch R, Barkhausen J, Kloeckner R, Wegner F

PubMed · Jun 25, 2025
Systemic sclerosis (SSc) is a complex inflammatory vasculopathy with diverse symptoms and variable disease progression. Despite its known impact on body composition (BC), clinical decision-making has yet to incorporate these biomarkers. This study aims to extract quantitative BC imaging biomarkers from CT scans to assess disease severity, define BC phenotypes, track changes over time, and predict survival. CT exams were extracted from a prospectively maintained cohort of 452 SSc patients; 128 patients with at least one CT exam were included. An artificial intelligence-based 3D body composition analysis (BCA) algorithm assessed muscle volume, different adipose tissue compartments, and bone mineral density. These parameters were analysed with regard to various clinical, laboratory, and functional parameters, as well as survival. Phenotypes were identified by K-means cluster analysis, and longitudinal evaluation of BCA changes employed regression analyses. A regression model using BCA parameters outperformed models based on body mass index and clinical parameters in predicting survival (area under the curve (AUC) = 0.75). Longitudinal development of the cardiac marker enabled prediction of survival with an AUC of 0.82. Patients with altered BCA parameters had increased odds ratios for various complications, including interstitial lung disease (p<0.05). Two distinct BCA phenotypes were identified, showing significant differences in gastrointestinal disease manifestations (p<0.01). This study highlights several parameters with the potential to reshape clinical pathways for SSc patients. Quantitative BCA biomarkers offer a means to predict survival and individual disease manifestations, in part outperforming established parameters. These insights open new avenues for research into the mechanisms driving body composition changes in SSc and for developing enhanced disease management tools, ultimately leading to more personalised and effective patient care.
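To make the survival analysis concrete, here is a hedged sketch of predicting a binary survival outcome from BCA features and scoring it with ROC AUC. The classifier, cross-validation scheme, and feature set are assumptions; the study does not publish its exact model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.random((128, 4))           # stand-ins for muscle volume, adipose compartments, bone density
y = rng.integers(0, 2, 128)        # survival outcome (illustrative labels only)

# Out-of-fold predicted probabilities, then AUC on the held-out predictions.
probs = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=5, method="predict_proba")[:, 1]
print(f"AUC = {roc_auc_score(y, probs):.2f}")  # the abstract reports AUC = 0.75 for the BCA model
```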

How well do multimodal LLMs interpret CT scans? An auto-evaluation framework for analyses.

Zhu Q, Hou B, Mathai TS, Mukherjee P, Jin Q, Chen X, Wang Z, Cheng R, Summers RM, Lu Z

PubMed · Jun 25, 2025
This study introduces a novel evaluation framework, GPTRadScore, to systematically assess the performance of multimodal large language models (MLLMs) in generating clinically accurate findings from CT imaging. Specifically, GPTRadScore leverages LLMs as an evaluation metric, aiming to provide a more accurate and clinically informed assessment than traditional language-specific methods. Using this framework, we evaluate the capability of several MLLMs, including GPT-4 with Vision (GPT-4V), Gemini Pro Vision, LLaVA-Med, and RadFM, to interpret findings in CT scans. This retrospective study leverages a subset of the public DeepLesion dataset to evaluate the performance of several multimodal LLMs in describing findings in CT slices. GPTRadScore was developed to assess the generated descriptions (location, body part, and type) using GPT-4, alongside traditional metrics. RadFM was fine-tuned using a subset of the DeepLesion dataset with additional labeled examples targeting complex findings. After fine-tuning, performance was reassessed using GPTRadScore to measure accuracy improvements. GPTRadScore correlated highly with clinician assessments, yielding Pearson's correlation coefficients of 0.87, 0.91, 0.75, 0.90, and 0.89. These results highlight its superiority over traditional metrics, such as BLEU, METEOR, and ROUGE, and indicate that GPTRadScore can serve as a reliable evaluation metric. Using GPTRadScore, it was observed that while GPT-4V and Gemini Pro Vision outperformed other models, significant areas for improvement remain, primarily due to limitations in the datasets used for training. Fine-tuning RadFM resulted in substantial accuracy gains: location accuracy increased from 3.41% to 12.8%, body part accuracy improved from 29.12% to 53%, and type accuracy rose from 9.24% to 30%. These findings reinforce the hypothesis that fine-tuning can significantly enhance RadFM's performance. GPT-4 effectively correlates with expert assessments, validating its use as a reliable metric for evaluating multimodal LLMs in radiological diagnostics. Additionally, the results underscore the efficacy of fine-tuning approaches in improving the descriptive accuracy of LLM-generated medical imaging findings.
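The validation step above boils down to correlating automatic scores with expert ratings. A minimal sketch, with placeholder score arrays:

```python
from scipy.stats import pearsonr

gpt_scores = [4.0, 2.5, 3.0, 5.0, 1.5]        # GPTRadScore per finding (hypothetical values)
clinician_scores = [4.5, 2.0, 3.5, 5.0, 1.0]  # expert ratings of the same findings

r, p = pearsonr(gpt_scores, clinician_scores)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")   # the abstract reports coefficients up to 0.91
```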

MS-IQA: A Multi-Scale Feature Fusion Network for PET/CT Image Quality Assessment

Siqiao Li, Chen Hui, Wei Zhang, Rui Liang, Chenyue Song, Feng Jiang, Haiqi Zhu, Zhixuan Li, Hong Huang, Xiang Li

arXiv preprint · Jun 25, 2025
Positron Emission Tomography / Computed Tomography (PET/CT) plays a critical role in medical imaging, combining functional and anatomical information to aid accurate diagnosis. However, image quality degradation due to noise, compression, and other factors can lead to diagnostic uncertainty and increase the risk of misdiagnosis. When evaluating the quality of a PET/CT image, both low-level features, such as distortions, and high-level features, such as organ anatomical structures, affect the diagnostic value of the image. However, existing medical image quality assessment (IQA) methods are unable to account for both feature types simultaneously. In this work, we propose MS-IQA, a novel multi-scale feature fusion network for PET/CT IQA, which utilizes multi-scale features from various intermediate layers of ResNet and Swin Transformer, enhancing its ability to perceive both local and global information. In addition, a multi-scale feature fusion module is introduced to effectively combine high-level and low-level information through a dynamically weighted channel attention mechanism. Finally, to fill the gap left by the absence of a PET/CT IQA dataset, we construct PET-CT-IQA-DS, a dataset containing 2,700 varying-quality PET/CT images with quality scores assigned by radiologists. Experiments on our dataset and the publicly available LDCTIQAC2023 dataset demonstrate that our proposed model achieves superior performance over existing state-of-the-art methods on various IQA metrics. This work provides an accurate and efficient IQA method for PET/CT. Our code and dataset are available at https://github.com/MS-IQA/MS-IQA/.
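The dynamically weighted channel attention is not detailed in the abstract; the sketch below shows one common squeeze-and-excitation-style realisation of fusing low- and high-level feature maps, with layer sizes as illustrative assumptions rather than the published MS-IQA module.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Concatenate two feature maps and reweight channels dynamically."""
    def __init__(self, channels: int, reduction: int = 8):
        # `channels` is the combined count: for two 64-channel maps, channels=128.
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: one global value per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, low, high):             # low-/high-level maps of matching spatial size
        x = torch.cat([low, high], dim=1)     # fuse along the channel dimension
        w = self.fc(self.pool(x).flatten(1))  # dynamic per-channel weights in (0, 1)
        return x * w[:, :, None, None]        # excitation: reweighted fused features
```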

[Analysis of the global competitive landscape in artificial intelligence medical device research].

Chen J, Pan L, Long J, Yang N, Liu F, Lu Y, Ouyang Z

PubMed · Jun 25, 2025
The objective of this study is to map the global scientific competitive landscape in the field of artificial intelligence (AI) medical devices using scientific data. A bibliometric analysis was conducted using the Web of Science Core Collection to examine global research trends in AI-based medical devices. As of the end of 2023, a total of 55,147 relevant publications were identified worldwide, with 76.6% published between 2018 and 2024. Research in this field has primarily focused on AI-assisted medical image and physiological signal analysis. At the national level, China (17,991 publications) and the United States (14,032 publications) lead in output. China has shown a rapid increase in publication volume, with its 2023 output exceeding twice that of the U.S.; however, the U.S. maintains a higher average citation count per paper (China: 16.29; U.S.: 35.99). At the institutional level, seven Chinese institutions and three U.S. institutions rank among the global top ten by publication volume. At the researcher level, prominent contributors include Acharya U Rajendra, Rueckert Daniel, and Tian Jie, who have extensively explored AI-assisted medical imaging. Some researchers have specialized in specific imaging applications, such as Yang Xiaofeng (AI-assisted precision radiotherapy for tumors) and Shen Dinggang (brain imaging analysis), while others, including Gao Xiaorong and Ming Dong, focus on AI-assisted physiological signal analysis. The results confirm the rapid global development of AI in the medical device field, with "AI + imaging" emerging as the most mature direction. China and the U.S. maintain absolute leadership in this area: China slightly leads in publication volume, while the U.S., having started earlier, demonstrates higher research quality. Both countries host a large number of active research teams in this domain.

Recurrent Visual Feature Extraction and Stereo Attentions for CT Report Generation

Yuanhe Tian, Lei Mao, Yan Song

arXiv preprint · Jun 24, 2025
Generating reports for computed tomography (CT) images is a challenging task: while similar to medical image report generation in general, it has unique characteristics, such as the spatial encoding of multiple images and the alignment between the image volume and texts. Existing solutions typically use general 2D or 3D image processing techniques to extract features from a CT volume, first compressing the volume and then dividing the compressed CT slices into patches for visual encoding. These approaches do not explicitly account for the transformations among CT slices, nor do they effectively integrate multi-level image features, particularly those containing specific organ lesions, to instruct CT report generation (CTRG). Considering the strong correlation among consecutive slices in CT scans, in this paper we propose a large language model (LLM) based CTRG method with recurrent visual feature extraction and stereo attentions for hierarchical feature modeling. Specifically, we use a vision Transformer to recurrently process each slice in a CT volume, and employ a set of attentions over the encoded slices from different perspectives to selectively obtain important visual information and align it with textual features, so as to better instruct an LLM for CTRG. Experimental results and further analysis on the benchmark M3D-Cap dataset show that our method outperforms strong baseline models and achieves state-of-the-art results, demonstrating its validity and effectiveness.
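As an illustration of recurrent slice encoding followed by attention over slices, here is a hedged sketch: the GRU stands in for the paper's recurrent ViT processing, and the single attention layer stands in for its richer "stereo attentions".

```python
import torch.nn as nn

class SliceEncoder(nn.Module):
    """Recurrence along the axial direction, then attention over slices."""
    def __init__(self, slice_dim: int = 768, hidden: int = 768):
        super().__init__()
        self.gru = nn.GRU(slice_dim, hidden, batch_first=True)   # context across slices
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)

    def forward(self, slice_feats):     # (batch, n_slices, slice_dim) from a per-slice ViT
        h, _ = self.gru(slice_feats)    # propagate information between consecutive slices
        out, _ = self.attn(h, h, h)     # attend over slices to select salient ones
        return out                      # visual tokens to align with text and feed an LLM
```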

Filling of incomplete sinograms from sparse PET detector configurations using a residual U-Net

Klara Leffler, Luigi Tommaso Luppino, Samuel Kuttner, Karin Söderkvist, Jan Axelsson

arXiv preprint · Jun 24, 2025
Long axial field-of-view PET scanners offer increased field-of-view and sensitivity compared to traditional PET scanners. However, a significant cost is associated with the densely packed photodetectors required for extended-coverage systems, limiting clinical utilisation. To mitigate this cost limitation, alternative sparse system configurations have been proposed, allowing an extended field-of-view PET design with detector costs similar to a standard PET system, albeit at the expense of image quality. In this work, we propose a deep sinogram restoration network to fill in the missing sinogram data. Our method utilises a modified residual U-Net, trained on clinical PET scans from a GE Signa PET/MR, simulating the removal of 50% of the detectors in a chessboard pattern (retaining only 25% of all lines of response). The model successfully recovers missing counts, with a mean absolute error below two events per pixel, outperforming 2D interpolation in both the sinogram and reconstructed image domains. Notably, the predicted sinograms exhibit a smoothing effect, leading to reconstructed images lacking sharpness in finer details. Despite these limitations, the model demonstrates a substantial capacity for compensating for the undersampling caused by the sparse detector configuration. This proof-of-concept study suggests that sparse detector configurations, combined with deep learning techniques, offer a viable alternative to conventional PET scanner designs. This approach supports the development of cost-effective total-body PET scanners, enabling a significant step forward in medical imaging technology.
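The simulation setup is easy to picture in code: mask a sinogram in a chessboard pattern and score a restoration by mean absolute error on the missing entries. The 2D mask below is a simplification of the actual detector geometry, and the toy Poisson sinogram is an illustrative stand-in for clinical data.

```python
import numpy as np

def chessboard_mask(shape):
    """Boolean mask keeping alternating cells, removing ~50% of entries."""
    idx = np.indices(shape).sum(axis=0)
    return idx % 2 == 0

sino = np.random.poisson(5.0, size=(180, 256)).astype(float)  # toy sinogram of counts
mask = chessboard_mask(sino.shape)
sparse = np.where(mask, sino, 0.0)   # missing data, as from removed detectors

# A trained restoration U-Net would fill the gaps; here the missing entries
# stay zero, so the MAE below is just the naive do-nothing baseline.
restored = sparse.copy()
mae = np.abs(restored - sino)[~mask].mean()
print(f"MAE on missing entries: {mae:.2f}")  # the paper's network achieves < 2 events/pixel
```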

Assessing accuracy and legitimacy of multimodal large language models on Japan Diagnostic Radiology Board Examination

Hirano, Y., Miki, S., Yamagishi, Y., Hanaoka, S., Nakao, T., Kikuchi, T., Nakamura, Y., Nomura, Y., Yoshikawa, T., Abe, O.

medRxiv preprint · Jun 23, 2025
Purpose: To assess and compare the accuracy and legitimacy of multimodal large language models (LLMs) on the Japan Diagnostic Radiology Board Examination (JDRBE). Materials and methods: The dataset comprised questions from JDRBE 2021, 2023, and 2024, with ground-truth answers established through consensus among multiple board-certified diagnostic radiologists. Questions without associated images and those lacking unanimous agreement on answers were excluded. Eight LLMs were evaluated: GPT-4 Turbo, GPT-4o, GPT-4.5, GPT-4.1, o3, o4-mini, Claude 3.7 Sonnet, and Gemini 2.5 Pro. Each model was evaluated under two conditions: with image input (vision) and without (text-only). Performance differences between the conditions were assessed using McNemar's exact test. Two diagnostic radiologists (with 2 and 18 years of experience) independently rated the legitimacy of responses from four models (GPT-4 Turbo, Claude 3.7 Sonnet, o3, and Gemini 2.5 Pro) on a five-point Likert scale, blinded to model identity. Legitimacy scores were analyzed using Friedman's test, followed by pairwise Wilcoxon signed-rank tests with Holm correction. Results: The dataset included 233 questions. Under the vision condition, o3 achieved the highest accuracy at 72%, followed by o4-mini (70%) and Gemini 2.5 Pro (70%). Under the text-only condition, o3 topped the list with an accuracy of 67%. Adding image input significantly improved the accuracy of two models (Gemini 2.5 Pro and GPT-4.5), but not the others. Both o3 and Gemini 2.5 Pro received significantly higher legitimacy scores than GPT-4 Turbo and Claude 3.7 Sonnet from both raters. Conclusion: Recent multimodal LLMs, particularly o3 and Gemini 2.5 Pro, have demonstrated remarkable progress on JDRBE questions, reflecting their rapid evolution in diagnostic radiology. Secondary abstract: Eight multimodal large language models were evaluated on the Japan Diagnostic Radiology Board Examination. OpenAI's o3 and Google DeepMind's Gemini 2.5 Pro achieved high accuracy rates (72% and 70%) and received good legitimacy scores from human raters, demonstrating steady progress.
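For reference, the paired accuracy comparison above uses McNemar's exact test, which depends only on the questions the two conditions answer discordantly. A minimal sketch with hypothetical counts summing to the study's 233 questions:

```python
from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table of per-question correctness (hypothetical counts, not the study's data):
# rows = vision correct/incorrect; columns = text-only correct/incorrect.
table = [[140, 28],   # vision correct:   text-only correct / text-only incorrect
         [12,  53]]   # vision incorrect: text-only correct / text-only incorrect

result = mcnemar(table, exact=True)  # exact binomial test on the 28 vs 12 discordant pairs
print(f"p = {result.pvalue:.4f}")
```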