Page 27 of 44431 results

A Robust Deep Learning Method with Uncertainty Estimation for the Pathological Classification of Renal Cell Carcinoma Based on CT Images.

Yao N, Hu H, Chen K, Huang H, Zhao C, Guo Y, Li B, Nan J, Li Y, Han C, Zhu F, Zhou W, Tian L

PubMed · Papers · Jun 1 2025
This study developed and validated a deep learning-based diagnostic model with uncertainty estimation to aid radiologists in the preoperative differentiation of pathological subtypes of renal cell carcinoma (RCC) based on computed tomography (CT) images. Data from 668 consecutive patients with pathologically confirmed RCC were retrospectively collected from Center 1, and the model was trained using fivefold cross-validation to classify RCC subtypes into clear cell RCC (ccRCC), papillary RCC (pRCC), and chromophobe RCC (chRCC). An external validation with 78 patients from Center 2 was conducted to evaluate the performance of the model. In the fivefold cross-validation, the area under the receiver operating characteristic curve (AUC) for the classification of ccRCC, pRCC, and chRCC was 0.868 (95% CI, 0.826-0.923), 0.846 (95% CI, 0.812-0.886), and 0.839 (95% CI, 0.802-0.88), respectively. In the external validation set, the AUCs were 0.856 (95% CI, 0.838-0.882), 0.787 (95% CI, 0.757-0.818), and 0.793 (95% CI, 0.758-0.831) for ccRCC, pRCC, and chRCC, respectively. The model demonstrated robust performance in predicting the pathological subtypes of RCC, while the incorporated uncertainty emphasized the importance of understanding model confidence. The proposed approach, integrated with uncertainty estimation, offers clinicians a dual advantage: accurate RCC subtype predictions complemented by diagnostic confidence metrics, thereby promoting informed decision-making for patients with RCC.
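As a rough illustration of the per-subtype AUCs reported above, the sketch below computes a one-vs-rest AUC from predicted scores using the rank (Mann-Whitney) formulation. The abstract does not state how the per-class AUCs were computed, so the one-vs-rest setup, the function name, and the toy scores are all assumptions.

```python
def one_vs_rest_auc(scores, labels, positive_class):
    """AUC for one class against all others (assumed one-vs-rest setup).

    scores: predicted probability of `positive_class` for each case.
    labels: true subtype of each case.
    Equals the probability that a random positive case scores higher
    than a random negative case, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive_class]
    neg = [s for s, y in zip(scores, labels) if y != positive_class]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, not taken from the study.
toy_scores = [0.9, 0.4, 0.5, 0.2]
toy_labels = ["ccRCC", "ccRCC", "pRCC", "chRCC"]
toy_auc = one_vs_rest_auc(toy_scores, toy_labels, "ccRCC")
```

In this toy case three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75.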

Intra-Individual Reproducibility of Automated Abdominal Organ Segmentation-Performance of TotalSegmentator Compared to Human Readers and an Independent nnU-Net Model.

Abel L, Wasserthal J, Meyer MT, Vosshenrich J, Yang S, Donners R, Obmann M, Boll D, Merkle E, Breit HC, Segeroth M

PubMed · Papers · Jun 1 2025
The purpose of this study is to assess the segmentation reproducibility of the artificial intelligence-based algorithm TotalSegmentator across 34 anatomical structures using multiphasic abdominal CT scans, comparing unenhanced, arterial, and portal venous phases in the same patients. A total of 1252 multiphasic abdominal CT scans acquired at our institution between January 1, 2012, and December 31, 2022, were retrospectively included. TotalSegmentator was used to derive volumetric measurements of 34 abdominal organs and structures from the total of 3756 CT series. Reproducibility was evaluated across three contrast phases per CT and compared to two human readers and an independent nnU-Net trained on the BTCV dataset. Relative deviation in segmented volumes and absolute volume deviations (AVD) were reported. Volume deviation within 5% was considered reproducible; accordingly, non-inferiority testing was conducted using a 5% margin. Twenty-nine out of 34 structures had volume deviations within 5% and were considered reproducible. Volume deviations for the adrenal glands, gallbladder, spleen, and duodenum were above 5%. Highest reproducibility was observed for bones (-0.58% [95% CI: -0.58, -0.57]) and muscles (-0.33% [-0.35, -0.32]). Among abdominal organs, volume deviation was 1.67% (1.60, 1.74). TotalSegmentator outperformed the reproducibility of the nnU-Net trained on the BTCV dataset with an AVD of 6.50% (6.41, 6.59) vs. 10.03% (9.86, 10.20; p < 0.0001), most notably in cases with pathologic findings. Similarly, TotalSegmentator's AVD between different contrast phases was superior compared to the interreader AVD for the same contrast phase (p = 0.036). TotalSegmentator demonstrated high intra-individual reproducibility for most abdominal structures in multiphasic abdominal CT scans. Although reproducibility was lower in pathologic cases, it outperformed both human readers and an nnU-Net trained on the BTCV dataset.
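The 5% reproducibility criterion can be sketched in a few lines. This is only an illustration: the abstract does not say which phase served as the reference, so the choice of the portal venous phase, the function names, and the example volumes are assumptions.

```python
def relative_deviation(volume_ml, reference_ml):
    """Signed deviation of a volume from a reference volume, in percent."""
    if reference_ml <= 0:
        raise ValueError("reference volume must be positive")
    return 100.0 * (volume_ml - reference_ml) / reference_ml


def is_reproducible(volumes_by_phase, reference_phase="portal_venous", margin_pct=5.0):
    """True if every non-reference phase deviates by at most `margin_pct`.

    The reference phase is an assumption made for this sketch.
    """
    reference = volumes_by_phase[reference_phase]
    return all(
        abs(relative_deviation(volume, reference)) <= margin_pct
        for phase, volume in volumes_by_phase.items()
        if phase != reference_phase
    )


# Hypothetical volumes in millilitres, not taken from the study.
liver = {"unenhanced": 1480.0, "arterial": 1510.0, "portal_venous": 1500.0}
gallbladder = {"unenhanced": 30.0, "arterial": 27.0, "portal_venous": 32.0}

liver_ok = is_reproducible(liver)
gallbladder_ok = is_reproducible(gallbladder)
```

With these made-up numbers the liver stays within the margin, while the gallbladder does not (30 ml against 32 ml is already a deviation of -6.25%). Small structures are penalised more heavily by a relative threshold, which is consistent with the structures the abstract lists as exceeding 5%.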

Cone-beam computed tomography (CBCT) image-quality improvement using a denoising diffusion probabilistic model conditioned by pseudo-CBCT of pelvic regions.

Hattori M, Chai H, Hiraka T, Suzuki K, Yuasa T

PubMed · Papers · Jun 1 2025
Cone-beam computed tomography (CBCT) is widely used in radiotherapy to image patient configuration before treatment, but its image quality is lower than that of planning CT due to scattering, motion, and reconstruction methods. This reduces the accuracy of Hounsfield units (HU) and limits its use in adaptive radiation therapy (ART). However, synthetic CT (sCT) generation using deep learning methods for CBCT intensity correction faces challenges due to deformation. To address these issues, we propose enhancing CBCT quality using a conditional denoising diffusion probabilistic model (CDDPM), which is trained on pseudo-CBCT created by adding pseudo-scatter to planning CT. The CDDPM transforms CBCT into high-quality sCT, improving HU accuracy while preserving anatomical configuration. The performance evaluation of the proposed sCT showed a reduction in mean absolute error (MAE) from 81.19 HU for CBCT to 24.89 HU for the sCT. Peak signal-to-noise ratio (PSNR) improved from 31.20 dB for CBCT to 33.81 dB for the sCT. The Dice and Jaccard coefficients between CBCT and sCT for the colon, prostate, and bladder ranged from 0.69 to 0.91. When compared to other deep learning models, the proposed sCT outperformed them in terms of accuracy and anatomical preservation. The dosimetry analysis for prostate cancer revealed a dose error of over 10% with CBCT but nearly 0% with the sCT. Gamma pass rates for the proposed sCT exceeded 90% for all dose criteria, indicating high agreement with CT-based dose distributions. These results show that the proposed sCT improves image quality, dosimetry accuracy, and treatment planning, advancing ART for pelvic cancer.
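For readers unfamiliar with the metrics quoted above, here is a minimal sketch of their standard definitions on flat lists of voxel values and sets of voxel indices. The abstract does not give the intensity range used for PSNR or any masking details, so `data_range` and the toy inputs are assumptions.

```python
import math


def mean_absolute_error(pred, ref):
    """Mean absolute difference between two equally long voxel lists (e.g. in HU)."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def psnr(pred, ref, data_range):
    """Peak signal-to-noise ratio in dB; `data_range` is an assumed intensity span."""
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


def dice(a, b):
    """Dice coefficient between two sets of voxel indices."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard coefficient (intersection over union) between two voxel sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy inputs, not taken from the study.
toy_mae = mean_absolute_error([0.0, 10.0], [0.0, 20.0])
toy_psnr = psnr([0.0, 10.0], [0.0, 20.0], data_range=100.0)
toy_dice = dice({1, 2, 3}, {2, 3, 4})
toy_jaccard = jaccard({1, 2, 3}, {2, 3, 4})
```

Dice and Jaccard are monotonically related by J = D / (2 - D), so for the same pair of masks the Jaccard value is never higher than the Dice value.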

Quantitative analysis of ureteral jets with dynamic magnetic resonance imaging and a deep-learning approach.

Wu M, Zeng W, Li Y, Ni C, Zhang J, Kong X, Zhang JL

PubMed · Papers · Jun 1 2025
To develop a dynamic MRU protocol that focuses on the bladder to capture ureteral jets and to automatically estimate the frequency and duration of ureteral jets from the dynamic images. Between February and July 2023, we collected 51 sets of dynamic MRU data from 5 healthy subjects. To capture the entire longitudinal trajectory of ureteral jets, we optimized the orientation and thickness of the imaging slice for dynamic MRU, and developed a deep-learning method to automatically estimate the frequency and duration of ureteral jets from the dynamic images. Among the 15 sets of images with different slice positioning, the positioning with a slice thickness of 25 mm and an orientation of 30° was optimal. Of the 36 sets of dynamic images acquired with the optimal protocol, 27 sets (2529 images) were used to train a U-Net model for automatically detecting the presence of ureteral jets. On the other 9 sets (760 images), the accuracy of the trained model was 84.9%. Based on the results of automatic detection, the frequency of ureteral jets in each set of dynamic images was estimated as 8.0 ± 1.4 min⁻¹, deviating from the reference by -3.3% ± 10.0%; the duration of each individual ureteral jet was estimated as 7.3 ± 2.8 s, deviating from the reference by 2.4% ± 32.2%. The cumulative duration of ureteral jets estimated by the method correlated well (coefficient of 0.936) with the bladder expansion recorded in the dynamic images. The proposed method was capable of quantitatively characterizing ureteral jets, potentially providing valuable information on the functional status of ureteral peristalsis.
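One plausible way to turn per-frame jet detections into frequency and duration estimates is to treat each run of consecutive positive frames as one jet. The abstract does not describe the actual post-processing or the frame interval, so the run-based grouping, the function names, and the 1.5 s interval below are assumptions.

```python
def jet_events(frame_labels):
    """Return (start, end_exclusive) index pairs for runs of positive frames."""
    events = []
    start = None
    for index, label in enumerate(frame_labels):
        if label and start is None:
            start = index  # a new jet begins
        elif not label and start is not None:
            events.append((start, index))  # the current jet has ended
            start = None
    if start is not None:
        events.append((start, len(frame_labels)))  # jet still active at the end
    return events


def jet_statistics(frame_labels, frame_interval_s):
    """Return (jets per minute, list of jet durations in seconds)."""
    events = jet_events(frame_labels)
    total_minutes = len(frame_labels) * frame_interval_s / 60.0
    durations = [(end - start) * frame_interval_s for start, end in events]
    frequency = len(events) / total_minutes if total_minutes > 0 else 0.0
    return frequency, durations


# Hypothetical per-frame detections (1 = jet present), 1.5 s between frames.
toy_labels = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
toy_frequency, toy_durations = jet_statistics(toy_labels, frame_interval_s=1.5)
```

In this sketch a single misclassified frame can split one jet into two or merge two jets into one, which would affect individual durations more than the overall count.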

Semi-supervised spatial-frequency transformer for metal artifact reduction in maxillofacial CT and evaluation with intraoral scan.

Li Y, Ma C, Li Z, Wang Z, Han J, Shan H, Liu J

PubMed · Papers · Jun 1 2025
To develop a semi-supervised domain adaptation technique for metal artifact reduction with a spatial-frequency transformer (SFTrans) model (Semi-SFTrans), and to quantitatively compare its performance with supervised models (Sup-SFTrans and ResUNet) and traditional linear interpolation MAR method (LI) in oral and maxillofacial CT. Supervised models, including Sup-SFTrans and a state-of-the-art model termed ResUNet, were trained with paired simulated CT images, while semi-supervised model, Semi-SFTrans, was trained with both paired simulated and unpaired clinical CT images. For evaluation on the simulated data, we calculated Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) on the images corrected by four methods: LI, ResUNet, Sup-SFTrans, and Semi-SFTrans. For evaluation on the clinical data, we collected twenty-two clinical cases with real metal artifacts, and the corresponding intraoral scan data. Three radiologists visually assessed the severity of artifacts using Likert scales on the original, Sup-SFTrans-corrected, and Semi-SFTrans-corrected images. Quantitative MAR evaluation was conducted by measuring Mean Hounsfield Unit (HU) values, standard deviations, and Signal-to-Noise Ratios (SNRs) across Regions of Interest (ROIs) such as the tongue, bilateral buccal, lips, and bilateral masseter muscles, using paired t-tests and Wilcoxon signed-rank tests. Further, teeth integrity in the corrected images was assessed by comparing teeth segmentation results from the corrected images against the ground-truth segmentation derived from registered intraoral scan data, using Dice Score and Hausdorff Distance. Sup-SFTrans outperformed LI, ResUNet and Semi-SFTrans on the simulated dataset. 
Visual assessments from the radiologists showed that average scores were 2.02 ± 0.91 for original CT, 4.46 ± 0.51 for Semi-SFTrans CT, and 3.64 ± 0.90 for Sup-SFTrans CT, with intraclass correlation coefficients (ICCs) > 0.8 for all groups and p < 0.001 between groups. On soft tissue, both Semi-SFTrans and Sup-SFTrans significantly reduced metal artifacts in the tongue (p < 0.001), lips, bilateral buccal regions, and masseter muscle areas (p < 0.05). Semi-SFTrans achieved better metal artifact reduction than Sup-SFTrans in all ROIs (p < 0.001). SNR results indicated significant differences between Semi-SFTrans and Sup-SFTrans in the tongue (p = 0.0391), bilateral buccal regions (p = 0.0067), lips (p = 0.0208), and bilateral masseter muscle areas (p = 0.0031). Notably, Semi-SFTrans demonstrated better teeth integrity preservation than Sup-SFTrans (Dice Score: p < 0.001; Hausdorff Distance: p = 0.0022). The semi-supervised MAR model, Semi-SFTrans, demonstrated superior metal artifact reduction performance over its supervised counterparts in real dental CT images.

Conversion of Mixed-Language Free-Text CT Reports of Pancreatic Cancer to National Comprehensive Cancer Network Structured Reporting Templates by Using GPT-4.

Kim H, Kim B, Choi MH, Choi JI, Oh SN, Rha SE

PubMed · Papers · Jun 1 2025
To evaluate the feasibility of generative pre-trained transformer-4 (GPT-4) in generating structured reports (SRs) from mixed-language (English and Korean) narrative-style CT reports for pancreatic ductal adenocarcinoma (PDAC) and to assess its accuracy in categorizing PDAC resectability. This retrospective study included consecutive free-text reports of pancreas-protocol CT for staging PDAC, from two institutions, written in English or Korean from January 2021 to December 2023. Both the GPT-4 Turbo and GPT-4o models were provided prompts along with the free-text reports via an application programming interface and tasked with generating SRs and categorizing tumor resectability according to the National Comprehensive Cancer Network (NCCN) guidelines version 2.2024. Prompts were optimized using the GPT-4 Turbo model and 50 reports from Institution B. The performances of the GPT-4 Turbo and GPT-4o models in the two tasks were evaluated using 115 reports from Institution A. Results were compared with a reference standard that was manually derived by an abdominal radiologist. Each report was consecutively processed three times, with the most frequent response selected as the final output. Error analysis was guided by the decision rationale provided by the models. Of the 115 narrative reports tested, 96 (83.5%) contained both English and Korean. For SR generation, GPT-4 Turbo and GPT-4o demonstrated comparable accuracies (92.3% [1592/1725] and 92.2% [1590/1725], respectively; P = 0.923). In the resectability categorization, GPT-4 Turbo showed higher accuracy than GPT-4o (81.7% [94/115] vs. 67.0% [77/115], respectively; P = 0.002). In the error analysis of GPT-4 Turbo, the SR generation error rate was 7.7% (133/1725 items), which was primarily attributed to inaccurate data extraction (54.1% [72/133]). The resectability categorization error rate was 18.3% (21/115), with the main cause being violation of the resectability criteria (61.9% [13/21]).
Both GPT-4 Turbo and GPT-4o demonstrated acceptable accuracy in generating NCCN-based SRs on PDACs from mixed-language narrative reports. However, oversight by human radiologists is essential for determining resectability based on CT findings.
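The "most frequent of three responses" rule described in this abstract amounts to a simple majority vote. The sketch below is an illustration only: the abstract does not say how a three-way disagreement was resolved, so returning `None` for a tie is an assumption, and the category strings are placeholders.

```python
from collections import Counter


def majority_vote(responses):
    """Return the most frequent response, or None when the top count is tied.

    Returning None on a tie is an assumption made for this sketch; it marks
    the report as having no clear majority.
    """
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    top_count = max(counts.values())
    winners = [response for response, count in counts.items() if count == top_count]
    return winners[0] if len(winners) == 1 else None


# Placeholder category labels for three runs on the same report.
agreed = majority_vote(["resectable", "borderline resectable", "resectable"])
split = majority_vote(["resectable", "borderline resectable", "locally advanced"])
```

Here `agreed` resolves to `"resectable"`, while `split` has no majority and comes back as `None`.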

Leveraging GPT-4 enables patient comprehension of radiology reports.

van Driel MHE, Blok N, van den Brand JAJG, van de Sande D, de Vries M, Eijlers B, Smits F, Visser JJ, Gommers D, Verhoef C, van Genderen ME, Grünhagen DJ, Hilling DE

PubMed · Papers · Jun 1 2025
To assess the feasibility of using GPT-4 to simplify radiology reports into B1-level Dutch for enhanced patient comprehension. This study utilised GPT-4, optimised through prompt engineering in Microsoft Azure. The researchers iteratively refined prompts to ensure accurate and comprehensive translations of radiology reports. Two radiologists assessed the simplified outputs for accuracy, completeness, and patient suitability. A third radiologist independently validated the final versions. Twelve colorectal cancer patients were recruited from two hospitals in the Netherlands. Semi-structured interviews were conducted to evaluate patients' comprehension and satisfaction with AI-generated reports. The optimised GPT-4 tool produced simplified reports with high accuracy (mean score 3.33/4). Patient comprehension improved significantly from 2.00 (original reports) to 3.28 (simplified reports) and 3.50 (summaries). Correct classification of report outcomes increased from 63.9% to 83.3%. Patient satisfaction was high (mean 8.30/10), with most preferring the long simplified report. RADiANT successfully enhances patient understanding and satisfaction through automated AI-driven report simplification, offering a scalable solution for patient-centred communication in clinical practice. This tool reduces clinician workload and supports informed patient decision-making, demonstrating the potential of LLMs beyond English-based healthcare contexts.

An explainable adaptive channel weighting-based deep convolutional neural network for classifying renal disorders in computed tomography images.

Loganathan G, Palanivelan M

PubMed · Papers · Jun 1 2025
Renal disorders are a significant public health concern and a cause of mortality related to renal failure. Manual diagnosis is subjective, labor-intensive, and depends on the expertise of nephrologists in renal anatomy. To improve workflow efficiency and enhance diagnostic accuracy, we propose an automated deep learning model, called EACWNet, which combines an adaptive channel weighting-based deep convolutional neural network with explainable artificial intelligence. The proposed model categorizes renal computed tomography images into four classes: cyst, normal, tumor, and stone. The adaptive channel weighting module uses both global and local contextual information to refine the channel weights of the final feature map by integrating a scale-adaptive channel attention module into the higher convolutional blocks of the VGG-19 backbone. The efficacy of the EACWNet model was assessed on a publicly available renal CT image dataset, attaining an accuracy of 98.87%, a 1.75% improvement over the backbone model. However, the model exhibits class-wise variation in precision, achieving higher precision for cyst, normal, and tumor cases but lower precision for the stone class due to its inherent variability and heterogeneity. Furthermore, the model predictions were analyzed using an explainable artificial intelligence method, local interpretable model-agnostic explanations (LIME), to better visualize and understand them.

Deep learning for liver lesion segmentation and classification on staging CT scans of colorectal cancer patients: a multi-site technical validation study.

Bashir U, Wang C, Smillie R, Rayabat Khan AK, Tamer Ahmed H, Ordidge K, Power N, Gerlinger M, Slabaugh G, Zhang Q

PubMed · Papers · Jun 1 2025
To validate a liver lesion detection and classification model using staging computed tomography (CT) scans of colorectal cancer (CRC) patients. A UNet-based deep learning model was trained on 272 public liver tumour CT scans and tested on 220 CRC staging CTs acquired from a single institution (2014-2019). Performance metrics included lesion detection rates by size (<10 mm, 10-20 mm, >20 mm), segmentation accuracy (dice similarity coefficient, DSC), volume measurement agreement (Bland-Altman limits of agreement, LOAs; intraclass correlation coefficient, ICC), and classification accuracy (malignant vs benign) at patient and lesion levels (detected lesions only). The model detected 743 out of 884 lesions (84%), with detection rates of 75%, 91.3%, and 96% for lesions <10 mm, 10-20 mm, and >20 mm, respectively. The median DSC was 0.76 (95% CI: 0.72-0.80) for lesions <10 mm, 0.83 (95% CI: 0.79-0.86) for 10-20 mm, and 0.85 (95% CI: 0.82-0.88) for >20 mm. Bland-Altman analysis showed a mean volume bias of -0.12 cm³ (LOAs: -1.68 to +1.43 cm³), and ICC was 0.81. Lesion-level classification showed 99.5% sensitivity, 65.7% specificity, 53.8% positive predictive value (PPV), 99.7% negative predictive value (NPV), and 75.4% accuracy. Patient-level classification had 100% sensitivity, 27.1% specificity, 59.2% PPV, 100% NPV, and 64.5% accuracy. The model demonstrates strong lesion detection and segmentation performance, particularly for sub-centimetre lesions. Although classification accuracy was moderate, the 100% NPV suggests strong potential as a CRC staging screening tool. Future studies will assess its impact on radiologist performance and efficiency.
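The classification figures above all derive from a 2×2 confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and chosen only to show how 100% sensitivity and 100% NPV can coexist with low specificity, as in the patient-level result. The abstract does not report the underlying counts.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # malignant cases correctly flagged
        "specificity": ratio(tn, tn + fp),  # benign cases correctly cleared
        "ppv": ratio(tp, tp + fp),          # flagged cases that are malignant
        "npv": ratio(tn, tn + fn),          # cleared cases that are benign
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical counts, not taken from the study.
toy_metrics = diagnostic_metrics(tp=50, fp=30, tn=20, fn=0)
```

With no false negatives, sensitivity and NPV are both 1.0 regardless of how many false positives occur, so a perfect NPV says nothing about how often benign cases are flagged; that is captured by specificity (0.4 here) and PPV (0.625).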

Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists.

Sheng L, Chen Y, Wei H, Che F, Wu Y, Qin Q, Yang C, Wang Y, Peng J, Bashir MR, Ronot M, Song B, Jiang H

PubMed · Papers · Jun 1 2025
Whether large language models (LLMs) could be integrated into the diagnostic workflow of focal liver lesions (FLLs) remains unclear. We aimed to investigate two generic LLMs (ChatGPT-4o and Gemini) regarding their diagnostic accuracies referring to the CT/MRI reports, compared to and combined with radiologists of different experience levels. From April 2022 to April 2024, this single-center retrospective study included consecutive adult patients who underwent contrast-enhanced CT/MRI for single FLL and subsequent histopathologic examination. The LLMs were prompted by clinical information and the "findings" section of radiology reports three times to provide differential diagnoses in the descending order of likelihood, with the first considered the final diagnosis. In the research setting, six radiologists (three junior and three middle-level) independently reviewed the CT/MRI images and clinical information in two rounds (first alone, then with LLM assistance). In the clinical setting, diagnoses were retrieved from the "impressions" section of radiology reports. Diagnostic accuracy was investigated against histopathology. 228 patients (median age, 59 years; 155 males) with 228 FLLs (median size, 3.6 cm) were included. Regarding the final diagnosis, the accuracy of two-step ChatGPT-4o (78.9%) was higher than single-step ChatGPT-4o (68.0%, p < 0.001) and single-step Gemini (73.2%, p = 0.004), similar to real-world radiology reports (80.0%, p = 0.34) and junior radiologists (78.9%-82.0%; p-values, 0.21 to > 0.99), but lower than middle-level radiologists (84.6%-85.5%; p-values, 0.001 to 0.02). No incremental diagnostic value of ChatGPT-4o was observed for any radiologist (p-values, 0.63 to > 0.99). Two-step ChatGPT-4o showed matching accuracies to real-world radiology reports and junior radiologists for diagnosing FLLs but was less accurate than middle-level radiologists and demonstrated little incremental diagnostic value.
