Page 25 of 99990 results

A Multimodal Large Language Model as an End-to-End Classifier of Thyroid Nodule Malignancy Risk: Usability Study.

Sng GGR, Xiang Y, Lim DYZ, Tung JYM, Tan JH, Chng CL

PubMed · Aug 19 2025
Thyroid nodules are common, with ultrasound imaging as the primary modality for their assessment. Risk stratification systems like the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS) have been developed but suffer from interobserver variability and low specificity. Artificial intelligence, particularly large language models (LLMs) with multimodal capabilities, presents opportunities for efficient end-to-end diagnostic processes. However, their clinical utility remains uncertain. This study evaluates the accuracy and consistency of multimodal LLMs for thyroid nodule risk stratification using the ACR TI-RADS system, examining the effects of model fine-tuning, image annotation, and prompt engineering, and comparing open-source and commercial models. In total, 3 multimodal vision-language models were evaluated: the open-source Large Language and Vision Assistant (LLaVA) model, its medically fine-tuned variant (Large Language and Vision Assistant for bioMedicine [LLaVA-Med]), and OpenAI's commercial o3 model. A total of 192 thyroid nodules from publicly available ultrasound image datasets were assessed. Each model was evaluated using 2 prompts (basic and modified) and 2 image scenarios (unlabeled vs radiologist-annotated), yielding 6912 responses. Model outputs were compared with expert ratings for accuracy and consistency. Statistical comparisons included Chi-square tests, Mann-Whitney U tests, and Fleiss' kappa for interrater reliability. Overall, 88.4% (6110/6912) of responses were valid, with the o3 model producing the highest validity rate (2273/2304, 98.6%), followed by LLaVA (2108/2304, 91.5%) and LLaVA-Med (1729/2304, 75.0%; P<.001). The o3 model demonstrated the highest accuracy overall, achieving up to 57.3% accuracy in TI-RADS classification, though still suboptimal.
Labeled images improved accuracy marginally, and only in nodule margin assessment for the LLaVA models (407/768, 53.0% to 447/768, 58.2%; P=.04). Prompt engineering improved accuracy for composition (649/1152, 56.3% vs 483/1152, 41.9%; P<.001) but significantly reduced accuracy for shape, margins, and overall classification. Consistency was highest with the o3 model (up to 85.4%) but comparable for LLaVA, and it improved significantly with image labeling and modified prompts across multiple TI-RADS categories (P<.001). Subgroup analysis for o3 alone showed that prompt engineering did not significantly affect accuracy but markedly improved consistency across all TI-RADS categories (up to 97.1% for shape; P<.001). Interrater reliability was consistently poor across all combinations (Fleiss' kappa<0.60). The study demonstrates the comparative advantages and limitations of multimodal LLMs for thyroid nodule risk stratification. While the commercial model (o3) consistently outperformed the open-source models in accuracy and consistency, even the best-performing model outputs remained suboptimal for direct clinical deployment. Prompt engineering significantly enhanced output consistency, particularly in the commercial model. These findings underline the importance of strategic model optimization techniques and highlight areas requiring further development before multimodal LLMs can be reliably used in clinical thyroid imaging workflows.
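The interrater-reliability analysis above uses Fleiss' kappa. Purely as an illustration of the statistic (not the authors' code), a minimal NumPy sketch, assuming ratings are tabulated as per-item category counts with a fixed number of raters per item:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for an (n_items, n_categories) matrix of rating counts.

    Each row holds, for one item, how many raters chose each category;
    every row must sum to the same number of raters.
    """
    counts = np.asarray(counts, dtype=float)
    n_items, _ = counts.shape
    n_raters = counts.sum(axis=1)[0]
    # Per-item observed agreement P_i
    p_i = np.sum(counts * (counts - 1), axis=1) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from marginal category proportions
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.sum(p_j ** 2)
    return (p_bar - p_e) / (1 - p_e)
```

Perfect agreement yields kappa = 1 and systematic disagreement drives it negative, which is why values below 0.60 are read as poor reliability.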

Automated surgical planning with nnU-Net: delineation of the anatomy in hepatobiliary phase MRI

Karin A. Olthof, Matteo Fusagli, Bianca Güttner, Tiziano Natali, Bram Westerink, Stefanie Speidel, Theo J. M. Ruers, Koert F. D. Kuhlmann, Andrey Zhylka

arXiv preprint · Aug 19 2025
Background: The aim of this study was to develop and evaluate a deep learning-based automated segmentation method for hepatic anatomy (i.e., parenchyma, tumors, portal vein, hepatic vein, and biliary tree) from the hepatobiliary phase of gadoxetic acid-enhanced MRI. This method should ease the clinical workflow of preoperative planning. Methods: Manual segmentation was performed on hepatobiliary phase MRI scans from 90 consecutive patients who underwent liver surgery between January 2020 and October 2023. A deep learning network (nnU-Net v1) was trained on 72 patients with an extra focus on thin structures and topography preservation. Performance was evaluated on an 18-patient test set by comparing automated and manual segmentations using the Dice similarity coefficient (DSC). Following clinical integration, 10 segmentations (assessment dataset) were generated using the network and manually refined for clinical use to quantify required adjustments using DSC. Results: In the test set, DSCs were 0.97 ± 0.01 for liver parenchyma, 0.80 ± 0.04 for hepatic vein, 0.79 ± 0.07 for biliary tree, 0.77 ± 0.17 for tumors, and 0.74 ± 0.06 for portal vein. Average tumor detection rate was 76.6 ± 24.1%, with a median of one false positive per patient. The assessment dataset showed that only minor adjustments were required for clinical use of the 3D models, with high DSCs for parenchyma (1.00 ± 0.00), portal vein (0.98 ± 0.01) and hepatic vein (0.95 ± 0.07). Tumor segmentation exhibited greater variability (DSC 0.80 ± 0.27). During prospective clinical use, the model detected three additional tumors initially missed by radiologists. Conclusions: The proposed nnU-Net-based segmentation method enables accurate and automated delineation of hepatic anatomy. This enables 3D planning to be applied efficiently as a standard of care for every patient undergoing liver surgery.
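The DSC values reported above follow the standard overlap definition, 2|A∩B| / (|A| + |B|). A minimal sketch on binary masks (illustrative only, not the study's pipeline):

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:  # both masks empty: conventionally score as perfect
        return 1.0
    return 2.0 * np.logical_and(pred, gt).sum() / denom
```

A DSC of 0.97 for parenchyma therefore means near-total voxel overlap, while 0.74 for the portal vein reflects how unforgiving the metric is for thin structures.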

State of Abdominal CT Datasets: A Critical Review of Bias, Clinical Relevance, and Real-world Applicability

Saeide Danaei, Zahra Dehghanian, Elahe Meftah, Nariman Naderi, Seyed Amir Ahmad Safavi-Naini, Faeze Khorasanizade, Hamid R. Rabiee

arXiv preprint · Aug 19 2025
This systematic review critically evaluates publicly available abdominal CT datasets and their suitability for artificial intelligence (AI) applications in clinical settings. We examined 46 publicly available abdominal CT datasets (50,256 studies). Across all 46 datasets, we found substantial redundancy (59.1% case reuse) and a Western geographic skew (75.3% from North America and Europe). A bias assessment was performed on the 19 datasets with >=100 cases; within this subset, the most prevalent high-risk categories were domain shift (63%) and selection bias (57%), both of which may undermine model generalizability across diverse healthcare environments, particularly in resource-limited settings. To address these challenges, we propose targeted strategies for dataset improvement, including multi-institutional collaboration, adoption of standardized protocols, and deliberate inclusion of diverse patient populations and imaging technologies. These efforts are crucial in supporting the development of more equitable and clinically robust AI models for abdominal imaging.
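The 59.1% case-reuse figure is a cross-dataset duplication rate. The review's actual matching methodology is its own; purely to illustrate the idea, a sketch that counts entries whose case IDs (hypothetical here) were already seen in an earlier dataset:

```python
def case_reuse_rate(datasets):
    """Fraction of entries across all datasets that duplicate a case
    already seen in a previously scanned dataset.

    `datasets` is an iterable of per-dataset iterables of case IDs.
    """
    seen, total, reused = set(), 0, 0
    for ids in datasets:
        for cid in ids:
            total += 1
            if cid in seen:
                reused += 1
            else:
                seen.add(cid)
    return reused / total if total else 0.0
```

In practice, identifying reuse requires matching on metadata or image hashes rather than shared IDs, which is considerably harder than this sketch suggests.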

TME-guided deep learning predicts chemotherapy and immunotherapy response in gastric cancer with attention-enhanced residual Swin Transformer.

Sang S, Sun Z, Zheng W, Wang W, Islam MT, Chen Y, Yuan Q, Cheng C, Xi S, Han Z, Zhang T, Wu L, Li W, Xie J, Feng W, Chen Y, Xiong W, Yu J, Li G, Li Z, Jiang Y

PubMed · Aug 19 2025
Adjuvant chemotherapy and immune checkpoint blockade can produce durable anti-tumor responses, but the lack of effective biomarkers limits their therapeutic benefit. Using multiple cohorts totaling 3,095 patients with gastric cancer, we propose an attention-enhanced residual Swin Transformer network to predict chemotherapy response (the main task), with two prediction subtasks (ImmunoScore and periostin [POSTN]) used as intermediate tasks to improve the model's performance. Furthermore, we assess whether the model can identify which patients would benefit from immunotherapy. The deep learning model achieves high accuracy in predicting chemotherapy response and the tumor microenvironment (ImmunoScore and POSTN). We further find that the model can identify which patients may benefit from checkpoint blockade immunotherapy. This approach offers precise chemotherapy and immunotherapy response predictions, opening avenues for personalized treatment. Prospective studies are warranted to validate its clinical utility.
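Training with intermediate subtasks typically means optimizing a weighted sum of per-task losses. The abstract does not state the paper's exact weighting scheme; the following is only a generic sketch of the pattern, with placeholder weights:

```python
def multitask_loss(main_loss, aux_losses, aux_weights):
    """Weighted sum of a main-task loss and auxiliary-task losses.

    aux_weights are tunable hyperparameters balancing the subtasks
    (here, hypothetically, ImmunoScore and POSTN prediction) against
    the main chemotherapy-response task.
    """
    if len(aux_losses) != len(aux_weights):
        raise ValueError("one weight per auxiliary loss")
    return main_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))
```

The design intuition is that gradients from biologically meaningful subtasks regularize the shared backbone, so the main-task head learns features grounded in the tumor microenvironment.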

Longitudinal CE-MRI-based Siamese network with machine learning to predict tumor response in HCC after DEB-TACE.

Wei N, Mathy RM, Chang DH, Mayer P, Liermann J, Springfeld C, Dill MT, Longerich T, Lurje G, Kauczor HU, Wielpütz MO, Öcal O

PubMed · Aug 19 2025
Accurate prediction of tumor response after drug-eluting beads transarterial chemoembolization (DEB-TACE) remains challenging in hepatocellular carcinoma (HCC), given tumor heterogeneity and dynamic changes over time. Existing prediction models based on single-timepoint imaging do not capture dynamic treatment-induced changes. This study aims to develop and validate a predictive model that integrates deep learning and machine learning algorithms on longitudinal contrast-enhanced MRI (CE-MRI) to predict treatment response in HCC patients undergoing DEB-TACE. This retrospective study included 202 HCC patients treated with DEB-TACE from 2004 to 2023, divided into a training cohort (n = 141) and a validation cohort (n = 61). Radiomics and deep learning features were extracted from standardized longitudinal CE-MRI to capture dynamic tumor changes. Feature selection involved correlation analysis, minimum redundancy maximum relevance, and least absolute shrinkage and selection operator regression. The patients were categorized into two groups: the objective response group (n = 123, 60.9%; complete response = 35, 28.5%; partial response = 88, 71.5%) and the non-response group (n = 79, 39.1%; stable disease = 62, 78.5%; progressive disease = 17, 21.5%). Predictive models were constructed using radiomics, deep learning, and integrated features. The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance. We retrospectively evaluated 202 patients (62.67 ± 9.25 years old) with HCC treated with DEB-TACE. A total of 7,182 radiomics features and 4,096 deep learning features were extracted from the longitudinal CE-MRI images.
The integrated model was developed using 13 quantitative radiomics features and 4 deep learning features and demonstrated robust performance, with an AUC of 0.941 (95% CI: 0.893–0.989) in the training cohort, and an AUC of 0.925 (95% CI: 0.850–0.998) with accuracy of 86.9%, sensitivity of 83.7%, and specificity of 94.4% in the validation set. This study presents a predictive model based on longitudinal CE-MRI data to estimate tumor response to DEB-TACE in HCC patients. By capturing tumor dynamics and integrating radiomics features with deep learning features, the model has the potential to guide individualized treatment strategies and inform clinical decision-making regarding patient management. The online version contains supplementary material available at 10.1186/s40644-025-00926-5.
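The AUC reported above is equivalent to the Mann-Whitney U probability that a randomly chosen responder is scored higher than a randomly chosen non-responder. A self-contained sketch of that rank formulation (illustrative, not the study's code):

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """AUC as the probability a random positive outscores a random
    negative, with ties counted as 0.5 (the Mann-Whitney formulation)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 0.925 thus means the model ranks a responder above a non-responder roughly 92.5% of the time, independent of any single decision threshold.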

CT-based auto-segmentation of multiple target volumes for all-in-one radiotherapy in rectal cancer patients.

Li X, Wang L, Yang M, Li X, Zhao T, Wang M, Lu S, Ji Y, Zhang W, Jia L, Peng R, Wang J, Wang H

PubMed · Aug 19 2025
This study aimed to evaluate the clinical feasibility and performance of CT-based auto-segmentation models integrated into an all-in-one (AIO) radiotherapy workflow for rectal cancer. The study included 312 rectal cancer patients: 272 were used to train three nnU-Net models for CTV45, CTV50, and GTV segmentation, and 40 for evaluation across one internal (n = 10), one clinical AIO (n = 10), and two external cohorts (n = 10 each). Segmentation accuracy (DSC, HD, HD95, ASSD, ASD) and time efficiency were assessed. In the internal testing set, mean DSC of CTV45, CTV50, and GTV were 0.90, 0.86, and 0.71; HD were 17.08, 25.48, and 79.59 mm; HD95 were 4.89, 7.33, and 56.49 mm; ASSD were 1.23, 1.90, and 6.69 mm; and ASD were 1.24, 1.58, and 11.61 mm. Auto-segmentation reduced manual delineation time by 63.3–88.3% (p < 0.0001). In clinical practice, average DSC of CTV45, CTV50, and GTV were 0.93, 0.88, and 0.78; HD were 13.56, 23.84, and 35.38 mm; HD95 were 3.33, 6.46, and 21.34 mm; ASSD were 0.78, 1.49, and 3.30 mm; and ASD were 0.74, 1.18, and 2.13 mm. Multi-center testing also supported the applicability of these models: the average DSC of CTV45 and GTV were 0.84 and 0.80, respectively. The models demonstrated high accuracy and clinical utility, effectively streamlining target volume delineation and reducing manual workload in routine practice. The study protocol was approved by the Institutional Review Board of Peking University Third Hospital (Approval No. (2024) Medical Ethics Review No. 182-01).
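HD95 and ASSD summarize boundary distances between automated and manual contours. A brute-force sketch on small 2-D point sets shows the definitions; real implementations operate on surface voxels with spacing-aware distance transforms, so treat this only as the underlying idea:

```python
import numpy as np

def _nearest_dists(a, b):
    """For each point in a, Euclidean distance to its nearest point in b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1)

def hd95(a, b):
    """95th-percentile symmetric Hausdorff distance between point sets."""
    return max(np.percentile(_nearest_dists(a, b), 95),
               np.percentile(_nearest_dists(b, a), 95))

def assd(a, b):
    """Average symmetric surface distance between point sets."""
    d_ab, d_ba = _nearest_dists(a, b), _nearest_dists(b, a)
    return (d_ab.sum() + d_ba.sum()) / (len(d_ab) + len(d_ba))
```

Taking the 95th percentile rather than the maximum is what makes HD95 robust to a few outlier boundary points, which is why it sits well below the plain HD values above.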

Advanced liver fibrosis detection using a two-stage deep learning approach on standard T2-weighted MRI.

Gupta P, Singh S, Gulati A, Dutta N, Aggarwal Y, Kalra N, Premkumar M, Taneja S, Verma N, De A, Duseja A

PubMed · Aug 19 2025
To develop and validate a deep learning model for automated detection of advanced liver fibrosis using standard T2-weighted MRI. We utilized two datasets: the public CirrMRI600+ dataset (n = 374), containing T2-weighted MRI scans from patients with cirrhosis (n = 318) and healthy subjects (n = 56), and an in-house dataset of chronic liver disease patients (n = 187). A two-stage deep learning pipeline was developed: first, an automated liver segmentation model using the nnU-Net architecture, trained on CirrMRI600+ and then applied to segment livers in our in-house dataset; second, a Masked Attention ResNet model for classification. For classification model training, patients with liver stiffness measurement (LSM) > 12 kPa were classified as advanced fibrosis (n = 104), while healthy subjects from CirrMRI600+ and patients with LSM ≤ 12 kPa were classified as non-advanced fibrosis (n = 116). Model validation was performed exclusively on a separate test set of 23 patients with histopathological confirmation of the degree of fibrosis (METAVIR ≥ F3 indicating advanced fibrosis). We additionally compared our two-stage approach with direct classification without segmentation and evaluated alternative architectures, including DenseNet121 and SwinTransformer. The liver segmentation model performed excellently on the test set (mean Dice score: 0.960 ± 0.009; IoU: 0.923 ± 0.016). On the pathologically confirmed independent test set (n = 23), our two-stage model achieved strong diagnostic performance (sensitivity: 0.778, specificity: 0.800, AUC: 0.811, accuracy: 0.783), significantly outperforming direct classification without segmentation (AUC: 0.743). Classification performance was highly dependent on segmentation quality: cases with excellent segmentation (Score 1) showed higher accuracy (0.818) than those with poor segmentation (Score 3; accuracy: 0.625).
Alternative architectures with masked attention showed comparable but slightly lower performance (DenseNet121: AUC 0.795; SwinTransformer: AUC 0.782). Our fully automated deep learning pipeline effectively detects advanced liver fibrosis using standard non-contrast T2-weighted MRI, potentially offering a non-invasive alternative to current diagnostic approaches. The segmentation-first approach provides significant performance gains over direct classification.
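The sensitivity/specificity/accuracy triplet reported for the 23-patient test set comes directly from the 2x2 confusion matrix. A small illustrative sketch for paired binary predictions and ground truth:

```python
def binary_metrics(pred, truth):
    """Sensitivity, specificity, and accuracy from paired binary labels."""
    tp = sum(p and t for p, t in zip(pred, truth))
    tn = sum((not p) and (not t) for p, t in zip(pred, truth))
    fp = sum(p and (not t) for p, t in zip(pred, truth))
    fn = sum((not p) and t for p, t in zip(pred, truth))
    sensitivity = tp / (tp + fn)   # advanced-fibrosis cases caught
    specificity = tn / (tn + fp)   # non-advanced cases correctly cleared
    accuracy = (tp + tn) / len(truth)
    return sensitivity, specificity, accuracy
```

With only 23 pathologically confirmed cases, each misclassification moves these rates by several percentage points, which is worth remembering when comparing the architectures above.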

A systematic review of comparisons of AI and radiologists in the diagnosis of HCC in multiphase CT: implications for practice.

Younger J, Morris E, Arnold N, Athulathmudali C, Pinidiyapathirage J, MacAskill W

PubMed · Aug 18 2025
This systematic review examines the literature on artificial intelligence (AI) algorithms for the diagnosis of hepatocellular carcinoma (HCC) among focal liver lesions on multiphase CT images, compared with radiologists, focusing on studies reporting at least sensitivity and specificity. We searched Embase, PubMed, and Web of Science for studies published from January 2018 to May 2024. Eligible studies evaluated AI algorithms for diagnosing HCC using multiphase CT, with radiologist interpretation as a comparator. The performance of AI models and radiologists was recorded as sensitivity and specificity from each study. TRIPOD+AI was used for quality appraisal and PROBAST was used to assess the risk of bias. Seven of the 3532 studies reviewed were included. All seven studies analysed the performance of AI models and radiologists. Two studies additionally assessed performance with and without supplementary clinical information to assist the AI model in diagnosis. Three studies additionally evaluated the performance of radiologists assisted by the AI algorithm. The AI algorithms demonstrated a sensitivity of 63.0–98.6% and a specificity of 82.0–98.6%. In comparison, junior radiologists (less than 10 years of experience) exhibited a sensitivity of 41.2–92.0% and a specificity of 72.2–100%, while senior radiologists (more than 10 years of experience) achieved a sensitivity of 63.9–93.7% and a specificity of 71.9–99.9%. AI algorithms demonstrate adequate performance in the diagnosis of HCC from focal liver lesions on multiphase CT images. Across geographic settings, AI could help streamline workflows and improve access to timely diagnosis. However, thoughtful implementation strategies are still needed to mitigate bias and overreliance.

Machine learning driven diagnostic pathway for clinically significant prostate cancer: the role of micro-ultrasound.

Saitta C, Buffi N, Avolio P, Beatrici E, Paciotti M, Lazzeri M, Fasulo V, Cella L, Garofano G, Piccolini A, Contieri R, Nazzani S, Silvani C, Catanzaro M, Nicolai N, Hurle R, Casale P, Saita A, Lughezzani G

PubMed · Aug 18 2025
Detecting clinically significant prostate cancer (csPCa) remains a top priority in delivering high-quality care, yet consensus on an optimal diagnostic pathway is constantly evolving. In this study, we present an innovative diagnostic approach, leveraging a machine learning model tailored to the emerging role of prostate micro-ultrasound (micro-US) in csPCa diagnosis. We queried our prospective database for patients who underwent micro-US for clinical suspicion of prostate cancer. CsPCa was defined as any Gleason grade group > 1. The primary outcome was the development of a diagnostic pathway integrating clinical and radiological findings using a machine learning algorithm. The dataset was divided into training (70%) and testing subsets. The Boruta algorithm was used for variable selection; based on the importance coefficients, a multivariable logistic regression (MLR) model was then fitted to predict csPCa. A Classification and Regression Tree (CART) model was fitted to create the decision tree. Accuracy of the model was tested using receiver operating characteristic (ROC) analysis with the estimated area under the curve (AUC). Overall, 1422 patients were analysed. Multivariable LR revealed PRI-MUS score ≥ 3 (OR 4.37, p < 0.001), PI-RADS score ≥ 3 (OR 2.01, p < 0.001), PSA density ≥ 0.15 (OR 2.44, p < 0.001), DRE (OR 1.93, p < 0.001), anterior lesions (OR 1.49, p = 0.004), family history of prostate cancer (OR 1.54, p = 0.005), and increasing age (OR 1.031, p < 0.001) as the best predictors of csPCa, demonstrating an AUC in the validation cohort of 83%, with 78% sensitivity, 72.1% specificity, and 81% negative predictive value. CART analysis identified an elevated PRI-MUS score as the main node for stratifying the cohort. By integrating clinical features, serum biomarkers, and imaging findings, we have developed a point-of-care model that accurately predicts the presence of csPCa.
Our findings support a paradigm shift towards adopting micro-US as a first-level diagnostic tool for csPCa detection, potentially optimizing clinical decision-making. This approach could improve the identification of patients at higher risk of csPCa and guide the selection of the most appropriate diagnostic exams. External validation is essential to confirm these results.
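The ORs quoted above are exponentiated logistic-regression coefficients: OR = exp(beta). A tiny sketch of that relationship (the coefficient values and predictor names below are hypothetical, used only to show the conversion):

```python
import math

def odds_ratios(coefs):
    """Map logistic-regression coefficients beta to odds ratios exp(beta).

    An OR above 1 means the predictor raises the odds of csPCa;
    a coefficient of 0 corresponds to OR 1 (no effect).
    """
    return {name: math.exp(beta) for name, beta in coefs.items()}
```

This is why, for example, an OR of 4.37 for PRI-MUS ≥ 3 reads as a more than fourfold increase in the odds of clinically significant disease, holding the other predictors fixed.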

MCBL-UNet: A Hybrid Mamba-CNN Boundary Enhanced Light-weight UNet for Placenta Ultrasound Image Segmentation.

Jiang C, Zhu C, Guo H, Tan G, Liu C, Li K

PubMed · Aug 18 2025
The shape and size of the placenta are closely related to fetal development in the second and third trimesters of pregnancy. Accurately segmenting the placental contour in ultrasound images is challenging because of image noise, fuzzy boundaries, and tight clinical resources. To address these issues, we propose MCBL-UNet, a novel lightweight segmentation framework that combines the long-range modeling capabilities of Mamba with the local feature extraction strengths of convolutional neural networks (CNNs) to achieve efficient segmentation through multi-information fusion. Built on a compact 6-layer U-Net architecture, MCBL-UNet introduces several key modules: a boundary enhancement module (BEM) to extract fine-grained edge and texture features; a multi-dimensional global context module (MGCM) to capture global semantics and edge information in the deep stages of the encoder and decoder; and a parallel channel-spatial attention module (PCSAM) to suppress redundant information in skip connections while enhancing spatial and channel correlations. To further improve feature reconstruction and edge preservation, we introduce an attention downsampling module (ADM) and a content-aware upsampling module (CUM). MCBL-UNet achieves excellent segmentation performance on multiple medical ultrasound datasets (placenta, gestational sac, thyroid nodules). Using only 1.31M parameters and 1.26G FLOPs, the model outperforms 13 existing mainstream methods on key metrics such as the Dice coefficient and mIoU, striking a favorable balance between high accuracy and low computational cost. The model is not only suitable for resource-constrained clinical environments but also offers a new avenue for introducing the Mamba architecture into medical image segmentation.
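A 1.31M-parameter budget comes down to per-layer design choices. As a rough illustration of where convolutional parameters come from (generic formulas, not MCBL-UNet's actual layer inventory), compare a standard convolution with the depthwise-separable variant that lightweight architectures commonly use:

```python
def conv2d_params(in_ch, out_ch, k, bias=True):
    """Parameter count of a standard k x k 2-D convolution layer."""
    return out_ch * (in_ch * k * k + (1 if bias else 0))

def depthwise_separable_params(in_ch, out_ch, k, bias=True):
    """Depthwise k x k conv followed by a 1x1 pointwise conv: the usual
    substitution for cutting parameters in lightweight networks."""
    dw = in_ch * (k * k + (1 if bias else 0))        # one filter per channel
    pw = out_ch * (in_ch + (1 if bias else 0))       # 1x1 channel mixing
    return dw + pw
```

For a 64-to-128-channel 3x3 layer the standard form costs 73,856 parameters versus 8,960 for the separable form, which is the kind of saving that makes a six-layer U-Net fit in about a million parameters.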