Page 64 of 2342333 results

Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification

Xing Shen, Justin Szeto, Mingyang Li, Hengguan Huang, Tal Arbel

arXiv preprint, Jun 29, 2025
Multimodal large language models (MLLMs) have enormous potential to perform few-shot in-context learning in the context of medical image analysis. However, safe deployment of these models into real-world clinical practice requires an in-depth analysis of the accuracy of their predictions, and their associated calibration errors, particularly across different demographic subgroups. In this work, we present the first investigation into the calibration biases and demographic unfairness of MLLMs' predictions and confidence scores in few-shot in-context learning for medical image classification. We introduce CALIN, an inference-time calibration method designed to mitigate the associated biases. Specifically, CALIN estimates the amount of calibration needed, represented by calibration matrices, using a bi-level procedure: progressing from the population level to the subgroup level prior to inference. It then applies this estimation to calibrate the predicted confidence scores during inference. Experimental results on three medical imaging datasets (PAPILA for fundus image classification, HAM10000 for skin cancer classification, and MIMIC-CXR for chest X-ray classification) demonstrate CALIN's effectiveness at ensuring fair confidence calibration in its predictions, while improving overall prediction accuracy and exhibiting a minimal fairness-utility trade-off.
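The bi-level idea can be illustrated with a small sketch: a population-level calibration matrix is applied to the predicted confidence scores first, then a subgroup-level one, and the corrected scores are renormalized. The matrices and their composition here are illustrative assumptions, not CALIN's exact procedure.

```python
import numpy as np

def calibrate(p, w_pop, w_sub):
    """Apply a population-level then a subgroup-level calibration matrix
    to a vector of predicted confidence scores and renormalize."""
    q = w_sub @ (w_pop @ p)
    q = np.clip(q, 1e-12, None)   # keep scores strictly positive
    return q / q.sum()            # return a valid probability distribution

# Toy binary example: the population matrix softens an overconfident score.
p = np.array([0.9, 0.1])
w_pop = np.array([[0.8, 0.2],
                  [0.2, 0.8]])
w_sub = np.eye(2)                 # identity: no subgroup correction here
q = calibrate(p, w_pop, w_sub)    # softer than the raw [0.9, 0.1]
```

In the full method, a different subgroup-level matrix would be estimated per demographic subgroup before inference, so that the same raw score can receive different corrections across subgroups.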

Frequency-enhanced Multi-granularity Context Network for Efficient Vertebrae Segmentation

Jian Shi, Tianqi You, Pingping Zhang, Hongli Zhang, Rui Xu, Haojie Li

arXiv preprint, Jun 29, 2025
Automated and accurate segmentation of individual vertebrae in 3D CT and MRI images is essential for various clinical applications. Due to the limitations of current imaging techniques and the complexity of spinal structures, existing methods still struggle to reduce the impact of image blurring and to distinguish similar vertebrae. To alleviate these issues, we introduce a Frequency-enhanced Multi-granularity Context Network (FMC-Net) to improve the accuracy of vertebrae segmentation. Specifically, we first apply a wavelet transform for lossless downsampling to reduce feature distortion in blurred images. The decomposed high- and low-frequency components are then processed separately. For the high-frequency components, we apply a High-frequency Feature Refinement (HFR) module to amplify the prominence of key features and filter out noise, restoring fine-grained details in blurred images. For the low-frequency components, we use a Multi-granularity State Space Model (MG-SSM) to aggregate feature representations with different receptive fields, extracting spatially varying contexts while capturing long-range dependencies with linear complexity. The use of multi-granularity contexts is essential for distinguishing similar vertebrae and improving segmentation accuracy. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches on both CT and MRI vertebrae segmentation datasets. The source code is publicly available at https://github.com/anaanaa/FMCNet.
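The lossless-downsampling step can be sketched with a single-level 2D Haar transform: the four sub-bands (LL, LH, HL, HH) each halve the spatial resolution, yet the original image is exactly recoverable. The paper does not state which wavelet FMC-Net uses, so Haar is an assumption here.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the 2D Haar transform via even/odd pairing on each axis."""
    a = (x[0::2] + x[1::2]) / 2.0          # row-wise low-pass
    d = (x[0::2] - x[1::2]) / 2.0          # row-wise high-pass
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0   # low-low: blur-robust structure
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0   # low-high: vertical detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0   # high-low: horizontal detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0   # high-high: diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2: no information is lost by the transform."""
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2] = ll + lh
    a[:, 1::2] = ll - lh
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2] = a + d
    x[1::2] = a - d
    return x
```

In FMC-Net's terms, the LL band would feed the low-frequency branch (MG-SSM) and the detail bands the high-frequency branch (HFR), with exact invertibility guaranteeing that the split itself discards nothing.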

Identifying visible tissue in intraoperative ultrasound: a method and application.

Weld A, Dixon L, Dyck M, Anichini G, Ranne A, Camp S, Giannarou S

PubMed, Jun 28, 2025
Intraoperative ultrasound scanning is a demanding visuotactile task. It requires operators to simultaneously localise the ultrasound perspective and manually perform slight adjustments to the pose of the probe, making sure not to apply excessive force or break contact with the tissue, while also characterising the visible tissue. To analyse the probe-tissue contact, an iterative filtering and topological method is proposed to identify the underlying visible tissue, which can be used to detect acoustic shadow and construct confidence maps of perceptual salience. For evaluation, datasets containing both in vivo and medical phantom data are created. A suite of evaluations is performed, including an evaluation of acoustic shadow classification. Compared with an ablation, a deep learning method, and a statistical method, the proposed approach achieves superior classification on in vivo data, achieving an F_β score of 0.864, in comparison with 0.838, 0.808, and 0.808. A novel framework for evaluating the confidence estimation of probe-tissue contact is created. The phantom data are captured specifically for this purpose, and comparison is made against two established methods. The proposed method produced the superior response, achieving an average normalised root-mean-square error of 0.168, in comparison with 1.836 and 4.542. Evaluation is also extended to determine the algorithm's robustness to parameter perturbation, speckle noise, and data distribution shift, and its capability for guiding a robotic scan. The results of this comprehensive set of experiments justify the potential clinical value of the proposed algorithm, which can be used to support clinical training and robotic ultrasound automation.
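As a toy illustration of acoustic-shadow detection (not the paper's iterative filtering and topological method), each scanline can be scanned for its deepest sufficiently bright sample; everything below that depth is marked as shadow, since little acoustic energy returns from beyond the last visible tissue:

```python
import numpy as np

def shadow_mask(img, thresh=0.1):
    """Mark depths below the deepest bright sample of each scanline as shadow.

    img: 2D array, rows = depth (increasing downward), columns = scanlines,
    intensities in [0, 1]. The fixed threshold is an illustrative assumption.
    """
    mask = np.zeros(img.shape, dtype=bool)
    for j in range(img.shape[1]):
        bright = np.nonzero(img[:, j] > thresh)[0]
        if bright.size:
            mask[bright[-1] + 1:, j] = True   # below last visible tissue
        else:
            mask[:, j] = True                 # whole scanline is dark
    return mask
```

A per-pixel confidence map could then down-weight shadowed regions when characterising the visible tissue.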

Developing ultrasound-based machine learning models for accurate differentiation between sclerosing adenosis and invasive ductal carcinoma.

Liu G, Yang N, Qu Y, Chen G, Wen G, Li G, Deng L, Mai Y

PubMed, Jun 28, 2025
This study aimed to develop a machine learning model using breast ultrasound images to improve the non-invasive differential diagnosis between sclerosing adenosis (SA) and invasive ductal carcinoma (IDC). A total of 2046 ultrasound images from 772 SA and IDC patients were collected; regions of interest (ROIs) were delineated, and features were extracted. The dataset was split into training and test cohorts, and feature selection was performed using correlation coefficients and Recursive Feature Elimination. Ten classifiers were trained with grid search and 5-fold cross-validation. The receiver operating characteristic (ROC) curve and Youden index were used for model evaluation, and SHapley Additive exPlanations (SHAP) was employed for model interpretation. Another 224 ROIs from 84 patients at other hospitals were used for external validation. For the ROI-level model, XGBoost with 18 features achieved an area under the curve (AUC) of 0.9758 (0.9654-0.9847) in the test cohort and 0.9906 (0.9805-0.9973) in the validation cohort. For the patient-level model, logistic regression with 9 features achieved an AUC of 0.9653 (0.9402-0.9859) in the test cohort and 0.9846 (0.9615-0.9978) in the validation cohort. The feature "Original shape Major Axis Length" was identified as the most important, with higher values associated with a greater likelihood of IDC. Feature contributions for specific ROIs were visualized as well. We developed explainable, ultrasound-based machine learning models with high performance for differentiating SA and IDC, offering a potential non-invasive tool for improved differential diagnosis. Question Accurately distinguishing between sclerosing adenosis (SA) and invasive ductal carcinoma (IDC) in a non-invasive manner has been a diagnostic challenge. Findings Explainable, ultrasound-based machine learning models with high performance were developed for differentiating SA and IDC, and validated well in an external validation cohort.
Critical relevance These models provide non-invasive tools to reduce misdiagnosis of SA and improve early detection of IDC.
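The first feature-selection stage can be sketched as a greedy correlation filter that drops the later feature of any highly correlated pair before Recursive Feature Elimination runs. The threshold and the "drop the later one" rule are common conventions, not details stated in the abstract:

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Return indices of features to keep after removing, for every pair
    with |Pearson r| above threshold, the later feature of the pair."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    n = corr.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        for j in range(i + 1, n):
            if keep[j] and corr[i, j] > threshold:
                keep[j] = False          # redundant with feature i
    return np.nonzero(keep)[0]

# Toy data: feature 1 is a near-copy of feature 0, feature 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
X = np.stack([x0,
              2 * x0 + 1e-6 * rng.normal(size=200),
              rng.normal(size=200)], axis=1)
kept = drop_correlated(X)               # features 0 and 2 survive
```

The surviving features would then be ranked by an RFE wrapper around the chosen classifier.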

Comparative analysis of iterative vs AI-based reconstruction algorithms in CT imaging for total body assessment: Objective and subjective clinical analysis.

Tucciariello RM, Botte M, Calice G, Cammarota A, Cammarota F, Capasso M, Nardo GD, Lancellotti MI, Palmese VP, Sarno A, Villonio A, Bianculli A

PubMed, Jun 28, 2025
This study evaluates the performance of iterative and AI-based reconstruction algorithms in CT imaging for brain, chest, and upper abdomen assessments. Using a 320-slice CT scanner, phantom images were analysed through quantitative metrics such as noise, contrast-to-noise ratio (CNR), and target transfer function (TTF). Additionally, five radiologists performed subjective evaluations on real patient images by scoring clinical parameters related to anatomical structures across the three body sites. The study aimed to relate the physical measurements obtained with a Catphan phantom, following the standard medical-physics approach, to the scores the radiologists assigned to the clinical parameters chosen in this study, and to determine whether the physical approach alone can ensure the implementation of new procedures and optimization in clinical practice. AI-based algorithms demonstrated superior performance in chest and abdominal imaging, enhancing parenchymal and vascular detail with notable reductions in noise. However, their performance in brain imaging was less effective, as the aggressive noise reduction led to excessive smoothing, which affected diagnostic interpretability. Iterative reconstruction methods provided balanced results for brain imaging, preserving structural details and maintaining diagnostic clarity. The findings emphasize the need for region-specific optimization of reconstruction protocols. While AI-based methods can complement traditional IR techniques, they should not be assumed to inherently improve outcomes. A critical and cautious introduction of AI-based techniques is essential, ensuring radiologists adapt effectively without compromising diagnostic accuracy.
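Of the quantitative metrics, CNR is the simplest to reproduce from two phantom ROIs. The definition below (absolute mean difference over background standard deviation) is one common convention; the study's exact formula may differ:

```python
import numpy as np

def cnr(signal_roi, background_roi):
    """Contrast-to-noise ratio from two ROIs of a phantom image:
    |mean(signal) - mean(background)| divided by the background noise
    (standard deviation of the background ROI)."""
    noise = np.asarray(background_roi).std()
    contrast = abs(np.mean(signal_roi) - np.mean(background_roi))
    return contrast / noise

# Toy ROIs: uniform insert at 100 HU over a background of mean 1, std 1.
sig = np.full(10, 100.0)
bg = np.array([0.0, 2.0] * 5)
value = cnr(sig, bg)   # (100 - 1) / 1 = 99
```

Comparing such values across reconstruction algorithms, at matched dose, is what grounds the "notable reductions in noise" claim in measurable terms.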

Emerging Artificial Intelligence Innovations in Rheumatoid Arthritis and Challenges to Clinical Adoption.

Gilvaz VJ, Sudheer A, Reginato AM

PubMed, Jun 28, 2025
This review was written to inform practicing clinical rheumatologists about recent advances in artificial intelligence (AI) based research in rheumatoid arthritis (RA), using accessible and practical language. We highlight developments from 2023 to early 2025 across diagnostic imaging, treatment prediction, drug discovery, and patient-facing tools. Given the increasing clinical interest in AI and its potential to augment care delivery, this article aims to bridge the gap between technical innovation and real-world rheumatology practice. Several AI models have demonstrated high accuracy in early RA detection using imaging modalities such as thermal imaging and nuclear scans. Predictive models for treatment response have leveraged routinely collected electronic health record (EHR) data, moving closer to practical application in clinical workflows. Patient-facing tools like mobile symptom checkers and large language models (LLMs) such as ChatGPT show promise in enhancing education and engagement, although accuracy and safety remain variable. AI has also shown utility in identifying novel biomarkers and accelerating drug discovery. Despite these advances, as of early 2025, no AI-based tools have received FDA approval for use in rheumatology, in contrast to other specialties. Artificial intelligence holds tremendous promise to enhance clinical care in RA, from early diagnosis to personalized therapy. However, clinical adoption remains limited due to regulatory, technical, and implementation challenges. A streamlined regulatory framework and closer collaboration between clinicians, researchers, and industry partners are urgently needed. With thoughtful integration, AI can serve as a valuable adjunct in addressing clinical complexity and workforce shortages in rheumatology.

Automated Evaluation of Female Pelvic Organ Descent on Transperineal Ultrasound: Model Development and Validation.

Wu S, Wu J, Xu Y, Tan J, Wang R, Zhang X

PubMed, Jun 28, 2025
Transperineal ultrasound (TPUS) is a widely used tool for evaluating female pelvic organ prolapse (POP), but its accurate interpretation relies on experience, causing diagnostic variability. This study aims to develop and validate a multi-task deep learning model to automate POP assessment using TPUS images. TPUS images from 1340 female patients (January-June 2023) were evaluated by two experienced physicians. The presence and severity of cystocele, uterine prolapse, rectocele, and excessive mobility of the perineal body (EMoPB) were documented. After preprocessing, 1072 images were used for training and 268 for validation. The model used ResNet34 as the feature extractor and four parallel fully connected layers to predict the four conditions. Model performance was assessed using confusion matrices and the area under the curve (AUC). Gradient-weighted class activation mapping (Grad-CAM) visualized the model's focus areas. The model demonstrated strong diagnostic performance, with accuracies and AUC values as follows: cystocele, 0.869 (95% CI, 0.824-0.905) and 0.947 (95% CI, 0.930-0.962); uterine prolapse, 0.799 (95% CI, 0.746-0.842) and 0.931 (95% CI, 0.911-0.948); rectocele, 0.978 (95% CI, 0.952-0.990) and 0.892 (95% CI, 0.849-0.927); and EMoPB, 0.869 (95% CI, 0.824-0.905) and 0.942 (95% CI, 0.907-0.967). Grad-CAM heatmaps revealed that the model's focus areas were consistent with those observed by human experts. This study presents a multi-task deep learning model for automated POP assessment using TPUS images, showing promising efficacy and potential to benefit a broader population of women.
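The "four parallel fully connected layers" design can be sketched without a deep-learning framework: the ResNet34 backbone is reduced here to a precomputed feature vector, and each condition gets its own linear head. Dimensions and weights are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def multitask_heads(features, heads):
    """Apply one independent linear head per condition to a shared
    backbone feature vector; each head outputs class probabilities."""
    return {name: softmax(w @ features + b) for name, (w, b) in heads.items()}

# Toy setup: 4-dim "backbone" features, one 2-class head per condition.
rng = np.random.default_rng(0)
features = rng.normal(size=4)
conditions = ["cystocele", "uterine_prolapse", "rectocele", "emopb"]
heads = {c: (rng.normal(size=(2, 4)), np.zeros(2)) for c in conditions}
probs = multitask_heads(features, heads)
```

Sharing the backbone lets the four tasks regularize one another while each head is trained against its own label, which is the usual motivation for this multi-task layout.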

Revealing the Infiltration: Prognostic Value of Automated Segmentation of Non-Contrast-Enhancing Tumor in Glioblastoma

Gomez-Mahiques, M., Lopez-Mateu, C., Gil-Terron, F. J., Montosa-i-Mico, V., Svensson, S. F., Mendoza Mireles, E. E., Vik-Mo, E. O., Emblem, K., Balana, C., Puig, J., Garcia-Gomez, J. M., Fuster-Garcia, E.

medRxiv preprint, Jun 28, 2025
Background: Precise delineation of non-contrast-enhancing tumor (nCET) in glioblastoma (GB) is critical for maximal safe resection, yet routine imaging cannot reliably separate infiltrative tumor from vasogenic edema. The aim of this study was to develop and validate an automated method to identify nCET and assess its prognostic value. Methods: Pre-operative T2-weighted and FLAIR MRI from 940 patients with newly diagnosed GB in four multicenter cohorts were analyzed. A deep-learning model segmented enhancing tumor, edema, and necrosis; a non-local spatially varying finite mixture model then isolated edema subregions containing nCET. The ratio of nCET to total edema volume, termed the Diffuse Infiltration Index (DII), was calculated. Associations between DII and overall survival (OS) were examined with Kaplan-Meier curves and multivariable Cox regression. Results: The algorithm distinguished nCET from vasogenic edema in 97.5% of patients, showing a mean signal-intensity gap >5%. Higher DII stratified patients with shorter OS. In the NCT03439332 cohort, DII above the optimal threshold doubled the hazard of death (hazard ratio 2.09, 95% confidence interval 1.34-3.25; p = 0.0012) and reduced median survival by 122 days. Significant, though smaller, effects were confirmed in GLIOCAT & BraTS (hazard ratio 1.31; p = 0.022), OUS (hazard ratio 1.28; p = 0.007), and in pooled analysis (hazard ratio 1.28; p = 0.0003). DII remained an independent predictor after adjustment for age, extent of resection, and MGMT methylation. Conclusions: We present a reproducible, server-hosted tool for automated nCET delineation and DII biomarker extraction that enables robust, independent prognostic stratification. It promises to guide supramaximal surgical planning and personalized neuro-oncology research and care.
Key Points
- KP1: Robust automated MRI tool segments non-contrast-enhancing (nCET) glioblastoma.
- KP2: Introduced and validated the Diffuse Infiltration Index with prognostic value.
- KP3: nCET mapping enables RANO supramaximal resection for personalized surgery.
Importance of the Study: This study underscores the clinical importance of accurately delineating non-contrast-enhancing tumor (nCET) regions in glioblastoma (GB) using standard MRI. Despite their lack of contrast enhancement, nCET areas often harbor infiltrative tumor cells critical for disease progression and recurrence. By integrating deep-learning segmentation with a non-local finite mixture model, we developed a reproducible, automated methodology for nCET delineation and introduced the Diffuse Infiltration Index (DII), a novel imaging biomarker. Higher DII values were independently associated with reduced overall survival across large, heterogeneous cohorts. These findings highlight the prognostic relevance of imaging-defined infiltration patterns and support the use of nCET segmentation in clinical decision-making. Importantly, this methodology aligns with and operationalizes recent RANO criteria on supramaximal resection, offering a practical, image-based tool to improve surgical planning. In doing so, our work advances efforts toward more personalized neuro-oncological care, potentially improving outcomes while minimizing functional compromise.
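Once the nCET subregion and the total edema have been segmented, the DII reduces to a voxel-count ratio (assuming uniform voxel volume; this sketch omits the mixture-model step that actually isolates nCET):

```python
import numpy as np

def diffuse_infiltration_index(ncet_mask, edema_mask):
    """DII = nCET volume / total edema volume, computed from binary masks.
    Voxel counts stand in for volumes under uniform voxel spacing."""
    edema_vox = int(edema_mask.sum())
    if edema_vox == 0:
        return 0.0              # no edema segmented: DII undefined, report 0
    return float(ncet_mask.sum()) / edema_vox

# Toy 2D masks: 8 edema voxels, 2 of them flagged as nCET.
edema = np.zeros((4, 4), dtype=bool)
edema[:2, :] = True
ncet = np.zeros((4, 4), dtype=bool)
ncet[0, :2] = True
dii = diffuse_infiltration_index(ncet, edema)   # 2 / 8 = 0.25
```

Patients would then be split at an optimal DII threshold for the Kaplan-Meier and Cox analyses described above.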

Inpainting is All You Need: A Diffusion-based Augmentation Method for Semi-supervised Medical Image Segmentation

Xinrong Hu, Yiyu Shi

arXiv preprint, Jun 28, 2025
Collecting pixel-level labels for medical datasets can be a laborious and expensive process, and enhancing segmentation performance with a scarcity of labeled data is a crucial challenge. This work introduces AugPaint, a data augmentation framework that utilizes inpainting to generate image-label pairs from limited labeled data. AugPaint leverages latent diffusion models, known for their ability to generate high-quality in-domain images with low overhead, and adapts the sampling process for the inpainting task without the need for retraining. Specifically, given a pair of an image and a label mask, we crop the area labeled as foreground and condition on it during the reverse denoising process at every noise level. The masked background area is gradually filled in, and all generated images are paired with the original label mask. This approach ensures an accurate match between synthetic images and label masks, setting it apart from existing dataset generation methods. The generated images serve as valuable supervision for training downstream segmentation models, effectively addressing the challenge of limited annotations. We conducted extensive evaluations of our data augmentation method on four public medical image segmentation datasets, including CT, MRI, and skin imaging. Results across all datasets demonstrate that AugPaint outperforms state-of-the-art label-efficient methodologies, significantly improving segmentation performance.
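The per-step conditioning is similar in spirit to RePaint-style inpainting (an assumption about AugPaint's sampler, which the abstract does not spell out): at each noise level, the known foreground is re-noised from the original image and composited with the model's sample in the masked background. A minimal sketch of that blending step, in pixel space for clarity:

```python
import numpy as np

def inpaint_step(x_t, x0_known, mask, alpha_bar_t, rng):
    """Blend one reverse-diffusion step for inpainting.

    x_t:        the model's current sample at noise level t
    x0_known:   the clean original image (only its mask==1 region is trusted)
    mask:       1 where the labeled foreground is known, 0 where to inpaint
    alpha_bar_t: cumulative noise schedule term at step t
    """
    noise = rng.normal(size=x0_known.shape)
    # Forward-noise the known region to the current level t...
    known_t = np.sqrt(alpha_bar_t) * x0_known + np.sqrt(1 - alpha_bar_t) * noise
    # ...then keep it where the label says "foreground" and let the model's
    # sample fill the background everywhere else.
    return mask * known_t + (1 - mask) * x_t
```

Because the foreground is pinned to the original image at every noise level, the generated image stays consistent with the unchanged label mask, which is what makes the synthetic pairs safe to train on.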

AI-Derived Splenic Response in Cardiac PET Predicts Mortality: A Multi-Site Study

Dharmavaram, N., Ramirez, G., Shanbhag, A., Miller, R. J. H., Kavanagh, P., Yi, J., Lemley, M., Builoff, V., Marcinkiewicz, A. M., Dey, D., Hainer, J., Wopperer, S., Knight, S., Le, V. T., Mason, S., Alexanderson, E., Carvajal-Juarez, I., Packard, R. R. S., Rosamond, T. L., Al-Mallah, M. H., Slipczuk, L., Travin, M., Acampa, W., Einstein, A., Chareonthaitawee, P., Berman, D., Di Carli, M., Slomka, P.

medRxiv preprint, Jun 28, 2025
Background: Inadequate pharmacologic stress may limit the diagnostic and prognostic accuracy of myocardial perfusion imaging (MPI). The splenic ratio (SR), a measure of stress adequacy, has emerged as a potential imaging biomarker. Objectives: To evaluate the prognostic value of artificial intelligence (AI)-derived SR in a large multicenter 82Rb-PET cohort undergoing regadenoson stress testing. Methods: We retrospectively analyzed 10,913 patients from three sites in the REFINE PET registry with clinically indicated MPI and linked clinical outcomes. SR was calculated using fully automated algorithms as the ratio of splenic uptake at stress versus rest. Patients were stratified by SR into high (≥90th percentile) and low (<90th percentile) groups. The primary outcome was major adverse cardiovascular events (MACE). Survival analysis was conducted using Kaplan-Meier and Cox proportional hazards models adjusted for clinical and imaging covariates, including myocardial flow reserve (MFR ≥2 vs. <2). Results: The cohort had a median age of 68 years, with 57% male patients. Common risk factors included hypertension (84%), dyslipidemia (76%), diabetes (33%), and prior coronary artery disease (31%). Median follow-up was 4.6 years. Patients with high SR (n=1,091) had an increased risk of MACE (HR 1.18, 95% CI 1.06-1.31, p=0.002). Among patients with preserved MFR (≥2; n=7,310), high SR remained independently associated with MACE (HR 1.44, 95% CI 1.24-1.67, p<0.0001). Conclusions: Elevated AI-derived SR was independently associated with adverse cardiovascular outcomes, including among patients with preserved MFR. These findings support SR as a novel, automated imaging biomarker for risk stratification in 82Rb PET MPI. Condensed Abstract: AI-derived splenic ratio (SR), a marker of pharmacologic stress adequacy, was independently associated with increased cardiovascular risk in a large 82Rb PET cohort, even among patients with preserved myocardial flow reserve (MFR). High SR identified individuals with elevated MACE risk despite normal perfusion and flow findings, suggesting unrecognized physiologic vulnerability. Incorporating automated SR into PET MPI interpretation may enhance risk stratification and identify patients who could benefit from intensified preventive care, particularly when traditional imaging markers appear reassuring. These findings support SR as a clinically meaningful, easily integrated biomarker in stress PET imaging.
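The SR definition and the 90th-percentile split can be reproduced in a few lines (uptake values below are arbitrary; the automated spleen segmentation that produces them is out of scope here):

```python
import numpy as np

def splenic_ratio(stress_uptake, rest_uptake):
    """SR = splenic uptake at stress divided by splenic uptake at rest."""
    return np.asarray(stress_uptake, dtype=float) / np.asarray(rest_uptake, dtype=float)

def high_sr_group(sr, pct=90):
    """Flag patients whose SR is at or above the cohort's pct-th percentile."""
    return sr >= np.percentile(sr, pct)

# Toy cohort of 10 patients with SRs 1..10: the top decile is "high SR".
sr = splenic_ratio(np.arange(1.0, 11.0), np.ones(10))
high = high_sr_group(sr)
```

The resulting boolean flag is what enters the Kaplan-Meier and Cox models as the high-versus-low SR covariate.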