Page 10 of 19183 results

Understanding Dataset Bias in Medical Imaging: A Case Study on Chest X-rays

Ethan Dack, Chengliang Dai

arXiv preprint, Jul 10 2025
Recent work has revisited the infamous task "Name that dataset" and established that non-medical datasets carry an underlying bias, achieving high accuracy on the dataset-origin task. In this work, we revisit the same task applied to popular open-source chest X-ray datasets. Medical images are naturally more difficult to release as open source due to their sensitive nature, which has led to certain open-source datasets becoming extremely popular for research purposes. By performing the same task, we wish to explore whether dataset bias also exists in these datasets. We apply simple transformations to the datasets to try to identify bias. Given the importance of AI applications in medical imaging, it is vital to establish whether modern methods are taking shortcuts or are focused on the relevant pathology. We implement a range of different network architectures on the datasets: NIH, CheXpert, MIMIC-CXR and PadChest. We hope this work will encourage more explainable research in medical imaging and the creation of more open-source datasets in the medical domain. The corresponding code will be released upon acceptance.

MRI-based interpretable clinicoradiological and radiomics machine learning model for preoperative prediction of pituitary macroadenomas consistency: a dual-center study.

Liang M, Wang F, Yang Y, Wen L, Wang S, Zhang D

PubMed paper, Jul 9 2025
To establish an interpretable and non-invasive machine learning (ML) model using clinicoradiological predictors and magnetic resonance imaging (MRI) radiomics features to predict the consistency of pituitary macroadenomas (PMAs) preoperatively. A total of 350 patients with PMA (272 from Xinqiao Hospital of Army Medical University and 78 from Daping Hospital of Army Medical University) were stratified and randomly divided into training and test cohorts in a 7:3 ratio. The tumor consistency was classified as soft or firm. Clinicoradiological predictors were examined using univariate and multivariate regression analyses. Radiomics features were selected using the minimum redundancy maximum relevance (mRMR) and least absolute shrinkage and selection operator (LASSO) algorithms. Logistic regression (LR) and random forest (RF) classifiers were applied to construct the models. Receiver operating characteristic (ROC) curves and decision curve analyses (DCA) were performed to compare and validate the predictive capacities of the models. A comparative study of the area under the curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE) was performed. The Shapley additive explanation (SHAP) was applied to investigate the optimal model's interpretability. The combined model predicted the PMAs' consistency more effectively than the clinicoradiological and radiomics models. Specifically, the LR-combined model displayed optimal prediction performance (test cohort: AUC = 0.913; ACC = 0.840). The SHAP-based explanation of the LR-combined model suggests that the wavelet-transformed and Laplacian of Gaussian (LoG) filter features extracted from T2WI and CE-T1WI occupy a dominant position. Meanwhile, the skewness of the original first-order features extracted from T2WI (T2WI_original_first-order_Skewness) demonstrated the most substantial contribution.
An interpretable machine learning model incorporating clinicoradiological predictors and multiparametric MRI (mpMRI)-based radiomics features may predict PMAs consistency, enabling tailored and precise therapies for patients with PMA.
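
The four comparison metrics named in the abstract above (AUC, ACC, SEN, SPE) can be computed from predicted scores as in the following minimal sketch. It is illustrative only and not the authors' code: the function name `binary_metrics`, the 0.5 decision threshold, and the encoding of labels as 0 and 1 are assumptions. AUC is computed with the rank-based definition, the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
def binary_metrics(labels, scores, threshold=0.5):
    """Return ACC, SEN, SPE at a threshold, plus the threshold-free AUC.

    labels: 0/1 ground truth (assumed encoding, e.g. 1 = firm, 0 = soft).
    scores: predicted probability of the positive class.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    true_pos = sum(s >= threshold for s in pos)
    true_neg = sum(s < threshold for s in neg)

    # AUC: fraction of (positive, negative) pairs ranked correctly,
    # with ties counted as one half. O(n^2), fine for small cohorts.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)

    return {
        "ACC": (true_pos + true_neg) / len(labels),
        "SEN": true_pos / len(pos),
        "SPE": true_neg / len(neg),
        "AUC": wins / (len(pos) * len(neg)),
    }


# Toy data, not taken from the study.
result = binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

ACC, SEN and SPE depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why studies of this kind usually report both.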

Impact of polymer source variations on hydrogel structure and product performance in dexamethasone-loaded ophthalmic inserts.

VandenBerg MA, Zaman RU, Plavchak CL, Smith WC, Nejad HB, Beringhs AO, Wang Y, Xu X

PubMed paper, Jul 9 2025
Localized drug delivery can enhance therapeutic efficacy while minimizing systemic side effects, making sustained-release ophthalmic inserts an attractive alternative to traditional eye drops. Such inserts offer improved patient compliance through prolonged therapeutic effects and a reduced need for frequent administration. This study focuses on dexamethasone-containing ophthalmic inserts. These inserts utilize a key excipient, polyethylene glycol (PEG), which forms a hydrogel upon contact with tear fluid. Developing generic equivalents of PEG-based inserts is challenging due to difficulties in characterizing inactive ingredients and the absence of standardized physicochemical characterization methods to demonstrate similarity. To address this gap, a suite of analytical approaches was applied to both PEG precursor materials sourced from different vendors and manufactured inserts. ¹H NMR, FTIR, MALDI, and SEC revealed variations in end-group functionalization, impurity content, and molecular weight distribution of the excipient. These differences led to changes in the finished insert network properties such as porosity, pore size and structure, gel mechanical strength, and crystallinity, which were corroborated by X-ray microscopy, AI-based image analysis, thermal, mechanical, and density measurements. In vitro release testing revealed distinct drug release profiles across formulations, with swelling rate correlated to release rate (i.e., faster release with rapid swelling). The use of non-micronized and micronized dexamethasone also contributed to release profile differences. Through comprehensive characterization of these PEG-based dexamethasone inserts, correlations between polymer quality, hydrogel microstructure, and release kinetics were established. The study highlights how excipient differences can alter product performance, emphasizing the importance of thorough analysis in developing generic equivalents of complex drug products.

An autonomous agent for auditing and improving the reliability of clinical AI models

Lukas Kuhn, Florian Buettner

arXiv preprint, Jul 8 2025
The deployment of AI models in clinical practice faces a critical challenge: models achieving expert-level performance on benchmarks can fail catastrophically when confronted with real-world variations in medical imaging. Minor shifts in scanner hardware, lighting or demographics can erode accuracy, but currently, reliability auditing to identify such catastrophic failure cases before deployment is a bespoke and time-consuming process. Practitioners lack accessible and interpretable tools to expose and repair hidden failure modes. Here we introduce ModelAuditor, a self-reflective agent that converses with users, selects task-specific metrics, and simulates context-dependent, clinically relevant distribution shifts. ModelAuditor then generates interpretable reports explaining how much performance likely degrades during deployment, discussing specific likely failure modes and identifying root causes and mitigation strategies. Our comprehensive evaluation across three real-world clinical scenarios (inter-institutional variation in histopathology, demographic shifts in dermatology, and equipment heterogeneity in chest radiography) demonstrates that ModelAuditor is able to correctly identify context-specific failure modes of state-of-the-art models such as the established SIIM-ISIC melanoma classifier. Its targeted recommendations recover 15-25% of performance lost under real-world distribution shift, substantially outperforming both baseline models and state-of-the-art augmentation methods. These improvements are achieved through a multi-agent architecture and execute on consumer hardware in under 10 minutes, costing less than US$0.50 per audit.

Deep supervised transformer-based noise-aware network for low-dose PET denoising across varying count levels.

Azimi MS, Felfelian V, Zeraatkar N, Dadgar H, Arabi H, Zaidi H

PubMed paper, Jul 8 2025
Reducing radiation dose from PET imaging is essential to minimize cancer risks; however, it often leads to increased noise and degraded image quality, compromising diagnostic reliability. Recent advances in deep learning have shown promising results in addressing these limitations through effective denoising. However, existing networks trained on specific noise levels often fail to generalize across diverse acquisition conditions. Moreover, training multiple models for different noise levels is impractical due to data and computational constraints. This study aimed to develop a supervised Swin Transformer-based unified noise-aware (ST-UNN) network that handles diverse noise levels and reconstructs high-quality images in low-dose PET imaging. We present a Swin Transformer-based Noise-Aware Network (ST-UNN), which incorporates multiple sub-networks, each designed to address specific noise levels ranging from 1% to 10%. An adaptive weighting mechanism dynamically integrates the outputs of these sub-networks to achieve effective denoising. The model was trained and evaluated using a PET/CT dataset encompassing the entire head and malignant lesions in the head and neck region. Performance was assessed using a combination of structural and statistical metrics, including the Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), Standardized Uptake Value (SUV) mean bias, SUVmax bias, and Root Mean Square Error (RMSE). This comprehensive evaluation ensured reliable results for both global and localized regions within PET images. The ST-UNN consistently outperformed conventional networks, particularly in ultra-low-dose scenarios. At the 1% count level, it achieved a PSNR of 34.77, RMSE of 0.05, and SSIM of 0.97, notably surpassing the baseline networks. It also achieved the lowest SUVmean bias (0.08) and RMSE lesion (0.12) at this level.
Across all count levels, ST-UNN maintained high performance and low error, demonstrating strong generalization and diagnostic integrity. ST-UNN offers a scalable, transformer-based solution for low-dose PET imaging. By dynamically integrating sub-networks, it effectively addresses noise variability and provides superior image quality, thereby advancing the capabilities of low-dose and dynamic PET imaging.
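
Two of the image-quality metrics reported above, RMSE and PSNR, have simple closed forms, sketched below for images flattened into lists of voxel values. This is a generic illustration and not the study's evaluation code. The abstract does not state its normalization or its choice of peak value, so taking the maximum of the reference image as the peak is an assumption, as are the function names.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equally sized voxel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def psnr(reference, estimate, peak=None):
    """Peak signal-to-noise ratio in dB.

    peak defaults to max(reference); this is an assumption, other
    conventions (e.g. a fixed dynamic range) are also common.
    """
    if peak is None:
        peak = max(reference)
    error = rmse(reference, estimate)
    if error == 0:
        return math.inf  # identical images
    return 20 * math.log10(peak / error)


# Toy example: full-dose reference vs. a denoised low-dose estimate.
reference = [0.0, 1.0, 2.0, 4.0]
estimate = [0.0, 1.0, 2.0, 3.0]
error = rmse(reference, estimate)      # 0.5
quality_db = psnr(reference, estimate)  # 20 * log10(4 / 0.5)
```

Because PSNR depends on the chosen peak, values are only comparable between studies that use the same convention.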

Development and retrospective validation of an artificial intelligence system for diagnostic assessment of prostate biopsies: study protocol.

Mulliqi N, Blilie A, Ji X, Szolnoky K, Olsson H, Titus M, Martinez Gonzalez G, Boman SE, Valkonen M, Gudlaugsson E, Kjosavik SR, Asenjo J, Gambacorta M, Libretti P, Braun M, Kordek R, Łowicki R, Hotakainen K, Väre P, Pedersen BG, Sørensen KD, Ulhøi BP, Rantalainen M, Ruusuvuori P, Delahunt B, Samaratunga H, Tsuzuki T, Janssen EAM, Egevad L, Kartasalo K, Eklund M

PubMed paper, Jul 7 2025
Histopathological evaluation of prostate biopsies using the Gleason scoring system is critical for prostate cancer diagnosis and treatment selection. However, grading variability among pathologists can lead to inconsistent assessments, risking inappropriate treatment. Similar challenges complicate the assessment of other prognostic features like cribriform cancer morphology and perineural invasion. Many pathology departments are also facing an increasingly unsustainable workload due to rising prostate cancer incidence and a decreasing pathologist workforce coinciding with increasing requirements for more complex assessments and reporting. Digital pathology and artificial intelligence (AI) algorithms for analysing whole slide images show promise in improving the accuracy and efficiency of histopathological assessments. Studies have demonstrated AI's capability to diagnose and grade prostate cancer comparably to expert pathologists. However, external validations on diverse data sets have been limited and often show reduced performance. Historically, there have been no well-established guidelines for AI study designs and validation methods. Diagnostic assessments of AI systems often lack preregistered protocols and rigorous external cohort sampling, essential for reliable evidence of their safety and accuracy. This study protocol covers the retrospective validation of an AI system for prostate biopsy assessment. The primary objective of the study is to develop a high-performing and robust AI model for diagnosis and Gleason scoring of prostate cancer in core needle biopsies, and at scale evaluate whether it can generalise to fully external data from independent patients, pathology laboratories and digitalisation platforms. The secondary objectives cover AI performance in estimating cancer extent and detecting cribriform prostate cancer and perineural invasion. 
This protocol outlines the steps for data collection, predefined partitioning of data cohorts for AI model training and validation, model development and predetermined statistical analyses, ensuring systematic development and comprehensive validation of the system. The protocol adheres to Transparent Reporting of a multivariable prediction model of Individual Prognosis Or Diagnosis+AI (TRIPOD+AI), Protocol Items for External Cohort Evaluation of a Deep Learning System in Cancer Diagnostics (PIECES), Checklist for AI in Medical Imaging (CLAIM) and other relevant best practices. Data collection and usage were approved by the respective ethical review boards of each participating clinical laboratory, and centralised anonymised data handling was approved by the Swedish Ethical Review Authority. The study will be conducted in agreement with the Helsinki Declaration. The findings will be disseminated in peer-reviewed publications (open access).

Introducing Image-Space Preconditioning in the Variational Formulation of MRI Reconstructions

Bastien Milani, Jean-Baptist Ledoux, Berk Can Acikgoz, Xavier Richard

arXiv preprint, Jul 7 2025
The aim of the present article is to enrich the comprehension of iterative magnetic resonance imaging (MRI) reconstructions, including compressed sensing (CS) and iterative deep learning (DL) reconstructions, by describing them in the general framework of finite-dimensional inner-product spaces. In particular, we show that image-space preconditioning (ISP) and data-space preconditioning (DSP) can be formulated as non-conventional inner products. The main gain of our reformulation is an embedding of ISP in the variational formulation of the MRI reconstruction problem (in an algorithm-independent way), which in principle allows ISP to be propagated naturally and systematically to all iterative reconstructions, including many iterative DL and CS reconstructions where preconditioning is lacking. The way in which we apply linear algebraic tools to MRI reconstructions as presented in this article is novel. A secondary aim of our article is to offer didactic material to scientists who are new to the field of MRI reconstruction. Since we explore here some mathematical concepts of reconstruction, we take the opportunity to recall some principles that may be obvious to experts, but which may be hard for beginners to find in the literature. In fact, the description of many mathematical tools of MRI reconstruction is fragmented in the literature or sometimes missing because it is considered general knowledge. Further, some of those concepts can be found in mathematics textbooks, but not in a form that is oriented toward MRI. Examples include the conjugate gradient descent, the notion of derivative with respect to non-conventional inner products, or simply the notion of adjoint. The authors therefore believe that it is beneficial for their field of research to dedicate some space to such didactic material.
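
As a generic illustration of what a "non-conventional inner product" and the corresponding adjoint look like in finite dimensions, the standard linear-algebra identities are written out below. The notation is an assumption chosen for this note and is not taken from the article: $H_X$ and $H_Y$ denote Hermitian positive-definite matrices defining the inner products on the image space $X$ and the data space $Y$, $A$ is a linear operator from $X$ to $Y$, and $A^{*}$ is its ordinary conjugate transpose.

```latex
\[
\langle x, y \rangle_{H} = x^{*} H y ,
\qquad H = H^{*} \succ 0 .
\]
\[
\langle A x, y \rangle_{H_Y} = \langle x, A^{\dagger} y \rangle_{H_X}
\ \text{ for all } x \in X,\ y \in Y
\quad \Longrightarrow \quad
A^{\dagger} = H_X^{-1} A^{*} H_Y .
\]
```

The second line follows by expanding both sides: $\langle A x, y \rangle_{H_Y} = x^{*} A^{*} H_Y y$ and $\langle x, A^{\dagger} y \rangle_{H_X} = x^{*} H_X A^{\dagger} y$, so equality for all $x$ and $y$ requires $H_X A^{\dagger} = A^{*} H_Y$. With $H_X = H_Y = I$ this reduces to the usual adjoint $A^{\dagger} = A^{*}$.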

Self-supervised Deep Learning for Denoising in Ultrasound Microvascular Imaging

Lijie Huang, Jingyi Yin, Jingke Zhang, U-Wai Lok, Ryan M. DeRuiter, Jieyang Jin, Kate M. Knoll, Kendra E. Petersen, James D. Krier, Xiang-yang Zhu, Gina K. Hesley, Kathryn A. Robinson, Andrew J. Bentall, Thomas D. Atwell, Andrew D. Rule, Lilach O. Lerman, Shigao Chen, Chengwu Huang

arXiv preprint, Jul 7 2025
Ultrasound microvascular imaging (UMI) is often hindered by low signal-to-noise ratio (SNR), especially in contrast-free or deep tissue scenarios, which impairs subsequent vascular quantification and reliable disease diagnosis. To address this challenge, we propose Half-Angle-to-Half-Angle (HA2HA), a self-supervised denoising framework specifically designed for UMI. HA2HA constructs training pairs from complementary angular subsets of beamformed radio-frequency (RF) blood flow data, across which vascular signals remain consistent while noise varies. HA2HA was trained using in-vivo contrast-free pig kidney data and validated across diverse datasets, including contrast-free and contrast-enhanced data from pig kidneys, as well as human liver and kidney. An improvement exceeding 15 dB in both contrast-to-noise ratio (CNR) and SNR was observed, indicating a substantial enhancement in image quality. In addition to power Doppler imaging, denoising directly in the RF domain is also beneficial for other downstream processing such as color Doppler imaging (CDI). CDI results of human liver derived from the HA2HA-denoised signals exhibited improved microvascular flow visualization, with a suppressed noisy background. HA2HA offers a label-free, generalizable, and clinically applicable solution for robust vascular imaging in both contrast-free and contrast-enhanced UMI.
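
The pairing idea described above, two views built from complementary angular subsets that share the vascular signal but carry different noise, can be sketched as follows. This is a toy illustration and not the HA2HA implementation. The abstract does not say how the angles are partitioned or compounded, so the interleaved even/odd split, the plain averaging, the flattened real-valued frames, and the name `half_angle_pair` are all assumptions.

```python
def half_angle_pair(angle_frames):
    """Build a (input, target) training pair from per-angle frames.

    angle_frames: list of frames, one per steering angle, each a flat
    list of real-valued samples (hypothetical data layout).
    Returns two compounded frames from complementary angle subsets.
    """
    if len(angle_frames) < 2:
        raise ValueError("need at least two steering angles")

    def compound(frames):
        # Sample-wise mean over the selected angles.
        return [sum(samples) / len(frames) for samples in zip(*frames)]

    half_a = compound(angle_frames[0::2])  # even-indexed angles
    half_b = compound(angle_frames[1::2])  # odd-indexed angles
    return half_a, half_b


# Four angles, two samples per frame (toy values).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
noisy_input, noisy_target = half_angle_pair(frames)
```

A denoiser trained to map one half to the other cannot predict the independent noise in its target, so it is pushed toward the component the two halves share. This is the general principle behind noise-to-noise style self-supervision; whether HA2HA uses exactly this loss setup is not stated in the abstract.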

PGMI assessment in mammography: AI software versus human readers.

Santner T, Ruppert C, Gianolini S, Stalheim JG, Frei S, Hondl M, Fröhlich V, Hofvind S, Widmann G

PubMed paper, Jul 5 2025
The aim of this study was to evaluate human inter-reader agreement of parameters included in PGMI (perfect-good-moderate-inadequate) classification of screening mammograms and explore the role of artificial intelligence (AI) as an alternative reader. Five radiographers from three European countries independently performed a PGMI assessment of 520 anonymized mammography screening examinations randomly selected from representative subsets from 13 imaging centres within two European countries. As a sixth reader, dedicated AI software was used. Accuracy, Cohen's Kappa, and confusion matrices were calculated to compare the predictions of the software against the individual assessment of the readers, as well as potential discrepancies between them. A questionnaire and a personality test were used to better understand the decision-making processes of the human readers. Significant inter-reader variability among human readers with poor to moderate agreement (κ = -0.018 to κ = 0.41) was observed, with some showing more homogeneous interpretations of single features and overall quality than others. In comparison, the software surpassed human inter-reader agreement in detecting glandular tissue cuts, mammilla deviation, pectoral muscle detection, and pectoral angle measurement, while remaining features and overall image quality exhibited comparable performance to human assessment. Notably, human inter-reader disagreement of PGMI assessment in mammography is considerably high. AI software may already reliably categorize quality. Its potential for standardization and immediate feedback to achieve and monitor high levels of quality in screening programs needs further attention and should be included in future approaches. AI has promising potential for automated assessment of diagnostic image quality. Faster, more representative and more objective feedback may support radiographers in their quality management processes.
Direct transformation of common PGMI workflows into an AI algorithm could be challenging.
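
Cohen's kappa, the agreement statistic used in the study above, corrects the observed agreement between two readers for the agreement expected by chance. The sketch below is a generic, unweighted implementation for illustration, not the study's analysis code. The function name and the toy PGMI ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters over the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: share of cases with identical ratings.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement from each rater's marginal category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single, identical category
    return (observed - expected) / (1.0 - expected)


# Toy PGMI ratings (P, G, M, I) for six examinations.
reader_1 = ["P", "G", "M", "I", "G", "M"]
reader_2 = ["P", "G", "G", "I", "M", "M"]
kappa = cohens_kappa(reader_1, reader_2)
```

A kappa of 0 means agreement no better than chance, and negative values, such as the lower bound of the range reported in the abstract, mean agreement below chance level.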

Impact of super-resolution deep learning-based reconstruction for hippocampal MRI: A volunteer and phantom study.

Takada S, Nakaura T, Yoshida N, Uetani H, Shiraishi K, Kobayashi N, Matsuo K, Morita K, Nagayama Y, Kidoh M, Yamashita Y, Takayanagi R, Hirai T

PubMed paper, Jul 5 2025
To evaluate the effects of super-resolution deep learning-based reconstruction (SR-DLR) on thin-slice T2-weighted hippocampal MR image quality using 3 T MRI, in both human volunteers and phantoms. Thirteen healthy volunteers underwent hippocampal MRI at standard and high resolutions. Original (standard-resolution; StR) images were reconstructed with and without deep learning-based reconstruction (DLR) (Matrix = 320 × 320), and with SR-DLR (Matrix = 960 × 960). High-resolution (HR) images were also reconstructed with/without DLR (Matrix = 960 × 960). Contrast, contrast-to-noise ratio (CNR), and septum slope were analyzed. Two radiologists evaluated the images for noise, contrast, artifacts, sharpness, and overall quality. Quantitative and qualitative results are reported as medians and interquartile ranges (IQR). Comparisons used the Wilcoxon signed-rank test with Holm correction. We also scanned an American College of Radiology (ACR) phantom to evaluate the ability of our SR-DLR approach to reduce artifacts induced by zero-padding interpolation (ZIP). SR-DLR exhibited contrast comparable to original images and significantly higher than HR-images. Its slope was comparable to that of HR images but was significantly steeper than that of StR images (p < 0.01). Furthermore, the CNR of SR-DLR (10.53; IQR: 10.08, 11.69) was significantly superior to the StR-images without DLR (7.5; IQR: 6.4, 8.37), StR-images with DLR (8.73; IQR: 7.68, 9.0), HR-images without DLR (2.24; IQR: 1.43, 2.38), and HR-images with DLR (4.84; IQR: 2.99, 5.43) (p < 0.05). In the phantom study, artifacts induced by ZIP were scarcely observed when using SR-DLR. SR-DLR for hippocampal MRI potentially improves image quality beyond that of actual HR-images while reducing acquisition time.
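
The Holm correction mentioned in the abstract above adjusts a family of p-values for multiple comparisons using a step-down procedure. The following sketch shows the standard adjustment for illustration only. The function name and the example p-values are hypothetical and not taken from the study.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Three pairwise comparisons with made-up raw p-values.
raw = [0.01, 0.04, 0.03]
corrected = holm_adjust(raw)
```

Holm's method controls the family-wise error rate like the Bonferroni correction but is uniformly at least as powerful, since only the smallest p-value is multiplied by the full number of comparisons.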