Page 55 of 2252247 results

Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification

Xing Shen, Justin Szeto, Mingyang Li, Hengguan Huang, Tal Arbel

arXiv preprint, Jun 29 2025
Multimodal large language models (MLLMs) have enormous potential to perform few-shot in-context learning for medical image analysis. However, safe deployment of these models in real-world clinical practice requires an in-depth analysis of the accuracy of their predictions and of the associated calibration errors, particularly across demographic subgroups. In this work, we present the first investigation into the calibration biases and demographic unfairness of MLLMs' predictions and confidence scores in few-shot in-context learning for medical image classification. We introduce CALIN, an inference-time calibration method designed to mitigate the associated biases. Specifically, CALIN estimates the amount of calibration needed, represented by calibration matrices, using a bi-level procedure that progresses from the population level to the subgroup level prior to inference. It then applies this estimate to calibrate the predicted confidence scores during inference. Experimental results on three medical imaging datasets (PAPILA for fundus image classification, HAM10000 for skin cancer classification, and MIMIC-CXR for chest X-ray classification) demonstrate CALIN's effectiveness at ensuring fair confidence calibration in its predictions, while improving overall prediction accuracy and exhibiting a minimal fairness-utility trade-off.
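As a rough illustration of what "calibration error across demographic subgroups" can mean in practice, the sketch below computes a binned expected calibration error (ECE) separately for each subgroup. It is not CALIN and does not reproduce the paper's calibration matrices; the function names, the bin count, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over bins."""
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins.values():
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(accuracy - avg_conf)
    return ece


def ece_by_subgroup(confidences, correct, groups, n_bins=10):
    """Compute ECE separately for each (hypothetical) demographic label."""
    by_group = defaultdict(lambda: ([], []))
    for conf, ok, group in zip(confidences, correct, groups):
        by_group[group][0].append(conf)
        by_group[group][1].append(ok)
    return {
        group: expected_calibration_error(confs, hits, n_bins)
        for group, (confs, hits) in by_group.items()
    }


# Toy data (invented): both groups receive 0.95 confidence,
# but group "B" is correct far less often, so it is overconfident.
confs = [0.95] * 8
hits = [True, True, True, True, True, False, False, False]
grps = ["A", "A", "A", "A", "B", "B", "B", "B"]
per_group = ece_by_subgroup(confs, hits, grps)
gap = max(per_group.values()) - min(per_group.values())
```

A large `gap` between subgroups is one simple signal of the kind of calibration unfairness the abstract describes; the paper itself may use different metrics.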

Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound

Zhiyuan Zhu, Jian Wang, Yong Jiang, Tong Han, Yuhao Huang, Ang Zhang, Kaiwen Yang, Mingyuan Luo, Zhe Liu, Yaofei Duan, Dong Ni, Tianhong Tang, Xin Yang

arXiv preprint, Jun 29 2025
Accurate carotid plaque grading (CPG) is vital for assessing the risk of cardiovascular and cerebrovascular diseases. Because plaques are small and show high intra-class variability, CPG is commonly evaluated in clinical practice using a combination of transverse and longitudinal ultrasound views. However, most existing deep learning-based multi-view classification methods focus on feature fusion across views, neglecting the importance of representation learning and the differences in class features. To address these issues, we propose a novel Corpus-View-Category Refinement Framework (CVC-RF) that processes information at the Corpus, View, and Category levels, enhancing model performance. Our contribution is four-fold. First, to the best of our knowledge, ours is the first deep learning-based method for CPG that follows the latest Carotid Plaque-RADS guidelines. Second, we propose a novel center-memory contrastive loss, which enhances the network's global modeling capability by comparing with representative cluster centers and diverse negative samples at the Corpus level. Third, we design a cascaded down-sampling attention module to fuse multi-scale information and achieve implicit feature interaction at the View level. Finally, a parameter-free mixture-of-experts weighting strategy is introduced to leverage class clustering knowledge to weight different experts, enabling feature decoupling at the Category level. Experimental results indicate that CVC-RF effectively models global features via multi-level refinement, achieving state-of-the-art performance in the challenging CPG task.

MedRegion-CT: Region-Focused Multimodal LLM for Comprehensive 3D CT Report Generation

Sunggu Kyung, Jinyoung Seo, Hyunseok Lim, Dongyeong Kim, Hyungbin Park, Jimin Sung, Jihyun Kim, Wooyoung Jo, Yoojin Nam, Namkug Kim

arXiv preprint, Jun 29 2025
The recent release of RadGenome-Chest CT has significantly advanced CT-based report generation. However, existing methods primarily focus on global features, making it challenging to capture region-specific details, which may cause certain abnormalities to go unnoticed. To address this, we propose MedRegion-CT, a region-focused Multi-Modal Large Language Model (MLLM) framework, featuring three key innovations. First, we introduce Region Representative ($R^2$) Token Pooling, which utilizes a 2D-wise pretrained vision model to efficiently extract 3D CT features. This approach generates global tokens representing overall slice features and region tokens highlighting target areas, enabling the MLLM to process comprehensive information effectively. Second, a universal segmentation model generates pseudo-masks, which are then processed by a mask encoder to extract region-centric features. This allows the MLLM to focus on clinically relevant regions, using six predefined region masks. Third, we leverage segmentation results to extract patient-specific attributions, including organ size, diameter, and locations. These are converted into text prompts, enriching the MLLM's understanding of patient-specific contexts. To ensure rigorous evaluation, we conducted benchmark experiments on report generation using the RadGenome-Chest CT. MedRegion-CT achieved state-of-the-art performance, outperforming existing methods in natural language generation quality and clinical relevance while maintaining interpretability. The code for our framework is publicly available.

Frequency-enhanced Multi-granularity Context Network for Efficient Vertebrae Segmentation

Jian Shi, Tianqi You, Pingping Zhang, Hongli Zhang, Rui Xu, Haojie Li

arXiv preprint, Jun 29 2025
Automated and accurate segmentation of individual vertebrae in 3D CT and MRI images is essential for various clinical applications. Due to the limitations of current imaging techniques and the complexity of spinal structures, existing methods still struggle to reduce the impact of image blurring and to distinguish similar vertebrae. To alleviate these issues, we introduce a Frequency-enhanced Multi-granularity Context Network (FMC-Net) to improve the accuracy of vertebrae segmentation. Specifically, we first apply a wavelet transform for lossless downsampling to reduce feature distortion in blurred images. The decomposed high- and low-frequency components are then processed separately. For the high-frequency components, we apply a High-frequency Feature Refinement (HFR) module to amplify the prominence of key features and filter out noise, restoring fine-grained details in blurred images. For the low-frequency components, we use a Multi-granularity State Space Model (MG-SSM) to aggregate feature representations with different receptive fields, extracting spatially varying contexts while capturing long-range dependencies with linear complexity. The use of multi-granularity contexts is essential for distinguishing similar vertebrae and improving segmentation accuracy. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches on both CT and MRI vertebrae segmentation datasets. The source code is publicly available at https://github.com/anaanaa/FMCNet.
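The phrase "wavelet transform for lossless downsampling" can be made concrete with a one-level Haar transform on a 1D signal: each output band is half the input length, yet the low- and high-frequency bands together reconstruct the input exactly. This is a minimal sketch only; the abstract does not state which wavelet FMC-Net uses, so the Haar choice and the function names here are assumptions.

```python
def haar_decompose(signal):
    """One-level Haar transform (averaging variant) of an even-length signal."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    # Low band: pairwise averages. High band: pairwise half-differences.
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def haar_reconstruct(low, high):
    """Invert haar_decompose: recover each original pair from (average, half-difference)."""
    out = []
    for avg, diff in zip(low, high):
        out.extend([avg + diff, avg - diff])
    return out


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
low, high = haar_decompose(signal)      # each band has half the samples
restored = haar_reconstruct(low, high)  # equals the original signal
```

The two bands could then be routed to separate branches, in the spirit of the high-frequency and low-frequency paths the abstract describes.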

Cognition-Eye-Brain Connection in Alzheimer's Disease Spectrum Revealed by Multimodal Imaging.

Shi Y, Shen T, Yan S, Liang J, Wei T, Huang Y, Gao R, Zheng N, Ci R, Zhang M, Tang X, Qin Y, Zhu W

PubMed paper, Jun 29 2025
Background: The connection between cognition, eye, and brain remains inconclusive in Alzheimer's disease (AD) spectrum disorders. Purpose: To explore the relationship between cognitive function, retinal biometrics, and brain alterations in the AD spectrum. Study Type: Prospective. Population: Healthy control (HC) (n = 16), subjective cognitive decline (SCD) (n = 35), mild cognitive impairment (MCI) (n = 18), and AD group (n = 7). Field Strength/Sequence: 3-T, 3D T1-weighted Brain Volume (BRAVO) and resting-state functional MRI (fMRI). Assessment: In all subgroups, cortical thickness was measured from BRAVO and segmented using the Desikan-Killiany-Tourville (DKT) atlas. The fractional amplitude of low-frequency fluctuations (FALFF) and regional homogeneity (ReHo) were measured in fMRI using voxel-based analysis. The eye was imaged by optical coherence tomography angiography (OCTA), with the deep learning model FARGO segmenting the foveal avascular zone (FAZ) and retinal vessels. FAZ area and perimeter, retinal blood vessel curvature (RBVC), and thicknesses of the retinal nerve fiber layer (RNFL) and ganglion cell layer-inner plexiform layer (GCL-IPL) were calculated. Cognition-eye-brain associations were compared across the HC group and each AD spectrum stage using multivariable linear regression. Statistical Tests: Multivariable linear regression analysis. Statistical significance was set at p < 0.05 with FWE correction for fMRI and p < 1/62 (Bonferroni-corrected) for structural analyses. Results: Reductions of FALFF in temporal regions, especially the left superior temporal gyrus (STG) in MCI patients, were significantly linked to decreased RNFL thickness and increased FAZ area. In AD patients, reduced ReHo values in occipital regions, especially the right middle occipital gyrus (MOG), were significantly associated with an enlarged FAZ area. The SCD group showed widespread cortical thickening significantly associated with all aforementioned retinal biometrics, with notable thickening in the right fusiform gyrus (FG) and right parahippocampal gyrus (PHG) correlating with reduced GCL-IPL thickness. Data Conclusion: Brain function and structure may be associated with cognition and retinal biometrics across the AD spectrum. Specifically, cognition-eye-brain connections may be present in SCD. Evidence Level: 2. Technical Efficacy: Stage 3.

Perivascular Space Burden in Children With Autism Spectrum Disorder Correlates With Neurodevelopmental Severity.

Frigerio G, Rizzato G, Peruzzo D, Ciceri T, Mani E, Lanteri F, Mariani V, Molteni M, Agarwal N

PubMed paper, Jun 29 2025
Background: Cerebral perivascular spaces (PVS) are involved in cerebrospinal fluid (CSF) circulation and clearance of metabolic waste in adult humans. A high number of PVS has been reported in autism spectrum disorder (ASD), but its relationship with CSF and disease severity is unclear. Purpose: To quantify PVS in children with ASD through MRI. Study Type: Retrospective. Population: Sixty-six children with ASD (mean age: 4.7 ± 1.5 years; males/females: 59/7). Field Strength/Sequence: 3T, 3D T1-weighted GRE and 3D T2-weighted turbo spin echo sequences. Assessment: PVS were segmented using a weakly supervised PVS algorithm. PVS count, white matter-perivascular spaces (WM-PVS<sub>tot</sub>) and normalized volume (WM-PVS<sub>voln</sub>) were analyzed in the entire white matter and in six regions: frontal, parietal, limbic, occipital, temporal, and deep WM (WM-PVS<sub>sr</sub>). WM, GM, CSF, and extra-axial CSF (eaCSF) volumes were also calculated. Autism Diagnostic Observation Schedule, Wechsler Intelligence Scale, and Griffiths Mental Developmental scales were used to assess clinical severity and developmental quotient (DQ). Statistical Tests: Kendall correlation analysis (continuous variables) and Friedman (categorical variables) tests were used to compare medians of PVS variables across different WM regions. Post hoc pairwise comparisons with Wilcoxon tests were used to evaluate distributions of PVS in WM regions. Generalized linear models were employed to assess DQ, clinical severity, age, and eaCSF volume in relation to PVS variables. A p-value < 0.05 indicated statistical significance. Results: Severe DQ (β = 0.0089), a mild form of autism (β = -0.0174), and larger eaCSF volume (β = 0.0082) were significantly associated with greater WM-PVS<sub>tot</sub> count. WM-PVS<sub>voln</sub> was predominantly affected by normalized eaCSF volume (eaCSF<sub>voln</sub>) (β = 0.0242; adjusted for WM volumes). The percentage of WM-PVS<sub>sr</sub> was highest in the frontal areas (32%) and lowest in the temporal regions (11%). Data Conclusion: PVS count and volume in ASD are associated with eaCSF<sub>voln</sub>. PVS count is related to clinical severity and DQ. PVS count was higher in frontal regions and lower in temporal regions. Evidence Level: 4. Technical Efficacy: Stage 3.

CA-Diff: Collaborative Anatomy Diffusion for Brain Tissue Segmentation

Qilong Xing, Zikai Song, Yuteng Ye, Yuke Chen, Youjia Zhang, Na Feng, Junqing Yu, Wei Yang

arXiv preprint, Jun 28 2025
Segmentation of brain structures from MRI is crucial for evaluating brain morphology, yet existing CNN- and transformer-based methods struggle to delineate complex structures accurately. While current diffusion models have shown promise in image segmentation, they are inadequate when applied directly to brain MRI because they neglect anatomical information. To address this, we propose Collaborative Anatomy Diffusion (CA-Diff), a framework that integrates spatial anatomical features to enhance the segmentation accuracy of the diffusion model. Specifically, we introduce a distance field as an auxiliary anatomical condition to provide global spatial context, alongside a collaborative diffusion process that models its joint distribution with anatomical structures, enabling effective use of anatomical features for segmentation. Furthermore, we introduce a consistency loss to refine relationships between the distance field and anatomical structures, and we design a time-adapted channel attention module to enhance the U-Net feature fusion procedure. Extensive experiments show that CA-Diff outperforms state-of-the-art (SOTA) methods.
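To give a feel for what a "distance field" derived from a segmentation mask looks like, the sketch below computes, for every pixel of a tiny 2D mask, the Euclidean distance to the nearest foreground pixel. The abstract does not define how CA-Diff builds its distance field, so the unsigned brute-force definition, the function name, and the toy mask are assumptions; a real pipeline would use an efficient distance transform on 3D volumes.

```python
import math


def distance_field(mask):
    """Brute-force Euclidean distance from each pixel to the nearest foreground pixel."""
    foreground = [
        (r, c) for r, row in enumerate(mask) for c, value in enumerate(row) if value
    ]
    if not foreground:
        raise ValueError("mask has no foreground pixels")
    return [
        [
            min(math.hypot(r - fr, c - fc) for fr, fc in foreground)
            for c in range(len(row))
        ]
        for r, row in enumerate(mask)
    ]


# Toy 3x4 mask (invented) with a two-pixel structure in the middle row.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
field = distance_field(mask)
```

Unlike the binary mask, the field varies smoothly with position, which is one reason such maps are used to convey global spatial context.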

Hierarchical Characterization of Brain Dynamics via State Space-based Vector Quantization

Yanwu Yang, Thomas Wolfers

arXiv preprint, Jun 28 2025
Understanding brain dynamics through functional Magnetic Resonance Imaging (fMRI) remains a fundamental challenge in neuroscience, particularly in capturing how the brain transitions between various functional states. Recently, metastability, which refers to temporarily stable brain states, has offered a promising paradigm to quantify complex brain signals into interpretable, discretized representations. In particular, compared to cluster-based machine learning approaches, tokenization approaches leveraging vector quantization have shown promise in representation learning with powerful reconstruction and predictive capabilities. However, most existing methods ignore brain transition dependencies and lack a quantification of brain dynamics into representative and stable embeddings. In this study, we propose a Hierarchical State space-based Tokenization network, termed HST, which quantizes brain states and transitions in a hierarchical structure based on a state space-based model. We introduce a refined clustered Vector-Quantization Variational AutoEncoder (VQ-VAE) that incorporates quantization error feedback and clustering to improve quantization performance while facilitating metastability with representative and stable token representations. We validate our HST on two public fMRI datasets, demonstrating its effectiveness in quantifying the hierarchical dynamics of the brain and its potential in disease diagnosis and reconstruction performance. Our method offers a promising framework for the characterization of brain dynamics, facilitating the analysis of metastability.
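The core step of vector quantization, mapping each continuous feature vector to its nearest codebook entry, can be sketched in a few lines. This is not HST or a VQ-VAE (there is no encoder, training, or error feedback here); the codebook values, frame features, and function names are invented for illustration.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))
        for vec in vectors
    ]


def transitions(tokens):
    """Count changes between consecutive tokens, ignoring repeats of the same token."""
    counts = {}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


# Hypothetical 2D features for six time frames and a three-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.2, 0.1], [0.9, 1.2], [1.1, 0.8], [0.1, 1.9], [0.0, 0.1]]
tokens = quantize(frames, codebook)   # discrete "state" per frame
state_changes = transitions(tokens)   # which state follows which
```

Runs of identical tokens correspond loosely to the "temporarily stable" states the abstract calls metastability, and the transition counts summarize how the sequence moves between them.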

Inpainting is All You Need: A Diffusion-based Augmentation Method for Semi-supervised Medical Image Segmentation

Xinrong Hu, Yiyu Shi

arXiv preprint, Jun 28 2025
Collecting pixel-level labels for medical datasets can be laborious and expensive, and improving segmentation performance when labeled data are scarce is a crucial challenge. This work introduces AugPaint, a data augmentation framework that uses inpainting to generate image-label pairs from limited labeled data. AugPaint leverages latent diffusion models, known for their ability to generate high-quality in-domain images with low overhead, and adapts the sampling process to the inpainting task without the need for retraining. Specifically, given an image and its label mask, we crop the area labeled as foreground and condition on it during the reverse denoising process at every noise level. The masked background area is gradually filled in, and all generated images are paired with the label mask. This approach ensures an accurate match between synthetic images and label masks, setting it apart from existing dataset generation methods. The generated images serve as valuable supervision for training downstream segmentation models, effectively addressing the challenge of limited annotations. We conducted extensive evaluations of our data augmentation method on four public medical image segmentation datasets, including CT, MRI, and skin imaging. Results across all datasets demonstrate that AugPaint outperforms state-of-the-art label-efficient methodologies, significantly improving segmentation performance.
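The idea of conditioning on the labeled foreground at every noise level can be sketched as a loop that, at each step, overwrites the foreground pixels with a noised copy of the known image and lets a denoiser update everything else. This is a toy illustration on a flat list of pixel values, not AugPaint: it works in pixel space rather than a latent space, and `placeholder_denoiser`, the noise schedule, and all names are assumptions standing in for a trained diffusion model.

```python
import random


def composite(known, generated, mask):
    """Keep known values where mask is 1 and generated values where mask is 0."""
    return [k if m else g for k, g, m in zip(known, generated, mask)]


def placeholder_denoiser(x, sigma):
    """Stand-in for a trained denoising step; it only shrinks values toward zero."""
    return [0.5 * v for v in x]


def toy_inpaint(image, mask, noise_levels, denoise_step, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in image]  # start from pure noise
    for sigma in noise_levels:  # assumed to decrease toward 0.0
        # Noise the known image to the current level, then paste its foreground into x.
        known_t = [p + sigma * rng.gauss(0.0, 1.0) for p in image]
        x = composite(known_t, x, mask)
        x = denoise_step(x, sigma)
    # Final paste guarantees the foreground matches the labeled image exactly.
    return composite(image, x, mask)


image = [1.0, 2.0, 3.0, 4.0]
mask = [0, 1, 1, 0]  # 1 marks labeled foreground, 0 marks background to fill in
result = toy_inpaint(image, mask, [1.0, 0.5, 0.0], placeholder_denoiser)
```

Because the foreground is pasted back unchanged, the original label mask stays valid for every generated sample, which is the property the abstract highlights.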

Deep Learning-Based Automated Detection of the Middle Cerebral Artery in Transcranial Doppler Ultrasound Examinations.

Lee H, Shi W, Mukaddim RA, Brunelle E, Palisetti A, Imaduddin SM, Rajendram P, Incontri D, Lioutas VA, Heldt T, Raju BI

PubMed paper, Jun 28 2025
Transcranial Doppler (TCD) ultrasound has significant clinical value for assessing cerebral hemodynamics, but its reliance on operator expertise limits broader clinical adoption. In this work, we present a lightweight real-time deep learning-based approach capable of automatically identifying the middle cerebral artery (MCA) in TCD Color Doppler images. Two state-of-the-art object detection models, YOLOv10 and Real-Time Detection Transformers (RT-DETR), were investigated for automated MCA detection in real time. TCD Color Doppler data (41 subjects; 365 videos; 61,611 frames) were collected from neurologically healthy individuals (n = 31) and stroke patients (n = 10). MCA bounding box annotations were performed by clinical experts on all frames. Model training consisted of pretraining on a large abdominal ultrasound dataset followed by fine-tuning on the acquired TCD data. Detection performance at the instance and frame levels and inference speed were assessed through four-fold cross-validation. Inter-rater agreement between the model and two human expert readers was assessed using the distance between bounding boxes, and inter-rater variability was quantified using the individual equivalence coefficient (IEC) metric. Both YOLOv10 and RT-DETR models showed comparable frame-level accuracy for MCA presence, with F1 scores of 0.884 ± 0.023 and 0.884 ± 0.019, respectively. YOLOv10 outperformed RT-DETR for instance-level localization accuracy (AP: 0.817 vs. 0.780) and had considerably faster inference speed on a desktop CPU (11.6 ms vs. 91.14 ms). Furthermore, YOLOv10 showed an average inference time of 36 ms per frame on a tablet device. The IEC was -1.08 (95% confidence interval: [-1.45, -0.19]), showing that the AI predictions deviated less from each reader than the readers' annotations deviated from each other.
Real-time automated detection of the MCA is feasible and can be implemented on mobile platforms, potentially enabling wider clinical adoption by less-trained operators in point-of-care settings.
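For readers less familiar with the detection metrics quoted above, the sketch below shows how a frame-level F1 score and a bounding-box overlap (intersection over union, the quantity that average precision is typically built on) can be computed. The box format, the counts, and the function names are assumptions for illustration; the study's exact evaluation protocol is not given in the abstract.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from frame-level counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example: a predicted box shifted by half its width and height.
overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))
# Invented counts: 8 frames correctly flagged, 2 false alarms, 2 misses.
frame_f1 = f1_score(tp=8, fp=2, fn=2)
```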