Sort by:
Page 15 of 46451 results

Pixel-wise Modulated Dice Loss for Medical Image Segmentation

Seyed Mohsen Hosseini

arxiv logopreprintJun 17 2025
Class imbalance and the difficulty imbalance are the two types of data imbalance that affect the performance of neural networks in medical segmentation tasks. In class imbalance the loss is dominated by the majority classes and in difficulty imbalance the loss is dominated by easy to classify pixels. This leads to an ineffective training. Dice loss, which is based on a geometrical metric, is very effective in addressing the class imbalance compared to the cross entropy (CE) loss, which is adopted directly from classification tasks. To address the difficulty imbalance, the common approach is employing a re-weighted CE loss or a modified Dice loss to focus the training on difficult to classify areas. The existing modification methods are computationally costly and with limited success. In this study we propose a simple modification to the Dice loss with minimal computational cost. With a pixel level modulating term, we take advantage of the effectiveness of Dice loss in handling the class imbalance to also handle the difficulty imbalance. Results on three commonly used medical segmentation tasks show that the proposed Pixel-wise Modulated Dice loss (PM Dice loss) outperforms other methods, which are designed to tackle the difficulty imbalance problem.

SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification

Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

arxiv logopreprintJun 17 2025
Shortcut learning undermines model generalization to out-of-distribution data. While the literature attributes shortcuts to biases in superficial features, we show that imbalances in the semantic distribution of sample embeddings induce spurious semantic correlations, compromising model robustness. To address this issue, we propose SCISSOR (Semantic Cluster Intervention for Suppressing ShORtcut), a Siamese network-based debiasing approach that remaps the semantic space by discouraging latent clusters exploited as shortcuts. Unlike prior data-debiasing approaches, SCISSOR eliminates the need for data augmentation and rewriting. We evaluate SCISSOR on 6 models across 4 benchmarks: Chest-XRay and Not-MNIST in computer vision, and GYAFC and Yelp in NLP tasks. Compared to several baselines, SCISSOR reports +5.3 absolute points in F1 score on GYAFC, +7.3 on Yelp, +7.7 on Chest-XRay, and +1 on Not-MNIST. SCISSOR is also highly advantageous for lightweight models with ~9.5% improvement on F1 for ViT on computer vision datasets and ~11.9% for BERT on NLP. Our study redefines the landscape of model generalization by addressing overlooked semantic biases, establishing SCISSOR as a foundational framework for mitigating shortcut learning and fostering more robust, bias-resistant AI systems.

A Digital Twin Framework for Adaptive Treatment Planning in Radiotherapy

Chih-Wei Chang, Sri Akkineni, Mingzhe Hu, Keyur D. Shah, Jun Zhou, Xiaofeng Yang

arxiv logopreprintJun 17 2025
This study aims to develop and evaluate a digital twin (DT) framework to enhance adaptive proton therapy for prostate stereotactic body radiotherapy (SBRT), focusing on improving treatment precision for dominant intraprostatic lesions (DILs) while minimizing organ-at-risk (OAR) toxicity. We propose a decision-theoretic (DT) framework combining deep learning (DL)-based deformable image registration (DIR) with a prior treatment database to generate synthetic CTs (sCTs) for predicting interfractional anatomical changes. Using daily CBCT from five prostate SBRT patients with DILs, the framework precomputes multiple plans with high (DT-H) and low (DT-L) similarity sCTs. Plan optimization is performed in RayStation 2023B, assuming a constant RBE of 1.1 and robustly accounting for positional and range uncertainties. Plan quality is evaluated via a modified ProKnow score across two fractions, with reoptimization limited to 10 minutes. Daily CBCT evaluation showed clinical plans often violated OAR constraints (e.g., bladder V20.8Gy, rectum V23Gy), with DIL V100 < 90% in 2 patients, indicating SIFB failure. DT-H plans, using high-similarity sCTs, achieved better or comparable DIL/CTV coverage and lower OAR doses, with reoptimization completed within 10 min (e.g., DT-H-REopt-A score: 154.3-165.9). DT-L plans showed variable outcomes; lower similarity correlated with reduced DIL coverage (e.g., Patient 4: 84.7%). DT-H consistently outperformed clinical plans within time limits, while extended optimization brought DT-L and clinical plans closer to DT-H quality. This DT framework enables rapid, personalized adaptive proton therapy, improving DIL targeting and reducing toxicity. By addressing geometric uncertainties, it supports outcome gains in ultra-hypofractionated prostate RT and lays groundwork for future multimodal anatomical prediction.

DGG-XNet: A Hybrid Deep Learning Framework for Multi-Class Brain Disease Classification with Explainable AI

Sumshun Nahar Eity, Mahin Montasir Afif, Tanisha Fairooz, Md. Mortuza Ahmmed, Md Saef Ullah Miah

arxiv logopreprintJun 17 2025
Accurate diagnosis of brain disorders such as Alzheimer's disease and brain tumors remains a critical challenge in medical imaging. Conventional methods based on manual MRI analysis are often inefficient and error-prone. To address this, we propose DGG-XNet, a hybrid deep learning model integrating VGG16 and DenseNet121 to enhance feature extraction and classification. DenseNet121 promotes feature reuse and efficient gradient flow through dense connectivity, while VGG16 contributes strong hierarchical spatial representations. Their fusion enables robust multiclass classification of neurological conditions. Grad-CAM is applied to visualize salient regions, enhancing model transparency. Trained on a combined dataset from BraTS 2021 and Kaggle, DGG-XNet achieved a test accuracy of 91.33\%, with precision, recall, and F1-score all exceeding 91\%. These results highlight DGG-XNet's potential as an effective and interpretable tool for computer-aided diagnosis (CAD) of neurodegenerative and oncological brain disorders.

Risk Estimation of Knee Osteoarthritis Progression via Predictive Multi-task Modelling from Efficient Diffusion Model using X-ray Images

David Butler, Adrian Hilton, Gustavo Carneiro

arxiv logopreprintJun 17 2025
Medical imaging plays a crucial role in assessing knee osteoarthritis (OA) risk by enabling early detection and disease monitoring. Recent machine learning methods have improved risk estimation (i.e., predicting the likelihood of disease progression) and predictive modelling (i.e., the forecasting of future outcomes based on current data) using medical images, but clinical adoption remains limited due to their lack of interpretability. Existing approaches that generate future images for risk estimation are complex and impractical. Additionally, previous methods fail to localize anatomical knee landmarks, limiting interpretability. We address these gaps with a new interpretable machine learning method to estimate the risk of knee OA progression via multi-task predictive modelling that classifies future knee OA severity and predicts anatomical knee landmarks from efficiently generated high-quality future images. Such image generation is achieved by leveraging a diffusion model in a class-conditioned latent space to forecast disease progression, offering a visual representation of how particular health conditions may evolve. Applied to the Osteoarthritis Initiative dataset, our approach improves the state-of-the-art (SOTA) by 2\%, achieving an AUC of 0.71 in predicting knee OA progression while offering ~9% faster inference time.

Latent Anomaly Detection: Masked VQ-GAN for Unsupervised Segmentation in Medical CBCT

Pengwei Wang

arxiv logopreprintJun 17 2025
Advances in treatment technology now allow for the use of customizable 3D-printed hydrogel wound dressings for patients with osteoradionecrosis (ORN) of the jaw (ONJ). Meanwhile, deep learning has enabled precise segmentation of 3D medical images using tools like nnUNet. However, the scarcity of labeled data in ONJ imaging makes supervised training impractical. This study aims to develop an unsupervised training approach for automatically identifying anomalies in imaging scans. We propose a novel two-stage training pipeline. In the first stage, a VQ-GAN is trained to accurately reconstruct normal subjects. In the second stage, random cube masking and ONJ-specific masking are applied to train a new encoder capable of recovering the data. The proposed method achieves successful segmentation on both simulated and real patient data. This approach provides a fast initial segmentation solution, reducing the burden of manual labeling. Additionally, it has the potential to be directly used for 3D printing when combined with hand-tuned post-processing.

BRISC: Annotated Dataset for Brain Tumor Segmentation and Classification with Swin-HAFNet

Amirreza Fateh, Yasin Rezvani, Sara Moayedi, Sadjad Rezvani, Fatemeh Fateh, Mansoor Fateh

arxiv logopreprintJun 17 2025
Accurate segmentation and classification of brain tumors from Magnetic Resonance Imaging (MRI) remain key challenges in medical image analysis, largely due to the lack of high-quality, balanced, and diverse datasets. In this work, we present a new curated MRI dataset designed specifically for brain tumor segmentation and classification tasks. The dataset comprises 6,000 contrast-enhanced T1-weighted MRI scans annotated by certified radiologists and physicians, spanning three major tumor types-glioma, meningioma, and pituitary-as well as non-tumorous cases. Each sample includes high-resolution labels and is categorized across axial, sagittal, and coronal imaging planes to facilitate robust model development and cross-view generalization. To demonstrate the utility of the dataset, we propose a transformer-based segmentation model and benchmark it against established baselines. Our method achieves the highest weighted mean Intersection-over-Union (IoU) of 82.3%, with improvements observed across all tumor categories. Importantly, this study serves primarily as an introduction to the dataset, establishing foundational benchmarks for future research. We envision this dataset as a valuable resource for advancing machine learning applications in neuro-oncology, supporting both academic research and clinical decision-support development. datasetlink: https://www.kaggle.com/datasets/briscdataset/brisc2025/

MultiViT2: A Data-augmented Multimodal Neuroimaging Prediction Framework via Latent Diffusion Model

Bi Yuda, Jia Sihan, Gao Yutong, Abrol Anees, Fu Zening, Calhoun Vince

arxiv logopreprintJun 16 2025
Multimodal medical imaging integrates diverse data types, such as structural and functional neuroimaging, to provide complementary insights that enhance deep learning predictions and improve outcomes. This study focuses on a neuroimaging prediction framework based on both structural and functional neuroimaging data. We propose a next-generation prediction model, \textbf{MultiViT2}, which combines a pretrained representative learning base model with a vision transformer backbone for prediction output. Additionally, we developed a data augmentation module based on the latent diffusion model that enriches input data by generating augmented neuroimaging samples, thereby enhancing predictive performance through reduced overfitting and improved generalizability. We show that MultiViT2 significantly outperforms the first-generation model in schizophrenia classification accuracy and demonstrates strong scalability and portability.

ViT-NeBLa: A Hybrid Vision Transformer and Neural Beer-Lambert Framework for Single-View 3D Reconstruction of Oral Anatomy from Panoramic Radiographs

Bikram Keshari Parida, Anusree P. Sunilkumar, Abhijit Sen, Wonsang You

arxiv logopreprintJun 16 2025
Dental diagnosis relies on two primary imaging modalities: panoramic radiographs (PX) providing 2D oral cavity representations, and Cone-Beam Computed Tomography (CBCT) offering detailed 3D anatomical information. While PX images are cost-effective and accessible, their lack of depth information limits diagnostic accuracy. CBCT addresses this but presents drawbacks including higher costs, increased radiation exposure, and limited accessibility. Existing reconstruction models further complicate the process by requiring CBCT flattening or prior dental arch information, often unavailable clinically. We introduce ViT-NeBLa, a vision transformer-based Neural Beer-Lambert model enabling accurate 3D reconstruction directly from single PX. Our key innovations include: (1) enhancing the NeBLa framework with Vision Transformers for improved reconstruction capabilities without requiring CBCT flattening or prior dental arch information, (2) implementing a novel horseshoe-shaped point sampling strategy with non-intersecting rays that eliminates intermediate density aggregation required by existing models due to intersecting rays, reducing sampling point computations by $52 \%$, (3) replacing CNN-based U-Net with a hybrid ViT-CNN architecture for superior global and local feature extraction, and (4) implementing learnable hash positional encoding for better higher-dimensional representation of 3D sample points compared to existing Fourier-based dense positional encoding. Experiments demonstrate that ViT-NeBLa significantly outperforms prior state-of-the-art methods both quantitatively and qualitatively, offering a cost-effective, radiation-efficient alternative for enhanced dental diagnostics.

PRO: Projection Domain Synthesis for CT Imaging

Kang Chen, Bin Huang, Xuebin Yang, Junyan Zhang, Qiegen Liu

arxiv logopreprintJun 16 2025
Synthesizing high quality CT images remains a signifi-cant challenge due to the limited availability of annotat-ed data and the complex nature of CT imaging. In this work, we present PRO, a novel framework that, to the best of our knowledge, is the first to perform CT image synthesis in the projection domain using latent diffusion models. Unlike previous approaches that operate in the image domain, PRO learns rich structural representa-tions from raw projection data and leverages anatomi-cal text prompts for controllable synthesis. This projec-tion domain strategy enables more faithful modeling of underlying imaging physics and anatomical structures. Moreover, PRO functions as a foundation model, capa-ble of generalizing across diverse downstream tasks by adjusting its generative behavior via prompt inputs. Experimental results demonstrated that incorporating our synthesized data significantly improves perfor-mance across multiple downstream tasks, including low-dose and sparse-view reconstruction, even with limited training data. These findings underscore the versatility and scalability of PRO in data generation for various CT applications. These results highlight the potential of projection domain synthesis as a powerful tool for data augmentation and robust CT imaging. Our source code is publicly available at: https://github.com/yqx7150/PRO.
Page 15 of 46451 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.