
SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification

Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

arXiv preprint · Jun 17, 2025
Shortcut learning undermines model generalization to out-of-distribution data. While the literature attributes shortcuts to biases in superficial features, we show that imbalances in the semantic distribution of sample embeddings induce spurious semantic correlations, compromising model robustness. To address this issue, we propose SCISSOR (Semantic Cluster Intervention for Suppressing ShORtcut), a Siamese network-based debiasing approach that remaps the semantic space by discouraging latent clusters exploited as shortcuts. Unlike prior data-debiasing approaches, SCISSOR eliminates the need for data augmentation and rewriting. We evaluate SCISSOR on 6 models across 4 benchmarks: Chest-XRay and Not-MNIST in computer vision, and GYAFC and Yelp in NLP tasks. Compared to several baselines, SCISSOR reports +5.3 absolute points in F1 score on GYAFC, +7.3 on Yelp, +7.7 on Chest-XRay, and +1 on Not-MNIST. SCISSOR is also highly advantageous for lightweight models with ~9.5% improvement on F1 for ViT on computer vision datasets and ~11.9% for BERT on NLP. Our study redefines the landscape of model generalization by addressing overlooked semantic biases, establishing SCISSOR as a foundational framework for mitigating shortcut learning and fostering more robust, bias-resistant AI systems.
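
For readers who want the intuition in code, the sketch below shows one way a cluster-aware Siamese objective of this kind could look. The exact SCISSOR loss is not given in the abstract, so the margin form, the extra repulsion term, and all names here are illustrative assumptions rather than the authors' method.

```python
import torch
import torch.nn.functional as F

def cluster_aware_siamese_loss(z1, z2, same_label, same_cluster, margin=1.0):
    """z1, z2: (B, D) embeddings from the two Siamese branches.
    same_label, same_cluster: (B,) boolean tensors describing each pair."""
    d = F.pairwise_distance(z1, z2)                            # distance per pair
    pull = same_label.float() * d.pow(2)                       # same label: pull together
    push = (~same_label).float() * F.relu(margin - d).pow(2)   # different label: push apart
    # Extra repulsion for pairs that share a latent cluster but not a label --
    # exactly the pairs a shortcut learner would conflate.
    shortcut = (same_cluster & ~same_label).float() * F.relu(margin - d).pow(2)
    return (pull + push + shortcut).mean()
```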

Beyond the First Read: AI-Assisted Perceptual Error Detection in Chest Radiography Accounting for Interobserver Variability

Adhrith Vutukuri, Akash Awasthi, David Yang, Carol C. Wu, Hien Van Nguyen

arXiv preprint · Jun 16, 2025
Chest radiography is widely used in diagnostic imaging. However, perceptual errors -- especially overlooked but visible abnormalities -- remain common and clinically significant. Current workflows and AI systems provide limited support for detecting such errors after interpretation and often lack meaningful human--AI collaboration. We introduce RADAR (Radiologist--AI Diagnostic Assistance and Review), a post-interpretation companion system. RADAR ingests finalized radiologist annotations and CXR images, then performs regional-level analysis to detect and refer potentially missed abnormal regions. The system supports a "second-look" workflow and offers suggested regions of interest (ROIs) rather than fixed labels to accommodate inter-observer variation. We evaluated RADAR on a simulated perceptual-error dataset derived from de-identified CXR cases, using F1 score and Intersection over Union (IoU) as primary metrics. RADAR achieved a recall of 0.78, precision of 0.44, and an F1 score of 0.56 in detecting missed abnormalities in the simulated perceptual-error dataset. Although precision is moderate, this reduces over-reliance on AI by encouraging radiologist oversight in human--AI collaboration. The median IoU was 0.78, with more than 90% of referrals exceeding 0.5 IoU, indicating accurate regional localization. RADAR effectively complements radiologist judgment, providing valuable post-read support for perceptual-error detection in CXR interpretation. Its flexible ROI suggestions and non-intrusive integration position it as a promising tool in real-world radiology workflows. To facilitate reproducibility and further evaluation, we release a fully open-source web implementation alongside a simulated error dataset. All code, data, demonstration videos, and the application are publicly available at https://github.com/avutukuri01/RADAR.
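
The primary metrics here (F1 at an IoU threshold, median IoU) reduce to box matching. A minimal sketch of that evaluation logic follows, with the 0.5 threshold taken from the abstract and the box format, greedy matching, and function names assumed:

```python
def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_referrals(suggested, missed, thr=0.5):
    """Greedy one-to-one matching of suggested ROIs to missed-abnormality
    boxes; returns (true positives, false positives, false negatives)."""
    unmatched = list(missed)
    tp = 0
    for s in suggested:
        best = max(unmatched, key=lambda m: iou(s, m), default=None)
        if best is not None and iou(s, best) >= thr:
            unmatched.remove(best)
            tp += 1
    return tp, len(suggested) - tp, len(unmatched)
```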

Next-generation machine learning model to measure the Norberg angle on canine hip radiographs increases accuracy and reduces time to completion.

Hansen GC, Yao Y, Fischetti AJ, Gonzalez A, Porter I, Todhunter RJ, Zhang Y

PubMed paper · Jun 16, 2025
To apply machine learning (ML) to measure the Norberg angle (NA) on canine ventrodorsal hip-extended pelvic radiographs. In this observational study, an NA-AI model was trained on real and synthetic radiographs, with additional radiographs used for validation and testing. Each NA was predicted using a hybrid architecture derived from 2 ML vision models. The NAs were measured by 4 authors and by the model, and all measurements were compared to each other. The time taken to correct the NAs predicted by the model was compared to that of unassisted human measurements. The NA-AI model was trained on 733 real and 1,474 synthetic radiographs; 105 real radiographs were used for validation and 128 for testing. The mean absolute error between human measurements ranged from 3° to 10° (SD, 3° to 10°), with an intraclass correlation between humans of 0.38 to 0.92. The mean absolute error between the NA-AI model's predictions and the human measurements was 5° to 6° (SD, 5°; intraclass correlation, 0.39 to 0.94). Bland-Altman plots showed good agreement between human and AI measurements when the NAs were greater than 80°. The time taken to check the accuracy of the model's NA predictions was 45% to 80% shorter than unassisted measurement. The NA-AI model proved more accurate than the original model except when hip dysplasia was severe, and its assistance decreased the time needed to analyze radiographs. The model's assistance reduces the time taken for radiographic hip analysis in clinical applications; however, it is less reliable in cases involving severe osteoarthritic change, which require manual review.
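
As context for what the model automates, a hedged sketch of the Norberg-angle geometry itself: the angle at the femoral head center between the line to the other femoral head center and the line to the cranial acetabular rim. Landmark detection is assumed to happen upstream and is not the paper's architecture; the coordinates below are toys.

```python
import numpy as np

def norberg_angle(head, other_head, rim):
    """All arguments are (x, y) landmarks for one hip; returns degrees."""
    u = np.asarray(other_head, float) - np.asarray(head, float)  # inter-head line
    v = np.asarray(rim, float) - np.asarray(head, float)         # head-to-rim line
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Prints ~113 for this toy geometry; an NA >= 105 degrees is conventionally normal.
print(norberg_angle(head=(100, 200), other_head=(300, 200), rim=(70, 130)))
```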

ViT-NeBLa: A Hybrid Vision Transformer and Neural Beer-Lambert Framework for Single-View 3D Reconstruction of Oral Anatomy from Panoramic Radiographs

Bikram Keshari Parida, Anusree P. Sunilkumar, Abhijit Sen, Wonsang You

arXiv preprint · Jun 16, 2025
Dental diagnosis relies on two primary imaging modalities: panoramic radiographs (PX) providing 2D oral cavity representations, and Cone-Beam Computed Tomography (CBCT) offering detailed 3D anatomical information. While PX images are cost-effective and accessible, their lack of depth information limits diagnostic accuracy. CBCT addresses this but presents drawbacks including higher costs, increased radiation exposure, and limited accessibility. Existing reconstruction models further complicate the process by requiring CBCT flattening or prior dental arch information, often unavailable clinically. We introduce ViT-NeBLa, a vision transformer-based Neural Beer-Lambert model enabling accurate 3D reconstruction directly from a single PX image. Our key innovations include: (1) enhancing the NeBLa framework with Vision Transformers for improved reconstruction capabilities without requiring CBCT flattening or prior dental arch information, (2) implementing a novel horseshoe-shaped point sampling strategy with non-intersecting rays that eliminates the intermediate density aggregation required by existing models due to intersecting rays, reducing sampling-point computations by 52%, (3) replacing the CNN-based U-Net with a hybrid ViT-CNN architecture for superior global and local feature extraction, and (4) implementing learnable hash positional encoding for better higher-dimensional representation of 3D sample points compared to existing Fourier-based dense positional encoding. Experiments demonstrate that ViT-NeBLa significantly outperforms prior state-of-the-art methods both quantitatively and qualitatively, offering a cost-effective, radiation-efficient alternative for enhanced dental diagnostics.
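
The Beer-Lambert component the framework builds on has a standard closed form, I = I0 * exp(-sum_i mu_i * delta_i): a ray's output intensity is the input intensity attenuated by the line integral of attenuation coefficients along the ray. A minimal sketch of that rendering step, with the mu-predicting network assumed and not shown:

```python
import torch

def beer_lambert_render(mu, deltas, i0=1.0):
    """mu: (R, S) predicted attenuation at S samples along each of R rays;
    deltas: (R, S) spacing between consecutive samples."""
    optical_depth = (mu * deltas).sum(dim=-1)   # numerical line integral per ray
    return i0 * torch.exp(-optical_depth)       # transmitted intensity per ray

mu = torch.rand(4, 64) * 0.05                   # stand-in for the network's output
deltas = torch.full((4, 64), 0.5)               # uniform sample spacing
print(beer_lambert_render(mu, deltas))          # one intensity per ray
```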

Finding Optimal Kernel Size and Dimension in Convolutional Neural Networks: An Architecture Optimization Approach

Shreyas Rajeev, B Sathish Babu

arXiv preprint · Jun 16, 2025
Kernel size selection in Convolutional Neural Networks (CNNs) is a critical but often overlooked design decision that affects receptive field, feature extraction, computational cost, and model accuracy. This paper proposes the Best Kernel Size Estimation Function (BKSEF), a mathematically grounded and empirically validated framework for optimal, layer-wise kernel size determination. BKSEF balances information gain, computational efficiency, and accuracy improvements by integrating principles from information theory, signal processing, and learning theory. Extensive experiments on CIFAR-10, CIFAR-100, ImageNet-lite, ChestX-ray14, and GTSRB datasets demonstrate that BKSEF-guided architectures achieve up to 3.1 percent accuracy improvement and 42.8 percent reduction in FLOPs compared to traditional models using uniform 3x3 kernels. Two real-world case studies further validate the approach: one for medical image classification in a cloud-based setup, and another for traffic sign recognition on edge devices. The former achieved enhanced interpretability and accuracy, while the latter reduced latency and model size significantly, with minimal accuracy trade-off. These results show that kernel size can be an active, optimizable parameter rather than a fixed heuristic. BKSEF provides practical heuristics and theoretical support for researchers and developers seeking efficient and application-aware CNN designs. It is suitable for integration into neural architecture search pipelines and real-time systems, offering a new perspective on CNN optimization.
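
The abstract does not state BKSEF's closed form, so the following is only a hypothetical illustration of the trade-off it formalizes: score each candidate kernel size by an information proxy minus a FLOP-cost penalty, and keep the per-layer argmax. The log proxy and the lambda weight are assumptions, not the paper's function.

```python
import math

def conv_flops(k, c_in, c_out, h, w):
    return k * k * c_in * c_out * h * w          # multiply-accumulates of a k x k conv

def pick_kernel_size(c_in, c_out, h, w, candidates=(1, 3, 5, 7), lam=2e-9):
    def score(k):
        info_proxy = math.log(1 + k * k)         # crude stand-in for information gain
        return info_proxy - lam * conv_flops(k, c_in, c_out, h, w)
    return max(candidates, key=score)

# With these assumed constants, a large early-layer map prefers a smaller
# kernel than a small late-layer map: prints 3, then 7.
print(pick_kernel_size(64, 64, 112, 112))
print(pick_kernel_size(512, 512, 7, 7))
```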

Boundary-Aware Vision Transformer for Angiography Vascular Network Segmentation

Nabil Hezil, Suraj Singh, Vita Vlasova, Oleg Rogov, Ahmed Bouridane, Rifat Hamoudi

arXiv preprint · Jun 15, 2025
Accurate segmentation of vascular structures in coronary angiography remains a core challenge in medical image analysis due to the complexity of elongated, thin, and low-contrast vessels. Classical convolutional neural networks (CNNs) often fail to preserve topological continuity, while recent Vision Transformer (ViT)-based models, although strong in global context modeling, lack precise boundary awareness. In this work, we introduce BAVT, a Boundary-Aware Vision Transformer: a ViT-based architecture enhanced with an edge-aware loss that explicitly guides the segmentation toward fine-grained vascular boundaries. Unlike hybrid transformer-CNN models, BAVT retains a minimal, scalable structure that is fully compatible with large-scale vision foundation model (VFM) pretraining. We validate our approach on the DCA-1 coronary angiography dataset, where BAVT achieves superior performance across medical image segmentation metrics, outperforming both CNN and hybrid baselines. These results demonstrate the effectiveness of combining plain ViT encoders with boundary-aware supervision for clinical-grade vascular segmentation.
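
The abstract does not spell out the edge-aware loss, so the sketch below is one plausible form rather than the paper's: a Dice term plus a cross-entropy term restricted to a thin boundary band extracted from the ground truth by dilation minus erosion (implemented here with max-pooling). All weights and names are assumptions.

```python
import torch
import torch.nn.functional as F

def boundary_band(mask, width=3):
    """mask: (B, 1, H, W) float binary ground truth -> thin band around the contour."""
    pad = width // 2
    dilated = F.max_pool2d(mask, width, stride=1, padding=pad)
    eroded = 1.0 - F.max_pool2d(1.0 - mask, width, stride=1, padding=pad)
    return dilated - eroded                      # ones only near the boundary

def edge_aware_loss(logits, mask, edge_weight=1.0, eps=1e-6):
    prob = torch.sigmoid(logits)
    dice = 1 - (2 * (prob * mask).sum() + eps) / (prob.sum() + mask.sum() + eps)
    band = boundary_band(mask)
    bce = F.binary_cross_entropy_with_logits(logits, mask, reduction="none")
    edge = (bce * band).sum() / band.sum().clamp_min(1.0)   # boundary-only CE term
    return dice + edge_weight * edge
```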

Automated Measurements of Spinal Parameters for Scoliosis Using Deep Learning.

Meng X, Zhu S, Yang Q, Zhu F, Wang Z, Liu X, Dong P, Wang S, Fan L

PubMed paper · Jun 15, 2025
Retrospective single-institution study. To develop and validate an automated convolutional neural network (CNN) to measure the Cobb angle, T1 tilt angle, coronal balance, clavicular angle, height of the shoulders, T5-T12 Cobb angle, and sagittal balance for accurate scoliosis diagnosis. Scoliosis, characterized by a Cobb angle >10°, requires accurate and reliable measurements to guide treatment. Traditional manual measurements are time-consuming and have low interobserver and intraobserver reliability. While some automated tools exist, they often require manual intervention and focus primarily on the Cobb angle. In this study, we utilized four data sets comprising the anterior-posterior (AP) and lateral radiographs of 1682 patients with scoliosis. The CNN includes coarse segmentation, landmark localization, and fine segmentation. The measurements were evaluated using the Dice coefficient, mean absolute error (MAE), and percentage of correct key-points (PCK) with a 3-mm threshold. An internal testing set, including 87 adolescent (7-16 yr) and 26 older adult patients (≥60 yr), was used to evaluate the agreement between automated and manual measurements. The automated measurements by the CNN achieved high mean Dice coefficients (>0.90), a PCK of 89.7% to 93.7%, and an MAE for vertebral corners of 2.87 to 3.62 mm on AP radiographs. Agreement between automated and manual measurements on the internal testing set was acceptable, with an MAE of 0.26 to 0.51 mm or degrees for the adolescent subgroup and 0.29 to 4.93 mm or degrees for the older adult subgroup on AP radiographs. The MAE for the T5-T12 Cobb angle and sagittal balance, on lateral radiographs, was 1.03° and 0.84 mm, respectively, in adolescents, and 4.60° and 9.41 mm, respectively, in older adults. Automated measurement time was significantly shorter than manual measurement. The deep learning automated system provides rapid, accurate, and reliable measurements for scoliosis diagnosis, which could improve clinical workflow efficiency and guide scoliosis treatment. Level III.
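
For orientation, the Cobb-angle geometry that sits downstream of the landmark-localization stage is simple to state: it is the largest angle between endplate lines across the measured vertebrae. A toy sketch, with the landmark model assumed upstream:

```python
import numpy as np

def endplate_angle(left, right):
    """Angle (degrees) of the line through two endplate corner landmarks."""
    dx, dy = np.subtract(right, left, dtype=float)
    return np.degrees(np.arctan2(dy, dx))

def cobb_angle(endplates):
    """endplates: list of ((x, y), (x, y)) pairs, one line per endplate."""
    angles = [endplate_angle(l, r) for l, r in endplates]
    d = max(abs(a - b) for a in angles for b in angles)
    return min(d, 180 - d)                       # angle between undirected lines

# A curve between a ~-8 degree and a ~+14 degree endplate measures ~22 degrees:
print(cobb_angle([((0, 0), (50, -7)), ((0, 100), (50, 12.5))]))
```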

Inference of single cell profiles from histology stains with the Single-Cell omics from Histology Analysis Framework (SCHAF)

Comiter, C., Chen, X., Vaishnav, E. D., Kobayashi-Kirschvink, K. J., Ciapmricotti, M., Zhang, K., Murray, J., Monticolo, F., Qi, J., Tanaka, R., Brodowska, S. E., Li, B., Yang, Y., Rodig, S. J., Karatza, A., Quintanal Villalonga, A., Turner, M., Pfaff, K. L., Jane-Valbuena, J., Slyper, M., Waldman, J., Vigneau, S., Wu, J., Blosser, T. R., Segerstolpe, A., Abravanel, D., Wagle, N., Demehri, S., Zhuang, X., Rudin, C. M., Klughammer, J., Rozenblatt-Rosen, O., Stultz, C. M., Shu, J., Regev, A.

bioRxiv preprint · Jun 13, 2025
Tissue biology involves an intricate balance between cell-intrinsic processes and interactions between cells organized in specific spatial patterns, which can be respectively captured by single cell profiling methods, such as single cell RNA-seq (scRNA-seq) and spatial transcriptomics, and histology imaging data, such as Hematoxylin-and-Eosin (H&E) stains. While single cell profiles provide rich molecular information, they can be challenging to collect routinely in the clinic and either lack spatial resolution or high gene throughput. Conversely, histological H&E assays have been a cornerstone of tissue pathology for decades, but do not directly report on molecular details, although the observed structure they capture arises from molecules and cells. Here, we leverage vision transformers and adversarial deep learning to develop the Single Cell omics from Histology Analysis Framework (SCHAF), which generates a tissue sample's spatially-resolved whole transcriptome single cell omics dataset from its H&E histology image. We demonstrate SCHAF on a variety of tissues--including lung cancer, metastatic breast cancer, placentae, and whole mouse pups--training with matched samples analyzed by sc/snRNA-seq, H&E staining, and, when available, spatial transcriptomics. SCHAF generated appropriate single cell profiles from histology images in test data, related them spatially, and compared well to ground-truth scRNA-Seq, expert pathologist annotations, or direct spatial transcriptomic measurements, with some limitations. SCHAF opens the way to next-generation H&E analyses and an integrated understanding of cell and tissue biology in health and disease.
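
A deliberately schematic sketch of the adversarial idea described here, not SCHAF itself: a generator maps an H&E tile embedding to a per-cell expression vector, while a discriminator is trained to tell generated profiles from measured sc/snRNA-seq profiles. Dimensions, optimizers, and the tile encoder are all assumptions.

```python
import torch
import torch.nn as nn

EMB, GENES = 768, 2000                       # assumed tile-embedding and gene-panel sizes
gen = nn.Sequential(nn.Linear(EMB, 1024), nn.ReLU(), nn.Linear(1024, GENES))
disc = nn.Sequential(nn.Linear(GENES, 256), nn.ReLU(), nn.Linear(256, 1))
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(tile_emb, real_expr):
    # Discriminator: measured profiles -> 1, generated profiles -> 0.
    fake = gen(tile_emb).detach()
    d_loss = bce(disc(real_expr), torch.ones(len(real_expr), 1)) + \
             bce(disc(fake), torch.zeros(len(fake), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: produce profiles the discriminator accepts as real.
    g_loss = bce(disc(gen(tile_emb)), torch.ones(len(tile_emb), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```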

Anatomy-Grounded Weakly Supervised Prompt Tuning for Chest X-ray Latent Diffusion Models

Konstantinos Vilouras, Ilias Stogiannidis, Junyu Yan, Alison Q. O'Neil, Sotirios A. Tsaftaris

arXiv preprint · Jun 12, 2025
Latent Diffusion Models have shown remarkable results in text-guided image synthesis in recent years. In the domain of natural (RGB) images, recent works have shown that such models can be adapted to various vision-language downstream tasks with little to no supervision involved. On the contrary, text-to-image Latent Diffusion Models remain relatively underexplored in the field of medical imaging, primarily due to limited data availability (e.g., due to privacy concerns). In this work, focusing on the chest X-ray modality, we first demonstrate that a standard text-conditioned Latent Diffusion Model has not learned to align clinically relevant information in free-text radiology reports with the corresponding areas of the given scan. Then, to alleviate this issue, we propose a fine-tuning framework to improve multi-modal alignment in a pre-trained model such that it can be efficiently repurposed for downstream tasks such as phrase grounding. Our method sets a new state-of-the-art on a standard benchmark dataset (MS-CXR), while also exhibiting robust performance on out-of-distribution data (VinDr-CXR). Our code will be made publicly available.
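
Phrase grounding on MS-CXR-style data is commonly scored by comparing a thresholded, text-conditioned heatmap with the annotated box. A small sketch of that metric, with the heatmap source (e.g., the model's cross-attention) assumed upstream and the 0.5 threshold an illustrative choice:

```python
import numpy as np

def grounding_iou(heatmap, box, thr=0.5):
    """heatmap: (H, W) scores in [0, 1]; box: (x1, y1, x2, y2) in pixels."""
    pred = heatmap >= thr
    gt = np.zeros_like(pred)
    x1, y1, x2, y2 = box
    gt[y1:y2, x1:x2] = True
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 0.0

hm = np.zeros((224, 224)); hm[60:120, 80:160] = 0.9   # toy heatmap
print(grounding_iou(hm, (80, 60, 160, 120)))          # 1.0 for a perfect match
```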

A fully open AI foundation model applied to chest radiography.

Ma D, Pang J, Gotway MB, Liang J

PubMed paper · Jun 11, 2025
Chest radiography frequently serves as baseline imaging for most lung diseases [1]. Deep learning has great potential for automating the interpretation of chest radiography [2]. However, existing chest radiographic deep learning models are limited in diagnostic scope, generalizability, adaptability, robustness and extensibility. To overcome these limitations, we have developed Ark+, a foundation model applied to chest radiography and pretrained by cyclically accruing and reusing the knowledge from heterogeneous expert labels in numerous datasets. Ark+ excels in diagnosing thoracic diseases. It expands the diagnostic scope and addresses potential misdiagnosis. It can adapt to evolving diagnostic needs and respond to novel diseases. It can learn rare conditions from a few samples and transfer to new diagnostic settings without training. It tolerates data biases and long-tailed distributions, and it supports federated learning to preserve privacy. All code and pretrained models have been released, so that Ark+ is open for fine-tuning, local adaptation and improvement. It is extensible to several modalities. Thus, it is a foundation model for medical imaging. The exceptional capabilities of Ark+ stem from our insight: aggregating various datasets diversifies the patient populations and accrues knowledge from many experts to yield unprecedented performance while reducing annotation costs [3]. The development of Ark+ reveals that open models trained by accruing and reusing knowledge from heterogeneous expert annotations with a multitude of public (big or small) datasets can surpass the performance of proprietary models trained on large data. We hope that our findings will inspire more researchers to share code and datasets or federate privacy-preserving data to create open foundation models with diverse, global expertise and patient populations, thus accelerating open science and democratizing AI for medicine.
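
A schematic sketch, not the released Ark+ code, of the cyclic knowledge-accrual pattern the abstract describes: one shared backbone, one classification head per dataset's label space, and training that cycles across heterogeneous datasets so every expert labeling scheme updates the same encoder. The sizes, the toy backbone, and single-channel 224x224 inputs are assumptions.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, 512), nn.ReLU())
label_spaces = {"dataset_a": 14, "dataset_b": 5, "dataset_c": 26}   # hypothetical
heads = nn.ModuleDict({k: nn.Linear(512, n) for k, n in label_spaces.items()})
opt = torch.optim.Adam(list(backbone.parameters()) + list(heads.parameters()))
bce = nn.BCEWithLogitsLoss()

def cyclic_epoch(loaders):
    """loaders: dict name -> iterable of (images, multi-hot label tensors)."""
    for name, loader in loaders.items():        # cycle across datasets
        for x, y in loader:
            logits = heads[name](backbone(x))   # dataset-specific head
            loss = bce(logits, y)               # each expert label space trains
            opt.zero_grad(); loss.backward(); opt.step()   # the shared backbone
```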