Sort by:
Page 17 of 81804 results

Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques

Weide Liu, Wei Zhou, Jun Liu, Ping Hu, Jun Cheng, Jungong Han, Weisi Lin

arxiv logopreprintJul 30 2025
Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey comprehensively reviews modality-based feature matching, exploring traditional handcrafted methods and emphasizing contemporary deep learning approaches across various modalities, including RGB images, depth images, 3D point clouds, LiDAR scans, medical images, and vision-language interactions. Traditional methods, leveraging detectors like Harris corners and descriptors such as SIFT and ORB, demonstrate robustness under moderate intra-modality variations but struggle with significant modality gaps. Contemporary deep learning-based methods, exemplified by detector-free strategies like CNN-based SuperPoint and transformer-based LoFTR, substantially improve robustness and adaptability across modalities. We highlight modality-aware advancements, such as geometric and depth-specific descriptors for depth images, sparse and dense learning methods for 3D point clouds, attention-enhanced neural networks for LiDAR scans, and specialized solutions like the MIND descriptor for complex medical image matching. Cross-modal applications, particularly in medical image registration and vision-language tasks, underscore the evolution of feature matching to handle increasingly diverse data interactions.

LAMA-Net: A Convergent Network Architecture for Dual-Domain Reconstruction

Chi Ding, Qingchao Zhang, Ge Wang, Xiaojing Ye, Yunmei Chen

arxiv logopreprintJul 30 2025
We propose a learnable variational model that learns the features and leverages complementary information from both image and measurement domains for image reconstruction. In particular, we introduce a learned alternating minimization algorithm (LAMA) from our prior work, which tackles two-block nonconvex and nonsmooth optimization problems by incorporating a residual learning architecture in a proximal alternating framework. In this work, our goal is to provide a complete and rigorous convergence proof of LAMA and show that all accumulation points of a specified subsequence of LAMA must be Clarke stationary points of the problem. LAMA directly yields a highly interpretable neural network architecture called LAMA-Net. Notably, in addition to the results shown in our prior work, we demonstrate that the convergence property of LAMA yields outstanding stability and robustness of LAMA-Net in this work. We also show that the performance of LAMA-Net can be further improved by integrating a properly designed network that generates suitable initials, which we call iLAMA-Net. To evaluate LAMA-Net/iLAMA-Net, we conduct several experiments and compare them with several state-of-the-art methods on popular benchmark datasets for Sparse-View Computed Tomography.

Deep learning for tooth detection and segmentation in panoramic radiographs: a systematic review and meta-analysis.

Bonfanti-Gris M, Herrera A, Salido Rodríguez-Manzaneque MP, Martínez-Rus F, Pradíes G

pubmed logopapersJul 30 2025
This systematic review and meta-analysis aimed to summarize and evaluate the available information regarding the performance of deep learning methods for tooth detection and segmentation in orthopantomographies. Electronic databases (Medline, Embase and Cochrane) were searched up to September 2023 for relevant observational studies and both, randomized and controlled clinical trials. Two reviewers independently conducted the study selection, data extraction, and quality assessments. GRADE (Grading of Recommendations, Assessment, Development, and Evaluation) assessment was adopted for collective grading of the overall body of evidence. From the 2,207 records identified, 20 studies were included in the analysis. Meta-analysis was conducted for the comparison of mesiodens detection and segmentation (n = 6) using sensitivity and specificity as the two main diagnostic parameters. A graphical summary of the analysis was also plotted and a Hierarchical Summary Receiver Operating Characteristic curve, prediction region, summary point, and confidence region were illustrated. The included studies quantitative analysis showed pooled sensitivity, specificity, positive LR, negative LR, and diagnostic odds ratio of 0.92 (95% confidence interval [CI], 0.84-0.96), 0.94 (95% CI, 0.89-0.97), 15.7 (95% CI, 7.6-32.2), 0.08 (95% CI, 0.04-0.18), and 186 (95% CI, 44-793), respectively. A graphical summary of the meta-analysis was plotted based on sensitivity and specificity. Hierarchical Summary Receiver Operating Characteristic curves showed a positive correlation between logit-transformed sensitivity and specificity (r = 0.886). Based on the results of the meta-analysis and GRADE assessment, a moderate recommendation is advised to dental operators when relying on AI-based tools for tooth detection and segmentation in panoramic radiographs.

A privacy preserving machine learning framework for medical image analysis using quantized fully connected neural networks with TFHE based inference.

Selvakumar S, Senthilkumar B

pubmed logopapersJul 30 2025
Medical image analysis using deep learning algorithms has become a basis of modern healthcare, enabling early detection, diagnosis, treatment planning, and disease monitoring. However, sharing sensitive raw medical data with third parties for analysis raises significant privacy concerns. This paper presents a privacy-preserving machine learning (PPML) framework using a Fully Connected Neural Network (FCNN) for secure medical image analysis using the MedMNIST dataset. The proposed PPML framework leverages a torus-based fully homomorphic encryption (TFHE) to ensure data privacy during inference, maintain patient confidentiality, and ensure compliance with privacy regulations. The FCNN model is trained in a plaintext environment for FHE compatibility using Quantization-Aware Training to optimize weights and activations. The quantized FCNN model is then validated under FHE constraints through simulation and compiled into an FHE-compatible circuit for encrypted inference on sensitive data. The proposed framework is evaluated on the MedMNIST datasets to assess its accuracy and inference time in both plaintext and encrypted environments. Experimental results reveal that the PPML framework achieves a prediction accuracy of 88.2% in the plaintext setting and 87.5% during encrypted inference, with an average inference time of 150 milliseconds per image. This shows that FCNN models paired with TFHE-based encryption achieve high prediction accuracy on MedMNIST datasets with minimal performance degradation compared to unencrypted inference.

Efficacy of image similarity as a metric for augmenting small dataset retinal image segmentation.

Wallace T, Heng IS, Subasic S, Messenger C

pubmed logopapersJul 30 2025
Synthetic images are an option for augmenting limited medical imaging datasets to improve the performance of various machine learning models. A common metric for evaluating synthetic image quality is the Fréchet Inception Distance (FID) which measures the similarity of two image datasets. In this study we evaluate the relationship between this metric and the improvement which synthetic images, generated by a Progressively Growing Generative Adversarial Network (PGGAN), grant when augmenting Diabetes-related Macular Edema (DME) intraretinal fluid segmentation performed by a U-Net model with limited amounts of training data. We find that the behaviour of augmenting with standard and synthetic images agrees with previously conducted experiments. Additionally, we show that dissimilar (high FID) datasets do not improve segmentation significantly. As FID between the training and augmenting datasets decreases, the augmentation datasets are shown to contribute to significant and robust improvements in image segmentation. Finally, we find that there is significant evidence to suggest that synthetic and standard augmentations follow separate log-normal trends between FID and improvements in model performance, with synthetic data proving more effective than standard augmentation techniques. Our findings show that more similar datasets (lower FID) will be more effective at improving U-Net performance, however, the results also suggest that this improvement may only occur when images are sufficiently dissimilar.

SwinECAT: A Transformer-based fundus disease classification model with Shifted Window Attention and Efficient Channel Attention

Peiran Gu, Teng Yao, Mengshen He, Fuhao Duan, Feiyan Liu, RenYuan Peng, Bao Ge

arxiv logopreprintJul 29 2025
In recent years, artificial intelligence has been increasingly applied in the field of medical imaging. Among these applications, fundus image analysis presents special challenges, including small lesion areas in certain fundus diseases and subtle inter-disease differences, which can lead to reduced prediction accuracy and overfitting in the models. To address these challenges, this paper proposes the Transformer-based model SwinECAT, which combines the Shifted Window (Swin) Attention with the Efficient Channel Attention (ECA) Attention. SwinECAT leverages the Swin Attention mechanism in the Swin Transformer backbone to effectively capture local spatial structures and long-range dependencies within fundus images. The lightweight ECA mechanism is incorporated to guide the SwinECAT's attention toward critical feature channels, enabling more discriminative feature representation. In contrast to previous studies that typically classify fundus images into 4 to 6 categories, this work expands fundus disease classification to 9 distinct types, thereby enhancing the granularity of diagnosis. We evaluate our method on the Eye Disease Image Dataset (EDID) containing 16,140 fundus images for 9-category classification. Experimental results demonstrate that SwinECAT achieves 88.29\% accuracy, with weighted F1-score of 0.88 and macro F1-score of 0.90. The classification results of our proposed model SwinECAT significantly outperform the baseline Swin Transformer and multiple compared baseline models. To our knowledge, this represents the highest reported performance for 9-category classification on this public dataset.

VidFuncta: Towards Generalizable Neural Representations for Ultrasound Videos

Julia Wolleb, Florentin Bieder, Paul Friedrich, Hemant D. Tagare, Xenophon Papademetris

arxiv logopreprintJul 29 2025
Ultrasound is widely used in clinical care, yet standard deep learning methods often struggle with full video analysis due to non-standardized acquisition and operator bias. We offer a new perspective on ultrasound video analysis through implicit neural representations (INRs). We build on Functa, an INR framework in which each image is represented by a modulation vector that conditions a shared neural network. However, its extension to the temporal domain of medical videos remains unexplored. To address this gap, we propose VidFuncta, a novel framework that leverages Functa to encode variable-length ultrasound videos into compact, time-resolved representations. VidFuncta disentangles each video into a static video-specific vector and a sequence of time-dependent modulation vectors, capturing both temporal dynamics and dataset-level redundancies. Our method outperforms 2D and 3D baselines on video reconstruction and enables downstream tasks to directly operate on the learned 1D modulation vectors. We validate VidFuncta on three public ultrasound video datasets -- cardiac, lung, and breast -- and evaluate its downstream performance on ejection fraction prediction, B-line detection, and breast lesion classification. These results highlight the potential of VidFuncta as a generalizable and efficient representation framework for ultrasound videos. Our code is publicly available under https://github.com/JuliaWolleb/VidFuncta_public.

AI generated annotations for Breast, Brain, Liver, Lungs, and Prostate cancer collections in the National Cancer Institute Imaging Data Commons.

Murugesan GK, McCrumb D, Soni R, Kumar J, Nuernberg L, Pei L, Wagner U, Granger S, Fedorov AY, Moore S, Van Oss J

pubmed logopapersJul 29 2025
The Artificial Intelligence in Medical Imaging (AIMI) initiative aims to enhance the National Cancer Institute's (NCI) Image Data Commons (IDC) by releasing fully reproducible nnU-Net models, along with AI-assisted segmentation for cancer radiology images. In this extension of our earlier work, we created high-quality, AI-annotated imaging datasets for 11 IDC collections, spanning computed tomography (CT) and magnetic resonance imaging (MRI) of the lungs, breast, brain, kidneys, prostate, and liver. Each nnU-Net model was trained on open-source datasets, and a portion of the AI-generated annotations was reviewed and corrected by board-certified radiologists. Both the AI and radiologist annotations were encoded in compliance with the Digital Imaging and Communications in Medicine (DICOM) standard, ensuring seamless integration into the IDC collections. By making these models, images, and annotations publicly accessible, we aim to facilitate further research and development in cancer imaging.

MFFBi-Unet: Merging Dynamic Sparse Attention and Multi-scale Feature Fusion for Medical Image Segmentation.

Sun B, Liu C, Wang Q, Bi K, Zhang W

pubmed logopapersJul 29 2025
The advancement of deep learning has driven extensive research validating the effectiveness of U-Net-style symmetric encoder-decoder architectures based on Transformers for medical image segmentation. However, the inherent design requiring attention mechanisms to compute token affinities across all spatial locations leads to prohibitive computational complexity and substantial memory demands. Recent efforts have attempted to address these limitations through sparse attention mechanisms. However, existing approaches employing artificial, content-agnostic sparse attention patterns demonstrate limited capability in modeling long-range dependencies effectively. We propose MFFBi-Unet, a novel architecture incorporating dynamic sparse attention through bi-level routing, enabling context-aware computation allocation with enhanced adaptability. The encoder-decoder module integrates BiFormer to optimize semantic feature extraction and facilitate high-fidelity feature map reconstruction. A novel Multi-scale Feature Fusion (MFF) module in skip connections synergistically combines multi-level contextual information with processed multi-scale features. Extensive evaluations on multiple public medical benchmarks demonstrate that our method consistently exhibits significant advantages. Notably, our method achieves statistically significant improvements, outperforming state-of-the-art approaches like MISSFormer by 2.02% and 1.28% Dice scores on respective benchmarks.

Time-series X-ray image prediction of dental skeleton treatment progress via neural networks.

Kwon SW, Moon JK, Song SC, Cha JY, Kim YW, Choi YJ, Lee JS

pubmed logopapersJul 29 2025
Accurate prediction of skeletal changes during orthodontic treatment in growing patients remains challenging due to significant individual variability in craniofacial growth and treatment responses. Conventional methods, such as support vector regression and multilayer perceptrons, require multiple sequential radiographs to achieve acceptable accuracy. However, they are limited by increased radiation exposure, susceptibility to landmark identification errors, and the lack of visually interpretable predictions. To overcome these limitations, this study explored advanced generative approaches, including denoising diffusion probabilistic models (DDPMs), latent diffusion models (LDMs), and ControlNet, to predict future cephalometric radiographs using minimal input data. We evaluated three diffusion-based models-a DDPM utilizing three sequential cephalometric images (3-input DDPM), a single-image DDPM (1-input DDPM), and a single-image LDM-and a vision-based generative model, ControlNet, conditioned on patient-specific attributes such as age, sex, and orthodontic treatment type. Quantitative evaluations demonstrated that the 3-input DDPM achieved the highest numerical accuracy, whereas the single-image LDM delivered comparable predictive performance with significantly reduced clinical requirements. ControlNet also exhibited competitive accuracy, highlighting its potential effectiveness in clinical scenarios. These findings indicate that the single-image LDM and ControlNet offer practical solutions for personalized orthodontic treatment planning, reducing patient visits and radiation exposure while maintaining robust predictive accuracy.
Page 17 of 81804 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.