Sort by:
Page 8 of 13124 results

Comparing the Effects of Persistence Barcodes Aggregation and Feature Concatenation on Medical Imaging

Dashti A. Ali, Richard K. G. Do, William R. Jarnagin, Aras T. Asaad, Amber L. Simpson

arxiv logopreprintMay 29 2025
In medical image analysis, feature engineering plays an important role in the design and performance of machine learning models. Persistent homology (PH), from the field of topological data analysis (TDA), demonstrates robustness and stability to data perturbations and addresses the limitation from traditional feature extraction approaches where a small change in input results in a large change in feature representation. Using PH, we store persistent topological and geometrical features in the form of the persistence barcode whereby large bars represent global topological features and small bars encapsulate geometrical information of the data. When multiple barcodes are computed from 2D or 3D medical images, two approaches can be used to construct the final topological feature vector in each dimension: aggregating persistence barcodes followed by featurization or concatenating topological feature vectors derived from each barcode. In this study, we conduct a comprehensive analysis across diverse medical imaging datasets to compare the effects of the two aforementioned approaches on the performance of classification models. The results of this analysis indicate that feature concatenation preserves detailed topological information from individual barcodes, yields better classification performance and is therefore a preferred approach when conducting similar experiments.

Super-temporal-resolution Photoacoustic Imaging with Dynamic Reconstruction through Implicit Neural Representation in Sparse-view

Youshen Xiao, Yiling Shi, Ruixi Sun, Hongjiang Wei, Fei Gao, Yuyao Zhang

arxiv logopreprintMay 29 2025
Dynamic Photoacoustic Computed Tomography (PACT) is an important imaging technique for monitoring physiological processes, capable of providing high-contrast images of optical absorption at much greater depths than traditional optical imaging methods. However, practical instrumentation and geometric constraints limit the number of acoustic sensors available around the imaging target, leading to sparsity in sensor data. Traditional photoacoustic (PA) image reconstruction methods, when directly applied to sparse PA data, produce severe artifacts. Additionally, these traditional methods do not consider the inter-frame relationships in dynamic imaging. Temporal resolution is crucial for dynamic photoacoustic imaging, which is fundamentally limited by the low repetition rate (e.g., 20 Hz) and high cost of high-power laser technology. Recently, Implicit Neural Representation (INR) has emerged as a powerful deep learning tool for solving inverse problems with sparse data, by characterizing signal properties as continuous functions of their coordinates in an unsupervised manner. In this work, we propose an INR-based method to improve dynamic photoacoustic image reconstruction from sparse-views and enhance temporal resolution, using only spatiotemporal coordinates as input. Specifically, the proposed INR represents dynamic photoacoustic images as implicit functions and encodes them into a neural network. The weights of the network are learned solely from the acquired sparse sensor data, without the need for external training datasets or prior images. Benefiting from the strong implicit continuity regularization provided by INR, as well as explicit regularization for low-rank and sparsity, our proposed method outperforms traditional reconstruction methods under two different sparsity conditions, effectively suppressing artifacts and ensuring image quality.

Deep Modeling and Optimization of Medical Image Classification

Yihang Wu, Muhammad Owais, Reem Kateb, Ahmad Chaddad

arxiv logopreprintMay 29 2025
Deep models, such as convolutional neural networks (CNNs) and vision transformer (ViT), demonstrate remarkable performance in image classification. However, those deep models require large data to fine-tune, which is impractical in the medical domain due to the data privacy issue. Furthermore, despite the feasible performance of contrastive language image pre-training (CLIP) in the natural domain, the potential of CLIP has not been fully investigated in the medical field. To face these challenges, we considered three scenarios: 1) we introduce a novel CLIP variant using four CNNs and eight ViTs as image encoders for the classification of brain cancer and skin cancer, 2) we combine 12 deep models with two federated learning techniques to protect data privacy, and 3) we involve traditional machine learning (ML) methods to improve the generalization ability of those deep models in unseen domain data. The experimental results indicate that maxvit shows the highest averaged (AVG) test metrics (AVG = 87.03\%) in HAM10000 dataset with multimodal learning, while convnext\_l demonstrates remarkable test with an F1-score of 83.98\% compared to swin\_b with 81.33\% in FL model. Furthermore, the use of support vector machine (SVM) can improve the overall test metrics with AVG of $\sim 2\%$ for swin transformer series in ISIC2018. Our codes are available at https://github.com/AIPMLab/SkinCancerSimulation.

Comparing the Effects of Persistence Barcodes Aggregation and Feature Concatenation on Medical Imaging

Dashti A. Ali, Richard K. G. Do, William R. Jarnagin, Aras T. Asaad, Amber L. Simpson

arxiv logopreprintMay 29 2025
In medical image analysis, feature engineering plays an important role in the design and performance of machine learning models. Persistent homology (PH), from the field of topological data analysis (TDA), demonstrates robustness and stability to data perturbations and addresses the limitation from traditional feature extraction approaches where a small change in input results in a large change in feature representation. Using PH, we store persistent topological and geometrical features in the form of the persistence barcode whereby large bars represent global topological features and small bars encapsulate geometrical information of the data. When multiple barcodes are computed from 2D or 3D medical images, two approaches can be used to construct the final topological feature vector in each dimension: aggregating persistence barcodes followed by featurization or concatenating topological feature vectors derived from each barcode. In this study, we conduct a comprehensive analysis across diverse medical imaging datasets to compare the effects of the two aforementioned approaches on the performance of classification models. The results of this analysis indicate that feature concatenation preserves detailed topological information from individual barcodes, yields better classification performance and is therefore a preferred approach when conducting similar experiments.

Patch-based Reconstruction for Unsupervised Dynamic MRI using Learnable Tensor Function with Implicit Neural Representation

Yuanyuan Liu, Yuanbiao Yang, Zhuo-Xu Cui, Qingyong Zhu, Jing Cheng, Congcong Liu, Jinwen Xie, Jingran Xu, Hairong Zheng, Dong Liang, Yanjie Zhu

arxiv logopreprintMay 28 2025
Dynamic MRI plays a vital role in clinical practice by capturing both spatial details and dynamic motion, but its high spatiotemporal resolution is often limited by long scan times. Deep learning (DL)-based methods have shown promising performance in accelerating dynamic MRI. However, most existing algorithms rely on large fully-sampled datasets for training, which are difficult to acquire. Recently, implicit neural representation (INR) has emerged as a powerful scan-specific paradigm for accelerated MRI, which models signals as a continuous function over spatiotemporal coordinates. Although this approach achieves efficient continuous modeling of dynamic images and robust reconstruction, it faces challenges in recovering fine details and increasing computational demands for high dimensional data representation. To enhance both efficiency and reconstruction quality, we propose TenF-INR, a novel patch-based unsupervised framework that employs INR to model bases of tensor decomposition, enabling efficient and accurate modeling of dynamic MR images with learnable tensor functions. By exploiting strong correlations in similar spatial image patches and in the temporal direction, TenF-INR enforces multidimensional low-rankness and implements patch-based reconstruction with the benefits of continuous modeling. We compare TenF-INR with state-of-the-art methods, including supervised DL methods and unsupervised approaches. Experimental results demonstrate that TenF-INR achieves high acceleration factors up to 21, outperforming all comparison methods in image quality, temporal fidelity, and quantitative metrics, even surpassing the supervised methods.

Rep3D: Re-parameterize Large 3D Kernels with Low-Rank Receptive Modeling for Medical Imaging

Ho Hin Lee, Quan Liu, Shunxing Bao, Yuankai Huo, Bennett A. Landman

arxiv logopreprintMay 26 2025
In contrast to vision transformers, which model long-range dependencies through global self-attention, large kernel convolutions provide a more efficient and scalable alternative, particularly in high-resolution 3D volumetric settings. However, naively increasing kernel size often leads to optimization instability and degradation in performance. Motivated by the spatial bias observed in effective receptive fields (ERFs), we hypothesize that different kernel elements converge at variable rates during training. To support this, we derive a theoretical connection between element-wise gradients and first-order optimization, showing that structurally re-parameterized convolution blocks inherently induce spatially varying learning rates. Building on this insight, we introduce Rep3D, a 3D convolutional framework that incorporates a learnable spatial prior into large kernel training. A lightweight two-stage modulation network generates a receptive-biased scaling mask, adaptively re-weighting kernel updates and enabling local-to-global convergence behavior. Rep3D adopts a plain encoder design with large depthwise convolutions, avoiding the architectural complexity of multi-branch compositions. We evaluate Rep3D on five challenging 3D segmentation benchmarks and demonstrate consistent improvements over state-of-the-art baselines, including transformer-based and fixed-prior re-parameterization methods. By unifying spatial inductive bias with optimization-aware learning, Rep3D offers an interpretable, and scalable solution for 3D medical image analysis. The source code is publicly available at https://github.com/leeh43/Rep3D.

Advancements in Medical Image Classification through Fine-Tuning Natural Domain Foundation Models

Mobina Mansoori, Sajjad Shahabodini, Farnoush Bayatmakou, Jamshid Abouei, Konstantinos N. Plataniotis, Arash Mohammadi

arxiv logopreprintMay 26 2025
Using massive datasets, foundation models are large-scale, pre-trained models that perform a wide range of tasks. These models have shown consistently improved results with the introduction of new methods. It is crucial to analyze how these trends impact the medical field and determine whether these advancements can drive meaningful change. This study investigates the application of recent state-of-the-art foundation models, DINOv2, MAE, VMamba, CoCa, SAM2, and AIMv2, for medical image classification. We explore their effectiveness on datasets including CBIS-DDSM for mammography, ISIC2019 for skin lesions, APTOS2019 for diabetic retinopathy, and CHEXPERT for chest radiographs. By fine-tuning these models and evaluating their configurations, we aim to understand the potential of these advancements in medical image classification. The results indicate that these advanced models significantly enhance classification outcomes, demonstrating robust performance despite limited labeled data. Based on our results, AIMv2, DINOv2, and SAM2 models outperformed others, demonstrating that progress in natural domain training has positively impacted the medical domain and improved classification outcomes. Our code is publicly available at: https://github.com/sajjad-sh33/Medical-Transfer-Learning.

Advancing Limited-Angle CT Reconstruction Through Diffusion-Based Sinogram Completion

Jiaqi Guo, Santiago Lopez-Tapia, Aggelos K. Katsaggelos

arxiv logopreprintMay 26 2025
Limited Angle Computed Tomography (LACT) often faces significant challenges due to missing angular information. Unlike previous methods that operate in the image domain, we propose a new method that focuses on sinogram inpainting. We leverage MR-SDEs, a variant of diffusion models that characterize the diffusion process with mean-reverting stochastic differential equations, to fill in missing angular data at the projection level. Furthermore, by combining distillation with constraining the output of the model using the pseudo-inverse of the inpainting matrix, the diffusion process is accelerated and done in a step, enabling efficient and accurate sinogram completion. A subsequent post-processing module back-projects the inpainted sinogram into the image domain and further refines the reconstruction, effectively suppressing artifacts while preserving critical structural details. Quantitative experimental results demonstrate that the proposed method achieves state-of-the-art performance in both perceptual and fidelity quality, offering a promising solution for LACT reconstruction in scientific and clinical applications.

MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation

Chenglong Ma, Yuanfeng Ji, Jin Ye, Zilong Li, Chenhui Wang, Junzhi Ning, Wei Li, Lihao Liu, Qiushan Guo, Tianbin Li, Junjun He, Hongming Shan

arxiv logopreprintMay 25 2025
Advanced autoregressive models have reshaped multimodal AI. However, their transformative potential in medical imaging remains largely untapped due to the absence of a unified visual tokenizer -- one capable of capturing fine-grained visual structures for faithful image reconstruction and realistic image synthesis, as well as rich semantics for accurate diagnosis and image interpretation. To this end, we present MedITok, the first unified tokenizer tailored for medical images, encoding both low-level structural details and high-level clinical semantics within a unified latent space. To balance these competing objectives, we introduce a novel two-stage training framework: a visual representation alignment stage that cold-starts the tokenizer reconstruction learning with a visual semantic constraint, followed by a textual semantic representation alignment stage that infuses detailed clinical semantics into the latent space. Trained on the meticulously collected large-scale dataset with over 30 million medical images and 2 million image-caption pairs, MedITok achieves state-of-the-art performance on more than 30 datasets across 9 imaging modalities and 4 different tasks. By providing a unified token space for autoregressive modeling, MedITok supports a wide range of tasks in clinical diagnostics and generative healthcare applications. Model and code will be made publicly available at: https://github.com/Masaaki-75/meditok.

CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation

Jiong Wu, Yang Xing, Boxiao Yu, Wei Shao, Kuang Gong

arxiv logopreprintMay 25 2025
Most publicly available medical segmentation datasets are only partially labeled, with annotations provided for a subset of anatomical structures. When multiple datasets are combined for training, this incomplete annotation poses challenges, as it limits the model's ability to learn shared anatomical representations among datasets. Furthermore, vision-only frameworks often fail to capture complex anatomical relationships and task-specific distinctions, leading to reduced segmentation accuracy and poor generalizability to unseen datasets. In this study, we proposed a novel CLIP-DINO Prompt-Driven Segmentation Network (CDPDNet), which combined a self-supervised vision transformer with CLIP-based text embedding and introduced task-specific text prompts to tackle these challenges. Specifically, the framework was constructed upon a convolutional neural network (CNN) and incorporated DINOv2 to extract both fine-grained and global visual features, which were then fused using a multi-head cross-attention module to overcome the limited long-range modeling capability of CNNs. In addition, CLIP-derived text embeddings were projected into the visual space to help model complex relationships among organs and tumors. To further address the partial label challenge and enhance inter-task discriminative capability, a Text-based Task Prompt Generation (TTPG) module that generated task-specific prompts was designed to guide the segmentation. Extensive experiments on multiple medical imaging datasets demonstrated that CDPDNet consistently outperformed existing state-of-the-art segmentation methods. Code and pretrained model are available at: https://github.com/wujiong-hub/CDPDNet.git.
Page 8 of 13124 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.