
Enhancing 3D Medical Image Understanding with Pretraining Aided by 2D Multimodal Large Language Models.

Chen Q, Yao X, Ye H, Hong Y

PubMed · Sep 15, 2025
Understanding 3D medical image volumes is critical in the medical field, yet existing 3D medical convolution and transformer-based self-supervised learning (SSL) methods often lack deep semantic comprehension. Recent advancements in multimodal large language models (MLLMs) provide a promising approach to enhance image understanding through text descriptions. To leverage these 2D MLLMs for improved 3D medical image understanding, we propose Med3DInsight, a novel pretraining framework that integrates 3D image encoders with 2D MLLMs via a specially designed plane-slice-aware transformer module. Additionally, our model employs a partial optimal transport-based alignment, demonstrating greater tolerance to the noise inherent in LLM-generated content. Med3DInsight introduces a new paradigm for scalable multimodal 3D medical representation learning without requiring human annotations. Extensive experiments demonstrate state-of-the-art performance on two downstream tasks, i.e., segmentation and classification, across various public datasets with CT and MRI modalities, outperforming current SSL methods. Med3DInsight can be seamlessly integrated into existing 3D medical image understanding networks, potentially enhancing their performance. Our source code, generated datasets, and pre-trained models will be available upon acceptance.
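
The partial optimal transport alignment mentioned above can be sketched with the standard dummy-bin trick: a fraction of the probability mass is routed to a zero-cost dummy row and column so that noisy LLM-generated descriptions need not be matched. The sketch below is an illustration under assumed feature shapes, not the authors' released implementation; `keep_mass`, the entropic regularizer `eps`, and the iteration count are hypothetical choices.

```python
import torch
import torch.nn.functional as F

def partial_ot_alignment(z3d, z2d, keep_mass=0.8, eps=0.05, iters=50):
    """Hypothetical partial-OT alignment loss between 3D volume features
    (n, d) and 2D MLLM-derived features (m, d). A fraction (1 - keep_mass)
    of the mass is absorbed by a dummy bin, letting the loss ignore noisy
    LLM-generated descriptions."""
    z3d = F.normalize(z3d, dim=-1)
    z2d = F.normalize(z2d, dim=-1)
    C = 1.0 - z3d @ z2d.T                        # cosine cost, (n, m)
    n, m = C.shape
    # Append a zero-cost dummy row and column to absorb unmatched mass.
    C_aug = F.pad(C, (0, 1, 0, 1), value=0.0)    # (n+1, m+1)
    a = torch.full((n + 1,), keep_mass / n, device=C.device)
    b = torch.full((m + 1,), keep_mass / m, device=C.device)
    a[-1] = 1.0 - keep_mass                      # dummy row mass
    b[-1] = 1.0 - keep_mass                      # dummy column mass
    K = torch.exp(-C_aug / eps)
    u = torch.ones_like(a)
    for _ in range(iters):                       # Sinkhorn iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]              # transport plan
    return (P[:n, :m] * C).sum()                 # cost on real pairs only
```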

Fractal-driven self-supervised learning enhances early-stage lung cancer GTV segmentation: a novel transfer learning framework.

Tozuka R, Kadoya N, Yasunaga A, Saito M, Komiyama T, Nemoto H, Ando H, Onishi H, Jingu K

PubMed · Sep 15, 2025
To develop and evaluate a novel deep learning strategy for automated early-stage lung cancer gross tumor volume (GTV) segmentation, utilizing pre-training with mathematically generated non-natural fractal images. This retrospective study included 104 patients (36-91 years old; 81 males, 23 females) with peripheral early-stage non-small cell lung cancer who underwent radiotherapy at our institution from December 2017 to March 2025. First, we utilized encoders from a Convolutional Neural Network and a Vision Transformer (ViT), pre-trained with four learning strategies: from scratch, ImageNet-1K (1,000 classes of natural images), FractalDB-1K (1,000 classes of fractal images), and FractalDB-10K (10,000 classes of fractal images), with the latter three utilizing publicly available models. Second, the models were fine-tuned using CT images and physician-created contour data. Model accuracy was then evaluated using the volumetric Dice Similarity Coefficient (vDSC), surface Dice Similarity Coefficient (sDSC), and 95th percentile Hausdorff Distance (HD95) between the predicted and ground truth GTV contours, averaged across fourfold cross-validation. Additionally, segmentation accuracy was compared between simple and complex groups, categorized by the surface-to-volume ratio, to assess the impact of GTV shape complexity. Pre-training with FractalDB-10K yielded the best segmentation accuracy across all metrics. For the ViT model, the vDSC, sDSC, and HD95 results were 0.800 ± 0.079, 0.732 ± 0.152, and 2.04 ± 1.59 mm for FractalDB-10K; 0.779 ± 0.093, 0.688 ± 0.156, and 2.72 ± 3.12 mm for FractalDB-1K; and 0.764 ± 0.102, 0.660 ± 0.156, and 3.03 ± 3.47 mm for ImageNet-1K, respectively. Comparing FractalDB-1K with ImageNet-1K, there was no significant difference in the simple group, whereas FractalDB-1K achieved a significantly higher vDSC in the complex group (0.743 ± 0.095 vs 0.714 ± 0.104, p = 0.006). Pre-training with fractal structures achieved comparable or superior accuracy to ImageNet pre-training for early-stage lung cancer GTV auto-segmentation.
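
FractalDB-style pre-training data is generated from iterated function systems (IFS) rather than photographs. The following minimal sketch renders one fractal "class" with the chaos game; the map count, contraction factor, and rasterization are illustrative assumptions, not the FractalDB recipe.

```python
import numpy as np

def render_fractal(n_points=50000, n_maps=4, size=224, seed=0):
    """Render one fractal 'class' with the chaos game. An illustrative
    sketch in the spirit of FractalDB, not the official generator."""
    rng = np.random.default_rng(seed)
    # Random affine maps x' = A @ x + t; rescale each A to be contractive.
    A = rng.uniform(-1.0, 1.0, size=(n_maps, 2, 2))
    t = rng.uniform(-1.0, 1.0, size=(n_maps, 2))
    for k in range(n_maps):
        s = np.linalg.norm(A[k], 2)              # spectral norm
        A[k] *= 0.8 / max(s, 0.8)                # enforce contraction <= 0.8
    x = np.zeros(2)
    pts = np.empty((n_points, 2))
    for i in range(n_points):                    # chaos-game iteration
        k = rng.integers(n_maps)
        x = A[k] @ x + t[k]
        pts[i] = x
    pts = pts[100:]                              # discard burn-in samples
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    ij = ((pts - lo) / (hi - lo + 1e-8) * (size - 1)).astype(int)
    img = np.zeros((size, size), dtype=np.float32)
    img[ij[:, 1], ij[:, 0]] = 1.0                # rasterize visited points
    return img
```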

End-to-End Learning of Multi-Organ Implicit Surfaces from 3D Medical Imaging Data

Farahdiba Zarin, Nicolas Padoy, Jérémy Dana, Vinkle Srivastav

arXiv preprint · Sep 15, 2025
The fine-grained surface reconstruction of different organs from 3D medical imaging can provide advanced diagnostic support and improved surgical planning. However, the representation of the organs is often limited by the image resolution, with higher-resolution detail requiring a larger memory and compute footprint. Implicit representations of objects have been proposed to alleviate this problem in general computer vision by providing compact and differentiable functions to represent 3D object shapes. However, architectural and data-related differences prevent the direct application of these methods to medical images. This work introduces ImplMORe, an end-to-end deep learning method using implicit surface representations for multi-organ reconstruction from 3D medical images. ImplMORe incorporates local features using a 3D CNN encoder and performs multi-scale interpolation to learn features in the continuous domain using occupancy functions. We apply our method to single- and multiple-organ reconstruction using the TotalSegmentator dataset. By leveraging the continuous nature of occupancy functions, our approach outperforms surface reconstruction approaches based on discrete explicit representations, providing fine-grained surface details of the organ at a resolution higher than that of the given input image. The source code will be made publicly available at: https://github.com/CAMMA-public/ImplMORe
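
The core of an occupancy-function approach is querying encoder features at continuous coordinates, which is what decouples output resolution from input resolution. A minimal sketch with assumed layer sizes and a hypothetical `OccupancyHead` module (not the ImplMORe code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OccupancyHead(nn.Module):
    """Illustrative sketch of implicit multi-organ reconstruction: sample
    3D CNN features at continuous coordinates via trilinear interpolation,
    then predict per-organ occupancy with an MLP. Names and sizes are
    assumptions, not the ImplMORe implementation."""
    def __init__(self, feat_ch=64, n_organs=5, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_ch + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_organs),         # occupancy logit per organ
        )

    def forward(self, feats, coords):
        # feats: (B, C, D, H, W) encoder features; coords: (B, N, 3) in [-1, 1]
        grid = coords.reshape(coords.shape[0], -1, 1, 1, 3)
        sampled = F.grid_sample(feats, grid, align_corners=True)   # (B, C, N, 1, 1)
        sampled = sampled.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, N, C)
        x = torch.cat([sampled, coords], dim=-1)                   # append raw xyz
        return self.mlp(x)   # can be queried at any resolution, finer than the input
```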

U-Mamba2: Scaling State Space Models for Dental Anatomy Segmentation in CBCT

Zhi Qin Tan, Xiatian Zhu, Owen Addison, Yunpeng Li

arXiv preprint · Sep 15, 2025
Cone-Beam Computed Tomography (CBCT) is a widely used 3D imaging technique in dentistry, providing volumetric information about the anatomical structures of the jaws and teeth. Accurate segmentation of these anatomies is critical for clinical applications such as diagnosis and surgical planning, but remains time-consuming and challenging. In this paper, we present U-Mamba2, a new neural network architecture designed for multi-anatomy CBCT segmentation in the context of the ToothFairy3 challenge. U-Mamba2 integrates the Mamba2 state space models into the U-Net architecture, enforcing stronger structural constraints for higher efficiency without compromising performance. In addition, we integrate interactive click prompts with cross-attention blocks, pre-train U-Mamba2 using self-supervised learning, and incorporate dental domain knowledge into the model design to address key challenges of dental anatomy segmentation in CBCT. Extensive experiments, including independent tests, demonstrate that U-Mamba2 is both effective and efficient, securing top-3 places in both tasks of the ToothFairy3 challenge. In Task 1, U-Mamba2 achieved a mean Dice of 0.792 and HD95 of 93.19 on the held-out test data, with an average inference time of XX (TBC during the ODIN workshop). In Task 2, U-Mamba2 achieved a mean Dice of 0.852 and HD95 of 7.39 on the held-out test data. The code is publicly available at https://github.com/zhiqin1998/UMamba2.
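
Integrating a state space model into a U-Net typically means flattening a feature map into a sequence, applying the SSM, and reshaping back. A minimal sketch of such a block is shown below, assuming the mamba-ssm package (which provides Mamba2 and requires a CUDA build) is available; the block placement and hyperparameters are illustrative assumptions, not the challenge submission's configuration.

```python
import torch.nn as nn
from mamba_ssm import Mamba2   # pip install mamba-ssm (assumed available)

class MambaBlock3D(nn.Module):
    """Sketch of a U-Net bottleneck block in the spirit of U-Mamba2:
    flatten the 3D feature map into a sequence, mix it with a Mamba2
    state space layer, and reshape back."""
    def __init__(self, channels):
        super().__init__()
        # Note: channels * expand must be divisible by Mamba2's head
        # dimension (default 64), e.g. channels=256 works.
        self.norm = nn.LayerNorm(channels)
        self.ssm = Mamba2(d_model=channels)      # linear-time sequence mixing

    def forward(self, x):                        # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)       # (B, D*H*W, C)
        seq = seq + self.ssm(self.norm(seq))     # residual SSM mixing
        return seq.transpose(1, 2).view(b, c, d, h, w)
```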

Artificial Intelligence-Derived Intramuscular Adipose Tissue Assessment Predicts Perineal Wound Complications Following Abdominoperineal Resection.

Besson A, Cao K, Kokelaar R, Hajdarevic E, Wirth L, Yeung J, Yeung JM

PubMed · Sep 15, 2025
Perineal wound complications following abdominoperineal resection (APR) significantly impact patient morbidity. Despite various closure techniques, no method has proven superior. Body composition is a key factor influencing postoperative outcomes, and AI-assisted CT scan analysis is an accurate and efficient approach to assessing it. This study aimed to evaluate whether body composition characteristics can predict perineal wound complications following APR. A retrospective cohort study of APR patients from 2012 to 2024 was conducted, comparing outcomes of primary closure and inferior gluteal artery myocutaneous (IGAM) flap closure. Preoperative CT scans were analyzed using a validated AI model to measure lumbosacral skeletal muscle (SM), intramuscular adipose tissue (IMAT), visceral adipose tissue, and subcutaneous adipose tissue. Greater IMAT volume correlated with increased wound dehiscence in males undergoing IGAM closure (40% vs. 4.8%, p = 0.027). A lower SM-to-IMAT volume ratio was associated with higher wound infection rates (60% vs. 19%, p = 0.04). Closure technique did not significantly impact wound infection or dehiscence rates. This study is the first to use AI-derived 3D body composition analysis to assess perineal wound complications after APR. IMAT volume significantly influences wound healing in male patients undergoing IGAM reconstruction.
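
Given a muscle-compartment segmentation, SM and IMAT volumes follow from voxel counting inside conventional Hounsfield-unit windows. The sketch below uses commonly cited thresholds (SM: -29 to 150 HU; IMAT: -190 to -30 HU) as assumptions; the study's validated AI model may define these differently.

```python
import numpy as np

# Commonly used HU windows for muscle and intramuscular fat; treat these
# thresholds as assumptions, not the paper's exact definitions.
SM_HU = (-29, 150)
IMAT_HU = (-190, -30)

def sm_imat_volumes(ct_hu, muscle_mask, voxel_mm3):
    """Compute skeletal muscle (SM) and intramuscular adipose tissue (IMAT)
    volumes (in ml) inside a segmented muscle compartment, plus the
    SM-to-IMAT ratio used to stratify wound-infection risk."""
    inside = ct_hu[muscle_mask > 0]
    sm_vox = np.sum((inside >= SM_HU[0]) & (inside <= SM_HU[1]))
    imat_vox = np.sum((inside >= IMAT_HU[0]) & (inside <= IMAT_HU[1]))
    sm_ml = sm_vox * voxel_mm3 / 1000.0          # mm^3 -> ml
    imat_ml = imat_vox * voxel_mm3 / 1000.0
    return sm_ml, imat_ml, sm_ml / max(imat_ml, 1e-6)
```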

Multi-scale based Network and Adaptive EfficientnetB7 with ASPP: Analysis of Novel Brain Tumor Segmentation and Classification.

Kulkarni SV, Poornapushpakala S

PubMed · Sep 15, 2025
Medical imaging has undergone significant advancements with the integration of deep learning techniques, leading to enhanced accuracy in image analysis. These methods autonomously extract relevant features from medical images, thereby improving the detection and classification of various diseases. Among imaging modalities, Magnetic Resonance Imaging (MRI) is particularly valuable due to its high contrast resolution, which enables the differentiation of soft tissues, making it indispensable in the diagnosis of brain disorders. The accurate classification of brain tumors is crucial for diagnosing many neurological conditions. However, conventional classification techniques are often limited by high computational complexity and suboptimal accuracy. Motivated by these issues, an innovative model is proposed in this work for segmenting and classifying brain tumors. The research aims to develop a robust and efficient deep learning framework that can assist clinicians in making precise and early diagnoses, ultimately leading to more effective treatment planning. The proposed methodology begins with the acquisition of MRI images from standardized medical imaging databases. Subsequently, the abnormal regions are segmented using the Multiscale Bilateral Awareness Network (MBANet), which incorporates multi-scale operations to enhance feature representation and image quality. The segmented images are then processed by a novel classification architecture, termed Region Vision Transformer-based Adaptive EfficientNetB7 with Atrous Spatial Pyramid Pooling (RVAEB7-ASPP). To optimize the performance of the classification model, hyperparameters are fine-tuned using the Modified Random Parameter-based Hippopotamus Optimization Algorithm (MRP-HOA). The model's effectiveness is verified through a comprehensive experimental evaluation using various performance metrics and comparison with current state-of-the-art methods. The proposed MRP-HOA-RVAEB7-ASPP model achieves an impressive classification accuracy of 98.2%, significantly outperforming conventional approaches in brain tumor classification tasks. MBANet effectively performs brain tumor segmentation, while the RVAEB7-ASPP model provides reliable classification. The integration of advanced segmentation, adaptive feature extraction, and MRP-HOA-driven parameter tuning enhances the reliability, accuracy, and robustness of the model. This framework provides a more effective and trustworthy solution for the early detection and clinical assessment of brain tumors, leading to improved patient outcomes through timely intervention.
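
Atrous Spatial Pyramid Pooling, one ingredient of the RVAEB7-ASPP classifier, aggregates context at several dilation rates in parallel. A minimal PyTorch sketch follows; the dilation rates and channel widths are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Minimal Atrous Spatial Pyramid Pooling block of the kind appended
    to an EfficientNetB7 backbone (an illustrative sketch)."""
    def __init__(self, in_ch, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates
        ])
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * len(rates), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        # Parallel dilated convolutions capture multi-scale tumor context.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```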

Deep Learning for Breast Mass Discrimination: Integration of B-Mode Ultrasound & Nakagami Imaging with Automatic Lesion Segmentation

Hassan, M. W., Hossain, M. M.

medRxiv preprint · Sep 15, 2025
Objective: This study aims to enhance breast cancer diagnosis by developing an automated deep learning framework for real-time, quantitative ultrasound imaging. Breast cancer is the second leading cause of cancer-related deaths among women, and early detection is crucial for improving survival rates. Conventional ultrasound, valued for its non-invasive nature and real-time capability, is limited by qualitative assessments and inter-observer variability. Quantitative ultrasound (QUS) methods, including Nakagami imaging, which models the statistical distribution of backscattered signals and lesion morphology, present an opportunity for more objective analysis. Methods: The proposed framework integrates three convolutional neural networks (CNNs): (1) NakaSynthNet, synthesizing quantitative Nakagami parameter images from B-mode ultrasound; (2) SegmentNet, enabling automated lesion segmentation; and (3) FeatureNet, which combines anatomical and statistical features for classifying lesions as benign or malignant. Training utilized a diverse dataset of 110,247 images, comprising clinical B-mode scans and various simulated examples (fruit, mammographic lesions, digital phantoms). Quantitative performance was evaluated using mean squared error (MSE), structural similarity index (SSIM), segmentation accuracy, sensitivity, specificity, and area under the curve (AUC). Results: NakaSynthNet achieved real-time synthesis at 21 frames/s, with an MSE of 0.09% and SSIM of 98%. SegmentNet reached 98.4% accuracy, and FeatureNet delivered 96.7% overall classification accuracy, 93% sensitivity, 98% specificity, and an AUC of 98%. Conclusion: The proposed multi-parametric deep learning pipeline enables accurate, real-time breast cancer diagnosis from ultrasound data using objective quantitative imaging. Significance: This framework advances the clinical utility of ultrasound by reducing subjectivity and providing robust, multi-parametric information for improved breast cancer detection.
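
Nakagami parameter images are conventionally computed from the envelope R with the moment estimators m = (E[R²])² / Var(R²) and Ω = E[R²]. A sliding-window sketch of these target maps is shown below; the window size is an illustrative choice, and NakaSynthNet learns to synthesize such maps directly from B-mode rather than computing them this way.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def nakagami_maps(envelope, win=15):
    """Moment-based Nakagami parameter maps from the ultrasound envelope R:
    shape m = E[R^2]^2 / Var(R^2) and scale Omega = E[R^2], estimated in a
    sliding window of side `win` (an assumed size)."""
    r2 = envelope.astype(np.float64) ** 2
    e_r2 = uniform_filter(r2, win)               # local E[R^2]
    e_r4 = uniform_filter(r2 ** 2, win)          # local E[R^4]
    var_r2 = np.maximum(e_r4 - e_r2 ** 2, 1e-12)
    m = e_r2 ** 2 / var_r2                       # shape parameter map
    omega = e_r2                                 # scale parameter map
    return m, omega
```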

Fully automatic bile duct segmentation in magnetic resonance cholangiopancreatography for biliary surgery planning using deep learning.

Tao H, Wang J, Guo K, Luo W, Zeng X, Lu M, Lin J, Li B, Qian Y, Yang J

PubMed · Sep 15, 2025
To automatically and accurately perform three-dimensional reconstruction of dilated and non-dilated bile ducts based on magnetic resonance cholangiopancreatography (MRCP) data, assisting in the formulation of optimal surgical plans and guiding precise bile duct surgery. A total of 249 consecutive patients who underwent standardized 3D-MRCP scans were randomly divided into a training cohort (n = 208) and a testing cohort (n = 41). Ground truth segmentation was manually delineated by two hepatobiliary surgeons or radiologists following industry certification procedures and reviewed by two expert-level physicians for biliary surgery planning. The deep learning semantic segmentation model was constructed using the nnU-Net framework. Model performance was assessed by comparing model predictions with ground truth segmentation as well as real surgical scenarios. The generalization of the model was tested on a dataset of 10 3D-MRCP scans from other centers, with ground truth segmentation of biliary structures. The evaluation was performed on the 41 internal and 10 external test sets, with mean Dice Similarity Coefficient (DSC) values of 0.9403 and 0.9070, respectively. The correlation coefficient between the 3D model based on automatic segmentation predictions and the ground truth results exceeded 0.95. The 95% limits of agreement (LoA) ranged from -4.456 to 4.781 for biliary tract length and from -3.404 to 3.650 ml for biliary tract volume. Furthermore, intraoperative indocyanine green (ICG) fluorescence imaging and operative findings validated that this model can accurately reconstruct biliary landmarks. By leveraging a deep learning algorithmic framework, an AI model can be trained to perform automatic and accurate 3D reconstructions of non-dilated bile ducts, thereby providing guidance for the preoperative planning of complex biliary surgeries.
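
The two headline metrics here, volumetric DSC and Bland-Altman 95% limits of agreement, are straightforward to reproduce. A minimal sketch:

```python
import numpy as np

def dice(pred, gt):
    """Volumetric Dice Similarity Coefficient between binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def limits_of_agreement(measured, reference):
    """Bland-Altman 95% limits of agreement, as used for the biliary length
    and volume differences between predicted and ground-truth anatomy."""
    diff = np.asarray(measured, float) - np.asarray(reference, float)
    mean, sd = diff.mean(), diff.std(ddof=1)
    return mean - 1.96 * sd, mean + 1.96 * sd
```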

Toward Next-generation Medical Vision Backbones: Modeling Finer-grained Long-range Visual Dependency

Mingyuan Meng

arXiv preprint · Sep 14, 2025
Medical Image Computing (MIC) is a broad research topic covering both pixel-wise (e.g., segmentation, registration) and image-wise (e.g., classification, regression) vision tasks. Effective analysis demands models that capture both global long-range context and local subtle visual characteristics, necessitating fine-grained long-range visual dependency modeling. Compared to Convolutional Neural Networks (CNNs), which are limited by intrinsic locality, transformers excel at long-range modeling; however, due to the high computational load of self-attention, transformers typically cannot process high-resolution features (e.g., full-scale image features before downsampling or patch embedding) and thus face difficulties in modeling fine-grained dependency among subtle medical image details. Concurrently, Multi-layer Perceptron (MLP)-based visual models are recognized as computation- and memory-efficient alternatives for modeling long-range visual dependency but have yet to be widely investigated in the MIC community. This doctoral research advances deep learning-based MIC by investigating effective long-range visual dependency modeling. It first presents innovative uses of transformers for both pixel- and image-wise medical vision tasks. The focus then shifts to MLPs, pioneering the development of MLP-based visual models to capture fine-grained long-range visual dependency in medical images. Extensive experiments confirm the critical role of long-range dependency modeling in MIC and reveal a key finding: MLPs provide feasibility in modeling finer-grained long-range dependency among higher-resolution medical features containing enriched anatomical/pathological details. This finding establishes MLPs as a superior paradigm over transformers/CNNs, consistently enhancing performance across various medical vision tasks and paving the way for next-generation medical vision backbones.
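
The MLP route replaces the quadratic self-attention map with a linear layer applied across the token axis, which is what keeps full-resolution features affordable. An MLP-Mixer-style token-mixing block illustrates the idea; this is a generic sketch, not the thesis's specific architectures.

```python
import torch.nn as nn

class TokenMixingMLP(nn.Module):
    """Sketch of MLP-based long-range dependency modeling (MLP-Mixer style):
    a linear layer across the token axis lets every spatial position
    interact with every other without materializing an N x N attention map,
    making higher-resolution medical features tractable."""
    def __init__(self, n_tokens, channels, hidden=256):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.mix = nn.Sequential(
            nn.Linear(n_tokens, hidden), nn.GELU(), nn.Linear(hidden, n_tokens))

    def forward(self, x):                        # x: (B, N, C)
        y = self.norm(x).transpose(1, 2)         # (B, C, N): mix across tokens
        return x + self.mix(y).transpose(1, 2)   # residual connection
```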

Multi-encoder self-adaptive hard attention network with maximum intensity projections for lung nodule segmentation.

Usman M, Rehman A, Ur Rehman A, Shahid A, Khan TM, Razzak I, Chung M, Shin YG

PubMed · Sep 14, 2025
Accurate lung nodule segmentation is crucial for early-stage lung cancer diagnosis, as it can substantially enhance patient survival rates. Computed tomography (CT) images are widely employed for early diagnosis in lung nodule analysis. However, the heterogeneity of lung nodules, their size diversity, and the complexity of the surrounding environment pose challenges for developing robust nodule segmentation methods. In this study, we propose an efficient end-to-end framework, the Multi-Encoder Self-Adaptive Hard Attention Network (MESAHA-Net), which consists of three encoding paths, an attention block, and a decoder block that assimilates CT slice patches with both forward and backward maximum intensity projection (MIP) images. This synergy affords a rich contextual understanding of lung nodules but also produces a large volume of features. To manage this profusion of features, we incorporate a self-adaptive hard attention mechanism guided by region-of-interest (ROI) masks centered on nodular regions, which MESAHA-Net autonomously produces. The network sequentially undertakes slice-by-slice segmentation, emphasizing nodule regions to produce precise three-dimensional (3D) segmentation. The proposed framework has been comprehensively evaluated on the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset, the largest publicly available dataset for lung nodule segmentation. The results demonstrate that our approach is highly robust across various lung nodule types, outperforming previous state-of-the-art techniques in terms of segmentation performance and computational complexity, making it suitable for real-time clinical deployment of artificial intelligence (AI)-driven diagnostic tools.
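
Forward and backward MIP inputs of the kind MESAHA-Net pairs with each slice patch can be produced by max-projecting the CT volume over slabs on either side of the slice of interest. A minimal sketch, where the slab depth is an illustrative assumption:

```python
import numpy as np

def mip_pair(volume, center_idx, depth=16):
    """Forward and backward maximum intensity projections (MIPs) around a
    CT slice. volume: (Z, H, W) array; center_idx: index of the slice of
    interest; depth: assumed slab thickness in slices."""
    z0 = max(center_idx - depth, 0)
    z1 = min(center_idx + depth, volume.shape[0])
    forward_mip = volume[z0:center_idx + 1].max(axis=0)   # slices before/at center
    backward_mip = volume[center_idx:z1].max(axis=0)      # slices at/after center
    return forward_mip, backward_mip
```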