A Novel Downsampling Strategy Based on Information Complementarity for Medical Image Segmentation

Wenbo Yue, Chang Li, Guoping Xu

arXiv preprint · Jul 20, 2025
In convolutional neural networks (CNNs), downsampling operations are crucial to model performance. Although traditional downsampling methods (such as max pooling and strided convolution) perform well in feature aggregation, receptive-field expansion, and computational reduction, they may discard key spatial information in semantic segmentation tasks, thereby reducing pixel-wise prediction accuracy. To this end, this study proposes a downsampling method based on information complementarity: Hybrid Pooling Downsampling (HPD). Its core idea is to replace the traditional operation with MinMaxPooling, which effectively retains the light-dark contrast and detail features of an image by extracting the extreme values of each local region. Experiments with various CNN architectures on the ACDC and Synapse datasets show that HPD outperforms traditional methods in segmentation performance, increasing the Dice similarity coefficient (DSC) by 0.5% on average. The results show that the HPD module provides an efficient solution for semantic segmentation tasks.
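
A minimal sketch of the idea in PyTorch (the module name, window size, and 1x1 fusion are assumptions for illustration; the paper's exact HPD design may differ):

```python
import torch
import torch.nn as nn

class MinMaxPool2d(nn.Module):
    """Downsamples by taking both the max and the min of each local window,
    then fusing the two maps with a 1x1 convolution back to the input width."""
    def __init__(self, channels, kernel_size=2, stride=2):
        super().__init__()
        self.max_pool = nn.MaxPool2d(kernel_size, stride)
        # min(x) == -max(-x), so one pooling op covers both extremes
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        mx = self.max_pool(x)    # bright / high-activation detail
        mn = -self.max_pool(-x)  # dark / low-activation detail
        return self.fuse(torch.cat([mx, mn], dim=1))

# Usage: drop-in replacement for nn.MaxPool2d(2) in a segmentation encoder
pool = MinMaxPool2d(channels=64)
out = pool(torch.randn(1, 64, 128, 128))  # -> (1, 64, 64, 64)
```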

Depthwise-Dilated Convolutional Adapters for Medical Object Tracking and Segmentation Using the Segment Anything Model 2

Guoping Xu, Christopher Kabat, You Zhang

arXiv preprint · Jul 19, 2025
Recent advances in medical image segmentation have been driven by deep learning; however, most existing methods remain limited by modality-specific designs and exhibit poor adaptability to dynamic medical imaging scenarios. The Segment Anything Model 2 (SAM2) and its related variants, which introduce a streaming memory mechanism for real-time video segmentation, present new opportunities for prompt-based, generalizable solutions. Nevertheless, adapting these models to medical video scenarios typically requires large-scale datasets for retraining or transfer learning, leading to high computational costs and the risk of catastrophic forgetting. To address these challenges, we propose DD-SAM2, an efficient adaptation framework for SAM2 that incorporates a Depthwise-Dilated Adapter (DD-Adapter) to enhance multi-scale feature extraction with minimal parameter overhead. This design enables effective fine-tuning of SAM2 on medical videos with limited training data. Unlike existing adapter-based methods focused solely on static images, DD-SAM2 fully exploits SAM2's streaming memory for medical video object tracking and segmentation. Comprehensive evaluations on TrackRad2025 (tumor segmentation) and EchoNet-Dynamic (left ventricle tracking) datasets demonstrate superior performance, achieving Dice scores of 0.93 and 0.97, respectively. To the best of our knowledge, this work provides an initial attempt at systematically exploring adapter-based SAM2 fine-tuning for medical video segmentation and tracking. Code, datasets, and models will be publicly available at https://github.com/apple1986/DD-SAM2.
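
As a rough illustration of the pattern the paper names (a lightweight bottleneck with parallel depthwise convolutions at several dilation rates, added residually around frozen backbone features), here is a hedged PyTorch sketch; the dimensions and dilation rates are assumptions, not the paper's exact DD-Adapter:

```python
import torch
import torch.nn as nn

class DepthwiseDilatedAdapter(nn.Module):
    """Bottleneck adapter with parallel depthwise convolutions at several
    dilation rates, applied residually so frozen features pass through."""
    def __init__(self, dim, bottleneck=64, dilations=(1, 2, 4)):
        super().__init__()
        self.down = nn.Conv2d(dim, bottleneck, kernel_size=1)
        self.branches = nn.ModuleList([
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=d,
                      dilation=d, groups=bottleneck)  # depthwise branch
            for d in dilations
        ])
        self.act = nn.GELU()
        self.up = nn.Conv2d(bottleneck, dim, kernel_size=1)

    def forward(self, x):
        h = self.act(self.down(x))
        h = sum(b(h) for b in self.branches)  # multi-scale fusion
        return x + self.up(self.act(h))       # residual: backbone stays frozen

feat = torch.randn(1, 256, 64, 64)            # e.g. an encoder feature map
adapted = DepthwiseDilatedAdapter(256)(feat)  # same shape, few extra params
```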

Performance comparison of medical image classification systems using TensorFlow Keras, PyTorch, and JAX

Merjem Bećirović, Amina Kurtović, Nordin Smajlović, Medina Kapo, Amila Akagić

arXiv preprint · Jul 19, 2025
Medical imaging plays a vital role in early disease diagnosis and monitoring. Specifically, blood microscopy offers valuable insights into blood cell morphology and the detection of hematological disorders. In recent years, deep learning-based automated classification systems have demonstrated high potential in enhancing the accuracy and efficiency of blood image analysis. However, a detailed performance analysis of specific deep learning frameworks appears to be lacking. This paper compares the performance of three popular deep learning frameworks, TensorFlow with Keras, PyTorch, and JAX, in classifying blood cell images from the publicly available BloodMNIST dataset. The study focuses primarily on inference-time differences, but also examines classification performance at different image sizes. The results reveal variations in performance across frameworks, influenced by factors such as image resolution and framework-specific optimizations. Classification accuracy for JAX and PyTorch was comparable to current benchmarks, showcasing the efficiency of these frameworks for medical image classification.
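
As a hedged illustration of the kind of measurement such a comparison rests on, here is a minimal PyTorch timing harness; the toy model, batch size, and iteration counts are assumptions, and real benchmarks also need device synchronization and matched preprocessing across frameworks:

```python
import time
import torch

def time_inference(model, batch, warmup=10, iters=100):
    """Rough wall-clock seconds per forward pass, averaged over iters."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):   # let caches / cuDNN autotuning settle
            model(batch)
        start = time.perf_counter()
        for _ in range(iters):
            model(batch)
        return (time.perf_counter() - start) / iters

# e.g. a small CNN on BloodMNIST-sized inputs (28x28 RGB, 8 classes)
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(16, 8))
ms = time_inference(model, torch.randn(32, 3, 28, 28)) * 1e3
print(f"{ms:.2f} ms/batch")
```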

Insights into a radiology-specialised multimodal large language model with sparse autoencoders

Kenza Bouzid, Shruthi Bannur, Felix Meissen, Daniel Coelho de Castro, Anton Schwaighofer, Javier Alvarez-Valle, Stephanie L. Hyland

arXiv preprint · Jul 17, 2025
Interpretability can improve the safety, transparency and trust of AI models, which is especially important in healthcare applications where decisions often carry significant consequences. Mechanistic interpretability, particularly through the use of sparse autoencoders (SAEs), offers a promising approach for uncovering human-interpretable features within large transformer-based models. In this study, we apply Matryoshka-SAE to the radiology-specialised multimodal large language model, MAIRA-2, to interpret its internal representations. Using large-scale automated interpretability of the SAE features, we identify a range of clinically relevant concepts - including medical devices (e.g., line and tube placements, pacemaker presence), pathologies such as pleural effusion and cardiomegaly, longitudinal changes and textual features. We further examine the influence of these features on model behaviour through steering, demonstrating directional control over generations with mixed success. Our results reveal practical and methodological challenges, yet they offer initial insights into the internal concepts learned by MAIRA-2 - marking a step toward deeper mechanistic understanding and interpretability of a radiology-adapted multimodal large language model, and paving the way for improved model transparency. We release the trained SAEs and interpretations: https://huggingface.co/microsoft/maira-2-sae.
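
A minimal sketch of the steering idea, not MAIRA-2's actual hook points or scales: nudge a hidden state along one SAE feature's decoder direction during generation (the layer choice, scale alpha, and shapes below are assumptions):

```python
import torch

def steer_with_sae_feature(hidden, decoder_weight, feature_idx, alpha=5.0):
    """Add a scaled, unit-norm SAE decoder direction to a hidden state."""
    direction = decoder_weight[feature_idx]   # (d_model,)
    direction = direction / direction.norm()
    return hidden + alpha * direction         # applied at a chosen layer

# toy shapes: 16384 SAE features over a 4096-dim residual stream
W_dec = torch.randn(16384, 4096)
h = torch.randn(1, 10, 4096)                  # (batch, seq, d_model)
h_steered = steer_with_sae_feature(h, W_dec, feature_idx=42)
```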

Domain-randomized deep learning for neuroimage analysis

Malte Hoffmann

arXiv preprint · Jul 17, 2025
Deep learning has revolutionized neuroimage analysis by delivering unprecedented speed and accuracy. However, the narrow scope of many training datasets constrains model robustness and generalizability. This challenge is particularly acute in magnetic resonance imaging (MRI), where image appearance varies widely across pulse sequences and scanner hardware. A recent domain-randomization strategy addresses the generalization problem by training deep neural networks on synthetic images with randomized intensities and anatomical content. By generating diverse data from anatomical segmentation maps, the approach enables models to accurately process image types unseen during training, without retraining or fine-tuning. It has demonstrated effectiveness across modalities including MRI, computed tomography, positron emission tomography, and optical coherence tomography, as well as beyond neuroimaging in ultrasound, electron and fluorescence microscopy, and X-ray microtomography. This tutorial paper reviews the principles, implementation, and potential of the synthesis-driven training paradigm. It highlights key benefits, such as improved generalization and resistance to overfitting, while discussing trade-offs such as increased computational demands. Finally, the article explores practical considerations for adopting the technique, aiming to accelerate the development of generalizable tools that make deep learning more accessible to domain experts without extensive computational resources or machine learning knowledge.
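
A toy sketch of the synthesis step described above, assuming a simple per-label intensity model; real pipelines also randomize deformations, bias fields, and resolution:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_image(label_map, rng=np.random.default_rng()):
    """Turn an anatomical label map into a synthetic training image by
    sampling a random mean intensity per label, adding noise, and blurring,
    so no two training images share an appearance."""
    image = np.zeros(label_map.shape, dtype=np.float32)
    for label in np.unique(label_map):
        mask = label_map == label
        mean, std = rng.uniform(0, 255), rng.uniform(1, 25)
        image[mask] = rng.normal(mean, std, size=mask.sum())
    return gaussian_filter(image, sigma=rng.uniform(0.5, 1.5))

labels = np.zeros((64, 64), dtype=np.int32)
labels[16:48, 16:48] = 1                  # toy "anatomy": one structure
print(synthesize_image(labels).shape)     # (64, 64), new look every call
```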

AortaDiff: Volume-Guided Conditional Diffusion Models for Multi-Branch Aortic Surface Generation

Delin An, Pan Du, Jian-Xun Wang, Chaoli Wang

arXiv preprint · Jul 17, 2025
Accurate 3D aortic construction is crucial for clinical diagnosis, preoperative planning, and computational fluid dynamics (CFD) simulations, as it enables the estimation of critical hemodynamic parameters such as blood flow velocity, pressure distribution, and wall shear stress. Existing construction methods often rely on large annotated training datasets and extensive manual intervention. While the resulting meshes can serve for visualization purposes, they struggle to produce geometrically consistent, well-constructed surfaces suitable for downstream CFD analysis. To address these challenges, we introduce AortaDiff, a diffusion-based framework that generates smooth aortic surfaces directly from CT/MRI volumes. AortaDiff first employs a volume-guided conditional diffusion model (CDM) to iteratively generate aortic centerlines conditioned on volumetric medical images. Each centerline point is then automatically used as a prompt to extract the corresponding vessel contour, ensuring accurate boundary delineation. Finally, the extracted contours are fitted into a smooth 3D surface, yielding a continuous, CFD-compatible mesh representation. AortaDiff offers distinct advantages over existing methods, including an end-to-end workflow, minimal dependency on large labeled datasets, and the ability to generate CFD-compatible aorta meshes with high geometric fidelity. Experimental results demonstrate that AortaDiff performs effectively even with limited training data, successfully constructing both normal and pathologically altered aorta meshes, including cases with aneurysms or coarctation. This capability enables the generation of high-quality visualizations and positions AortaDiff as a practical solution for cardiovascular research.
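
Purely as a hedged sketch of the volume-conditioned reverse-diffusion idea, here is a generic DDPM-style sampling loop with a stand-in denoiser; the schedule, shapes, and network interface are illustrative, not AortaDiff's actual formulation:

```python
import torch

@torch.no_grad()
def sample_centerline(denoiser, volume_emb, n_points=128, steps=50):
    """Start from noise and iteratively denoise a set of 3D centerline
    points, conditioning each step on an embedding of the CT/MRI volume."""
    x = torch.randn(1, n_points, 3)            # noisy 3D point sequence
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.tensor([t]), volume_emb)  # predict noise
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x                                   # denoised centerline

dummy = lambda x, t, c: torch.zeros_like(x)    # stand-in for the real network
pts = sample_centerline(dummy, volume_emb=torch.randn(1, 256))
print(pts.shape)                               # torch.Size([1, 128, 3])
```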

A conversational artificial intelligence based web application for medical conversations: a prototype for a chatbot

Pires, J. G.

medRxiv preprint · Jul 17, 2025
Background: Artificial Intelligence (AI) has evolved through various trends, with different subfields gaining prominence over time. Currently, Conversational Artificial Intelligence (CAI), particularly Generative AI, is at the forefront. CAI models are primarily focused on text-based tasks and are commonly deployed as chatbots. Recent advancements by OpenAI have enabled the integration of external, independently developed models, allowing chatbots to perform specialized, task-oriented functions beyond general language processing. Objective: This study aims to develop a smart chatbot that integrates large language models (LLMs) from OpenAI with specialized domain-specific models, such as those used in medical image diagnostics. The system leverages transfer learning via Google's Teachable Machine to construct image-based classifiers and incorporates a diabetes detection model developed in TensorFlow.js. A key innovation is the chatbot's ability to extract relevant parameters from user input, trigger the appropriate diagnostic model, interpret the output, and deliver responses in natural language. The overarching goal is to demonstrate the potential of combining LLMs with external models to build multimodal, task-oriented conversational agents. Methods: Two image-based models were developed and integrated into the chatbot system. The first analyzes chest X-rays to detect viral and bacterial pneumonia. The second uses optical coherence tomography (OCT) images to identify ocular conditions such as drusen, choroidal neovascularization (CNV), and diabetic macular edema (DME). Both models were incorporated into the chatbot to enable image-based medical query handling. In addition, a text-based model was constructed to process physiological measurements for diabetes prediction using TensorFlow.js. The architecture is modular: new diagnostic models can be added without redesigning the chatbot, enabling straightforward functional expansion. Results: The findings demonstrate effective integration between the chatbot and the diagnostic models, with only minor deviations from expected behavior. Additionally, a stub function was implemented within the chatbot to schedule medical appointments based on the severity of a patient's condition, and it was specifically tested with the OCT and X-ray models. Conclusions: This study demonstrates the feasibility of developing advanced AI systems, including image-based diagnostic models and chatbot integration, by leveraging Artificial Intelligence as a Service (AIaaS). It also underscores the potential of AI to enhance user experiences in bioinformatics, paving the way for more intuitive and accessible interfaces in the field. Looking ahead, the modular nature of the chatbot allows for the integration of additional diagnostic models as the system evolves.
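
A hedged sketch of the dispatch pattern described above, using the OpenAI chat-completions tools API: the LLM decides when a question needs a specialist model, the host application runs it, and the result goes back for a natural-language answer. The tool schema, model name, and predict_diabetes function are illustrative stand-ins, not the paper's code:

```python
import json
from openai import OpenAI

client = OpenAI()
tools = [{"type": "function", "function": {
    "name": "predict_diabetes",
    "description": "Run the diabetes classifier on physiological measurements",
    "parameters": {"type": "object", "properties": {
        "glucose": {"type": "number"}, "bmi": {"type": "number"}},
        "required": ["glucose", "bmi"]}}}]

def predict_diabetes(glucose, bmi):   # stand-in for the real classifier
    return {"risk": "high" if glucose > 140 or bmi > 30 else "low"}

msgs = [{"role": "user", "content": "Glucose 155 mg/dL, BMI 32. Diabetes risk?"}]
resp = client.chat.completions.create(model="gpt-4o-mini", messages=msgs, tools=tools)
call = resp.choices[0].message.tool_calls[0]          # model chose the tool
result = predict_diabetes(**json.loads(call.function.arguments))
msgs += [resp.choices[0].message,
         {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}]
final = client.chat.completions.create(model="gpt-4o-mini", messages=msgs)
print(final.choices[0].message.content)               # natural-language answer
```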

Large Language Model-Based Entity Extraction Reliably Classifies Pancreatic Cysts and Reveals Predictors of Malignancy: A Cross-Sectional and Retrospective Cohort Study

Papale, A. J., Flattau, R., Vithlani, N., Mahajan, D., Ziemba, Y., Zavadsky, T., Carvino, A., King, D., Nadella, S.

medRxiv preprint · Jul 17, 2025
Pancreatic cystic lesions (PCLs) are often discovered incidentally on imaging and may progress to pancreatic ductal adenocarcinoma (PDAC). PCLs have a high incidence in the general population, and adherence to screening guidelines can be variable. With the advent of technologies that enable automated text classification, we sought to evaluate various natural language processing (NLP) tools, including large language models (LLMs), for identifying and classifying PCLs from radiology reports. We correlated our classification of PCLs with clinical features to identify risk factors for a positive PDAC biopsy. We contrasted a previously described NLP classifier with LLMs for prospective identification of PCLs in radiology. We evaluated various LLMs for PCL classification into low-risk or high-risk categories based on published guidelines. We compared prompt-based PCL classification to specific entity-guided PCL classification. To this end, we developed tools to de-identify radiology reports and track patients longitudinally based on their radiology reports. Additionally, we used our newly developed tools to evaluate a retrospective database of patients who underwent pancreas biopsy to determine associated factors, including those in their radiology reports and clinical features, using multivariable logistic regression modelling. Of 14,574 prospective radiology reports, 665 (4.6%) described a pancreatic cyst, including 175 (1.2%) high-risk lesions. Our Entity-Extraction Large Language Model tool achieved recall 0.992 (95% confidence interval [CI], 0.985-0.998), precision 0.988 (0.979-0.996), and F1-score 0.990 (0.985-0.995) for detecting cysts; F1-scores were 0.993 (0.987-0.998) for low-risk and 0.977 (0.952-0.995) for high-risk classification. Among 4,285 biopsy patients, 330 had pancreatic cysts documented ≥6 months before biopsy. In the final multivariable model (AUC = 0.877), independent predictors of adenocarcinoma were change in duct caliber with upstream atrophy (adjusted odds ratio [AOR], 4.94; 95% CI, 1.30-18.79), mural nodules (AOR, 11.02; 1.81-67.26), older age (AOR, 1.10; 1.05-1.16), lower body mass index (AOR, 0.86; 0.76-0.96), and total bilirubin (AOR, 1.81; 1.18-2.77). Automated NLP-based analysis of radiology reports using LLM-driven entity extraction can accurately identify and risk-stratify PCLs and, when retrospectively applied, reveal factors predicting malignant progression. Widespread implementation may improve surveillance and enable earlier intervention.
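
A sketch of the multivariable logistic model reported above: adjusted odds ratios are the exponentiated coefficients. The data below are synthetic placeholders for illustration only; the study's real predictors came from report text and chart review:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 330
df = pd.DataFrame({
    "duct_change_atrophy": rng.integers(0, 2, n),
    "mural_nodule": rng.integers(0, 2, n),
    "age": rng.normal(68, 10, n),
    "bmi": rng.normal(27, 5, n),
    "bilirubin": rng.normal(0.8, 0.4, n),
})
# synthetic outcome with a known signal, just so the fit converges
logit = 0.05 * (df["age"] - 68) + 1.5 * df["mural_nodule"] - 2.0
df["adenocarcinoma"] = rng.random(n) < 1 / (1 + np.exp(-logit))

X = sm.add_constant(df.drop(columns="adenocarcinoma"))
fit = sm.Logit(df["adenocarcinoma"].astype(float), X).fit(disp=0)
print(np.exp(fit.params).round(2))   # adjusted odds ratios (AORs)
print(np.exp(fit.conf_int()))        # 95% CIs on the odds-ratio scale
```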

CytoSAE: Interpretable Cell Embeddings for Hematology

Muhammed Furkan Dasdelen, Hyesu Lim, Michele Buck, Katharina S. Götze, Carsten Marr, Steffen Schneider

arXiv preprint · Jul 16, 2025
Sparse autoencoders (SAEs) emerged as a promising tool for mechanistic interpretability of transformer-based foundation models. Very recently, SAEs were also adopted for the visual domain, enabling the discovery of visual concepts and their patch-wise attribution to tokens in the transformer model. While a growing number of foundation models emerged for medical imaging, tools for explaining their inferences are still lacking. In this work, we show the applicability of SAEs for hematology. We propose CytoSAE, a sparse autoencoder which is trained on over 40,000 peripheral blood single-cell images. CytoSAE generalizes to diverse and out-of-domain datasets, including bone marrow cytology, where it identifies morphologically relevant concepts which we validated with medical experts. Furthermore, we demonstrate scenarios in which CytoSAE can generate patient-specific and disease-specific concepts, enabling the detection of pathognomonic cells and localized cellular abnormalities at the patch level. We quantified the effect of concepts on a patient-level AML subtype classification task and show that CytoSAE concepts reach performance comparable to the state-of-the-art, while offering explainability on the sub-cellular level. Source code and model weights are available at https://github.com/dynamical-inference/cytosae.
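
A minimal sketch of the kind of SAE described above: an overcomplete dictionary trained to reconstruct transformer patch embeddings under a sparsity penalty, so each latent tends to fire for one visual concept. The sizes and the plain ReLU/L1 variant are illustrative choices, not CytoSAE's exact configuration:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete linear autoencoder; ReLU latents + L1 give sparse codes."""
    def __init__(self, d_model=768, n_features=16384):
        super().__init__()
        self.enc = nn.Linear(d_model, n_features)
        self.dec = nn.Linear(n_features, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))   # sparse concept activations
        return self.dec(z), z

sae = SparseAutoencoder()
x = torch.randn(64, 768)              # e.g. ViT patch embeddings of cell images
recon, z = sae(x)
loss = ((recon - x) ** 2).mean() + 1e-3 * z.abs().mean()  # recon + sparsity
```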