Page 211 of 2522511 results

MedSG-Bench: A Benchmark for Medical Image Sequences Grounding

Jingkun Yue, Siqi Zhang, Zinan Jia, Huihuan Xu, Zongbo Han, Xiaohong Liu, Guangyu Wang

arXiv · preprint · May 17 2025
Visual grounding is essential for precise perception and reasoning in multimodal large language models (MLLMs), especially in medical imaging. While existing medical visual grounding benchmarks primarily focus on single-image scenarios, real-world clinical applications often involve image sequences, where accurate lesion localization across different modalities and temporal tracking of disease progression (e.g., pre- vs. post-treatment comparison) require fine-grained cross-image semantic alignment and context-aware reasoning. To address the underrepresentation of image sequences in existing medical visual grounding benchmarks, we propose MedSG-Bench, the first benchmark tailored for Medical Image Sequences Grounding. It comprises eight VQA-style tasks organized into two grounding paradigms: 1) Image Difference Grounding, which focuses on detecting regions that change across images, and 2) Image Consistency Grounding, which focuses on detecting consistent or shared semantics across sequential images. MedSG-Bench covers 76 public datasets, 10 medical imaging modalities, and a wide spectrum of anatomical structures and diseases, totaling 9,630 question-answer pairs. We benchmark both general-purpose MLLMs (e.g., Qwen2.5-VL) and medical-domain MLLMs (e.g., HuatuoGPT-vision) and observe that even advanced models exhibit substantial limitations in medical sequential grounding tasks. To advance this field, we construct MedSG-188K, a large-scale instruction-tuning dataset tailored for sequential visual grounding, and develop MedSeq-Grounder, an MLLM designed to facilitate future research on fine-grained understanding across medical sequential images. The benchmark, dataset, and model are available at https://huggingface.co/MedSG-Bench
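The abstract does not say how grounding outputs are scored. For box-level visual grounding, a common convention (assumed here, not taken from MedSG-Bench) counts a predicted box as correct when its intersection over union (IoU) with the reference box is at least 0.5. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) pixel coordinates; the function names are illustrative only:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(preds, refs, threshold=0.5):
    """Share of predicted boxes whose IoU with the paired reference box is >= threshold.

    Assumption: the 0.5 threshold is a common convention, not a confirmed MedSG-Bench setting.
    """
    if len(preds) != len(refs):
        raise ValueError("need one prediction per reference box")
    hits = sum(1 for p, r in zip(preds, refs) if iou(p, r) >= threshold)
    return hits / len(refs)
```

For example, `iou((0, 0, 10, 10), (5, 5, 15, 15))` is 25 / 175, about 0.143, so that prediction would count as a miss at the 0.5 threshold.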

Patient-Specific Autoregressive Models for Organ Motion Prediction in Radiotherapy

Yuxiang Lai, Jike Zhong, Vanessa Su, Xiaofeng Yang

arXiv · preprint · May 17 2025
Radiotherapy often involves a prolonged treatment period, during which patients may experience organ motion due to breathing and other physiological factors. Predicting and modeling this motion before treatment is crucial for precise radiation delivery. However, existing pre-treatment organ motion prediction methods rely primarily on deformation analysis using principal component analysis (PCA), which depends heavily on registration quality and struggles to capture the periodic temporal dynamics of motion. In this paper, we observe that organ motion prediction closely resembles an autoregressive process, a technique widely used in natural language processing (NLP). Autoregressive models predict the next token from previous inputs, which aligns naturally with our objective of predicting future organ motion phases. Building on this insight, we reformulate organ motion prediction as an autoregressive process to better capture patient-specific motion patterns. Specifically, we acquire 4D CT scans for each patient before treatment, with each sequence comprising multiple 3D CT phases. These phases are fed into the autoregressive model, which predicts future phases from the motion patterns of prior phases. We evaluate our method on a real-world test set of 4D CT scans from 50 patients who underwent radiotherapy at our institution and on a public dataset of 4D CT scans from 20 patients (some with multiple scans), totaling over 1,300 3D CT phases. Performance in predicting lung and heart motion surpasses existing benchmarks, demonstrating the method's effectiveness in capturing motion dynamics from CT images. These results highlight the potential of our method to improve pre-treatment planning in radiotherapy, enabling more precise and adaptive radiation delivery.
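To make the autoregressive idea concrete, the toy sketch below replaces each 3D CT phase with a single scalar displacement (a hypothetical simplification) and fits a linear AR(2) model by least squares; each prediction is then fed back as input for the next phase. This is not the authors' model, which operates on full 3D CT phases; the sinusoidal breathing trace and all names are synthetic and illustrative.

```python
import math


def fit_ar2(series):
    """Least-squares fit of x[t] ~ a * x[t-1] + b * x[t-2] (no intercept)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(series)):
        x1, x2, y = series[t - 1], series[t - 2], series[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y
        r2 += x2 * y
    # Solve the 2x2 normal equations [[s11, s12], [s12, s22]] @ [a, b] = [r1, r2].
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


def rollout(history, coeffs, steps):
    """Predict `steps` future phases, feeding each prediction back as input."""
    a, b = coeffs
    seq = list(history)
    for _ in range(steps):
        seq.append(a * seq[-1] + b * seq[-2])
    return seq[len(history):]


# Synthetic trace (hypothetical): 10 phases per breathing cycle, 8 mm peak displacement.
PHASES_PER_CYCLE = 10


def displacement(t):
    return 8.0 * math.sin(2 * math.pi * t / PHASES_PER_CYCLE)


observed = [displacement(t) for t in range(2 * PHASES_PER_CYCLE)]  # two observed cycles
coeffs = fit_ar2(observed)
predicted = rollout(observed, coeffs, PHASES_PER_CYCLE)  # forecast the next cycle
```

A pure sinusoid obeys x[t] = 2 cos(w) x[t-1] - x[t-2] exactly, so the fit recovers a = 2 cos(2π/10) and b = -1 and the rollout reproduces the next cycle; real organ motion is irregular, which is why a learned, patient-specific model is needed.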

Foundation versus Domain-Specific Models for Left Ventricular Segmentation on Cardiac Ultrasound

Chao, C.-J., Gu, Y., Kumar, W., Xiang, T., Appari, L., Wu, J., Farina, J. M., Wraith, R., Jeong, J., Arsanjani, R., Garvan, K. C., Oh, J. K., Langlotz, C. P., Banerjee, I., Li, F.-F., Adeli, E.

medRxiv · preprint · May 17 2025
The Segment Anything Model (SAM) was fine-tuned on the EchoNet-Dynamic dataset and evaluated on external transthoracic echocardiography (TTE) and point-of-care ultrasound (POCUS) datasets from CAMUS (University Hospital of St Etienne) and Mayo Clinic (99 patients: 58 TTE, 41 POCUS). Fine-tuned SAM performed comparably to or better than MedSAM. It also outperformed EchoNet and U-Net models, demonstrating strong generalization, especially on apical 2-chamber (A2C) images (fine-tuned SAM vs. EchoNet: CAMUS-A2C DSC 0.891 ± 0.040 vs. 0.752 ± 0.196, p < 0.0001) and POCUS (DSC 0.857 ± 0.047 vs. 0.667 ± 0.279, p < 0.0001). Additionally, the SAM-enhanced workflow reduced annotation time by 50% (from 11.6 ± 4.5 s to 5.7 ± 1.7 s, p < 0.0001) while maintaining segmentation quality. These results demonstrate an effective strategy for fine-tuning a vision foundation model to enhance clinical workflow efficiency and support human-AI collaboration.
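The DSC values quoted above measure overlap between a predicted left-ventricle mask and the reference tracing: DSC = 2 |P ∩ R| / (|P| + |R|). A minimal pure-Python sketch of that standard formula, assuming masks are equal-sized nested lists of 0/1 pixels (real pipelines would use array libraries); the empty-mask convention is an assumption:

```python
def dice(pred, ref):
    """Dice similarity coefficient (DSC) between two binary masks.

    Masks are nested lists of 0/1 values with the same number of pixels.
    DSC = 2 * |pred AND ref| / (|pred| + |ref|).
    """
    flat_pred = [v for row in pred for v in row]
    flat_ref = [v for row in ref for v in row]
    if len(flat_pred) != len(flat_ref):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, r in zip(flat_pred, flat_ref) if p and r)
    total = sum(flat_pred) + sum(flat_ref)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

For two 3-pixel masks that share 2 pixels, such as `dice([[0, 1, 1], [0, 1, 0]], [[0, 1, 0], [0, 1, 1]])`, the result is 4 / 6, about 0.667.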

Computer-aided assessment for enlarged fetal heart with deep learning model.

Nurmaini S, Sapitri AI, Roseno MT, Rachmatullah MN, Mirani P, Bernolian N, Darmawahyuni A, Tutuko B, Firdaus F, Islami A, Arum AW, Bastian R

PubMed · papers · May 16 2025
Enlarged fetal heart conditions may indicate congenital heart disease or other complications, making early detection through prenatal ultrasound essential. However, manual assessment by sonographers is often subjective, time-consuming, and inconsistent. This paper proposes a deep learning approach using the You Only Look Once (YOLO) architecture to automate the assessment of fetal heart enlargement. On a set of ultrasound videos, YOLOv8 with a CBAM (Convolutional Block Attention Module) outperformed YOLOv11 with self-attention. Incorporating the ResNeXtBlock (a residual network with cardinality) further enhanced accuracy and prediction consistency. The model shows strong capability in detecting fetal heart enlargement, offering a reliable computer-aided tool for sonographers during prenatal screening. Further validation is required to confirm its clinical applicability. By improving early and accurate detection, this approach has the potential to enhance prenatal care, facilitate timely interventions, and contribute to better neonatal health outcomes.

Automated CT segmentation for lower extremity tissues in lymphedema evaluation using deep learning.

Na S, Choi SJ, Ko Y, Urooj B, Huh J, Cha S, Jung C, Cheon H, Jeon JY, Kim KW

PubMed · papers · May 16 2025
Clinical assessment of lymphedema, particularly of lymphedema severity and fluid-fibrotic lesions, remains challenging with traditional methods. We aimed to develop and validate a deep learning segmentation tool for automated tissue component analysis in lower-extremity CT scans. For the development dataset, lower-extremity CT venography scans were collected from 118 patients with gynecologic cancers for algorithm training. Reference standards were created by segmenting fat, muscle, and fluid-fibrotic tissue components in 3D Slicer. A deep learning model based on the Unet++ architecture with an EfficientNet-B7 encoder was developed and trained. Segmentation accuracy was validated on an internal validation set (n = 10) and an external validation set (n = 10) using the Dice similarity coefficient (DSC) and volumetric similarity (VS). A graphical user interface (GUI) tool was developed to visualize the segmentation results. Our deep learning algorithm achieved high segmentation accuracy: mean DSCs for each component and for all components ranged from 0.945 to 0.999 in the internal validation set and from 0.946 to 0.999 in the external validation set. VS showed similar performance, with mean VSs for all components ranging from 0.97 to 0.999. In volumetric analysis, mean volumes of the entire leg and of each component did not differ significantly between reference-standard and deep learning measurements (p > 0.05). Our GUI displays lymphedema mapping, highlighting segmented fat, muscle, and fluid-fibrotic components across the entire leg. Our deep learning algorithm provides an automated tool for accurate segmentation, tissue-component volume measurement, and lymphedema mapping.
Question: Clinical assessment of lymphedema remains challenging, particularly for tissue segmentation and quantitative severity evaluation.
Findings: A deep learning algorithm achieved DSCs > 0.95 and VS > 0.97 for fat, muscle, and fluid-fibrotic components in internal and external validation datasets.
Clinical relevance: The developed deep learning tool accurately segments and quantifies lower-extremity tissue components on CT scans, enabling automated lymphedema evaluation and mapping with high segmentation accuracy.
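Volumetric similarity (VS), reported alongside DSC above, compares only the segmented volumes, so it complements the overlap-based DSC. A sketch of the commonly used definition VS = 1 - |V_pred - V_ref| / (V_pred + V_ref), with volumes computed from voxel counts and voxel spacing; the spacing and voxel counts below are hypothetical, and the abstract does not describe the study's exact implementation:

```python
def volume_ml(voxel_count, spacing_mm):
    """Volume in millilitres from a voxel count and (x, y, z) voxel spacing in mm."""
    sx, sy, sz = spacing_mm
    return voxel_count * sx * sy * sz / 1000.0  # 1 mL = 1000 mm^3


def volumetric_similarity(vol_pred, vol_ref):
    """VS = 1 - |V_pred - V_ref| / (V_pred + V_ref); depends only on the two volumes."""
    total = vol_pred + vol_ref
    return 1.0 if total == 0 else 1.0 - abs(vol_pred - vol_ref) / total


# Hypothetical example: 0.8 x 0.8 x 2.0 mm voxels, 950 predicted vs. 1000 reference voxels.
spacing = (0.8, 0.8, 2.0)
vs = volumetric_similarity(volume_ml(950, spacing), volume_ml(1000, spacing))
```

Here VS = 1 - 50 / 1950, about 0.974. Because VS ignores location, two non-overlapping masks of equal volume still score 1.0, which is why it is reported together with DSC rather than on its own.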

A deep learning-based approach to automated rib fracture detection and CWIS classification.

Marting V, Borren N, van Diepen MR, van Lieshout EMM, Wijffels MME, van Walsum T

PubMed · papers · May 16 2025
Trauma-induced rib fractures are a common injury. The number and characteristics of these fractures influence whether a patient is treated nonoperatively or surgically. Rib fractures are typically diagnosed using CT scans, yet 19.2-26.8% of fractures are still missed during assessment. Another challenge in managing rib fractures is interobserver variability in their classification. The purpose of this study was to develop and assess an automated method that detects rib fractures in CT scans and classifies them according to the Chest Wall Injury Society (CWIS) classification. A total of 198 CT scans were collected, of which 170 were used for training and internal validation and 28 for external validation. Fractures and their classifications were manually annotated in each scan. A detection and classification network was trained for each of the three components of the CWIS classification. In addition, a rib-numbering network was trained to obtain the rib number of each fracture. Experiments were performed to assess the method's performance. On the internal test set, the method achieved a detection sensitivity of 80% at a precision of 87%, an F1-score of 83%, and a mean of 1.11 false positives per scan (FPPS). Classification sensitivity varied, from 25% for complex fractures to 97% for posterior fractures. The correct rib number was assigned to 94% of the detected fractures. The custom-trained nnU-Net correctly labeled 95.5% of all ribs and 98.4% of fractured ribs in 30 patients. Detection and classification performance on the external validation dataset was slightly better, with a fracture detection sensitivity of 84%, precision of 85%, F1-score of 84%, and FPPS of 0.96; 95% of the fractures were assigned the correct rib number. The method accurately detects and classifies rib fractures in CT scans, although there is room for improvement in the rare classes that are underrepresented in the training set.
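The reported detection figures follow the usual count-based definitions. A short sketch that computes them from true-positive, false-positive, and false-negative counts, and checks that the published F1-scores are consistent with the published sensitivity and precision; the counts in the usage note are hypothetical:

```python
def f1_from(sensitivity, precision):
    """F1-score: harmonic mean of sensitivity (recall) and precision."""
    return 2 * sensitivity * precision / (sensitivity + precision)


def detection_metrics(tp, fp, fn, n_scans):
    """Count-based detection metrics for fracture-level evaluation."""
    sensitivity = tp / (tp + fn)  # share of annotated fractures that were detected
    precision = tp / (tp + fp)    # share of detections that are true fractures
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1_from(sensitivity, precision),
        "fpps": fp / n_scans,     # mean false positives per scan
    }


# Consistency check against the reported internal and external results.
internal_f1 = f1_from(0.80, 0.87)  # about 0.834, reported as 83%
external_f1 = f1_from(0.84, 0.85)  # about 0.845, reported as 84%
```

With hypothetical counts, `detection_metrics(tp=40, fp=6, fn=10, n_scans=28)` gives a sensitivity of 0.80, an F1 of 80 / 96 (about 0.833), and an FPPS of about 0.21.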

High-Performance Prompting for LLM Extraction of Compression Fracture Findings from Radiology Reports.

Kanani MM, Monawer A, Brown L, King WE, Miller ZD, Venugopal N, Heagerty PJ, Jarvik JG, Cohen T, Cross NM

PubMed · papers · May 16 2025
Extracting information from radiology reports can provide critical data to empower many radiology workflows. For spinal compression fractures, these data can facilitate evidence-based care for at-risk populations. Manual extraction from free-text reports is laborious and error-prone. Large language models (LLMs) have shown promise; however, fine-tuning strategies to optimize performance on specific tasks can be resource-intensive. A variety of prompting strategies have achieved similar results with fewer demands. Our study pioneers the use of Meta's Llama 3.1, together with prompt-based strategies, for automated extraction of compression fracture findings from free-text radiology reports, outputting structured data without model training. We tested performance on a time-based sample of CT exams covering the spine from 2/20/2024 to 2/22/2024, acquired across our healthcare enterprise (637 anonymized reports; age 18-102; 47% female). Ground-truth annotations were generated manually and compared against the outputs of three models (Llama 3.1 70B, Llama 3.1 8B, and Vicuna 13B) under nine prompting configurations, for a total of 27 model/prompt experiments. The highest F1 score (0.91) was achieved by the 70B Llama 3.1 model when provided with a radiologist-written background, with similar results when the background was written by a separate LLM (0.86). Adding few-shot examples to these prompts had a variable impact on F1 (0.89 and 0.84, respectively). ROC-AUC and PR-AUC performance was comparable. Our work demonstrates that an open-weights LLM can excel at extracting compression fracture findings from free-text radiology reports using prompt-based techniques, without requiring extensive manually labeled examples for model training.
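The prompting configurations differ mainly in what context accompanies each report: a background passage (radiologist- or LLM-written) and, optionally, few-shot examples. A minimal sketch of assembling such a prompt and parsing a structured reply; the instruction wording, the JSON key `compression_fracture`, and the helper names are hypothetical, no model call is shown, and the study's actual prompts are not given in the abstract:

```python
import json

# Hypothetical instruction text; the study's actual prompts are not reproduced here.
INSTRUCTION = (
    "Read the radiology report below. Reply with JSON containing one boolean key, "
    '"compression_fracture", set to true if the report describes a spinal '
    "compression fracture and false otherwise."
)


def build_prompt(report, background=None, examples=()):
    """Assemble an optional background, the instruction, few-shot examples, and the report."""
    parts = []
    if background:  # e.g. a radiologist- or LLM-written background passage
        parts.append("Background:\n" + background)
    parts.append(INSTRUCTION)
    for example_report, example_label in examples:  # optional few-shot pairs
        answer = json.dumps({"compression_fracture": example_label})
        parts.append(f"Report:\n{example_report}\nAnswer: {answer}")
    parts.append(f"Report:\n{report}\nAnswer:")
    return "\n\n".join(parts)


def parse_answer(text):
    """Extract the boolean label from a model reply; None if no valid JSON label is found."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    value = payload.get("compression_fracture") if isinstance(payload, dict) else None
    return value if isinstance(value, bool) else None
```

Returning `None` for malformed replies keeps parsing failures separate from negative findings, so they can be counted explicitly when computing F1 against the ground-truth labels.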

Impact of sarcopenia and obesity on mortality in older adults with SARS-CoV-2 infection: automated deep learning body composition analysis in the NAPKON-SUEP cohort.

Schluessel S, Mueller B, Tausendfreund O, Rippl M, Deissler L, Martini S, Schmidmaier R, Stoecklein S, Ingrisch M, Blaschke S, Brandhorst G, Spieth P, Lehnert K, Heuschmann P, de Miranda SMN, Drey M

PubMed · papers · May 16 2025
Severe respiratory infections pose a major challenge in clinical practice, especially in older adults. Body composition analysis could play a crucial role in risk assessment and therapeutic decision-making. This study investigates whether obesity or sarcopenia has a greater impact on mortality in patients with severe respiratory infections. The study focuses on the National Pandemic Cohort Network (NAPKON-SUEP) cohort, which includes patients over 60 years of age with confirmed severe COVID-19 pneumonia. An innovative approach was adopted, using pre-trained deep learning models for automated analysis of body composition based on routine thoracic CT scans. The study included 157 hospitalized patients (mean age 70 ± 8 years, 41% women, mortality rate 39%) from the NAPKON-SUEP cohort at 57 study sites. A pre-trained deep learning model was used to analyze body composition (muscle, bone, fat, and intramuscular fat volumes) from thoracic CT images. Binary logistic regression was performed to investigate the association between obesity, sarcopenia, and mortality. Non-survivors exhibited lower muscle volume (p = 0.043), higher intramuscular fat volume (p = 0.041), and higher BMI (p = 0.031) than survivors. Among all body composition parameters, muscle volume adjusted to weight was the strongest predictor of mortality in the logistic regression model, even after adjusting for sex, age, diabetes, chronic lung disease, and chronic kidney disease (odds ratio = 0.516). In contrast, BMI did not differ significantly after adjustment for comorbidities. This study identifies muscle volume derived from routine CT scans as a major predictor of survival in patients with severe respiratory infections.
The results underscore the potential of AI-supported, CT-based body composition analysis for risk stratification and clinical decision-making, not only for COVID-19 patients but also for all patients over 60 years of age with severe acute respiratory infections. The innovative application of pre-trained deep learning models opens up new possibilities for automated and standardized assessment in clinical practice.
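The odds ratio of 0.516 comes from the binary logistic regression described above. As a worked reading of that number (the abstract does not state the unit or scaling of weight-adjusted muscle volume, written m below, so "one unit" is left abstract; z_k stands for the adjustment covariates such as sex, age, and comorbidities):

```latex
\operatorname{logit} P(\text{death}) = \beta_0 + \beta_1 m + \sum_k \gamma_k z_k,
\qquad
\mathrm{OR}_m = e^{\beta_1} = 0.516
\;\Longrightarrow\;
\beta_1 = \ln 0.516 \approx -0.662
```

With the covariates held fixed, each one-unit increase in m multiplies the odds of death by 0.516, a reduction of about 48%; because the odds ratio is below 1, higher weight-adjusted muscle volume is associated with lower mortality.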

Artificial intelligence in dentistry: awareness among dentists and computer scientists.

Costa ED, Vieira MA, Ambrosano GMB, Gaêta-Araujo H, Carneiro JA, Zancan BAG, Scaranti A, Macedo AA, Tirapelli C

PubMed · papers · May 16 2025
For clinical application of artificial intelligence (AI) in dentistry, collaboration with computer scientists is necessary. This study aimed to evaluate the knowledge of dentists and computer scientists regarding the use of AI in dentistry, especially in dentomaxillofacial radiology. A total of 610 participants (374 dentists and 236 computer scientists) took part in a survey about AI in dentistry and radiographic imaging. Responses were given on a Likert scale of agreement/disagreement. Agreement scores were described using quartiles (minimum, first quartile, median, third quartile, and maximum). The non-parametric Mann-Whitney test was used to compare response scores between the two groups (α = 5%). Dentists had higher agreement scores for the statements "knowing the applications of AI in dentistry", "dentists taking the lead in AI research", "AI education should be part of teaching", "AI can increase the price of dental services", "AI can lead to errors in radiographic diagnosis", "AI can negatively interfere with the choice of Radiology specialty", "AI can cause a reduction in the employment of radiologists", and "patient data can be hacked using AI" (p < 0.05). Computer scientists had higher agreement scores for "having knowledge in AI" and "AI's potential to speed up and improve radiographic diagnosis". Although dentists acknowledge the potential benefits of AI in dentistry, they remain skeptical about its use and consider it important to integrate AI into the dental education curriculum. Computer scientists, on the other hand, report technical expertise in AI and recognize its potential in dentomaxillofacial radiology.
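The group comparison above uses the Mann-Whitney U test, which suits ordinal Likert scores because it works on ranks rather than raw values. In practice one would call `scipy.stats.mannwhitneyu`; the dependency-free sketch below only shows the mechanics, using mid-ranks for tied scores and a normal approximation without continuity correction (a simplification; it does not reproduce the study's software or results):

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected variance).

    Returns (U for sample x, two-sided p-value). No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every score identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p
```

Usage with hypothetical 5-point Likert responses: `u, p = mann_whitney_u([4, 5, 3, 4, 5], [2, 3, 3, 2, 4])`. Likert data produce many ties, which is why the tie-corrected variance matters here.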

Machine learning prediction of pathological complete response to neoadjuvant chemotherapy with peritumoral breast tumor ultrasound radiomics: compare with intratumoral radiomics and clinicopathologic predictors.

Yao J, Zhou W, Jia X, Zhu Y, Chen X, Zhan W, Zhou J

PubMed · papers · May 16 2025
Noninvasive, accurate, and novel approaches to predict which patients will achieve pathological complete response (pCR) after neoadjuvant chemotherapy (NAC) could assist treatment strategies. The aim of this study was to explore the application of a machine learning (ML)-based peritumoral ultrasound radiomics signature (PURS), compared with an intratumoral radiomics signature (IURS) and clinicopathologic factors, for early prediction of pCR. We analyzed 358 patients with locally advanced breast cancer (250 in the training set and 108 in the test set) who underwent NAC and post-NAC surgery at our institution. Clinical and pathological data were analyzed using the independent t-test and the chi-square test to determine factors associated with pCR. The PURS and IURS of baseline breast tumors were extracted using 3D Slicer and PyRadiomics software. Five ML classifiers, namely linear discriminant analysis (LDA), support vector machine (SVM), random forest (RF), logistic regression (LR), and adaptive boosting (AdaBoost), were applied to construct radiomics predictive models. The performance of the PURS models, IURS models, and clinicopathologic predictors was assessed in terms of sensitivity, specificity, accuracy, and area under the curve (AUC). Ninety-seven patients achieved pCR. The clinicopathologic predictors obtained an AUC of 0.759. Among PURS models, the RF classifier achieved better efficacy (AUC 0.889) than LR (0.849), AdaBoost (0.823), SVM (0.746), and LDA (0.732). Among IURS models in the test set, the RF classifier also obtained the highest AUC (0.931), compared with AdaBoost (0.920), LR (0.875), SVM (0.825), and LDA (0.798). The RF-based PURS yielded higher predictive ability (AUC 0.889; 95% CI 0.814, 0.947) than clinicopathologic factors (AUC 0.759; 95% CI 0.657, 0.861; p < 0.05), but lower efficacy than IURS (AUC 0.931; 95% CI 0.865, 0.980; p < 0.05). Peritumoral US radiomics, as a potential novel biomarker, can assist clinical therapy decisions.
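The classifiers above are compared by the area under the ROC curve (AUC). A minimal sketch of the rank-based formulation: AUC is the probability that a randomly chosen pCR patient receives a higher predicted score than a randomly chosen non-pCR patient, with ties counted as half. This gives the same value as the trapezoidal ROC area (for example, `sklearn.metrics.roc_auc_score`); the scores in the usage note are hypothetical:

```python
def roc_auc(scores_pos, scores_neg):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win; this equals the Mann-Whitney U statistic
    divided by n_pos * n_neg. Runs in O(n_pos * n_neg), fine for small test sets.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

With hypothetical predicted pCR probabilities, `roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])` returns 8 / 9, about 0.889: the positive scored 0.4 loses only to the negative scored 0.7.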