
Performance Comparison of Cutting-Edge Large Language Models on the ACR In-Training Examination: An Update for 2025.

Young A, Paloka R, Islam A, Prasanna P, Hill V, Payne D

pubmed logopapers Sep 24 2025
This study continues prior work by Payne et al. evaluating large language model (LLM) performance on radiology board-style assessments, specifically the ACR diagnostic radiology in-training examination (DXIT). Building on earlier findings with GPT-4, we assess newer, cutting-edge models such as GPT-4o, GPT-o1, GPT-o3, Claude, Gemini, and Grok on standardized DXIT questions. In addition to overall performance, we compare model accuracy on text-based versus image-based questions to assess multi-modal reasoning capabilities. As a secondary aim, we investigate the potential impact of data contamination by comparing model performance on original versus revised image-based questions. Seven LLMs (GPT-4, GPT-4o, GPT-o1, GPT-o3, Claude 3.5 Sonnet, Gemini 1.5 Pro, and Grok 2.0) were evaluated using 106 publicly available DXIT questions. Each model was prompted with a standardized instruction set to simulate a radiology resident answering board-style questions. For each question, the model's selected answer, rationale, and confidence score were recorded. Unadjusted accuracy (based on correct answer selection) and logic-adjusted accuracy (based on clinical reasoning pathways) were calculated. Subgroup analysis compared model performance on text-based versus image-based questions. Additionally, 63 image-based questions were revised to test novel reasoning while preserving the original diagnostic image, to assess the impact of potential training data contamination. Across the 106 DXIT questions, GPT-o1 demonstrated the highest unadjusted accuracy (71.7%), followed closely by GPT-4o (69.8%) and GPT-o3 (68.9%). GPT-4 and Grok 2.0 scored lower (59.4% and 52.8%, respectively), and Claude 3.5 Sonnet had the lowest unadjusted accuracy (34.9%). Similar trends were observed for logic-adjusted accuracy, with GPT-o1 (60.4%), GPT-4o (59.4%), and GPT-o3 (59.4%) again outperforming the other models, while Grok 2.0 and Claude 3.5 Sonnet lagged behind (34.0% and 30.2%, respectively). GPT-4o performed significantly better on text-based questions than on image-based ones. Unadjusted accuracy on the revised DXIT questions was 49.2%, compared to 56.1% on the matched original questions; logic-adjusted accuracy was 40.0% versus 44.4%. No significant difference in performance was observed between original and revised questions. Modern LLMs, especially those from OpenAI, demonstrate strong and improved performance on board-style radiology assessments. Comparable performance on revised prompts suggests that data contamination played a limited role. As LLMs improve, they hold strong potential to support radiology resident learning through personalized feedback and board-style question review.
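The two accuracy measures described above can be sketched in a few lines: unadjusted accuracy scores only the selected answer, while logic-adjusted accuracy additionally requires the recorded rationale to be judged clinically sound. The field names and responses below are illustrative, not the study's actual data.

```python
# Hypothetical sketch of unadjusted vs. logic-adjusted accuracy scoring.
# "rationale_sound" stands in for the human judgment of the model's
# clinical reasoning pathway described in the abstract.

def score_responses(responses):
    """Return (unadjusted, logic_adjusted) accuracy over model responses."""
    n = len(responses)
    unadjusted = sum(r["selected"] == r["correct"] for r in responses) / n
    logic_adjusted = sum(
        r["selected"] == r["correct"] and r["rationale_sound"] for r in responses
    ) / n
    return unadjusted, logic_adjusted

responses = [
    {"selected": "A", "correct": "A", "rationale_sound": True},
    {"selected": "B", "correct": "B", "rationale_sound": False},
    {"selected": "C", "correct": "D", "rationale_sound": False},
    {"selected": "A", "correct": "A", "rationale_sound": True},
]
ua, la = score_responses(responses)
print(ua, la)  # 0.75 0.5
```

By construction, logic-adjusted accuracy can never exceed unadjusted accuracy, which matches the pattern in the reported results.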

Pseudo PET synthesis from CT based on deep neural networks.

Wang H, Zou W, Wang J, Li J, Zhang B

pubmed logopapers Sep 24 2025
Objective. Integrated PET/CT imaging plays a vital role in tumor diagnosis by offering both anatomical and functional information. However, the high cost and limited accessibility of PET imaging, together with concerns about cumulative radiation exposure from repeated scans, may restrict its clinical use. This study aims to develop a cross-modal medical image synthesis method for generating PET images from CT scans, with a particular focus on accurately synthesizing lesion regions. Approach. We propose a two-stage Generative Adversarial Network termed MMF-PAE-GAN (Multi-modal Fusion Pre-trained AutoEncoder GAN) that integrates a pre-GAN and a post-GAN through a Pre-trained AutoEncoder (PAE). The pre-GAN produces an initial pseudo PET image and provides the post-GAN with PET-related multi-scale features. Unlike a traditional Sample Adaptive Encoder (SAE), the PAE enhances sample-specific representation by extracting multi-scale contextual features. To capture both lesion-related and non-lesion-related anatomical information, two CT scans processed under different window settings are fed into the post-GAN. Furthermore, a Multi-modal Weighted Feature Fusion Module (MMWFFM) is introduced to dynamically highlight informative cross-modal features while suppressing redundancies. A Perceptual Loss (PL), computed with the PAE, is also used to impose feature-space constraints and improve the fidelity of lesion synthesis. Main results. On the AutoPET dataset, our method achieved a PSNR of 29.1781 dB, an MAE of 0.0094, an SSIM of 0.9217, and an NMSE of 0.3651 for pixel-level metrics, along with a sensitivity of 85.31%, a specificity of 97.02%, and an accuracy of 95.97% for slice-level classification. On the FAHSU dataset, the corresponding values were a PSNR of 29.1506 dB, an MAE of 0.0095, an SSIM of 0.9193, an NMSE of 0.3663, a sensitivity of 84.51%, a specificity of 96.82%, and an accuracy of 95.71%. Significance. The proposed MMF-PAE-GAN can generate high-quality PET images directly from CT scans without the need for radioactive tracers, potentially improving access to functional imaging and reducing costs in clinical scenarios where PET acquisition is limited or repeated scans are not feasible.
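The pixel-level metrics reported above (PSNR, MAE, NMSE) have standard definitions that can be sketched directly; the paper's exact computation details (e.g., data range, per-slice vs. per-volume averaging) may differ, so this is an illustration, not a reproduction of their evaluation code.

```python
import numpy as np

# Standard definitions of the pixel-level similarity metrics named in the
# abstract, for intensity images assumed to be scaled to [0, 1].

def psnr(ref, pred, data_range=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref - pred) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def mae(ref, pred):
    """Mean absolute error."""
    return np.mean(np.abs(ref - pred))

def nmse(ref, pred):
    """Normalized mean squared error (squared error over reference energy)."""
    return np.sum((ref - pred) ** 2) / np.sum(ref ** 2)

ref = np.array([0.0, 0.5, 1.0])
pred = ref + 0.1  # a uniform 0.1 offset as a toy "prediction"
# psnr(ref, pred) is approximately 20 dB, mae approximately 0.1
```

SSIM is deliberately omitted here, as it requires windowed local statistics; libraries such as scikit-image provide a reference implementation.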

Ethical Considerations in Patient Privacy and Data Handling for AI in Cardiovascular Imaging and Radiology.

Mehrtabar S, Marey A, Desai A, Saad AM, Desai V, Goñi J, Pal B, Umair M

pubmed logopapers Sep 24 2025
The integration of artificial intelligence (AI) into cardiovascular imaging and radiology offers the potential to enhance diagnostic accuracy, streamline workflows, and personalize patient care. However, the rapid adoption of AI has introduced complex ethical challenges, particularly concerning patient privacy, data handling, informed consent, and data ownership. This narrative review explores these issues by synthesizing literature from clinical, technical, and regulatory perspectives. We examine the tensions between data utility and data protection, the evolving role of transparency and explainable AI, and the disparities in ethical and legal frameworks across jurisdictions such as the European Union, the USA, and emerging players like China. We also highlight the vulnerabilities introduced by cloud computing, adversarial attacks, and the use of commercial datasets. Ethical frameworks and regulatory guidelines are compared, and proposed mitigation strategies such as federated learning, blockchain, and differential privacy are discussed. To ensure ethical implementation, we emphasize the need for shared accountability among clinicians, developers, healthcare institutions, and policymakers. Ultimately, the responsible development of AI in medical imaging must prioritize patient trust, fairness, and equity, underpinned by robust governance and transparent data stewardship.

In-context learning enables large language models to achieve human-level performance in spinal instability neoplastic score classification from synthetic CT and MRI reports.

Russe MF, Reisert M, Fink A, Hohenhaus M, Nakagawa JM, Wilpert C, Simon CP, Kotter E, Urbach H, Rau A

pubmed logopapers Sep 24 2025
To assess the performance of state-of-the-art large language models in classifying vertebral metastasis stability using the Spinal Instability Neoplastic Score (SINS) compared to human experts, and to evaluate the impact of task-specific refinement including in-context learning on their performance. This retrospective study analyzed 100 synthetic CT and MRI reports encompassing a broad range of SINS scores. Four human experts (two radiologists and two neurosurgeons) and four large language models (Mistral, Claude, GPT-4 turbo, and GPT-4o) evaluated the reports. Large language models were tested in both generic form and with task-specific refinement. Performance was assessed based on correct SINS category assignment and attributed SINS points. Human experts demonstrated high median performance in SINS classification (98.5% correct) and points calculation (92% correct), with a median point offset of 0 [0-0]. Generic large language models performed poorly with 26-63% correct category and 4-15% correct SINS points allocation. In-context learning significantly improved chatbot performance to near-human levels (96-98/100 correct for classification, 86-95/100 for scoring, no significant difference to human experts). Refined large language models performed 71-85% better in SINS points allocation. In-context learning enables state-of-the-art large language models to perform at near-human expert levels in SINS classification, offering potential for automating vertebral metastasis stability assessment. The poor performance of generic large language models highlights the importance of task-specific refinement in medical applications of artificial intelligence.
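The task-specific refinement described above centers on in-context learning: prepending worked examples to the request so the model sees how SINS scoring is done before it sees the new report. The sketch below illustrates that prompt construction; the wording, example reports, and point values are hypothetical, not the study's actual instruction set.

```python
# Illustrative in-context-learning prompt builder for SINS classification.
# The few-shot examples and phrasing are invented for demonstration.

SINS_EXAMPLES = [
    ("L3 lytic lesion, pain on axial loading, vertebral body collapse.",
     13, "unstable"),
    ("T7 blastic lesion, no pain, alignment and posterior elements preserved.",
     4, "stable"),
]

def build_icl_prompt(report):
    lines = ["Score the following report using the Spinal Instability "
             "Neoplastic Score (SINS) and assign a stability category."]
    for text, points, category in SINS_EXAMPLES:
        lines.append(f"Report: {text}\nSINS points: {points}\nCategory: {category}")
    lines.append(f"Report: {report}\nSINS points:")
    return "\n\n".join(lines)

prompt = build_icl_prompt("L1 mixed lesion with mechanical pain.")
```

The generic condition in the study corresponds to sending only the instruction and the new report, i.e., an empty example list.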

Role of artificial intelligence in screening and medical imaging of precancerous gastric diseases.

Kotelevets SM

pubmed logopapers Sep 24 2025
Serological screening, endoscopic imaging, and morphological (histological) verification of precancerous gastric diseases and changes in the gastric mucosa are the main stages of early detection, accurate diagnosis, and preventive treatment of gastric precancer. Serological, endoscopic, and histological diagnostics are carried out by medical laboratory technicians, endoscopists, and histologists, and these human-dependent steps introduce substantial subjectivity. Endoscopists and histologists rely on descriptive principles when formulating imaging conclusions, so diagnostic reports often contain contradictory and mutually exclusive findings. Such errors can have fatal consequences, including late diagnosis of gastric cancer and high patient mortality. Effective population-level serological screening is only possible with machine processing of laboratory test results. It is now feasible to replace a diagnostician's subjective, imprecise description of endoscopic and histological images with objective, highly sensitive and highly specific visual recognition using convolutional neural networks with deep learning. Many machine learning models are available, and all have predictive capabilities; these predictive models should be used to stratify patients by their risk of gastric cancer.

HiPerformer: A High-Performance Global-Local Segmentation Model with Modular Hierarchical Fusion Strategy

Dayu Tan, Zhenpeng Xu, Yansen Su, Xin Peng, Chunhou Zheng, Weimin Zhong

arxiv logopreprint Sep 24 2025
Both local details and global context are crucial in medical image segmentation, and effectively integrating them is essential for achieving high accuracy. However, existing mainstream methods based on CNN-Transformer hybrid architectures typically employ simple feature fusion techniques such as serial stacking, endpoint concatenation, or pointwise addition, which struggle to address inconsistencies between features and are prone to information conflict and loss. To address these challenges, we propose HiPerformer. Its encoder employs a novel modular hierarchical architecture that dynamically fuses multi-source features in parallel, enabling layer-wise deep integration of heterogeneous information. This modular hierarchical design not only retains the independent modeling capability of each branch in the encoder, but also ensures sufficient information transfer between layers, effectively avoiding the feature degradation and information loss that accompany traditional stacking methods. Furthermore, we design a Local-Global Feature Fusion (LGFF) module to achieve precise and efficient integration of local details and global semantic information, effectively alleviating the feature inconsistency problem and yielding a more comprehensive feature representation. To further enhance multi-scale feature representation and suppress noise interference, we also propose a Progressive Pyramid Aggregation (PPA) module to replace traditional skip connections. Experiments on eleven public datasets show that the proposed method outperforms existing segmentation techniques in both accuracy and robustness. The code is available at https://github.com/xzphappy/HiPerformer.
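The contrast the abstract draws between pointwise addition and weighted fusion can be illustrated with a minimal numpy sketch: per-position scores are passed through a softmax so the fused map is a convex combination of the local-detail and global-context features. This is a conceptual illustration only, not the paper's LGFF module (which operates on learned multi-channel features inside the network).

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse(local_feat, global_feat, local_score, global_score):
    """Convex, per-position combination of two feature maps.

    Unlike plain pointwise addition, the softmax weights let one branch
    dominate wherever its score is higher, suppressing conflicting signals.
    """
    w = softmax(np.stack([local_score, global_score]), axis=0)
    return w[0] * local_feat + w[1] * global_feat

rng = np.random.default_rng(0)
local = rng.normal(size=(8, 8))     # stand-in for CNN (local detail) features
global_ = rng.normal(size=(8, 8))   # stand-in for Transformer (context) features
fused = fuse(local, global_, np.zeros((8, 8)), np.zeros((8, 8)))
# with equal scores everywhere, fusion reduces to the mean of the two maps
```

In a real network the scores would themselves be predicted from the features, so the weighting adapts to content rather than being fixed.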

Anomaly Detection by Clustering DINO Embeddings using a Dirichlet Process Mixture

Nico Schulthess, Ender Konukoglu

arxiv logopreprint Sep 24 2025
In this work, we leverage informative embeddings from foundation models for unsupervised anomaly detection in medical imaging. For small datasets, a memory bank of normative features can be used directly for anomaly detection, as demonstrated recently. However, this is unsuitable for large medical datasets, as the computational burden increases substantially. We therefore propose to model the distribution of normative DINOv2 embeddings with a Dirichlet Process Mixture Model (DPMM), a non-parametric mixture model that automatically adjusts the number of mixture components to the data at hand. Rather than using a memory bank, we use the similarity between the component centers and the embeddings as an anomaly score function to create a coarse anomaly segmentation mask. Our experiments show that, through the DPMM, DINOv2 embeddings achieve very competitive anomaly detection performance on medical imaging benchmarks, despite the backbone being trained on natural images, while at least halving computation time at inference. Our analysis further indicates that normalized DINOv2 embeddings are generally better aligned with anatomical structures than unnormalized features, even in the presence of anomalies, making them strong representations for anomaly detection. The code is available at https://github.com/NicoSchulthess/anomalydino-dpmm.
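The scoring step described above (similarity between embeddings and component centers, instead of a full memory bank) can be sketched with cosine similarity. The mixture fitting itself is omitted, and the centers below are illustrative placeholders, so this is a sketch of the scoring idea rather than the authors' implementation.

```python
import numpy as np

def normalize(x):
    """L2-normalize vectors along the last axis (cf. normalized embeddings)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def anomaly_score(embeddings, centers):
    """Score each embedding by its best cosine similarity to any center.

    A score of 0 means the embedding matches a normative mode exactly;
    higher scores indicate embeddings far from all normative components.
    """
    sims = normalize(embeddings) @ normalize(centers).T
    return 1.0 - sims.max(axis=1)

centers = np.eye(3)  # three orthogonal "normative" directions (toy example)
emb = np.array([[1.0, 0.0, 0.0],   # lies exactly on a normative mode
                [1.0, 1.0, 1.0]])  # equidistant from all modes
scores = anomaly_score(emb, centers)
```

Only the component centers need to be kept at inference, which is why this scales better than comparing against every stored normative feature in a memory bank.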

Radiomics-based artificial intelligence (AI) models in colorectal cancer (CRC) diagnosis, metastasis detection, prognosis, and treatment response prediction.

Elahi R, Karami P, Amjadzadeh M, Nazari M

pubmed logopapers Sep 24 2025
Colorectal cancer (CRC) is the third most common cause of cancer-related morbidity and mortality worldwide. Radiomics and radiogenomics enable high-throughput quantification of features from medical images, providing non-invasive means to characterize cancer heterogeneity and gain insight into the underlying biology. Such radiomics-based artificial intelligence (AI) methods have demonstrated great potential to improve the accuracy of CRC diagnosis and staging, distinguish between benign and malignant lesions, aid in the detection of lymph node and hepatic metastasis, and predict treatment response and prognosis. This review presents the latest evidence on clinical applications of radiomics models based on different imaging modalities in CRC. We also discuss the challenges facing clinical translation, including differences in image acquisition, reproducibility issues, a lack of standardization, and limited external validation. Given the progress of machine learning (ML) and deep learning (DL) algorithms, radiomics is expected to have an important effect on personalized CRC treatment and contribute to more accurate and individualized clinical decision-making in the future.

From texture analysis to artificial intelligence: global research landscape and evolutionary trajectory of radiomics in hepatocellular carcinoma.

Teng X, Luo QN, Chen YD, Peng T

pubmed logopapers Sep 24 2025
Hepatocellular carcinoma (HCC) poses a substantial global health burden with high morbidity and mortality rates. Radiomics, which extracts quantitative features from medical images to develop predictive models, has emerged as a promising non-invasive approach for HCC diagnosis and management. However, comprehensive analysis of research trends in this field remains limited. We conducted a systematic bibliometric analysis of radiomics applications in HCC using literature from the Web of Science Core Collection (January 2006-April 2025). Publications were analyzed using CiteSpace, VOSviewer, R, and Python scripts to evaluate publication patterns, citation metrics, institutional contributions, keyword evolution, and collaboration networks. Among 906 included publications, we observed exponential growth, particularly accelerating after 2019. A global landscape analysis revealed China as the leader in publication volume, while the USA acted as the primary international collaboration hub. Countries like South Korea and the UK demonstrated higher average citation impact. Sun Yat-sen University was the most productive institution. Research themes evolved from fundamental texture analysis and CT/MRI applications toward predicting microvascular invasion, assessing treatment response (especially TACE), and prognostic modeling, driven recently by the deep integration of artificial intelligence (AI) and deep learning. Co-citation analysis revealed core knowledge clusters spanning radiomics methodology, clinical management, and landmark applications, demonstrating the field's interdisciplinary nature. Radiomics in HCC represents a rapidly expanding, AI-driven field characterized by extensive multidisciplinary collaboration. Future priorities should emphasize standardization, large-scale multicenter validation, enhanced international cooperation, and clinical translation to maximize radiomics' potential in precision HCC oncology.

Exploring the role of preprocessing combinations in hyperspectral imaging for deep learning colorectal cancer detection.

Tkachenko M, Huber B, Hamotskyi S, Jansen-Winkeln B, Gockel I, Neumuth T, Köhler H, Maktabi M

pubmed logopapers Sep 23 2025
This study compares preprocessing techniques for hyperspectral, deep learning-based cancer diagnostics. We consider different spectrum scaling and noise reduction options across the spatial and spectral axes of hyperspectral datacubes, as well as varying levels of blood and light-reflection removal. We also examine how the size of the patches extracted from the hyperspectral data affects model performance, and explore strategies to mitigate our dataset's class imbalance (cancerous tissues are underrepresented). Our results indicate that: standardization significantly improves both sensitivity and specificity compared to normalization; larger input patch sizes enhance performance by capturing more spatial context; noise reduction unexpectedly degrades performance; and blood filtering is more effective than filtering reflected-light pixels, although neither approach yields significant gains. By carefully maintaining consistent testing conditions, we ensure a fair comparison across preprocessing methods and reproducibility. Our findings highlight the necessity of careful preprocessing selection to maximize deep learning performance in medical imaging applications.
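The two scaling options compared above have simple per-spectrum definitions: min-max normalization rescales each spectrum to [0, 1], while standardization (the option that improved sensitivity and specificity in the study) centers it to zero mean and unit variance. A minimal sketch, applied along the spectral axis of a single pixel's spectrum:

```python
import numpy as np

def normalize_spectrum(s):
    """Min-max normalization: rescale a spectrum to the [0, 1] range."""
    return (s - s.min()) / (s.max() - s.min())

def standardize_spectrum(s):
    """Standardization: zero mean, unit variance per spectrum."""
    return (s - s.mean()) / s.std()

spectrum = np.array([0.2, 0.4, 0.6, 0.8])  # toy 4-band spectrum
norm = normalize_spectrum(spectrum)
std = standardize_spectrum(spectrum)
```

For a full datacube of shape (rows, cols, bands), the same operations would be applied band-wise per pixel, e.g. with `axis=-1` reductions and `keepdims=True`.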
