
Evaluating the reference accuracy of large language models in radiology: a comparative study across subspecialties.

Güneş YC, Cesur T, Çamur E

PubMed · May 12, 2025
This study aimed to compare six large language models (LLMs) [Chat Generative Pre-trained Transformer (ChatGPT) o1-preview, ChatGPT-4o, ChatGPT-4o with canvas, Google Gemini 1.5 Pro, Claude 3.5 Sonnet, and Claude 3 Opus] in generating radiology references, assessing accuracy, fabrication, and bibliographic completeness. In this cross-sectional observational study, 120 open-ended questions were administered across eight radiology subspecialties (neuroradiology, abdominal, musculoskeletal, thoracic, pediatric, cardiac, head and neck, and interventional radiology), with 15 questions per subspecialty. Each question prompted the LLMs to provide responses containing four references with in-text citations and complete bibliographic details (authors, title, journal, publication year/month, volume, issue, page numbers, and PubMed Identifier). References were verified using MEDLINE, Google Scholar, the Directory of Open Access Journals, and web searches. Each bibliographic element was scored for correctness, and a composite final score [(FS): 0-36] was calculated by summing the correct elements and multiplying the sum by a 5-point verification score for content relevance. The FS values were then categorized into a 5-point Likert scale reference accuracy score (RAS: 0 = fabricated; 4 = fully accurate). Non-parametric tests (Kruskal-Wallis, Tamhane's T2, Wilcoxon signed-rank test with Bonferroni correction) were used for statistical comparisons. Claude 3.5 Sonnet demonstrated the highest reference accuracy, with 80.8% fully accurate references (RAS 4) and a fabrication rate of 3.1%, significantly outperforming all other models (P < 0.001). Claude 3 Opus ranked second, achieving 59.6% fully accurate references and a fabrication rate of 18.3% (P < 0.001). ChatGPT-based models (ChatGPT-4o, ChatGPT-4o with canvas, and ChatGPT o1-preview) exhibited moderate accuracy, with fabrication rates ranging from 27.7% to 52.9% and under 8% fully accurate references. Google Gemini 1.5 Pro had the lowest performance, achieving only 2.7% fully accurate references and the highest fabrication rate of 60.6% (P < 0.001). Reference accuracy also varied by subspecialty, with neuroradiology and cardiac radiology outperforming pediatric and head and neck radiology. Claude 3.5 Sonnet significantly outperformed all other models in generating verifiable radiology references, and Claude 3 Opus showed moderate performance. In contrast, ChatGPT models and Google Gemini 1.5 Pro delivered substantially lower accuracy with higher rates of fabricated references, highlighting current limitations in automated academic citation generation. The high accuracy of Claude 3.5 Sonnet can improve radiology literature reviews, research, and education with dependable references. The poor performance of other models, with high fabrication rates, risks misinformation in clinical and academic settings and highlights the need for refinement to ensure safe and effective use.
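
As a concrete illustration of the composite scoring, the sketch below computes one reference's final score as (number of correct bibliographic elements) × (content-relevance score). The nine-element breakdown and the 0-4 relevance scale are assumptions chosen only to match the stated 0-36 range; they are not the authors' exact rubric.

```python
# Minimal sketch of the reference-scoring scheme described in the abstract.
# The element list and the 0-4 relevance scale are assumptions for illustration.

BIB_ELEMENTS = ["authors", "title", "journal", "year", "month",
                "volume", "issue", "pages", "pmid"]  # 9 assumed elements -> FS max 36

def final_score(correct: dict, relevance: int) -> int:
    """FS = (number of correct bibliographic elements) x relevance score (0-4)."""
    if not 0 <= relevance <= 4:
        raise ValueError("relevance score must be between 0 and 4")
    n_correct = sum(bool(correct.get(e, False)) for e in BIB_ELEMENTS)
    return n_correct * relevance

# Example: 8 of 9 elements correct, fully relevant content (relevance = 4)
example = {e: True for e in BIB_ELEMENTS if e != "pmid"}
print(final_score(example, relevance=4))  # 32; FS is then binned into the 0-4 RAS categories
```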

Preoperative prediction of malignant transformation in sinonasal inverted papilloma: a novel MRI-based deep learning approach.

Ding C, Wen B, Han Q, Hu N, Kang Y, Wang Y, Wang C, Zhang L, Xian J

PubMed · May 12, 2025
To develop a novel MRI-based deep learning (DL) diagnostic model, utilizing multicenter large-sample data, for the preoperative differentiation of sinonasal inverted papilloma (SIP) from SIP-transformed squamous cell carcinoma (SIP-SCC). This study included 568 patients from four centers with confirmed SIP (n = 421) and SIP-SCC (n = 147). Deep learning models were built using T1WI, T2WI, and CE-T1WI. A combined model was constructed by integrating these features through an attention mechanism. The diagnostic performance of radiologists, both with and without the model's assistance, was compared. Model performance was evaluated through receiver operating characteristic (ROC) analysis, calibration curves, and decision curve analysis (DCA). The combined model demonstrated superior performance in differentiating SIP from SIP-SCC, achieving AUCs of 0.954, 0.897, and 0.859 in the training, internal validation, and external validation cohorts, respectively. It showed optimal accuracy, stability, and clinical benefit, as confirmed by Brier scores and calibration curves. The diagnostic performance of radiologists, especially for less experienced ones, was significantly improved with model assistance. The MRI-based deep learning model enhances the capability to predict malignant transformation of sinonasal inverted papilloma before surgery. By facilitating earlier diagnosis and promoting timely pathological examination or surgical intervention, this approach holds the potential to enhance patient prognosis. Questions: Sinonasal inverted papilloma (SIP) is prone to malignant transformation locally, leading to poor prognosis; current diagnostic methods are invasive and inaccurate, necessitating effective preoperative differentiation. Findings: The MRI-based deep learning model accurately diagnoses malignant transformations of SIP, enabling junior radiologists to achieve greater clinical benefits with the assistance of the model. Clinical relevance: A novel MRI-based deep learning model enhances the capability of preoperative diagnosis of malignant transformation in sinonasal inverted papilloma, providing a non-invasive tool for personalized treatment planning.
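
The attention-based fusion of the three sequence branches (T1WI, T2WI, CE-T1WI) might look something like the sketch below. The feature dimension, the softmax-weighted sum, and the single-logit classifier head are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Illustrative fusion of T1WI/T2WI/CE-T1WI embeddings via learned attention.

    Feature size (128) and the softmax-weighted sum are assumptions; the
    paper only describes the combined model as attention-based integration.
    """
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)       # per-sequence attention score
        self.classifier = nn.Linear(feat_dim, 1)  # SIP vs. SIP-SCC logit

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_sequences, feat_dim), one embedding per MRI sequence
        weights = torch.softmax(self.score(feats), dim=1)  # (batch, n_sequences, 1)
        fused = (weights * feats).sum(dim=1)               # attention-weighted sum
        return self.classifier(fused).squeeze(-1)

# Example with random per-sequence embeddings for a batch of 4 studies
model = AttentionFusion()
logits = model(torch.randn(4, 3, 128))
print(logits.shape)  # torch.Size([4])
```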

Deep Learning for Detecting Periapical Bone Rarefaction in Panoramic Radiographs: A Systematic Review and Critical Assessment.

da Silva-Filho JE, da Silva Sousa Z, de-Araújo APC, Fornagero LDS, Machado MP, de Aguiar AWO, Silva CM, de Albuquerque DF, Gurgel-Filho ED

PubMed · May 12, 2025
To evaluate deep learning (DL)-based models for detecting periapical bone rarefaction (PBRs) in panoramic radiographs (PRs), analyzing their feasibility and performance in dental practice. A search was conducted across seven databases and partial grey literature up to November 15, 2024, using Medical Subject Headings and entry terms related to DL, PBRs, and PRs. Studies assessing DL-based models for detecting and classifying PBRs in conventional PRs were included, while those using non-PR imaging or focusing solely on non-PBR lesions were excluded. Two independent reviewers performed screening, data extraction, and quality assessment using the Quality Assessment of Diagnostic Accuracy Studies-2 tool, with conflicts resolved by a third reviewer. Twelve studies met the inclusion criteria, mostly from Asia (58.3%). The risk of bias was moderate in 10 studies (83.3%) and high in 2 (16.7%). DL models showed moderate to high performance in PBR detection (sensitivity: 26-100%; specificity: 51-100%), with U-Net and YOLO being the most used algorithms. Only one study (8.3%) distinguished periapical granulomas from periapical cysts, revealing a classification gap. Key challenges included limited generalization due to small datasets, anatomical superimpositions in PRs, and variability in reported metrics, compromising comparison across models. This review underscores that DL-based models have the potential to become a valuable tool in dental image diagnostics, but they cannot yet be considered a definitive practice. Multicenter collaboration is needed to diversify data and democratize these tools. Standardized performance reporting is critical for fair comparability between different models.

Biological markers and psychosocial factors predict chronic pain conditions.

Fillingim M, Tanguay-Sabourin C, Parisien M, Zare A, Guglietti GV, Norman J, Petre B, Bortsov A, Ware M, Perez J, Roy M, Diatchenko L, Vachon-Presseau E

PubMed · May 12, 2025
Chronic pain is a multifactorial condition presenting significant diagnostic and prognostic challenges. Biomarkers for the classification and the prediction of chronic pain are therefore critically needed. Here, in this multidataset study of over 523,000 participants, we applied machine learning to multidimensional biological data from the UK Biobank to identify biomarkers for 35 medical conditions associated with pain (for example, rheumatoid arthritis and gout) or self-reported chronic pain (for example, back pain and knee pain). Biomarkers derived from blood immunoassays, brain and bone imaging, and genetics were effective in predicting medical conditions associated with chronic pain (area under the curve (AUC) 0.62-0.87) but not self-reported pain (AUC 0.50-0.62). Notably, all biomarkers worked in synergy with psychosocial factors, accurately predicting both medical conditions (AUC 0.69-0.91) and self-reported pain (AUC 0.71-0.92). These findings underscore the necessity of adopting a holistic approach in the development of biomarkers to enhance their clinical utility.
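
To illustrate the kind of comparison reported above (biomarkers alone versus biomarkers combined with psychosocial factors), a minimal sketch using scikit-learn and synthetic data could look like the following. The feature blocks, model choice, and data are placeholders, not the study's pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: "biomarker" and "psychosocial" feature blocks, binary outcome
n = 2000
biomarkers = rng.normal(size=(n, 20))    # e.g. blood assays, imaging-derived measures
psychosocial = rng.normal(size=(n, 10))  # e.g. mood, sleep, life-stress questionnaires
y = (0.4 * biomarkers[:, 0] + 0.8 * psychosocial[:, 0] + rng.normal(size=n) > 0).astype(int)

def auc_for(features: np.ndarray) -> float:
    """Train a simple classifier and report held-out AUC for one feature set."""
    X_tr, X_te, y_tr, y_te = train_test_split(features, y, random_state=0)
    clf = GradientBoostingClassifier().fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

print("biomarkers only   :", round(auc_for(biomarkers), 3))
print("bio + psychosocial:", round(auc_for(np.hstack([biomarkers, psychosocial])), 3))
```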

Learning-based multi-material CBCT image reconstruction with ultra-slow kV switching.

Ma C, Zhu J, Zhang X, Cui H, Tan Y, Guo J, Zheng H, Liang D, Su T, Sun Y, Ge Y

PubMed · May 11, 2025
Objective: The purpose of this study is to perform multiple (≥3) material decomposition with a deep learning method for spectral cone-beam CT (CBCT) imaging based on ultra-slow kV switching. Approach: In this work, a novel deep neural network called SkV-Net is developed to reconstruct multiple material density images from the ultra-sparse spectral CBCT projections acquired using the ultra-slow kV switching technique. In particular, SkV-Net has a U-Net backbone, and a multi-head axial attention module is adopted to enlarge the perceptual field. It takes the CT images reconstructed from each kV as input and outputs the basis material images automatically based on their energy-dependent attenuation characteristics. Numerical simulations and experimental studies are carried out to evaluate the performance of this new approach. Main results: It is demonstrated that SkV-Net is able to generate four different material density images, i.e., fat, muscle, bone, and iodine, from five spans of kV-switched spectral projections. Physical experiments show that the decomposition errors of iodine and CaCl2 are less than 6%, indicating the high precision of this novel approach in distinguishing materials. Significance: SkV-Net provides a promising multi-material decomposition approach for spectral CBCT imaging systems implemented with the ultra-slow kV switching scheme.
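
The axial attention idea referenced above applies self-attention separately along the height and width axes of a feature map, which enlarges the receptive field far more cheaply than full 2D attention. The sketch below is a generic rendering of that idea; the channel count, head count, and residual wiring are assumptions, not SkV-Net's exact design.

```python
import torch
import torch.nn as nn

class AxialAttention2d(nn.Module):
    """Illustrative multi-head axial attention over a 2D feature map."""
    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        self.attn_h = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.attn_w = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Attend along the height axis: one sequence per image column
        seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        out, _ = self.attn_h(seq, seq, seq)
        x = x + out.reshape(b, w, h, c).permute(0, 3, 2, 1)
        # Attend along the width axis: one sequence per image row
        seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.attn_w(seq, seq, seq)
        return x + out.reshape(b, h, w, c).permute(0, 3, 1, 2)

# Example: features from five kV channels projected to 64 channels, 128x128 slices
feats = torch.randn(2, 64, 128, 128)
print(AxialAttention2d()(feats).shape)  # torch.Size([2, 64, 128, 128])
```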

The March to Harmonized Imaging Standards for Retinal Imaging.

Gim N, Ferguson AN, Blazes M, Lee CS, Lee AY

PubMed · May 11, 2025
The adoption of standardized imaging protocols in retinal imaging is critical to overcoming challenges posed by fragmented data formats across devices and manufacturers. The lack of standardization hinders clinical interoperability, collaborative research, and the development of artificial intelligence (AI) models that depend on large, high-quality datasets. The Digital Imaging and Communications in Medicine (DICOM) standard offers a robust solution for ensuring interoperability in medical imaging. Although DICOM is widely utilized in radiology and cardiology, its adoption in ophthalmology remains limited. Retinal imaging modalities such as optical coherence tomography (OCT), fundus photography, and OCT angiography (OCTA) have revolutionized retinal disease management but are constrained by proprietary and non-standardized formats. This review underscores the necessity for harmonized imaging standards in ophthalmology, detailing DICOM standards for retinal imaging including ophthalmic photography (OP), OCT, and OCTA, and their requisite metadata information. Additionally, the potential of DICOM standardization for advancing AI applications in ophthalmology is explored. A notable example is the Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights (AI-READI) dataset, the first publicly available standards-compliant DICOM retinal imaging dataset. This dataset encompasses diverse retinal imaging modalities, including color fundus photography, infrared, autofluorescence, OCT, and OCTA. By leveraging multimodal retinal imaging, AI-READI provides a transformative resource for studying diabetes and its complications, setting a blueprint for future datasets aimed at harmonizing imaging formats and enabling AI-driven breakthroughs in ophthalmology. Our manuscript also addresses challenges in retinal imaging for diabetic patients, retinal imaging-based AI applications for studying diabetes, and potential advancements in retinal imaging standardization.
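
For readers unfamiliar with how DICOM metadata looks in practice, a minimal sketch using the pydicom library is shown below. The file path is a placeholder, and the printed tags are just a few of the standard attributes a standards-compliant retinal DICOM file (such as one from AI-READI) would carry.

```python
import pydicom

# Placeholder path to a standards-compliant retinal DICOM file
ds = pydicom.dcmread("example_retina.dcm")

# A few standard attributes that make harmonized datasets interoperable
print("Modality       :", ds.Modality)   # e.g. "OP" (ophthalmic photography) or "OPT" (OCT)
print("Manufacturer   :", ds.get("Manufacturer", "n/a"))
print("SOP Class UID  :", ds.SOPClassUID)
print("Rows x Columns :", ds.Rows, "x", ds.Columns)
```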

Intra- and Peritumoral Radiomics Based on Ultrasound Images for Preoperative Differentiation of Follicular Thyroid Adenoma, Carcinoma, and Follicular Tumor With Uncertain Malignant Potential.

Fu Y, Mei F, Shi L, Ma Y, Liang H, Huang L, Fu R, Cui L

PubMed · May 10, 2025
Differentiating between follicular thyroid adenoma (FTA), carcinoma (FTC), and follicular tumor with uncertain malignant potential (FT-UMP) remains challenging due to their overlapping ultrasound characteristics. This retrospective study aimed to enhance preoperative diagnostic accuracy by utilizing intra- and peritumoral radiomics based on ultrasound images. We collected post-thyroidectomy ultrasound images from 774 patients diagnosed with FTA (n = 429), FTC (n = 158), or FT-UMP (n = 187) between January 2018 and December 2023. Six peritumoral regions were expanded by 5%-30% in 5% increments, with the Segment Anything Model using prompt learning to detect the field of view and constrain the expanded boundaries. A stepwise classification strategy addressing three tasks was implemented: distinguishing FTA from the other types (task 1), differentiating FTC from FT-UMP (task 2), and classifying all three tumors. Diagnostic models were developed by combining radiomic features from tumor and peritumoral regions with clinical characteristics. Clinical characteristics combined with intratumoral and 5% peritumoral radiomic features performed best across all tasks (test set: areas under the curve, 0.93 for task 1 and 0.90 for task 2; diagnostic accuracy, 79.9%). The DeLong test indicated that each set of peritumoral radiomic features significantly improved on the performance of intratumoral radiomics and clinical characteristics alone (p < 0.04). The 5% peritumoral regions showed the best performance, though not all results were significant (p = 0.01-0.91). Ultrasound-based intratumoral and peritumoral radiomics can significantly enhance preoperative diagnostic accuracy for FTA, FTC, and FT-UMP, leading to improved treatment strategies and patient outcomes. Furthermore, the 5% peritumoral area may indicate regions of potential tumor invasion requiring further investigation.
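
One way to obtain a peritumoral region of roughly 5% extra area is to dilate the tumor mask until the target area is reached and keep only the shell, as in the sketch below. Iterative one-pixel dilation is an assumed, simple stand-in for the paper's boundary expansion; it ignores the SAM-based field-of-view constraint described in the abstract, and radiomic features would then be extracted separately from the tumor and the shell.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def peritumoral_ring(mask: np.ndarray, expansion: float = 0.05) -> np.ndarray:
    """Return a shell around a binary tumor mask whose total area exceeds the
    tumor area by roughly `expansion` (0.05 for the 5% region)."""
    target = mask.sum() * (1.0 + expansion)
    dilated = mask.copy()
    while dilated.sum() < target:
        dilated = binary_dilation(dilated)   # grow by one pixel per iteration
    return dilated & ~mask                    # keep the peritumoral shell only

# Toy example: a 40x40 square "tumor" in a 128x128 image
mask = np.zeros((128, 128), dtype=bool)
mask[44:84, 44:84] = True
ring = peritumoral_ring(mask, 0.05)
print(mask.sum(), ring.sum())
```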

Evaluating an information theoretic approach for selecting multimodal data fusion methods.

Zhang T, Ding R, Luong KD, Hsu W

PubMed · May 10, 2025
Interest has grown in combining radiology, pathology, genomic, and clinical data to improve the accuracy of diagnostic and prognostic predictions toward precision health. However, most existing works choose their datasets and modeling approaches empirically and in an ad hoc manner. A prior study proposed four partial information decomposition (PID)-based metrics to provide a theoretical understanding of multimodal data interactions: redundancy, uniqueness of each modality, and synergy. However, these metrics have only been evaluated in a limited collection of biomedical data, and the existing work does not elucidate the effect of parameter selection when calculating the PID metrics. In this work, we evaluate PID metrics on a wider range of biomedical data, including clinical, radiology, pathology, and genomic data, and propose potential improvements to the PID metrics. We apply the PID metrics to seven different modality pairs across four distinct cohorts (datasets). We compare and interpret trends in the resulting PID metrics and downstream model performance in these multimodal cohorts. The downstream tasks being evaluated include predicting the prognosis (either overall survival or recurrence) of patients with non-small cell lung cancer, prostate cancer, and glioblastoma. We found that, while PID metrics are informative, solely relying on these metrics to decide on a fusion approach does not always yield a machine learning model with optimal performance. Of the seven different modality pairs, three had poor (0%), three had moderate (66%-89%), and only one had perfect (100%) consistency between the PID values and model performance. We propose two improvements to the PID metrics (determining the optimal parameters and uncertainty estimation) and identified areas where PID metrics could be further improved. The current PID metrics are not accurate enough for estimating the multimodal data interactions and need to be improved before they can serve as a reliable tool. We propose improvements and provide suggestions for future work. Code: https://github.com/zhtyolivia/pid-multimodal.
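
For orientation, the four PID terms referenced above decompose the total mutual information that two modalities X1 and X2 carry about a target Y. The identity below is the standard Williams-and-Beer-style decomposition, shown here as background rather than as the paper's exact formulation.

```latex
% Partial information decomposition of two modalities X_1, X_2 about target Y
I(Y; X_1, X_2) \;=\; \underbrace{R(Y; X_1, X_2)}_{\text{redundancy}}
\;+\; \underbrace{U(Y; X_1 \setminus X_2)}_{\text{unique to } X_1}
\;+\; \underbrace{U(Y; X_2 \setminus X_1)}_{\text{unique to } X_2}
\;+\; \underbrace{S(Y; X_1, X_2)}_{\text{synergy}}
```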

Improving Generalization of Medical Image Registration Foundation Model

Jing Hu, Kaiwei Yu, Hongjiang Xian, Shu Hu, Xin Wang

arXiv preprint · May 10, 2025
Deformable registration is a fundamental task in medical image processing, aiming to achieve precise alignment by establishing nonlinear correspondences between images. Traditional methods offer good adaptability and interpretability but are limited by computational efficiency. Although deep learning approaches have significantly improved registration speed and accuracy, they often lack flexibility and generalizability across different datasets and tasks. In recent years, foundation models have emerged as a promising direction, leveraging large and diverse datasets to learn universal features and transformation patterns for image registration, thus demonstrating strong cross-task transferability. However, these models still face challenges in generalization and robustness when encountering novel anatomical structures, varying imaging conditions, or unseen modalities. To address these limitations, this paper incorporates Sharpness-Aware Minimization (SAM) into foundation models to enhance their generalization and robustness in medical image registration. By optimizing the flatness of the loss landscape, SAM improves model stability across diverse data distributions and strengthens its ability to handle complex clinical scenarios. Experimental results show that foundation models integrated with SAM achieve significant improvements in cross-dataset registration performance, offering new insights for the advancement of medical image registration technology. Our code is available at https://github.com/Promise13/fm_sam.
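
SAM's core idea is a two-step update: climb to the locally worst-case weights within a small neighborhood, then descend using the gradient evaluated there. The sketch below is a generic, minimal PyTorch rendering of that procedure with an assumed registration-loss callable; it is not the repository's implementation.

```python
import torch

def sam_step(model, loss_fn, batch, base_opt, rho: float = 0.05):
    """One Sharpness-Aware Minimization update (generic sketch).

    loss_fn(model, batch) is an assumed callable returning the registration
    loss (e.g. image similarity plus deformation regularization).
    """
    # 1) Gradient at the current weights
    loss = loss_fn(model, batch)
    loss.backward()

    # 2) Perturb weights toward the worst case within an L2 ball of radius rho
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # 3) Gradient at the perturbed weights, then undo the perturbation and step
    loss_fn(model, batch).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_opt.step()
    base_opt.zero_grad()
    return loss.item()
```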

Deeply Explainable Artificial Neural Network

David Zucker

arXiv preprint · May 10, 2025
While deep learning models have demonstrated remarkable success in numerous domains, their black-box nature remains a significant limitation, especially in critical fields such as medical image analysis and inference. Existing explainability methods, such as SHAP, LIME, and Grad-CAM, are typically applied post hoc, adding computational overhead and sometimes producing inconsistent or ambiguous results. In this paper, we present the Deeply Explainable Artificial Neural Network (DxANN), a novel deep learning architecture that embeds explainability ante hoc, directly into the training process. Unlike conventional models that require external interpretation methods, DxANN is designed to produce per-sample, per-feature explanations as part of the forward pass. Built on a flow-based framework, it enables both accurate predictions and transparent decision-making, and is particularly well-suited for image-based tasks. While our focus is on medical imaging, the DxANN architecture is readily adaptable to other data modalities, including tabular and sequential data. DxANN marks a step forward toward intrinsically interpretable deep learning, offering a practical solution for applications where trust and accountability are essential.