Page 2 of 45449 results

NextGen lung disease diagnosis with explainable artificial intelligence.

Veeramani N, S A RS, S SP, S S, Jayaraman P

PubMed | papers | Sep 26 2025
The COVID-19 pandemic has been the most catastrophic global health emergency of the 21st century, resulting in hundreds of millions of reported cases and five million deaths. Chest X-ray (CXR) images are highly valuable for the early detection, monitoring, and investigation of pulmonary disorders such as COVID-19, pneumonia, and tuberculosis. These CXR images offer crucial features about the lung's health condition and can assist in making accurate diagnoses. Manual interpretation of CXR images is challenging even for expert radiologists because of overlapping radiological features. Artificial intelligence (AI) based image processing has therefore taken on a growing role in healthcare, but it remains uncertain how far the predictions of an AI model can be trusted. This can be addressed with explainable artificial intelligence (XAI) tools that transform a black-box AI into a glass-box model. In this research article, we propose a novel XAI-TRANS model with Inception-based transfer learning that addresses the challenge of overlapping features in multiclass classification of CXR images. We also propose an improved U-Net lung segmentation dedicated to obtaining the radiological features for classification. The proposed approach achieved a maximum precision of 98% and accuracy of 97% in multiclass lung disease classification. The approach leverages XAI techniques, specifically LIME and Grad-CAM, with a reported improvement of 4.75%, to provide detailed and accurate explanations for the model's predictions.

Enhanced CoAtNet based hybrid deep learning architecture for automated tuberculosis detection in human chest X-rays.

Siddharth G, Ambekar A, Jayakumar N

PubMed | papers | Sep 26 2025
Tuberculosis (TB) is a serious infectious disease that remains a global health challenge. While chest X-rays (CXRs) are widely used for TB detection, manual interpretation can be subjective and time-consuming. Automated classification of CXRs into TB and non-TB cases can significantly support healthcare professionals in timely and accurate diagnosis. This paper introduces a hybrid deep learning approach for classifying CXR images. The solution is based on the CoAtNet framework, which combines the strengths of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). The model is pre-trained on the large-scale ImageNet dataset to ensure robust generalization across diverse images. The evaluation is conducted on the IN-CXR tuberculosis dataset from ICMR-NIRT, which contains a comprehensive collection of CXR images of both normal and abnormal categories. The hybrid model achieves a binary classification accuracy of 86.39% and an ROC-AUC score of 93.79%, outperforming tested baseline models that rely exclusively on either CNNs or ViTs when trained on this dataset. Furthermore, the integration of Local Interpretable Model-agnostic Explanations (LIME) enhances the interpretability of the model's predictions. This combination of reliable performance and transparent, interpretable results strengthens the model's role in AI-driven medical imaging research. Code will be made available upon request.

A Deep Learning-Based EffConvNeXt Model for Automatic Classification of Cystic Bronchiectasis: An Explainable AI Approach.

Tekin V, Tekinhatun M, Özçelik STA, Fırat H, Üzen H

PubMed | papers | Sep 25 2025
Cystic bronchiectasis and pneumonia are respiratory conditions that significantly impact morbidity and mortality worldwide. Diagnosing these diseases accurately is crucial, as early detection can greatly improve patient outcomes. They present with overlapping features on chest X-rays (CXRs), making accurate diagnosis challenging. Recent advancements in deep learning (DL) have improved diagnostic accuracy in medical imaging. This study proposes the EffConvNeXt model, a hybrid approach combining EfficientNetB1 and ConvNeXtTiny, designed to enhance classification accuracy for cystic bronchiectasis, pneumonia, and normal cases in CXRs. The model balances EfficientNetB1's efficiency with ConvNeXtTiny's advanced feature extraction, allowing for better identification of complex patterns in CXR images. The combination addresses limitations of each model individually: EfficientNetB1's SE blocks improve focus on critical image areas while keeping the model lightweight and fast, and ConvNeXtTiny enhances detection of subtle abnormalities, making the combined model highly effective for rapid and accurate CXR image analysis in clinical settings. For the performance analysis of the EffConvNeXt model, experimental studies were conducted using 5899 CXR images collected from Dicle University Medical Faculty. When used individually, ConvNeXtTiny achieved an accuracy of 97.12%, while EfficientNetB1 reached 97.79%. Combining both models, EffConvNeXt raised the accuracy to 98.25%, a 0.46 percentage-point improvement over the better of the two individual models. The other tested DL models performed worse. These findings indicate that EffConvNeXt provides a reliable, automated solution for distinguishing cystic bronchiectasis and pneumonia, supporting clinical decision-making with enhanced diagnostic accuracy.

Proof-of-concept comparison of an artificial intelligence-based bone age assessment tool with Greulich-Pyle and Tanner-Whitehouse version 2 methods in a pediatric cohort.

Marinelli L, Lo Mastro A, Grassi F, Berritto D, Russo A, Patanè V, Festa A, Grassi E, Grandone A, Nasto LA, Pola E, Reginelli A

PubMed | papers | Sep 25 2025
Bone age assessment is essential in evaluating pediatric growth disorders. Artificial intelligence (AI) systems offer potential improvements in accuracy and reproducibility compared to traditional methods. This study compared the performance of a commercially available AI-based software (BoneView BoneAge, Gleamer, Paris, France) against two human-assessed methods, the Greulich-Pyle (GP) atlas and Tanner-Whitehouse version 2 (TW2), in a pediatric population. This proof-of-concept study included 203 pediatric patients (mean age, 9.0 years; range, 2.0-17.0 years) who underwent hand and wrist radiographs for suspected endocrine or growth-related conditions. After excluding technically inadequate images, 157 cases were analyzed using AI and GP-assessed methods. A subset of 35 patients was also evaluated using the TW2 method by a pediatric endocrinologist. Performance was measured using mean absolute error (MAE), root mean square error (RMSE), bias, and Pearson's correlation coefficient, using chronological age as reference. The AI model achieved an MAE of 1.38 years, comparable to the radiologist's GP-based estimate (MAE, 1.30 years), and superior to TW2 (MAE, 2.86 years). RMSE values were 1.75 years, 1.80 years, and 3.88 years, respectively. AI showed minimal bias (-0.05 years), while TW2-based assessments systematically underestimated bone age (bias, -2.63 years). Strong correlations with chronological age were observed for AI (r=0.857) and GP (r=0.894), but not for TW2 (r=0.490). BoneView demonstrated accuracy comparable to the radiologist-assessed GP method and outperformed TW2 assessments in this cohort. AI-based systems may enhance consistency in pediatric bone age estimation but require careful validation, especially in ethnically diverse populations.
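
The four agreement measures named in this abstract (MAE, RMSE, bias, and Pearson's r against chronological age) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study's analysis code; the function name `error_summary` and the sample ages are hypothetical, and bias is assumed here to mean the mean signed error (estimate minus reference).

```python
import math

def error_summary(estimated, reference):
    """Return MAE, RMSE, bias (mean signed error) and Pearson's r."""
    n = len(estimated)
    if n != len(reference) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    # Signed errors: negative values mean the estimate is below the reference.
    errors = [e - r for e, r in zip(estimated, reference)]
    mae = sum(abs(d) for d in errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    bias = sum(errors) / n
    # Pearson's r from centered sums (requires non-constant inputs).
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    pearson_r = cov / math.sqrt(var_e * var_r)
    return {"mae": mae, "rmse": rmse, "bias": bias, "pearson_r": pearson_r}

# Hypothetical ages in years, invented for illustration only.
chronological_age = [6.0, 8.5, 10.0, 12.5, 15.0]
estimated_bone_age = [5.5, 9.0, 10.0, 11.5, 15.5]
summary = error_summary(estimated_bone_age, chronological_age)
print(summary)
```

Reporting bias alongside MAE matters because a method can have a moderate MAE while still erring consistently in one direction, which is the pattern the abstract describes for TW2.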

Clinical deployment and prospective validation of an AI model for limb-length discrepancy measurements using an open-source platform.

Tsai A, Samal S, Lamonica P, Morris N, McNeil J, Pienaar R

PubMed | papers | Sep 24 2025
This study aimed to deploy an AI model to measure limb-length discrepancy (LLD) and prospectively validate its performance. We encoded the inference of an LLD AI model into a Docker container, incorporated it into a computational platform for clinical deployment, and conducted two prospective validation studies: a shadow trial (07/2024-09/2024) and a clinical trial (11/2024-01/2025). During each trial period, we queried for LLD EOS scanograms to serve as inputs to our model. For the shadow trial, we hid the AI-annotated outputs from the radiologists, and for the clinical trial, we displayed the AI-annotated output to the radiologists at the time of study interpretation. Afterward, we collected the bilateral femoral and tibial lengths from the radiology reports and compared them against those generated by the AI model. We used median absolute difference (MAD) and interquartile range (IQR) as summary statistics to assess the performance of our model. Our shadow trial consisted of 84 EOS scanograms from 84 children, with 168 femoral and tibial lengths. The MAD (IQR) of the femoral and tibial lengths were 0.2 cm (0.3 cm) and 0.2 cm (0.3 cm), respectively. Our clinical trial consisted of 114 EOS scanograms from 114 children, with 228 femoral and tibial lengths. The MAD (IQR) of the femoral and tibial lengths were 0.3 cm (0.4 cm) and 0.2 cm (0.3 cm), respectively. We successfully employed a computational platform for seamless integration and deployment of an LLD AI model into our clinical workflow, and prospectively validated its performance.
Question: No AI models have been clinically deployed for limb-length discrepancy (LLD) assessment in children, and the prospective validation of these models is unknown.
Findings: We deployed an LLD AI model using a homegrown platform, with prospective trials showing a median absolute difference of 0.2-0.3 cm in estimating bone lengths.
Clinical relevance: An LLD AI model with performance comparable to that of radiologists can serve as a secondary reader in increasing the confidence and accuracy of LLD measurements.
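
The summary statistics used in this abstract, median absolute difference (MAD) and interquartile range (IQR) of the differences between AI and report lengths, can be sketched as below. This is an illustrative sketch, not the authors' code: `mad_and_iqr` and the sample lengths are hypothetical, and the quartile convention (the "inclusive" method of `statistics.quantiles`) is an assumption, since the abstract does not state which one was used.

```python
import statistics

def mad_and_iqr(ai_lengths_cm, report_lengths_cm):
    """Median and interquartile range of |AI length - reported length|."""
    if len(ai_lengths_cm) != len(report_lengths_cm):
        raise ValueError("sequences must have equal length")
    diffs = [abs(a - r) for a, r in zip(ai_lengths_cm, report_lengths_cm)]
    # Quartile definitions vary between tools; "inclusive" is an assumption.
    q1, _, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    return statistics.median(diffs), q3 - q1

# Hypothetical femoral lengths in cm, invented for illustration only.
ai_lengths = [40.1, 38.5, 42.0, 35.2, 44.8]
report_lengths = [40.0, 38.9, 42.2, 35.2, 44.5]
mad, iqr = mad_and_iqr(ai_lengths, report_lengths)
print(f"MAD = {mad:.1f} cm, IQR = {iqr:.1f} cm")
```

Median-based summaries are less sensitive to a few large disagreements than the mean, which is one plausible reason to prefer them for measurement comparisons of this kind.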

Revisiting Performance Claims for Chest X-Ray Models Using Clinical Context

Andrew Wang, Jiashuo Zhang, Michael Oberst

arXiv | preprint | Sep 24 2025
Public healthcare datasets of Chest X-Rays (CXRs) have long been a popular benchmark for developing computer vision models in healthcare. However, strong average-case performance of machine learning (ML) models on these datasets is insufficient to certify their clinical utility. In this paper, we use clinical context, as captured by prior discharge summaries, to provide a more holistic evaluation of current "state-of-the-art" models for the task of CXR diagnosis. Using discharge summaries recorded prior to each CXR, we derive a "prior" or "pre-test" probability of each CXR label, as a proxy for existing contextual knowledge available to clinicians when interpreting CXRs. Using this measure, we demonstrate two key findings: First, for several diagnostic labels, CXR models tend to perform best on cases where the pre-test probability is very low, and substantially worse on cases where the pre-test probability is higher. Second, we use pre-test probability to assess whether strong average-case performance reflects true diagnostic signal, rather than an ability to infer the pre-test probability as a shortcut. We find that performance drops sharply on a balanced test set where this shortcut does not exist, which may indicate that much of the apparent diagnostic power derives from inferring this clinical context. We argue that this style of analysis, using context derived from clinical notes, is a promising direction for more rigorous and fine-grained evaluation of clinical vision models.

Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography.

He Z, McMillan AB

PubMed | papers | Sep 23 2025
The application of artificial intelligence (AI) in medical imaging has revolutionized diagnostic practices, enabling advanced analysis and interpretation of radiological data. This study presents a comprehensive evaluation of radiomics-based and deep learning-based approaches for disease detection in chest radiography, focusing on COVID-19, lung opacity, and viral pneumonia. While deep learning models, particularly convolutional neural networks (CNNs) and vision transformers (ViTs), learn directly from image data, radiomics-based models extract handcrafted features, offering potential advantages in data-limited scenarios. We systematically compared the diagnostic performance of various AI models, including Decision Trees, Gradient Boosting, Random Forests, Support Vector Machines (SVMs), and Multi-Layer Perceptrons (MLPs) for radiomics, against state-of-the-art deep learning models such as InceptionV3, EfficientNetL, and ConvNeXtXLarge. Performance was evaluated across multiple sample sizes. At 24 samples, EfficientNetL achieved an AUC of 0.839, outperforming SVM (AUC = 0.762). At 4000 samples, InceptionV3 achieved the highest AUC of 0.996, compared to 0.885 for Random Forest. A Scheirer-Ray-Hare test confirmed significant main and interaction effects of model type and sample size on all metrics. Post hoc Mann-Whitney U tests with Bonferroni correction further revealed consistent performance advantages for deep learning models across most conditions. These findings provide statistically validated, data-driven recommendations for model selection in diagnostic AI. Deep learning models demonstrated higher performance and better scalability with increasing data availability, while radiomics-based models may remain useful in low-data contexts. This study addresses a critical gap in AI-based diagnostic research by offering practical guidance for deploying AI models across diverse clinical environments.

Explainable AI-driven analysis of radiology reports using text and image data: An experimental study.

Zamir MT, Khan SU, Gelbukh A, Felipe Riverón EM, Gelbukh I

PubMed | papers | Sep 22 2025
Artificial intelligence is increasingly being integrated into clinical diagnostics, yet its lack of transparency hinders trust and adoption among healthcare professionals. Explainable AI (XAI) has the potential to improve the interpretability and reliability of AI-based decisions in clinical practice. This study evaluates the use of XAI for interpreting radiology reports to improve healthcare practitioners' confidence in and comprehension of AI-assisted diagnostics. The study employed the Indiana University chest X-ray dataset, containing 3169 textual reports and 6471 images. Textual reports were classified as either normal or abnormal using a range of machine learning approaches, including traditional machine learning models and ensemble methods, deep learning models (LSTM), and advanced transformer-based language models (GPT-2, T5, LLaMA-2, LLaMA-3.1). For image-based classification, convolutional neural networks (CNNs), including DenseNet121 and DenseNet169, were used. The top-performing models were interpreted using the XAI methods SHAP and LIME to support clinical decision making by enhancing transparency and trust in model predictions. The LLaMA-3.1 model achieved the highest accuracy, 98%, in classifying the textual radiology reports. Statistical analysis confirmed the model's robustness: Cohen's kappa (k=0.981) indicated near-perfect agreement beyond chance, and both the Chi-Square and Fisher's Exact tests revealed a highly significant association between actual and predicted labels (p<0.0001). McNemar's test yielded a non-significant result (p=0.25), suggesting balanced performance across classes. For imaging data, the highest accuracy, 84%, was achieved with the DenseNet169 and DenseNet121 models. To assess explainability, LIME and SHAP were applied to the best-performing models. These methods consistently highlighted medical terms such as "opacity", "consolidation", and "pleural" as clear indicators of abnormal findings in textual reports. The research underscores that explainability is an essential component of any AI system used in diagnostics and is helpful in the design and implementation of AI in the healthcare sector. Such an approach improves diagnostic accuracy and builds confidence among the health workers who will use explainable AI in clinical settings.
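
Cohen's kappa, which this abstract reports as a measure of agreement beyond chance between actual and predicted labels, can be computed from a confusion matrix as sketched below. This is a minimal illustration, not the study's code; `cohens_kappa` and the counts are hypothetical and do not reproduce the reported k=0.981.

```python
def cohens_kappa(confusion):
    """Cohen's kappa; confusion[i][j] = count of actual class i predicted as j."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("empty confusion matrix")
    # Observed agreement: fraction of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical normal/abnormal counts, invented for illustration only.
confusion = [[45, 5],
             [5, 45]]
kappa = cohens_kappa(confusion)
print(f"kappa = {kappa:.2f}")
```

In this toy matrix the observed agreement is 0.90 and the chance agreement is 0.50, so kappa is lower than raw accuracy; that gap is why kappa is often reported next to accuracy.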

The optimal diagnostic assistance system for predicting three-dimensional contact between mandibular third molars and the mandibular canal on panoramic radiographs.

Fukuda M, Nomoto D, Nozawa M, Kise Y, Kuwada C, Kubo H, Ariji E, Ariji Y

PubMed | papers | Sep 22 2025
This study aimed to identify the most effective diagnostic assistance system for assessing the relationship between mandibular third molars (M3M) and mandibular canals (MC) using panoramic radiographs. In total, 2,103 M3M were included from patients in whom the M3M and MC overlapped on panoramic radiographs. All M3M were classified into high-risk and low-risk groups based on the degree of contact with the MC observed on computed tomography. The contact classification was evaluated using four machine learning models (Prediction One software, AdaBoost, XGBoost, and random forest), three convolutional neural networks (CNNs) (EfficientNet-B0, ResNet18, and Inception v3), and three human observers (two radiologists and one oral surgery resident). Receiver operating characteristic curves were plotted; the area under the curve (AUC), accuracy, sensitivity, and specificity were calculated. Factors contributing to prediction of high-risk cases by machine learning models were identified. Machine learning models demonstrated AUC values ranging from 0.84 to 0.88, with accuracy ranging from 0.81 to 0.88 and sensitivity of 0.80, indicating consistently strong performance. Among the CNNs, ResNet18 achieved the best performance, with an AUC of 0.83. The human observers exhibited AUC values between 0.67 and 0.80. Three factors were identified as contributing to prediction of high-risk cases by machine learning models: increased root radiolucency, diversion of the MC, and narrowing of the MC. Machine learning models demonstrated strong performance in predicting the three-dimensional relationship between the M3M and MC.
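
The evaluation measures listed in this abstract (AUC, accuracy, sensitivity, and specificity) can be sketched for a binary high-risk/low-risk task as follows. This is an illustration under stated assumptions, not the study's pipeline: the function names, the 0.5 threshold, and the labels and scores are all hypothetical. AUC is computed here through its rank interpretation, the probability that a randomly chosen high-risk case scores above a randomly chosen low-risk case, with ties counted as one half.

```python
def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity and specificity at a fixed score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_high = s >= threshold
        if y == 1 and predicted_high:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_high:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }

def roc_auc(labels, scores):
    """AUC as the fraction of (high-risk, low-risk) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = high risk) and model scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
metrics = threshold_metrics(labels, scores)
auc = roc_auc(labels, scores)
print(metrics, auc)
```

AUC does not depend on the chosen threshold, whereas accuracy, sensitivity, and specificity do, which is why a comparison between models and human observers usually reports both kinds of measure.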

Beyond Diagnosis: Evaluating Multimodal LLMs for Pathology Localization in Chest Radiographs

Advait Gosai, Arun Kavishwar, Stephanie L. McNamara, Soujanya Samineni, Renato Umeton, Alexander Chowdhury, William Lotter

arXiv | preprint | Sep 22 2025
Recent work has shown promising performance of frontier large language models (LLMs) and their multimodal counterparts in medical quizzes and diagnostic tasks, highlighting their potential for broad clinical utility given their accessible, general-purpose nature. However, beyond diagnosis, a fundamental aspect of medical image interpretation is the ability to localize pathological findings. Evaluating localization not only has clinical and educational relevance but also provides insight into a model's spatial understanding of anatomy and disease. Here, we systematically assess two general-purpose MLLMs (GPT-4 and GPT-5) and a domain-specific model (MedGemma) in their ability to localize pathologies on chest radiographs, using a prompting pipeline that overlays a spatial grid and elicits coordinate-based predictions. Averaged across nine pathologies in the CheXlocalize dataset, GPT-5 exhibited a localization accuracy of 49.7%, followed by GPT-4 (39.1%) and MedGemma (17.7%), all lower than a task-specific CNN baseline (59.9%) and a radiologist benchmark (80.1%). Despite modest performance, error analysis revealed that GPT-5's predictions were largely in anatomically plausible regions, just not always precisely localized. GPT-4 performed well on pathologies with fixed anatomical locations, but struggled with spatially variable findings and exhibited anatomically implausible predictions more frequently. MedGemma demonstrated the lowest performance on all pathologies, showing limited capacity to generalize to this novel task. Our findings highlight both the promise and limitations of current MLLMs in medical imaging and underscore the importance of integrating them with task-specific tools for reliable use.