Page 5 of 768 results

Dharma: A novel machine learning framework for pediatric appendicitis--diagnosis, severity assessment and evidence-based clinical decision support.

Thapa, A., Pahari, S., Timilsina, S., Chapagain, B.

medRxiv preprint, May 29 2025
Background: Acute appendicitis remains a challenging diagnosis in pediatric populations, with high rates of misdiagnosis and negative appendectomies despite advances in imaging modalities. Current diagnostic tools, including clinical scoring systems such as the Alvarado score and the Pediatric Appendicitis Score (PAS), lack sufficient sensitivity and specificity, while reliance on CT raises concerns about radiation exposure, contrast hazards, and sedation in children. Moreover, no established tool effectively predicts progression from uncomplicated to complicated appendicitis, leaving a critical gap in clinical decision-making.
Objective: To develop and evaluate a machine learning model that integrates clinical, laboratory, and radiological findings for accurate diagnosis and complication prediction in pediatric appendicitis, and to deploy this model as an interpretable web-based tool for clinical decision support.
Methods: We analyzed data from 780 pediatric patients (ages 0-18) with suspected appendicitis admitted to Children's Hospital St. Hedwig, Regensburg, between 2016 and 2021. For severity prediction, the dataset was augmented with 430 additional cases from published literature, and only confirmed cases of acute appendicitis (n=602) were used. After feature selection using statistical methods and recursive feature elimination, we developed a Random Forest model named Dharma, optimized through hyperparameter tuning and cross-validation. Model performance was evaluated on independent test sets and compared with conventional diagnostic tools.
Results: Dharma demonstrated superior diagnostic performance, with an AUC-ROC of 0.96 (±0.02 SD) in cross-validation and 0.97-0.98 on independent test sets. At an optimal threshold of 64%, the model achieved a specificity of 88%-98%, sensitivity of 89%-95%, and positive predictive value of 93%-99%. For complication prediction, Dharma attained a sensitivity of 93% (±0.05 SD) in cross-validation and 96% on the test set, with a negative predictive value of 98%. The model maintained strong performance even when the appendix could not be visualized on ultrasonography (AUC-ROC 0.95, sensitivity 89%, specificity 87% at a threshold of 30%).
Conclusion: Dharma is a novel, interpretable, machine learning-based clinical decision support tool designed to address the diagnostic challenges of pediatric appendicitis by integrating easily obtainable clinical, laboratory, and radiological data into a unified, real-time predictive framework. Unlike traditional scoring systems and imaging modalities, which may lack specificity or raise safety concerns in children, Dharma demonstrates high accuracy in diagnosing appendicitis and predicting progression from uncomplicated to complicated cases, potentially reducing unnecessary surgeries and CT scans. Its robust performance, even with incomplete imaging data, underscores its utility in resource-limited settings. Delivered through an intuitive, transparent, and interpretable web application, Dharma supports frontline providers, particularly in low- and middle-income settings, in making timely, evidence-based decisions, streamlining patient referrals, and improving clinical outcomes. By bridging critical gaps in current diagnostic and prognostic tools, Dharma offers a practical and accessible 21st-century solution tailored to real-world pediatric surgical care across diverse healthcare contexts. Furthermore, the underlying framework and concepts of Dharma may be adaptable to other clinical challenges beyond pediatric appendicitis, providing a foundation for broader applications of machine learning in healthcare.
Author Summary: Accurate diagnosis of pediatric appendicitis remains challenging, with current clinical scores and imaging tests limited by sensitivity, specificity, predictive values, and safety concerns. We developed Dharma, an interpretable machine learning model that integrates clinical, laboratory, and radiological data to assist in diagnosing appendicitis and predicting its severity in children. Evaluated on a large dataset supplemented by published cases, Dharma demonstrated strong diagnostic and prognostic performance, including in cases with incomplete imaging, which may make it especially useful in resource-limited settings for early decision-making and streamlined referrals. Available as a web-based tool, it provides real-time support to healthcare providers in making evidence-based decisions that could reduce negative appendectomies while avoiding hazards associated with advanced imaging modalities such as sedation, contrast, or radiation exposure. Furthermore, the open-access concepts and framework underlying Dharma have the potential to address diverse healthcare challenges beyond pediatric appendicitis.
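The abstract reports sensitivity, specificity, PPV and NPV at fixed probability cut-offs (64% for diagnosis, 30% for the subgroup with a non-visualized appendix). As a minimal sketch, assuming a case is called positive when its predicted probability is at or above the threshold, the snippet below shows how those four metrics follow from a confusion matrix. The function name and the toy data are hypothetical; this is not the authors' code.

```python
from typing import Dict, Sequence


def threshold_metrics(probs: Sequence[float], labels: Sequence[int],
                      threshold: float) -> Dict[str, float]:
    """Confusion-matrix metrics when a case is called positive if prob >= threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than dividing by zero when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among all diseased
        "specificity": ratio(tn, tn + fp),  # true negatives among all non-diseased
        "ppv": ratio(tp, tp + fp),          # diseased among all called positive
        "npv": ratio(tn, tn + fn),          # non-diseased among all called negative
    }


# Toy, made-up probabilities and labels (not study data), thresholded at 0.64.
probs = [0.91, 0.70, 0.66, 0.40, 0.20, 0.65, 0.10, 0.55]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(threshold_metrics(probs, labels, 0.64))
```

Raising the threshold generally trades sensitivity for specificity and PPV, which is why the abstract quotes a different cut-off for the non-visualized-appendix subgroup.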

ROC Analysis of Biomarker Combinations in Fragile X Syndrome-Specific Clinical Trials: Evaluating Treatment Efficacy via Exploratory Biomarkers

Norris, J. E., Berry-Kravis, E. M., Harnett, M. D., Reines, S. A., Reese, M., Auger, E. K., Outterson, A., Furman, J., Gurney, M. E., Ethridge, L. E.

medRxiv preprint, May 29 2025
Fragile X Syndrome (FXS) is a rare neurodevelopmental disorder caused by a trinucleotide repeat expansion in the 5' untranslated region of the FMR1 gene. FXS is characterized by intellectual disability, anxiety, sensory hypersensitivity, and difficulties with executive function. A recent phase 2 placebo-controlled clinical trial of BPN14770, a first-in-class phosphodiesterase 4D allosteric inhibitor, in 30 adult males (aged 18-41 years) with FXS demonstrated cognitive improvement on the NIH Toolbox Cognitive Battery in language-related domains, as well as caregiver-reported improvement in both daily functioning and language. However, individual physiological measures from electroencephalography (EEG) showed only marginal significance for trial efficacy. A secondary analysis of resting-state EEG data collected during the phase 2 trial was conducted using a machine learning classification algorithm to classify trial conditions (i.e., baseline, drug, placebo) from linear combinations of EEG variables. The algorithm identified a composite of peak alpha frequencies (PAF) across multiple brain regions as a potential biomarker of BPN14770 efficacy. Increased PAF from baseline was associated with drug but not placebo. Given the relationship between PAF and cognitive function among typically developing adults and those with intellectual disability, as well as previously reported reductions in alpha frequency and power in FXS, PAF represents a potential physiological measure of BPN14770 efficacy.
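Peak alpha frequency is commonly taken as the frequency of maximum spectral power within the alpha band. The sketch below computes it per region from a precomputed power spectrum and averages across regions. The 8-13 Hz band limits, the simple mean as the "composite", and the function names are all assumptions for illustration; the study's composite came from a classifier's linear combination of EEG variables, which this does not reproduce.

```python
from typing import Dict, Sequence, Tuple


def peak_alpha_frequency(freqs: Sequence[float], power: Sequence[float],
                         band: Tuple[float, float] = (8.0, 13.0)) -> float:
    """Frequency (Hz) of the maximum spectral power inside the (assumed) alpha band."""
    in_band = [(p, f) for f, p in zip(freqs, power) if band[0] <= f <= band[1]]
    if not in_band:
        raise ValueError("no spectral bins inside the alpha band")
    # max over (power, freq) tuples picks the highest-power bin.
    return max(in_band)[1]


def composite_paf(spectra: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Hypothetical composite: unweighted mean PAF over regions (region -> (freqs, power))."""
    values = [peak_alpha_frequency(f, p) for f, p in spectra.values()]
    return sum(values) / len(values)


# Toy spectra on a 1 Hz grid from 6 to 14 Hz (made-up numbers, not trial data).
freqs = [float(f) for f in range(6, 15)]
spectra = {
    "frontal": (freqs, [1, 2, 3, 5, 4, 3, 2, 1, 9]),    # 14 Hz peak lies outside the band
    "occipital": (freqs, [1, 1, 2, 3, 6, 8, 4, 2, 1]),
}
print(composite_paf(spectra))
```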

Multi-class classification of central and non-central geographic atrophy using Optical Coherence Tomography

Siraz, S., Kamanda, H., Gholami, S., Nabil, A. S., Ong, S. S. Y., Alam, M. N.

medRxiv preprint, May 28 2025
Purpose: To develop and validate deep learning (DL)-based models for classifying geographic atrophy (GA) subtypes using Optical Coherence Tomography (OCT) scans across four clinical classification tasks.
Design: Retrospective comparative study evaluating three DL architectures on OCT data with two experimental approaches.
Subjects: 455 OCT volumes (258 Central GA [CGA], 74 Non-Central GA [NCGA], 123 no GA [NGA]) from 104 patients at Atrium Health Wake Forest Baptist. For GA versus age-related macular degeneration (AMD) classification, we supplemented our dataset with AMD cases from four public repositories.
Methods: We implemented ResNet50, MobileNetV2, and Vision Transformer (ViT-B/16) architectures using two approaches: (1) utilizing all B-scans within each OCT volume and (2) selectively using B-scans containing foveal regions. Models were trained using transfer learning, standardized data augmentation, and patient-level data splitting (70:15:15 ratio) for training, validation, and testing.
Main Outcome Measures: Area under the receiver operating characteristic curve (AUC-ROC), F1 score, and accuracy for each classification task (CGA vs. NCGA, CGA vs. NCGA vs. NGA, GA vs. NGA, and GA vs. other forms of AMD).
Results: ViT-B/16 consistently outperformed other architectures across all classification tasks. For CGA versus NCGA classification, ViT-B/16 achieved an AUC-ROC of 0.728 ± 0.083 and accuracy of 0.831 ± 0.006 using selective B-scans. In GA versus NGA classification, ViT-B/16 attained an AUC-ROC of 0.950 ± 0.002 and accuracy of 0.873 ± 0.012 with selective B-scans. All models demonstrated exceptional performance in distinguishing GA from other AMD forms (AUC-ROC > 0.998). For multi-class classification, ViT-B/16 achieved an AUC-ROC of 0.873 ± 0.003 and accuracy of 0.751 ± 0.002 using selective B-scans.
Conclusions: Our DL approach successfully classifies GA subtypes with clinically relevant accuracy. ViT-B/16 demonstrates superior performance due to its ability to capture spatial relationships between atrophic regions and the foveal center. Focusing on B-scans containing foveal regions improved diagnostic accuracy while reducing computational requirements, better aligning with clinical practice workflows.
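Patient-level splitting means all OCT volumes from one patient land in exactly one of the training, validation, or test sets, so no patient leaks across sets. A minimal sketch follows, assuming a mapping from volume ID to patient ID and applying the 70:15:15 ratio to patients rather than volumes; the function name and seed are hypothetical, and the authors may have stratified by class, which this does not do.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def patient_level_split(volume_ids: Sequence[str], patient_of: Dict[str, str],
                        ratios: Tuple[float, float, float] = (0.70, 0.15, 0.15),
                        seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Split OCT volumes so that every patient's volumes fall in a single subset."""
    by_patient: Dict[str, List[str]] = defaultdict(list)
    for vid in volume_ids:
        by_patient[patient_of[vid]].append(vid)

    patients = sorted(by_patient)  # deterministic order before shuffling
    random.Random(seed).shuffle(patients)

    n = len(patients)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    groups = (patients[:n_train],
              patients[n_train:n_train + n_val],
              patients[n_train + n_val:])
    # Expand each patient group back into its volumes.
    train, val, test = ([v for p in g for v in by_patient[p]] for g in groups)
    return train, val, test


# Toy example: 40 volumes from 20 hypothetical patients (2 volumes each).
vols = [f"v{i}" for i in range(40)]
patient_of = {v: f"p{i % 20}" for i, v in enumerate(vols)}
train, val, test = patient_level_split(vols, patient_of)
print(len(train), len(val), len(test))
```

Because the ratio is applied to patients, volume counts per subset only approximate 70:15:15 when patients contribute different numbers of volumes.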

Deep Learning for Pneumonia Diagnosis: A Custom CNN Approach with Superior Performance on Chest Radiographs

Mehta, A., Vyas, M.

medRxiv preprint, May 26 2025
Pneumonia is a major global health problem and a significant cause of illness and death, underscoring the need for rapid and accurate identification and treatment. Although imaging technology has advanced, radiologists' manual reading of chest X-rays remains the primary method for pneumonia detection, which can delay both diagnosis and treatment. This study proposes a method to automate pneumonia detection using deep learning. The approach employs a custom convolutional neural network (CNN) trained on pneumonia-positive and pneumonia-negative cases from several healthcare providers. Several preprocessing steps were applied to the chest radiographs before training to improve data integrity and training efficiency. In a comparison with VGG19, ResNet50, InceptionV3, DenseNet201, and MobileNetV3, our custom CNN offered the best balance of accuracy, recall, and parameter complexity, achieving 96.5% accuracy and a 96.6% F1 score. This study contributes to the development of an automated and reliable pneumonia detection system, which could improve patient outcomes and increase healthcare efficiency. The full project is available here.

Novel Deep Learning Framework for Simultaneous Assessment of Left Ventricular Mass and Longitudinal Strain: Clinical Feasibility and Validation in Patients with Hypertrophic Cardiomyopathy

Park, J., Yoon, Y. E., Jang, Y., Jung, T., Jeon, J., Lee, S.-A., Choi, H.-M., Hwang, I.-C., Chun, E. J., Cho, G.-Y., Chang, H.-J.

medRxiv preprint, May 23 2025
Background: This study presents the Segmentation-based Myocardial Advanced Refinement Tracking (SMART) system, a novel artificial intelligence (AI)-based framework for transthoracic echocardiography (TTE) that combines motion tracking and left ventricular (LV) myocardial segmentation for automated assessment of LV mass (LVM) and LV global longitudinal strain (LVGLS).
Methods: The SMART system performs LV speckle tracking based on motion vector estimation, refined by structural information from endocardial and epicardial segmentation throughout the cardiac cycle. This approach enables automated measurement of LVM (LVM_SMART) and LVGLS (LVGLS_SMART). The feasibility of SMART was validated in 111 patients with hypertrophic cardiomyopathy (HCM) (median age: 58 years, 69% male) who underwent TTE and cardiac magnetic resonance imaging (CMR).
Results: LVGLS_SMART showed a strong correlation with conventional manual LVGLS measurements (Pearson's correlation coefficient [PCC] 0.851; mean difference 0 [-2 to 0]). When compared with CMR as the reference standard for LVM, the conventional dimension-based TTE method overestimated LVM (PCC 0.652; mean difference 106 [90 to 123]), whereas LVM_SMART demonstrated excellent agreement with CMR (PCC 0.843; mean difference 1 [-11 to 13]). For predicting extensive myocardial fibrosis, LVGLS_SMART and LVM_SMART performed comparably to conventional LVGLS and CMR (AUC: 0.72 and 0.66, respectively). Patients identified as high-risk for extensive fibrosis by LVGLS_SMART and LVM_SMART had significantly higher rates of adverse outcomes, including heart failure hospitalization, new-onset atrial fibrillation, and defibrillator implantation.
Conclusions: The SMART technique provides LVGLS evaluation comparable to conventional TTE and more accurate LVM assessment, with predictive value for myocardial fibrosis and adverse outcomes. These findings support its utility in HCM management.
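The agreement figures above pair a Pearson correlation with a mean difference and a bracketed interval; the abstract does not say whether the brackets are confidence intervals, interquartile ranges, or limits of agreement. As one common way to quantify agreement between a new method (e.g., SMART-derived LVM) and a reference (CMR), the sketch below computes Pearson's r and a Bland-Altman mean difference with 95% limits of agreement. Function names and toy values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired measurements."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(new: Sequence[float], ref: Sequence[float]) -> Tuple[float, float, float]:
    """Mean difference (new - ref) and 95% limits of agreement (mean +/- 1.96 SD)."""
    diffs = [a - b for a, b in zip(new, ref)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy LVM values in grams (made up, not study data).
lvm_new = [100.0, 102.0, 98.0, 101.0]
lvm_cmr = [99.0, 100.0, 99.0, 100.0]
print(pearson_r(lvm_new, lvm_cmr), bland_altman(lvm_new, lvm_cmr))
```

A high correlation alone does not imply agreement: a method can correlate well yet be systematically biased, which is what the mean difference captures (as with the dimension-based TTE overestimate).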

Artificial Intelligence enhanced R1 maps can improve lesion detection in focal epilepsy in children

Doumou, G., D'Arco, F., Figini, M., Lin, H., Lorio, S., Piper, R., O'Muircheartaigh, J., Cross, H., Weiskopf, N., Alexander, D., Carmichael, D. W.

medRxiv preprint, May 23 2025
Background and purpose: MRI is critical for detecting subtle cortical pathology in epilepsy surgery assessment. Detection can be aided by the improved image quality and resolution of ultra-high-field (7T) MRI, but poor access and long scan durations limit its widespread use, particularly in a paediatric setting. AI-based approaches may provide similar information by enhancing data obtained with conventional (3T) MRI. We used a convolutional neural network trained on matched 3T and 7T images to enhance quantitative R1 maps (longitudinal relaxation rate) obtained at 3T in paediatric epilepsy patients and to determine their potential clinical value for lesion identification.
Materials and Methods: A 3D U-Net was trained using paired patches from 3T and 7T R1 maps of 10 healthy volunteers. The trained network was then applied to enhance 3T R1 maps of paediatric focal epilepsy patients acquired on a different scanner at a different site (n=17 MR lesion-positive, n=14 MR-negative). Radiological review, performed without clinical information, assessed image quality as well as lesion identification and visualization on the enhanced maps compared with the original 3T R1 maps. Lesion appearance was then compared with 3D-FLAIR.
Results: AI-enhanced R1 maps had superior image quality compared with the original 3T R1 maps, while preserving and enhancing lesion visibility. After exclusion of 5 of 31 patients (due to movement artefact or incomplete data), lesions were detected on AI-enhanced R1 maps in 14/15 (93%) MR-positive and 4/11 (36%) MR-negative patients.
Conclusion: AI-enhanced R1 maps improved lesion visibility in MR-positive patients and provided higher sensitivity in the MR-negative group than either the original 3T R1 maps or 3D-FLAIR. This provides promising initial evidence that 3T quantitative maps, enhanced by an AI model trained on 7T MRI data, can outperform conventional 3T imaging without the need for pathology-specific information.
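Training a 3T-to-7T enhancement network on "paired patches" requires cutting the same spatial region out of both acquisitions. R1 is the longitudinal relaxation rate (R1 = 1/T1), so each voxel holds a scalar. The sketch below extracts co-located cubic patches, assuming both R1 maps have already been co-registered onto an identical voxel grid; the patch size, stride, function names, and the nested-list volume representation (used only to stay self-contained; real pipelines use arrays) are hypothetical and not taken from the study.

```python
from typing import Iterator, List, Tuple

Volume = List[List[List[float]]]  # indexed as vol[z][y][x]


def crop(vol: Volume, z: int, y: int, x: int, size: int) -> Volume:
    """Cubic sub-volume of edge `size` starting at (z, y, x)."""
    return [[row[x:x + size] for row in plane[y:y + size]] for plane in vol[z:z + size]]


def paired_patches(vol_3t: Volume, vol_7t: Volume, size: int,
                   stride: int) -> Iterator[Tuple[Volume, Volume]]:
    """Yield co-located (3T input, 7T target) patches from two co-registered volumes."""
    shape_3t = (len(vol_3t), len(vol_3t[0]), len(vol_3t[0][0]))
    shape_7t = (len(vol_7t), len(vol_7t[0]), len(vol_7t[0][0]))
    if shape_3t != shape_7t:
        raise ValueError("volumes must be co-registered to the same voxel grid")
    dz, dy, dx = shape_3t
    for z in range(0, dz - size + 1, stride):
        for y in range(0, dy - size + 1, stride):
            for x in range(0, dx - size + 1, stride):
                yield crop(vol_3t, z, y, x, size), crop(vol_7t, z, y, x, size)


# Toy 4x4x4 "R1 maps"; the 7T map is just the 3T map scaled, for illustration only.
r1_3t = [[[float(z * 16 + y * 4 + x) for x in range(4)] for y in range(4)] for z in range(4)]
r1_7t = [[[2 * v for v in row] for row in plane] for plane in r1_3t]
pairs = list(paired_patches(r1_3t, r1_7t, size=2, stride=2))
print(len(pairs))
```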

FLAMeS: A Robust Deep Learning Model for Automated Multiple Sclerosis Lesion Segmentation

Dereskewicz, E., La Rosa, F., dos Santos Silva, J., Sizer, E., Kohli, A., Wynen, M., Mullins, W. A., Maggi, P., Levy, S., Onyemeh, K., Ayci, B., Solomon, A. J., Assländer, J., Al-Louzi, O., Reich, D. S., Sumowski, J. F., Beck, E. S.

medRxiv preprint, May 22 2025
Background and Purpose: Assessment of brain lesions on MRI is crucial for research in multiple sclerosis (MS), but manual segmentation is time-consuming and inconsistent. We aimed to develop an automated MS lesion segmentation algorithm for T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI.
Methods: We developed FLAIR Lesion Analysis in Multiple Sclerosis (FLAMeS), a deep learning-based MS lesion segmentation algorithm built on the nnU-Net 3D full-resolution U-Net and trained on 668 FLAIR scans acquired at 1.5 and 3 tesla from persons with MS. FLAMeS was evaluated on three external datasets, MSSEG-2 (n=14), MSLesSeg (n=51), and a clinical cohort (n=10), and compared with SAMSEG, LST-LPA, and LST-AI. Performance was assessed qualitatively by two blinded experts and quantitatively by comparing automated and ground-truth lesion masks using standard segmentation metrics.
Results: In a blinded qualitative review of 20 scans, both raters selected FLAMeS as the most accurate segmentation in 15 cases, with one rater also favoring FLAMeS in two additional cases. Across all testing datasets, FLAMeS achieved a mean Dice score of 0.74, a true positive rate of 0.84, and an F1 score of 0.78, consistently outperforming the benchmark methods. For other metrics, including positive predictive value, relative volume difference, and false positive rate, FLAMeS performed similarly to or better than the benchmark methods. Most lesions missed by FLAMeS were smaller than 10 mm³, whereas the benchmark methods missed larger lesions in addition to smaller ones.
Conclusions: FLAMeS is an accurate, robust method for MS lesion segmentation that outperforms other publicly available methods.
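Dice measures voxel overlap between a predicted and a ground-truth mask, Dice = 2|A ∩ B| / (|A| + |B|), while F1 is the harmonic mean of true positive rate and positive predictive value. The sketch below computes both from flattened binary masks and from given rates. It assumes both-empty masks score 1.0 (a convention choice), and it does not implement the lesion-wise matching (connected components) that lesion-level TPR and F1 usually require; function names are hypothetical.

```python
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Voxel-wise Dice for two flattened binary masks of equal length."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly.
    return 2 * inter / total if total else 1.0


def f1_from_rates(tpr: float, ppv: float) -> float:
    """Harmonic mean of true positive rate (recall) and positive predictive value (precision)."""
    return 2 * tpr * ppv / (tpr + ppv) if (tpr + ppv) else 0.0


# Toy masks: one overlapping voxel, one missed, one false positive.
print(dice([1, 1, 0, 0], [1, 0, 1, 0]), f1_from_rates(0.5, 1.0))
```

Note that averaging per-scan Dice, TPR, and F1 (as in the reported means) does not let one recover mean PPV from the other two averages.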

Radiomics-Based Early Triage of Prostate Cancer: A Multicenter Study from the CHAIMELEON Project

Vraka, A., Marfil-Trujillo, M., Ribas-Despuig, G., Flor-Arnal, S., Cerda-Alberich, L., Jimenez-Gomez, P., Jimenez-Pastor, A., Marti-Bonmati, L.

medRxiv preprint, May 22 2025
Prostate cancer (PCa) is the most commonly diagnosed malignancy in men worldwide. Accurate triage of patients based on tumor aggressiveness and staging is critical for selecting appropriate management pathways. While magnetic resonance imaging (MRI) has become a mainstay of PCa diagnosis, most predictive models rely on multiparametric imaging or invasive inputs, limiting their generalizability in real-world clinical settings. This study aimed to develop and validate machine learning (ML) models using radiomic features extracted from T2-weighted MRI, alone and in combination with clinical variables, to predict ISUP grade (tumor aggressiveness), lymph node involvement (cN), and distant metastasis (cM). A retrospective multicenter cohort from three European sites in the CHAIMELEON project was analyzed. Radiomic features were extracted from prostate zone segmentations and lesion masks, following standardized preprocessing and ComBat harmonization. Feature selection and model optimization were performed using nested cross-validation and Bayesian tuning. Hybrid models were trained using XGBoost and interpreted with SHAP values. The ISUP model achieved an AUC of 0.66, while the cN and cM models reached AUCs of 0.77 and 0.80, respectively. The best-performing models consistently combined prostate zone radiomics with clinical features such as PSA, PI-RADS v2, and ISUP grade. SHAP analysis confirmed the importance of both clinical and texture-based radiomic features, with entropy and non-uniformity measures playing central roles in all tasks. Our results demonstrate the feasibility of using T2-weighted MRI and zonal radiomics for robust prediction of aggressiveness, nodal involvement, and distant metastasis in PCa. This fully automated pipeline offers an interpretable, accessible, and clinically translatable tool for first-line PCa triage, with potential integration into real-world diagnostic workflows.
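Nested cross-validation separates model selection from performance estimation: hyperparameters (tuned here by Bayesian search, per the abstract) are chosen using inner folds built only from each outer training set, and the outer test fold scores only the chosen model. The sketch below shows just the index-splitting structure, assuming unstratified k-fold splits; the fold counts, seeds, and function names are hypothetical, and the tuning, XGBoost training, and ComBat steps are not shown.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, held-out indices)


def kfold(indices: List[int], k: int, seed: int) -> List[Split]:
    """Shuffle indices and return k (train, held-out) splits."""
    idx = indices[:]
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for m, f in enumerate(folds) if m != i for j in f], folds[i])
            for i in range(k)]


def nested_cv(n: int, outer_k: int = 5, inner_k: int = 3,
              seed: int = 0) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_splits); inner splits never touch outer_test."""
    for o, (train, test) in enumerate(kfold(list(range(n)), outer_k, seed)):
        yield train, test, kfold(train, inner_k, seed + o + 1)


for outer_train, outer_test, inner in nested_cv(20):
    # Hypothetical workflow: tune on `inner`, refit on outer_train, score on outer_test.
    print(len(outer_train), len(outer_test), len(inner))
```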

A Deep Learning Vision-Language Model for Diagnosing Pediatric Dental Diseases

Pham, T.

medRxiv preprint, May 22 2025
This study proposes a deep learning vision-language model for the automated diagnosis of pediatric dental diseases, with a focus on differentiating between caries and periapical infections. The model integrates visual features extracted from panoramic radiographs using methods of non-linear dynamics and textural encoding with textual descriptions generated by a large language model. These multimodal features are concatenated and used to train a 1D-CNN classifier. Experimental results demonstrate that the proposed model outperforms conventional convolutional neural networks and standalone language-based approaches, achieving high accuracy (90%), sensitivity (92%), precision (92%), and an AUC of 0.96. This work highlights the value of combining structured visual and textual representations in improving diagnostic accuracy and interpretability in dental radiology. The approach offers a promising direction for the development of context-aware, AI-assisted diagnostic tools in pediatric dental care.
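The model concatenates visual features from panoramic radiographs with features derived from LLM-generated text before a 1D-CNN classifier. A minimal sketch of that early-fusion step is below. It assumes each modality is already a fixed-length numeric vector per image (how the text is embedded is not specified in the abstract), and the per-feature standardization fitted on training rows is an added assumption, not a documented step of the study; all names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = samples, columns = features


def fit_standardizer(rows: Matrix) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population SD, estimated on training rows only."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / n) ** 0.5 for c, m in zip(cols, means)]
    return means, sds


def standardize(rows: Matrix, means: List[float], sds: List[float]) -> Matrix:
    # Constant features (SD == 0) map to 0 instead of dividing by zero.
    return [[(x - m) / s if s else 0.0 for x, m, s in zip(r, means, sds)] for r in rows]


def fuse(visual: Matrix, textual: Matrix) -> List[Matrix]:
    """Concatenate each sample's visual and text features, shaped
    (samples, channels=1, length) as a 1D-CNN input typically expects."""
    return [[v + t] for v, t in zip(visual, textual)]


# Toy features for two images (made-up numbers).
vis, txt = [[1.0, 2.0], [3.0, 4.0]], [[0.5], [1.5]]
x = fuse(standardize(vis, *fit_standardizer(vis)), standardize(txt, *fit_standardizer(txt)))
print(x)
```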

Cardiac Magnetic Resonance Imaging in the German National Cohort: Automated Segmentation of Short-Axis Cine Images and Post-Processing Quality Control

Full, P. M., Schirrmeister, R. T., Hein, M., Russe, M. F., Reisert, M., Ammann, C., Greiser, K. H., Niendorf, T., Pischon, T., Schulz-Menger, J., Maier-Hein, K. H., Bamberg, F., Rospleszcz, S., Schlett, C. L., Schuppert, C.

medRxiv preprint, May 21 2025
Purpose: To develop a segmentation and quality control pipeline for short-axis cardiac magnetic resonance (CMR) cine images from the prospective, multi-center German National Cohort (NAKO).
Materials and Methods: A deep learning model for semantic segmentation, based on the nnU-Net architecture, was applied to full-cycle short-axis cine images from 29,908 baseline participants. The primary objective was to derive structural and functional parameters for both ventricles (LV, RV), including end-diastolic volume (EDV), end-systolic volume (ESV), and LV myocardial mass. Quality control measures included visual assessment of outliers in morphofunctional parameters, inter- and intra-ventricular phase differences, and LV time-volume curves (TVC). Outliers were adjudicated on a five-point rating scale ranging from five (excellent) to one (non-diagnostic), with ratings of three or lower subject to exclusion. The predictive value of the outlier criteria for inclusion and exclusion was analyzed using receiver operating characteristic (ROC) analysis.
Results: The segmentation model generated complete data for 29,609 participants (incomplete in 1.0%), and 5,082 cases (17.0%) were visually assessed. Quality assurance yielded a sample of 26,899 participants with excellent or good quality (89.9%), after excluding 1,875 participants due to image quality issues and 835 due to segmentation quality issues. TVC was the strongest single discriminator between included and excluded participants (AUC: 0.684). Among two-category combinations, pairing TVC with phase differences provided the greatest improvement over TVC alone (AUC difference: 0.044; p<0.001). The best performance was observed when all three categories were combined (AUC: 0.748). When the quality-controlled sample was extended to include acceptable quality ratings, a total of 28,413 (95.0%) participants were available.
Conclusion: The implemented pipeline enabled automated segmentation of an extensive CMR dataset with integrated quality control measures. This methodology reduces the risk of bias in subsequent quantitative analyses.
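The AUC values above describe how well outlier criteria separate excluded from included participants. The AUC equals the probability that a randomly chosen excluded participant receives a higher outlier score than a randomly chosen included one (Mann-Whitney formulation, ties counted as one half). The sketch below assumes a hypothetical per-participant score (e.g., number of flagged criteria) and encodes the stated rule that ratings of three or lower lead to exclusion; the pairwise loop is O(n·m), so a rank-based version would suit cohorts of ~30,000.

```python
from typing import Sequence


def auc_mann_whitney(scores_excluded: Sequence[float],
                     scores_included: Sequence[float]) -> float:
    """P(score of a random excluded case > score of a random included case); ties count 1/2."""
    wins = 0.0
    for p in scores_excluded:
        for n in scores_included:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_excluded) * len(scores_included))


def excluded(rating: int) -> bool:
    # Five-point scale: 5 = excellent ... 1 = non-diagnostic; ratings <= 3 are excluded.
    return rating <= 3


# Hypothetical outlier scores (number of flagged criteria), not NAKO data.
print(auc_mann_whitney([3, 2, 1], [0, 1, 0, 2]))
```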