Page 8 of 6036030 results

Kushan Choudhury, Shubhrodeep Roy, Ankur Chanda, Shubhajit Biswas, Somenath Kuiry

arXiv preprint, Oct 22 2025
Deep learning models, especially convolutional neural networks, have achieved impressive results in medical image classification. However, these models often produce overconfident predictions, which can undermine their reliability in critical healthcare settings. While traditional label smoothing offers a simple way to reduce such overconfidence, it fails to consider relationships between classes by treating all non-target classes equally. In this study, we explore the use of Online Label Smoothing (OLS), a dynamic approach that adjusts soft labels throughout training based on the model's own prediction patterns. We evaluate OLS on the large-scale RadImageNet dataset using three widely used architectures: ResNet-50, MobileNetV2, and VGG-19. Our results show that OLS consistently improves both Top-1 and Top-5 classification accuracy compared to standard training methods, including hard labels, conventional label smoothing, and teacher-free knowledge distillation. In addition to accuracy gains, OLS leads to more compact and well-separated feature embeddings, indicating improved representation learning. These findings suggest that OLS not only strengthens predictive performance but also enhances calibration, making it a practical and effective solution for developing trustworthy AI systems in the medical imaging domain.
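The OLS mechanism described above can be sketched in a few lines: during each epoch, accumulate the predicted distributions of correctly classified samples per target class, then use the normalized rows as the next epoch's soft labels. A minimal NumPy sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def ols_epoch_soft_labels(probs, targets, num_classes, eps=1e-12):
    # Accumulate the predicted distributions of correctly classified samples,
    # grouped by target class, then normalize each row to sum to 1.
    S = np.zeros((num_classes, num_classes))
    for p, y in zip(probs, targets):
        if p.argmax() == y:
            S[y] += p
    row_sums = S.sum(axis=1, keepdims=True)
    # Classes with no correct predictions fall back to uniform soft labels.
    uniform = np.full(num_classes, 1.0 / num_classes)
    return np.where(row_sums > eps, S / np.maximum(row_sums, eps), uniform)
```

In training, cross-entropy against these soft labels replaces (or is mixed with) the hard-label loss in the following epoch, so the targets adapt to the model's own confusion patterns.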

Shiqi Dai, Wei Dai, Jiaee Cheong, Paul Pu Liang

arXiv preprint, Oct 22 2025
Medical artificial intelligence systems have achieved remarkable diagnostic capabilities, yet they consistently exhibit performance disparities across demographic groups, causing real-world harm to underrepresented populations. While recent multimodal reasoning foundation models have advanced clinical diagnosis through integrated analysis of diverse medical data, reasoning training via reinforcement learning inherits and often amplifies biases present in training datasets dominated by majority populations. We introduce Fairness-aware Group Relative Policy Optimization (FairGRPO), a hierarchical reinforcement learning approach that promotes equitable learning across heterogeneous clinical populations. FairGRPO employs adaptive importance weighting of advantages based on representation, task difficulty, and data source. To address the common issue of missing demographic labels in the clinical domain, we further employ unsupervised clustering, which automatically discovers latent demographic groups when labels are unavailable. Through comprehensive experiments across 7 clinical diagnostic datasets spanning 5 imaging modalities (X-ray, CT, dermoscopy, mammography, and ultrasound), we demonstrate that FairGRPO reduces predictive parity by 27.2% against all vanilla and bias-mitigated RL baselines, while improving F1 score by 12.49%. Furthermore, training dynamics analysis reveals that FairGRPO progressively improves fairness throughout optimization, while baseline RL methods exhibit deteriorating fairness as training progresses. Based on FairGRPO, we release FairMedGemma-4B, a fairness-aware clinical VLLM that achieves state-of-the-art performance while demonstrating significantly reduced disparities across demographic groups.
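The core idea — reweighting policy-gradient advantages so minority groups are not drowned out — can be illustrated with a simple inverse-frequency scheme. This is a generic sketch of the representation term only (the paper's adaptive weighting also uses task difficulty and data source, and the function names are hypothetical):

```python
import numpy as np

def fair_group_weights(group_ids):
    # Inverse-frequency weights, normalized so they average 1 across groups.
    groups, counts = np.unique(group_ids, return_counts=True)
    inv = 1.0 / counts
    w = inv / inv.sum() * len(groups)
    lookup = dict(zip(groups, w))
    return np.array([lookup[g] for g in group_ids])

def weighted_advantages(advantages, group_ids):
    # Scale each sample's advantage by its group weight before the policy update,
    # so gradient signal from underrepresented groups is amplified.
    return advantages * fair_group_weights(group_ids)
```

With three samples from group 0 and one from group 1, the lone group-1 sample receives three times the weight of each group-0 sample.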

Chen P, Wang J, Guo Y, Wang Y

PubMed paper, Oct 22 2025
Deformable image registration is essential in medical image analysis. The state-of-the-art approaches are unsupervised methods based on convolutional neural networks (CNNs) and vision transformers (ViTs). While CNNs excel at extracting local features, ViTs are better at capturing global features. This study compared the performance of CNNs and ViTs in unsupervised deformable image registration. We proposed a unified registration framework and evaluated both architectures in experiments on 4D-CT data. The results showed that ViT-based registration achieved superior performance compared to CNN-based methods, indicating that vision transformer architectures are more effective than convolutional networks for unsupervised deformable registration on 4D-CT data. Foundation Item: This work is supported by the National Natural Science Foundation of China (No. 61801413).
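At the heart of any unsupervised deformable-registration network — CNN or ViT — is a spatial transformer that warps the moving image by a predicted dense displacement field. A minimal 2-D nearest-neighbour version for illustration (real frameworks use differentiable linear interpolation in 3-D):

```python
import numpy as np

def warp_2d(img, flow):
    # Nearest-neighbour warp of a 2-D image by a dense displacement field.
    # flow has shape (2, H, W): per-pixel (dy, dx) sampling offsets, as a
    # registration network would predict for the moving image.
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    y = np.clip(np.round(ys + flow[0]).astype(int), 0, H - 1)
    x = np.clip(np.round(xs + flow[1]).astype(int), 0, W - 1)
    return img[y, x]
```

Training then minimizes a similarity loss between the warped moving image and the fixed image, plus a smoothness penalty on the flow — no ground-truth deformations needed, which is what makes the approach unsupervised.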

Löffler MT, Joseph GB, Lynch JA, Lane NE, Pedoia V, Majumdar S, Nevitt M, McCulloch C, Link TM

PubMed paper, Oct 22 2025
We investigated whether cartilage composition and thickness, and their change over time, were associated with future intermittent and constant knee pain. Osteoarthritis Initiative participants with 3T MRI scans from baseline to 36-month visits were selected. Outcomes were Intermittent and Constant Osteoarthritis Pain (ICOAP) scores in the right knee at the 48-month visit (0 to 100 = highest pain). We measured T<sub>2</sub> values and cartilage thickness in 5 regions in the right knee from baseline to 36 months using deep-learning-based segmentation. Associations between baseline and change in cartilage biomarkers and pain scores were tested using adjusted logistic and linear regression models. Of 3780 included participants, 1042 (28%) had symptomatic knee OA in any knee at baseline. At 48 months, 1671 (44%) had intermittent and 265 (7%) constant pain in the right knee. Odds of having intermittent knee pain increased with longer baseline T<sub>2</sub> values in medial and lateral femoral cartilage (OR [95% CI]: 1.05 [1.02–1.08] and 1.06 [1.03–1.09] per 1 ms longer) and thinner baseline patellar cartilage (0.65 [0.53–0.81] per 1 mm thicker). Greater annual rates of patellar cartilage thinning were associated with higher odds of constant knee pain (93.4 [7.66–1139] per 1 mm/yr greater). Among those with knee pain, greater annual rates of increase in medial and lateral tibial cartilage T<sub>2</sub> led to more intermittent knee pain (percent change [95% CI]: 8.02 [2.87–13.4] and 7.85 [3.39–12.5] per 1 ms/yr greater). Thicker lateral tibial cartilage at baseline led to less constant knee pain (beta coefficient [95% CI]: -11.8 [-19.8 to -3.76] per 1 mm thicker). Impaired femoral cartilage composition, indicated by longer T<sub>2</sub> values, preceded the intermittent knee pain found in early-stage OA. Constant knee pain, characteristic of late-stage OA, was related to greater cartilage thickness loss. The online version contains supplementary material available at 10.1186/s13075-025-03667-9.
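For readers unpacking the odds ratios above: an OR is reported per unit of the predictor and scales multiplicatively for larger differences. A short worked example under the reported OR of 1.05 per 1 ms longer T2:

```python
import math

or_per_ms = 1.05                 # reported OR per 1 ms longer T2
beta = math.log(or_per_ms)       # the underlying logistic-regression coefficient
or_per_5ms = math.exp(5 * beta)  # OR for a 5 ms difference = 1.05 ** 5
```

So a 5 ms longer baseline T2 corresponds to roughly 28% higher odds of intermittent pain, holding the model's covariates fixed.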

Wang W, Li D, Luo F, Zeng W, Dan Q, Dai C, Hu Y, Zhong J

PubMed paper, Oct 22 2025
Osteoporosis (OP), characterized by bone mineral density (BMD) loss and microstructural deterioration, remains underdiagnosed due to the limitations of conventional methods (DXA/QCT). Early and accurate diagnosis of OP is crucial for optimizing treatment strategies and improving prognosis. To develop and validate a predictive model integrating clinical data, MRI radiomics, and Vision Transformer (ViT) features for enhanced diagnosis and risk assessment of OP. This retrospective dual-center study enrolled 1,095 patients with chronic low back pain (median age: 69 years; 60% female). We developed a 3D ViT model using combined T1WI and T2WI lumbar MRI, simultaneously extracting ViT-based deep features and radiomic features from segmented L1-L3 vertebrae. Feature selection was performed using t-tests and LASSO regression. Logistic regression classifiers were constructed to compare standalone ViT and radiomics models, followed by an integrated model incorporating clinical variables, radiomic features, and ViT features. Model performance was assessed using AUC, accuracy, sensitivity, specificity, F1 score, precision, confusion matrices, calibration curves, and decision curve analysis (DCA). Interpretability was achieved through a clinical nomogram and SHAP visualization. Among 1,095 patients (age 69 [9] years; 657 [60%] female), age and gender emerged as clinical risk factors. The MRI-based ViT model achieved higher AUCs than the radiomics model in both internal (0.844 vs. 0.697) and external (0.745 vs. 0.654) test sets. The combined model demonstrated superior performance with AUCs of 0.855 (internal) and 0.806 (external). The combined model significantly improves OP diagnostic accuracy and clinical utility, with ViT features critically enhancing predictive performance, establishing a promising tool for OP diagnosis and management. The online version contains supplementary material available at 10.1186/s12880-025-01960-2.
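The t-test filtering step described above (applied before LASSO) simply ranks features by between-class separation and keeps the strongest ones. A self-contained Welch-statistic version, illustrative rather than the authors' code:

```python
import numpy as np

def ttest_filter(X, y, k):
    # Welch t-statistic per feature between the two classes; keep the
    # indices of the top-k features by absolute t value.
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    v0, v1 = X0.var(axis=0, ddof=1), X1.var(axis=0, ddof=1)
    t = (m1 - m0) / np.sqrt(v0 / len(X0) + v1 / len(X1) + 1e-12)
    return np.argsort(-np.abs(t))[:k]
```

The surviving features would then go into the LASSO stage, which handles correlated radiomic and ViT features jointly.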

Wang Y, Chen HJ, Cheng Y, Xie Y, Cheng Y, Zhao S, Jiang Y, Bai T, Huo Y, Wang K, Zhang M, Huang W, Feng G, Han Y, Shu N

PubMed paper, Oct 22 2025
Alzheimer's disease (AD), the most prevalent neurodegenerative disorder, is marked by the accumulation of amyloid-β (Aβ) plaques. Although cerebral Aβ positron emission tomography (Aβ-PET) remains the gold standard for assessing cerebral Aβ burden, its clinical utility is hindered by cost, radiation exposure, and limited availability. Plasma biomarkers have emerged as promising, non-invasive indicators of Aβ pathology, yet they do not incorporate individual genetic risk or neuroanatomical context. To address this gap, we developed a multimodal machine-learning framework that integrates plasma biomarkers, MRI-derived brain structural features (regional volumes, cortical thickness, cortical area, and structural connectivity), and genetic risk profiles to predict cerebral Aβ burden. This approach was evaluated in 150 participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and 101 participants from the Chinese Sino Longitudinal Study of Cognitive Decline (SILCODE). Incorporating multimodal features substantially improved predictive performance: the baseline model using plasma and clinical variables alone achieved an R<sup>2</sup> of 0.56, whereas integrating neuroimaging and genetic information increased accuracy (R<sup>2</sup> = 0.63 with apolipoprotein E genotypes and R<sup>2</sup> = 0.64 with polygenic risk scores). Furthermore, a multiclass classifier trained on the same multimodal features achieved robust discrimination of cognitive status, with area-under-the-curve values of 0.87 for normal controls, 0.76 for mild cognitive impairment, and 0.95 for AD dementia. These findings highlight the value of combining plasma, imaging, and genetic data to non-invasively estimate cerebral Aβ burden, offering a potential alternative to PET imaging for early AD risk assessment.
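The R² values quoted above (0.56 rising to 0.63–0.64 with genetic features) measure the fraction of variance in Aβ burden the model explains. For reference, the metric itself is just:

```python
import numpy as np

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

A constant predictor at the mean scores 0 and a perfect predictor scores 1, so the move from 0.56 to 0.64 is a meaningful reduction in unexplained variance.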

Nowroozi, A., Bondarenko, M., Serapio, A., Schnitzler, T., Brar, S. S., Sohn, J. H.

medRxiv preprint, Oct 21 2025
Purpose: To investigate the performance of LLMs on radiology numerical tasks and perform a comprehensive error analysis. Materials and Methods: We defined six tasks: extracting (1) the minimum T-score from a DEXA report, (2) the maximum common bile duct (CBD) diameter from an ultrasound report, and (3) the maximum lung nodule size from a CT report; and judging (1) the presence of a highly hypermetabolic region on a PET report, (2) whether a patient is osteoporotic based on a DEXA report, and (3) whether a patient has a dilated CBD based on an ultrasound report. Reports were extracted from the MIMIC-III and our institution's databases, and the ground truths were extracted manually. The models used were Llama 3.1 8B, DeepSeek-R1 distilled Llama 8B, OpenAI o1-mini, and OpenAI GPT-5-mini. We manually reviewed all incorrect outputs and performed a comprehensive error analysis. Results: In extraction tasks, while Llama showed relatively variable results (ranging 86%-98.7%) across tasks, the other models performed consistently well (accuracies >95%). In judgement tasks, the lowest accuracies of Llama, DeepSeek, o1-mini, and GPT-5-mini were 62.0%, 91.7%, 91.7%, and 99.0%, respectively, while o1-mini and GPT-5-mini reached 100% performance in detecting osteoporosis. We found no mathematical errors in the outputs of o1-mini and GPT-5-mini. An answer-only output format significantly reduced performance in Llama and DeepSeek but not in o1-mini or GPT-5-mini. Conclusion: True reasoning models perform consistently well on radiology numerical tasks and show no mathematical errors. Simpler non-reasoning models may also achieve acceptable performance depending on the task.
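As a baseline for extraction tasks like these, the same measurements can often be pulled with a rule-based pass before reaching for an LLM. A hypothetical regex sketch for the maximum CBD diameter (pattern and function name are illustrative, not from the study):

```python
import re

def max_cbd_mm(report):
    # Hypothetical rule-based extractor: largest millimeter value mentioned
    # in the same sentence fragment as "CBD" / "common bile duct".
    pattern = r'(?:CBD|common bile duct)[^.]*?(\d+(?:\.\d+)?)\s*mm'
    vals = [float(v) for v in re.findall(pattern, report, flags=re.I)]
    return max(vals) if vals else None
```

Such baselines are brittle against phrasing variation and unit changes, which is exactly the gap LLM-based extraction aims to close.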

Hod EA, Habeck C, Zhuang H, Dimov A, Spincemaille P, Kessler D, Bitan ZC, Feit Y, Fliginger D, Stone EF, Roh D, Eisler L, Dashnaw S, Caccappolo E, McMahon DJ, Stern Y, Wang Y, Spitalnik SL, Brittenham GM

PubMed paper, Oct 21 2025
Blood donation increases the risk of iron deficiency, but its impact on brain iron, myelination, and neurocognition remains unclear. This ancillary study enrolled 67 iron-deficient blood donors, 19-73 years of age, participating in a double-blind, randomized trial. After donating blood, positive and negative susceptibility were measured using Quantitative Susceptibility Mapping (QSM) magnetic resonance imaging (MRI) to estimate brain iron and myelin levels, respectively. Furthermore, neurocognitive function was evaluated using the NIH Toolbox, and neural network activation patterns were assessed during neurocognitive tasks using functional MRI (fMRI). Donors were randomized to intravenous iron repletion (one gram of iron) or placebo, and outcome measures were repeated approximately four months later. Iron repletion corrected systemic iron deficiency and led to trends toward increased whole-brain iron (P=0.04) and myelination (P=0.02), with no change in the placebo group. Although overall cognitive performance did not differ significantly between groups, iron-treated participants showed improved engagement of functional neural networks (e.g., memory pattern activation during speed tasks, P<0.001). Brain region-specific changes in iron and myelin correlated with cognitive performance: iron in the putamen correlated with working memory scores (P<0.01), and thalamic myelination correlated with attention and inhibitory control (P<0.01). Iron repletion in iron-deficient blood donors may influence brain iron, myelination, and function, with region-specific changes in iron and myelination linked to distinct cognitive domains. ClinicalTrials.gov: NCT02990559. Supported by NIH grants HL133049, HL139489, and UL1TR001873.

Nampim N, Panyarak W, Suttapak W, Wantanajittikul K, Nirunsittirat A, Wattanarat O

PubMed paper, Oct 21 2025
To evaluate the performance of a deep learning-based convolutional neural network (CNN) algorithm and compare it with that of general dentists and dental students in classifying caries lesions in primary teeth using bitewing radiographs. A total of 1400 bitewing radiographs (4715 tooth images) were divided into training, validation, and testing datasets, with carious lesions classified into four and seven classes. After training, the best-performing ResNet model was selected and compared with three general dentists and three dental students via a reference test. The accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), macro F1 score, area under the curve (AUC), and confusion matrices were evaluated. ResNet-152 outperformed ResNet-50 and ResNet-101 in the validation process. In the 4-class classification, ResNet-152 and Dental student 1 achieved accuracies exceeding 0.7, while most examiners ranged between 0.62 and 0.67. Only Dental student 1 and ResNet-152 achieved specificity and NPV values of 0.9 or higher. ResNet-152 and most examiners exhibited lower sensitivity for initial and moderate lesions than for extensive lesions. In the 7-class classification, the accuracy ranged from 0.37 to 0.58, with the best-performing comparators (Dental student 1, ResNet-152, and General dentist 3) exceeding 0.5. The sensitivity, PPV, and macro F1 score followed similar trends. ResNet-152 achieved a favourable AUC of 0.85. ResNet-152 performed comparably to the leading human comparators, general dentists and dental students, demonstrating favourable performance in caries lesion classification. CNNs could serve as a second option in caries lesion classification, potentially leading to improved treatment decisions.
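The macro F1 score used here averages per-class F1 values, so rare lesion classes count as much as common ones — an important property when extensive lesions are scarcer than initial ones. A minimal reference implementation:

```python
import numpy as np

def macro_f1(y_true, y_pred, num_classes):
    # Unweighted mean of per-class F1 scores, so every class counts equally.
    f1s = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))
```

This explains why 7-class macro F1 can drop well below overall accuracy: a few poorly detected lesion grades pull the average down even when the majority classes are handled well.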

Schluessel S, Mueller B, Drey M, Tausendfreund O, Rippl M, Deissler L, Martini S, Schmidmaier R, Stoecklein S, Ingrisch M

PubMed paper, Oct 21 2025
Routine imaging could provide valuable support in identifying patients with sarcopenia. One of the most common radiological examinations, especially in geriatric inpatient care, is thoracic CT. It would therefore be desirable to derive muscle volumes from these images using automated body composition analysis. The aim of this study was to determine the muscle volumes of geriatric patients and to investigate how closely these correspond to one of the current reference standards in diagnosing sarcopenia, Dual-energy X-ray Absorptiometry (DXA). This retrospective study included 208 geriatric patients (mean age: 81 ± 7 years, 53.4% women) treated at the Acute Geriatric Ward at LMU University Hospital between 2015 and 2022. All participants underwent DXA measurement to assess appendicular skeletal muscle mass (ASM). Pretrained deep learning models were used to analyze body composition from routinely obtained thoracic CT images. Correlations between CT and DXA data were calculated using Pearson correlations, taking into account different normalization variants (height<sup>2</sup>, weight, bone volume, and total volume). Multivariable linear regression analysis was performed to predict DXA-measured ASM. Women and men differed significantly in bone volume, muscle volume, and intramuscular fat. A reliable correlation was found between muscle volume from CT-thorax analysis and ASM from DXA, especially for absolute muscle volume (r = 0.669, p < 0.001) and muscle volume normalized to height<sup>2</sup> (r = 0.529, p < 0.001). In regression analysis, CT muscle volume alone explained 44.5% of the variance in ASM (R² = 0.445, p < 0.001). When body weight was added, the model's explanatory power increased significantly to 68.9% (R² = 0.689, p < 0.001). The fully adjusted model, which included height, age, and sex, further improved the explained variance only slightly (R² = 0.713, p < 0.001). Among all predictors, body weight showed the strongest effect, followed by CT muscle volume, while sex had no significant influence. The results show that automated analysis of thoracic CT scans is a useful method for determining muscle volume and agrees well with DXA. Furthermore, the predictive value of CT muscle volume is significantly enhanced in combination with anthropometric parameters, particularly body weight. Further prospective studies are required to validate the findings and refine CT-based sarcopenia diagnostics.
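The jump in explained variance when body weight joins CT muscle volume (R² 0.445 → 0.689) is a standard nested-model comparison: refit ordinary least squares with and without the extra predictor and compare R². A generic sketch on toy data, not the study's code:

```python
import numpy as np

def lin_r2(X, y):
    # Ordinary least squares with intercept; returns in-sample R^2
    # (fraction of variance in y explained by the predictors).
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean())))
```

On data where the outcome depends on two predictors, the one-predictor model's R² is strictly lower than the two-predictor model's, mirroring the ASM result above.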
