
Diagnostic Performance of Universal versus Stratified Computer-Aided Detection Thresholds for Chest X-Ray-Based Tuberculosis Screening

Sung, J., Kitonsa, P. J., Nalutaaya, A., Isooba, D., Birabwa, S., Ndyabayunga, K., Okura, R., Magezi, J., Nantale, D., Mugabi, I., Nakiiza, V., Dowdy, D. W., Katamba, A., Kendall, E. A.

medRxiv preprint, Jun 24 2025
Background: Computer-aided detection (CAD) software analyzes chest X-rays for features suggestive of tuberculosis (TB) and provides a numeric abnormality score. However, estimates of CAD accuracy for TB screening are hindered by the lack of confirmatory data among people with lower CAD scores, including those without symptoms. Additionally, the appropriate CAD score thresholds for obtaining further testing may vary according to population and client characteristics.
Methods: We screened for TB in Ugandan individuals aged ≥15 years using portable chest X-rays with CAD (qXR v3). Participants were offered screening regardless of their symptoms. Those with X-ray scores above a threshold of 0.1 (range, 0-1) were asked to provide sputum for Xpert Ultra testing. We estimated the diagnostic accuracy of CAD for detecting Xpert-positive TB when using the same threshold for all individuals (under different assumptions about TB prevalence among people with X-ray scores <0.1), and compared this estimate to age- and/or sex-stratified approaches.
Findings: Of 52,835 participants screened for TB using CAD, 8,949 (16.9%) had X-ray scores ≥0.1. Of 7,219 participants with valid Xpert Ultra results, 382 (5.3%) were Xpert-positive, including 81 with trace results. Assuming 0.1% of participants with X-ray scores <0.1 would have been Xpert-positive if tested, qXR had an estimated AUC of 0.920 (95% confidence interval 0.898-0.941) for Xpert-positive TB. Stratifying CAD thresholds according to age and sex improved accuracy; for example, at 96.1% specificity, estimated sensitivity was 75.0% for a universal threshold (≥0.65) versus 76.9% for thresholds stratified by age and sex (p=0.046).
Interpretation: The accuracy of CAD for TB screening among all screening participants, including those without symptoms or abnormal chest X-rays, is higher than previously estimated. Stratifying CAD thresholds based on client characteristics such as age and sex could further improve accuracy, enabling a more effective and personalized approach to TB screening.
Funding: National Institutes of Health.
Research in context
Evidence before this study: The World Health Organization (WHO) has endorsed computer-aided detection (CAD) as a screening tool for tuberculosis (TB), but the appropriate CAD score that triggers further diagnostic evaluation for tuberculosis varies by population. The WHO recommends determining the appropriate CAD threshold for specific settings and populations, and considering unique thresholds for specific populations, including older age groups, among whom CAD may perform poorly. We performed a PubMed literature search for articles published until September 9, 2024, using the search terms "tuberculosis" AND ("computer-aided detection" OR "computer aided detection" OR "CAD" OR "computer-aided reading" OR "computer aided reading" OR "artificial intelligence"), which yielded 704 articles. Among them, we identified studies that evaluated the performance of CAD for tuberculosis screening and additionally reviewed relevant references. Most prior studies reported areas under the curve (AUCs) ranging from 0.76 to 0.88 but limited their evaluations to individuals with symptoms or abnormal chest X-rays. Some prior studies identified subgroups (including older individuals and people with prior TB) among whom CAD had lower-than-average AUCs, and authors discussed how the prevalence of such characteristics could affect the optimal value of a population-wide CAD threshold; however, none estimated the accuracy that could be gained by adjusting CAD thresholds between individuals based on personal characteristics.
Added value of this study: In this study, all consenting individuals in a high-prevalence setting were offered chest X-ray screening, regardless of symptoms, if they were ≥15 years old, not pregnant, and not on TB treatment. A very low CAD score cutoff (qXR v3 score of 0.1 on a 0-1 scale) was used to select individuals for confirmatory sputum molecular testing, enabling the detection of radiographically mild forms of TB and facilitating comparisons of diagnostic accuracy at different CAD thresholds. With this more expansive, symptom-neutral evaluation of CAD, we estimated an AUC of 0.920, and we found that the qXR v3 threshold would need to fall below 0.1 to meet the WHO target product profile goal of ≥90% sensitivity and ≥70% specificity. Compared with using the same threshold for all participants, adjusting CAD thresholds by age and sex strata increased sensitivity by 1 to 2 percentage points without affecting specificity.
Implications of all the available evidence: To obtain high sensitivity with CAD screening in high-prevalence settings, low score thresholds may be needed. However, countries with a high burden of TB often lack the resources to test all individuals above a low threshold. In such settings, adjusting CAD thresholds based on individual characteristics associated with TB prevalence (e.g., male sex) and those associated with false-positive X-ray results (e.g., older age) could improve the efficiency of TB screening programs.
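To make the universal-versus-stratified comparison concrete, here is a minimal sketch of how expected sensitivity and specificity could be computed when people below the testing cutoff (0.1) were never tested, assuming a fixed prevalence (0.1% by default, as in the abstract) among the untested. The function name `expected_sens_spec`, the stratum labels "A"/"B", the thresholds, and the eight-person dataset are all hypothetical toy values; the sketch does not reproduce the authors' method for choosing stratified thresholds at matched specificity, nor their AUC estimation.

```python
from typing import Optional, Sequence


def expected_sens_spec(
    scores: Sequence[float],
    xpert_results: Sequence[Optional[bool]],
    strata: Sequence[str],
    thresholds: dict[str, float],
    untested_prevalence: float = 0.001,
    testing_cutoff: float = 0.1,
) -> tuple[float, float]:
    """Expected sensitivity and specificity for per-stratum CAD thresholds.

    Hypothetical helper. Each person scoring below `testing_cutoff` was never
    tested, so they contribute `untested_prevalence` of an expected false
    negative and the remainder of an expected true negative.
    """
    if any(t < testing_cutoff for t in thresholds.values()):
        raise ValueError("thresholds below the testing cutoff cannot be evaluated")
    tp = fn = tn = fp = 0.0
    for score, result, stratum in zip(scores, xpert_results, strata):
        if score < testing_cutoff:
            # Untested: split into expected positives and negatives.
            fn += untested_prevalence
            tn += 1.0 - untested_prevalence
            continue
        if result is None:
            # Eligible for testing but no valid Xpert result: excluded (assumption).
            continue
        flagged = score >= thresholds[stratum]
        if result:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: None marks people below the 0.1 cutoff who were never tested.
scores = [0.05, 0.05, 0.3, 0.7, 0.9, 0.5, 0.2, 0.65]
results = [None, None, False, True, True, True, False, False]
strata = ["A", "B", "A", "B", "A", "A", "B", "B"]

universal = expected_sens_spec(scores, results, strata, {"A": 0.6, "B": 0.6})
stratified = expected_sens_spec(scores, results, strata, {"A": 0.45, "B": 0.7})
print([round(v, 3) for v in universal])   # [0.666, 0.8]
print([round(v, 3) for v in stratified])  # [0.999, 1.0]
```

In this toy example the stratified thresholds happen to improve both metrics; in real data, the gain comes from trading thresholds between strata while holding overall specificity fixed.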

CEREBLEED: Automated quantification and severity scoring of intracranial hemorrhage on non-contrast CT

Cepeda, S., Esteban-Sinovas, O., Arrese, I., Sarabia, R.

medRxiv preprint, Jun 13 2025
Background: Intracranial hemorrhage (ICH), whether spontaneous or traumatic, is a neurological emergency with high morbidity and mortality. Accurate assessment of severity is essential for neurosurgical decision-making. This study aimed to develop and evaluate a fully automated, deep learning-based tool for the standardized assessment of ICH severity, based on segmentation of the hemorrhage and intracranial structures and computation of an objective severity index.
Methods: Non-contrast cranial CT scans from patients with spontaneous or traumatic ICH were retrospectively collected from public datasets and a tertiary care center. Deep learning models were trained to segment hemorrhages and intracranial structures. These segmentations were used to compute a severity index reflecting bleeding burden and mass effect through volumetric relationships. Segmentation performance was evaluated on a hold-out test cohort. In a prospective cohort, the severity index was assessed in relation to expert-rated CT severity, clinical outcomes, and the need for urgent neurosurgical intervention.
Results: A total of 1,110 non-contrast cranial CT scans were analyzed, 900 from the retrospective cohort and 200 from the prospective evaluation cohort. The binary segmentation model achieved a median Dice score of 0.90 for total hemorrhage. The multilabel model yielded Dice scores ranging from 0.55 to 0.94 across hemorrhage subtypes. The severity index significantly correlated with expert-rated CT severity (p < 0.001), the modified Rankin Scale (p = 0.007), and the Glasgow Outcome Scale-Extended (p = 0.039), and independently predicted the need for urgent surgery (p < 0.001). A severity index threshold of approximately 300 was identified as a decision point for surgical management (AUC = 0.83).
Conclusion: We developed a fully automated and openly accessible pipeline for the analysis of non-contrast cranial CT in intracranial hemorrhage. It computes a novel index that objectively quantifies hemorrhage severity and is significantly associated with clinically relevant outcomes, including the need for urgent neurosurgical intervention.
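For readers less familiar with the segmentation metric reported above, the Dice score compares a predicted mask A with a reference mask B as 2|A ∩ B| / (|A| + |B|). The sketch below computes it for binary masks stored as sets of voxel coordinates, takes the median across cases, and converts a mask to a volume in millilitres from voxel spacing. The helper names `dice` and `volume_ml`, the empty-mask convention, and the toy masks are illustrative assumptions; the abstract does not give the formula for the CEREBLEED severity index, so it is not reproduced here.

```python
from statistics import median
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]


def dice(pred: Iterable[Voxel], truth: Iterable[Voxel]) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    a, b = set(pred), set(truth)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def volume_ml(mask: Iterable[Voxel], spacing_mm: Tuple[float, float, float]) -> float:
    """Mask volume in mL: voxel count x voxel volume (mm^3), divided by 1000."""
    dx, dy, dz = spacing_mm
    return len(set(mask)) * dx * dy * dz / 1000.0


# Toy masks (hypothetical), each voxel given as (slice, row, column).
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
truth = {(0, 0, 1), (0, 1, 0), (1, 1, 0)}

cases = [(pred, truth), (truth, truth), (pred, set())]
per_case = [dice(p, t) for p, t in cases]  # roughly [0.667, 1.0, 0.0]
print(median(per_case))
print(volume_ml(truth, (0.5, 0.5, 5.0)))
```

Reporting the median rather than the mean is common for Dice because a few small or missed bleeds can pull the mean down sharply.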

Slide-free surface histology enables rapid colonic polyp interpretation across specialties and foundation AI

Yong, A., Husna, N., Tan, K. H., Manek, G., Sim, R., Loi, R., Lee, O., Tang, S., Soon, G., Chan, D., Liang, K.

medRxiv preprint, Jun 11 2025
Colonoscopy is a mainstay of colorectal cancer screening and has helped to lower cancer incidence and mortality. The resection of polyps during colonoscopy is critical for tissue diagnosis and prevention of colorectal cancer, albeit at the cost of increased resource requirements and expense. Discarding resected benign polyps without histopathological processing and confirmatory diagnosis, known as the resect-and-discard strategy, could enhance efficiency but is not commonly practiced because of endoscopists' predominant preference for pathological confirmation. The inaccessibility of histopathology from unprocessed resected tissue hampers endoscopic decisions. We show that intraprocedural fibre-optic microscopy with ultraviolet-C surface excitation (FUSE) of polyps post-resection enables rapid diagnosis, potentially complementing endoscopic interpretation and incorporating pathologist oversight. In a clinical study of 28 patients, slide-free FUSE microscopy of freshly resected polyps yielded mucosal views that greatly magnified the surface patterns observed on endoscopy and revealed previously unavailable histopathological signatures. We term this new cross-specialty readout surface histology. In blinded interpretations of 42 polyps (19 training, 23 reading) by endoscopists and pathologists of varying experience, surface histology differentiated normal/benign, low-grade dysplasia, and high-grade dysplasia and cancer, with 100% performance in classifying high/low risk. This FUSE dataset was also successfully interpreted by foundation AI models pretrained on histopathology slides, illustrating the potential for these models not only to expedite conventional pathology tasks but also to provide instant expert feedback autonomously during procedures that typically lack pathologists. Surface histology readouts during colonoscopy promise to empower endoscopist decisions and broadly enhance confidence and participation in the resect-and-discard strategy.
One-sentence summary: Rapid microscopy of resected polyps during colonoscopy yielded accurate diagnoses, promising to enhance colorectal screening.

Novel Deep Learning Framework for Simultaneous Assessment of Left Ventricular Mass and Longitudinal Strain: Clinical Feasibility and Validation in Patients with Hypertrophic Cardiomyopathy

Park, J., Yoon, Y. E., Jang, Y., Jung, T., Jeon, J., Lee, S.-A., Choi, H.-M., Hwang, I.-C., Chun, E. J., Cho, G.-Y., Chang, H.-J.

medRxiv preprint, May 23 2025
Background: This study aims to present the Segmentation-based Myocardial Advanced Refinement Tracking (SMART) system, a novel artificial intelligence (AI)-based framework for transthoracic echocardiography (TTE) that combines motion tracking and left ventricular (LV) myocardial segmentation for automated assessment of LV mass (LVM) and global longitudinal strain (LVGLS).
Methods: The SMART system performs LV speckle tracking based on motion vector estimation, refined by structural information from endocardial and epicardial segmentation throughout the cardiac cycle. This approach enables automated measurement of LVM_SMART and LVGLS_SMART. The feasibility of SMART was validated in 111 patients with hypertrophic cardiomyopathy (HCM) (median age: 58 years; 69% male) who underwent TTE and cardiac magnetic resonance imaging (CMR).
Results: LVGLS_SMART showed a strong correlation with conventional manual LVGLS measurements (Pearson's correlation coefficient [PCC] 0.851; mean difference 0 [-2 to 0]). When CMR was used as the reference standard for LVM, the conventional dimension-based TTE method overestimated LVM (PCC 0.652; mean difference 106 [90 to 123]), whereas LVM_SMART showed excellent agreement with CMR (PCC 0.843; mean difference 1 [-11 to 13]). For predicting extensive myocardial fibrosis, LVGLS_SMART and LVM_SMART performed comparably to conventional LVGLS and CMR (AUC 0.72 and 0.66, respectively). Patients identified as high-risk for extensive fibrosis by LVGLS_SMART and LVM_SMART had significantly higher rates of adverse outcomes, including heart failure hospitalization, new-onset atrial fibrillation, and defibrillator implantation.
Conclusions: The SMART technique provides LVGLS evaluation comparable to, and LVM assessment more accurate than, conventional TTE, with predictive value for myocardial fibrosis and adverse outcomes. These findings support its utility in HCM management.
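The agreement statistics above pair a correlation coefficient with a mean difference between two measurement methods. As an illustration only, the sketch below computes Pearson's r and a Bland-Altman-style mean difference (bias) with 95% limits of agreement (bias ± 1.96 SD of the differences). The abstract does not state what the bracketed ranges represent, so treating them as limits of agreement is an assumption; the function names and the four paired values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired measurements."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bland_altman(
    method: Sequence[float], reference: Sequence[float]
) -> Tuple[float, Tuple[float, float]]:
    """Mean difference (method - reference) and 95% limits of agreement."""
    diffs = [m - r for m, r in zip(method, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical paired LV mass values (arbitrary units): new method vs. reference.
method = [102.0, 98.0, 110.0, 95.0]
reference = [100.0, 100.0, 105.0, 95.0]

print(round(pearson_r(method, reference), 3))
bias, (lower, upper) = bland_altman(method, reference)
print(bias, round(lower, 2), round(upper, 2))
```

Correlation and bias answer different questions: a method can correlate strongly with the reference yet still be systematically offset, which is what a large mean difference with a moderate PCC, as reported for dimension-based TTE, indicates.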

The effect of medical explanations from large language models on diagnostic decisions in radiology

Spitzer, P., Hendriks, D., Rudolph, J., Schläger, S., Ricke, J., Kühl, N., Hoppe, B., Feuerriegel, S.

medRxiv preprint, May 18 2025
Large language models (LLMs) are increasingly used by physicians for diagnostic support. A key advantage of LLMs is the ability to generate explanations that can help physicians understand the reasoning behind a diagnosis. However, the best-suited format for LLM-generated explanations remains unclear. In this large-scale study, we examined the effect of different formats for LLM explanations on clinical decision-making. For this, we conducted a randomized experiment with radiologists reviewing patient cases with radiological images (N = 2,020 assessments). Participants received either no LLM support (control group) or one of three LLM-generated explanations: (1) a standard output providing the diagnosis without explanation; (2) a differential diagnosis comparing multiple possible diagnoses; or (3) a chain-of-thought explanation offering a detailed reasoning process for the diagnosis. We find that the format of explanations significantly influences diagnostic accuracy. The chain-of-thought explanations yielded the best performance, improving diagnostic accuracy by 12.2% compared with the control condition without LLM support (P = 0.001). The chain-of-thought explanations were also superior to the standard output without explanation (+7.2%; P = 0.040) and the differential diagnosis format (+9.7%; P = 0.004). We further assessed the robustness of these findings across levels of case difficulty and physician backgrounds (e.g., general vs. specialized radiologists). Evidently, explaining the reasoning for a diagnosis helps physicians to identify and correct potential errors in LLM predictions and thus improve overall decisions. Altogether, the results highlight the importance of how explanations in medical LLMs are generated to maximize their utility in clinical practice. By designing explanations to support the reasoning processes of physicians, LLMs can improve diagnostic performance and, ultimately, patient outcomes.