Multidisciplinary artificial intelligence systems versus single-model approaches for the diagnosis and management of ileus and volvulus.
Authors
Affiliations (3)
- Department of Internal Medicine, Etimesgut Sehit Sait Ertürk State Hospital, Eryaman, Ankara, Turkey.
- Department of Internal Medicine, Hacettepe University, Ankara, Turkey.
- Department of Emergency Medicine, Etimesgut Sehit Sait Ertürk State Hospital, Ankara, Turkey.
Abstract
Accurate and timely differentiation of ileus from volvulus is essential in emergency care, because treatment choices directly influence patient outcomes. This study compared the diagnostic accuracy and guideline adherence of a multidisciplinary AI system with those of a single-model approach and of actual clinical decisions in managing these acute gastrointestinal conditions. We conducted a retrospective analysis of 234 adult patients diagnosed with ileus (n = 120, 51.3%) or volvulus (n = 114, 48.7%) from January 2018 to December 2024. We assessed three approaches: (1) a multidisciplinary AI system comprising GPT-4V (radiology), Med-PaLM (emergency medicine), BioGPT (ICU), and LLaMA 4 (surgical decisions); (2) ChatGPT 5.5.0 as a single multimodal AI system; and (3) actual clinical team decisions as the reference standard. The key outcomes were diagnostic accuracy, decision alignment, and guideline adherence across management areas. The multidisciplinary AI system achieved 94.4% diagnostic accuracy, outperforming ChatGPT 5.0 at 87.2% (p < 0.001). The multidisciplinary system also showed greater guideline adherence for sigmoid volvulus management (96.7% vs. 88.3%, p < 0.01), cecal volvulus surgical intervention (100% vs. 93.3%, p < 0.05), conservative ileus treatment (87.5% vs. 83.3%, p < 0.05), and ICU triage (91.1% vs. 85.2%, p < 0.01). Its surgical decisions aligned more closely with those of the clinical team (92.8% vs. 85.5%, p < 0.001), and its ICU admission predictions were more accurate (90.6% vs. 83.7%, p < 0.001). The multidisciplinary AI system also reached decisions faster: 2.5 ± 5.05 min vs. 3.2 ± 0.016 min for ChatGPT and 15.3 ± 3.2 min for clinical teams (p < 0.01 and p < 0.001, respectively). In this cohort, a multidisciplinary AI system that combined specialized models for different clinical areas significantly outperformed a single AI model in diagnostic accuracy, guideline adherence, and decision alignment for ileus and volvulus. These results indicate that task-specific AI integration could improve clinical decision support, but further validation is needed before routine use.
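As a reading aid, the reported accuracy percentages can be translated into approximate patient counts. The sketch below assumes that both accuracy figures were computed over all 234 patients, which the abstract does not state explicitly. The helper name `implied_correct` is hypothetical, and the counts are back-calculated from rounded percentages rather than taken from the study data.

```python
def implied_correct(accuracy_pct: float, n: int) -> int:
    """Nearest whole number of correct cases implied by a rounded percentage."""
    return round(accuracy_pct / 100 * n)


# Total cohort size reported in the abstract.
N_PATIENTS = 234

# Assumption: both accuracies use the full cohort as the denominator.
multi = implied_correct(94.4, N_PATIENTS)   # multidisciplinary AI system
single = implied_correct(87.2, N_PATIENTS)  # single-model comparator

print(f"multidisciplinary: ~{multi}/{N_PATIENTS} correct")
print(f"single model:      ~{single}/{N_PATIENTS} correct")
print(f"difference:        ~{multi - single} patients")

# Sanity check: the implied counts round back to the reported percentages.
print(round(100 * multi / N_PATIENTS, 1), round(100 * single / N_PATIENTS, 1))
```

Under that assumption, the 7.2 percentage point gap corresponds to a difference of roughly 17 patients in a cohort of this size.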