Large Language Models Rival Physicians in Complex Lung Cancer Decisions

A real-world study reveals that large language models (LLMs) can match or exceed human physicians' performance in decision-making for challenging lung cancer cases, especially rare ones.
Key Details
- 150 challenging lung cancer cases (complex, rare, refractory) were evaluated using blinded, multidimensional scoring by experts.
- LLMs evaluated: DeepSeek R1, Claude 3.5, Gemini 1.5, and GPT-4o. Physician decisions were stratified by experience level, and some junior physicians received AI assistance.
- DeepSeek R1 performed between intermediate and senior physicians overall. LLMs outperformed intermediate physicians in rare cases but lagged in refractory (longitudinal) cases.
- AI-augmented junior physicians saw 80-90% gains in comprehensiveness and specificity for rare cases, but their specificity dropped slightly for refractory cases.
- Error profiling showed that LLMs are strong in knowledge breadth and up-to-date knowledge, while physicians excel in longitudinal reasoning and consistency.
Source
EurekAlert