Bosniak classification of renal cysts using large language models: a comparative study.

August 24, 2025

DOI: 10.1007/s00117-025-01499-x PMID: 40851045

Authors

Hacibey I, Kaba E

Affiliations (2)

Department of Urology, Basaksehir Çam and Sakura City Hospital, Istanbul, Turkey.
Department of Radiology, Recep Tayyip Erdogan University, Rize, Turkey.

Abstract

The Bosniak classification system is widely used to assess malignancy risk in renal cystic lesions, yet inter-observer variability poses significant challenges. Large language models (LLMs) may offer a standardized approach to classification when provided with textual descriptions, such as those found in radiology reports. This study evaluated the performance of five LLMs, namely GPT-4 (ChatGPT), Gemini, Copilot, Perplexity, and NotebookLM, in classifying renal cysts based on synthetic textual descriptions mimicking CT report content. A synthetic dataset of 100 diagnostic scenarios (20 cases per Bosniak category) was constructed using established radiological criteria. GPT-4, Gemini, Copilot, and Perplexity were evaluated using zero-shot and few-shot prompting strategies, while NotebookLM employed retrieval-augmented generation (RAG). Performance metrics included accuracy, sensitivity, and specificity. Statistical significance was assessed using McNemar's and chi-squared tests. GPT-4 achieved the highest accuracy (87% zero-shot, 99% few-shot), followed by Copilot (81–86%), Gemini (55–69%), and Perplexity (43–69%). NotebookLM, tested only under RAG conditions, reached 87% accuracy. Few-shot learning significantly improved performance (p < 0.05). Classification of Bosniak IIF lesions remained challenging across models. When provided with well-structured textual descriptions, LLMs can accurately classify renal cysts. Few-shot prompting significantly enhances performance. However, persistent difficulties in classifying borderline lesions such as Bosniak IIF highlight the need for further refinement and real-world validation.
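
The metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the one-vs-rest treatment of each Bosniak category, the choice of the exact (binomial) variant of McNemar's test, and the six toy labels are all assumptions made for illustration, and none of the values are study data.

```python
from math import comb

# Assumption: the five Bosniak categories are handled as plain string labels.
CATEGORIES = ["I", "II", "IIF", "III", "IV"]


def accuracy(truth, predicted):
    """Fraction of cases where the predicted category equals the reference."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def per_class_metrics(truth, predicted, category):
    """One-vs-rest sensitivity and specificity for a single category."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    tn = sum(t != category and p != category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def mcnemar_exact(truth, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two paired sets of predictions."""
    # Only discordant pairs carry information:
    # b = A correct and B wrong, c = A wrong and B correct.
    b = sum(a == t and x != t for t, a, x in zip(truth, pred_a, pred_b))
    c = sum(a != t and x == t for t, a, x in zip(truth, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0
    # Binomial tail with success probability 0.5, doubled for two sides.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy labels, NOT taken from the study.
truth = ["I", "II", "IIF", "III", "IV", "IIF"]
zero_shot = ["I", "II", "II", "III", "IV", "III"]
few_shot = ["I", "II", "IIF", "III", "IV", "IIF"]

print("zero-shot accuracy:", accuracy(truth, zero_shot))
print("few-shot accuracy:", accuracy(truth, few_shot))
print("IIF sens/spec (zero-shot):", per_class_metrics(truth, zero_shot, "IIF"))
print("McNemar exact p:", mcnemar_exact(truth, zero_shot, few_shot))
```

Because the same scenarios are classified under both prompting strategies, the comparison is paired, which is why McNemar's test (rather than an unpaired test of proportions) fits the zero-shot versus few-shot contrast.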

Topics

Journal Article
