Multimodal Large Language Model for Zero-Shot L3 Body Composition Segmentation on CT: Improved Accuracy via Automated Candidate Selection.

June 30, 2026

Authors

Sugawara H, Takada A, Kato S

Affiliations (5)

  • Department of Medical Imaging, The Ottawa Hospital, University of Ottawa, 501 Smyth Road, Ottawa, ON, K1H 8L6, Canada.
  • Department of Diagnostic Radiology, McGill University, Montreal, QC, Canada.
  • Augmented Intelligence and Precision Health Laboratory (AIPHL), Research Institute of the McGill University Health Centre, Montreal, Canada.
  • Diagnostic Radiology and Radiation Oncology, Chiba University Graduate School of Medicine, Chiba, Japan.
  • Department of Radiology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan.

Abstract

This study evaluated zero-shot L3 body composition segmentation on computed tomography (CT) using a general-purpose multimodal large language model (MLLM) and assessed whether automated candidate selection improves segmentation accuracy. This retrospective study used the publicly available TCIA Colorectal-Liver-Metastases CT dataset. One mid-L3 axial image was selected per case. Radiologist A segmented skeletal muscle (SM), subcutaneous adipose tissue (SAT), and visceral adipose tissue (VAT) to provide the reference masks, and Radiologist B independently segmented all cases to assess interobserver reproducibility. For each of 192 cases, gemini-3-pro-image-preview generated 10 candidate masks, and gemini-3-pro-preview served as the evaluator model, selecting the most anatomically plausible candidate. The Dice similarity coefficient (DSC) was used to compare model masks with the Radiologist A reference masks. Automated candidate selection achieved mean DSCs of 0.900 ± 0.102 for SM, 0.902 ± 0.096 for SAT, and 0.714 ± 0.245 for VAT. Compared with the best cohort-level single run, automated candidate selection improved DSC for SM (0.879 ± 0.128; adjusted P = .018) and SAT (0.860 ± 0.166; adjusted P < .001), but not for VAT (0.715 ± 0.237; adjusted P = .934). Compared with the mean of 10 runs, automated candidate selection improved DSC for all compartments. Interobserver DSCs were 0.972 ± 0.094 for SM, 0.976 ± 0.095 for SAT, and 0.933 ± 0.099 for VAT. Zero-shot L3 body composition segmentation with a general-purpose MLLM appeared feasible, and automated candidate selection improved segmentation accuracy for SM and SAT, although performance remained below the interobserver DSCs between radiologists, particularly for VAT.
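
For readers unfamiliar with the metric, the Dice similarity coefficient between a model mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The following is a minimal sketch of that calculation, not the authors' code: the function name `dice_coefficient`, the nested-list mask representation, the toy masks, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
def dice_coefficient(pred, ref):
    """Dice similarity coefficient between two binary masks of equal shape.

    Masks are nested lists (rows of 0/1 values). Returning 1.0 when both
    masks are empty is one common convention and is an assumption here.
    """
    if len(pred) != len(ref) or any(len(p) != len(r) for p, r in zip(pred, ref)):
        raise ValueError("masks must have the same shape")
    # Flatten both masks to boolean sequences in the same pixel order.
    pred_flat = [bool(v) for row in pred for v in row]
    ref_flat = [bool(v) for row in ref for v in row]
    # Count pixels labeled foreground in both masks.
    intersection = sum(p and r for p, r in zip(pred_flat, ref_flat))
    total = sum(pred_flat) + sum(ref_flat)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 3x4 masks, invented for illustration only (not study data).
reference = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
candidate = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]

dsc = dice_coefficient(candidate, reference)
print(f"DSC = {dsc:.3f}")  # prints "DSC = 0.750"
```

In this toy case each mask has 4 foreground pixels and 3 of them coincide, so the DSC is 2 × 3 / (4 + 4) = 0.75. In the study, one such value would be computed per compartment (SM, SAT, VAT) per case and then summarized as mean ± standard deviation.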

Topics

Journal Article
