
From image to report: automating lung cancer screening interpretation and reporting with vision-language models.

Authors

Chang TY, Gou Q, Zhao L, Zhou T, Chen H, Yang D, Ju H, Smith KE, Sun C, Pan J, Huang Y, He X, Zhang X, Xu D, Xu J, Bian J, Chen A

Affiliations (4)

  • Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, 1889 Museum Rd, Suite 7000, Gainesville, FL 32610, USA.
  • Department of Computer Science, Luddy School of Informatics, Indiana University, Bloomington, IN 47408, USA.
  • Nvidia, Santa Clara, CA 95051, USA.
  • Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, HITS 3000, BSAT, Indianapolis, IN 46202, USA; Regenstrief Institute, Indianapolis, IN 46202, USA.

Abstract

Lung cancer is the most prevalent cancer and the leading cause of cancer-related death in the United States. Lung cancer screening with low-dose computed tomography (LDCT) helps identify lung cancer at an early stage and thus improves overall survival. The growing adoption of LDCT screening has increased radiologists' workload, and accurately interpreting LDCT images and reporting the findings demands specialized training. Advances in artificial intelligence (AI), including large language models (LLMs) and vision models, could help reduce this burden and improve accuracy. We devised LUMEN (Lung cancer screening with Unified Multimodal Evaluation and Navigation), a multimodal AI framework that mimics the radiologist's workflow: it identifies nodules in LDCT images, generates their characteristics, and drafts the corresponding radiology reports in accordance with reporting guidelines. LUMEN integrates computer vision, vision-language models (VLMs), and LLMs. To assess the system, we developed a benchmarking framework that evaluates generated lung cancer screening reports against the findings and management criteria outlined in the Lung Imaging Reporting and Data System (Lung-RADS). The framework extracts these findings and management criteria from radiology reports and measures clinical accuracy (accuracy on the information that is clinically important for lung cancer screening) independently of report format. This complements existing semantic accuracy metrics for LLMs and VLMs and provides a more comprehensive view of system performance. Our lung cancer screening report generation system outperformed contemporary VLM systems, including M3D, CT2Report, and MedM3DVLM. Furthermore, compared with standard LLM metrics, the clinical metrics we designed for lung cancer screening more accurately reflect the clinical utility of the generated reports.
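The abstract describes the clinical-accuracy metric only at a high level: Lung-RADS-relevant findings are extracted from each report and compared, so wording and layout do not affect the score. The Python sketch below illustrates that idea under stated assumptions: an upstream extraction step (not shown) has already turned each report into a structured record; each record describes a single nodule; and the field names, the 1 mm size tolerance, and the per-field accuracy summary are hypothetical choices, not the authors' actual metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportFindings:
    """Hypothetical structured record extracted from one report (single nodule for simplicity)."""
    lung_rads: str     # overall Lung-RADS category, e.g. "2", "3", "4A"
    nodule_type: str   # e.g. "solid", "part-solid", "ground-glass"
    location: str      # lobe abbreviation, e.g. "RUL", "LLL"
    size_mm: float     # nodule size in millimetres


def field_agreement(ref: ReportFindings, gen: ReportFindings, size_tol_mm: float = 1.0) -> dict[str, bool]:
    """Compare a generated report's extracted fields with the reference report's fields."""
    return {
        "lung_rads": ref.lung_rads == gen.lung_rads,
        "nodule_type": ref.nodule_type == gen.nodule_type,
        "location": ref.location == gen.location,
        # The size tolerance is an assumption, not a value taken from the paper.
        "size_mm": abs(ref.size_mm - gen.size_mm) <= size_tol_mm,
    }


def clinical_accuracy(
    pairs: list[tuple[ReportFindings, ReportFindings]], size_tol_mm: float = 1.0
) -> dict[str, float]:
    """Per-field accuracy over (reference, generated) pairs; report wording never enters the score."""
    if not pairs:
        raise ValueError("need at least one (reference, generated) pair")
    hits: dict[str, int] = {}
    for ref, gen in pairs:
        for name, ok in field_agreement(ref, gen, size_tol_mm).items():
            hits[name] = hits.get(name, 0) + int(ok)
    return {name: count / len(pairs) for name, count in hits.items()}


if __name__ == "__main__":
    reference = ReportFindings("4A", "solid", "RUL", 9.0)
    generated = ReportFindings("4A", "solid", "RML", 9.5)  # right category and size, wrong lobe
    print(clinical_accuracy([(reference, generated)]))
    # prints: {'lung_rads': 1.0, 'nodule_type': 1.0, 'location': 0.0, 'size_mm': 1.0}
```

Because scoring happens on extracted fields, two reports that phrase the same finding differently ("9 mm solid nodule, right upper lobe" versus "RUL: solid, 9 mm") earn the same score, which is the format independence the abstract emphasizes.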
LUMEN demonstrates the feasibility of generating clinically accurate lung nodule reports from LDCT images through a nodule-centric visual question answering (VQA) approach, highlighting the potential of integrating VLMs and LLMs to support radiologists in lung cancer screening workflows. Our findings also underscore the importance of applying clinically meaningful evaluation metrics when developing medical AI systems.
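A nodule-centric VQA pipeline, as summarized above, can be sketched as a loop: detect nodules, ask a VLM a fixed set of questions about each nodule's region, and hand the answers to an LLM that drafts the report. The sketch below is a hypothetical outline, not LUMEN's implementation: the `Nodule` box format, the question list, and the `detect`, `crop`, `ask_vlm`, and `draft_report` callables are placeholders for whatever detector, VLM, and LLM a real system would plug in.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical voxel bounding box: (z0, y0, x0, z1, y1, x1).
Box = tuple[int, int, int, int, int, int]


@dataclass
class Nodule:
    box: Box
    answers: dict[str, str] = field(default_factory=dict)  # question -> VLM answer


# Illustrative per-nodule questions; the actual prompts are not given in the abstract.
QUESTIONS = (
    "Which lobe contains this nodule?",
    "What is the nodule type (solid, part-solid, or ground-glass)?",
    "What is the mean diameter in millimetres?",
)


def nodule_centric_report(
    volume: Any,
    detect: Callable[[Any], list[Nodule]],
    crop: Callable[[Any, Nodule], Any],
    ask_vlm: Callable[[Any, str], str],
    draft_report: Callable[[list[Nodule]], str],
) -> str:
    """Detect nodules, run per-nodule VQA, then let an LLM draft the report."""
    nodules = detect(volume)              # 1. localize candidate nodules
    for nodule in nodules:
        patch = crop(volume, nodule)      # 2. region of interest around this nodule
        for question in QUESTIONS:        # 3. ask the VLM about this nodule only
            nodule.answers[question] = ask_vlm(patch, question)
    return draft_report(nodules)          # 4. LLM turns the structured answers into prose


if __name__ == "__main__":
    # Toy stand-ins that only demonstrate the data flow; no real models are called.
    canned = {QUESTIONS[0]: "right upper lobe", QUESTIONS[1]: "solid", QUESTIONS[2]: "7"}

    def toy_detect(volume: Any) -> list[Nodule]:
        return [Nodule(box=(10, 40, 40, 14, 48, 48))]

    def toy_crop(volume: Any, nodule: Nodule) -> Box:
        return nodule.box

    def toy_vlm(patch: Any, question: str) -> str:
        return canned[question]

    def toy_draft(nodules: list[Nodule]) -> str:
        return "; ".join(
            f"{n.answers[QUESTIONS[2]]} mm {n.answers[QUESTIONS[1]]} nodule in the {n.answers[QUESTIONS[0]]}"
            for n in nodules
        )

    print(nodule_centric_report("LDCT volume", toy_detect, toy_crop, toy_vlm, toy_draft))
    # prints: 7 mm solid nodule in the right upper lobe
```

Passing the models in as callables keeps the orchestration independent of any particular detector or model API; the toy stubs under `__main__` exist only to show how data moves from detection to per-nodule answers to the drafted text.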

Topics

Journal Article
