Better understanding: can a large language model safely improve readability of patient information leaflets in interventional radiology?
Authors
Affiliations
- NHS Tayside, Dundee, UK. [email protected]
- NHS Tayside, Dundee, UK.
- North Bristol NHS Trust, Bristol, UK.
- Cardiff and Vale University Health Board, Cardiff, UK.
Abstract
This study aimed to evaluate the feasibility of using a large language model (LLM) to generate patient information leaflets (PILs) with improved readability from existing PILs in the field of interventional radiology. PILs were acquired from the Cardiovascular and Interventional Radiology Society of Europe website, reformatted, and uploaded to the GPT-4 user interface with a prompt instructing the model to simplify the language. Automated readability metrics were used to evaluate the readability of the original and LLM-modified PILs. Factual accuracy was assessed by three consultant interventional radiologists using an agreed marking scheme. LLM-modified PILs had a significantly lower mean reading grade (9.5±0.5) than the original PILs (11.1±0.1; p<0.01). However, the recommended reading grade of 6 (expected to be understood by 11- to 12-year-old children) was not achieved. Human evaluation identified minor factual accuracy concerns in most LLM-modified PILs, but no errors that could result in serious patient harm. LLMs therefore appear to be a powerful tool for improving the readability of PILs in interventional radiology and a useful aid in their development and revision; however, clinical experts are still required to ensure that the factual accuracy of these augmented documents is not compromised.
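A minimal sketch of the simplification step is given below, assuming programmatic access to GPT-4 through the OpenAI Python SDK rather than the web interface the study actually used; the prompt wording and the `simplify_pil` helper are illustrative assumptions, not the authors' original prompt.

```python
# Sketch only: the study used the GPT-4 web interface, so the prompt
# text and parameters here are illustrative, not the authors' originals.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical simplification prompt targeting the recommended grade 6.
SIMPLIFY_PROMPT = (
    "Rewrite the following patient information leaflet so that it can be "
    "understood by an average 11- to 12-year-old (reading grade 6), "
    "without removing or altering any medical facts."
)

def simplify_pil(pil_text: str) -> str:
    """Return an LLM-simplified version of a patient information leaflet."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SIMPLIFY_PROMPT},
            {"role": "user", "content": pil_text},
        ],
    )
    return response.choices[0].message.content
```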
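The readability comparison could be reproduced along the following lines, assuming the textstat package for grade-level formulas and a paired t-test for significance; the abstract reports a mean reading grade and p<0.01 but does not name the specific metrics or statistical test, so the choice of four formulas in `mean_reading_grade` is an assumption.

```python
# Sketch of the readability evaluation; the specific formulas averaged
# here and the paired t-test are assumptions, as the abstract does not
# state which metrics or test the authors used.
import textstat
from scipy import stats

def mean_reading_grade(text: str) -> float:
    """Average of several common grade-level readability formulas."""
    grades = [
        textstat.flesch_kincaid_grade(text),
        textstat.gunning_fog(text),
        textstat.smog_index(text),
        textstat.coleman_liau_index(text),
    ]
    return sum(grades) / len(grades)

def compare_pils(originals: list[str], modified: list[str]) -> float:
    """Paired t-test on per-leaflet mean reading grades; returns the p-value."""
    original_grades = [mean_reading_grade(t) for t in originals]
    modified_grades = [mean_reading_grade(t) for t in modified]
    return stats.ttest_rel(original_grades, modified_grades).pvalue
```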