Exploring ChatGPT effectiveness in addressing direct patient queries on colorectal cancer screening
Maida, Marcello; Vitello, Alessandro
2025-01-01
Abstract
Background and study aims: Recent studies have shown that large language models (LLMs) can enhance understanding of colorectal cancer (CRC) screening, potentially increasing participation rates. However, a limitation of these studies is that the questions posed to LLMs were generated by experts. This study aimed to investigate the effectiveness of ChatGPT-4o in answering CRC screening queries generated directly by patients.

Patients and methods: Ten consecutive subjects aged 50 to 69 years who were eligible for the Italian national CRC screening program but not actively participating were enrolled. Four possible CRC screening scenarios were presented to each participant, who was asked to formulate one question per scenario to gather additional information. These questions were then posed to ChatGPT in two separate sessions. The responses were evaluated by five senior experts, who rated each answer on three criteria: accuracy, completeness, and comprehensibility, using a 5-point Likert scale. In addition, the same 10 patients who created the questions assessed the answers, rating each response as complete, understandable, and trustworthy on a dichotomous (yes/no) scale.

Results: Experts rated the responses with mean scores of 4.1 ± 1.0 for accuracy, 4.2 ± 1.0 for completeness, and 4.3 ± 1.0 for comprehensibility. Patients rated the responses as complete in 97.5%, understandable in 95%, and trustworthy in 100% of cases. Consistency over time was confirmed by an 86.8% similarity between session responses.

Conclusions: Despite variability in questions and answers, ChatGPT demonstrated good performance in answering CRC screening queries, even when used directly by patients.