The role of generative language systems in increasing patient awareness of colon cancer screening

Maida M
2024-01-01

Abstract

Introduction: This study aims to evaluate the effectiveness of ChatGPT (Chat Generative Pre-trained Transformer) in answering patients' questions about colorectal cancer (CRC) screening, with the ultimate goal of enhancing patients' awareness of and adherence to national screening programs. Methods: Fifteen questions on CRC screening were posed to ChatGPT-4. The answers were rated by 20 gastroenterology experts and 20 non-experts across three domains (accuracy, completeness, and comprehensibility), and by 100 patients across three dichotomous domains (completeness, comprehensibility, and trustworthiness). Results: According to expert ratings, the mean accuracy score was 4.8±1.1 on a scale from 1 to 6. The mean completeness score was 2.1±0.7 and the mean comprehensibility score was 2.8±0.4, both on a scale from 1 to 3. Overall, accuracy (4.8±1.1 vs 5.6±0.7, P<0.001) and completeness (2.1±0.7 vs 2.7±0.4, P<0.001) scores were significantly lower for experts than for non-experts, while comprehensibility was comparable between the two groups (2.7±0.4 vs 2.8±0.3, P=0.546). Patients rated the answers as complete, comprehensible, and trustworthy in 97-100% of cases. Conclusions: ChatGPT shows good performance, with the potential to enhance awareness of CRC and improve screening outcomes. Generative language systems may be improved further through proper training in accordance with scientific evidence and current guidelines.
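Note: the abstract does not specify which statistical test produced the reported P values. The following is a minimal, illustrative Python sketch assuming a two-sided Mann-Whitney U test on per-rater mean scores (a common choice for ordinal rating data); all rating values in it are synthetic placeholders, not the study's data.

    # Illustrative sketch only: the study's raw ratings are not public, and the
    # abstract does not name the statistical test behind its P values. A
    # two-sided Mann-Whitney U test on per-rater mean scores is assumed here.
    # All numbers below are synthetic.
    import numpy as np
    from scipy.stats import mannwhitneyu

    rng = np.random.default_rng(seed=42)

    # Placeholder accuracy ratings: 20 experts and 20 non-experts each score
    # 15 ChatGPT answers on the 1-6 scale used in the study.
    expert = rng.integers(3, 7, size=(20, 15))       # synthetic: experts skew lower
    non_expert = rng.integers(4, 7, size=(20, 15))   # synthetic: non-experts skew higher

    # Group means +/- SD, analogous to the abstract's 4.8+/-1.1 vs 5.6+/-0.7
    for label, scores in (("experts", expert), ("non-experts", non_expert)):
        print(f"{label}: {scores.mean():.1f} +/- {scores.std(ddof=1):.1f}")

    # Compare the two groups on per-rater mean accuracy scores
    stat, p = mannwhitneyu(expert.mean(axis=1), non_expert.mean(axis=1),
                           alternative="two-sided")
    print(f"Mann-Whitney U = {stat:.1f}, P = {p:.4f}")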

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11387/176909