Large language models: a new frontier in paediatric cataract patient education

PMID: 39174290 DOI: 10.1136/bjo-2024-325252

Publication type: Journal Article


Abstract

BACKGROUND/AIMS: This was a cross-sectional comparative study. We evaluated the ability of three large language models (LLMs) (ChatGPT-3.5, ChatGPT-4 and Google Bard) to generate novel patient education materials (PEMs) and to improve the readability of existing PEMs on paediatric cataract.

METHODS: We compared the LLMs' responses to three prompts. Prompt A requested a handout on paediatric cataract that was 'easily understandable by an average American.' Prompt B modified prompt A and requested that the handout be written at a 'sixth-grade reading level, using the Simple Measure of Gobbledygook (SMOG) readability formula.' Prompt C requested that existing PEMs on paediatric cataract be rewritten 'to a sixth-grade reading level using the SMOG readability formula'. Responses were compared on quality (DISCERN; 1 (low quality) to 5 (high quality)), understandability and actionability (Patient Education Materials Assessment Tool (PEMAT); ≥70%: understandable, ≥70%: actionable), accuracy (Likert misinformation scale; 1 (no misinformation) to 5 (high misinformation)) and readability (SMOG and Flesch-Kincaid Grade Level (FKGL); grade level <7: highly readable).

RESULTS: All LLM-generated responses were of high quality (median DISCERN ≥4), understandability (≥70%) and accuracy (Likert=1). None of the LLM-generated responses were actionable (<70%). ChatGPT-3.5 and ChatGPT-4 responses to prompt B were more readable than their responses to prompt A (p<0.001). ChatGPT-4 generated more readable responses (lower SMOG and FKGL scores; 5.59±0.5 and 4.31±0.7, respectively) than the other two LLMs (p<0.001) and consistently rewrote existing PEMs to or below the specified sixth-grade reading level (SMOG: 5.14±0.3).

CONCLUSION: LLMs, particularly ChatGPT-4, proved valuable in generating high-quality, readable and accurate PEMs and in improving the readability of existing materials on paediatric cataract.
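The abstract relies on the SMOG and FKGL grade-level formulas and a 70% PEMAT cutoff without spelling them out. The following is a minimal Python sketch of the standard published formulas, assuming that word, sentence, syllable and polysyllable (3+ syllable) counts are already available; syllable counting itself is heuristic and varies between tools. The function names and the counts in the usage lines are hypothetical illustrations, not the study's tooling or data, and the abstract does not say which software the authors used.

```python
import math


def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade: polysyllabic word count normalized to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from words/sentence and syllables/word."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def highly_readable(grade: float, cutoff: float = 7.0) -> bool:
    # The study's criterion: a grade level below 7 counts as highly readable.
    return grade < cutoff


def pemat_score(item_ratings: list) -> float:
    """PEMAT-style percentage: items rated 1 (agree) or 0 (disagree);
    None marks a not-applicable item, which is excluded from the denominator."""
    applicable = [r for r in item_ratings if r is not None]
    return 100.0 * sum(applicable) / len(applicable)


# Hypothetical handout: 30 sentences, 300 words, 400 syllables, 8 polysyllabic words.
smog = smog_grade(polysyllables=8, sentences=30)
fk = fkgl(words=300, sentences=30, syllables=400)
print(f"SMOG {smog:.2f}, FKGL {fk:.2f}, highly readable: {highly_readable(smog)}")
print(f"PEMAT score: {pemat_score([1, 1, 0, None, 1]):.0f}%  (>= 70% passes)")
```

Taking counts as inputs keeps the sketch deterministic; in practice the grade a handout receives depends heavily on how a given readability tool splits sentences and counts syllables.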
