Automated outcome scoring after epilepsy surgery with artificial intelligence: validating a system for analyzing clinical notes
PubMed · Epilepsia
Bridging the outcome documentation gap in epilepsy surgery: Validating large language model agents for automated Engel and International League Against Epilepsy scoring from clinical notes
In brief
Researchers tested a large language model (LLM) system that automatically reads postoperative follow-up notes and classifies a patient's seizure outcome after epilepsy surgery on the Engel and ILAE scales. With a context-aware prompt that spelled out temporal, causal, and adherence logic, the system agreed with the adjudicated consensus of human reviewers in more than 93% of notes. A simple definition-based prompt was far less accurate, with about 57% to 61% exact agreement. The finding suggests that such systems can help clinicians document outcomes after epilepsy surgery.
Original abstract (English)
OBJECTIVE: Timely and accurate classification of postepilepsy surgery outcomes using Engel and International League Against Epilepsy (ILAE) scales is essential for clinical follow-up, yet electronic health record documentation often lacks the structured detail needed for reliable scoring. This study aimed to validate large language model (LLM) agents for autonomous extraction of standardized postsurgical outcomes from unstructured follow-up notes.
METHODS: We performed a retrospective validation study of deidentified postoperative epilepsy follow-up notes from patients who underwent epilepsy-related surgery or neuromodulation between 2000 and 2025 (n = 170). Each note was processed once with two fixed GPT-4-turbo prompt configurations: a concise definition-based prompt and a context-aware prompt incorporating temporal, causal, and adherence logic. Human-adjudicated consensus served as the reference standard. Prespecified metrics included exact score agreement, clinically adjacent agreement, ordinal distance, Wilson 95% confidence intervals (CIs), and paired tests comparing prompt configurations.
RESULTS: Valid follow-up intervals were available for 170 cases; the median time from surgery to analyzed note was 32.7 months (interquartile range = 9.6-97.9). Human reviewers achieved 91.2% raw agreement for Engel major class (Cohen kappa = .86, 95% bootstrap CI = .79-.92) and 83.5% raw agreement for ILAE category (quadratic weighted kappa = .93, 95% CI = .89-.96). The definition-based prompt achieved 56.5% exact Engel subclass agreement (95% CI = 49.0-63.7) and 60.6% exact ILAE agreement (95% CI = 53.1-67.6). The context-aware prompt improved exact agreement to 94.7% for Engel (95% CI = 90.2-97.2) and 93.5% for ILAE (95% CI = 88.8-96.3), with lower ordinal distance for both scales (paired sign tests p < .001).
SIGNIFICANCE: The meaningful finding is not that a general LLM can recite outcome definitions, but that a context-aware LLM agent can apply seizure-outcome logic to heterogeneous real-world notes with high agreement against adjudicated human consensus. Definition-only prompting remained unreliable in nuanced categories, supporting the need for explicit clinical reasoning structure, auditability, and privacy-preserving deployment.
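The prespecified agreement metrics listed in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code. Several choices are assumptions: scores are encoded as integer ranks in clinical order, "clinically adjacent" is read as within one ordinal step, ordinal distance is the mean absolute rank difference, and ties are dropped in the sign test. The function names are hypothetical. The count of 96 agreeing notes out of 170 is inferred from the reported 56.5% and is not stated in the abstract.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return center - half, center + half


def agreement_stats(reference, predicted, adjacent_tol=1):
    """Exact agreement, within-tolerance ("adjacent") agreement, and mean absolute
    ordinal distance between two equal-length lists of integer ranks.
    Assumption: adjacent means |rank difference| <= adjacent_tol."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    dists = [abs(r - p) for r, p in zip(reference, predicted)]
    n = len(dists)
    return {
        "exact": sum(d == 0 for d in dists) / n,
        "adjacent": sum(d <= adjacent_tol for d in dists) / n,
        "mean_distance": sum(dists) / n,
    }


def sign_test(dist_a, dist_b):
    """Two-sided exact sign test on paired per-note distances (ties dropped).
    Returns (notes where B is closer, notes where A is closer, p-value)."""
    b_better = sum(b < a for a, b in zip(dist_a, dist_b))
    a_better = sum(a < b for a, b in zip(dist_a, dist_b))
    m = a_better + b_better
    if m == 0:
        return b_better, a_better, 1.0
    k = min(a_better, b_better)
    # Probability of a split at least this lopsided under Binomial(m, 0.5), doubled.
    tail = sum(math.comb(m, i) for i in range(k + 1)) / 2 ** m
    return b_better, a_better, min(1.0, 2 * tail)


if __name__ == "__main__":
    lo, hi = wilson_ci(96, 170)  # 96/170 inferred from the reported 56.5%
    print(f"{96 / 170:.1%} (95% CI {lo:.1%} to {hi:.1%})")  # 56.5% (95% CI 49.0% to 63.7%)
```

Under these assumptions, wilson_ci(96, 170) reproduces the reported interval of 49.0-63.7 for exact Engel agreement with the definition-based prompt, and wilson_ci(161, 170) gives roughly 90.2-97.2, matching the context-aware prompt's Engel result.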
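The abstract summarizes agreement between the human reviewers with Cohen kappa for Engel major class and quadratic weighted kappa for ILAE category. The sketch below computes both from scratch. It assumes each category list is given in clinical order, for example ILAE classes 1 to 6. The function name and toy ratings are hypothetical, and the bootstrap confidence intervals reported in the abstract are not reproduced. The abstract does not state which software the study used. scikit-learn's cohen_kappa_score with weights="quadratic" should give the same point estimate for the same label order.

```python
def weighted_cohen_kappa(rater_a, rater_b, categories, weights=None):
    """Cohen's kappa between two raters.

    weights=None -> unweighted kappa (every disagreement is penalized equally).
    weights="linear" or "quadratic" -> weighted kappa, whose penalty grows with
    the distance between category positions in `categories`.
    """
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need >= 2 categories and two equal, non-empty rating lists")
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    pos = {c: i for i, c in enumerate(categories)}

    # observed[i][j]: share of items rated category i by A and category j by B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[pos[a]][pos[b]] += 1 / n
    # Marginal category proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # kappa = 1 - observed weighted disagreement / chance-expected weighted disagreement
    obs = sum(penalty(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(penalty(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - obs / exp


if __name__ == "__main__":
    # Hypothetical toy ratings on ILAE classes 1-6 (not study data).
    ilae = [1, 2, 3, 4, 5, 6]
    a = [1, 1, 2, 3, 1, 4]
    b = [1, 2, 2, 3, 1, 5]
    print(weighted_cohen_kappa(a, b, ilae), weighted_cohen_kappa(a, b, ilae, "quadratic"))
```

Quadratic weighting penalizes one-step disagreements only lightly. That is why, on an ordinal scale, it can exceed both unweighted kappa and raw percent agreement, as the ILAE figures in the abstract illustrate.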
Publication metadata
Journal
Epilepsia
Publication date
12 June 2026
PMID
42284021
DOI
10.1002/epi.70341
Authors
Adelson PD, Delaney H, D'Haese PF
Keywords
Engel classification, ILAE outcome, artificial intelligence, clinical documentation, electronic health records, epilepsy surgery, large language models, outcome extraction