In three earlier papers in this series on AI in systematic literature reviews (SLRs), we reported strong AI performance in title/abstract screening, full-text screening, and methodology extraction. In this fourth paper, we evaluate model performance on full data extraction – one of the most resource-intensive stages of an SLR. Across 10 SLRs and approximately 50,000 data points, the model achieved very high accuracy (98–100%) and strong completeness (85–98%). Performance was consistent across clinical and real-world studies, although some complex data points (e.g. subgroup efficacy results or risk factors) were occasionally missed. These results support the use of AI for first-draft data extraction, combined with full human quality control.
Link to White Paper