Workshop Module
Hour 1 · SLR · Step 6 of 15 · 40%
1.4

Data extraction

Turn full-text PDFs into a structured table.

~10 min

Extraction is where reviews go to die if the schema isn't fixed up front. Define your fields, pilot on 3–5 papers, then extract the rest.
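One way to keep the schema fixed is to write it down as code and check every piloted row against it. The sketch below is a minimal illustration, not a prescribed tool: the field names, groups, and vocabulary values are hypothetical placeholders, and `validate_row` is a helper invented for this example.

```python
from dataclasses import dataclass

NOT_REPORTED = "not reported"


@dataclass(frozen=True)
class Field:
    name: str
    group: str                        # e.g. Bibliographic, Methods, Findings
    allowed: frozenset = frozenset()  # empty set means free text


# Hypothetical fields for illustration only; replace with your own schema.
SCHEMA = [
    Field("year", "Bibliographic"),
    Field("study_design", "Methods",
          frozenset({"RCT", "cohort", "case study", "survey"})),
    Field("sample_size", "Methods"),
]


def validate_row(row: dict, schema=SCHEMA) -> list:
    """Return a list of problems; an empty list means the row fits the schema."""
    problems = []
    names = [f.name for f in schema]
    for missing in [n for n in names if n not in row]:
        problems.append(f"missing field: {missing}")
    for extra in [k for k in row if k not in names]:
        problems.append(f"unexpected field: {extra}")
    for f in schema:
        value = row.get(f.name)
        if value is None or value == NOT_REPORTED:
            continue  # absent or explicitly unreported values are not vocabulary errors
        if f.allowed and value not in f.allowed:
            problems.append(f"{f.name}: {value!r} not in controlled vocabulary")
    return problems


pilot_row = {"year": "2021", "study_design": "quasi-experiment",
             "sample_size": NOT_REPORTED}
print(validate_row(pilot_row))
# ["study_design: 'quasi-experiment' not in controlled vocabulary"]
```

Running this over the 3–5 pilot papers shows where the vocabulary is too narrow. Here, "quasi-experiment" either becomes a new allowed value or gets mapped to an existing one before the full extraction starts.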

AI prompts (2)

Prompt

Extraction schema generator

When: Before you open the first PDF — to lock down the columns.

Design a data extraction schema for my systematic literature review.

Research question: <PASTE>
What I need to answer with the extracted data: <PASTE — e.g. compare interventions, map theories used, summarise methods>

Produce:
1. A column list with: field name | data type | allowed values or format | definition | example.
2. Group columns into: Bibliographic, Context, Methods, Findings, Quality, Notes.
3. Flag any field that will need a controlled vocabulary, and suggest 5–10 starter values for each.

Return as a single markdown table I can paste straight into Excel/Google Sheets.
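
Outside the prompt itself: if the returned markdown table does not split into columns when pasted into a spreadsheet, converting it to CSV first may help. The sketch below assumes a simple pipe-delimited table with no escaped pipes inside cells. `markdown_table_to_csv` and the sample table are hypothetical names and data made up for this example.

```python
import csv
import io


def markdown_table_to_csv(markdown: str) -> str:
    """Convert a simple pipe-delimited markdown table to CSV text."""
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip anything that is not a table row
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        # The second table row is the header separator, e.g. | --- | :---: |
        if len(rows) == 1 and all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical schema table, for illustration only.
table = """
| field name | data type | definition |
| --- | --- | --- |
| year | integer | Publication year |
| study_design | category | Design, as stated by the authors |
"""
print(markdown_table_to_csv(table))
```

Save the output as a `.csv` file and open or import it in Excel or Google Sheets. Cells that contain commas are quoted by the `csv` module, so they stay in one column.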

Prompt

Per-paper extractor

When: Speed up extraction on a single full-text paper.

Extract data from this paper into my schema.

Schema (column | definition):
<PASTE SCHEMA>

Paper full text:
<PASTE>

Rules:
- If a field is not stated, write "not reported" — do not infer.
- Quote the source sentence in an "evidence" sub-row for each non-trivial field.
- For numerical fields, include the unit exactly as reported.
- Flag any contradictions inside the paper.

Return one row as a markdown table that matches the schema columns exactly.

Spot-check every extracted row against the PDF before trusting it. Hallucinated stats are the most common failure mode here.
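
Part of the spot-check can be automated by testing whether each quoted evidence sentence really appears in the paper text. This is a rough sketch under stated assumptions: the full text is already available as a plain string, and quotes are compared after collapsing whitespace and case. `unverified_quotes`, the sample text, and the sample evidence are all hypothetical.

```python
import re


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so PDF line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(evidence: dict, full_text: str) -> list:
    """Return the field names whose evidence quote is empty or not found in the paper."""
    haystack = normalise(full_text)
    flagged = []
    for field, quote in evidence.items():
        needle = normalise(quote)
        if not needle or needle not in haystack:
            flagged.append(field)
    return flagged


# Hypothetical paper text and extracted evidence, for illustration only.
paper_text = "We recruited 48 participants\nfrom two schools. Mean age was 13.2 years."
evidence = {
    "sample_size": "We recruited 48 participants from two schools.",
    "mean_age": "Mean age was 15.2 years.",
}
print(unverified_quotes(evidence, paper_text))  # ['mean_age']
```

A flagged field is a prompt to open the PDF, not proof of an error: words hyphenated across line breaks or unusual characters can cause false alarms. A quote that does match only shows the sentence exists, so the extracted value still needs a manual read against it.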