Workshop Module
Hour 2 · Bibliometrics · Step 14 of 15
2.5

What can go wrong

~5 min

Author disambiguation

Same name, different people. Different spellings, same person. Left unresolved, both skew every count: collisions inflate one author's total, split spellings deflate it. Check author profiles and verify before you trust the numbers.
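
If your top-authors table is long, a rough first pass can surface spelling variants worth checking by hand. The sketch below is only an illustration: it assumes names arrive as 'Surname, Given', and the helper names (name_key, candidate_groups) and the sample list are made up for this example. It folds accents, hyphens and spacing, then keeps the surname plus first initial. A shared key means "look closer", not "same person".

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Lowercase, strip accents, and drop everything except letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.lower() if c.isalpha())


def name_key(name: str) -> str:
    """Reduce 'Surname, Given' to 'surname i' (folded surname + first initial)."""
    surname, _, given = name.partition(",")
    return f"{fold(surname)} {fold(given)[:1]}".strip()


def candidate_groups(names):
    """Return only the keys shared by more than one spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Made-up sample data, for illustration only.
authors = [
    "Müller, Jürgen",
    "Muller, J.",
    "Mueller, J",
    "Smith-Jones, A.",
    "Smith Jones, Anna",
    "Li, Wei",
]

groups = candidate_groups(authors)
for key, variants in groups.items():
    print(key, "->", variants)
```

Two limits to keep in mind. The key is deliberately crude, so it misses transliterations ('Mueller, J' does not land with 'Müller, Jürgen'). It also cannot see collisions at all: two different people called 'Li, Wei' would share a key. Affiliation and years active are still needed to decide.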

Messy data

Clean the export before you map it. Deduplicate, standardise (e.g. 'covid-19' vs 'COVID 19' vs 'SARS-CoV-2'), filter, then visualise. Rubbish in, confident rubbish out.
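
A minimal sketch of the deduplicate-then-standardise steps, assuming each exported record is a dict with 'doi', 'title' and 'keywords' fields. Those field names, the CANONICAL map and the sample records are assumptions for this example, not a real export format. Whether the virus name and the disease name belong under one label is a judgment call for your own topic.

```python
import re

# Hypothetical synonym map: squashed spelling -> the label you want on the map.
CANONICAL = {
    "covid19": "covid-19",
    "sarscov2": "covid-19",
}


def squash(text: str) -> str:
    """Lowercase and keep only letters and digits, so spacing and punctuation stop mattering."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def normalise_keyword(keyword: str) -> str:
    """Map known variants to one label; otherwise just trim and lowercase."""
    return CANONICAL.get(squash(keyword), keyword.strip().lower())


def dedupe(records):
    """Keep the first record per DOI, falling back to a squashed title when the DOI is missing."""
    seen = set()
    unique = []
    for record in records:
        doi = (record.get("doi") or "").strip().lower()
        key = doi or squash(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up sample data, for illustration only.
records = [
    {"doi": "10.1000/abc", "title": "Mapping a field", "keywords": ["covid-19", "Bibliometrics"]},
    {"doi": "10.1000/ABC", "title": "Mapping a Field.", "keywords": ["COVID 19"]},
    {"doi": "", "title": "Another study", "keywords": ["SARS-CoV-2"]},
]

clean = dedupe(records)
keywords = [[normalise_keyword(k) for k in r["keywords"]] for r in clean]
print(len(records), "records in,", len(clean), "records out")
print(keywords)
```

Here the second record is dropped because its DOI matches the first once case is ignored, and all three spellings end up as 'covid-19'. Keep the untouched export alongside the cleaned one so every merge can be traced back.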

Over-trusting AI labels

A model will name a cluster with total confidence and be wrong. Use AI as a starting point, not the final word. Read the papers, then decide.

AI prompts (1)

Prompt

Author disambiguation checker

When: Your top-authors table looks suspicious.

I'll give you a list of author name variants from a bibliometric export. Identify likely duplicates (same person, different spellings) and likely collisions (same name, different people).

Author list (name; affiliation if available; total papers; years active):
<PASTE>

Return:
1. Likely-same-person groups, each with a recommended canonical name and the reason (initial style, accent, hyphenation, affiliation overlap).
2. Likely-different-people warnings (same name but mismatched affiliation/era).
3. Cases you can't tell from the data — list what extra field would resolve each.

Be cautious. When in doubt, mark UNCERTAIN rather than merge.