Trust Issues, Part 2: Investigating Near Duplicates in Different Data
general
rstats
survey research
duplication
fraud
This follow-up to our first entry on near duplication in survey data analyzes near duplicates in three more international survey data sets.
If you took the time to slog through Part 1 of our dive into detecting near duplicates in our Bosnia and Herzegovina (BiH) survey data, you might be as curious as we are to see what happens when we analyze our other international survey data sets from Bangladesh and Serbia. Our R workflow from Part 1 should be reproducible enough that rerunning the analysis will not take long. As for writing the follow-up post? Well, we make no promises there…
Stay tuned to see what near-duplicate mysteries await in our other data sets, and perhaps to enjoy a little humor at our expense as we discuss our near-incompetent attempts to solve them.