Trust Issues, Part 2: Investigating Near Duplicates in Different Data

general
rstats
survey research
duplication
fraud
This follow-up to our first entry on near duplication in survey data analyzes near duplicates in three more international survey data sets.
Authors

Jake Day

Jon Brauer

Maja Kotlaja

Published

October 9, 2023

Investigating near duplicates in different data: Will the sequel live up to the original? (Image from The Matrix Reloaded)

Given that you took the time to slog through Part 1 of our dive into detecting near duplicates in our BiH survey data, you might be as curious as we are to see what happens when we analyze our other international survey data sets from Bangladesh and Serbia. Our R workflow from Part 1 should be reproducible enough that rerunning the analysis on these data will not take long. As for writing the follow-up post? Well, we make no promises there…

Stay tuned to see what near-duplicate mysteries await in our other data sets, and perhaps for a little humor at our expense as we discuss our near-incompetent attempts to solve them.