Near Duplicates in Survey Data Series

general

rstats

survey research

duplication

fraud

Do you know how to detect exact or near-duplicate rows in your data? Read on to learn more!

Authors

Jake Day

Jon Brauer

Maja Kotlaja

Published

October 10, 2023

You thought The Shining was scary? Have you looked under your data for near duplicates?

You have reached the landing page for our “Near Duplicates in Survey Data” series. Click below to read the posts in this series:

Trust Issues

Post 1: Near duplicates in survey data: Like “Multiplicity” but without the humor. (Image from Multiplicity)

Trust Issues: Examining Near Duplicates in Survey Data

Do you know how to detect exact or near-duplicate rows in your data? Read on to learn more!

Stumbling in the Dark

Stumbling in the Dark: Building/Iterating an R Function to Match Stata’s percentmatch

Looking for more information about the modified R function we used to detect near duplicates? You have come to the right place. (Code shared; detailed write-up forthcoming.)

Trust Issues, Part 2

Post 3: Investigating near duplicates in different data: Will the sequel live up to the original? (Image from The Matrix Reloaded)

Trust Issues, Part 2: Investigating Near Duplicates in Different Data

This follow-up to our first entry on near duplication in survey data analyzes near duplicates in three more international survey data sets. (Forthcoming)

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{day2023,
  author = {Day, Jake and Brauer, Jon and Kotlaja, Maja},
  title = {Near {Duplicates} in {Survey} {Data} {Series}},
  date = {2023-10-10},
  url = {https://www.reluctantcriminologists.com/blog-posts/[8]/dup-index.html},
  langid = {en}
}

For attribution, please cite this work as:

Day, Jake, Jon Brauer, and Maja Kotlaja. 2023. “Near Duplicates in Survey Data Series.” October 10. https://www.reluctantcriminologists.com/blog-posts/[8]/dup-index.html.