Near Duplicates in Survey Data Series

survey research
Do you know how to detect exact or near-duplicate rows in your data? Read on to learn more!

Jake Day

Jon Brauer

Maja Kotlaja


February 12, 2024

You thought The Shining was scary? Have you looked under your data for near duplicates?

You have reached the landing page for our “Near Duplicates in Survey Data” series. Click below to read the posts in this series:

Trust Issues

Post 1: Near duplicates in survey data: Like “Multiplicity” but without the humor. (Image from Multiplicity)

Trust Issues: Examining Near Duplicates in Survey Data

Do you know how to detect exact or near-duplicate rows in your data? Read on to learn more!

Stumbling in the Dark

Post 2: For this non-programmer, iterating on a percentmatch function in R was not entirely unlike stumbling in the dark. (Image created using DALL-E)

Stumbling in the Dark: Building/Iterating an R Function to Match Stata’s percentmatch

If you are looking for more information about the modified R function we used to detect near duplicates, then you have come to the right place. (Code shared; detailed write-up forthcoming)

Trust Issues, Part 2

Post 3: Investigating near duplicates in different data: Will the sequel live up to the original? (Image from The Matrix Reloaded)

Trust Issues, Part 2: Investigating Near Duplicates in Different Data

This follow-up to our first entry on near duplication in survey data analyzes near duplicates in three more international survey data sets. (Forthcoming)
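
If you want a feel for the core idea before clicking through, here is a minimal sketch of a "maximum percent match" score: for each row, the highest share of columns on which it agrees with any other row. This is an illustration only, not the function from Post 2. The posts work in R, while this sketch uses plain Python so it runs without any packages. The name `max_percent_match` and the toy `responses` data are hypothetical, and the sketch assumes every row has the same nonzero number of columns and ignores missing-value handling.

```python
from typing import List, Sequence


def max_percent_match(rows: Sequence[Sequence[object]]) -> List[float]:
    """For each row, return the highest percentage (0-100) of columns on
    which it agrees with any other row. Hypothetical helper for illustration."""
    n = len(rows)
    best = [0.0] * n
    # Compare every pair of rows once and update both rows' best score.
    for i in range(n):
        for j in range(i + 1, n):
            a, b = rows[i], rows[j]
            same = sum(1 for x, y in zip(a, b) if x == y)
            pct = 100.0 * same / len(a)  # assumes rows are non-empty
            best[i] = max(best[i], pct)
            best[j] = max(best[j], pct)
    return best


# Toy survey responses (made up): four respondents, five items each.
responses = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 1],  # differs from the first row on one of five items
    [5, 4, 3, 2, 1],
    [1, 2, 3, 4, 5],  # exact duplicate of the first row
]

scores = max_percent_match(responses)
print(scores)
```

A score of 100 flags an exact duplicate, and scores just below 100 flag near duplicates. The pairwise loop scales with the square of the number of rows, so treat it as a way to see the logic rather than a tool for large surveys.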



BibTeX citation:
  author = {Day, Jake and Brauer, Jon and Kotlaja, Maja},
  title = {Near {Duplicates} in {Survey} {Data} {Series}},
  date = {2024-02-12},
  url = {[8]/dup-index.html},
  langid = {en}
For attribution, please cite this work as:
Day, Jake, Jon Brauer, and Maja Kotlaja. 2024. “Near Duplicates in Survey Data Series.” February 12, 2024. [8]/dup-index.html.
