Assumptions & Ground Rules

The primary purpose of Phase 2 of the Replication and Reproducibility Project is to expose you to the academic literature surrounding your research topic and/or question. A secondary purpose of this assignment is for you to develop a sense of how (un)common it is to find research articles in the top journals in the field of Criminology for which the authors have openly shared their data and code for reproducibility purposes. For more information about why and how to share data/code and other best practices for conducting reproducible research, check out the resources here, here, and here.

Specifically, for this phase of the project, you will accomplish the following tasks:

  1. Find five empirical research articles related to the topic or question you identified in Phase 1 and evaluate these five articles in terms of their adherence to basic open science practices (e.g., data availability, replication code availability, etc.).

  2. Identify an empirical research article related to your topic or question that analyzes data from the NYS and review its characteristics.

  3. Tentatively commit to performing either a conceptual replication of an article that does not use the NYS or a direct reproduction of one that does, and identify which aspects of the research article you plan to conceptually replicate or reproduce.

At this point, I assume you are familiar with RStudio and with creating R Markdown (RMD) files. If not, please review R Assignments 1 & 2.

Additionally, I assume you know how to search for articles and then access them where available; there are numerous ways to do this. If it were me, I might start by searching for articles using the search function on each journal’s webpage. Upon finding articles of interest, I might then search for the specific articles using the title, author, and/or keywords in Google Scholar; you can also do an “advanced search” for specific topics published in specific journals. If you decide to use Google Scholar, I recommend adding your university to the “library links” setting for quick authentication and access via your library. Alternatively, you might search for a specific journal directly via the university library. (See here for additional help with finding articles using our library website.)

Finally, I assume you know how to properly format an article citation (e.g., using APA or Chicago style); Google Scholar can also help you with getting the full citation.

Part 1

Goal: Find 5 articles from top journals in Criminology, then assess availability of data and code

  1. Find five (5) peer-reviewed empirical research articles from the list of journals below
    • Each article you select should describe and report results of an “original” research study (e.g., one using an experimental or observational research design). In this case, “original” is not intended to rule out a “replication” study, were you to find any. Rather, it is intended to signal that theoretical articles, systematic reviews, and meta-analyses that do not also report the results of an original research design do not qualify for this particular assignment.
    • For collegiality reasons, refrain from selecting any articles authored by faculty or students in our department.
    • Articles that share or link to code/script files for reproducing results are acceptable for Part 1 of this assignment. However, such articles are not appropriate for the remaining phases of the Replication and Reproducibility Project if they use the NYS data. Fortunately for you but unfortunately for science, such practices are quite rare in our field anyway, a point that this part of the assignment is intended to drive home.
  2. Search for these five articles in the following journals, which are widely considered to be some of the “top” journals in their respective fields:
    Top journals in criminology & criminal justice:
    • Criminology
    • Justice Quarterly
      • Journal stance on data sharing as of 25 Aug. 2021: Voluntary & encouraged.
      • “Justice Quarterly aims to foster transparency in research, and therefore ask authors to submit a Methodological Appendix, which will appear online if accepted for publication, with detailed information on the methodological procedures that produced the data and the way in which those data were analyzed, as described in the Transparency Guidelines. This is voluntary, though strongly encouraged for all submissions.”
    • Journal of Quantitative Criminology
      • Journal stance on data sharing as of 25 Aug. 2021: Voluntary & encouraged.
      • “This journal operates a type 2 research data policy. A submission to the journal implies that materials described in the manuscript, including all relevant raw data, will be freely available to any researcher wishing to use them for non-commercial purposes, without breaching participant confidentiality…. The journal strongly encourages that all datasets on which the conclusions of the paper rely should be available to readers. We encourage authors to ensure that their datasets are either deposited in publicly available repositories (where available and appropriate) or presented in the main manuscript or additional supporting files whenever possible. Please see Springer Nature’s information on recommended repositories.”
    • Journal of Research in Crime & Delinquency
    • Journal of Criminal Justice
      • Journal stance on data sharing as of 25 Aug. 2021: Voluntary & encouraged.
      • “This journal encourages and enables you to share data that supports your research publication where appropriate, and enables you to interlink the data with your published articles. Research data refers to the results of observations or experimentation that validate research findings. To facilitate reproducibility and data reuse, this journal also encourages you to share your software, code, models, algorithms, protocols, methods and other useful materials related to the project…. To foster transparency, we encourage you to state the availability of your data in your submission. This may be a requirement of your funding body or institution. If your data is unavailable to access or unsuitable to post, you will have the opportunity to indicate why during the submission process, for example by stating that the research data is confidential. The statement will appear with your published article on ScienceDirect….”

Note: For this part of the assignment, you must search in the five specific journals listed above. In the second part of the assignment, articles from other journals are permitted.

  3. Create an RMD file titled “CRM 495: Project Phase 2”
    • Choose “knit to HTML” for this assignment
    • Use headings (e.g., with hashmarks) to differentiate Part 1 & Part 2 of the assignment
    • Organize Part 1 by article, and provide the following for each article:
      • Headings differentiating Article #1, Article #2, etc.
      • Full citation (e.g., APA or Chicago style)
      • Link to the article’s digital object identifier, or DOI (i.e., a permanent link to the journal webpage containing the article).
        • You can add hyperlinks in an RMD file by putting the text you want to display in square brackets [] followed immediately by parentheses () containing the link.
        • E.g., [Brauer, Day, & Hammond 2019](https://journals.sagepub.com/doi/10.1177/0049124119826158) generates a clickable link that displays as: Brauer, Day, & Hammond 2019
      • Answers to the following questions for each article:
        • Q1: Are the data used in the article shared on the journal website, or are links to the data provided with the article (e.g., in Supplementary Materials, or to an author’s GitHub page or other archive)?
          • Note: The presence of “supplementary materials” does not necessarily mean the raw data are being shared, so be sure to look carefully.
        • Q2: If “NO” to Q1, are the data publicly available for download via another repository (e.g., ICPSR)?
        • Q3: Do the author(s) provide code or script for reproducing results in the paper?
          • Note: If “Yes” and the article uses NYS data (waves 1-7), then it is not eligible for the remaining phases of the Replication and Reproducibility Project (but cheers to the author!).
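For illustration only, a minimal skeleton of how the Part 1 portion of your RMD file might be organized is sketched below. It assumes the standard `html_document` output format (which is what “knit to HTML” produces); the author name, citation, DOI link, and answers are hypothetical placeholders that you should replace with your own.

```markdown
---
title: "CRM 495: Project Phase 2"
author: "Your Name"
output: html_document
---

<!-- Hypothetical placeholders below: replace the citation, DOI link, and answers with your own. -->

# Part 1

## Article #1

**Citation:** Author, A. A. (Year). Title of article. *Journal Name*, *Volume*(Issue), pages.

**DOI link:** [Author Year](https://doi.org/PLACEHOLDER)

**Q1:** Are the data used in the article shared on the journal website, or are links to the data provided with the article?

- Your answer here.

**Q2:** If "NO" to Q1, are the data publicly available for download via another repository (e.g., ICPSR)?

- Your answer here.

**Q3:** Do the author(s) provide code or script for reproducing results in the paper?

- Your answer here.

## Article #2

<!-- Repeat the structure above for Articles #2 through #5. -->

# Part 2
```

Because the title contains a colon, it is wrapped in double quotes in the YAML header so that it is read as a single text value.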

Part 2

Goal: Find an article using NYS data whose results you could reproduce

In Part 2, instead of finding an article and then looking for the data, we’ll reverse the process: start with the NYS data and then find an article that analyzes it.

  1. Go to the NYS Series landing page on ICPSR and click on the “Data-related Publications” tab to search for an article related to your topic that uses NYS data. See the instructions for Phase 1 of the Replication and Reproducibility Project for details on how to do this.

  2. As with the articles in Part 1 above, provide the full citation for the article and a link to the article DOI.

  3. Answer the following question about the article: Do the author(s) provide code or script for reproducing results in the paper?

    • Note: If “Yes” then you need to find another article that uses the NYS but does not provide code or script for reproducing the results.

Part 3

Goal: Tentatively commit to a conceptual replication or direct reproduction

Finally, tentatively commit to a “conceptual replication” or a “direct reproduction” of one of the articles you identified above by answering or doing the following:

  1. Identify the article you are most interested in replicating or reproducing using NYS data and state clearly whether your project will be a direct reproduction or a conceptual replication.
  • Note: If the article is from Part 1 and does not use the NYS data, you will be conducting a conceptual replication. This will ultimately require you to identify similar survey items measured in the NYS for at least some of the key variables in the article (you will identify these items in a future assignment).

  • Note: If the article is from Part 2 and thus uses the NYS data, you will most likely be completing a direct reproduction.

    • If the study analyzed one or a few of the NYS waves available on ICPSR, you may be able to extend its analysis to different waves of the NYS, provided the same or similar items are available (you will identify these items in a future assignment).
  2. Write a brief paragraph about why you are interested in replicating or reproducing this study and any issues or problems you anticipate in trying to replicate or reproduce it.

  3. Does the article include simple table(s) and/or figure(s) that present basic descriptive statistics? If so, what are the titles and page numbers of these tables and/or figures?

  4. Create a “Conclusion” section where you write about what you learned in this assignment and any problems or issues you had in completing it.

Part 4: Submit your assignment

  1. Upon completing the tasks in the previous sections, “knit” your final RMD file and save the final knitted html document to your “Assignments” folder in your LastName_CRM495_work folder as: LastName_CRM495_RR-Project-Phase2_YEAR_MO_DY.

  2. Inside the “LastName_CRM495_commit” folder in our shared folder, create another folder named: RR_Project_Phase2.

  3. To submit your assignment for grading, save copies of both your (1) “RR-Project-Phase2” HTML file and (2) “RR-Project-Phase2” RMD file into the LastName_CRM495_commit > RR_Project_Phase2 folder. Be sure to save copies of both files; do not just drag the files over from your “work” folder, or you may lose the original copies in your “work” folder.

Postscript: Distinguishing between “Reproduction” and “Replication”

Notice that in the above assignment and in the “Replication and Reproducibility Project” as a whole, we are drawing a distinction between “reproducibility” and “replicability” and, likewise, between reproduction and replication research. In a reproduction, the goal is to verify or repeat exactly some or all of the findings reported in a previous study using the same data and methods as the original study. Unfortunately, the terminology surrounding reproduction and replication is inconsistent and confusing. For example, some use the term “pure replication” to refer to what we call reproduction research (e.g., see Freese and Peterson 2017, pp.152-3). We note that our distinctions are consistent with those used in Ritchie’s (2020) book (which you are currently reading) and with others’ recent attempts to clarify terminology in this space (e.g., Patil, Peng, & Leek 2019).

In addition to distinguishing between reproducibility and replicability, we might also draw distinctions between different types of replications. Perhaps the most common is the distinction between a direct replication and a conceptual replication (cf. Crandall and Sherman 2016; Pridemore, Makel, & Plucker 2018, p.21). In a direct replication, one assesses the same theoretical or observational claim of a study using new data and measures that are collected or designed in such a way as to match the prior study’s design as exactly as possible, though perhaps with some notable exceptions (e.g., a larger sample size to improve statistical inferences). In contrast, a conceptual replication assesses the same theoretical or observational claim as a previous study using new data and/or new measurement procedures that are conceptually similar but not identical to those used in the previous study. We recommend reading Crandall and Sherman’s (2016) detailed discussion of these distinctions and their case for the relative utility of conceptual replications in advancing scientific progress; see also Nosek and Errington’s (2020) critique of these distinctions.

Underlying many of these terminological distinctions are differences in research procedures and research aims. For instance, drawing on Freese and Peterson’s (2017) typology of the different aims involved in replication and reproducibility research, reproduction research often aims to assess verifiability by attempting to reproduce or verify an original study’s findings using the same data and methods (e.g., code). Direct replications typically assess repeatability by testing whether the same findings emerge or repeat when applying the same methods to a new sample. Conceptual replications often assess repeatability, robustness, and/or generalizability of a theoretical or observational claim by, for instance, testing the original claim’s robustness to different measurement specifications using the same data or testing the claim’s generality to new samples (e.g., different groups or contexts). We recommend reading Freese and Peterson’s in-depth discussion of these aims; for convenience, we include their definitions (see 2017, p.152) of these four aims below.

  • Tests of verifiability: “taking the results of an original study as the object of inquiry and asks limited questions regarding whether the same results are obtained by doing the same analyses on the same data.”
  • Tests of robustness: “conduct a reanalysis on the original data using alternative specifications to see if the target finding is merely the result of analytic decisions.”
  • Tests of repeatability: “collecting new data to determine whether key results of a study can be observed by using the original procedures.”
  • Tests of generalizability: “the original study provides a premise for research trying to evaluate whether similar findings may be observed consistently across different methods or settings.”