Assumptions & Ground Rules

The primary purpose of this first project assignment is to find an article on a topic of interest to you that has data available online via ICPSR (or another repository); eventually, you will be required to use these data in an attempt to reproduce a basic descriptive finding reported in a table or figure from the article. A secondary purpose of this assignment is to develop a sense of how (un)common it is to find research articles in the top journals of your field for which the authors have openly shared their data and code for reproducibility purposes. For more information about why and how to share data/code and other best practices for conducting reproducible research, check out here, here, here, and here.

At this point, I assume you are familiar with RStudio and with creating R Markdown (RMD) files. If not, please review R Assignments 1 & 2.

Additionally, I assume you know how to search for articles and then access them where available; there are numerous ways to do this. If it were me, I might start by searching for articles using the search function on each journal’s webpage. Upon finding articles of interest, I might then search for the specific articles using the title, author, and/or keywords in Google Scholar; you can also do an “advanced search” for specific topics published in specific journals. If you decide to use Google Scholar, I recommend adding your university to the “library links” setting for quick authentication and access via your library. Note that the linked instructions were written for IUPUI; you will want to select your own university instead to generate the proper authentication links in Google Scholar. Finally, I assume you know how to properly format an article citation (e.g., using APA or Chicago style); Google Scholar can also help you with getting the full citation.

Part 1

Goal: Find 5 articles from top journals in your area of interest, then assess the availability of data and code

  1. Find five (5) peer-reviewed empirical research articles from the list of journals below
    • Each article you select should describe and report results of an “original” research study - e.g., an experimental or observational research design. In this case, “original” is not intended to rule out a “replication” study, were you to find any. Rather, it is intended simply to imply that any theoretical article, or systematic review article, or meta-analysis that does not also report the results of an original research design does not qualify for this particular assignment.
    • For collegiality reasons, refrain from selecting any articles authored by faculty or students in our department.
    • Articles that share or link to code/script files for reproducing results are acceptable for Part 1 of this assignment. However, such articles are not appropriate for Part 2 of the assignment (or, likewise, for the remaining Project Assignments). Fortunately for you but unfortunately for science, such practices are quite rare in our field anyway - a point that this part of the assignment is intended to drive home.
  2. Search for these five articles in the following journals, which are widely considered to be some of the “top” journals in their respective fields:
    • Top journals in criminology & criminal justice:
      • Criminology
      • Justice Quarterly
        • Journal stance on data sharing as of 25 Aug. 2021: Voluntary & encouraged.
        • “Justice Quarterly aims to foster transparency in research, and therefore ask authors to submit a Methodological Appendix, which will appear online if accepted for publication, with detailed information on the methodological procedures that produced the data and the way in which those data were analyzed, as described in the Transparency Guidelines. This is voluntary, though strongly encouraged for all submissions.”
      • Journal of Quantitative Criminology
        • Journal stance on data sharing as of 25 Aug. 2021: Voluntary & encouraged.
        • “This journal operates a type 2 research data policy. A submission to the journal implies that materials described in the manuscript, including all relevant raw data, will be freely available to any researcher wishing to use them for non-commercial purposes, without breaching participant confidentiality…. The journal strongly encourages that all datasets on which the conclusions of the paper rely should be available to readers. We encourage authors to ensure that their datasets are either deposited in publicly available repositories (where available and appropriate) or presented in the main manuscript or additional supporting files whenever possible. Please see Springer Nature’s information on recommended repositories.”
      • Journal of Research in Crime & Delinquency
      • Journal of Criminal Justice
        • Journal stance on data sharing as of 25 Aug. 2021: Voluntary & encouraged.
        • “This journal encourages and enables you to share data that supports your research publication where appropriate, and enables you to interlink the data with your published articles. Research data refers to the results of observations or experimentation that validate research findings. To facilitate reproducibility and data reuse, this journal also encourages you to share your software, code, models, algorithms, protocols, methods and other useful materials related to the project…. To foster transparency, we encourage you to state the availability of your data in your submission. This may be a requirement of your funding body or institution. If your data is unavailable to access or unsuitable to post, you will have the opportunity to indicate why during the submission process, for example by stating that the research data is confidential. The statement will appear with your published article on ScienceDirect….”
    • Top journals in sociology:
      • American Journal of Sociology
      • American Sociological Review
        • Journal stance on data sharing as of 25 Aug. 2021: Required by code of ethics, with caveats
        • “Ethics: Submission of a manuscript to another professional journal while it is under review by the ASR is regarded by the ASA as unethical. Significant findings or contributions that have already appeared (or will appear) elsewhere must be clearly identified. All persons who publish in ASA journals are required to abide by ASA guidelines and ethics codes regarding plagiarism and other ethical issues. This requirement includes adhering to ASA’s stated policy on data-sharing: ‘Sociologists make their data available after completion of the project or its major publications, except where proprietary agreements with employers, contractors, or clients preclude such accessibility or when it is impossible to share data and protect the confidentiality of the data or the anonymity of research participants (e.g., raw field notes or detailed information from ethnographic interviews)’ (ASA Code of Ethics, 1997).”
      • Journal of Health & Social Behavior
      • Social Problems
      • Social Forces
        • Journal stance on data sharing as of 25 Aug. 2021: None. (Journal notes that it permits “Supplementary data” files given that file size does not exceed 2MB; for reference, the size of the HTML file you are viewing for this assignment is approximately 2MB.)
  3. Create an RMD file for Project Assignment 1
    • Choose “knit to HTML” for this assignment
    • Use headings (e.g., with hashmarks) to differentiate Part 1 & Part 2 of the assignment
    • Organize by article in Part 1; provide the following for each article:
      • Full citation (e.g., APA or Chicago style)
      • Link to the article’s digital object identifier or DOI (i.e., a permanent link to the journal webpage containing the article).
        • You can add hyperlinks in an RMD file by putting the text you want to display in brackets [] followed immediately by parentheses () containing the link.
        • E.g., [Brauer, Day, & Hammond 2019](https://journals.sagepub.com/doi/10.1177/0049124119826158) generates the following: Brauer, Day, & Hammond 2019
      • Answers to the following questions:
        • Q1: Are the data used in the article shared on the journal website, or are links to the data provided with the article (e.g., in Supplementary Materials, or to an author’s GitHub page or other archive)?
        • Q2: If “NO” to Q1, are the data publicly available for download via another repository (e.g., ICPSR)?
        • Q3: Do the author(s) provide code or script files for reproducing results in the paper? (Note: If “YES”, then this article is not eligible for Part 2 or for remaining Project Assignments - but cheers to the author!)
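
As a rough illustration of step 3, an RMD file for this assignment might be laid out along the lines sketched below. This is only one possible layout, not a required template: the title, author, heading names, link text, and DOI address are all placeholders to replace with your own, and `output: html_document` is the setting that corresponds to “knit to HTML.”

```markdown
---
title: "Project Assignment 1"
author: "Your Name"
output: html_document
---

# Part 1

## Article 1

**Citation:** Full citation here (APA or Chicago style).

**DOI:** [Author Year](https://doi.org/your-article-doi)

**Q1:** YES or NO, with a brief explanation.

**Q2:** YES or NO, with a brief explanation.

**Q3:** YES or NO, with a brief explanation.

## Article 2

(Repeat the same fields for Articles 2 through 5.)

# Part 2

**Selected article:** Full citation and DOI link here.

**Data location:** Where the data are hosted, with a link.
```

A single hashmark creates a top-level heading for each part, and a double hashmark creates a subheading for each article, so the knitted HTML file is organized by part and then by article.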

Part 2

Goal: Find an article using a publicly available dataset with a table or figure that you can replicate

  1. If you answered “YES” to Q1 or Q2 in Part 1 and “NO” to Q3 for any of your five articles above, then you should feel free to use that article and the publicly available data associated with it for Part 2 and for the remaining Project Assignments.
    • Under Part 2 in your RMD file, re-list the selected article’s full citation and a link to the article’s DOI.
    • Describe where the data are located for public access, and include a link to the publicly available data associated with the article.
  2. Whether you selected an eligible article in Part 1 or not, you should familiarize yourself with the ICPSR and NACJD websites.
    • ICPSR, or the Inter-university Consortium for Political and Social Research, is an international consortium comprised of over 750 academic and research institutions that provides a host of services, including data archiving, data curation, and training in data access and analysis. ICPSR’s data archive contains “more than 250,000 files in social and behavioral sciences” and “hosts 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields” (see “About ICPSR”).
      • NACJD, or the National Archive of Criminal Justice Data, is one of ICPSR’s specialized collections. NACJD “archives and disseminates data on crime and justice for secondary analysis,” and its archive “contains data from over 2,700 curated studies or statistical data series.”

It’s worth familiarizing yourself with the ICPSR webpage for a specific dataset as it has some pretty standard things that can be useful. Here, we’ll use Wave 1 of the National Youth Survey from 1976. You’ll learn more about the NYS data in your next R assignment. For right now I just want to walk you through a typical ICPSR landing page.

ICPSR landing page for National Youth Survey, Wave 1, 1976

The “At A Glance” tab provides a basic overview of the data, including a summary, citation, funding sources, and restrictions, as well as some basics about the scope and methodology of the project. Another potentially useful thing for students is the “Data-related Publications” tab:

Finding Data-Related Publications on ICPSR

This tab lists the publications that have used Wave 1 of the NYS data series. It can point you to studies that may be of interest for replication or, if you plan to use these data for original research, show you what has already been done with them. There is other material to peruse on each dataset’s ICPSR page, but for this class, an important feature is the “Data & Documentation” tab. This is where you can actually download the data.

Finding Data & Documentation Files on ICPSR

If you wanted to download the data, along with the codebook, setup files, and anything else associated with the dataset (this is not necessary for this assignment), you would click on the download icon.

Manually Downloading Data from ICPSR's Website

  3. If you did not find an eligible article in Part 1 to use in Part 2 and remaining Project Assignments, then do the following:
    • Search ICPSR and/or NACJD for a dataset that is relevant to your research interests and available for public download.
    • Under Part 2 in your RMD file, include the dataset title, the ICPSR study number, and a link to the data page.
    • As you can see in the image below, the ICPSR and NACJD websites have very similar formats, including all the same tabs reviewed above (e.g., “Data & Documentation” and “Data-related Publications”). Use the “Data-related Publications” tab in either website to search for and find an eligible article of interest for this project. You might also wish to search Google Scholar for your topic of interest alongside the dataset name.
    • Under Part 2 in your RMD file, in addition to listing the data information, list the selected article’s full citation and a link to the article’s DOI.
Finding data-related publications with ICPSR & NACJD

  4. Irrespective of the path you took to select an article and dataset (e.g., found in Part 1 or via ICPSR/NACJD search in Part 2), you should answer the following questions in Part 2 of your RMD file:
    • Is there only one datafile associated with the article you selected? For instance, does the article use data from only one source or rely on only one wave of a longitudinal dataset, or does the article appear to rely on multiple sources of data or multiple waves of data from a longitudinal design?
    • Does the article include a simple table or figure that presents basic descriptive statistics? If so, what is the title and page number of the table or figure?
    • Are the data available on ICPSR? If so, what is the ICPSR study number? If not, where are the data available?
  5. Upon completing the assignment, “knit” your final RMD file and save the final knitted HTML document to your “Assignments” folder in your LastName_P680_work folder as: LastName_P680_ProjAssign1_YEAR_MO_DY.
    • Inside the “LastName_P680_commit” folder in our shared folder, create another folder named: Project Assignment 1.
    • To submit your assignment for grading, save copies of both (1) your “ProjAssign1” HTML file and (2) your “ProjAssign1_RMD” file into the LastName_P680_commit > Project Assignment 1 folder.