Assumptions & Ground Rules
The primary purpose of this first project assignment is to find an
article on a topic of interest to you that has data available online via
ICPSR (or another
repository); eventually, you will be required to use these data in an
attempt to reproduce a basic descriptive finding reported in a table or
figure from the article. A secondary purpose of this assignment is to
develop a sense of how (un)common it is to find research articles in the
top journals of your field for which the authors have openly shared
their data and code for reproducibility purposes. For more information
about why and how to share data/code and other best practices for
conducting reproducible research, check out here,
here,
here, and
here.
At this point, I assume you are familiar with RStudio and with
creating R Markdown (RMD) files. If not, please review R Assignments 1
& 2.
Additionally, I assume you know how to search for articles and then
access them where available; there are numerous ways to do this. If it
were me, I might start by searching for articles using the search
function on each journal’s webpage. Upon finding articles of interest, I
might then search for the specific articles using the title, author,
and/or keywords in Google
Scholar; you can also do an “advanced search” for specific topics
published in specific journals. If you decide to use Google Scholar, I
recommend adding
your university to the “library links” setting for quick
authentication and access via your library. Note that the linked
instructions are written for IUPUI; you will want to select your own
university to generate the proper authentication links in Google Scholar. Finally, I
assume you know how to properly format an article citation (e.g., using
APA or Chicago style); Google
Scholar can also help you with getting the full citation.
Part 1
Goal: Find 5 articles from top journals in your area of interest, then
assess the availability of data and code
- Find five (5) peer-reviewed empirical research articles
from the list of journals below
- Each article you select should describe and report results of an
“original” research study - e.g., an experimental or observational
research design. In this case, “original” is not intended to rule out a
“replication” study, were you to find any. Rather, it is intended simply
to imply that any theoretical article, or systematic review article, or
meta-analysis that does not also report the results of an original
research design does not qualify for this particular assignment.
- For collegiality reasons, refrain from selecting any articles
authored by faculty or students in our department.
- Articles that share or link to code/script files for reproducing
results are acceptable for Part 1 of this assignment. However, such
articles are not appropriate for Part 2 of the assignment (or, likewise,
for the remaining Project Assignments). Fortunately for you but
unfortunately for science, such practices are quite rare in our field
anyway - a point that this part of the assignment is intended to drive
home.
- Search for these five articles in the following journals, which are
widely considered to be some of the “top” journals in their respective
fields:
- Top journals in criminology & criminal justice:
  - Criminology
  - Justice Quarterly
    - Journal stance on data sharing as of 25 Aug. 2021: Voluntary &
      encouraged.
      - “Justice Quarterly aims to foster transparency in research, and
        therefore ask authors to submit a Methodological Appendix, which will
        appear online if accepted for publication, with detailed information on
        the methodological procedures that produced the data and the way in
        which those data were analyzed, as described in the Transparency
        Guidelines. This is voluntary, though strongly encouraged for all
        submissions.”
  - Journal of Quantitative Criminology
    - Journal stance on data sharing as of 25 Aug. 2021: Voluntary &
      encouraged.
      - “This journal operates a type 2 research data policy. A submission
        to the journal implies that materials described in the manuscript,
        including all relevant raw data, will be freely available to any
        researcher wishing to use them for non-commercial purposes, without
        breaching participant confidentiality…. The journal strongly
        encourages that all datasets on which the conclusions of the paper
        rely should be available to readers. We encourage authors to ensure
        that their datasets are either deposited in publicly available
        repositories (where available and appropriate) or presented in the
        main manuscript or additional supporting files whenever possible.
        Please see Springer Nature’s information on recommended
        repositories.”
  - Journal of Research in Crime & Delinquency
  - Journal of Criminal Justice
    - Journal stance on data sharing as of 25 Aug. 2021: Voluntary &
      encouraged.
      - “This journal encourages and enables you to share data that supports
        your research publication where appropriate, and enables you to
        interlink the data with your published articles. Research data refers to
        the results of observations or experimentation that validate research
        findings. To facilitate reproducibility and data reuse, this journal
        also encourages you to share your software, code, models, algorithms,
        protocols, methods and other useful materials related to the project….
        To foster transparency, we encourage you to state the availability of
        your data in your submission. This may be a requirement of your funding
        body or institution. If your data is unavailable to access or unsuitable
        to post, you will have the opportunity to indicate why during the
        submission process, for example by stating that the research data is
        confidential. The statement will appear with your published article on
        ScienceDirect….”
- Top journals in sociology:
  - American Journal of Sociology
  - American Sociological Review
    - Journal stance on data sharing as of 25 Aug. 2021: Required by code of
      ethics, with caveats.
      - “Ethics: Submission of a manuscript to another professional journal
        while it is under review by the ASR is regarded by the ASA as unethical.
        Significant findings or contributions that have already appeared (or
        will appear) elsewhere must be clearly identified. All persons who
        publish in ASA journals are required to abide by ASA guidelines and
        ethics codes regarding plagiarism and other ethical issues. This
        requirement includes adhering to ASA’s stated policy on data-sharing:
        ‘Sociologists make their data available after completion of the project
        or its major publications, except where proprietary agreements with
        employers, contractors, or clients preclude such accessibility or when
        it is impossible to share data and protect the confidentiality of the
        data or the anonymity of research participants (e.g., raw field notes or
        detailed information from ethnographic interviews)’ (ASA Code of Ethics,
        1997).”
  - Journal of Health & Social Behavior
  - Social Problems
  - Social Forces
    - Journal stance on data sharing as of 25 Aug. 2021: None. (The journal
      notes that it permits “Supplementary data” files provided that the file
      size does not exceed 2MB; for reference, the size of the HTML file you
      are viewing for this assignment is approximately 2MB.)
- Create an RMD file for Project Assignment 1
- Choose “knit to HTML” for this assignment
- Use headings (e.g., with hashmarks) to differentiate Part
1 & Part 2 of the assignment
- Organize by article in Part 1; provide the following
for each article:
- Full citation (e.g., APA or Chicago style)
- Link to the article’s digital object identifier or DOI (i.e., a permanent
link to the journal webpage containing the article).
- You can add hyperlinks in an RMD file by putting the text you want
to display in brackets `[]` followed immediately by parentheses `()`
containing the link.
- E.g.,
`[Brauer, Day, & Hammond 2019](https://journals.sagepub.com/doi/10.1177/0049124119826158)`
generates the following: Brauer, Day, & Hammond 2019
- Answers to the following questions:
- Q1: Are the data used in the article shared on the
journal website, or are links to the data provided with the article
(e.g., in Supplementary Materials, or to an author’s GitHub page or
other archive)?
- Q2: If “NO” to Q1, are the data publicly available
for download via another repository (e.g., ICPSR)?
- Q3: Do the author(s) provide code or scripts for
reproducing results in the paper? (Note: If “Yes”, then this article is
not eligible for Part 2 or for remaining Project Assignments - but
cheers to the author!)
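As a rough starting point, the layout described above might look something like the following in your RMD file. This is only a sketch under my own assumptions: the title, author line, section names, and bracketed fill-in text are placeholders for illustration, not required wording, and the DOI link shown is a dummy that you would replace with the real one for each article.

```markdown
---
title: "Project Assignment 1"
author: "Your Name"
output: html_document
---

# Part 1

## Article 1

**Citation:** [Replace with the full citation in APA or Chicago style]

**DOI:** [Author & Author Year](https://doi.org/replace-with-article-doi)

- **Q1:** YES or NO, with a brief note on where the data are shared or linked
- **Q2:** YES or NO (answer only if Q1 is NO), with the repository name
- **Q3:** YES or NO, with a brief note on any code or script files provided

## Article 2

(Repeat the same structure for Articles 2 through 5.)

# Part 2

(Selected article citation, DOI link, and data location go here.)
```

Setting `output: html_document` in the header corresponds to choosing “knit to HTML”, and each hashmark heading (`#`, `##`) becomes a section heading in the knitted file.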
Part 2
Goal: Find an article using a publicly available dataset with a
table or figure that you can replicate
- If you answered “YES” to Q1 or
Q2 in Part 1 and “NO” to
Q3 for any of your five articles above, then you should
feel free to use that article and the publicly available data associated
with it for Part 2 and for the remaining Project Assignments.
- Under Part 2 in your RMD file, re-list the selected article’s full
citation and a link to the article’s DOI.
- Describe where the data are located for public access, and include a
link to the publicly available data associated with the article.
- Whether you selected an eligible article in Part 1 or not, you
should familiarize yourself with the ICPSR and NACJD
websites.
- ICPSR, or the Inter-university Consortium for Political and Social
Research, is an international consortium comprised of over 750 academic
and research institutions that provides a host of
services, including data archiving, data curation, and training in
data access and analysis. ICPSR’s data archive contains “more than
250,000 files in social and behavioral sciences” and “hosts 21
specialized collections of data in education, aging, criminal justice,
substance abuse, terrorism, and other fields” (see “About ICPSR”).
- NACJD, or the National Archive of Criminal Justice Data, is one of
ICPSR’s specialized collections. NACJD “archives and disseminates data
on crime and justice for secondary analysis,” and its archive “contains
data from over 2,700 curated studies or statistical data series.”
It’s worth familiarizing yourself with the ICPSR webpage for a
specific dataset, since these pages share a fairly standard layout with
several useful features. Here, we’ll use Wave 1 of the National Youth
Survey from 1976. You’ll learn more about the NYS data in your next
R assignment. For now, I just want to walk you through a typical
ICPSR landing page.
The “At A Glance” tab provides a basic overview of the data,
including a summary, citation, funding sources, and restrictions, as well
as some basics about the scope and methodology of the project. Another
potentially useful thing for students is the “Data-related Publications”
tab: