Unlike your previous assignments, this assignment will not be organized into various numbered “parts” for you to follow and complete, and you should not organize it as a numbered list by following the numbered items below. Rather, this is your chance to demonstrate your own creativity and show what you have learned by crafting your own RMD file as you see fit. The final knitted file should contain the following (again, not numbered like below):
A description of the article, data source, and specific findings that will be reproduced, along with a justification for the reproduction. As with the rest of the document, this should be professionally written - think of it like the introductory section of a published replication/reproduction article that must describe the original research and justify the replication/reproduction research.
The table(s) and/or figure(s) from the original study that you plan to reproduce should be included as image(s).
Following your description/justification of the reproduction, you should move into the reproduction itself. The first step should be to download the data and explain the process in sufficient detail so that others can easily and accurately reproduce your work.
If you are downloading data from ICPSR, do it using reproducible R code with the icpsrdata package. With this approach, you should also add R code that automatically creates a unique data folder (e.g., “Project_Data”) for this project as a subfolder in a working directory created specifically for this project (e.g., “Project_work/Project_Data”) before downloading, then use the icpsrdata package to download to your new folder. Recall, this process ensures that anyone who runs your RMD file on their own computer will automatically create the appropriate subfolder and then download the data to the correct folder as well. (If you do not remember how to do this, go back and review your earlier R Assignments.)
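The folder-then-download steps above can be sketched roughly as follows. This is a minimal illustration, not the required solution: it assumes the working directory is the project folder that holds your RMD file, the study number is a hypothetical placeholder, and the download is switched off by default because icpsr_download() needs ICPSR login credentials and a network connection.

```r
# Data subfolder for this project, relative to the working directory
# (assumed here to be the project folder, e.g. "Project_work")
data_dir <- "Project_Data"

# Create the subfolder only if it does not already exist
if (!dir.exists(data_dir)) {
  dir.create(data_dir, recursive = TRUE)
}

# Hypothetical placeholder: replace with the ICPSR study number you need
study_id <- 12345

# Set to TRUE once the study number is filled in and you are ready to
# supply your ICPSR credentials
run_download <- FALSE

if (run_download && requireNamespace("icpsrdata", quietly = TRUE)) {
  # Downloads the study files into the project data folder
  icpsrdata::icpsr_download(file_id = study_id, download_dir = data_dir)
}
```

Because the folder is created by code and the path is relative, anyone who runs the file from their own project folder should end up with the same structure.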
If the data are not on ICPSR, you should download them manually and save them in a unique data subfolder (e.g., “Project_Data”) within a working directory created specifically for this project (e.g., “Project_work/Project_Data”). Then, you should describe the exact procedures that others will need to follow to be able to reproduce your work. For example, you will need to describe where and how to download the data and then where to save the files (e.g., in a specifically-titled subfolder within the working directory containing your RMD file). It is important to provide detailed and careful descriptions of these procedures to ensure that others can easily and accurately reproduce your work.
Again, the point here is not to treat this like an assignment; rather, you should treat it like you are drafting an article in R Markdown, complete with code and detailed descriptions throughout that you would also submit as supplementary materials to accompany the trimmed-down and polished published article.
Read the data into R and save it as an object.
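As one possible sketch of this step, the code below loads an .rda file from the project data folder and stores its contents as an object. The file name mydata.rda and the object name mydata are hypothetical; for SPSS or Stata files, haven's read_spss() or read_dta() would play the same role.

```r
# Hypothetical file name: replace with the file you actually downloaded
data_file <- file.path("Project_Data", "mydata.rda")

if (file.exists(data_file)) {
  # load() restores the saved objects and returns their names
  loaded_names <- load(data_file)
  # Keep the first restored object under a name of our choosing
  mydata <- get(loaded_names[1])
  # Quick look at the structure of the first few variables
  str(mydata, list.len = 10)
} else {
  message("Data file not found: ", data_file)
}
```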
Identify and describe all the variables (i.e., “key variables”) needed to reproduce the original article’s table(s) or figure(s).
Summarize the raw versions of the key variables or items (e.g., attributes/response labels; summary/descriptive statistics) using packages and commands that you learned in earlier assignments.
There are various ways to explore your data and view response labels (e.g., recall attr(mydata$myvar1, "labels")?).
For descriptive statistics, we recommend you use the sjmisc package, which you learned about in R Assignment 4. Recall, using that package and tidyverse, you can easily generate frequency and descriptive statistics tables (e.g., mydata %>% frq(myvar1) or mydata %>% descr(myvar1, myvar2)).
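To make the summary step concrete, here is a small sketch using toy data in place of a real study file. The data frame, variable names, and value labels are all invented for illustration; with imported survey data the "labels" attribute would normally already be attached.

```r
library(dplyr)
library(sjmisc)

# Toy data standing in for the real study file (hypothetical values)
mydata <- data.frame(
  myvar1 = c(1, 2, 2, 1, NA, 2),
  myvar2 = c(23, 35, 41, 29, 52, 38)
)

# Attach value labels the way imported survey variables usually carry them
attr(mydata$myvar1, "labels") <- c("No" = 1, "Yes" = 2)

# View the response labels for one variable
attr(mydata$myvar1, "labels")

# Frequency table for a categorical item
mydata %>% frq(myvar1)

# Descriptive statistics for a numeric item
mydata %>% descr(myvar2)
```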
Recode the variables (e.g., using mutate) if that is needed for the reproduction; create polished versions of the descriptives table(s) (e.g., with the gt package) or figures (e.g., with the ggplot2 package).
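The recode-then-present workflow might look something like the sketch below. It reuses invented toy data, and the recoding rule, table title, and plot labels are assumptions chosen only to show the pattern; your own recodes should follow the coding decisions described in the original article.

```r
library(dplyr)
library(gt)
library(ggplot2)

# Toy data standing in for the real study file (hypothetical values)
mydata <- data.frame(
  myvar1 = c(1, 2, 2, 1, NA, 2),
  myvar2 = c(23, 35, 41, 29, 52, 38)
)

# Recode the numeric codes into labelled categories; missing stays missing
mydata_rc <- mydata %>%
  mutate(myvar1_rc = case_when(
    myvar1 == 1 ~ "No",
    myvar1 == 2 ~ "Yes",
    TRUE ~ NA_character_
  ))

# Small descriptives table by recoded category
desc_tbl <- mydata_rc %>%
  filter(!is.na(myvar1_rc)) %>%
  group_by(myvar1_rc) %>%
  summarise(n = n(), mean_myvar2 = mean(myvar2), .groups = "drop")

# Polished table with gt
desc_gt <- desc_tbl %>%
  gt() %>%
  tab_header(title = "Descriptive statistics (toy data)") %>%
  cols_label(myvar1_rc = "Response", n = "N", mean_myvar2 = "Mean of myvar2") %>%
  fmt_number(columns = mean_myvar2, decimals = 2)

# Simple bar chart of category counts with ggplot2
desc_plot <- ggplot(desc_tbl, aes(x = myvar1_rc, y = n)) +
  geom_col() +
  labs(x = "Response", y = "Count", title = "Responses to myvar1 (toy data)")

desc_gt
desc_plot
```

Saving the table and figure as objects before displaying them keeps the code chunk easy to rerun and makes it simple to reuse them elsewhere in the document.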
The final document should be well-organized using leveled subheadings (e.g., ## Top Heading, ### Subheading 1, etc.) with procedures thoroughly described in text (i.e., not in headings). Additionally, I recommend adding a table of contents (or perhaps a floating table of contents), which you can do by modifying the YAML header at the top of your R Markdown file. Finally, you might wish to select your own theme to personalize the look of your final knitted document, which you can also do by modifying the YAML header. You have not yet learned how to do either of these in R Markdown, so I will give a brief and very basic tutorial on this below.
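As a rough illustration of the kind of YAML header change described here, the fragment below turns on a floating table of contents and picks a theme for an HTML document. The title, author, and choice of theme are placeholders, not requirements.

```yaml
---
title: "Project Assignment 3"   # placeholder title
author: "Your Name"             # placeholder author
output:
  html_document:
    toc: true          # add a table of contents
    toc_float: true    # keep it floating alongside the text
    theme: flatly      # one of the built-in themes; choose your own
---
```

Indentation matters in YAML, so the options under html_document need to stay nested as shown.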
Upon completing the assignment, “knit” your final RMD file and save the final knitted HTML document to your “Assignments” folder in your LastName_SC500_work folder as: LastName_SC500_ProjAssign3_YEAR_MO_DY.
Inside the “LastName_SC500_commit” folder in our shared folder, create another folder named: Project Assignment 3.
To submit your assignment for grading/feedback, save copies of both (1) your “ProjAssign3” HTML file and (2) your “ProjAssign3” RMD file, along with any other files (and file structures) necessary to reproduce your document (e.g., image files), into the LastName_SC500_commit > Project Assignment 3 folder.
Here are some things I will look for when evaluating/providing feedback:
Description/Justification: Does the author describe and justify the reproduction project aims clearly and effectively? Is the original study included in one of the project folders? Can I find the table or figure in the original study that the author is attempting to reproduce? Are the original study and that specific table/figure described clearly and accurately?
Project File Structure: Is the RMarkdown file in the “root” folder of the shared drive? Are there separate and clearly marked folders following best practices (e.g., Data; Articles; Images)? Can I open the RMarkdown file?
R Code Reproducibility: After installing any necessary packages, can I successfully run all R Code chunks, or does running the code generate errors? If errors are generated, is it immediately obvious what those errors are, and can I fix them with minimal effort to continue the review of R Code chunks? Is there anything I can suggest to the author for improving their R Code chunks (e.g., error fixes; efficiency improvements; reproducibility improvements; useful additions)?