Since you have already completed Project Assignment #1, I assume you have found an article on a topic of interest to you and that: (1) the article's data are available online via ICPSR (or another repository); (2) you have already downloaded the data and, if the data are on ICPSR, you did so with reproducible R code using the icpsrdata package (if so, include this script in your code for this assignment); and (3) the article contains basic descriptive findings reported in a table and/or a figure that you can reproduce using the available data.
Moreover, since you have already completed R Assignments #1 through #4, I assume that you are familiar with: (1) RStudio; (2) creating organized, descriptive, and reproducible R Markdown (RMD) files; (3) (re)producing tables in R using the gt package; and (4) (re)producing basic figures using the ggplot2 package. If not, please review R Assignments 1-4.
Unlike your previous assignments, this assignment will not be organized into various numbered “parts” for you to follow and complete, and you should not organize it as a numbered list by following the numbered items below. Rather, this is your chance to demonstrate your own creativity and show what you have learned by crafting your own RMD file as you see fit. The final knitted file should contain the following (again, not numbered like below):
A description of the article, data source, and specific findings that will be reproduced, along with a justification for the reproduction. As with the rest of the document, this should be professionally written - think of it like the introductory section of a published replication/reproduction article that must describe the original research and justify the replication/reproduction research.
The table(s) and/or figure(s) found in the original study that you plan to reproduce should be included as an image(s). You have not yet learned how to do this in R Markdown, so I will give a brief and very basic tutorial below.
Following your description/justification of the reproduction, you should move into the reproduction itself. The first step should be to download the data and explain the process in sufficient detail so that others can easily and accurately reproduce your work.
If the data are on ICPSR, download them using the icpsrdata package. With this approach, you should also add R code that automatically creates a unique data folder (e.g., “Project_Data”) for this project as a subfolder in a working directory created specifically for this project (e.g., “Project_work/Project_Data”) before downloading, then use the icpsrdata package to download the data to your new folder. Recall, this process ensures that anyone who runs your RMD file on their own computer will automatically create the appropriate subfolder and then download the data to the correct folder as well. (If you do not remember how to do this, go back and review your earlier R Assignments.)
Read the data into R and save it as an object.
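The folder-creation and download steps described above might be sketched as follows. This is only a minimal illustration under stated assumptions: `ensure_dir` is a hypothetical helper, the study number is a made-up placeholder, and the `icpsr_download()` call is left commented out because it requires the icpsrdata package and your own ICPSR login. Check the arguments against the icpsrdata documentation before relying on them.

```r
# Hypothetical helper: create a folder only if it does not exist yet,
# then return its path so later steps can reuse it.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  path
}

# "Project_Data" is the example folder name from the text. In an RMD file that
# uses the here package, you would likely write here::here("Project_Data").
data_dir <- ensure_dir(file.path("Project_Data"))

# Placeholder ICPSR study number (an assumption); replace with your own.
study_id <- 12345

# Download only when the folder is still empty, so re-knitting does not
# download the data again. Commented out: needs icpsrdata and ICPSR credentials.
# if (length(list.files(data_dir)) == 0) {
#   icpsrdata::icpsr_download(file_id = study_id, download_dir = data_dir)
# }

# List what is in the data folder to locate the file you need to read in.
list.files(data_dir, recursive = TRUE)
```

Because the folder is created only when it is missing, the same code can be run repeatedly, on your computer or someone else's, without errors or duplicate folders.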
Identify and describe all the variables (i.e., “key variables”) needed to reproduce the original article’s table(s) or figure(s).
Summarize the raw versions of the key variables or items (e.g., attributes/response labels; summary/descriptive statistics) using packages and commands that you learned in earlier assignments.
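As a rough illustration of summarizing raw key variables, the sketch below uses a small made-up data frame and base R only. The names `mydata`, `myvar1`, and `myvar2` are placeholders rather than real study variables, and the value labels are invented for the example. The sjmisc calls appear as comments because they require that package and the pipe operator to be loaded.

```r
# Toy stand-in for a downloaded data set (all values are invented).
mydata <- data.frame(
  myvar1 = c(1, 2, 2, 1, NA, 2),
  myvar2 = c(23, 35, 41, 29, 52, 38)
)

# Labelled survey data often store response labels as a "labels" attribute;
# here we attach invented labels so there is something to inspect.
attr(mydata$myvar1, "labels") <- c(No = 1, Yes = 2)

# Inspect the response labels of a key variable.
attr(mydata$myvar1, "labels")

# Frequencies, including missing values.
table(mydata$myvar1, useNA = "ifany")

# Basic descriptive statistics.
summary(mydata$myvar2)

# With sjmisc and the tidyverse loaded, comparable summaries would be:
# mydata %>% frq(myvar1)
# mydata %>% descr(myvar1, myvar2)
```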
Check whether the key variables carry attributes or response labels (e.g., attr(labels)?).
One option is the sjmisc package, which you learned about in R Assignment 4. Recall, using that package and tidyverse, you can easily generate frequency and descriptive statistics tables (e.g., mydata %>% frq(myvar1) or mydata %>% descr(myvar1, myvar2)).
For this assignment, you are not required to transform variables (e.g., with mutate) if that is needed for reproduction or to create polished versions of descriptives tables (e.g., with the gt package) or figures (e.g., with the ggplot2 package), though you should feel free to do so if desired. You will be required to do those things in your completed “first draft” to submit for peer review in the next assignment (Project Assignment #3).
The final document should be well-organized using leveled subheadings (e.g., ## Top Heading, ### Subheading 1, etc.) with procedures thoroughly described in text (i.e., not in headings). Additionally, I recommend adding a table of contents (or perhaps a floating table of contents), which you can do by modifying the YAML header at the top of your R Markdown file. Finally, you might wish to select your own theme to personalize the look of your final knitted document, which you can also do by modifying the YAML header. You have not yet learned how to do this in R Markdown either, so I will give a brief and very basic tutorial on this below as well.
Upon completing the assignment, “knit” your final RMD file and save the final knitted HTML document to your “Assignments” folder in your LastName_P680_work folder as: LastName_P680_ProjAssign2_YEAR_MO_DY.
- Inside the “LastName_P680_commit” folder in our shared folder, create another folder named: Project Assignment 2.
- To submit your assignment for grading, save copies of both (1) your “ProjAssign2” HTML file and (2) your “ProjAssign2” RMD file into the LastName_P680_commit > Project Assignment 2 folder.
Getting a basic image into R Markdown is relatively easy. The steps involve: (1) creating an “Images” subfolder within your working directory (i.e., “work” folder); (2) getting the image you want; (3) editing the image and saving it in your new “Images” subfolder within your working directory; (4) writing a code chunk to load the image in R Markdown. We will briefly go through each of these steps below.
# load the here package, which builds file paths relative to the project folder
library(here)
# check if "Images" folder exists (TRUE if it does) & create if it does not exist.
ifelse(dir.exists(here("Images")), TRUE, dir.create(here("Images")))
## [1] TRUE
We assume you know how to take and crop a screenshot, or to copy-and-paste, the image(s) or table(s) that you wish to reproduce from the original published article that you have chosen.
As for editing the image, one easy way to do this is simply to paste the image into a new PowerPoint slide. From here, you can edit the image however you wish, then PowerPoint’s “save as picture” option should result in a saved image that retains your edits.
In the R code chunk that loads the image, you can control the displayed size with the out.width command and add a caption by adding the following text to the R code chunk options: fig.cap = "R code chunk for including images in R Markdown". The code itself works by referencing the knitr package (knitr::), then calling the include_graphics function from that package. This is followed by our familiar here() function to point to the “Images” subfolder within our working directory, followed by the name of the file (including its .jpg extension).
That’s it - now you should be able to load an image containing the table(s) or figure(s) you are reproducing into R Markdown!
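A chunk along the following lines would do what is described above. Treat it as a sketch: the file name, width, and caption are hypothetical placeholders, and the path is built with base R's `file.path()` so the example stands alone (in your own project you would likely use `here()` instead).

```r
# Chunk options go in the chunk header of your RMD file, for example:
#   {r, echo=FALSE, out.width="80%", fig.cap="Table 1 from the original article"}
# The width, caption, and file name used here are placeholders.

img_path <- file.path("Images", "original_table1.jpg")
# Equivalent with the here package: here::here("Images", "original_table1.jpg")

# By default include_graphics() signals an error when a local file is missing,
# so check that the image exists before including it.
if (file.exists(img_path)) {
  knitr::include_graphics(img_path)
} else {
  message("Image not found: ", img_path)
}
```

Checking for the file first gives a readable message instead of a failed knit when the image has not yet been saved to the “Images” subfolder.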
In addition to meeting the various goals required to reproduce original research findings, you should also work on creating documents that are functionally organized, easily navigable, and aesthetically pleasing. Some of you have made solid strides toward this goal by routinely using leveled subheadings to organize your documents and by including detailed text descriptions containing numbered or bulleted lists and bold or italicized fonts. Note, though, that what you have learned and will learn in this class only scratches the surface of what you can do with R and R Markdown. For additional inspiration, I strongly recommend checking out Alison Hill’s excellent blog entry: “How I Teach R Markdown”.
In this section, we will highlight two simple modifications that you can make to the YAML header to improve the functionality and appearance of your final knitted document. Remember, the YAML header is the section of text that appears at the very top of your R Markdown file in between the two lines containing three dashes (“- - -”). When you create a new R Markdown file, the YAML header looks like this:
We briefly discussed the YAML header in your early R Assignments. Additionally, when you switched the knitted file output from a Word document in R Assignment 1 to an HTML file in R Assignment 2, you may have noticed that the YAML header was changed in the process.
Here, we will modify the YAML header directly to change the appearance or Bootswatch theme and to add a table of contents to your knitted HTML file.
The easiest way to get started with editing YAML headers is to simply copy others that work. So, for example, check out the YAML header used for this project assignment in the screenshot below:
There are several noteworthy things about this header in comparison to the default YAML header:
Unlike the default, this header contains multiple layers of indentation. For instance, “html_document:” is indented one level, and everything from “number_sections:” to “code_folding” is indented two levels.
If you want to automatically create numbered sections, change “number_sections: no” to “number_sections: yes”.
This assignment uses the “readable” theme. The default theme is - you guessed it - named “default.” There are several built-in themes that you can choose from (see the link for previews), and many more themes that you can install and use with some additional effort.
The “toc: yes” line under “html_document” adds a table of contents to our knitted HTML document.
Recall, if you set your R code chunks to echo = TRUE, then your code will be shared in the knitted document. The line “code_folding: hide” will hide your shared code by default. In other words, your code will still be included in the knitted HTML document, but readers can choose whether to click a toggle button to see it.
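The options discussed in the list above can be collected into a single header. The sketch below is not the header from the screenshot: the title is a placeholder, and `toc_float` (a floating table of contents) is an optional extra added for illustration. It stores the header as text in R and prints it so that it can be copied to the top of an RMD file.

```r
# A sketch of a YAML header combining the options discussed above.
# Indentation uses spaces: "html_document:" is indented one level and
# its options are indented two levels.
yaml_header <- c(
  "---",
  'title: "Project Assignment 2"',
  "output:",
  "  html_document:",
  "    number_sections: no",
  "    theme: readable",
  "    toc: yes",
  "    toc_float: yes",
  "    code_folding: hide",
  "---"
)

# Print the header so it can be pasted at the very top of an RMD file.
cat(yaml_header, sep = "\n")
```

YAML is sensitive to indentation, so if knitting fails after you edit the header, check first that each nested line is indented consistently with spaces rather than tabs.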
Now you should have all the basics that you need to create, organize, and slightly customize your own knitted HTML file for your reproduction project!
While the above tutorial will get you through this class assignment, as you get more comfortable with R Markdown you should transition away from copying others’ working YAML headers and instead start writing your own as you see fit. The “ymlthis” package is a great resource for learning to create your own YAML code - and for doing so in a reproducible way within R code chunks.
Speaking of reproducibility and R packages, one major threat to reproducibility that we have not mentioned yet is R itself. Due to its open-source nature, packages are not tightly curated, and old (or even relatively new) code may stop working as packages are updated and functions change in ways that break or silently alter your previous commands. Well, there is an R package for this problem too! The “groundhog” package loads the versions of your packages that were current on CRAN on a date you specify, which helps ensure that you or others are using the same package versions when reproducing past work. So, if you are feeling especially ambitious, try your hand at using this package in your final project!
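A version-pinned library call with groundhog might look roughly like the sketch below. The package list and snapshot date are hypothetical choices for illustration. Because `groundhog.library()` can install older package versions, which may take a while, this sketch only calls it in an interactive session where groundhog is already installed.

```r
# Hypothetical package list and snapshot date (assumptions for illustration);
# pick a date on or shortly before the day you wrote your code.
pkgs <- c("here", "sjmisc")
snapshot_date <- "2023-01-15"

# Load the listed packages as they existed on CRAN on the snapshot date.
# Guarded so that knitting or scripted runs do not trigger installations.
if (interactive() && requireNamespace("groundhog", quietly = TRUE)) {
  groundhog::groundhog.library(pkgs, snapshot_date)
} else {
  message("Skipping groundhog.library() for: ", paste(pkgs, collapse = ", "))
}
```

In a real project, this call would replace the usual series of `library()` lines at the top of your RMD file.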