Project Assignment 2: Describe Reproduction, Share Image, Summarize Data

Assumptions & Ground Rules

Since you have already completed Project Assignment #1, I assume you have found an article on a topic of interest to you that: (1) has data available online via ICPSR (or another repository); (2) uses data that you have already downloaded (and, if the data are on ICPSR, that you downloaded with reproducible R code using the icpsrdata package; if so, include this script in your code for this assignment); and (3) contains basic descriptive findings, reported in a table and/or a figure, that you can reproduce using the available data.

Moreover, since you have already completed R Assignments #1 through #4, I assume that you are familiar with: (1) RStudio; (2) creating organized, descriptive, and reproducible R Markdown (RMD) files; (3) (re)producing tables in R using the gt package; and (4) (re)producing basic figures using the ggplot2 package. If not, please review R Assignments 1-4.

Project Assignment #2

Goal: Start Drawing the Owl - Describe Reproduction; Get & Read Data; Summarize Raw Variables

Unlike your previous assignments, this assignment will not be organized into various numbered “parts” for you to follow and complete, and you should not organize it as a numbered list by following the numbered items below. Rather, this is your chance to demonstrate your own creativity and show what you have learned by crafting your own RMD file as you see fit. The final knitted file should contain the following (again, not numbered like below):

A description of the article, data source, and specific findings that will be reproduced, along with a justification for the reproduction. As with the rest of the document, this should be professionally written - think of it like the introductory section of a published replication/reproduction article that must describe the original research and justify the replication/reproduction research.
The table(s) and/or figure(s) found in the original study that you plan to reproduce should be included as images. You have not yet learned how to do this in R Markdown, so I will give a brief and very basic tutorial below.
Following your description/justification of the reproduction, you should move into the reproduction itself. The first step should be to download the data and explain the process in sufficient detail so that others can easily and accurately reproduce your work.
- If you are downloading data from ICPSR, do so using reproducible R code with the icpsrdata package. With this approach, you should also add R code that, before downloading, automatically creates a unique data folder for this project (e.g., “Project_Data”) as a subfolder of a working directory created specifically for this project (e.g., “Project_work/Project_Data”), and then use the icpsrdata package to download the data into your new folder. Recall, this process ensures that anyone who runs your RMD file on their own computer will automatically create the appropriate subfolder and then download the data to the correct folder as well. (If you do not remember how to do this, go back and review your earlier R Assignments.)
- If the data are not on ICPSR, you should download it manually and save it in a unique data subfolder (e.g., “Project_Data”) within a working directory created specifically for this project (e.g., “Project_work/Project_Data”). Then, you should describe the exact procedures that others will need to follow to be able to reproduce your work. E.g., you will need to describe where and how to download the data and then where to save it (e.g., in a specifically-titled subfolder within the working directory containing your RMD file). If you were actually going to publish this and share supplementary materials, you would likely share a “zip” file containing your project’s working directory with the RMD file in it as well as the datafile saved within the necessary pre-titled subfolder (and you will be doing exactly this for Project Assignment #3). Still, it is important to provide detailed and careful descriptions of these procedures to ensure that others can easily and accurately reproduce your work.
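A minimal sketch of the ICPSR route is below. It assumes the example folder name “Project_Data”, uses a placeholder study number (12345) that you would replace with your own study's ICPSR number, and leaves the download switched off (download_now <- FALSE) so the code runs without an ICPSR login. The argument names follow our reading of the icpsrdata documentation, so check ?icpsrdata::icpsr_download if anything differs in your installed version.

```r
library(here)  # builds file paths relative to the project's working directory

# Example data folder name from the instructions above; change it if you
# named your folder differently
data_dir <- here("Project_Data")

# Create the folder only if it is missing, so re-knitting is safe
if (!dir.exists(data_dir)) dir.create(data_dir)

# Switch to TRUE to download. It is FALSE here so the sketch runs without an
# ICPSR account. 12345 is a placeholder, not a real study number.
download_now <- FALSE

if (download_now) {
  # When email and password are not passed directly, icpsr_download() reads
  # them from options(icpsr_email = "...", icpsr_password = "...")
  icpsrdata::icpsr_download(file_id = 12345, download_dir = data_dir)
}
```

Because the folder is created only when it does not already exist, this chunk can be re-run without side effects on a computer that already has the folder.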

Again, the point here is not to treat this like an assignment; rather, you should treat it like you are drafting an article in R Markdown, complete with code and detailed descriptions throughout that you would also submit as supplementary materials to accompany the trimmed-down and polished published article.

Read the data into R and save it as an object.
Identify and describe all the variables (i.e., “key variables”) needed to reproduce the original article’s table(s) or figure(s).
Summarize the raw versions of the key variables or items (e.g., attributes/response labels; summary/descriptive statistics) using packages and commands that you learned in earlier assignments.
- There are various ways to explore your data and view response labels (e.g., recall attr(mydata$myvar1, "labels")?).
- For descriptive statistics, we recommend you use the sjmisc package, which you learned about in R Assignment 4. Recall, using that package and the tidyverse, you can easily generate frequency and descriptive statistics tables (e.g., mydata %>% frq(myvar1) or mydata %>% descr(myvar1, myvar2)).
- Note: For this assignment, you are not required to recode variables (e.g., using mutate), even if recoding will be needed for the reproduction, or to create polished versions of descriptive tables (e.g., with the gt package) or figures (e.g., with the ggplot2 package), though you should feel free to do so if desired. You will be required to do those things in your completed “first draft” to submit for peer review in the next assignment (Project Assignment #3).
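The sketch below strings these steps together with the packages mentioned above (haven for reading labelled data, sjmisc for the summary tables). To keep it self-contained, it first writes a tiny made-up Stata file with hypothetical variables myvar1 and myvar2 to a temporary folder; in your own RMD file you would drop that step and point read_dta() at your data file in “Project_Data” (or use whichever read function matches your file type).

```r
library(dplyr)   # supplies %>% and tibble()
library(haven)   # labelled(), write_dta(), read_dta()
library(sjmisc)  # frq() and descr()

# Stand-in for your downloaded file: a tiny, made-up Stata dataset saved to a
# temporary folder so this sketch runs anywhere. Skip this step in your project.
toy_path <- file.path(tempdir(), "toy.dta")
write_dta(
  tibble(
    myvar1 = labelled(c(1L, 2L, 2L, 1L, 2L),
                      labels = c(Disagree = 1L, Agree = 2L)),
    myvar2 = c(34, 51, 28, 45, 60)
  ),
  toy_path
)

# Read the data into R and save it as an object
mydata <- read_dta(toy_path)

# Response labels attached to a raw item
attr(mydata$myvar1, "labels")

# Frequency table for a categorical item; descriptive statistics for a numeric one
mydata %>% frq(myvar1)
mydata %>% descr(myvar2)
```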

The final document should be well-organized using leveled subheadings (e.g., ## Top Heading, ### Subheading 1, etc.) with procedures thoroughly described in text (i.e., not in headings). Additionally, I recommend adding a table of contents (or perhaps a floating table of contents) as well, which you can do by modifying the YAML header at the top of your R Markdown file. Finally, you might wish to select your own theme to personalize the aesthetic look of your final knitted document, which you can also do by modifying the YAML header. You have not yet learned how to do this in R Markdown either, so I will give a brief and very basic tutorial on this as well below.

Upon completing the assignment, “knit” your final RMD file and save the final knitted HTML document to the “Assignments” folder in your LastName_P680_work folder as: LastName_P680_ProjAssign2_YEAR_MO_DY.
- Inside the “LastName_P680_commit” folder in our shared folder, create another folder named: Project Assignment 2.
- To submit your assignment for grading, save copies of both (1) your “ProjAssign2” HTML file and (2) your “ProjAssign2” RMD file into the LastName_P680_commit > Project Assignment 2 folder.

Brief Tutorial #1: Save & Load Simple Image in RMD

Getting a basic image into R Markdown is relatively easy. The steps involve: (1) creating an “Images” subfolder within your working directory (i.e., “work” folder); (2) getting the image you want; (3) editing the image and saving it in your new “Images” subfolder within your working directory; (4) writing a code chunk to load the image in R Markdown. We will briefly go through each of these steps below.

Before getting your image from the published article you selected, you will need a directory in which you can save the image and from which you can load it in R Markdown. Once you have a subfolder within your working directory, you can easily load an image from it using an R code chunk; in fact, the process is quite similar to saving and loading a dataset from your “Datasets” subfolder, which you have done several times by now. However, rather than using your “Datasets” subfolder, we recommend creating a new directory (i.e., folder) named “Images” within your root “work” folder. Recall, if you opened RStudio directly from an RMD file in your working directory (i.e., in your “work” folder), then you can easily add an “Images” subfolder to your “work” folder with an R code chunk, using the R code below (Note: We used something called “code folding” here - more on that later; simply click the “Code” button to see it):

# load the here package, which builds file paths relative to the working directory
library(here)
# check if "Images" folder exists (TRUE if it does) & create if it does not exist. 
ifelse(dir.exists(here("Images")), TRUE, dir.create(here("Images")))

## [1] TRUE

We assume you know how to take and crop a screenshot, or to copy-and-paste, the image(s) or table(s) that you wish to reproduce from the original published article that you have chosen.
As for editing the image, one easy way to do this is simply to paste the image into a new PowerPoint slide. From here, you can edit the image however you wish, then PowerPoint’s “save as picture” option should result in a saved image that retains your edits.

When editing, I (Jon) like to use subtle effects such as the “center shadow rectangle,” which gives some depth to your image without being overly distracting.

After editing the image, right-click it, select “Save as Picture,” and save it in your “Images” subfolder. We strongly recommend naming the file without any spaces (e.g., “ProjAssign1-ppt-saveimage”) and selecting the JPEG format (“.jpg”).

Now that you have a JPEG image saved in your “Images” subfolder within your working directory, you are ready to write a code chunk to load the image in R Markdown.

Here is a screenshot of the R code that I used to load the image above:

[Screenshot: R code chunk for including images in R Markdown]

Note that the R code chunk options line contains some things that may be unfamiliar to you.
- First, the R code chunk is named - in this case, it is named “image-ppt-edit.” Up to this point, you have not been naming your R code chunks, and that is fine. However, as you become more familiar with and regularly use R Markdown, you may wish to start naming your code chunks. If you do, note that each code chunk must have a unique name or you will encounter those dreaded knitting errors! For more on naming R code chunks, see here, here, here, and here.
- Second, the chunk includes the text out.width = '80%', which instructs knitr to display the image at 80% of the available page width in the knitted document. Use “out.width” and “out.height” to control the size of non-R figures and images; for figures generated by R, other options are recommended (e.g., fig.width and fig.height, which control their size and associated text scaling).
- Third, you may have noticed that this screenshot contains a caption, whereas the previous ones did not. I included the caption by putting a comma after the out.width option and then adding the following text to the R code chunk options: fig.cap = "R code chunk for including images in R Markdown"

The R code line itself starts by calling the “knitr” package (knitr::), then calling the include_graphics function from that package. This is followed by our familiar here() function to point to the “Images” subfolder within our working directory, followed by the name of the file (including its .jpg extension).
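For reference, here is a sketch of a chunk like the one described above. The chunk header is written as a comment, the caption text is a placeholder, and the file name reuses the example name suggested earlier. The file.exists() check is our own optional addition, not part of the original chunk: with it, knitting continues with a warning when the image is missing, whereas recent knitr versions stop with an error by default, so remove the check if you prefer knitting to stop.

```r
library(here)

# Chunk header in the RMD file (written as a comment here; the caption text
# is a placeholder):
# {r image-ppt-edit, out.width = '80%', fig.cap = "Your caption here"}

# Path to the edited image in the "Images" subfolder; the file name is the
# example suggested above, so substitute your own
img_file <- here("Images", "ProjAssign1-ppt-saveimage.jpg")

# Optional guard (our addition): warn instead of stopping the knit if the
# image has not been saved yet
if (file.exists(img_file)) {
  knitr::include_graphics(img_file)
} else {
  warning("Image not found: ", img_file)
}
```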

That’s it - now you should be able to load an image containing the table(s) or figure(s) you are reproducing into R Markdown!

Brief Tutorial #2: Edit YAML Header for TOC & Theme

In addition to meeting the various goals required to reproduce original research findings, you should also work on creating documents that are functionally organized, easily navigable, and aesthetically pleasing. Some of you have made solid strides toward this goal by routinely using leveled subheadings to organize your documents and by including detailed text descriptions containing numbered or bulleted lists and bold or italicized fonts. Note, though, that what you have learned and will learn in this class only scratches the surface of what you can do with R and R Markdown. For additional inspiration, I strongly recommend checking out Alison Hill’s excellent blog entry: “How I Teach R Markdown”.

In this section, we will highlight two simple modifications that you can make to the YAML header to improve the functionality and appearance of your final knitted document. Remember, the YAML header is the section of text that appears at the very top of your R Markdown file between the two lines containing three dashes (“---”). When you create a new R Markdown file, the YAML header looks like this:

[Screenshot: Default YAML header in new R Markdown file]

We briefly discussed the YAML header in your early R Assignments. Additionally, when you switched the knitted file output from a Word document in R Assignment 1 to an HTML file in R Assignment 2, you may have noticed that the YAML header changed in the process.

Here, we will modify the YAML header directly to change the appearance (i.e., the Bootswatch theme) of your knitted HTML file and to add a table of contents to it.

The easiest way to get started with editing YAML headers is to simply copy others that work. So, for example, check out the YAML header used for this project assignment in the screenshot below:

[Screenshot: YAML header used for this Project Assignment 2 HTML document]

There are several noteworthy things about this header in comparison to the default YAML header:

Unlike the default, this header contains multiple layers of indentation. For instance, “html_document:” is indented one level, and everything from “number_sections:” to “code_folding:” is indented two levels.
- Indentation matters in YAML headers! If your document is not knitting the way you want or is not including things that you expected it to include, you may have messed up the indentation.
- See here for more detailed information on how to properly structure a YAML header.
If you want to automatically create numbered sections, change “number_sections: no” to “number_sections: yes”.
This assignment uses the “readable” theme. The default theme is - you guessed it - named “default.” There are several built-in themes that you can choose from (see the link for previews), and many more themes that you can install and use with some additional effort.
The “toc: yes” line under “html_document” adds a table of contents to our knitted HTML document.
- The “toc_depth: 4” line specifies that the table of contents should apply to four (sub)heading levels (e.g., # to ####).
- The “toc_float: yes” line specifies that the table of contents should float (at the upper left) in the knitted HTML document. A floating table of contents is always visible when scrolling the document.
Recall, if you set your R code chunks to echo = TRUE, then your code will be shared in the knitted document. The line “code_folding: hide” will hide your shared code by default. In other words, your code will still be shared in the knitted HTML document, but readers can choose whether to click a toggle button to see it.
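As a text reference for the settings in this list, the sketch below writes them out as a YAML output section inside comments (reconstructed from the descriptions above, so it may not match the screenshot line for line) and then builds the same format in R with rmarkdown::html_document(), whose arguments share the YAML field names. The file name in the commented render() call is hypothetical.

```r
library(rmarkdown)

# YAML output section matching the settings described above
# (each level is indented two spaces; indentation matters):
#
# output:
#   html_document:
#     number_sections: no
#     theme: readable
#     toc: yes
#     toc_depth: 4
#     toc_float: yes
#     code_folding: hide

# The same options as arguments of html_document()
fmt <- html_document(
  number_sections = FALSE,
  theme = "readable",
  toc = TRUE,
  toc_depth = 4,
  toc_float = TRUE,
  code_folding = "hide"
)

# Hypothetical file name; uncomment to knit a file with these options
# render("LastName_P680_ProjAssign2.Rmd", output_format = fmt)
```

Writing the options in the YAML header is the usual route; the R version is mainly useful for seeing that each YAML field corresponds to a documented function argument (see ?rmarkdown::html_document).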

Now you should have all the basics that you need to create, organize, and slightly customize your own knitted HTML file for your reproduction project!

Final Comments on Reproducibility

While the above tutorial will get you through this class assignment, as you get more comfortable with R Markdown you should transition away from copying others’ working YAML headers and instead start writing your own as you see fit. The “ymlthis” package is a great resource for learning to create your own YAML code - and for doing so in a reproducible way within R code chunks.
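A minimal ymlthis sketch is below, assuming the package is installed (install.packages("ymlthis")). It is based on the package's yml_empty(), yml_title(), and yml_output() functions and builds a header with a placeholder title and the html_document options discussed above; check the ymlthis documentation for the full set of helpers.

```r
library(ymlthis)  # also makes the %>% pipe available

# Start from an empty header, add a placeholder title, and set the same
# html_document options discussed in Tutorial #2
header <- yml_empty() %>%
  yml_title("Project Assignment 2") %>%
  yml_output(
    html_document(
      number_sections = FALSE,
      theme = "readable",
      toc = TRUE,
      toc_depth = 4,
      toc_float = TRUE,
      code_folding = "hide"
    )
  )

# Printing the object shows the YAML text to place between the --- lines
header
```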

Speaking of reproducibility and R packages, one major threat to reproducibility that we have not mentioned yet is R itself. Because R is open source, its packages are not tightly curated, and old (or even relatively new) code may stop working as packages are updated and functions change in ways that break or unwittingly alter the results of your previous code. Well, there is an R package for this problem too! The “groundhog” package loads the versions of your packages that were current on a date you specify, ensuring that you or others are using the exact same package versions so you can reproduce past work. So, if you are feeling especially ambitious, try your hand at using this package in your final project!