The purpose of this second assignment is to introduce you to working in RMarkdown. This will be the primary file format in which you will save and present your work for this class.
As noted previously, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).
Early on, you may have a lot of trouble getting your code to run due to minor typos. This is normal.
Remember, you are learning to read and write a new (coding) language. As with learning any new language, we learn from practice - and from correcting our mistakes.
(Note: Remember that, when following instructions, always replace “LastName” with your own last name and replace YEAR_MO_DY with the actual date. E.g., Day_CRM495_RAssign2_2022_01_21)
In the first assignment, you learned about writing and running R code in an R Script file. You also saw how running certain commands (e.g., ggplot) from an R Script file will generate results in the RStudio Console. For Assignment 2, you will learn to create a new R Markdown file, and you will complete the remainder of your assignment in that file.
Like an R Script file, an R Markdown file can be used to write and run R code. However, an R Markdown file also can do much more than that. For instance, you can write and edit text, write and run R code, and generate statistical results and plots directly in the RMarkdown file. You can even create entire books and webpages using R Markdown. In fact, both this assignment and the first one were created using R Markdown.
R Markdown is an essential tool for producing reproducible research because, with it, we can thoroughly document and simultaneously provide detailed explanations for all of our coding decisions in a project - from opening and manipulating data, to recoding and combining variables, to summarizing and analyzing data, to creating and modifying figures.
We will start by simply opening and saving a new R Markdown file. For more detailed instructions, check out Danielle Navarro’s video series on using R Markdown.
The dialogue box asks for a Title, an Author, and a Default Output Format for your new R Markdown file.
Click OK to create your new R Markdown file. It should look like this:
The new R Markdown file contains a simple pre-populated template
to show users how to do basic tasks like add settings, create text
headings and text, insert R code chunks, and create plots. I recommend
you read through the template the first time you open a new RMarkdown
file as it contains useful information. However, it can also be a little
overwhelming for new users. So we are going to delete everything after
the metadata and second set of three dashes (i.e., after the YAML
header).
Familiarize yourself with R Markdown by adding some headers, text, and an R code chunk.
Press <Enter> to leave a blank line between the header and the first line of text.
Press <Enter> and, on the next line (line 9), type:
### Learning R Markdown
Press <Enter> to leave another blank line (line 10).
Before typing anything else, save your new R Markdown file in your “LastName_CRM495_work” folder. Name the file: LastName_CRM495_RAssign2_RMD_YEAR_MO_DY
Your RStudio session should now look similar to this:
Ready for one of the best parts of R Markdown? No more pasting screenshots into Word! Instead, you are going to use the “knit” button at the top of your R Markdown file to automatically create a Word document capturing your current work.
In addition to meeting the various goals required to reproduce original research findings, you should also work on creating documents that are functionally organized, easily navigable, and aesthetically pleasing. We will make some solid strides toward this goal throughout the class. However, what we will cover will only scratch the surface of what you can do with R and R Markdown. For additional inspiration, I strongly recommend checking out Alison Hill’s excellent blog entry: “How I Teach R Markdown”.
In this section, we will highlight two simple modifications that you can make to the YAML header to improve the functionality and appearance of your final knitted document. Remember, the YAML header is the section of text that appears at the very top of your R Markdown file in between the two lines containing three dashes (“- - -”). When you create a new R Markdown file, the YAML header looks like this:
Here, we will modify the YAML header directly to change the appearance or Bootswatch theme, add a table of contents, and knit to different file formats.
The easiest way to get started with editing YAML headers is to simply copy others that work. So, for example, check out the YAML header used for this project assignment in the screenshot below:
There are several noteworthy things about this header in comparison to the default YAML header:
Unlike the default, this header contains multiple layers of indentation. For instance, “html_document:” is indented one level, and everything from “number_sections:” to “code_folding:” is indented two levels.
- Indentation matters in YAML headers! If your document is not knitting the way you want or is not including things that you expected it to include, you may have made an indentation error.
- See here for more detailed information on how to properly structure a YAML header.
If you want to automatically create numbered sections, change “number_sections: no” to “number_sections: yes”.
This assignment uses the “readable” theme. The default theme is - you guessed it - named “default.” There are several built-in themes that you can choose from (see the link for previews), and many more themes that you can install and use with some additional effort.
The “toc: yes” line under “html_document” adds a table of contents to our knitted HTML document.
- The “toc_depth: 4” line specifies that the table of contents should include four (sub)heading levels (e.g., # to ####).
- The “toc_float: yes” line specifies that the table of contents should float (at the upper left) in the knitted HTML document. A floating table of contents is always visible when scrolling the document.
Recall, if you set your R code chunks to echo = TRUE, then your code will be shared in the knitted document. The line “code_folding: hide” will hide your shared code. In other words, your code will still be shared in the knitted HTML document but readers can choose whether to click a toggle button to see your code.
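The header options described above can also be written out in R, which may help clarify what each line controls. This is only a sketch: it assumes the rmarkdown package is installed, the object names html_opts and html_format are hypothetical, and "my_file.Rmd" is a placeholder for your own file name. You do not need to run this for the assignment, because the knit button reads the same settings from your YAML header.

```r
# Header options from the YAML discussion, written as a named R list.
# `html_opts` is a hypothetical name used only for this illustration.
html_opts <- list(
  number_sections = FALSE,      # same as "number_sections: no"
  theme           = "readable", # the theme used for this assignment
  toc             = TRUE,       # same as "toc: yes"
  toc_depth       = 4,          # include # through #### headings
  toc_float       = TRUE,       # same as "toc_float: yes"
  code_folding    = "hide"      # code is included but collapsed until toggled
)

# Build the output format only if the rmarkdown package is available.
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  html_format <- do.call(rmarkdown::html_document, html_opts)
  # To knit from the Console instead of the knit button
  # (the file name below is a placeholder, so the call is commented out):
  # rmarkdown::render("my_file.Rmd", output_format = html_format)
}
```

Each element of the list corresponds to one indented line under “html_document:” in the YAML header, so a typo or misplaced option here would cause the same kind of problem as an indentation error there.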
Now you should have all the basics that you need to create, organize, and slightly customize your own knitted HTML file for your reproduction project!
Now you have the basics of writing text in an R Markdown document. But remember, one of the benefits of R Markdown is that you can integrate your text with R code. You already know how to run code in an R Script. It is the same process in R Markdown, except you need to add an “R code chunk” into your file.
Create a second-level header in your R Markdown (hereafter, “RMD”) file titled: “Load Libraries”
Insert an R chunk
Inside the new R code chunk, load the same packages that you did in Assignment 1: library(tidyverse)
Recall, you only need to install packages one time. However, you must load them each time you start a new R session.
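The difference between installing and loading a package can be sketched as follows. This is a minimal illustration, assuming you installed the tidyverse package during Assignment 1; the variable name tidyverse_installed is hypothetical.

```r
# Installing is a one-time step, so it is left commented out here:
# install.packages("tidyverse")

# Check whether the package is already installed on this computer.
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

# Loading must be repeated in every new R session.
if (tidyverse_installed) {
  library(tidyverse)
} else {
  message("tidyverse is not installed yet; run install.packages(\"tidyverse\") once.")
}
```

In your own RMD file, the single line library(tidyverse) is all you need once the package has been installed.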
After your first code chunk, create another second-level header in RMD titled: “Create Data Object”
Insert another R code chunk
In the new R code chunk, create a data object called arrests.data using another data set built into R called USArrests.
arrests.data <- USArrests
arrests.data is the name we are giving to the data object we are creating.
The <- command tells R to place whatever follows the text arrow (in this case, the built-in dataset “USArrests”) into the arrests.data object.
You could give the object almost any name (e.g., mydata). However, it is good coding practice to be systematic in creating concise yet informative names (this is somewhat redundant in this example as the built-in data set already had an informative name).
To view the first few rows of the data object, type head(arrests.data).
By default, head() prints the first six rows. You can request a different number of rows by adding a second argument (e.g., head(arrests.data, 10)). This can be useful when you perform some operation on the data (e.g., create a new variable) and want to get a quick sense of whether it worked how you intended (e.g., actually created that new variable). If you wanted to print the entire data set, you would simply type print(arrests.data).
Now that you have created an object, you can use and/or modify that object in a variety of ways. For now, we’ll simply create a plot like we did in the previous assignment. If you’ve followed along to this point, you’ll know that the arrests.data object has 50 observations (one for each state) and four variables (“Murder”, “Assault”, “UrbanPop”, and “Rape”). For the crime types, the values represent the arrest rate for that particular crime per 100,000 residents. For the “UrbanPop” variable, the values represent the percent of people in that state classified as living in an urban location.
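A few base R commands can confirm this description of the data object. The sketch below assumes only the built-in USArrests data; the name first_ten is hypothetical and is used only for illustration.

```r
# Copy the built-in data set into a new object.
arrests.data <- USArrests

# Number of rows (observations) and columns (variables).
dim(arrests.data)

# Variable names.
names(arrests.data)

# head() shows six rows unless you ask for a different number.
first_ten <- head(arrests.data, 10)
first_ten
```

Running quick checks like these after each step is a good habit, since it catches typos in object names before they cause errors later in the file.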
Create a second-level header titled: “Create Plot”.
Create a plot of the relationship between percent urban population and murder rate.
- Create an R code chunk and type the following code:
ggplot(data = arrests.data,
       mapping = aes(x = UrbanPop, y = Murder)
       ) +
  geom_point() +
  geom_smooth(method = lm)
Under the plot, write a brief note regarding your substantive interpretation of the plot (e.g., what do the results suggest about the relationship between urbanicity and murder?).
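If you would like a number to go along with your visual interpretation, one option is to summarize the same relationship with a correlation and a simple regression line. This is only a sketch using the base R functions cor() and lm(); the names urban_murder_cor and murder_fit are hypothetical, and this step is not a required part of the assignment.

```r
# Recreate the data object so this chunk runs on its own.
arrests.data <- USArrests

# Pearson correlation between percent urban population and the murder arrest rate.
urban_murder_cor <- cor(arrests.data$UrbanPop, arrests.data$Murder)
urban_murder_cor

# Fit the same straight line that geom_smooth(method = lm) draws on the plot.
murder_fit <- lm(Murder ~ UrbanPop, data = arrests.data)
coef(murder_fit)
```

A correlation near zero and a nearly flat line would suggest little linear relationship, while larger values would suggest a stronger one. Compare what you see here with the plot before writing your note.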
Create another second-level header titled: “Create Another Plot”.
Create a plot of the relationship between percent urban population and assault rate.
- I’m going to let you figure this out on your own by following the logic of the Murder plot above.
Under this plot, write a brief note regarding your substantive interpretation of the plot.
Create another second-level header titled: “Conclusion”.
- In the conclusion, write a brief statement about what you learned in this assignment and any problems or issues you had in completing it.
Upon completing the tasks in the previous four sections, “knit” your final RMD file again and save the final knitted html document to your “Assignments” folder in your LastName_CRM495_work folder as: LastName_CRM495_RAssign2_YEAR_MO_DY.
Inside the “LastName_CRM495_commit” folder in our shared folder, create another folder named: Assignment 2.
To submit your assignment for grading, save copies of both (1) your “RAssign2” html file and (2) your “RAssign2_RMD” file into the LastName_CRM495_commit > Assignment 2 folder. Remember, be sure to save copies of both files. Do not just drag the files over from your “work” folder, or you may lose the original copies in your “work” folder.
Finally, submit your knitted html document on Canvas in the “R Assignment 2” submission portal. This will allow me to have a time-stamped version of your assignment for grading purposes.