Assignment 1: Getting Started in R

Assignment 1 Objectives

The purpose of this first assignment is to demonstrate that you have downloaded the “base R” and “RStudio” statistical programs and can open an SPSS datafile in RStudio.

This document was created as an R Markdown file. You will learn about R Markdown files later in the assignment. First, I want to familiarize you with the RStudio interface.

By the end of assignment #1, you should…

have created folders on both your personal computer and in cloud storage
have 6 full datasets downloaded
know how to create an R Markdown (RMD) document in RStudio
know how to add and modify text, including italic or bold font and level headings, in an R Markdown document
know how to add and use an R code chunk in an RMD file
know how to use install.packages() and library() commands to install and load packages in R
know about groundhog.library() as a reproducible alternative for loading packages (optional but recommended)
know how to add hashtags (“#”) to comment out a section of R code so it does not run
be able to use the “here” package and here() function for simple, reproducible file directory referencing
be able to read data into R/RStudio using read_spss() function from “haven” package, then assign it to an object in the R environment using an assignment (<-) operator
be able to knit your RMD file into a Word document that you can save and submit for course credit

Assumptions & Ground Rules

For all assignments in this class, you must have access to a computer and should also use a cloud storage platform at IU, such as Dropbox, OneDrive, or Google Drive. Thus, for this and all future assignments, I will assume you are working on your own computer and are downloading base R and RStudio (free open source) programs on your own computer. If you experience technical difficulties, remember that you can contact UITS for help.

If you do not have access to your own computer or cannot download R/RStudio, you may be able to request limited access to a desktop computer with these programs pre-installed at IUB’s Social Science Research Center (SSRC). Whether you download R or access it elsewhere (e.g., via SSRC), the remainder of the steps must be completed on all computers for grading purposes. Also, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).

Early on, you may have a lot of trouble getting your code to run due to minor typos. This is normal. Remember, you are learning to read and write a new (coding) language. As with learning any new language, we learn from practice - and from correcting our mistakes.

Part 1 (Assignment 1.1)

Goal: Create new K300_L_LastName folder on your local computer, then download datasets and save to your new folder (i.e., “L” for “local” computer folder).

(Note: When following instructions, always replace “LastName” with your own last name! Also, replace YEAR_MO_DY with the actual date. E.g., 2022_05_17_Fordham_K300Assign1_1)

On your computer, create a new folder called “K300_L_LastName” in a location that is easy to access (e.g., on Desktop or in “My Documents”).
Create two new folders in your K300_L folder: a “Datasets” folder (K300_L_LastName > Datasets) and an “Assignments” folder (K300_L_LastName > Assignments)
Visit the Companion Website for Bachman, Paternoster, and Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed (specifically, under the heading: R Data Sets, Full Versions) for the datasets.
1. On the left sidebar, click “Datasets”
2. Under the header Data Sets for SPSS Full Version, you will see six datasets:
  * Monitoring the Future 2013 grade 10_0.sav
  * 1992-2013 NCVS Lone Offender Assaults.sav
  * YouthDataset.sav
  * 2012 States Data.sav
  * 2013 YRBS.sav
  * GSS 2014.sav
Download each dataset and save them all in the “K300_L_LastName > Datasets” folder you created above.
Take a screenshot (#1) of your new K300_L_LastName > Datasets folder with all six datasets downloaded in it. PC (Option 1): Ctrl + PrtSc, then Ctrl + V to paste into a Word document. PC (Option 2): In the search bar, type “snipping tool” and use the tool to take a snapshot of your Datasets folder. Mac: Command + Shift + 3; this saves the screenshot to your desktop.
Save the Word document with your screenshot to your “K300_L_LastName > Assignments” folder. Name the file: YEAR_MO_DY_LastName_K300Assign1_1
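Once R is installed (Part 3), you could also double-check the folder from R itself. The sketch below is only an illustration: the helper name missing_datasets and the Desktop location are assumptions, and the filenames are copied from the list above, so adjust them if your downloads are named differently.

```r
# The six filenames as listed above; edit these if your downloaded files differ.
expected_files <- c(
  "Monitoring the Future 2013 grade 10_0.sav",
  "1992-2013 NCVS Lone Offender Assaults.sav",
  "YouthDataset.sav",
  "2012 States Data.sav",
  "2013 YRBS.sav",
  "GSS 2014.sav"
)

# Hypothetical helper: returns the expected files that are NOT in the folder.
missing_datasets <- function(folder, expected = expected_files) {
  setdiff(expected, list.files(folder))
}

# Assumed location; replace with the path to your own Datasets folder.
datasets_folder <- file.path("~", "Desktop", "K300_L_LastName", "Datasets")
missing_datasets(datasets_folder)
```

An empty result means all six files were found. This check does not replace the screenshot, which is still required.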

Part 2 (Assignment 1.2)

Goal: Log in to a cloud storage platform, create a K300_C_LastName cloud-based folder (i.e., “C” for “cloud” folder), and back up your files.

This step ensures that you have an official IU cloud-based folder where you can save and back up all of your datasets and files for K300 assignments. Though not essential (i.e., you can complete the assignments without a cloud storage folder), by saving and backing up using cloud storage, you will be able to access your saved files from any computer and you will still have access to your saved files if your computer were to unfortunately stop working.

Choose and log in to a cloud storage platform, such as Dropbox, OneDrive, or Google Drive.
Create a new folder called K300_C_LastName
Create a new Datasets folder in your K300_C folder (K300_C_LastName > Datasets)
Back up your datasets and assignment work. Copy and paste the 6 datasets from your K300_L folder on your local computer into your “Datasets” folder on your cloud storage platform (K300_C_LastName > Datasets). You should also create a new “Assignments” folder in your K300_C folder to save a copy of your “Assign1_1” screenshot.
Take a screenshot (#2) of your cloud storage “Datasets” folder (K300_C_LastName > Datasets) with all 6 datasets in it. Save in a Word document to your “Assignments” folder; name the file: YEAR_MO_DY_LastName_K300Assign1_2

If you would like an easier way of backing up files, you can learn how to sync your cloud storage folders directly to your computer. With this method, each time you save your R Markdown and Word files, they will automatically update in your cloud storage folders. In other words, you would be able to navigate to and use the cloud storage folders that you just created in the same way that you navigate to and use any other folder on your computer (e.g., via file explorer). If you are interested in using this method, the instructions for doing so through Dropbox are here.

Note: From this point forward, I will assume that you are backing up all folders and files you create on your local machine (K300_L_LastName) to your cloud storage folder (K300_C_LastName) as well. Doing so will save you immense trouble if something were to happen unexpectedly to the files and folders on your local machine.

Part 3 (Assignment 1.3)

Goal: Download R & RStudio; Open R Markdown; Knit first RMD file

In this section, you will begin by downloading and installing two programs on your computer: base R and RStudio. The first program, R, is simultaneously a computer coding language and a statistical software program. The second, RStudio, is an integrated development environment (IDE) that provides a more user-friendly interface for working with the R program. Throughout this course, you will learn to write and submit R code in RStudio to run statistical commands in the R program. After installing R & RStudio, you will run some simple commands to familiarize yourself with the basic features of the program and install two R packages.

Follow the instructions at the link below to install the latest versions of R and RStudio on your personal Windows or Mac computer: https://www.datacamp.com/community/tutorials/installing-R-windows-mac-ubuntu. For more detailed instructions, check out Danielle Navarro’s videos for installing R and RStudio in Windows or on a Mac.
Visit Antoine Soetewey’s blog (AS blog) entry at the link below, read the section titled “Main Components of RStudio,” and follow along in RStudio on your computer: https://towardsdatascience.com/how-to-install-r-and-rstudio-584eeefb1a41
In RStudio, open an “R Markdown” file (File > New File > R Markdown…).
1. Note: An R Script, which is the default file in RStudio, allows us to write and run code within R. However, an R Markdown file does this as well, while also permitting us to do so much more. For instance, you can write and edit text, write and run R code, and generate statistical results and plots directly in the RMarkdown file. You can even create entire books and webpages using R Markdown. In fact, this assignment was created using R Markdown!
2. R Markdown is an essential tool for producing reproducible research because, with it, we can thoroughly document and simultaneously provide detailed explanations for all of our coding decisions in a project - from opening and manipulating data, to recoding and combining variables, to summarizing and analyzing data, to creating and modifying figures.
3. We will start by simply opening and saving a new R Markdown file. For more detailed instructions, check out Danielle Navarro’s video on creating a new R Markdown file.
4. Open a new R Markdown file using File > New File > R Markdown…

Opening a new R Markdown file

The dialogue box asks for a Title, an Author, and a Default Output Format for your new R Markdown file.
1. In the Title box, enter K300 Assignment 1.
2. In the Author box, enter your First and Last Name (e.g., Tyeisha Fordham).
3. Under the Default Output Format box, select “Word document” (HTML is usually the default selection)
  - (Note: You must have Microsoft Word installed for this to work properly. IU students can install Word for free.)
4. Click OK to create your new R Markdown file. It should look like this:

Your new R Markdown file

The new R Markdown file contains a simple pre-populated template to show users how to do basic tasks like add settings, create text headings and text, insert R code chunks, and create plots. Feel free to read through the template - you may find it helpful. Personally, I find the template a little distracting and a bit overwhelming for new users. So, we are going to delete everything after the metadata and second set of three dashes (i.e., after the YAML header). Your R Markdown file should look like this:

Keep the YAML header; delete the template

Familiarize yourself with R Markdown by adding some headers, text, and an R code chunk.
1. Hit <Enter> on your keyboard to leave a blank line between the header and the first line of text.
2. On the next line (line 8), type: ## Part 1 (Assignment 1.1)
  - In the markdown document, two hashmarks specify a second-level text heading.
  - Note: This is different from an R Script file, which only contains R code. In an R Script file (and in an R code chunk in R Markdown, which you will learn about below), a hashmark turns the text that follows it into a comment that is not evaluated or run.
3. Hit <Enter> and, on the next line (line 9), type: ### Learning R Markdown
  - Three hashmarks specify a third-level text heading.
4. Hit <Enter> to leave another blank line (line 10)
5. On the next line (line 11), type the following sentence: This R Markdown document contains my work for Assignment 1. It is my work and only my work.
  - To italicize, place a single asterisk (*) before and after the word or text segment.
  - To bold, place two asterisks (**) before and after the word or text segment.
  - To bold and italicize, place three asterisks (***) before and after the word or text segment.
  - There are a lot of sources online that explain various formatting options in R Markdown. For examples, check out here, here, here, here, and here. Also, check out Nicholas Tierney’s bookdown for descriptions of and solutions to common problems with RMarkdown.
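To see the heading and emphasis markup from the steps above in one place, here is a minimal sketch that writes a few lines of R Markdown text to a temporary file and prints them back. The heading text and sentence are made up for illustration; in the assignment you type this kind of text directly into your RMD file rather than generating it with code.

```r
# Example R Markdown text (illustrative wording, not the assignment's required text).
rmd_lines <- c(
  "## A second-level heading",
  "### A third-level heading",
  "",
  "This is *italic*, this is **bold**, and this is ***bold and italic***."
)

# Write the lines to a temporary .Rmd file, then print them back to the Console.
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)
cat(readLines(rmd_file), sep = "\n")
```

When knitted, the asterisks and hashmarks disappear and only the formatted headings and text remain.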
Before typing anything else, save your new R Markdown file as: YEAR_MO_DY_LastName_K300Assign1_3RMD. Your RStudio session should now look similar to this:

Your first R Markdown file

Ready for one of the best parts of R Markdown? You can use the “Knit” button at the top of your R Markdown file to automatically create a Word document capturing your current work.
1. Click the knit button (Note: You can “Knit” your R Markdown document anytime. This can be helpful when you are getting used to working with R Markdown, as it allows you to continuously review the current state of your text and code. Just be sure to Comment out any code chunks that are unfinished or incorrect, otherwise the document will not “Knit”. You will learn about Commenting later in the assignment.)
2. A Word document should pop up that looks a lot like this:

Your first knitted Word document using R Markdown
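If you are curious what the “Knit” button does behind the scenes, the sketch below approximates it with the render() function from the rmarkdown package. It is an assumption-laden illustration: it creates a tiny stand-in RMD file in a temporary folder (the file name knit_demo.Rmd is made up) and only knits if rmarkdown and pandoc are available. For the assignment itself, keep using the Knit button.

```r
# A tiny stand-in for your own RMD file, written to a temporary folder.
rmd_file <- file.path(tempdir(), "knit_demo.Rmd")
writeLines(c(
  "---",
  "title: \"K300 Assignment 1\"",
  "output: word_document",
  "---",
  "",
  "## Load Libraries"
), rmd_file)

out_file <- NULL
# Knitting needs the rmarkdown package and pandoc; skip quietly if either is missing.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file, output_format = "word_document", quiet = TRUE)
  print(out_file)  # path to the knitted .docx file
}
```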

Now, in R Markdown, you are ready to insert a code chunk and begin reading in and assigning your data to an object in R!

Part 4 (Assignment 1.4)

Goal: Insert R code chunk in R Markdown; Install “haven” (with “tidyverse”) and “here” packages then load in R

Create a second-level header in R Markdown (hereafter, “RMD”) file titled: “Load Libraries”
Insert an R code chunk
1. Click “Code > Insert chunk” or click the “Insert code chunk” button (green box with a “C” in it) and select “R” option (see below).
2. Type install.packages("haven") into the code chunk and click the right-pointing green arrow on the right side of the chunk. Alternatively, highlight the text with your cursor and either click Run or use the keyboard shortcut (Windows: Ctrl + Enter; Mac: Cmd + Enter). The AS blog referenced above gives brief directions on installing packages and operating RMD. For more detailed instructions on installing packages, see Danielle Navarro’s video on the topic.
  
  Insert R code chunk


Once installed, load the haven package by typing, selecting, and running the command library("haven") in your RMD file.
- For more information about the haven package, see:
- https://cran.r-project.org/web/packages/haven/readme/README.html
- (Note: The “haven” package is part of a much larger suite of packages known as the “tidyverse.” This means you could optionally install the “tidyverse” package right now instead, since haven would be installed with it at the same time. We will use various features of “tidyverse” in this class, so if you do not install it now, you will need to do so later (e.g., for Assignment 2). For now, we need the “haven” package because it allows us to easily open the SPSS datafiles that you downloaded in Part 1 in RStudio.)
Now, repeat the process above to install the here package.
- Type install.packages("here") into your RMD file, hit RUN or select (highlight) this text line, and RUN the selection (Windows: CTRL + Enter; Mac: cmd + Enter).
Once installed, load the here package by typing, selecting, and running library("here") in your RMD file.
- The here package will help you start a replicable project-oriented workflow from the beginning. Here is how it will work once installed:
  - You save your primary RMarkdown code file in your top-level directory folder. For us, that means saving your RMD file in your K300_L_LastName folder.
  - Next, after closing RStudio, you will simply click directly on the RMD file in your K300_L_LastName folder to automatically open it with RStudio. When you do this, your “working directory,” i.e., the place R looks for files by default, will automatically be set to your K300_L folder.
  - The here package will then make it easy to find and call objects (e.g., SPSS dataset) in subfolders of your working directory (e.g., in “Datasets” folder).
Check to see if the haven and here packages loaded properly.
1. First, find the “Packages” tab (see “blue pane” in AS blog).
2. Next, scroll down through the listed packages until you find the haven and here package entries.
3. If there are checkmarks next to both entries, it worked!
  
  A checkmark means the R package is successfully loaded
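The same check can be done with code instead of the “Packages” tab. The following is a minimal sketch using only base R functions; the object names pkgs, is_installed, and is_loaded are my own choices for illustration.

```r
pkgs <- c("haven", "here")

# Installed? requireNamespace() returns TRUE or FALSE without attaching the package.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Loaded with library()? Attached packages appear on the search path as "package:<name>".
is_loaded <- paste0("package:", pkgs) %in% search()

data.frame(package = pkgs, installed = is_installed, loaded = is_loaded)
```

A TRUE in the “loaded” column corresponds to a checkmark in the “Packages” tab.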
[Optional/Recommended] R is an ever-evolving open source program with user-written packages that are frequently updated in ways that are not always backwards compatible. This poses a major problem for reproducibility, as your R script that works today may not work for you or for others who attempt to run it in the future. There are many ways to address this issue; check out here, here, and here for some ideas involving project workflows using renv or docker.
1. One relatively simple quick-fix solution is to specifically load packages from the Comprehensive R Archive Network (CRAN) package database associated with a particular date that should work (e.g., the day the script was initially written). The “groundhog” package makes this easy to do. Check out this “data colada” blog entry for a more detailed description of the groundhog package. To get started, first install the “groundhog” package, load the package as normal with library(groundhog), then simply replace all other library() commands with groundhog.library(). You can load a package library from a specific date with the format: groundhog.library(package, date).
  - To make this even easier, you can assign the desired date to an object. Following the cheeky datacolada example, we will assign our date (“2022-08-30”; the date should be at least two days before the current date) to an object named “groundhog.day” by typing groundhog.day="2022-08-30" in an R chunk. Then, to load the version of the “here” package associated with our specified date, we would simply type groundhog.library(here, groundhog.day).
  - Warning: Following these instructions may result in knitting errors. I have found that this is often due to conflicts between groundhog’s default folder and your self-referential here() working directory. To fix this, I recommend first saving your RMD file, re-opening it from your working directory (i.e., by clicking on it), then loading the “here” and groundhog packages using the classic library() command. After that, type the following into your code chunk: set.groundhog.folder(here()) to change the default groundhog folder to be the same as your working directory specified by the here() command. You can see how I did this for the current assignment in the image below.
2. Again, using groundhog.library() is optional but recommended practice for this course. Feel free to try using groundhog.library() yourself in place of your library commands for this assignment and all remaining assignments!
  
  Loading groundhog and setting its folder to here()
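As a rough sketch of the optional groundhog workflow described above: the runnable part only checks that the chosen date is written as year-month-day and is at least two days in the past. The groundhog calls themselves are left commented out because they assume you have already installed groundhog and have an internet connection. The object name date_ok is my own.

```r
# The date whose package versions we want, written as "YYYY-MM-DD".
groundhog.day <- "2022-08-30"

# Check the format and that the date is at least two days before today.
date_ok <- !is.na(as.Date(groundhog.day, format = "%Y-%m-%d")) &&
  as.Date(groundhog.day) <= Sys.Date() - 2
date_ok

# Remove the leading # from these lines once install.packages("groundhog") has been run:
# library(groundhog)
# groundhog.library("here", groundhog.day)
# groundhog.library("haven", groundhog.day)
```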

Part 5 (Assignment 1.5)

Goal: Save RMD; Use haven package to open SPSS file in RStudio

In RStudio, click File > Save As, then save RMD file in K300_L_LastName folder. You should have already saved and named the file: YEAR_MO_DY_LastName_K300Assign1_3RMD

Save R Script & Markdown files in top-level folder
1. It is essential that your RMD files are saved in the top-level K300_L_LastName folder for the here package to work properly.
2. As in the picture above, your RMD should be in the same K300 folder as the “Datasets” folder (and “Assignments,” if you created one for screenshots and assignment files as recommended).
After saving your RMD file, close RStudio, then reopen directly from RMD.
1. Open K300_L folder, locate new RMD file (YEAR_MO_DY_LastName_K300Assign1_3RMD), and click or double-click to open with RStudio.
2. By opening RStudio directly from the RMD file saved in your K300_L folder, the here package by default will set the K300_L folder as the top-level working directory.
Add #comments to R code and load haven and here packages
1. In your RMD, place hashmarks (#) in front of both install.packages commands.
  - The haven and here packages should already be installed, so you should not need to run these lines again.
  - Placing a hashmark in front of a command – or in front of any text in R code – will create a “comment” that will not be run as a command in R.
2. After commenting out the #install.packages commands, run both library commands to load the haven and here packages.
  - Though already installed, you will need to reload all packages that you want to use at the beginning of each R session.
  - Tip: Since you want to run all commands in your RMD that are not commented out (i.e., both library commands), try running the entire document instead of only a single code chunk.
3. Check the “Packages” tab to make sure there are checkmarks by the haven and here packages.
4. Your RMD and RStudio session should now look similar to this:
  
  RStudio session with comments & loaded packages
5. Note the red text in the Console that appeared after loading the here package. It confirms the here package has set my K300_L_LastName folder (i.e., for me, K300_L_Fordham) as the top-level working directory.
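To make the effect of a hashmark concrete, here is a small sketch of what R does with commented and uncommented lines. The object names skipped_value and kept_value are made up for illustration.

```r
# install.packages("haven")  # commented out: this line is skipped when the chunk runs
# install.packages("here")   # commented out as well

# skipped_value <- 1         # never runs, so this object is never created
kept_value <- 2              # runs; only the text after the # is ignored

# Ask R whether each object exists in the current session.
exists("skipped_value")
exists("kept_value")
```

In a fresh R session, only kept_value exists afterward, because the line that would have created skipped_value was treated as a comment.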
Open SPSS dataset in RStudio using haven package and assign data to an object in R
1. Start a second-level heading titled: “Read in and Assign Data to an Object.” Then, begin a new code chunk and type the following command in your RMD file: read_spss(here("Datasets","MonitoringtheFuture2013grade10_0.sav"))
  - read_spss() is a haven package command that allows R to read SPSS datafiles
  - Inside the read_spss() parentheses, we need to specify where the file is located and which file we wish to open. The here() function takes the folder name and the filename, separated by a comma, and combines them into a full file path.
  - Instead of specifying the exact path location, we specify here("Datasets", ...), which uses the here package to locate our “Datasets” folder. By using the here package, your R code should work for anyone on any computer as long as they have the exact same file structure (e.g., you can work from local or cloud storage, or work from different computers).
  - Next, our code specifies the "MonitoringtheFuture2013grade10_0.sav" file in the “Datasets” folder to open.
2. Run the read_spss() command. You should see a small snapshot of the data in your R viewer (right-hand side of RStudio), which looks something like this:
  
  read_spss() command shows snapshot of data in R Console
3. You just read a dataset into RStudio!
4. However, you do not want the dataset in the R Console – you want to be able to call the dataset as an object on which you can do things, such as summarize or correlate variables (later). To do this, you need to assign the dataset to an object in the R Environment. Note that your R Environment tab is still empty and looks like this:
  
  Empty Environment after reading in SPSS data and before creating a data object
5. To assign the dataset to an object in R, type MF2013g10data <- before the read_spss() command.
  - The text MF2013g10data is the name we are giving to the data object we are creating.
  - The <- operator tells R to place whatever follows the text arrow (in this case, the dataset read by the read_spss() command) into our MF2013g10data object. Note: We can name the object whatever we want (e.g., mydata). However, it is good coding practice to be systematic in creating concise yet informative names. The name I selected reminds me that the object is the Monitoring the Future 2013 grade 10 data.
    - Also, R code is case-sensitive, so be careful and consistent with using upper- and lower-case letters!
    - By this point, you may be wondering why I care so much about how you name your folders and files. The short answer is that thoughtful and systematic naming conventions can save you a lot of time and can be very helpful to your future self or to others attempting to reproduce your work. In contrast, ad hoc names can cause a great deal of unnecessary frustration. I recommend Danielle Navarro’s videos on Names machines like and Names humans like for more information and useful file naming tips.
6. Your RMD code should look like this:
  
  Example RMD for reading SPSS Data into R Object
7. Run the line of code in your RMD to create the new data object.
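Putting the steps in this part together, the code might look roughly like the sketch below. The package checks and the file.exists() test are my own additions so that the example gives a readable message instead of an error when something is missing; the core of it is the single read_spss(here(...)) line assigned to MF2013g10data. It assumes your file is named exactly as in the command above and sits in the “Datasets” folder.

```r
dataset_name <- "MonitoringtheFuture2013grade10_0.sav"

if (requireNamespace("haven", quietly = TRUE) &&
    requireNamespace("here", quietly = TRUE)) {
  library(haven)
  library(here)

  # here() builds the full path: <top-level folder>/Datasets/<dataset_name>
  data_path <- here("Datasets", dataset_name)

  if (file.exists(data_path)) {
    # Read the SPSS file and assign it to an object in the R Environment.
    MF2013g10data <- read_spss(data_path)
    print(dim(MF2013g10data))  # number of rows and columns
  } else {
    message("No file found at: ", data_path)
  }
} else {
  message("Install the haven and here packages first.")
}
```

If the file is found, MF2013g10data should appear in the Environment tab. Remember that names are case-sensitive, so typing mf2013g10data later would not find the object.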

Part 6

Goal: Submitting your first assignment

After completing the first five parts of this assignment, hit ‘Knit’ in the top left (or press ‘Save’ if you selected ‘Knit on Save’). A Word document should pop up with all of your work. It should look like this:

Final Knitted Document for Assignment 1
You should also have Word documents in your “Assignments” folder, each containing one of your two (2) screenshots (i.e., local folder with 6 datafiles; cloud folder with 6 datafiles).
Put your full name, date, and assignment number (e.g., “Assignment 1.1”) at the top of each Word document, then make sure each document is saved correctly (e.g., YEAR_MO_DY_LastName_K300Assign1_1 is titled Assignment 1.1 and contains screenshot #1).
Log in to the course Canvas page, then go to “Module 1” and open “Assignment 1.” Follow the directions to upload and submit your documents for grading.

Assignment 1 Objective Checks

After completing assignment #1, …

have you created folders on both your personal computer and in cloud storage?
did you download 6 full datasets, then save them in your local & cloud storage folders?
do you know how to create a new R Markdown (RMD) document in RStudio?
do you know how to add and modify text, including italic or bold font and level headings, in an R Markdown document?
do you know how to add and use an R code chunk in an RMD file?
do you know how to use install.packages() and library() commands to install and load packages in R?
do you know about groundhog.library() as an optional but recommended reproducible alternative for loading packages?
do you know how to add hashtags (“#”) to comment out a section of R code so it does not run?
are you able to use the “here” package and here() function for simple, reproducible file directory referencing?
are you able to read data into R/RStudio using read_spss() function from “haven” package, then assign it to an object in the R environment using an assignment (<-) operator?
are you able to knit your RMD file into a Word document that you can save and submit for course credit?