Assignment 1 Objectives
The purpose of this first assignment is to demonstrate that you have downloaded the “base R” and “RStudio” statistical programs and can open an SPSS datafile in RStudio.
This document was created as an R Markdown file. You will learn about
R Markdown files later in the assignment. First, I want to familiarize
you with the RStudio interface.
By the end of assignment #1, you should…
- have created folders both on your personal computer and in cloud storage
- have 6 full datasets downloaded
- know how to create an R Markdown (RMD) document in RStudio
- know how to add and modify text, including italic or bold font and heading levels, in an R Markdown document
- know how to add and use an R code chunk in an RMD file
- know how to use the install.packages() and library() commands to install and load packages in R
- know about groundhog.library() as a reproducible alternative for loading packages (optional but recommended)
- know how to add hashtags (“#”) to comment out a section of R code so
it does not run
- be able to use the “here” package and its here() function for simple, reproducible file directory referencing
- be able to read data into R/RStudio using the read_spss() function from the “haven” package, then assign it to an object in the R environment using the assignment operator (<-)
- be able to knit your RMD file into a Word document that you can save
and submit for course credit
Assumptions & Ground Rules
For all assignments in this class, you must have access to a computer
and should also use a cloud
storage platform at IU, such as Dropbox, OneDrive, or Google Drive.
Thus, for this and all future assignments, I will assume you are working on your own computer and have downloaded base R and RStudio (free, open-source programs) onto it. If you experience technical difficulties, remember that you can contact UITS for help.
If you do not have access to your own computer or cannot download
R/RStudio, you may be able to request limited access to a desktop
computer with these programs pre-installed at IUB’s Social Science
Research Center (SSRC). Whether you download R or access it
elsewhere (e.g., via SSRC), the remainder of the steps must be completed
on all computers for grading purposes. Also, for this and all future
assignments, you MUST type all commands in by hand. Do not copy &
paste except for troubleshooting purposes (i.e., if you cannot figure
out what you mistyped).
Early on, you may have a lot of trouble getting your code to run due
to minor typos. This is normal. Remember, you are learning to read and
write a new (coding) language. As with any new language, we learn from practice and from correcting our mistakes.
Part 1 (Assignment 1.1)
Goal:
Create a new K300_L_LastName folder on your local computer, then download the datasets and save them to your new folder (“L” stands for “local” computer folder).
(Note: When following instructions, always replace “LastName” with your own last name! Also, replace YEAR_MO_DY with the actual date. E.g., 2022_05_17_Fordham_K300Assign1_1)
- On your computer, create a new folder called “K300_L_LastName” in a
location that is easy to access (e.g., on Desktop or in “My
Documents”).
- Create two new folders in your K300_L folder: a “Datasets” folder
(K300_L_LastName > Datasets) and an “Assignments” folder
(K300_L_LastName > Assignments)
- Visit the Companion Website for Bachman, Paternoster, and Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed. (specifically, under the heading R Data Sets, Full Versions) for the datasets.
- On the left sidebar, click “Datasets”
- Under the header Data Sets for SPSS Full Version,
you will see six datasets:
* Monitoring the Future 2013 grade 10_0.sav
* 1992-2013 NCVS Lone Offender Assaults.sav
* YouthDataset.sav
* 2012 States Data.sav
* 2013 YRBS.sav
* GSS 2014.sav
- Download each dataset and save them all in the new “K300_L_LastName > Datasets” folder you made above.
- Take a screenshot (#1) of your new K300_L_LastName > Datasets folder with all six datasets downloaded in it. PC (Option 1): press Ctrl + PrtSc, then paste (Ctrl + V) into a Word document. PC (Option 2): type “snipping tool” in the search bar and use the tool to take a snapshot of your Datasets folder. Mac: press Command + Shift + 3; this saves the screenshot to your desktop.
- Save the Word document with your screenshot to your “K300_L_LastName
> Assignments” folder. Name the file:
YEAR_MO_DY_LastName_K300Assign1_1
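The folder steps above are meant to be done by hand in your file explorer. As an optional illustration only, the sketch below does the same setup in R with base functions (dir.create(), file.path(), format(), list.files()). It uses tempdir() as a stand-in for your Desktop or “My Documents” location and “Fordham” as a stand-in last name, so running it will not touch your real folders.

```r
# Optional sketch: create the Part 1 folders and a dated file name from R.
# tempdir() stands in for your Desktop or "My Documents" folder, and
# "Fordham" stands in for your own last name (both are placeholders).
base_dir  <- tempdir()
last_name <- "Fordham"

local_root <- file.path(base_dir, paste0("K300_L_", last_name))
for (sub in c("Datasets", "Assignments")) {
  # recursive = TRUE also creates K300_L_LastName itself if it is missing
  dir.create(file.path(local_root, sub), recursive = TRUE, showWarnings = FALSE)
}

# YEAR_MO_DY_LastName_K300Assign1_1, built from today's date
screenshot_name <- paste0(format(Sys.Date(), "%Y_%m_%d"), "_", last_name,
                          "_K300Assign1_1")
screenshot_name

# Once the downloads are done, this should return the six .sav file names
list.files(file.path(local_root, "Datasets"), pattern = "\\.sav$")
```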
Part 2 (Assignment 1.2)
This step ensures that you have an official IU cloud-based folder
where you can save and back up all of your datasets and files for K300
assignments. Though not essential (i.e., you can complete the
assignments without a cloud storage folder), by saving and backing up
using cloud storage, you will be able to access your saved files from any computer, and you will still have access to them if your computer stops working.
- Choose and log in to a cloud storage platform, such as Dropbox, OneDrive, or Google Drive.
- Create a new folder called K300_C_LastName
- Create a new Datasets folder in your K300_C folder
(K300_C_LastName > Datasets)
- Back up your datasets and assignment work: copy and paste the 6 datasets from your K300_L folder on your local computer into the “Datasets” folder on your cloud storage platform (K300_C_LastName > Datasets). You should also create a new “Assignments” folder in your K300_C folder and save a copy of your “Assign1_1” screenshot document there.
- Take a screenshot (#2) of your cloud storage
“Datasets” folder (K300_C_LastName > Datasets) with all 6 datasets in
it. Save in a Word document to your “Assignments” folder; name the file:
YEAR_MO_DY_LastName_K300Assign1_2
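If your cloud folder is synced to your computer, the copy step above can also be sketched in R. The helper backup_datasets() is hypothetical; it is not part of the assignment or of any package. Both folder paths are placeholders built inside tempdir(), so replace them with your own K300_L and synced K300_C paths.

```r
# Hypothetical helper (not part of the assignment): copy every .sav file in
# `from` into `to`, creating `to` if needed, and return the copied file names.
backup_datasets <- function(from, to) {
  dir.create(to, recursive = TRUE, showWarnings = FALSE)
  sav_files <- list.files(from, pattern = "\\.sav$", full.names = TRUE)
  # file.copy() returns one TRUE/FALSE per file; overwrite = TRUE refreshes old copies
  copied <- file.copy(sav_files, to, overwrite = TRUE)
  basename(sav_files[copied])
}

# Placeholder paths: replace with your local folder and your synced cloud folder
local_datasets <- file.path(tempdir(), "K300_L_Fordham", "Datasets")
cloud_datasets <- file.path(tempdir(), "K300_C_Fordham", "Datasets")
backup_datasets(local_datasets, cloud_datasets)
```

Only files ending in .sav are copied, so other files in the local folder are left alone.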
If you would like an easier way of backing up files, you can learn
how to sync your cloud storage folders directly to your computer. With
this method, each time you save your R Markdown and Word files, they
will automatically update in your cloud storage folders. In other words,
you would be able to navigate to and use the cloud storage folders that
you just created in the same way that you navigate to and use any other
folder on your computer (e.g., via file explorer). If you are interested
in using this method, the instructions for doing so through Dropbox are
here.
Note: From this point forward, I will
assume that you are backing up all folders and files you create on your
local machine (K300_L_LastName) to your cloud storage folder
(K300_C_LastName) as well. Doing so will save you immense trouble if
something were to happen unexpectedly to the files and folders on your
local machine.
Part 3 (Assignment 1.3)
Goal:
Download R & RStudio; open R Markdown; knit your first RMD file
In this section, you will begin by downloading and installing two
programs on your computer: base R and RStudio. The first program, R, is
simultaneously a computer coding language and a statistical software
program. The second, RStudio, is an integrated development environment
(IDE) that provides a more user-friendly interface for working with the
R program. Throughout this course, you will learn to write and submit R
code in RStudio to run statistical commands in the R program. After
installing R & RStudio, you will run some simple commands to
familiarize yourself with the basic features of the program and install
two R packages.
- Follow the instructions at the link below to install the latest
versions of R and RStudio on your personal Windows or Mac computer: https://www.datacamp.com/community/tutorials/installing-R-windows-mac-ubuntu.
For more detailed instructions, check out Danielle Navarro’s videos for installing R and RStudio in Windows or on a Mac.
- Visit Antoine Soetewey’s blog (AS blog) entry at the link
below, read the section titled “Main Components of RStudio,” and follow
along in RStudio on your computer: https://towardsdatascience.com/how-to-install-r-and-rstudio-584eeefb1a41
- In RStudio, open an “R Markdown” file (File > New File > R
Markdown…).
- Note: An R Script, which is the
default file in RStudio, allows us to write and run code within R.
However, an R Markdown file does this as well, while also permitting us
to do so much more. For instance, you can write and edit text, write and
run R code, and generate statistical results and plots directly in the
R Markdown file. You can even create entire books and webpages using R
Markdown. In fact, this assignment was created using R
Markdown!
- R Markdown is an essential tool for producing reproducible research
because, with it, we can thoroughly document and simultaneously provide
detailed explanations for all of our coding decisions in a project -
from opening and manipulating data, to recoding and combining variables,
to summarizing and analyzing data, to creating and modifying
figures.
- We will start by simply opening and saving a new R Markdown file.
For more detailed instructions, check out Danielle Navarro’s video on creating
a new R Markdown file.
- Open a new R Markdown file using File > New File > R
Markdown…
- The dialogue box asks for a Title, an
Author, and a Default Output Format
for your new R Markdown file.
- In the Title box, enter K300 Assignment
1.
- In the Author box, enter your First and Last Name
(e.g., Tyeisha Fordham).
- Under Default Output Format, select “Word document” (HTML is usually the default selection).
- (Note: You must have Microsoft Word installed for this to work properly. IU students can install Word for free.)
- Click OK to create your new R Markdown file. It should look like this:
- The new R Markdown file contains a simple pre-populated template to
show users how to do basic tasks like add settings, create text headings
and text, insert R code chunks, and create plots. Feel free to read
through the template - you may find it helpful. Personally, I find the
template a little distracting and a bit overwhelming for new users. So,
we are going to delete everything after the metadata and second set of
three dashes (i.e., after the YAML header). Your R Markdown file should
look like this:
- Familiarize yourself with R Markdown by adding some headers, text, and an R code chunk.
- Hit <Enter> on your keyboard to leave a blank line between the header and the first line of text.
- On the next line (line 8), type: ## Part 1 (Assignment 2.1)
- In the markdown document, two hashmarks specify a second-level text heading.
- Note: This is different from an R Script file, which only contains R code. A hashmark transforms code into a comment that is not evaluated or run in an R Script file (and in an R code chunk in R Markdown, which you will learn about below).
- Hit <Enter> and, on the next line (line 9), type: ### Learning R Markdown
- Three hashmarks specify a third-level text heading.
- Hit <Enter> to leave another blank line (line 10).
- On the next line (line 11), type the following sentence: This R
Markdown document contains my work for Assignment 2. It is
my work and only my work.
- To italicize, place a single asterisk (*) before and after
the word or text segment.
- To bold, place two asterisks (**) before and after
the word or text segment.
- To bold and italicize, place three
asterisks (***) before and after the word or text segment.
- There are many online sources that explain formatting options in R Markdown. For examples, check out here, here, here, here, and here. Also, check out Nicholas Tierney’s bookdown for descriptions of and solutions to common problems with R Markdown.
- Before typing anything else, save your new R Markdown file as:
YEAR_MO_DY_LastName_K300Assign1_3RMD. Your RStudio
session should now look similar to this:
- Ready for one of the best parts of R Markdown? You can use the
“Knit” button at the top of your R Markdown file to automatically create
a Word document capturing your current work.
- Click the Knit button. (Note: You can “Knit” your R Markdown document anytime. This can be helpful when you are getting used to working with R Markdown, as it allows you to continuously review the current state of your text and code. Just be sure to comment out any code chunks that are unfinished or incorrect; otherwise, the document will not “Knit”. You will learn about commenting later in the assignment.)
- A Word document should pop up that looks a lot like this:
- Now, in R Markdown, you are ready to insert a code chunk and begin
reading in and assigning your data to an object in R!
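To see the whole file at once, the sketch below writes the lines typed in Part 3 (YAML header, headings, and emphasis markers) to a new .Rmd file with writeLines(). The file name and author are placeholders. The last step calls rmarkdown::render(), which performs the same conversion the Knit button runs. It is guarded so that it only runs in an interactive session where the rmarkdown package and pandoc are available.

```r
# Sketch: the R Markdown text from Part 3, written out line by line.
# The file name and author below are placeholders.
rmd_lines <- c(
  "---",
  'title: "K300 Assignment 1"',
  'author: "Tyeisha Fordham"',
  "output: word_document",
  "---",
  "",
  "## Part 1 (Assignment 2.1)",   # two hashmarks: second-level heading
  "### Learning R Markdown",      # three hashmarks: third-level heading
  "",
  "Text in *italics*, in **bold**, and in ***bold italics***."
)
rmd_file <- file.path(tempdir(), "2022_05_17_Fordham_K300Assign1_3RMD.Rmd")
writeLines(rmd_lines, rmd_file)

# Knitting from code: only attempted interactively when rmarkdown and pandoc exist
if (interactive() &&
    requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, output_format = "word_document")
}
```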
Part 4 (Assignment 1.4)
Goal:
Insert R code chunk in R Markdown; Install “haven” (with “tidyverse”)
and “here” packages then load in R
- Create a second-level header titled “Load Libraries” in your R Markdown (hereafter, “RMD”) file
- Insert an R code chunk
- Click “Code > Insert chunk” or click the “Insert code chunk”
button (green box with a “C” in it) and select “R” option (see
below).
- Type install.packages("haven") into the code chunk and click the right-pointing green arrow on the right side of the chunk. You can also highlight the text with your cursor and click Run, or press Ctrl + Enter (Windows) or Cmd + Enter (Mac). The AS blog referenced above gives brief directions on installing packages and operating RMD. For more detailed instructions on installing packages, see Danielle Navarro’s video on the topic.
- Once installed, load the haven package by typing, selecting, and running the command library("haven") in your RMD file.
- For more information about the haven package, see:
- https://cran.r-project.org/web/packages/haven/readme/README.html
- (Note: The “haven” package is part of a much larger suite of packages known as the “tidyverse.” This means you could optionally install the “tidyverse” package right now instead, since haven would be installed along with it. We will use various features of “tidyverse” in this class, so if you do not install it now, you will need to do so later (e.g., for Assignment 2). For now, we need the “haven” package because it allows us to easily open, in RStudio, the SPSS datafiles that you downloaded for Part 1.)
- Now, repeat the process above to install the here package.
- Type install.packages("here") into your RMD file, then either click RUN or select (highlight) this text line and run the selection (Windows: Ctrl + Enter; Mac: Cmd + Enter).
- Once installed, load the here package by typing, selecting, and running library("here") in your RMD file.
- The here package will help you start a replicable, project-oriented workflow from the beginning. Here is how it will work once installed:
- You save your primary R Markdown code file in your top-level directory folder. For us, that means saving your RMD file in your K300_L_LastName folder.
- Next, after closing RStudio, you will simply click directly on the
RMD file in your K300_L_LastName folder to automatically open it with
RStudio. When you do this, your “working directory,” i.e., the place R
looks for files by default, will automatically be set to your K300_L
folder.
- The here package will then make it easy to find and load files (e.g., an SPSS dataset) in subfolders of your working directory (e.g., in the “Datasets” folder).
- Check to see if the haven and here packages loaded properly.
- First, find the “Packages” tab (see “blue pane” in AS
blog).
- Next, scroll down through the listed packages until you find the
haven and here package entries.
- If there are checkmarks next to both entries, it worked!
- [Optional/Recommended] R is an ever-evolving open source program with user-written packages that are frequently updated in ways that are not always backwards compatible. This poses a major problem for reproducibility, as your R script that works today may not work for you or for others who attempt to run it in the future. There are many ways to address this issue; check out here, here, and here for some ideas involving project workflows using renv or Docker.
- One relatively simple, quick-fix solution is to load packages specifically from the Comprehensive R Archive Network (CRAN) package database as of a particular date on which they should work (e.g., the day the script was initially written). The “groundhog” package makes this easy to do. Check out this “data colada” blog entry for a more detailed description of the groundhog package. To get started, first install the “groundhog” package, load it as normal with library(groundhog), then simply replace all other library() commands with groundhog.library().
You can load a package library from a specific date with the format: groundhog.library(package, date).
- To make this even easier, you can assign the desired date to an object. Following the cheeky datacolada example, we will assign our date (“2022-08-30”; the date should be at least two days before the current date) to an object named “groundhog.day” by typing groundhog.day="2022-08-30" in an R chunk. Then, to load the version of the “here” package associated with our specified date, we would simply type groundhog.library(here, groundhog.day).
- Warning: Following these instructions may result in knitting errors. I have found that this is often due to conflicts between groundhog’s default folder and the working directory identified by here(). To fix this, I recommend first saving your RMD file, re-opening it from your working directory (i.e., by clicking on it), then loading the “here” and groundhog packages using the classic library() command. After that, type set.groundhog.folder(here()) into your code chunk to change the default groundhog folder to the working directory specified by the here() command. You can see how I did this for the current assignment in the image below.
- Again, using groundhog.library() is optional but recommended practice for this course. Feel free to try using groundhog.library() yourself in place of your library() commands for this assignment and all remaining assignments!
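Because the install.packages() lines only need to run once, one common pattern is to install a package only when requireNamespace() reports that it is missing. This is sketched here as an optional alternative; the assignment itself asks you to comment out the install lines instead. install_if_missing() is a hypothetical helper name. The example call uses “stats”, which ships with base R, so nothing is downloaded.

```r
# Hypothetical helper (not from the assignment or any package): install only
# the packages that are not already installed, then return their names.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns FALSE, rather than an error, for absent packages
  missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  invisible(missing)
}

# "stats" ships with base R, so this installs nothing
install_if_missing("stats")

# In your "Load Libraries" chunk you might instead write:
# install_if_missing(c("haven", "here"))
# library("haven")
# library("here")
```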
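The groundhog setup described above can be sketched as a chunk like the one below. The date is the example from the text. is_valid_groundhog_day() is a hypothetical helper that only encodes the advice that the date be at least two days before today; it is not part of the groundhog package. Because groundhog.library() downloads package versions from CRAN, the sketch runs that call only in an interactive session with groundhog installed.

```r
# Date to pin package versions to (the example date from the text)
groundhog.day <- "2022-08-30"

# Hypothetical helper: TRUE when `date` is at least two days before `today`,
# following the advice above about choosing the groundhog date
is_valid_groundhog_day <- function(date, today = Sys.Date()) {
  as.Date(date) <= as.Date(today) - 2
}
is_valid_groundhog_day(groundhog.day)

# The versioned load downloads packages, so only attempt it interactively
if (interactive() && requireNamespace("groundhog", quietly = TRUE)) {
  library("groundhog")
  groundhog.library("here", groundhog.day)
}
```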
Part 5 (Assignment 1.5)
Goal:
Save RMD; Use haven package to open SPSS file in RStudio
- In RStudio, click File > Save As, then save the RMD file in your K300_L_LastName folder. You should have already saved and named the file: YEAR_MO_DY_LastName_K300Assign1_3RMD
- It is essential that your RMD files are saved in the top-level K300_L_LastName folder for the here package to work properly.
- As in the picture above, your RMD should be in the same K300 folder
as the “Datasets” folder (and “Assignments,” if you created one for
screenshots and assignment files as recommended).
- After saving your RMD file, close RStudio, then reopen it directly from the RMD file.
- Open K300_L folder, locate new RMD file
(YEAR_MO_DY_LastName_K300Assign1_3RMD), and click or double-click to
open with RStudio.
- By opening RStudio directly from the RMD file saved in your K300_L folder, the here package will, by default, set the K300_L folder as the top-level working directory.
- Add #comments to your R code and load the haven and here packages
- In your RMD, place hashmarks (#) in front of both install.packages commands.
- The haven and here packages should already be installed, so you should not need to run these lines again.
- Placing a hashmark in front of a command – or in front of any text
in R code – will create a “comment” that will not be run as a command in
R.
- After commenting out the #install.packages commands, run both library commands to load the haven and here packages.
- Though already installed, you will need to reload all packages that
you want to use at the beginning of each R session.
- Tip: Since you want to run all commands in your RMD that are not commented out (i.e., both library commands), try running the entire document instead of only a single code chunk.
- Check the “Packages” tab to make sure there are checkmarks by the haven and here packages.
- Your RMD and RStudio session should now look similar to this:
- Note the red text in the Console that appeared after loading the here package. It confirms that the here package has set my K300_L_LastName folder (i.e., for me, K300_L_Fordham) as the top-level working directory.
- Open SPSS dataset in RStudio using haven package and assign data to
an object in R
- Start a second-level heading titled: “Read in and Assign Data to an
Object.” Then, begin a new code chunk and type the following command in
your RMD file:
read_spss(here("Datasets","MonitoringtheFuture2013grade10_0.sav"))
- read_spss() is a haven package command that allows R to read SPSS datafiles.
- Inside the read_spss() parentheses, we specify the file we wish to open. Here, we do this with here(), giving it the folder name and the filename as two arguments separated by a comma.
- Instead of specifying the exact path location, we specify here("Datasets", which uses the here package to locate our “Datasets” folder. By using the here package, your R code should work for anyone on any computer as long as they have the exact same file structure (e.g., you can work from local or cloud storage, or work from different computers).
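- Next, our code specifies the “MonitoringtheFuture2013grade10_0.sav” file in the “Datasets” folder to open. The filename in your code must match the downloaded file’s name exactly; if your copy is named “Monitoring the Future 2013 grade 10_0.sav” (with spaces), type it that way instead.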
- Run the read_spss() command. You should see a small snapshot of the data printed in RStudio (below the code chunk or in the Console), which looks something like this:
- You just read a dataset into RStudio!
- However, you do not want the dataset in the R Console – you want to
be able to call the dataset as an object on which you can do things,
such as summarize or correlate variables (later). To do this, you need
to assign the dataset to an object in the R Environment. Note that your
R Environment tab is still empty and looks like this:
- To assign the dataset to an object in R, type MF2013g10data <- before the read_spss() command.
- The text MF2013g10data is the name we are giving to the data object we are creating.
- The <- operator tells R to store whatever follows the arrow (in this case, the dataset read by the read_spss() command) in our MF2013g10data object. Note: We can name the object whatever we want (e.g., mydata). However, it is good coding practice to be systematic in creating concise yet informative names. The name I selected reminds me that the object is the Monitoring the Future 2013 grade 10 data.
- Also, R code is case-sensitive, so be careful and consistent in using upper- and lower-case letters!
- By this point, you may be wondering why I care so much about how you
name your folders and files. The short answer is that thoughtful and
systematic naming conventions can save you a lot of time and can be very
helpful to your future self or to others attempting to reproduce your
work. In contrast, ad hoc names can cause a great deal of
unnecessary frustration. I recommend Danielle Navarro’s videos “Names machines like” and “Names humans like” for more information and useful file naming tips.
- Your RMD code should look like this:
- Run the line of code in your RMD to create the new data object.
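The commenting, assignment, and case-sensitivity rules from this part can be tried without any dataset. In the sketch below, a small made-up data frame stands in for the Monitoring the Future data; only the object name MF2013g10data comes from the text.

```r
# install.packages("haven")   # commented out: R ignores everything after a #
# install.packages("here")

# <- stores the value on its right in the object named on its left.
# This made-up data frame stands in for the real dataset.
MF2013g10data <- data.frame(id = 1:3, grade = c(10, 10, 10))
nrow(MF2013g10data)  # 3 rows

# R is case-sensitive: changing the capitalization names a different object
exists("MF2013g10data")  # TRUE
exists("mf2013g10data")  # FALSE, no object with this exact name exists
```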
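To practice read_spss() and <- before the real dataset is in place, the sketch below first writes a tiny made-up .sav file with haven::write_sav() into a temporary “Datasets” folder, then reads it back. The folder, file name, and example_data object are placeholders. file.path() stands in for here(); in your RMD, here() builds the path from your K300_L_LastName folder instead. The code runs only if haven is installed.

```r
# Sketch: read an SPSS file with haven::read_spss() and assign it with <-.
# A tiny made-up .sav file is written first so the example runs anywhere.
if (requireNamespace("haven", quietly = TRUE)) {
  library("haven")

  # Stands in for here("Datasets") in your RMD
  datasets_dir <- file.path(tempdir(), "Datasets")
  dir.create(datasets_dir, showWarnings = FALSE)
  sav_path <- file.path(datasets_dir, "example.sav")

  write_sav(data.frame(id = 1:3, age = c(15, 16, 15)), sav_path)

  # Same pattern as MF2013g10data <- read_spss(here("Datasets", "..."))
  example_data <- read_spss(sav_path)
  print(dim(example_data))  # rows, columns
}
```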
Part 6
Goal:
Submit your first assignment
- After completing the first five parts of this assignment, hit ‘Knit’
in the top left (or press ‘Save’ if you selected ‘Knit on Save’). A Word
document should pop up with all of your work. It should look like this:
- You should also have Word documents in your “Assignments” folder,
each containing one of your two (2) screenshots (i.e., local folder with
6 datafiles; cloud folder with 6 datafiles).
- Put your full name, date, and assignment number (e.g.,
“Assignment 1.1”) at the top of each Word document,
then make sure each document is saved correctly (e.g.,
YEAR_MO_DY_LastName_K300Assign1_1 is titled
Assignment 1.1 and contains screenshot #1).
- Log in to the course Canvas page, then go to “Module 1” and open “Assignment 1.” Follow the directions to upload and submit your documents for grading.
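Before uploading, you can check that your files follow the naming convention used throughout this assignment. The sketch below uses a hypothetical helper built on grepl(). Its pattern is an assumption: it accepts YEAR_MO_DY_LastName_K300Assign1_N names, optionally ending in “RMD”, with a .docx or .Rmd extension, and last names made of letters or hyphens.

```r
# Hypothetical helper: TRUE for names like YEAR_MO_DY_LastName_K300Assign1_N,
# optionally ending in "RMD", with a .docx or .Rmd extension
is_valid_submission_name <- function(x) {
  grepl("^[0-9]{4}_[0-9]{2}_[0-9]{2}_[A-Za-z-]+_K300Assign1_[0-9](RMD)?\\.(docx|Rmd)$", x)
}

is_valid_submission_name(c(
  "2022_05_17_Fordham_K300Assign1_1.docx",     # screenshot #1 document
  "2022_05_17_Fordham_K300Assign1_3RMD.docx",  # knitted Word document
  "Assignment1.docx"                           # does not follow the convention
))
# [1]  TRUE  TRUE FALSE
```

To check a whole folder at once, you could pass the helper the result of list.files() on your “Assignments” folder.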
Assignment 1 Objective Checks
After completing assignment #1, …
- have you created folders both on your personal computer and in cloud storage?
- did you download 6 full datasets, then
save them in your local & cloud storage folders?
- do you know how to create a new R Markdown (RMD) document in
RStudio?
- do you know how to add and modify text, including italic or bold font and heading levels, in an R Markdown document?
- do you know how to add and use an R code chunk in an RMD file?
- do you know how to use the install.packages() and library() commands to install and load packages in R?
- do you know about groundhog.library() as an optional but recommended reproducible alternative for loading packages?
- do you know how to add hashtags (“#”) to comment out a section of R
code so it does not run?
- are you able to use the “here” package and its here() function for simple, reproducible file directory referencing?
- are you able to read data into R/RStudio using the read_spss() function from the “haven” package, then assign it to an object in the R environment using the assignment operator (<-)?
- are you able to knit your RMD file into a Word document that you can
save and submit for course credit?