The purpose of this assignment is to learn how to wrangle data in order to reproduce results from a published study (specifically Figures 2-4 in Warr, 1993). As such, this assignment builds directly upon “R Assignment 5: Downloading & Describing Data.” For that assignment, we learned how to download the data directly from ICPSR, trim the data to include just the items in which we are interested, rename columns/variables, pool different data sets together, and then look at basic descriptive statistics for the variables in their raw form. However, we rarely analyze data in their raw form. Instead, most data analysis involves a large amount of wrangling or processing to get the data ready to analyze and describe. This is completely normal. When we are working with data, it is common for the bulk of that work to be taken up with these data management and data wrangling tasks (see this blog for a review).
Specifically, for this assignment, we will:
ifelse function and logic.
We assume that you are now familiar with installing and loading packages in R. Thus, when you see a package being used, I expect that you know it needs to be installed and that it needs to be loaded within your own R session in order to use it.
At this point, I also assume you are familiar with RStudio and with creating R Markdown (RMD) files. If not, please review R Assignments 1 & 2.
As with previous assignments, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste from the instructions except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).
library(tidyverse)
library(here)
library(haven)
library(icpsrdata)
library(gt)
library(sjmisc)
library(janitor)
library(patchwork)
Every time we start a project or an assignment, we want to think about and create a “reproducible file structure.” In this case, since we are primarily continuing the work from the last R Assignment (“R Assignment 5: Downloading & Describing Data”), you are welcome to copy your file structure from “R Assignment 5” and simply rename it to refer to “R Assignment 6” (e.g., “Day_CRM495_RAssignment6”). Of course, you could also recreate the file structure from scratch by following the directions in “R Assignment 5.”
The file structure for “R Assignment 5” included just your “NYS_data” folder and your RMD file. We will need one additional folder for this assignment. Specifically, we will create a folder to house the RScript file I provided on Canvas with the assignment. You can create that folder within the file explorer on your computer or, as with “R Assignment 5,” you can create it within R using the following code (see the “R Assignment 5” instructions/walkthrough for an explanation of the logic of the code chunk):
ifelse(dir.exists(here("R_scripts")), TRUE, dir.create(here("R_scripts")))
## [1] TRUE
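The ifelse() call above does the job, but the same check can also be written as a plain if statement. The following is a minimal sketch rather than part of the original assignment code. It assumes a hypothetical folder inside R's temporary directory so that it runs anywhere; in the assignment you would use here("R_scripts") instead.

```r
# Hypothetical stand-in for here("R_scripts"): a folder inside R's temp directory
scripts_dir <- file.path(tempdir(), "R_scripts")

# Create the folder only if it does not already exist
if (!dir.exists(scripts_dir)) {
  dir.create(scripts_dir)
}

# Confirm that the folder is now there
dir.exists(scripts_dir)
```

Either version can be run repeatedly without error, because the folder is only created when it is missing.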
Download the “Warr_1993_nysfwtrim_create.R” script file from Canvas and place it into the “R_scripts” folder you just created within your file structure. This will allow us to call it from within our RMD file. Specifically, we’ll use the source() function to simply run the code housed in the script file. Here is what the code to do that looks like:
source(here("R_scripts", "Warr_1993_nysfwtrim_create.R"))
Note: the code above simply tells R to look in the “R_scripts” folder, open up the “Warr_1993_nysfwtrim_create.R” script file, and run the code. This should create multiple objects in your “Environment” within RStudio, similar to the various NYS data objects you created in “R Assignment 5.” Here is what your Environment should look like:
Note: The “Warr_1993_nysfwtrim_create.R” script contains the code for downloading the data directly from ICPSR. So if you did not copy over the “NYS_data” folder from the “R Assignment 5” file structure, it will try to download the data again, which means you will need to enter your ICPSR email address and password in the Console window the first time you run the above code chunk.
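To see what source() does without touching the NYS data, here is a small self-contained sketch. The script name "toy_create.R" and the object "toy_data" are made up for illustration; the script is written to a temporary folder and then run.

```r
# Write a tiny, hypothetical script file to a temporary location
script_path <- file.path(tempdir(), "toy_create.R")
writeLines("toy_data <- data.frame(id = 1:3, age = c(11, 12, 13))", script_path)

# source() runs every line of the script in the current R session
source(script_path)

# The object defined inside the script now exists in the Environment
exists("toy_data")
nrow(toy_data)
```

The same thing happens with the assignment script: every object it creates appears in your Environment as if you had typed the code yourself.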
So far, we have primarily written all of our code within RMD files. This works well for our purposes because it allows you to interactively write, explain, and share your R work. When creating tutorials or assignments, this is especially useful. However, for larger and more complex research projects (e.g., that include multiple data sets from different sources, dozens to hundreds of variables, analyses that build from simple description to complex modeling, etc.), researchers often split these tasks up across multiple script files and only use RMD files to present the results (e.g., in a paper or presentation). This kind of workflow fits with Scott Long’s ideas of having a “dual workflow” where data management and analysis are kept separate (see this video for an overview of his ideas regarding computational workflow).
Recall from last assignment that we are focusing on reproducing and extending Figures 2, 3, and 4 from Warr (1993):
Also, recall that in order to produce these plots we needed to accomplish seven general steps, the first four of which we already accomplished in “R Assignment 5”:
In what follows, we will walk through the last three steps.
The three “other aspects of peer relations” variables that Warr (1993) plotted in Figures 2-4 were originally asked with different answer categories: a self-reported number of evenings spent socializing, a Likert-style five-point “importance” scale, and “yes/no/maybe.” As a reminder, here are the three specific questions along with their answer categories:
In the last assignment, we renamed these items, along with the survey question asking for respondents’ age (age, evsoc, socimp, liepolice), and examined their descriptive statistics and distributions by age. Running the following code chunk will produce these descriptive statistics:
library(sjmisc)
nys_fwtrim %>%
frq(age, evsoc, socimp, liepolice)
nys_fwtrim %>%
descr(age, evsoc, socimp, liepolice)
nys_fwtrim %>%
flat_table(evsoc, age, margin = "col") #note: margin = "col" tells it to give me column percentages
nys_fwtrim %>%
flat_table(socimp, age, margin = "col")
nys_fwtrim %>%
flat_table(liepolice, age, margin = "col")
Warr (1993) did not analyze these data in their raw form. He recoded these specific items into dichotomous variables. In other words, he chose to categorize each of the specific “aspects of peer relations” variables into two categories:
Spending three or more nights per week socializing vs. two or fewer nights per week socializing (Figure 2).
Reporting that it is “Very important” or “Pretty important” to socialize via dates, parties, etc. vs. reporting that it is “not important at all,” “not too important,” or “somewhat important” (Figure 3).
Reporting “yes” they would be willing to lie to protect friends who got in trouble with police vs. “no” or “maybe” (Figure 4).
Let’s go ahead and create dichotomous variables for each of the three “other aspects of peer relations” variables that Warr (1993) plotted in Figures 2 - 4. We’ll explain the code below.
nys_fwtrim_dic <- nys_fwtrim %>%
mutate(evsoc_dic = ifelse(evsoc >= 3, 1, 0),
socimp_dic = ifelse((socimp == 4 | socimp == 5), 1, 0),
liepolice_dic = ifelse(liepolice == 3, 1, 0),
liepolice_dic = ifelse(liepolice == 4, NA, liepolice_dic))
In the above code, we created a new data set object called “nys_fwtrim_dic”. We could have also just overwritten the “nys_fwtrim” data set but, as we mentioned before, we generally try to avoid overwriting objects so we can keep clear exactly what is in the objects we create. We assume there are different perspectives on this practice, as our approach can lead to the creation of a lot of superfluous objects in our R environment (Of course, if you followed our recommended RStudio “Global options” settings, then you will be starting with a clean environment for every R session, which makes this issue a bit more tolerable).
Within this new data set, we created three new dichotomous variables with the mutate() function from the “dplyr” package. Recall from the last assignment that the mutate() function creates new variables (i.e., new columns) in the data set. Here we named those new variables by appending "_dic" to the non-dichotomized names to tell our future selves and others that these are dichotomized versions of the raw items.
In the previous assignment, we simply used the mutate() function to create a new variable with a specific value (e.g., mutate(wave = 1)). Above, we used the mutate() function along with the ifelse() function to create new variables. Essentially, each of the new variables is created through a logical test that 1) asks whether the raw variable meets a condition (e.g., is equal to certain values), 2) assigns a numerical value of one if it does, and 3) assigns a numerical value of zero if it does not. In dichotomizing the variables in this way, we created what are often called “dummy” variables. In this case, our dummy variables have the value of 1 if the respondent reports “Three or more evenings socializing” (evsoc_dic), answered that it is “Very important” or “Pretty important” to socialize in these ways (socimp_dic), or answered “Yes” that they would lie to the police to protect friends in trouble (liepolice_dic); they have the value of 0 if the respondent answered otherwise.
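The logical test described above can be tried on a couple of toy vectors before applying it to the real data. This is only an illustrative sketch: the vectors evsoc_toy and socimp_toy are made up and simply stand in for the raw items.

```r
# Toy vector standing in for the raw "evsoc" item (values are made up)
evsoc_toy <- c(0, 2, 3, 7, NA)

# Test: is the value 3 or more? If yes assign 1, if no assign 0
evsoc_toy_dic <- ifelse(evsoc_toy >= 3, 1, 0)
evsoc_toy_dic

# Toy vector standing in for "socimp"; the | symbol means "or"
socimp_toy <- c(1, 3, 4, 5, NA)
socimp_toy_dic <- ifelse(socimp_toy == 4 | socimp_toy == 5, 1, 0)
socimp_toy_dic
```

In both results the missing value stays NA rather than becoming 0, because a comparison involving NA returns NA and ifelse() passes that NA through.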
Let’s walk through the logic of exactly what the code above is doing for each of these three new variables:
If evsoc is 3 or greater (evsoc >= 3), assign evsoc_dic the value of 1, otherwise assign evsoc_dic the value of 0.
If socimp is equal to 4 or 5 ((socimp == 4 | socimp == 5)), assign socimp_dic the value of 1, otherwise assign socimp_dic the value of 0.
If liepolice is equal to 3 (liepolice == 3), assign liepolice_dic the value of 1, otherwise assign liepolice_dic the value of 0.
If liepolice is equal to 4 (liepolice == 4), assign liepolice_dic the value of NA, otherwise assign it the value from the liepolice_dic variable we just created (i.e., 1 or 0).
Let’s go ahead and look at the data and make sure the variables we told R to create were actually created.
head(nys_fwtrim_dic) %>%
gt()
CASEID | age | evsoc | socimp | liepolice | wave | evsoc_dic | socimp_dic | liepolice_dic |
---|---|---|---|---|---|---|---|---|
1 | 13 | 3 | 3 | 1 | 1 | 1 | 0 | 0 |
2 | 15 | 4 | 3 | 3 | 1 | 1 | 0 | 1 |
3 | 11 | 1 | 5 | 1 | 1 | 0 | 1 | 0 |
4 | 16 | 2 | 3 | 2 | 1 | 0 | 0 | 0 |
5 | 14 | 0 | 4 | NA | 1 | 0 | 1 | NA |
6 | 11 | 2 | 3 | 3 | 1 | 0 | 0 | 1 |
Great! It looks like R created the variables we wanted inside the new data object named “nys_fwtrim_dic.” However, just looking at the first six rows isn’t enough to ensure that the code above worked as expected. In the next section we’ll walk through some basic strategies for checking this.
Anytime you recode and/or manipulate data, you want to check that R did what you wanted it to do. The thing about programming languages like R is that they will do exactly what you tell them to do (or won’t do something because you didn’t speak to them correctly). But what you tell them to do is not always what you expect them to do. So it is crucial to check your data after you have made changes.
Let’s check to make sure our new “dummy” variables are categorizing the data as we expect them to. To do this we could simply use the flat_table() function from the “sjmisc” package that you learned about in the previous assignment. However, the flat_table() function doesn’t include the frequencies for missing data (at least not by default), which we want when checking that our code worked. So below, I’m going to use the tabyl() function from the “janitor” package. The tabyl() function was designed largely to replicate and expand on the functionality of the base R table() function within a tidyverse framework.
Note: To use the tabyl() function, you need to install and load the “janitor” package.
library(janitor)
nys_fwtrim_dic %>%
tabyl(evsoc, evsoc_dic) %>%
gt()
evsoc | 0 | 1 | NA_ |
---|---|---|---|
0 | 1288 | 0 | 0 |
1 | 1899 | 0 | 0 |
2 | 2068 | 0 | 0 |
3 | 0 | 1463 | 0 |
4 | 0 | 677 | 0 |
5 | 0 | 386 | 0 |
6 | 0 | 111 | 0 |
7 | 0 | 137 | 0 |
NA | 0 | 0 | 596 |
nys_fwtrim_dic %>%
tabyl(socimp, socimp_dic) %>%
gt()
socimp | 0 | 1 | NA_ |
---|---|---|---|
1 | 474 | 0 | 0 |
2 | 1460 | 0 | 0 |
3 | 2543 | 0 | 0 |
4 | 0 | 2176 | 0 |
5 | 0 | 1375 | 0 |
NA | 0 | 0 | 597 |
nys_fwtrim_dic %>%
tabyl(liepolice, liepolice_dic) %>%
gt()
liepolice | 0 | 1 | NA_ |
---|---|---|---|
1 | 4469 | 0 | 0 |
2 | 1830 | 0 | 0 |
3 | 0 | 1338 | 0 |
4 | 0 | 0 | 1 |
NA | 0 | 0 | 987 |
As you can see from each of these tables above, we simply created a crosstab with the original variable in the rows and our “dummy” variables in the columns. This allows you to see that the values in the original variable you intended to count as 1 or 0 in the dummy variable are indeed counting as those values.
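The same kind of check can be sketched with the base R table() function on made-up data. The vector raw_toy below is hypothetical; the recode mirrors the two-step liepolice recode used earlier (3 becomes 1, and the extra code 4 is set to NA).

```r
# Made-up raw item with one missing value and one extra code (4)
raw_toy <- c(1, 2, 3, 3, 4, NA)

# Recode: 3 -> 1, everything else -> 0, then set code 4 to NA
dic_toy <- ifelse(raw_toy == 3, 1, 0)
dic_toy <- ifelse(raw_toy == 4, NA, dic_toy)

# Crosstab of raw values (rows) by recoded values (columns);
# useNA = "ifany" adds a row/column for missing values when they occur
check_tab <- table(raw = raw_toy, recoded = dic_toy, useNA = "ifany")
check_tab
```

Each raw value should land in exactly one column. If a raw value shows counts in more than one column, or in a column you did not intend, the recode needs another look.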
Note: A nice feature of this recoding approach is that missing values are handled automatically. If an observation was coded as NA in the original variable, the new variable constructed from it is also assigned an NA value. Strictly speaking, this is not something the mutate() function does by itself: a logical test on a missing value (e.g., NA >= 3) returns NA, and ifelse() returns NA wherever its test is NA. We assume there are limits to this behavior, but it works in most cases we would need it for. Regardless, you should always check to make sure your data wrangling and recoding worked as you anticipated, including the treatment of missing values.
The mutate() function can do a lot more than just assign a value to a new variable or create dummy variables, which is what we have primarily used it for to this point. For instance, it can also create a new variable based on a mathematical formula or based on some function of existing variables. Let’s show you some relatively simple examples:
nys_fwtrim_dictemp <- nys_fwtrim_dic %>%
mutate(peerrel_index = evsoc_dic + socimp_dic + liepolice_dic,
age_squared = age * age,
age_squaredB = age^2,
evsoc_mean = mean(evsoc, na.rm = TRUE),
evsoc_sd = sd(evsoc, na.rm = TRUE),
evsoc_z = (evsoc - evsoc_mean) / evsoc_sd,
evsoc_zB = (evsoc - mean(evsoc, na.rm = TRUE)) / sd(evsoc, na.rm = TRUE))
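In the above code, we show you how you can create new variables by performing different mathematical functions on existing variables. We also show how R has some built-in functions for calculating common statistics, like the mean and standard deviation, which themselves can be used to create new variables. Let us briefly walk you through what we did for each new variable created above:
peerrel_index: the sum of the three dummy variables. If any of the three dummy variables is missing, the index is NA.
age_squared and age_squaredB: age squared, calculated two equivalent ways (age * age and age^2).
evsoc_mean and evsoc_sd: the mean and standard deviation of evsoc across all non-missing observations, so every row has the same value.
evsoc_z: a standardized (z-score) version of evsoc built from evsoc_mean and evsoc_sd. If evsoc is missing, the z-score is NA.
evsoc_zB: the same z-score, calculated in one step by using the mean() and sd() functions within the mutate equation.
We are just scratching the surface of what the mutate() function can do. Be sure to take a minute to examine the first six rows of data with the head() function so you can see that these commands worked as intended. You’ll notice, for example, that the first observation has a value of 1 only for the “evsoc_dic” variable and values of 0 for the “socimp_dic” and “liepolice_dic” variables. Thus, that respondent has a value of 1 for the “peerrel_index” variable (1 + 0 + 0 = 1).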
head(nys_fwtrim_dictemp) %>%
gt()
CASEID | age | evsoc | socimp | liepolice | wave | evsoc_dic | socimp_dic | liepolice_dic | peerrel_index | age_squared | age_squaredB | evsoc_mean | evsoc_sd | evsoc_z | evsoc_zB |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 13 | 3 | 3 | 1 | 1 | 1 | 0 | 0 | 1 | 169 | 169 | 2.078341 | 1.572359 | 0.58616326 | 0.58616326 |
2 | 15 | 4 | 3 | 3 | 1 | 1 | 0 | 1 | 2 | 225 | 225 | 2.078341 | 1.572359 | 1.22215039 | 1.22215039 |
3 | 11 | 1 | 5 | 1 | 1 | 0 | 1 | 0 | 1 | 121 | 121 | 2.078341 | 1.572359 | -0.68581101 | -0.68581101 |
4 | 16 | 2 | 3 | 2 | 1 | 0 | 0 | 0 | 0 | 256 | 256 | 2.078341 | 1.572359 | -0.04982388 | -0.04982388 |
5 | 14 | 0 | 4 | NA | 1 | 0 | 1 | NA | NA | 196 | 196 | 2.078341 | 1.572359 | -1.32179815 | -1.32179815 |
6 | 11 | 2 | 3 | 3 | 1 | 0 | 0 | 1 | 1 | 121 | 121 | 2.078341 | 1.572359 | -0.04982388 | -0.04982388 |
Finally, it is also worth examining the descriptive statistics for these variables, especially for the standardized variables, so you can see what this looks like in terms of the mean and standard deviation (when you construct z-scores the variable should have a mean of 0 and a standard deviation of 1):
nys_fwtrim_dictemp %>%
descr(peerrel_index:evsoc_zB)
##
## ## Basic descriptive statistics
##
## var type label n NA.prc mean sd se md
## peerrel_index numeric peerrel_index 7624 11.61 0.98 0.90 0.01 1.00
## age_squared numeric age_squared 8625 0.00 257.77 76.82 0.83 256.00
## age_squaredB numeric age_squaredB 8625 0.00 257.77 76.82 0.83 256.00
## evsoc_mean numeric evsoc_mean 8625 0.00 2.08 0.00 0.00 2.08
## evsoc_sd numeric evsoc_sd 8625 0.00 1.57 0.00 0.00 1.57
## evsoc_z numeric evsoc_z 8029 6.91 0.00 1.00 0.01 -0.05
## evsoc_zB numeric evsoc_zB 8029 6.91 0.00 1.00 0.01 -0.05
## trimmed range iqr skew
## 0.91 3 (0-3) 2.000000 0.47
## 254.81 320 (121-441) 128.000000 0.33
## 254.81 320 (121-441) 128.000000 0.33
## 2.08 0 (2.08-2.08) 0.000000 NaN
## 1.57 0 (1.57-1.57) 0.000000 NaN
## -0.09 4.45 (-1.32-3.13) 1.271974 0.79
## -0.09 4.45 (-1.32-3.13) 1.271974 0.79
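The property shown in the output above (a z-score has a mean of 0 and a standard deviation of 1) can be verified on any numeric vector. Here is a minimal sketch using a made-up vector x with one missing value:

```r
# Made-up values with one missing observation
x <- c(0, 1, 2, 3, 7, NA)

# Standardize: subtract the mean, then divide by the standard deviation
x_z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# The standardized values have mean 0 and SD 1 (up to rounding error)
mean(x_z, na.rm = TRUE)
sd(x_z, na.rm = TRUE)
```

Because of floating-point arithmetic, the mean may print as a very small number (something like 1e-17) rather than exactly 0.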
Now that we have recoded the data, we are almost ready to reproduce Warr’s (1993) plots. But first, we need to wrangle the data some more to get it in the form we want to plot. Specifically, for each age group, we need to calculate the percentage of respondents that have a value of 1 on each of our “other aspects of peer relations” dummy variables. You already did this with the raw versions of the variables last assignment with the flat_table() function from the “sjmisc” package. We can see the basic form in which we want to get the data by doing that again with our dummy variables.
nys_fwtrim_dic %>%
flat_table(evsoc_dic, age, margin = "col")
## age 11 12 13 14 15 16 17 18 19 20 21
## evsoc_dic
## 0 87.60 87.93 83.79 75.00 67.13 60.93 55.71 47.80 52.57 56.05 67.88
## 1 12.40 12.07 16.21 25.00 32.87 39.07 44.29 52.20 47.43 43.95 32.12
nys_fwtrim_dic %>%
flat_table(socimp_dic, age, margin = "col")
## age 11 12 13 14 15 16 17 18 19 20 21
## socimp_dic
## 0 62.00 65.59 63.86 59.14 55.16 53.47 46.77 51.01 57.21 55.00 62.42
## 1 38.00 34.41 36.14 40.86 44.84 46.53 53.23 48.99 42.79 45.00 37.58
nys_fwtrim_dic %>%
flat_table(liepolice_dic, age, margin = "col")
## age 11 12 13 14 15 16 17 18 19 20 21
## liepolice_dic
## 0 90.65 90.60 88.40 83.94 81.42 79.37 79.40 79.58 79.93 83.64 86.50
## 1 9.35 9.40 11.60 16.06 18.58 20.63 20.60 20.42 20.07 16.36 13.50
If you compare the above tables to Figures 2 - 4 in Warr (1993), you’ll notice that the percentages in the 1 category align with the values for each age group in the figures. To get this into a format that can be plotted with the “ggplot2” package, we need to create a summary data set that has a variable for age and variables that reflect the values in the bottom row of each of the tables above.
In order to get the data ready to plot, we need to use these dichotomous variables to create a summary of the data that reflects the “percentage of respondents” classified as 1 in each of our “other aspects of peer relations” variables for each age group. To do that we will go back to “dplyr” and use the group_by() and summarize() functions. Here is what the code looks like (explained below):
nys_fwsum <- nys_fwtrim_dic %>%
group_by(age) %>%
summarize(perc_evsoc = mean(evsoc_dic, na.rm = TRUE) * 100,
perc_socimp = mean(socimp_dic, na.rm = TRUE) * 100,
perc_liepolice = mean(liepolice_dic, na.rm = TRUE) * 100)
nys_fwsum %>%
gt()
age | perc_evsoc | perc_socimp | perc_liepolice |
---|---|---|---|
11 | 12.40000 | 38.00000 | 9.345794 |
12 | 12.07243 | 34.40644 | 9.395973 |
13 | 16.20553 | 36.13666 | 11.604585 |
14 | 25.00000 | 40.86345 | 16.063830 |
15 | 32.86540 | 44.83898 | 18.578767 |
16 | 39.06511 | 46.53300 | 20.632133 |
17 | 44.28698 | 53.23295 | 20.596459 |
18 | 52.19976 | 48.98689 | 20.415648 |
19 | 47.42952 | 42.78607 | 20.066890 |
20 | 43.94737 | 45.00000 | 16.358839 |
21 | 32.12121 | 37.57576 | 13.496933 |
In the above code, we created a summary data set that includes variables for the age grouping (“age”) and for the percentage of respondents classified as 1 in each of our three “other aspects of peer relations” dummy variables. Let us briefly explain the logic:
First, we told R to group the data set by the variable “age.” This basically tells R to perform whatever functions come next within the values of the grouping variable - in this case age (e.g., within age == 11; then within age == 12; etc.).
Second, we used the summarize() function to create a new summary data set with the variables calculated as instructed in the parentheses.
Third, we used the mean() function to indicate the proportion of respondents in each age group who were classified as 1 in each of our dummy variables, and multiplied by 100 to convert each proportion to a percentage. This works because when you calculate the mean of a dummy variable, the mean represents the proportion of cases that have the category coded as 1.
Now that we have these summary data, plotting them is relatively easy. We just need to tell ggplot what data and variables to use and the specific geom we want to use to represent the data. There are some other details, but we already created the basic template for the plots in “R Assignment 3.” So we can just re-use that here and update the information.
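Before moving on to the plots, the claim that the mean of a dummy variable equals the proportion coded 1 can be checked on a small made-up data set. In this sketch, the object toy and the variable lie_dic are hypothetical; the extra n_valid column is added only to show how many non-missing cases each percentage is based on.

```r
library(dplyr)

# Made-up data: two age groups and a 0/1 dummy with one missing value
toy <- tibble(
  age     = c(11, 11, 11, 11, 12, 12, 12),
  lie_dic = c(0, 0, 0, 1, 1, NA, 0)
)

toy_sum <- toy %>%
  group_by(age) %>%
  summarize(
    n_valid  = sum(!is.na(lie_dic)),              # non-missing cases per age
    perc_lie = mean(lie_dic, na.rm = TRUE) * 100  # share coded 1, as a percent
  )

toy_sum
```

At age 11, one of four respondents is coded 1, so the percentage is 25. At age 12, the missing case is dropped by na.rm = TRUE, leaving one of two respondents coded 1, or 50 percent.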
Note: When using ggplot(), at minimum we need to 1) tell it what data to use, 2) map variables in the data to visual properties of the plot and geoms (e.g., x-axis, y-axis, etc.), and 3) specify the geometric features for visualizing the data (i.e., specify a geom). Everything else is essentially fine-tuning the details to get the plots to look exactly how we want them to.
theme_set(theme_classic())
#Evenings Socializing Plot:
evsoc_plot <- ggplot(data = nys_fwsum, aes(x = age, y = perc_evsoc)) +
geom_line() +
geom_point(shape = "square") +
scale_x_continuous(limits = c(11, 21), breaks = 11:21) +
scale_y_continuous(limits = c(0, 60), breaks = seq(0, 60, 10)) +
labs (title = "Figure 2: Percentage of Respondents Reporting That They Averaged Three or More
Nights Per Week Going “On Dates, To Parties, or to Other Social Activities,” by Age",
x = "Age",
y = "Percent") +
theme(plot.title = element_text(size = 11),
axis.title = element_text(size = 10))
evsoc_plot
#Importance of Socializing Plot:
socimp_plot <- ggplot(data = nys_fwsum, aes(x = age, y = perc_socimp)) +
geom_line() +
geom_point(shape = "square") +
scale_x_continuous(limits = c(11, 21), breaks = 11:21) +
scale_y_continuous(limits = c(30, 60), breaks = seq(30, 60, 10)) +
labs (title = "Figure 3: Percentage of Respondents Who Said That It Is “Very Important” or “Pretty Important”
to “Have Dates and Go to Parties and Other Social Activities,” by Age",
x = "Age",
y = "Percent") +
theme(plot.title = element_text(size = 11),
axis.title = element_text(size = 10))
socimp_plot
#Lie to Police Plot:
liepolice_plot <- ggplot(data = nys_fwsum, aes(x = age, y = perc_liepolice)) +
geom_line() +
geom_point(shape = "square") +
scale_x_continuous(limits = c(11, 21), breaks = 11:21) +
scale_y_continuous(limits = c(0, 25), breaks = seq(0, 25, 5)) +
labs (title = "Figure 4: Percentage of Respondents Who Said That They Would Lie
to Protect Their Friends if They Got into Trouble with the Police, by Age",
x = "Age",
y = "Percent") +
theme(plot.title = element_text(size = 11),
axis.title = element_text(size = 10))
liepolice_plot
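The three plots above share the same template and differ only in the y variable, the title, and the y-axis limits. One possible way to cut down on the repetition is to wrap the template in a function. The helper below, plot_by_age(), is hypothetical (it is not part of the assignment or of “ggplot2”), and the data set toy_plot_data is made up so that the sketch runs on its own.

```r
library(ggplot2)

# Hypothetical helper: draws one "percent by age" line plot.
# y_var is the name of the column to plot, given as a string.
plot_by_age <- function(data, y_var, title, y_limits, y_step) {
  ggplot(data, aes(x = age, y = .data[[y_var]])) +
    geom_line() +
    geom_point(shape = "square") +
    scale_x_continuous(limits = c(11, 21), breaks = 11:21) +
    scale_y_continuous(limits = y_limits,
                       breaks = seq(y_limits[1], y_limits[2], y_step)) +
    labs(title = title, x = "Age", y = "Percent") +
    theme(plot.title = element_text(size = 11),
          axis.title = element_text(size = 10))
}

# Made-up summary data standing in for nys_fwsum
toy_plot_data <- data.frame(age = 11:21,
                            perc_toy = seq(10, 50, length.out = 11))

toy_plot <- plot_by_age(toy_plot_data, "perc_toy", "Toy example, by age",
                        y_limits = c(0, 60), y_step = 10)
toy_plot
```

For this assignment, typing out each plot by hand is still good practice; a helper like this becomes more useful once you are making many plots from the same template.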
Recall that we could also put each of these plots together using the “patchwork” package:
library(patchwork)
fig234_peerrel <- evsoc_plot + socimp_plot + liepolice_plot +
plot_layout(ncol = 1) +
plot_annotation(
title = "Age Distribution of Other Peer Relations Variables from Warr (1993)")
fig234_peerrel
In the above plot, I stacked the plots in a single column so the titles of the individual plots would not be cut off. Of course, with the “patchwork” package, you can adjust the layout in a multitude of ways.
Note: the chunk options used for the combined plot were {r, fig.width=4, fig.height=9}.
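As one illustration of an alternative layout, “patchwork” also lets you arrange plots with the | and / operators. The sketch below uses three made-up placeholder plots (p1, p2, p3) rather than the Warr (1993) figures, and places two plots side by side above a third.

```r
library(ggplot2)
library(patchwork)

# Made-up data and three simple placeholder plots
toy <- data.frame(age = 11:21, y = 1:11)
p1 <- ggplot(toy, aes(x = age, y = y)) + geom_line() + labs(title = "Plot 1")
p2 <- ggplot(toy, aes(x = age, y = y)) + geom_point() + labs(title = "Plot 2")
p3 <- ggplot(toy, aes(x = age, y = y)) + geom_col() + labs(title = "Plot 3")

# "|" places plots side by side and "/" stacks them vertically
two_over_one <- (p1 | p2) / p3
two_over_one
```

With long titles like those in Figures 2-4, a side-by-side layout may cut the titles off, which is why the single-column layout was used above.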
Like with the last assignment, in order for you to demonstrate that you can apply the basic data wrangling and recoding skills that you learned above on your own, in the last part of the assignment, you will consider alternative operationalizations of one of the “other elements of peer relations” that Warr (1993) was examining in Figures 2-4.
Recall that Warr (1993) used the question about being willing to “lie to protect their friends if they got in trouble with the police” as an indicator of respondents’ “commitment or loyalty to their own particular set of friends” (pg. 19). However, in the section on “Commitment to Delinquent Peers” in the codebooks for the first five waves of NYS data, there are two other questions that are meant to measure “commitment” to peers who are engaging in delinquency:
These were in addition to the question Warr (1993) examined:
Here is the table that shows you where each item is located in each of the first five waves of NYS data:
Warr (1993) Figures 2-4 NYS Items
Item | Wave 1 | Wave 2 | Wave 3 | Wave 4 | Wave 5 |
---|---|---|---|---|---|
ICPSR number [1] | 8375 | 8424 | 8506 | 8917 | 9112 |
Age | V169 | V7 | V10 | V6 | V6 |
Still run around with friends | V375 | V221 | V319 | V299 | V326 |
Try to stop activities | V376 | V222 | V320 | V300 | V327 |
Lie to police | V377 | V223 | V321 | V301 | V328 |
[1] Note: Indicates the ICPSR number for the data set and not a survey item.
You should have created the pooled data with each of these items in “R Assignment 5.” Now you simply need to recode the items into dummy variables and plot them like we did above.
In order to complete the assignment, here is what you need to do:
Before looking at the data, write a brief statement or commentary about whether you think the other two “commitment to delinquent peers” items will have a similar age distribution to the “lie to police” item for which you already produced the descriptive plot.
Trim, rename, and pool waves 1-5 data so that you have all three “commitment to delinquent peers” items in the same pooled data set.
Recode the three “commitment to delinquent peers” items into dummy variables and check to see that your code worked by examining cross-tabulations between the dummy variables and the original variables.
Create a summary data set with the percentage of respondents for each variable who report being “committed to delinquent peers” for each age group.
Write a brief statement or commentary about the similarities and differences between each of the “commitment to delinquent peers” items in terms of their raw frequency distribution and their age distribution.
Write a “Conclusion” section where you write about what you learned in this assignment and any problems or issues you had in completing it.
Knit your final RMD file to html format and save it using an informative file name (e.g., “LastName_CRM495_RAssign6_YEAR_MO_DY”) within a file structure you create for this assignment (e.g., “LastName_CRM495_RAssign6”).
Submit your knitted html file on Canvas.
Place a copy of your root folder in your LastName_495_commit folder on OneDrive.