Read through our “Causation without correlation” blog post.
Create a new RMD file for your extra credit assignment. As usual, follow file naming conventions, organize with headings, thoroughly document code, and provide detailed text descriptions throughout. Your first level-two heading (‘##’) should be titled “Phase 1: Reproducing Blog Code”.
Follow along with the blog example by copying and running all code used for the post in your own extra credit RMD file. (Be sure to click on “Show Code” for all code chunks in the post.) At minimum, you should have the following level-three (‘###’) subheadings:
After completing Phase 1 by successfully reproducing all code from the blog post, create another level-two header (‘##’) titled “Phase 2: Extending Blog Example”.
For this section, you will simulate new data containing a third mediating mechanism alongside Strain and Risk: Opportunity. For this assignment, we will make the strong assumption that high SES increases a youth’s exposure to opportunities to engage in delinquency, and the safer assumption that exposure to opportunities for delinquency increases the risk and incidence of delinquent behavior. Below you will find a modified DAG representing the causal relationships used to generate the new simulated data, along with the code to generate that data (object named ‘simdata2’). After creating the DAG and simulating the data, be sure to also generate descriptive plots for each variable.
In Phase 2, you will follow the same procedures used in the blog post and reproduced in Phase 1 to investigate how statistical associations between SES and delinquency change when we add a third mediating mechanism to the simulated data (be sure to use the new data; assign it to an object named ‘simdata2’). Along the way, you should answer the following questions in your RMD file.
library(dagitty)   # dagitty(), coordinates()
library(tidyverse) # tibble(), mutate(), %>%
SESdag2 <- dagitty("dag{
SES -> Strain -> Delinquency
SES -> Risk -> Delinquency
SES -> Opportunity -> Delinquency
}")
coordinates(SESdag2) <- list(
x=c(SES=1, Strain=2, Risk=2, Opportunity=2, Delinquency=3),
y=c(SES=2, Strain=1, Risk=3, Opportunity=4, Delinquency=2) )
plot(SESdag2)
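To confirm that the DAG encodes the three intended mediating pathways, one option is to list every directed path from SES to Delinquency. The sketch below assumes the dagitty package is installed and rebuilds the DAG so that it runs on its own; the object name `ses_paths` is only illustrative.

```r
library(dagitty)

# Rebuild the Phase 2 DAG so this chunk is self-contained
SESdag2 <- dagitty("dag{
SES -> Strain -> Delinquency
SES -> Risk -> Delinquency
SES -> Opportunity -> Delinquency
}")

# List the directed (causal) paths from SES to Delinquency
ses_paths <- paths(SESdag2, from = "SES", to = "Delinquency", directed = TRUE)
ses_paths$paths
```

Each listed path should run through exactly one mediator, since the DAG contains no direct arrow from SES to Delinquency.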
# X = Parental SES
# Y = Child delinquent behavior
# F = Mediator through which high parent SES might decrease delinquency - e.g., less financial (S)train
# P = Mediator through which high parent SES might increase delinquency - e.g., lower perceived (R)isk of detection
# Oppty = Mediator through which high parent SES might increase delinquency - e.g., more exposure to opportunities
# Strain -> Delinquency <- Risk
# Strain <- SES -> Risk
# SES -> Oppty -> Delinquency
set.seed(1138)
n <- 1000
# McElreath method (p.153)
# SES <- rnorm(n)
# Strain <- rnorm(n,SES)
# Risk <- rnorm(n,SES)
# Delinquency <- rnorm(n,Strain-Risk)
#
# https://www.tandfonline.com/doi/pdf/10.1080/10691898.2020.1752859
set.seed(1138)
n <- 1000
SES <- round(rnorm(n),digits=0)
Strain <- round(-.5*SES + rnorm(n),digits=0)
Risk <- round(-.5*SES + rnorm(n),digits=0)
Oppty <- round(.4*SES + rnorm(n), digits=0)
Delinquency <- round(.5*Strain + -.5*Risk + .6*Oppty + 0*SES + rpois(n,1),digits=0)
simdata2 <- tibble(SES, Strain, Risk, Oppty, Delinquency)
simdata2 <- simdata2 %>% mutate(
Delinquency = ifelse(Delinquency < 0, 0, Delinquency) # recode negative values to 0
)
simdata2
## # A tibble: 1,000 × 5
## SES Strain Risk Oppty Delinquency
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -1 1 0 1 3
## 2 -1 -1 0 0 0
## 3 0 1 1 0 1
## 4 0 -1 0 0 0
## 5 -1 0 1 -1 1
## 6 -1 1 0 -1 0
## 7 2 -1 -1 1 1
## 8 -1 0 -1 -1 3
## 9 0 0 -2 2 3
## 10 1 0 -1 1 1
## # ℹ 990 more rows
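One possible way to produce the descriptive plots is a bar chart of value counts for each variable, since every variable is rounded to whole numbers. This is a minimal sketch in base R rather than the blog's plotting approach; it re-runs the simulation above (storing the result in a plain data frame instead of a tibble) so that the chunk is self-contained.

```r
# Re-create simdata2 with the same seed and equations as above
set.seed(1138)
n <- 1000
SES <- round(rnorm(n), digits = 0)
Strain <- round(-.5 * SES + rnorm(n), digits = 0)
Risk <- round(-.5 * SES + rnorm(n), digits = 0)
Oppty <- round(.4 * SES + rnorm(n), digits = 0)
Delinquency <- round(.5 * Strain + -.5 * Risk + .6 * Oppty + 0 * SES + rpois(n, 1), digits = 0)
Delinquency <- ifelse(Delinquency < 0, 0, Delinquency) # recode negative values to 0
simdata2 <- data.frame(SES, Strain, Risk, Oppty, Delinquency)

# Numeric summaries for every variable
summary(simdata2)

# One bar chart of value counts per variable, arranged in a 2 x 3 grid
op <- par(mfrow = c(2, 3))
for (v in names(simdata2)) {
  barplot(table(simdata2[[v]]), main = v, xlab = "Value", ylab = "Count")
}
par(op) # restore the previous plotting layout
```

Bar charts are used here instead of histograms because the rounded variables take only a handful of distinct values.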
Generate a pairs.panels plot using the new simulated data set (simdata2). What is the bivariate correlation between SES and delinquency?
Estimate a regression model predicting delinquency that only includes SES and omits all three mediating mechanisms. What is the regression coefficient (beta) describing the bivariate association between SES and delinquency in the new data (simdata2)?
How would you interpret these results? What do you think they mean?
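The bivariate steps might look like the following sketch. It re-runs the simulation so the chunk stands alone, and the object names `r_ses_del` and `m_biv` are illustrative. The pairs.panels call only runs if the psych package is installed.

```r
# Re-create simdata2 with the same seed and equations as above
set.seed(1138)
n <- 1000
SES <- round(rnorm(n), digits = 0)
Strain <- round(-.5 * SES + rnorm(n), digits = 0)
Risk <- round(-.5 * SES + rnorm(n), digits = 0)
Oppty <- round(.4 * SES + rnorm(n), digits = 0)
Delinquency <- round(.5 * Strain + -.5 * Risk + .6 * Oppty + 0 * SES + rpois(n, 1), digits = 0)
Delinquency <- ifelse(Delinquency < 0, 0, Delinquency) # recode negative values to 0
simdata2 <- data.frame(SES, Strain, Risk, Oppty, Delinquency)

# Scatterplot matrix with correlations (skipped if psych is unavailable)
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(simdata2)
}

# Bivariate correlation between SES and delinquency
r_ses_del <- cor(simdata2$SES, simdata2$Delinquency)
r_ses_del

# Bivariate regression omitting all three mediators
m_biv <- lm(Delinquency ~ SES, data = simdata2)
summary(m_biv)$coefficients
```

In a bivariate model the slope equals the correlation rescaled by the ratio of standard deviations, sd(Delinquency) / sd(SES), so the two statistics always share a sign even when their magnitudes differ.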
Estimate a regression model predicting delinquency that includes SES and also includes all three mediating mechanisms as predictors.
What is the regression coefficient (beta) describing the association between SES and delinquency in the new data (simdata2)?
- What is the t-value and p-value for this estimate?
- What is the null hypothesis for this test? Would you reject or fail to reject the null hypothesis?
- How would you interpret these results? What do you think they mean? Is this estimate similar to the bivariate correlation or bivariate regression estimate you reported above? Why or why not?
How would you briefly interpret each of the regression coefficients (betas) representing associations between each of the three mediating mechanisms and delinquency?
Estimate a regression model predicting delinquency that includes SES and also includes Strain as a predictor, but omit the other two mediating predictors (Risk; Oppty) from the model.
What is the regression coefficient (beta) describing the association between SES and delinquency in the new data (simdata2)?
- What is the t-value and p-value for this estimate?
- What is the null hypothesis for this test? Would you reject or fail to reject the null hypothesis?
- How would you interpret these results? What do you think they mean? Is this estimate similar to the bivariate correlation or bivariate regression estimate you reported above? Is it similar to the estimate from the model that included all three predictors? Why or why not?
How would you briefly interpret the regression coefficient (beta) representing associations between the mediating mechanism Strain and delinquency? Is it similar to the estimate from the model with all three mechanisms? Why or why not?
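To compare the SES estimate across specifications, it can help to fit the three models side by side and collect the SES row from each coefficient table. The sketch below is one way to do that in base R; it re-runs the simulation so the chunk is self-contained, and the names `m_biv`, `m_strain`, `m_full`, and `ses_rows` are illustrative.

```r
# Re-create simdata2 with the same seed and equations as above
set.seed(1138)
n <- 1000
SES <- round(rnorm(n), digits = 0)
Strain <- round(-.5 * SES + rnorm(n), digits = 0)
Risk <- round(-.5 * SES + rnorm(n), digits = 0)
Oppty <- round(.4 * SES + rnorm(n), digits = 0)
Delinquency <- round(.5 * Strain + -.5 * Risk + .6 * Oppty + 0 * SES + rpois(n, 1), digits = 0)
Delinquency <- ifelse(Delinquency < 0, 0, Delinquency) # recode negative values to 0
simdata2 <- data.frame(SES, Strain, Risk, Oppty, Delinquency)

# Three specifications: no mediators, Strain only, all three mediators
m_biv <- lm(Delinquency ~ SES, data = simdata2)
m_strain <- lm(Delinquency ~ SES + Strain, data = simdata2)
m_full <- lm(Delinquency ~ SES + Strain + Risk + Oppty, data = simdata2)

# Pull the SES row (estimate, SE, t, p) from each model's coefficient table
models <- list(bivariate = m_biv, strain_only = m_strain, all_mediators = m_full)
ses_rows <- t(sapply(models, function(m) summary(m)$coefficients["SES", ]))
ses_rows

# Full coefficient table for interpreting the mediator betas
summary(m_full)$coefficients
```

Reading down the `Estimate` column of `ses_rows` shows how the SES coefficient shifts as mediating paths are held constant or left open.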
Use the psych::mediate() function to conduct a test of indirect effects with the new simulated data (simdata2) and regression models predicting delinquency. Include SES as the ‘exposure’ or primary predictor, and specify the other three variables (Strain, Risk, Oppty) as mediating mechanisms by wrapping each one in parentheses in the model formula.
Give your best attempt at briefly interpreting these results.
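A mediation call along these lines is sketched below, assuming the psych package is installed. The simulation is re-run so the chunk stands alone. The values `n.iter = 1000` and `plot = FALSE` are choices made here to keep the example quick and free of graphics, not requirements of the assignment, and the object name `med` is illustrative.

```r
library(psych)

# Re-create simdata2 with the same seed and equations as above
set.seed(1138)
n <- 1000
SES <- round(rnorm(n), digits = 0)
Strain <- round(-.5 * SES + rnorm(n), digits = 0)
Risk <- round(-.5 * SES + rnorm(n), digits = 0)
Oppty <- round(.4 * SES + rnorm(n), digits = 0)
Delinquency <- round(.5 * Strain + -.5 * Risk + .6 * Oppty + 0 * SES + rpois(n, 1), digits = 0)
Delinquency <- ifelse(Delinquency < 0, 0, Delinquency) # recode negative values to 0
simdata2 <- data.frame(SES, Strain, Risk, Oppty, Delinquency)

# Mediators are marked by wrapping each one in parentheses in the formula
med <- mediate(Delinquency ~ SES + (Strain) + (Risk) + (Oppty),
               data = simdata2,
               n.iter = 1000, # bootstrap resamples (assumed value, chosen for speed)
               plot = FALSE)  # suppress the path diagram in this sketch

# Printed output reports total, direct, and indirect effects
print(med)
```

Setting `plot = TRUE` (the default) additionally draws a path diagram, which may be easier to read alongside the DAG.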
Overall, did you learn anything from this assignment? If so, what did you learn?