Extra Credit Phase 1: Reproducing Blog Code

  1. Read through our “Causation without correlation” blog post.

  2. Create a new RMD file for your extra credit assignment. As usual, follow file naming conventions, organize with headings, thoroughly document code, and provide detailed text descriptions throughout. Your first level-two heading (‘##’) should be titled “Phase 1: Reproducing Blog Code”.

  3. Follow along with the blog example by copying and running all code used for the post in your own extra credit RMD file. (Be sure to click on “Show Code” for all code chunks in the post.) At minimum, you should have the following level-three (‘###’) subheadings:

  • Load Libraries & plot DAGs
  • Simulate data & plot descriptives
  • Bivariate correlations (pairs.panels)
  • Linear regression without mediators
  • Linear regression with both mediators
  • Linear regression omitting one mediator
  • Mediation analysis (test of indirect effects)

Extra Credit Phase 2: Extending Blog Example

  1. After completing Phase 1 by successfully reproducing all code from the blog post, create another level-two header (‘##’) titled “Phase 2: Extending Blog Example”.

  2. For this section, you will simulate new data containing a third mediating mechanism: Opportunity (named ‘Oppty’ in the simulated data). In addition to the Strain and Risk mechanisms, we will make the strong assumption that high SES increases a youth’s exposure to opportunities to engage in delinquency, and the safer assumption that exposure to opportunities for delinquency increases the risk and incidence of delinquent behavior. Below you will find a modified DAG representing the causal relationships used to generate the new simulated data, along with the code to generate the data (object named ‘simdata2’). After creating the DAG and simulating the data, be sure to also generate descriptive plots for each variable.

  3. In Phase 2, you will follow the same procedures used in the blog post and reproduced in Phase 1 to investigate how statistical associations between SES and delinquency change when we add a third mediating mechanism to the simulated data. Be sure to use the new data (object named ‘simdata2’). Along the way, answer the following questions in your RMD file.

DAG for “three mechanism” extra credit example

SESdag2 <- dagitty("dag{
  SES -> Strain -> Delinquency
  SES -> Risk -> Delinquency
  SES -> Opportunity -> Delinquency
   }") 
coordinates(SESdag2) <- list(
  x=c(SES=1, Strain=2, Risk=2, Opportunity=2, Delinquency=3),
  y=c(SES=2, Strain=1, Risk=3, Opportunity=4, Delinquency=2) )

plot(SESdag2)
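
As a quick check on the DAG, the sketch below lists the directed paths it implies from SES to Delinquency. It assumes the dagitty package is installed and rebuilds the same graph under a hypothetical object name (`g`) so the chunk runs on its own. The `requireNamespace()` guard is there only so the chunk does not error if the package is missing.

```r
# Sketch: enumerate the directed (causal) paths from SES to Delinquency.
# 'g' is a hypothetical name; the graph matches SESdag2 above.
if (requireNamespace("dagitty", quietly = TRUE)) {
  g <- dagitty::dagitty("dag{
    SES -> Strain -> Delinquency
    SES -> Risk -> Delinquency
    SES -> Opportunity -> Delinquency
  }")
  # directed = TRUE restricts the search to paths that follow the arrows
  p <- dagitty::paths(g, from = "SES", to = "Delinquency", directed = TRUE)
  print(p$paths)
}
```

Each listed path corresponds to one mediating mechanism. The DAG contains no direct SES -> Delinquency arrow.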

Code to generate new simulated data with three mechanisms (simdata2)

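# SES = Parental SES (exposure)
# Delinquency = Child delinquent behavior (outcome)
# Strain = Mediator through which high parental SES might decrease delinquency - e.g., less financial strain
# Risk = Mediator through which high parental SES might increase delinquency - e.g., lower perceived risk of detection
# Oppty = Mediator through which high parental SES might increase delinquency - e.g., more exposure to opportunities

# Strain -> Delinquency <- Risk
# Strain <- SES -> Risk
# SES -> Oppty -> Delinquency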

set.seed(1138)
n <- 1000

# McElreath method (p.153)
# SES <- rnorm(n)
# Strain <- rnorm(n,SES)
# Risk <- rnorm(n,SES)
# Delinquency <- rnorm(n,Strain-Risk)
# 

# https://www.tandfonline.com/doi/pdf/10.1080/10691898.2020.1752859
set.seed(1138)
n <- 1000
SES <- round(rnorm(n),digits=0)
Strain <- round(-.5*SES + rnorm(n),digits=0)
Risk <- round(-.5*SES + rnorm(n),digits=0)
Oppty <- round(.4*SES + rnorm(n), digits=0)
Delinquency <- round(.5*Strain + -.5*Risk + .6*Oppty + 0*SES + rpois(n,1),digits=0)

simdata2 <- tibble(SES, Strain, Risk, Oppty, Delinquency)
simdata2 <- simdata2 %>% mutate(
  # Recode negative values to 0 so that Delinquency is a non-negative count
  Delinquency = ifelse(Delinquency < 0, 0, Delinquency)
)

simdata2
## # A tibble: 1,000 × 5
##      SES Strain  Risk Oppty Delinquency
##    <dbl>  <dbl> <dbl> <dbl>       <dbl>
##  1    -1      1     0     1           3
##  2    -1     -1     0     0           0
##  3     0      1     1     0           1
##  4     0     -1     0     0           0
##  5    -1      0     1    -1           1
##  6    -1      1     0    -1           0
##  7     2     -1    -1     1           1
##  8    -1      0    -1    -1           3
##  9     0      0    -2     2           3
## 10     1      0    -1     1           1
## # ℹ 990 more rows
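
One possible way to produce the descriptive plots is sketched below with base R bar charts. This is an assumption about plotting style, not the approach used in the blog post, so feel free to adapt the blog's plotting code instead. The `if (!exists(...))` block is only a fallback that rebuilds the data with the same seed and equations, so the chunk can run on its own.

```r
# Fallback: rebuild simdata2 only if it is not already in the workspace.
if (!exists("simdata2")) {
  set.seed(1138)
  n <- 1000
  SES <- round(rnorm(n), digits = 0)
  Strain <- round(-.5 * SES + rnorm(n), digits = 0)
  Risk <- round(-.5 * SES + rnorm(n), digits = 0)
  Oppty <- round(.4 * SES + rnorm(n), digits = 0)
  Delinquency <- round(.5 * Strain + -.5 * Risk + .6 * Oppty + 0 * SES + rpois(n, 1), digits = 0)
  Delinquency <- ifelse(Delinquency < 0, 0, Delinquency)
  simdata2 <- data.frame(SES, Strain, Risk, Oppty, Delinquency)
}

# All variables are rounded to whole numbers, so a bar chart of counts
# per value works as a simple descriptive plot for each one.
op <- par(mfrow = c(2, 3))
for (v in names(simdata2)) {
  barplot(table(simdata2[[v]]), main = v, xlab = "Value", ylab = "Count")
}
par(op)  # restore the previous plotting layout
```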

Question Block 1 (Bivariate SES/Delinquency Correlation)

Create a pairs.panels plot using the new simulated data set (simdata2). What is the bivariate correlation between SES and delinquency?

Estimate a regression model predicting delinquency that only includes SES and omits all three mediating mechanisms. What is the regression coefficient (beta) describing the bivariate association between SES and delinquency in the new data (simdata2)?

How would you interpret these results? What do you think they mean?
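
A minimal syntax sketch for this block follows. It assumes the psych package is installed for `pairs.panels()` and uses `lm()` for the regression. The model name `m_biv` is hypothetical, and the fallback at the top only rebuilds the data if `simdata2` is not already in your workspace.

```r
# Fallback: rebuild simdata2 only if it is not already in the workspace.
if (!exists("simdata2")) {
  set.seed(1138)
  n <- 1000
  SES <- round(rnorm(n), digits = 0)
  Strain <- round(-.5 * SES + rnorm(n), digits = 0)
  Risk <- round(-.5 * SES + rnorm(n), digits = 0)
  Oppty <- round(.4 * SES + rnorm(n), digits = 0)
  Delinquency <- round(.5 * Strain + -.5 * Risk + .6 * Oppty + 0 * SES + rpois(n, 1), digits = 0)
  Delinquency <- ifelse(Delinquency < 0, 0, Delinquency)
  simdata2 <- data.frame(SES, Strain, Risk, Oppty, Delinquency)
}

# Scatterplot matrix with correlations (skipped if psych is not installed)
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(simdata2)
}

# The single SES/Delinquency correlation, computed directly
cor(simdata2$SES, simdata2$Delinquency)

# Bivariate regression: SES only, all three mediators omitted
m_biv <- lm(Delinquency ~ SES, data = simdata2)
summary(m_biv)
```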

Question Block 2 (Regression Model w/All Mediators)

Estimate a regression model predicting delinquency that includes SES and also includes all three mediating mechanisms as predictors.

What is the regression coefficient (beta) describing the association between SES and delinquency in the new data (simdata2)?

  • What are the t-value and p-value for this estimate?
  • What is the null hypothesis for this test? Would you reject or fail to reject the null hypothesis?
  • How would you interpret these results? What do you think they mean? Is this estimate similar to the bivariate correlation or bivariate regression estimate you reported above? Why or why not?

How would you briefly interpret each of the regression coefficients (betas) representing associations between each of the three mediating mechanisms and delinquency?
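
The sketch below shows one way to fit the full model and pull out the t-value and p-value for SES. The model name `m_full` is hypothetical, and the fallback at the top only rebuilds the data if `simdata2` is not already in your workspace.

```r
# Fallback: rebuild simdata2 only if it is not already in the workspace.
if (!exists("simdata2")) {
  set.seed(1138)
  n <- 1000
  SES <- round(rnorm(n), digits = 0)
  Strain <- round(-.5 * SES + rnorm(n), digits = 0)
  Risk <- round(-.5 * SES + rnorm(n), digits = 0)
  Oppty <- round(.4 * SES + rnorm(n), digits = 0)
  Delinquency <- round(.5 * Strain + -.5 * Risk + .6 * Oppty + 0 * SES + rpois(n, 1), digits = 0)
  Delinquency <- ifelse(Delinquency < 0, 0, Delinquency)
  simdata2 <- data.frame(SES, Strain, Risk, Oppty, Delinquency)
}

# Full model: SES plus all three mediating mechanisms
m_full <- lm(Delinquency ~ SES + Strain + Risk + Oppty, data = simdata2)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(m_full))
coef_table

# Just the SES row: its estimate, t-value, and p-value
coef_table["SES", c("Estimate", "t value", "Pr(>|t|)")]
```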

Question Block 3 (Springing the Causal Trap)

Estimate a regression model predicting delinquency that includes SES and also includes Strain as a predictor, but omit the other two mediating predictors (Risk; Oppty) from the model.

What is the regression coefficient (beta) describing the association between SES and delinquency in the new data (simdata2)?

  • What are the t-value and p-value for this estimate?
  • What is the null hypothesis for this test? Would you reject or fail to reject the null hypothesis?
  • How would you interpret these results? What do you think they mean? Is this estimate similar to the bivariate correlation or bivariate regression estimate you reported above? Is it similar to the estimate from the model that included all three predictors? Why or why not?

How would you briefly interpret the regression coefficient (beta) representing the association between the mediating mechanism Strain and delinquency? Is it similar to the estimate from the model with all three mechanisms? Why or why not?
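
To make the comparisons in this block easier, the sketch below fits the three model specifications side by side and collects the SES coefficient from each. The object names (`fits`, `ses_by_model`) are hypothetical, and the fallback at the top only rebuilds the data if `simdata2` is not already in your workspace.

```r
# Fallback: rebuild simdata2 only if it is not already in the workspace.
if (!exists("simdata2")) {
  set.seed(1138)
  n <- 1000
  SES <- round(rnorm(n), digits = 0)
  Strain <- round(-.5 * SES + rnorm(n), digits = 0)
  Risk <- round(-.5 * SES + rnorm(n), digits = 0)
  Oppty <- round(.4 * SES + rnorm(n), digits = 0)
  Delinquency <- round(.5 * Strain + -.5 * Risk + .6 * Oppty + 0 * SES + rpois(n, 1), digits = 0)
  Delinquency <- ifelse(Delinquency < 0, 0, Delinquency)
  simdata2 <- data.frame(SES, Strain, Risk, Oppty, Delinquency)
}

# The three specifications compared across Question Blocks 1-3
fits <- list(
  bivariate   = lm(Delinquency ~ SES, data = simdata2),
  strain_only = lm(Delinquency ~ SES + Strain, data = simdata2),
  all_three   = lm(Delinquency ~ SES + Strain + Risk + Oppty, data = simdata2)
)

# SES coefficient from each model, for side-by-side comparison
ses_by_model <- sapply(fits, function(f) unname(coef(f)["SES"]))
ses_by_model

# Full output for the Strain-only model (t-values and p-values)
summary(fits$strain_only)
```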

Question Block 4 (Simple Mediation Model)

Use the psych::mediate() function to conduct a test of indirect effects with the new simulated data (simdata2) and regression models predicting delinquency. Include SES as the ‘exposure’ or primary predictor, and specify the other three variables (Strain, Risk, Oppty) as mediating mechanisms by enclosing each one in parentheses in the model formula.

Give your best attempt at briefly interpreting these results.
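
A hedged sketch of the call is shown below, assuming the psych package is installed. The object name `med` and the `n.iter = 1000` bootstrap setting are illustrative choices and not necessarily what the blog post uses. The fallback at the top only rebuilds the data if `simdata2` is not already in your workspace.

```r
# Fallback: rebuild simdata2 only if it is not already in the workspace.
if (!exists("simdata2")) {
  set.seed(1138)
  n <- 1000
  SES <- round(rnorm(n), digits = 0)
  Strain <- round(-.5 * SES + rnorm(n), digits = 0)
  Risk <- round(-.5 * SES + rnorm(n), digits = 0)
  Oppty <- round(.4 * SES + rnorm(n), digits = 0)
  Delinquency <- round(.5 * Strain + -.5 * Risk + .6 * Oppty + 0 * SES + rpois(n, 1), digits = 0)
  Delinquency <- ifelse(Delinquency < 0, 0, Delinquency)
  simdata2 <- data.frame(SES, Strain, Risk, Oppty, Delinquency)
}

# In the mediate() formula, terms wrapped in parentheses are treated as mediators.
# n.iter sets the number of bootstrap resamples (an illustrative value here).
if (requireNamespace("psych", quietly = TRUE)) {
  med <- psych::mediate(
    Delinquency ~ SES + (Strain) + (Risk) + (Oppty),
    data = simdata2,
    n.iter = 1000
  )
  print(med)
}
```

By default, mediate() also draws a path diagram of the fitted model, which can help when writing up the interpretation.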

Overall, did you learn anything from this assignment? If so, what did you learn?