APPENDIX A. R Tutorial

Authors

Jonathan R. Brauer

Jacob C. Day

Published

July 30, 2025

1 Introduction

This appendix provides the R code necessary to replicate the key models discussed in the main manuscript, particularly those presented in Table 3 comparing different modeling approaches to analyzing fear of police data. We include code for frequentist and Bayesian versions of OLS and multilevel cumulative probit models, as well as methods to test the proportional odds assumption and generate predicted probabilities.

The code relies on data from Pickett et al. (2022) that were pre-processed in our primary R supplement file; see the “APP-A” section at the end of that file for the code that creates and saves the replication_data.RData and replication_data_long.RData files loaded here.

2 Setup: Load Libraries and Settings

Show code
# Essential packages for data manipulation and modeling
library(here)
library(tidyverse)
library(haven)          # For reading .dta files (though data is pre-processed)
library(datawizard)     # For row_means
library(lmtest)         # For coeftest
library(sandwich)       # For vcovHC (robust standard errors)
library(rstanarm)       # For Bayesian linear models
library(brms)           # For Bayesian (multilevel) ordinal and categorical models
library(ordinal)        # For frequentist (multilevel) ordinal models (clmm)
library(MASS)           # For polr (frequentist ordinal non-multilevel)
library(gt)             # For creating nice tables
library(scales)         # For pretty breaks in plots
library(patchwork)      # For combining plots
library(parallelly)     # For detecting CPU cores
library(bayesplot)      # For color_scheme_set and other posterior plotting functions
library(modelsummary)   # For datasummary_skim and other model summary functions
library(ggeffects)      # For predicted probabilities from frequentist models
library(tidybayes)      # For Bayesian model predictions
library(viridis)        # For colorblind-accessible plots 

# Set default ggplot2 colors to viridis
options(ggplot2.discrete.colour = scale_color_viridis_d)
options(ggplot2.discrete.fill = scale_fill_viridis_d)

# Set global options for `knitr`
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE, fig.align="center")
options(scipen = 999, digits = 2) # Suppress scientific notation; print ~two significant digits in non-plot outputs

# Identify & set number of available CPU cores for parallel processing (e.g., Stan models)
nCoresphys <- parallelly::availableCores(logical = FALSE)
options(mc.cores = nCoresphys)

# Set color scheme for `bayesplot`
color_scheme_set("viridisA")

# Ensure the 'Models' directory exists for saving cached model fits
if (!dir.exists(here("Models"))) {
  dir.create(here("Models"))
}

# Vector of fear item names (should match your data prep)
fearpol_new <- c("stopyou", "searchyou", "yellyou", "handcyou",
                 "kickyou", "pinyou", "sprayyou", "taseyou",
                 "shootyou", "killyou")

3 Tutorial Overview

The table below (Appendix A in manuscript) provides a roadmap of the modeling approaches demonstrated in this tutorial. Each section builds on the previous ones, showing the advantages of ordinal modeling approaches over linear methods while providing practical solutions for common convergence issues.

Readers primarily interested in practical implementation may focus on Sections Model 2, Model 3b, and Predicted Probabilities. Those wanting to understand additional issues (e.g., frequentist convergence challenges) should review the full sequence.

3.1 Appendix A Summary Table

Show code
# === APPENDIX 3: TUTORIAL OVERVIEW ===
# This table summarizes the modeling approaches demonstrated in our R tutorial
# Available at: [INSERT URL]

tutorial_overview <- tibble(
  Section = c("Model 1", "Model 2", "Model 3a", "Alternative 3a", "Model 3b", 
              "Proportional Odds Test", "Predicted Probabilities"),
  
  Approach = c("Frequentist OLS", "Bayesian Linear", "Frequentist CP MLM", 
               "Item-Specific CP Models", "Bayesian CP MLM", "Category-Specific Model",
               "Probability Estimation"),
  
  Family = c("Gaussian", "Gaussian", "Cumulative probit", "Cumulative probit", 
            "Cumulative probit", "Categorical logit", "N/A"),
  
  Structure = c("Single-level", "Single-level", "Multilevel", "Item-specific",
                "Multilevel", "Multilevel", "From fitted models"),
  
  Status = c("Replicates original", "Converges", "Convergence issues", "Robust alternative",
             "Converges successfully", "For assumption testing", "Demonstrates interpretation"),
  
  Key_Purpose = c("Baseline comparison with linear approach",
                 "Bayesian equivalent of original analysis", 
                 "Frequentist multilevel ordinal (often fails to converge)",
                 "Simpler & more reliable fallback if frequentist MLM 
                 convergence fails",
                 "Primary recommended multilevel ordinal approach",
                 "Tests whether race effects vary across thresholds",
                 "Shows interpretable outputs: item-specific vs population-level")
)

# Display the tutorial overview table
gt(tutorial_overview) %>%
  tab_header(
    title = "Appendix A: R Tutorial Structure Overview",
    subtitle = "Comprehensive comparison of modeling approaches for ordinal fear of police data"
  ) %>%
  cols_label(
    Section = "Tutorial Section",
    Approach = "Statistical Approach", 
    Family = "Model Family",
    Structure = "Data Structure",
    Status = "Convergence Status",
    Key_Purpose = "Purpose & Key Insights"
  ) %>%
  tab_options(
    table.font.size = 11,
    heading.align = "center"
  ) %>%
  tab_footnote(
    footnote = "Complete R code, data, and detailed explanations available in online 'Appendix 3' Quarto document",
    locations = cells_title(groups = "title")
  ) %>%
  tab_footnote(
    footnote = "Demonstrates why Bayesian multilevel ordinal models are preferred over linear approaches",
    locations = cells_title(groups = "subtitle")
  )
Appendix A: R Tutorial Structure Overview1
Comprehensive comparison of modeling approaches for ordinal fear of police data2
Tutorial Section Statistical Approach Model Family Data Structure Convergence Status Purpose & Key Insights
Model 1 Frequentist OLS Gaussian Single-level Replicates original Baseline comparison with linear approach
Model 2 Bayesian Linear Gaussian Single-level Converges Bayesian equivalent of original analysis
Model 3a Frequentist CP MLM Cumulative probit Multilevel Convergence issues Frequentist multilevel ordinal (often fails to converge)
Alternative 3a Item-Specific CP Models Cumulative probit Item-specific Robust alternative Simpler & more reliable fallback if frequentist MLM convergence fails
Model 3b Bayesian CP MLM Cumulative probit Multilevel Converges successfully Primary recommended multilevel ordinal approach
Proportional Odds Test Category-Specific Model Categorical logit Multilevel For assumption testing Tests whether race effects vary across thresholds
Predicted Probabilities Probability Estimation N/A From fitted models Demonstrates interpretation Shows interpretable outputs: item-specific vs population-level
1 Complete R code, data, and detailed explanations available in online 'Appendix 3' Quarto document
2 Demonstrates why Bayesian multilevel ordinal models are preferred over linear approaches

4 Load Replication Data

Show code
# Load the main processed data
load(here("Data", "replication_data.RData"))
# Load the long-format data for multilevel models
load(here("Data", "replication_data_long.RData"))

# Display a summary of the main data to confirm successful loading
datasummary_skim(dat)
Unique Missing Pct. Mean SD Min Median Max Histogram
stopyou 5 0 2.0 1.3 0.0 2.0 4.0
searchyou 5 0 1.9 1.3 0.0 2.0 4.0
yellyou 5 0 1.9 1.3 0.0 2.0 4.0
handcyou 5 0 1.9 1.4 0.0 2.0 4.0
kickyou 5 0 1.9 1.5 0.0 2.0 4.0
pinyou 5 0 1.9 1.5 0.0 2.0 4.0
sprayyou 5 0 1.9 1.5 0.0 2.0 4.0
taseyou 5 0 1.9 1.5 0.0 2.0 4.0
shootyou 5 0 1.9 1.5 0.0 2.0 4.0
killyou 5 0 2.0 1.6 0.0 2.0 4.0
fearpol 41 0 1.9 1.3 0.0 2.0 4.0
fearpol_norm 41 0 48.2 33.4 0.0 50.0 100.0
N %
race White 504 49.4
Black 516 50.6
stopyou_ord 0 191 18.7
1 168 16.5
2 283 27.7
3 215 21.1
4 163 16.0
searchyou_ord 0 210 20.6
1 205 20.1
2 277 27.2
3 168 16.5
4 160 15.7
yellyou_ord 0 200 19.6
1 187 18.3
2 272 26.7
3 194 19.0
4 167 16.4
handcyou_ord 0 217 21.3
1 200 19.6
2 235 23.0
3 183 17.9
4 185 18.1
kickyou_ord 0 265 26.0
1 170 16.7
2 212 20.8
3 161 15.8
4 212 20.8
pinyou_ord 0 261 25.6
1 161 15.8
2 213 20.9
3 158 15.5
4 227 22.3
sprayyou_ord 0 264 25.9
1 164 16.1
2 209 20.5
3 170 16.7
4 213 20.9
taseyou_ord 0 264 25.9
1 160 15.7
2 204 20.0
3 167 16.4
4 225 22.1
shootyou_ord 0 277 27.2
1 153 15.0
2 198 19.4
3 130 12.7
4 262 25.7
killyou_ord 0 282 27.6
1 153 15.0
2 195 19.1
3 113 11.1
4 277 27.2
Show code
datasummary_skim(dat_long)
Unique Missing Pct. Mean SD Min Median Max Histogram
fear_numeric 5 0 1.9 1.4 0.0 2.0 4.0
N %
race White 5040 49.4
Black 5160 50.6
stopyou_ord 0 1910 18.7
1 1680 16.5
2 2830 27.7
3 2150 21.1
4 1630 16.0
searchyou_ord 0 2100 20.6
1 2050 20.1
2 2770 27.2
3 1680 16.5
4 1600 15.7
yellyou_ord 0 2000 19.6
1 1870 18.3
2 2720 26.7
3 1940 19.0
4 1670 16.4
handcyou_ord 0 2170 21.3
1 2000 19.6
2 2350 23.0
3 1830 17.9
4 1850 18.1
kickyou_ord 0 2650 26.0
1 1700 16.7
2 2120 20.8
3 1610 15.8
4 2120 20.8
pinyou_ord 0 2610 25.6
1 1610 15.8
2 2130 20.9
3 1580 15.5
4 2270 22.3
sprayyou_ord 0 2640 25.9
1 1640 16.1
2 2090 20.5
3 1700 16.7
4 2130 20.9
taseyou_ord 0 2640 25.9
1 1600 15.7
2 2040 20.0
3 1670 16.4
4 2250 22.1
shootyou_ord 0 2770 27.2
1 1530 15.0
2 1980 19.4
3 1300 12.7
4 2620 25.7
killyou_ord 0 2820 27.6
1 1530 15.0
2 1950 19.1
3 1130 11.1
4 2770 27.2
outcome handcyou 1020 10.0
kickyou 1020 10.0
killyou 1020 10.0
pinyou 1020 10.0
searchyou 1020 10.0
shootyou 1020 10.0
sprayyou 1020 10.0
stopyou 1020 10.0
taseyou 1020 10.0
yellyou 1020 10.0
fear 0 2431 23.8
1 1721 16.9
2 2298 22.5
3 1659 16.3
4 2091 20.5
fear_categorical 0 2431 23.8
1 1721 16.9
2 2298 22.5
3 1659 16.3
4 2091 20.5

5 Model 1: Frequentist OLS Replication

This replicates the Ordinary Least Squares (OLS) regression model from Pickett et al. (2022), predicting the 0-100 normed fear scale using race as a predictor. Robust standard errors are used, consistent with common practice in criminology.

Show code
# Model 1: Frequentist OLS Replication
# Uses the 'fearpol_norm' variable created in the data preparation script.
ols_model <- lm(fearpol_norm ~ race, data = dat)

# Calculate robust standard errors
# Need 'sandwich' package for vcovHC and 'lmtest' for coeftest
robust_results <- coeftest(ols_model, vcov = vcovHC(ols_model, type = "HC1"))

# Print model summary with robust standard errors
print(robust_results)

t test of coefficients:

            Estimate Std. Error t value            Pr(>|t|)    
(Intercept)    32.19       1.33    24.2 <0.0000000000000002 ***
raceBlack      31.62       1.84    17.1 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show code
# Store coefficient for comparison table
ols_coef <- coef(ols_model)["raceBlack"]
ols_se <- sqrt(diag(vcovHC(ols_model, type = "HC1")))["raceBlack"]

6 Model 2: Bayesian Linear Replication

This section provides the Bayesian equivalent of the OLS model, predicting the normed fear scale using rstanarm. The model is fit with rstanarm’s default weakly informative priors, which are diffuse enough that the likelihood (data) drives the posterior estimates.

Show code
# Model 2: Bayesian Linear Replication
# Uses rstanarm package for Bayesian linear regression.
# The 'fearpol_norm' variable is the 0-100 normed fear scale.
model_path_blm <- here("Models", "bayesian_linear_fearpol_norm.rds")

# Check if cached model exists; if so, load it. Otherwise, fit and save.
if (file.exists(model_path_blm)) {
  blm_model <- readRDS(model_path_blm)
} else {
  blm_model <- stan_glm(
    formula = fearpol_norm ~ race,
    data = dat,
    family = gaussian(),
    chains = 4,
    cores = nCoresphys,
    seed = 1138
  )
  saveRDS(blm_model, model_path_blm) # Save the model
}

# Print summary of the Bayesian linear model
summary(blm_model, probs = c(0.025, 0.975), digits = 2)

Model Info:
 function:     stan_glm
 family:       gaussian [identity]
 formula:      fearpol_norm ~ race
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 1020
 predictors:   2

Estimates:
              mean   sd    2.5%   97.5%
(Intercept) 32.17   1.32 29.55  34.76  
raceBlack   31.65   1.83 28.19  35.28  
sigma       29.47   0.65 28.22  30.79  

Fit Diagnostics:
           mean   sd    2.5%   97.5%
mean_PPD 48.18   1.32 45.62  50.80  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
              mcse Rhat n_eff
(Intercept)   0.02 1.00 3698 
raceBlack     0.03 1.00 3720 
sigma         0.01 1.00 3980 
mean_PPD      0.02 1.00 3916 
log-posterior 0.03 1.00 1716 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
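Because the summary only points to help('prior_summary'), readers who want to see exactly which default priors rstanarm applied can inspect the fitted object directly; a one-line sketch, assuming blm_model is in the workspace:

# Display the (default, weakly informative) priors actually used by rstanarm
prior_summary(blm_model)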
Show code
# Posterior predictive check to visualize model fit
pp_check(blm_model) +
  labs(title = "Posterior Predictive Check: Bayesian Linear Model")

7 Model 3a: Multilevel Cumulative Probit (Frequentist)

The code below uses the clmm function from the ordinal package to fit a frequentist multilevel cumulative probit model. This frequentist approach mimics the Bayesian ordinal MLM model reported in the paper (see also Model 3b below); it respects the ordinal nature of the fear variable and accounts for the repeated item responses in the long-format data by specifying random intercepts across items.

Challenge: These models are notorious for convergence failures due to the complexity of integrating over random effects in ordinal models. We implement a robust three-strategy approach:

  • Strategy 1: Default settings with high-accuracy integration (nAGQ=7)
  • Strategy 2: Custom starting values from simpler models with moderate integration (nAGQ=5)
  • Strategy 3: Simplified settings with fast approximation (nAGQ=1)
Show code
# Model 3a: Multilevel Cumulative Probit (Frequentist)
# Uses the 'dat_long' data and 'fear' (ordered factor) variable.
library(ordinal) # Ensure 'ordinal' package is loaded

clmm_model_path <- here("Models", "clmm_fear_race_outcome.rds")

# Initialize success flag
clmm_success <- FALSE

if (file.exists(clmm_model_path)) {
  clmm_model <- readRDS(clmm_model_path)
  clmm_success <- TRUE
} else {
  
  clmm_model <- NULL
  
  # Strategy 1: Try without starting values first
  message("Attempting multilevel model without custom starting values...")
  try({
    clmm_model <- clmm(
      fear ~ race + (1 | outcome),
      data = dat_long,
      link = "probit",
      nAGQ = 7,
      control = clmm.control(maxIter = 1000, gradTol = 1e-4)
    )
    clmm_success <- TRUE
    message("Success with default starting values!")
  }, silent = TRUE)
  
  # Strategy 2: Try with custom starting values if Strategy 1 failed
  if (!clmm_success) {
    message("Default approach failed. Trying with custom starting values...")
    try({
      # Fit simple model for reference
      simple_ord <- clm(fear ~ race, data = dat_long, link = "probit")
      simple_coefs <- coef(simple_ord)
      
      # Starting values
      beta_race <- simple_coefs["raceBlack"]
      sigma_outcome <- 0.3
      start_vals <- c(beta_race, sigma_outcome)
      names(start_vals) <- c("raceBlack", "outcome.(Intercept)")
      
      message("Using starting values: ", paste(names(start_vals), "=", round(start_vals, 3), collapse = ", "))
      
      clmm_model <- clmm(
        fear ~ race + (1 | outcome),
        data = dat_long,
        link = "probit",
        nAGQ = 5,
        start = start_vals,
        control = clmm.control(maxIter = 1000, gradTol = 1e-4)
      )
      clmm_success <- TRUE
      message("Success with custom starting values!")
    }, silent = TRUE)
  }
  
  # Strategy 3: Try simplified settings if Strategy 2 failed
  if (!clmm_success) {
    message("Custom starting values also failed. Trying simplest approach...")
    try({
      clmm_model <- clmm(
        fear ~ race + (1 | outcome),
        data = dat_long,
        link = "probit",
        nAGQ = 1,
        control = clmm.control(maxIter = 100)
      )
      clmm_success <- TRUE
      message("Success with simplified settings!")
    }, silent = TRUE)
  }
  
  # Save if successful
  if (clmm_success && !is.null(clmm_model)) {
    saveRDS(clmm_model, clmm_model_path)
  }
}

# Report final results
if (clmm_success && !is.null(clmm_model)) {
  cat("\n=== CLMM Model Successfully Fitted ===\n")
  
  # Check if model has valid results
  try({
    model_summary <- summary(clmm_model)
    print(model_summary)
  }, silent = FALSE)
  
  # Check convergence using clmm-specific methods
  if ("info" %in% names(clmm_model) && !is.null(clmm_model$info)) {
    conv_code <- clmm_model$info$convergence
    if (!is.null(conv_code) && conv_code == 0) {
      message("Model converged successfully!")
    } else {
      message("Model fitted but convergence may be questionable. Using simplified settings (nAGQ=1).")
      message("Results should be interpreted with caution but are usable for tutorial purposes.")
    }
  } else {
    message("Model fitted with simplified settings. Results are usable for demonstration purposes.")
  }
  
  # Store success flag for later sections
  assign("clmm_converged", TRUE, envir = .GlobalEnv)
  
} else {
  cat("\n=== CLMM Model Failed ===\n")
  message("All CLMM strategies failed to converge.")
  message("This is common with complex ordinal multilevel models.")
  message("The item-specific alternative approach will be used instead.")
  
  # Store failure flag for later sections
  assign("clmm_converged", FALSE, envir = .GlobalEnv)
}

=== CLMM Model Successfully Fitted ===
Cumulative Link Mixed Model fitted with the Laplace approximation

formula: fear ~ race + (1 | outcome)
data:    dat_long

 link   threshold nobs  logLik    AIC      niter    max.grad cond.H
 probit flexible  10200 -15283.33 30578.66 706(827) 5.73e-05 NaN   

Random effects:
 Groups  Name        Variance       Std.Dev. 
 outcome (Intercept) 0.000000000406 0.0000202
Number of groups:  outcome 10 

Coefficients:
          Estimate Std. Error z value Pr(>|z|)
raceBlack    0.988        NaN     NaN      NaN

Threshold coefficients:
    Estimate Std. Error z value
0|1   -0.306        NaN     NaN
1|2    0.234        NaN     NaN
2|3    0.888        NaN     NaN
3|4    1.429        NaN     NaN
Show code
# Interpretation note
if (clmm_success) {
  cat("\n=== Model Interpretation ===\n")
  cat("Race coefficient (raceBlack):", round(coef(clmm_model)["raceBlack"], 3), "\n")
  cat("This suggests Black respondents have higher latent fear scores.\n")
  cat("WARNING: Random effect variance is essentially zero, which is likely incorrect.\n")
  cat("This suggests the clmm model failed to properly estimate between-item variation.\n")
  cat("The item-specific analysis below reveals substantial variation across items.\n")
}

=== Model Interpretation ===
Race coefficient (raceBlack): 0.99 
This suggests Black respondents have higher latent fear scores.
WARNING: Random effect variance is essentially zero, which is likely incorrect.
This suggests the clmm model failed to properly estimate between-item variation.
The item-specific analysis below reveals substantial variation across items.

8 Alternative Frequentist Approach: Item-Specific Models

Given the convergence issues with the multilevel clmm model above, we implement a more robust approach by fitting separate cumulative probit models to each item and meta-analyzing the results. This approach often reveals patterns that complex multilevel models miss due to convergence failures.

Show code
# Alternative Model 3a: Item-specific cumulative probit models
# This approach fits separate models to each item and can be meta-analyzed

# Create ordered factor versions if not already done
fearpol_ord <- paste0(fearpol_new, "_ord")

# Fit separate cumulative probit models for each item
item_models <- list()
item_results <- tibble()

for (item in fearpol_new) {
  item_ord <- paste0(item, "_ord")
  
  # Fit cumulative probit model for this item
  model <- clm(
    formula = as.formula(paste0(item_ord, " ~ race")), 
    data = dat, 
    link = "probit"
  )
  
  item_models[[item]] <- model
  
  # Extract coefficient for race
  race_coef <- coef(model)["raceBlack"]
  race_se <- sqrt(vcov(model)["raceBlack", "raceBlack"])
  
  # Add to results
  item_results <- bind_rows(item_results, 
                           tibble(
                             item = item,
                             coefficient = race_coef,
                             se = race_se,
                             z_value = race_coef / race_se,
                             p_value = 2 * (1 - pnorm(abs(race_coef / race_se)))
                           ))
}

# Meta-analytic summary (simple average)
meta_summary <- item_results %>%
  summarise(
    mean_coefficient = mean(coefficient),
    se_mean = sqrt(sum(se^2)) / n(), # Combined standard error
    items_significant = sum(p_value < 0.05),
    total_items = n()
  )

print("Item-specific cumulative probit results:")
[1] "Item-specific cumulative probit results:"
Show code
print(item_results)
# A tibble: 10 × 5
   item      coefficient     se z_value p_value
   <chr>           <dbl>  <dbl>   <dbl>   <dbl>
 1 stopyou         0.917 0.0688    13.3       0
 2 searchyou       0.853 0.0684    12.5       0
 3 yellyou         0.725 0.0676    10.7       0
 4 handcyou        0.926 0.0689    13.4       0
 5 kickyou         1.05  0.0702    14.9       0
 6 pinyou          1.05  0.0704    15.0       0
 7 sprayyou        1.00  0.0698    14.4       0
 8 taseyou         1.04  0.0702    14.8       0
 9 shootyou        1.14  0.0715    16.0       0
10 killyou         1.16  0.0718    16.2       0
Show code
print("Meta-analytic summary:")
[1] "Meta-analytic summary:"
Show code
print(meta_summary)
# A tibble: 1 × 4
  mean_coefficient se_mean items_significant total_items
             <dbl>   <dbl>             <int>       <int>
1            0.987  0.0221                10          10
Show code
# Note: This approach sacrifices the multilevel structure but provides
# a robust frequentist alternative that mirrors the Bayesian approach

{cat("\n=== Key Findings ===\n")
cat("Range of race effects: ", round(min(item_results$coefficient), 2), " to ", round(max(item_results$coefficient), 2), "\n")
cat("This", round(max(item_results$coefficient) - min(item_results$coefficient), 2), "unit range contradicts the clmm finding of no item variation.\n")
cat("More severe items (kill, shoot, physical violence) show larger race disparities.\n")
cat("Meta-analytic average effect:", round(meta_summary$mean_coefficient, 2), "\n")
cat("Note: This", round(meta_summary$mean_coefficient, 2), "average is nearly identical to the clmm coefficient of", round(coef(clmm_model)["raceBlack"], 2), "\n")
cat("However, the clmm model incorrectly suggests no variation across items (variance ≈ 0).\n")
cat("The item-specific approach reveals the true heterogeneity that clmm missed.\n")
}

=== Key Findings ===
Range of race effects:  0.72  to  1.2 
This 0.44 unit range contradicts the clmm finding of no item variation.
More severe items (kill, shoot, physical violence) show larger race disparities.
Meta-analytic average effect: 0.99 
Note: This 0.99 average is nearly identical to the clmm coefficient of 0.99 
However, the clmm model incorrectly suggests no variation across items (variance ≈ 0).
The item-specific approach reveals the true heterogeneity that clmm missed.
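If a precision-weighted summary is preferred over the simple average computed above, an inverse-variance (fixed-effect) combination is a straightforward alternative. A minimal sketch using the item_results tibble created above:

# Sketch: inverse-variance (fixed-effect) weighted average of the item-specific
# race coefficients, as an alternative to the simple mean computed above
item_results %>%
  summarise(
    weighted_coefficient = sum(coefficient / se^2) / sum(1 / se^2),
    weighted_se = sqrt(1 / sum(1 / se^2))
  )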

9 Model 3b: Multilevel Cumulative Probit (Bayesian)

This model replicates the Bayesian multilevel cumulative probit model from the original supplement, using brms. This is the primary ordinal multilevel model discussed in the main text.

Why we prefer the Bayesian model: As demonstrated by the convergence failures in Model 3a above, frequentist multilevel ordinal models (clmm) are notoriously unstable. The Bayesian approach offers several key advantages:

  • Superior convergence: MCMC sampling is more robust to complex likelihood surfaces
  • Maintains multilevel structure: Unlike the item-specific fallback, this preserves the hierarchical data structure
  • Partial pooling: Estimates are “shrunk” toward the overall mean, providing more stable item-specific effects and better out-of-sample predictive performance
  • Uncertainty quantification: Full posterior distributions capture both within- and between-item uncertainty
  • Flexible priors: Can incorporate substantive knowledge and cumulative evidence from previous studies to aid convergence and improve estimation
  • Comprehensive model checking: Posterior predictive checks allow intuitive assessment of model fit by comparing observed data to model-generated predictions, incorporating parameter uncertainty

The Bayesian model should successfully estimate both the overall race effect AND the item-to-item variation that the clmm model failed to capture, while avoiding the need to abandon the multilevel structure entirely.

Show code
# Model 3b: Multilevel Cumulative Probit (Bayesian)
# Re-uses the cached multilevel model (fit as 'hier_mod' in the original supplement file and saved as 'hier_fear_fit.rds').
model_path_bcpm <- here("Models", "hier_fear_fit.rds")

if (file.exists(model_path_bcpm)) {
  bcpm_model <- readRDS(model_path_bcpm)
} else {
  # Bayesian multilevel cumulative probit model
  bf_fear <- bf(fear | thres(4) ~ race + (race | outcome))
  priors_hier <- c(
    prior(normal(-0.84, 1), class = "Intercept", coef = "1"),
    prior(normal(-0.25, 1), class = "Intercept", coef = "2"),
    prior(normal(0.25, 1), class = "Intercept", coef = "3"),
    prior(normal(0.84, 1), class = "Intercept", coef = "4"),
    prior(normal(0, 1), class = "b"),
    prior(exponential(1), class = "sd"),
    prior(lkj(2), class = "cor")
  )

  bcpm_model <- brm(
    bf_fear,
    data = dat_long,
    family = cumulative("probit"),
    prior = priors_hier,
    cores = nCoresphys,
    chains = 4,
    iter = 2000,
    warmup = 1000,
    seed = 1138,
    control = list(adapt_delta = 0.95),
    file = model_path_bcpm,
    file_refit = "on_change"
  )
  saveRDS(bcpm_model, model_path_bcpm)
}

# Print summaries of the Bayesian multilevel cumulative probit model
summary(bcpm_model, probs = c(0.025, 0.975), digits = 2)
 Family: cumulative 
  Links: mu = probit; disc = identity 
Formula: fear | thres(4) ~ race + (race | outcome) 
   Data: dat_long (Number of observations: 10200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~outcome (Number of levels: 10) 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept)                0.09      0.03     0.04     0.15 1.00     1720
sd(raceBlack)                0.17      0.05     0.10     0.29 1.00     1583
cor(Intercept,raceBlack)    -0.83      0.16    -0.99    -0.40 1.00     1457
                         Tail_ESS
sd(Intercept)                2303
sd(raceBlack)                2415
cor(Intercept,raceBlack)     2200

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -0.31      0.03    -0.38    -0.24 1.00     1898     2480
Intercept[2]     0.23      0.03     0.17     0.30 1.00     1881     2408
Intercept[3]     0.89      0.03     0.82     0.95 1.00     1867     2440
Intercept[4]     1.43      0.04     1.36     1.50 1.00     1983     2213
raceBlack        0.98      0.06     0.86     1.10 1.00     1557     2135

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Show code
ranef(bcpm_model, probs = c(0.025, 0.975), digits = 2)
$outcome
, , Intercept

          Estimate Est.Error     Q2.5   Q97.5
handcyou     0.038     0.046 -0.04777  0.1294
kickyou     -0.052     0.048 -0.15259  0.0385
killyou     -0.097     0.049 -0.20011 -0.0053
pinyou      -0.041     0.047 -0.13910  0.0510
searchyou    0.056     0.046 -0.03520  0.1497
shootyou    -0.085     0.049 -0.18762  0.0066
sprayyou    -0.030     0.047 -0.12450  0.0604
stopyou      0.089     0.048  0.00093  0.1902
taseyou     -0.038     0.048 -0.13418  0.0545
yellyou      0.126     0.050  0.03232  0.2327

, , raceBlack

          Estimate Est.Error   Q2.5  Q97.5
handcyou    -0.082     0.079 -0.239  0.076
kickyou      0.070     0.083 -0.090  0.237
killyou      0.238     0.085  0.081  0.416
pinyou       0.088     0.081 -0.069  0.252
searchyou   -0.170     0.081 -0.333 -0.013
shootyou     0.204     0.084  0.050  0.380
sprayyou     0.040     0.083 -0.128  0.199
stopyou     -0.133     0.080 -0.292  0.022
taseyou      0.079     0.082 -0.077  0.240
yellyou     -0.252     0.083 -0.417 -0.091
Show code
# Posterior predictive check
pp_check(bcpm_model, ndraws = 100) +
  labs(title = "Posterior Predictive Check: Bayesian Multilevel CP Model")

Show code
# Interpretation note for Bayesian model
{cat("\n=== Bayesian Model Interpretation ===\n")

# Extract race coefficient summary
race_coef_summary <- summary(bcpm_model)$fixed
race_coef_mean <- race_coef_summary["raceBlack", "Estimate"]
race_coef_lower <- race_coef_summary["raceBlack", "l-95% CI"]
race_coef_upper <- race_coef_summary["raceBlack", "u-95% CI"]

cat("Race coefficient (raceBlack): ", round(race_coef_mean, 3), 
    " [95% CI: ", round(race_coef_lower, 3), ", ", round(race_coef_upper, 3), "]\n", sep="")

# Extract random effects variance
random_effects <- VarCorr(bcpm_model)
outcome_var <- random_effects$outcome$sd["Intercept", "Estimate"]
race_var <- random_effects$outcome$sd["raceBlack", "Estimate"] 

cat("Random intercept SD: ", round(outcome_var, 3), "\n")
cat("Random race slope SD: ", round(race_var, 3), "\n") 
}

=== Bayesian Model Interpretation ===
Race coefficient (raceBlack): 0.98 [95% CI: 0.86, 1.1]
Random intercept SD:  0.086 
Random race slope SD:  0.17 
Show code
{cat("\n=== Key Contrasts with Frequentist Results ===\n")
cat("• Race effects: Bayesian (", round(race_coef_mean, 2), ") ≈ clmm (", round(coef(clmm_model)["raceBlack"], 2), 
    ") ≈ item-specific (", round(meta_summary$mean_coefficient, 2), ") - all methods agree on aggregate race effect\n")
cat("• Between-item variation: Bayesian SD = ", round(outcome_var, 3), 
    " vs clmm ≈ 0 (failed to detect)\n")
cat("• Race effect variation: Bayesian SD = ", round(race_var, 3), 
    " - race effects vary meaningfully across items\n")
cat("• Uncertainty quantification: 95% CI = ±", 
    round((race_coef_upper - race_coef_lower)/2, 2), " units around point estimate\n")
}

=== Key Contrasts with Frequentist Results ===
• Race effects: Bayesian ( 0.98 ) ≈ clmm ( 0.99 ) ≈ item-specific ( 0.99 ) - all methods agree on aggregate race effect
• Between-item variation: Bayesian SD =  0.086  vs clmm ≈ 0 (failed to detect)
• Race effect variation: Bayesian SD =  0.17  - race effects vary meaningfully across items
• Uncertainty quantification: 95% CI = ± 0.12  units around point estimate
Show code
{cat("\n=== Substantive Insights ===\n")
cat("• Average race effect (~1.0) represents a large difference (~1SD) on the latent scale\n")
cat("• Race effects range from ~", round(race_coef_mean - 1.96*race_var, 2), 
    " to ~", round(race_coef_mean + 1.96*race_var, 2), " across items (±2SD)\n")
cat("• This matches the item-specific range of ", round(min(item_results$coefficient), 2),
    " to ", round(max(item_results$coefficient), 2), " observed earlier\n")
}

=== Substantive Insights ===
• Average race effect (~1.0) represents a large difference (~1SD) on the latent scale
• Race effects range from ~ 0.64  to ~ 1.3  across items (±2SD)
• This matches the item-specific range of  0.72  to  1.2  observed earlier
Show code
{cat("\n=== Model Success ===\n")
cat("✓ Converged without issues (unlike clmm)\n")
cat("✓ Maintained multilevel structure (unlike item-specific approach)\n") 
cat("✓ Estimated realistic between-item variation (SD = ", round(outcome_var, 3), ")\n")
cat("✓ Captured meaningful race effect heterogeneity (SD = ", round(race_var, 3), ")\n")
cat("✓ Provided full uncertainty quantification\n")
cat("✓ Results align with item-specific findings while adding structure\n")
}

=== Model Success ===
✓ Converged without issues (unlike clmm)
✓ Maintained multilevel structure (unlike item-specific approach)
✓ Estimated realistic between-item variation (SD =  0.086 )
✓ Captured meaningful race effect heterogeneity (SD =  0.17 )
✓ Provided full uncertainty quantification
✓ Results align with item-specific findings while adding structure
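To see the partial pooling described above in action, one can place the unpooled item-specific coefficients from the clm models next to the item-specific effects implied by the multilevel model (population-level slope plus item-level deviation). A brief sketch, assuming item_results and bcpm_model are still in the workspace:

# Sketch: partially pooled item-specific race effects from the Bayesian MLM
# (population-level slope + item-level deviation) next to the unpooled clm estimates
mlm_item_effects <- fixef(bcpm_model)["raceBlack", "Estimate"] +
  ranef(bcpm_model)$outcome[, "Estimate", "raceBlack"]

item_results %>%
  dplyr::select(item, unpooled = coefficient) %>%
  mutate(partially_pooled = mlm_item_effects[item])

The partially pooled estimates should sit slightly closer to the overall mean than their unpooled counterparts, which is the shrinkage behavior described above.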

10 Testing Proportional Odds Assumption

The proportional odds assumption is crucial for cumulative ordinal models. It assumes that the effect of race is the same across all ordinal thresholds (i.e., the race effect is the same for 0|1, 1|2, 2|3, and 3|4 cutpoints).

Testing Strategy: We’ll compare two Bayesian models with different assumptions:

  • Model 3b (Restrictive): Cumulative probit model that assumes proportional odds - estimates one race coefficient that applies to all ordinal thresholds (0|1, 1|2, 2|3, 3|4)
  • Category-Specific Model (Flexible): Categorical logit model - estimates separate race coefficients for each response category (categories 1, 2, 3, 4 vs. reference category 0)

Note on Link Functions: We use logit for the categorical model because the categorical family in brms only supports logit links, while our cumulative model uses probit. This difference in link functions doesn’t affect our test of the proportional odds assumption - we’re comparing model structures (one vs. multiple race coefficients), not link functions, and our model comparison method (LOO cross-validation) evaluates which model better predicts out-of-sample data regardless of the specific link functions used.

What “Category-Specific” Means: Instead of modeling ordinal thresholds, the categorical model directly estimates how race affects the probability of each response category relative to a reference category. This allows race to have different effects on being “afraid” vs “very afraid” compared to “very unafraid.”

Model Comparison Method: We’ll use Leave-One-Out Cross-Validation (LOO-CV) to assess which model better predicts out-of-sample data. Better predictive performance suggests the model captures the true data structure more accurately.
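Before turning to the Bayesian comparison, note that a quick frequentist check is also available for the single-item models fit earlier: ordinal::nominal_test() performs likelihood-ratio tests of whether a predictor's effect should be allowed to vary across thresholds. A brief sketch, assuming the item_models list from the item-specific section is available:

# Sketch: likelihood-ratio test of a category-specific (nominal) race effect
# for one item-specific clm model; a significant result suggests the
# proportional odds assumption is questionable for that item
nominal_test(item_models[["killyou"]])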

Show code
# Testing Proportional Odds Assumption using Bayesian Models
# Compare restrictive (proportional odds) vs flexible (category-specific) models

message("Testing proportional odds assumption using Bayesian model comparison...")

# Model 3b (restrictive) assumes proportional odds - we already have this
# Now fit a category-specific model that relaxes this assumption

model_path_nonprop <- here("Models", "nonprop_odds_fear_fit.rds")

if (file.exists(model_path_nonprop)) {
  category_specific_model <- readRDS(model_path_nonprop)
} else {
  message("Fitting Bayesian category-specific model...")
  
  # Category-specific model: separate effects for each response category
  bf_category_specific <- bf(fear_categorical ~ race + (1 | outcome))
  
  category_specific_model <- brm(
    bf_category_specific,
    data = dat_long,
    family = categorical("logit"),  # No proportional odds assumption
    cores = nCoresphys,
    chains = 4,
    iter = 2000,
    warmup = 1000,
    seed = 1138,
    control = list(adapt_delta = 0.95),
    file = model_path_nonprop,
    file_refit = "on_change"
  )
  saveRDS(category_specific_model, model_path_nonprop)
}

message("Both models fitted successfully. Comparing predictive performance...")

# Cache the LOO computations 
loo_path_prop <- here("Models", "loo_proportional.rds")
loo_path_cat <- here("Models", "loo_category_specific.rds")

if (file.exists(loo_path_prop)) {
  loo_proportional <- readRDS(loo_path_prop)
} else {
  loo_proportional <- loo(bcpm_model, cores = nCoresphys)
  saveRDS(loo_proportional, loo_path_prop)
}

if (file.exists(loo_path_cat)) {
  loo_category_specific <- readRDS(loo_path_cat)
} else {
  loo_category_specific <- loo(category_specific_model, cores = nCoresphys)
  saveRDS(loo_category_specific, loo_path_cat)
}

# Statistical comparison
loo_comparison <- loo_compare(loo_proportional, loo_category_specific)
print(loo_comparison)
                        elpd_diff se_diff
category_specific_model   0.0       0.0  
bcpm_model              -84.5      14.7  
Show code
# Extract key statistics 
elpd_diff <- loo_comparison[2, "elpd_diff"]  # bcpm_model row, not first row
se_diff <- loo_comparison[2, "se_diff"]
Show code
{cat("\n=== Proportional Odds Test Results ===\n")
cat("Expected Log Pointwise Density (ELPD) difference: ", round(elpd_diff, 1), " ± ", round(se_diff, 1), "\n")
cat("Interpretation: Negative = category-specific model preferred (PO assumption violated)\n")
}

=== Proportional Odds Test Results ===
Expected Log Pointwise Predictive Density (ELPD) difference:  -84  ±  15 
Interpretation: Negative = category-specific model preferred (PO assumption violated)
Show code
# Statistical interpretation using standard rule
if (abs(elpd_diff) > 2 * se_diff) {
  if (elpd_diff < 0) {
    message("PROPORTIONAL ODDS ASSUMPTION VIOLATED") 
    message("The category-specific model has substantially better predictive performance")
    message("Race effects vary significantly across response categories")
    message("ELPD difference of ", round(elpd_diff, 1), " is ", round(abs(elpd_diff)/se_diff, 1), " standard errors from zero")
  } else {
    message("PROPORTIONAL ODDS ASSUMPTION SUPPORTED")
    message("The proportional odds model has meaningfully better predictive performance")
  }
} else {
  message("INCONCLUSIVE EVIDENCE")
  message("Both models have similar predictive performance")
}

11 Interpretation: Model Selection in Practice

The proportional odds test reveals an important finding: the assumption is violated in these data (ELPD difference = -84.5 ± 14.7). Race effects vary substantially across response categories, with the category-specific model providing much better predictive performance. This suggests that racial disparities in fear of police are not uniform across the ordinal scale - for instance, the effect may be larger for extreme fear responses than moderate ones.
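One way to see this heterogeneity directly is to inspect the race coefficients from the category-specific model fit above. A sketch, assuming category_specific_model is in the workspace; here the mu1-mu4 coefficients are log-odds effects of race on categories 1-4 relative to the reference category 0:

# Sketch: category-specific race effects from the categorical logit model
# (log-odds of each category 1-4 relative to the reference category 0)
race_rows <- grepl("raceBlack", rownames(fixef(category_specific_model)))
fixef(category_specific_model)[race_rows, ]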

Should we therefore prefer the category-specific model? While this model fits the data better, the choice depends on analytical goals:

Arguments for category-specific modeling:

  • Better captures the true data structure (as evidenced by superior predictive performance)
  • Reveals heterogeneous effects across response levels
  • More flexible and data-driven approach

What does this mean if we use cumulative ordinal modeling?

  • Still maintains the inherent ordering of responses (very afraid > afraid > neither > unafraid > very unafraid)
  • The comparison in our main paper showed cumulative ordinal models vastly outperformed linear models on fit statistics, even if they don’t perfectly satisfy proportional odds
  • Crucially: even an “imperfect” ordinal model is vastly superior to linear models that ignore the categorical nature of the data entirely

Practical recommendation: The cumulative ordinal approach (Model 3b) represents a substantial improvement over linear modeling while maintaining interpretability. The proportional odds violation suggests caution in interpretation - the single race coefficient represents an average effect across response categories, but researchers should be aware that the actual effects vary. In applications, researchers should explore item-level and category-specific effect heterogeneity patterns revealed by their multilevel models and diagnostic tests.

Bottom line: The choice should be between different ordinal approaches (cumulative vs. category-specific), not between ordinal and linear. Linear models should be abandoned for ordinal outcomes regardless of interpretability concerns.

Speaking of interpretability concerns… these can be directly addressed through predicted probabilities - a major strength of ordinal approaches demonstrated in the next section. Rather than relying on difficult-to-interpret (or, too often, fundamentally meaningless) coefficients on latent scales, ordinal models allow us to generate intuitive, policy-relevant predictions about the probability of each response category for different groups. This provides far richer substantive insights than linear model coefficients while maintaining statistical rigor.

12 Generating Predicted Probabilities

12.1 Predicted Probabilities from Item-Specific Models (Alternative Approach)

Since the clmm model had convergence issues, we’ll first demonstrate predicted probabilities using the robust item-specific approach. We’ll use the “yellyou” and “killyou” items as exemplars, as these items showed the smallest and largest race effect estimates, respectively.

Show code
# Generate predicted probabilities from item-specific models
# Using smallest and largest race effect items to show heterogeneity

# Identify smallest and largest race effect items
min_effect_item <- item_results$item[which.min(item_results$coefficient)]
max_effect_item <- item_results$item[which.max(item_results$coefficient)]

{cat("=== Demonstrating Race Effect Heterogeneity ===\n")
cat("Smallest race effect:", min_effect_item, "(coefficient =", round(min(item_results$coefficient), 3), ")\n")
cat("Largest race effect:", max_effect_item, "(coefficient =", round(max(item_results$coefficient), 3), ")\n")
cat("This", round(max(item_results$coefficient) - min(item_results$coefficient), 2), "unit difference shows substantial item-to-item variation\n\n")
}
=== Demonstrating Race Effect Heterogeneity ===
Smallest race effect: yellyou (coefficient = 0.72 )
Largest race effect: killyou (coefficient = 1.2 )
This 0.44 unit difference shows substantial item-to-item variation
Show code
# Function to generate predictions for any item
generate_item_predictions <- function(item_name) {
  model <- item_models[[item_name]]
  newdata <- data.frame(race = factor(c("White", "Black")))
  preds <- predict(model, newdata = newdata, type = "prob")
  
  # Convert to tidy format
  probs <- preds %>%
    as.data.frame() %>%
    mutate(race = c("White", "Black")) %>%
    pivot_longer(cols = -race, names_to = "category", values_to = "probability") %>%
    mutate(
      category = as.numeric(gsub("[^0-9]", "", category)),
      item = item_name
    )
  
  # Calculate contrasts
  contrasts <- probs %>%
    dplyr::select(race, category, probability, item) %>%
    pivot_wider(names_from = race, values_from = probability, values_fn = list) %>%
    mutate(
      White = as.numeric(White),
      Black = as.numeric(Black),
      contrast = Black - White,
      item = item_name
    )
  
  return(list(probs = probs, contrasts = contrasts))
}

# Generate predictions for both items
min_results <- generate_item_predictions(min_effect_item)
max_results <- generate_item_predictions(max_effect_item)

# Combine results
combined_probs <- bind_rows(min_results$probs, max_results$probs) %>%
  mutate(
    item_label = case_when(
      item == min_effect_item ~ paste0(item, " (smallest effect: ", round(min(item_results$coefficient), 2), ")"),
      item == max_effect_item ~ paste0(item, " (largest effect: ", round(max(item_results$coefficient), 2), ")"),
      TRUE ~ item
    )
  )

combined_contrasts <- bind_rows(min_results$contrasts, max_results$contrasts) %>%
  mutate(
    item_label = case_when(
      item == min_effect_item ~ paste0(item, " (smallest effect)"),
      item == max_effect_item ~ paste0(item, " (largest effect)"),
      TRUE ~ item
    )
  )

# Plot predicted probabilities with point-intervals
p1_comparison <- ggplot(combined_probs, aes(x = category, y = probability, color = race)) +
  geom_point(size = 2, position = position_dodge(width = 0.3)) +
  geom_line(aes(group = race), position = position_dodge(width = 0.3)) +
  facet_wrap(~item_label, ncol = 1) +
  labs(title = "Race-Specific Predicted Probabilities",
       x = "Response Category (0=Very Unafraid, 4=Very Afraid)", 
       y = "Probability") +
  scale_x_continuous(breaks = 0:4) +
  scale_color_viridis_d(end = 0.8) +  # Use viridis discrete scale
  theme_minimal() +
  theme(text = element_text(family = "serif"),
        legend.position = "bottom", 
        title = element_text(size=10))

# Plot contrasts with point-intervals
p2_comparison <- ggplot(combined_contrasts, aes(x = category, y = contrast)) +
  geom_point(size = 2, alpha = 0.8, color = "#440154FF") +  # Viridis dark purple
  geom_line(color = "#440154FF", alpha = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  facet_wrap(~item_label, ncol = 1) +
  labs(title = "Black-White Probability Differences",
       x = "Response Category", y = "Probability Difference (Black - White)") +
  scale_x_continuous(breaks = 0:4) +
  theme_minimal() + 
  theme(
    text = element_text(family = "serif"),
    title = element_text(size=10))

print(p1_comparison + p2_comparison + plot_annotation(
  title = "Item-Specific Predicted Probabilities: Demonstrating Effect Heterogeneity",
  subtitle = "Predicted Probabilities (Left) & Contrasts (Right), Smallest & Largest Race Effects"))

Show code
# Interpretation
# Compare key statistics
{cat("\n=== Comparative Insights ===\n")
for (item_name in c(min_effect_item, max_effect_item)) {
  results <- if (item_name == min_effect_item) min_results else max_results
  probs <- results$probs
  
  # Afraid or very afraid (categories 3-4)
  afraid_white <- sum(probs[probs$race == "White" & probs$category >= 3, "probability"])
  afraid_black <- sum(probs[probs$race == "Black" & probs$category >= 3, "probability"])
  
  cat("\n", toupper(item_name), "item:\n")
  cat("P(Afraid or Very Afraid) - White:", round(afraid_white, 3), "Black:", round(afraid_black, 3), "\n")
  cat("Difference:", round((afraid_black - afraid_white) * 100, 1), "percentage points\n")
}

cat("\nThis demonstrates how one might use item-level analysis to assess important")
cat("\nheterogeneity that aggregate or multilevel models might obscure.\n")
}

=== Comparative Insights ===

 YELLYOU item:
P(Afraid or Very Afraid) - White: 0.22 Black: 0.48 
Difference: 26 percentage points

 KILLYOU item:
P(Afraid or Very Afraid) - White: 0.17 Black: 0.58 
Difference: 41 percentage points

This demonstrates how one might use item-level analysis to assess important
heterogeneity that aggregate or multilevel models might obscure.

12.2 Predicted Probabilities from Model 3b (Bayesian CP MLM)

The “alternative” frequentist item-specific approach above is informative but has important limitations. Most notably, generating confidence intervals for predicted probabilities from frequentist ordinal models is surprisingly complex - it requires delta method calculations, bootstrap resampling, or other computationally intensive approaches to properly propagate uncertainty from model parameters to probability predictions.
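For readers who want frequentist intervals anyway, a nonparametric bootstrap is one workable (if slow) route. The sketch below is purely illustrative: it resamples respondents, refits one item-specific clm model (here "killyou_ord", with 500 replicates chosen arbitrarily), and summarizes the predicted probabilities with percentile intervals.

# Illustrative percentile bootstrap for predicted probabilities from one
# item-specific clm() model; replicate count and focal item are arbitrary
set.seed(1138)
boot_probs <- replicate(500, {
  d_boot <- dat[sample(nrow(dat), replace = TRUE), ]
  fit <- clm(killyou_ord ~ race, data = d_boot, link = "probit")
  predict(fit,
          newdata = data.frame(race = factor(c("White", "Black"))),
          type = "prob")$fit
}, simplify = FALSE)

# Stack the replicates (2 races x 5 categories x 500 draws) and take the
# 2.5th/97.5th percentiles for each race-by-category cell
boot_array <- simplify2array(boot_probs)
apply(boot_array, c(1, 2), quantile, probs = c(0.025, 0.975))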

This is a major reason why we emphasize Bayesian approaches from the beginning. With Bayesian models, uncertainty quantification is straightforward and automatic:

  • Full posterior distributions: Every prediction incorporates parameter uncertainty naturally
  • No delta method needed: Uncertainty propagates automatically through the MCMC samples
  • Intuitive intervals: Credible intervals directly represent our uncertainty about the predictions
  • Complex models, simple inference: Even with multilevel structures and partial pooling, predictions remain straightforward

The Bayesian multilevel model below demonstrates this advantage while also capturing both the overall race effect and item-to-item variation through partial pooling - something the frequentist item-specific approach demonstrated above cannot achieve.

Show code
# Prediction grid
nd_bcpm <- tibble(
  race = factor(c("White", "Black"))
)

# === PART 1: Population-Level (Pooled) Predictions ===
# Get population-level predictions with re_formula = NA (group-level item
# effects excluded, i.e., predictions for a typical item with random effects at zero)
{cat("=== Population-Level Predictions (Marginalizing Over Items) ===\n")
cat("These represent the pooled/aggregated effect across all items\n")
cat("Equivalent to 'latent fear' but with proper uncertainty quantification\n\n")
}
=== Population-Level Predictions (Marginalizing Over Items) ===
These represent the pooled/aggregated effect across all items
Equivalent to 'latent fear' but with proper uncertainty quantification
Show code
mlm_preds <- add_epred_draws(bcpm_model,
                            newdata = nd_bcpm,
                            re_formula = NA) %>%  # Exclude group-level (item) effects
  ungroup()

# Get predicted probabilities with full uncertainty quantification
mlm_probs <- mlm_preds %>%
  group_by(race, .category) %>%
  summarise(
    p = mean(.epred),
    ll = quantile(.epred, .025),
    ul = quantile(.epred, .975),
    .groups = "drop"
  )

# Calculate contrasts
mlm_contrasts <- mlm_preds %>%
  dplyr::select(.draw, .category, race, .epred) %>%
  pivot_wider(names_from = race, values_from = .epred) %>%
  mutate(diff = Black - White) %>%
  group_by(.category) %>%
  summarise(
    con = mean(diff),
    ll = quantile(diff, .025),
    ul = quantile(diff, .975),
    .groups = "drop"
  )

# Table of population-level estimates
{cat("Population-Level Predicted Probabilities:\n")
mlm_table <- mlm_probs %>%
  mutate(
    estimate = paste0(round(p, 3), " [", round(ll, 3), ", ", round(ul, 3), "]"),
    category = paste0("Category ", .category)
  ) %>%
  dplyr::select(race, category, estimate) %>%
  pivot_wider(names_from = race, values_from = estimate)

print(mlm_table)
}
Population-Level Predicted Probabilities:
# A tibble: 5 × 3
  category   Black                White               
  <chr>      <chr>                <chr>               
1 Category 0 0.098 [0.084, 0.113] 0.378 [0.353, 0.404]
2 Category 1 0.128 [0.117, 0.14]  0.213 [0.204, 0.223]
3 Category 2 0.235 [0.225, 0.246] 0.221 [0.21, 0.232] 
4 Category 3 0.211 [0.201, 0.221] 0.111 [0.102, 0.121]
5 Category 4 0.328 [0.3, 0.356]   0.076 [0.067, 0.087]
Show code
{cat("\nPopulation-Level Contrasts (Black - White):\n")
contrast_table <- mlm_contrasts %>%
  mutate(
    estimate = paste0(round(con, 3), " [", round(ll, 3), ", ", round(ul, 3), "]"),
    category = paste0("Category ", .category)
  ) %>%
  dplyr::select(category, estimate)

print(contrast_table)
}

Population-Level Contrasts (Black - White):
# A tibble: 5 × 2
  category   estimate               
  <chr>      <chr>                  
1 Category 0 -0.28 [-0.312, -0.246] 
2 Category 1 -0.085 [-0.096, -0.074]
3 Category 2 0.015 [0.006, 0.023]   
4 Category 3 0.1 [0.088, 0.11]      
5 Category 4 0.251 [0.218, 0.284]   
Show code
# === PART 2: Item-Specific Predictions from MLM ===
{cat("\n=== Item-Specific Predictions from MLM ===\n")
cat("Now extracting item-specific predictions for our exemplar items\n")
cat("These show partial pooling: item-specific but informed by overall pattern\n\n")

# Create prediction grid for specific items
nd_items <- expand_grid(
  race = factor(c("White", "Black")),
  outcome = factor(c(min_effect_item, max_effect_item))
)

# Get item-specific predictions (including random effects)
item_preds_mlm <- add_epred_draws(bcpm_model,
                                 newdata = nd_items,
                                 re_formula = NULL) %>%  # Include random effects
  ungroup()

# Process item-specific predictions
item_probs_mlm <- item_preds_mlm %>%
  group_by(race, .category, outcome) %>%
  summarise(
    p = mean(.epred),
    ll = quantile(.epred, .025),
    ul = quantile(.epred, .975),
    .groups = "drop"
  ) %>%
  mutate(
    item_label = case_when(
      outcome == min_effect_item ~ paste0(outcome, " (smallest effect)"),
      outcome == max_effect_item ~ paste0(outcome, " (largest effect)"),
      TRUE ~ as.character(outcome)
    )
  )

# Item-specific contrasts
item_contrasts_mlm <- item_preds_mlm %>%
  dplyr::select(.draw, .category, race, .epred, outcome) %>%
  pivot_wider(names_from = race, values_from = .epred) %>%
  mutate(diff = Black - White) %>%
  group_by(.category, outcome) %>%
  summarise(
    con = mean(diff),
    ll = quantile(diff, .025),
    ul = quantile(diff, .975),
    .groups = "drop"
  ) %>%
  mutate(
    item_label = case_when(
      outcome == min_effect_item ~ paste0(outcome, " (smallest effect)"),
      outcome == max_effect_item ~ paste0(outcome, " (largest effect)"),
      TRUE ~ as.character(outcome)
    )
  )

# Tables for each item
for (item_name in c(min_effect_item, max_effect_item)) {
  cat("\nItem-Specific Predictions -", toupper(item_name), ":\n")
  
  item_table <- item_probs_mlm %>%
    dplyr::filter(outcome == item_name) %>%
    mutate(
      estimate = paste0(round(p, 3), " [", round(ll, 3), ", ", round(ul, 3), "]"),
      category = paste0("Category ", .category)
    ) %>%
    dplyr::select(race, category, estimate) %>%
    pivot_wider(names_from = race, values_from = estimate)
  
  print(item_table)
  
  cat("Contrasts (Black - White):\n")
  item_contrast_table <- item_contrasts_mlm %>%
    dplyr::filter(outcome == item_name) %>%
    mutate(
      estimate = paste0(round(con, 3), " [", round(ll, 3), ", ", round(ul, 3), "]"),
      category = paste0("Category ", .category)
    ) %>%
    dplyr::select(category, estimate)
  
  print(item_contrast_table)
}
}

=== Item-Specific Predictions from MLM ===
Now extracting item-specific predictions for our exemplar items
These show partial pooling: item-specific but informed by overall pattern


Item-Specific Predictions - YELLYOU :
# A tibble: 5 × 3
  category   Black                White               
  <chr>      <chr>                <chr>               
1 Category 0 0.122 [0.104, 0.14]  0.332 [0.3, 0.364]  
2 Category 1 0.144 [0.132, 0.157] 0.211 [0.201, 0.22] 
3 Category 2 0.246 [0.235, 0.256] 0.234 [0.223, 0.246]
4 Category 3 0.205 [0.194, 0.215] 0.127 [0.115, 0.14] 
5 Category 4 0.284 [0.255, 0.314] 0.096 [0.081, 0.112]
Contrasts (Black - White):
# A tibble: 5 × 2
  category   estimate               
  <chr>      <chr>                  
1 Category 0 -0.21 [-0.248, -0.172] 
2 Category 1 -0.066 [-0.079, -0.055]
3 Category 2 0.012 [0.001, 0.021]   
4 Category 3 0.077 [0.064, 0.091]   
5 Category 4 0.188 [0.154, 0.222]   

Item-Specific Predictions - KILLYOU :
# A tibble: 5 × 3
  category   Black                White               
  <chr>      <chr>                <chr>               
1 Category 0 0.076 [0.063, 0.089] 0.416 [0.383, 0.451]
2 Category 1 0.11 [0.098, 0.123]  0.213 [0.204, 0.223]
3 Category 2 0.22 [0.207, 0.233]  0.209 [0.195, 0.222]
4 Category 3 0.214 [0.204, 0.223] 0.099 [0.088, 0.111]
5 Category 4 0.38 [0.348, 0.417]  0.063 [0.053, 0.074]
Contrasts (Black - White):
# A tibble: 5 × 2
  category   estimate               
  <chr>      <chr>                  
1 Category 0 -0.34 [-0.378, -0.304] 
2 Category 1 -0.103 [-0.116, -0.091]
3 Category 2 0.012 [-0.003, 0.027]  
4 Category 3 0.115 [0.103, 0.127]   
5 Category 4 0.317 [0.281, 0.355]   
Show code
# === PLOTTING ===
# Plot 1: Population-level predictions
p1_bcpm <- ggplot(mlm_probs, aes(x = as.numeric(as.character(.category)), y = p, color = race)) +
  geom_pointrange(aes(ymin = ll, ymax = ul), position = position_dodge(width = 0.2), size = .75, linewidth = 1.5) +
  geom_line(aes(group = race), position = position_dodge(width = 0.2)) +
  labs(title = "Population-Level (Pooled across Items)",
       x = "Response Category", y = "Probability") +
  scale_x_continuous(breaks = 0:4) +
  scale_color_viridis_d(end = 0.8) +
  theme_minimal() +
  theme(
    legend.position = "bottom", 
    text = element_text(family = "serif"),
    title = element_text(size=10))

# Plot 2: Item-specific predictions
p2_bcpm <- ggplot(item_probs_mlm, aes(x = as.numeric(as.character(.category)), y = p, color = race)) +
  geom_pointrange(aes(ymin = ll, ymax = ul), position = position_dodge(width = 0.2), size = .5, linewidth = 1) +
  geom_line(aes(group = race), position = position_dodge(width = 0.2)) +
  facet_wrap(~item_label, ncol = 1) +
  labs(title = "Item-Specific (w/Partial Pooling)",
       x = "Response Category", y = "Probability") +
  scale_x_continuous(breaks = 0:4) +
  scale_color_viridis_d(end = 0.8) +
  theme_minimal() +
  theme(
    legend.position = "bottom", 
    text = element_text(family = "serif"),
    title = element_text(size=10))

print(p1_bcpm + p2_bcpm + plot_annotation(
  title = "Bayesian MLM: Population-Level vs Item-Specific Predicted Probabilities", 
  subtitle = "Average (Pooled across Items; Left) & Item-Specific (w/Partial Pooling; Right) Predictions by Race"))

Show code
{cat("\n=== Key Insights ===\n")
cat("• Population-level estimates show the 'average' effect with partial pooling\n")
cat("• Item-specific estimates are 'shrunk' toward the population mean\n") 
cat("• All estimates include automatic uncertainty quantification\n")
cat("• Compare these MLM item-specific estimates to the standalone item-specific models above\n")
}

=== Key Insights ===
• Population-level estimates show the 'average' effect with partial pooling
• Item-specific estimates are 'shrunk' toward the population mean
• All estimates include automatic uncertainty quantification
• Compare these MLM item-specific estimates to the standalone item-specific models above
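As a final illustration of the “no delta method needed” point, the population-level draws computed above (mlm_preds) can be aggregated into any quantity of interest with credible intervals. A sketch of P(afraid or very afraid), i.e., categories 3-4, by race:

# Sketch: posterior P(afraid or very afraid) by race, with 95% credible
# intervals, computed directly from the population-level draws above
mlm_preds %>%
  dplyr::filter(.category %in% c("3", "4")) %>%
  group_by(race, .draw) %>%
  summarise(p_afraid = sum(.epred), .groups = "drop") %>%
  group_by(race) %>%
  summarise(
    p = mean(p_afraid),
    ll = quantile(p_afraid, 0.025),
    ul = quantile(p_afraid, 0.975),
    .groups = "drop"
  )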

13 Conclusion

This tutorial has demonstrated multiple approaches to analyzing ordinal fear of police data, progressing from simple linear methods to sophisticated Bayesian multilevel ordinal models. The key findings and recommendations are:

13.1 Key Methodological Insights

Linear models are inadequate for ordinal data. While Models 1 and 2 replicated conventional linear approaches, the subsequent ordinal analyses revealed their fundamental limitations. The comparison with ordinal models (supported by fit statistics in the main paper) showed that treating ordinal responses as continuous may overlook important insights and can produce misleading interpretations, while the predicted probabilities section demonstrated how ordinal approaches can provide genuinely meaningful, policy-relevant insights.

Frequentist multilevel ordinal models often fail. Model 3a illustrated the notorious convergence issues with frequentist multilevel (clmm) models, which can produce unreliable estimates or fail entirely with complex random effects structures.

Item-specific models provide robust alternatives. When frequentist multilevel models fail, an item-specific approach may offer a reliable fallback that maintains ordinal modeling principles while revealing important effect heterogeneity across items.

Bayesian multilevel models are superior. Model 3b demonstrated that Bayesian approaches more consistently converge, provide automatic uncertainty quantification, and enable sophisticated model checking through posterior predictive checks.

Proportional odds assumptions should be tested. Our diagnostic revealed that race effects vary significantly across ordinal thresholds, highlighting the importance of testing key model assumptions.

Predicted probabilities enhance interpretability. Rather than relying on coefficients on latent scales, ordinal models enable direct prediction of meaningful quantities like “probability of being afraid or very afraid.”

13.2 Practical Recommendations

  1. Abandon linear models for ordinal outcomes regardless of interpretability concerns
  2. Start with Bayesian multilevel ordinal models as the primary approach
  3. Use item-specific models as robust alternatives when multilevel convergence fails
  4. Always test proportional odds assumptions and consider category-specific models when violated
  5. Focus on predicted probabilities rather than model coefficients for substantive interpretation
  6. Leverage automatic uncertainty quantification available in Bayesian frameworks

13.3 Substantive Implications

Beyond methodological considerations, this analysis revealed important substantive findings about racial disparities in fear of police. The race effects vary meaningfully across both items (0.72 to 1.16 on the latent scale) and response categories, with larger effects for more extreme fear responses. Importantly, the item-specific analyses revealed monotonic patterns in these data; that is, race effects consistently increased with item severity. However, such monotonicity cannot be taken for granted across all applications. The scaling and aggregation approaches often used in linear modeling may mask even more consequential heterogeneity patterns that would only be revealed through item-specific, ordinal, and multilevel workflows.

This highlights a critical limitation of composite scale approaches: They assume that relationships are uniform across items and response levels. In reality, predictors may have complex, non-monotonic effects across different components of a scale. Only disaggregated (e.g., item-specific, ordinal, multilevel) approaches can detect whether effects vary by item type, severity level, or response category - patterns that could have profound theoretical and policy implications.

The tutorial demonstrates that methodological rigor and substantive insight are complementary rather than competing goals. Appropriate statistical methods reveal richer, more accurate pictures of social phenomena while maintaining the interpretability that applied researchers require.

13.4 Future Directions

This framework extends naturally to more complex analyses involving multiple predictors, hierarchical data structures, and longitudinal designs. The Bayesian multilevel ordinal approach provides a foundation for sophisticated modeling while avoiding the computational and interpretive pitfalls of linear alternatives.

Most importantly, this tutorial illustrates that the choice between ordinal and linear modeling is not about statistical preferences — it’s about whether we want accurate, meaningful insights or convenient but incomplete and possibly misleading summaries of complex social realities.