This appendix provides the R code necessary to replicate the key models discussed in the main manuscript, particularly those presented in Table 3 comparing different modeling approaches to analyzing fear of police data. We include code for frequentist and Bayesian versions of OLS and multilevel cumulative probit models, as well as methods to test the proportional odds assumption and generate predicted probabilities.
The code relies on data from Pickett et al. (2022) that were pre-processed in our primary R supplement file; see the end of that file (section "APP-A") for details and for the code that saved the replication_data.RData and replication_data_long.RData files used here.
2 Setup: Load Libraries and Settings
# Essential packages for data manipulation and modeling
library(here)
library(tidyverse)
library(haven)        # For reading .dta files (though data is pre-processed)
library(datawizard)   # For row_means
library(lmtest)       # For coeftest
library(sandwich)     # For vcovHC (robust standard errors)
library(rstanarm)     # For Bayesian linear models
library(brms)         # For Bayesian (multilevel) ordinal and categorical models
library(ordinal)      # For frequentist (multilevel) ordinal models (clmm)
library(MASS)         # For polr (frequentist ordinal non-multilevel)
library(gt)           # For creating nice tables
library(scales)       # For pretty breaks in plots
library(patchwork)    # For combining plots
library(parallelly)   # For detecting CPU cores
library(bayesplot)    # For color_scheme_set and other posterior plotting functions
library(modelsummary) # For datasummary_skim and other model summary functions
library(ggeffects)    # For predicted probabilities from frequentist models
library(tidybayes)    # For Bayesian model predictions
library(viridis)      # For colorblind-accessible plots

# Set default ggplot2 colors to viridis
options(ggplot2.discrete.colour = scale_color_viridis_d)
options(ggplot2.discrete.fill = scale_fill_viridis_d)

# Set global options for `knitr`
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, fig.align = "center")
options(scipen = 999, digits = 2) # Set global option to two decimals for non-plot outputs

# Identify & set number of available CPU cores for parallel processing (e.g., Stan models)
nCoresphys <- parallelly::availableCores(logical = FALSE)
options(mc.cores = nCoresphys)

# Set color scheme for `bayesplot`
color_scheme_set("viridisA")

# Ensure the 'Models' directory exists for saving cached model fits
if (!dir.exists(here("Models"))) {
  dir.create(here("Models"))
}

# Vector of fear item names (should match your data prep)
fearpol_new <- c("stopyou", "searchyou", "yellyou", "handcyou",
                 "kickyou", "pinyou", "sprayyou", "taseyou",
                 "shootyou", "killyou")
3 Tutorial Overview
The table below (Appendix A in manuscript) provides a roadmap of the modeling approaches demonstrated in this tutorial. Each section builds on the previous ones, showing the advantages of ordinal modeling approaches over linear methods while providing practical solutions for common convergence issues.
Readers primarily interested in practical implementation may focus on Sections Model 2, Model 3b, and Predicted Probabilities. Those wanting to understand additional issues (e.g., frequentist convergence challenges) should review the full sequence.
3.1 Appendix A Summary Table
# === APPENDIX 3: TUTORIAL OVERVIEW ===
# This table summarizes the modeling approaches demonstrated in our R tutorial
# Available at: [INSERT URL]
tutorial_overview <- tibble(
  Section = c("Model 1", "Model 2", "Model 3a", "Alternative 3a", "Model 3b",
              "Proportional Odds Test", "Predicted Probabilities"),
  Approach = c("Frequentist OLS", "Bayesian Linear", "Frequentist CP MLM",
               "Item-Specific CP Models", "Bayesian CP MLM",
               "Category-Specific Model", "Probability Estimation"),
  Family = c("Gaussian", "Gaussian", "Cumulative probit", "Cumulative probit",
             "Cumulative probit", "Categorical logit", "N/A"),
  Structure = c("Single-level", "Single-level", "Multilevel", "Item-specific",
                "Multilevel", "Multilevel", "From fitted models"),
  Status = c("Replicates original", "Converges", "Convergence issues", "Robust alternative",
             "Converges successfully", "For assumption testing", "Demonstrates interpretation"),
  Key_Purpose = c(
    "Baseline comparison with linear approach",
    "Bayesian equivalent of original analysis",
    "Frequentist multilevel ordinal (often fails to converge)",
    "Simpler & more reliable fallback if frequentist MLM convergence fails",
    "Primary recommended multilevel ordinal approach",
    "Tests whether race effects vary across thresholds",
    "Shows interpretable outputs: item-specific vs population-level"
  )
)

# Display the tutorial overview table
gt(tutorial_overview) %>%
  tab_header(
    title = "Appendix A: R Tutorial Structure Overview",
    subtitle = "Comprehensive comparison of modeling approaches for ordinal fear of police data"
  ) %>%
  cols_label(
    Section = "Tutorial Section",
    Approach = "Statistical Approach",
    Family = "Model Family",
    Structure = "Data Structure",
    Status = "Convergence Status",
    Key_Purpose = "Purpose & Key Insights"
  ) %>%
  tab_options(
    table.font.size = 11,
    heading.align = "center"
  ) %>%
  tab_footnote(
    footnote = "Complete R code, data, and detailed explanations available in online 'Appendix 3' Quarto document",
    locations = cells_title(groups = "title")
  ) %>%
  tab_footnote(
    footnote = "Demonstrates why Bayesian multilevel ordinal models are preferred over linear approaches",
    locations = cells_title(groups = "subtitle")
  )
Appendix A: R Tutorial Structure Overview [1]
Comprehensive comparison of modeling approaches for ordinal fear of police data [2]

| Tutorial Section | Statistical Approach | Model Family | Data Structure | Convergence Status | Purpose & Key Insights |
|---|---|---|---|---|---|
| Model 1 | Frequentist OLS | Gaussian | Single-level | Replicates original | Baseline comparison with linear approach |
| Model 2 | Bayesian Linear | Gaussian | Single-level | Converges | Bayesian equivalent of original analysis |
| Model 3a | Frequentist CP MLM | Cumulative probit | Multilevel | Convergence issues | Frequentist multilevel ordinal (often fails to converge) |
| Alternative 3a | Item-Specific CP Models | Cumulative probit | Item-specific | Robust alternative | Simpler & more reliable fallback if frequentist MLM convergence fails |
| Model 3b | Bayesian CP MLM | Cumulative probit | Multilevel | Converges successfully | Primary recommended multilevel ordinal approach |
| Proportional Odds Test | Category-Specific Model | Categorical logit | Multilevel | For assumption testing | Tests whether race effects vary across thresholds |
| Predicted Probabilities | Probability Estimation | N/A | From fitted models | Demonstrates interpretation | Shows interpretable outputs: item-specific vs population-level |

[1] Complete R code, data, and detailed explanations available in online 'Appendix 3' Quarto document
[2] Demonstrates why Bayesian multilevel ordinal models are preferred over linear approaches
4 Load Replication Data
# Load the main processed data
load(here("Data", "replication_data.RData"))

# Load the long-format data for multilevel models
load(here("Data", "replication_data_long.RData"))

# Display a summary of the main data to confirm successful loading
datasummary_skim(dat)
Numeric variables (histogram sparkline column omitted):

| | Unique | Missing Pct. | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|---|
| stopyou | 5 | 0 | 2.0 | 1.3 | 0.0 | 2.0 | 4.0 |
| searchyou | 5 | 0 | 1.9 | 1.3 | 0.0 | 2.0 | 4.0 |
| yellyou | 5 | 0 | 1.9 | 1.3 | 0.0 | 2.0 | 4.0 |
| handcyou | 5 | 0 | 1.9 | 1.4 | 0.0 | 2.0 | 4.0 |
| kickyou | 5 | 0 | 1.9 | 1.5 | 0.0 | 2.0 | 4.0 |
| pinyou | 5 | 0 | 1.9 | 1.5 | 0.0 | 2.0 | 4.0 |
| sprayyou | 5 | 0 | 1.9 | 1.5 | 0.0 | 2.0 | 4.0 |
| taseyou | 5 | 0 | 1.9 | 1.5 | 0.0 | 2.0 | 4.0 |
| shootyou | 5 | 0 | 1.9 | 1.5 | 0.0 | 2.0 | 4.0 |
| killyou | 5 | 0 | 2.0 | 1.6 | 0.0 | 2.0 | 4.0 |
| fearpol | 41 | 0 | 1.9 | 1.3 | 0.0 | 2.0 | 4.0 |
| fearpol_norm | 41 | 0 | 48.2 | 33.4 | 0.0 | 50.0 | 100.0 |

Categorical variables:

| race | N | % |
|---|---|---|
| White | 504 | 49.4 |
| Black | 516 | 50.6 |

Ordered item indicators, N (%) by response category:

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| stopyou_ord | 191 (18.7) | 168 (16.5) | 283 (27.7) | 215 (21.1) | 163 (16.0) |
| searchyou_ord | 210 (20.6) | 205 (20.1) | 277 (27.2) | 168 (16.5) | 160 (15.7) |
| yellyou_ord | 200 (19.6) | 187 (18.3) | 272 (26.7) | 194 (19.0) | 167 (16.4) |
| handcyou_ord | 217 (21.3) | 200 (19.6) | 235 (23.0) | 183 (17.9) | 185 (18.1) |
| kickyou_ord | 265 (26.0) | 170 (16.7) | 212 (20.8) | 161 (15.8) | 212 (20.8) |
| pinyou_ord | 261 (25.6) | 161 (15.8) | 213 (20.9) | 158 (15.5) | 227 (22.3) |
| sprayyou_ord | 264 (25.9) | 164 (16.1) | 209 (20.5) | 170 (16.7) | 213 (20.9) |
| taseyou_ord | 264 (25.9) | 160 (15.7) | 204 (20.0) | 167 (16.4) | 225 (22.1) |
| shootyou_ord | 277 (27.2) | 153 (15.0) | 198 (19.4) | 130 (12.7) | 262 (25.7) |
| killyou_ord | 282 (27.6) | 153 (15.0) | 195 (19.1) | 113 (11.1) | 277 (27.2) |
datasummary_skim(dat_long)
Numeric variables (histogram sparkline column omitted):

| | Unique | Missing Pct. | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|---|
| fear_numeric | 5 | 0 | 1.9 | 1.4 | 0.0 | 2.0 | 4.0 |

Categorical variables:

| race | N | % |
|---|---|---|
| White | 5040 | 49.4 |
| Black | 5160 | 50.6 |

Ordered item indicators, N (%) by response category (counts are 10 times the wide-data counts because each respondent contributes one row per item):

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| stopyou_ord | 1910 (18.7) | 1680 (16.5) | 2830 (27.7) | 2150 (21.1) | 1630 (16.0) |
| searchyou_ord | 2100 (20.6) | 2050 (20.1) | 2770 (27.2) | 1680 (16.5) | 1600 (15.7) |
| yellyou_ord | 2000 (19.6) | 1870 (18.3) | 2720 (26.7) | 1940 (19.0) | 1670 (16.4) |
| handcyou_ord | 2170 (21.3) | 2000 (19.6) | 2350 (23.0) | 1830 (17.9) | 1850 (18.1) |
| kickyou_ord | 2650 (26.0) | 1700 (16.7) | 2120 (20.8) | 1610 (15.8) | 2120 (20.8) |
| pinyou_ord | 2610 (25.6) | 1610 (15.8) | 2130 (20.9) | 1580 (15.5) | 2270 (22.3) |
| sprayyou_ord | 2640 (25.9) | 1640 (16.1) | 2090 (20.5) | 1700 (16.7) | 2130 (20.9) |
| taseyou_ord | 2640 (25.9) | 1600 (15.7) | 2040 (20.0) | 1670 (16.4) | 2250 (22.1) |
| shootyou_ord | 2770 (27.2) | 1530 (15.0) | 1980 (19.4) | 1300 (12.7) | 2620 (25.7) |
| killyou_ord | 2820 (27.6) | 1530 (15.0) | 1950 (19.1) | 1130 (11.1) | 2770 (27.2) |

Item identifier (each item contributes 1020 rows):

| outcome | N | % |
|---|---|---|
| handcyou | 1020 | 10.0 |
| kickyou | 1020 | 10.0 |
| killyou | 1020 | 10.0 |
| pinyou | 1020 | 10.0 |
| searchyou | 1020 | 10.0 |
| shootyou | 1020 | 10.0 |
| sprayyou | 1020 | 10.0 |
| stopyou | 1020 | 10.0 |
| taseyou | 1020 | 10.0 |
| yellyou | 1020 | 10.0 |

Long-format outcome variables, N (%) by response category:

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| fear | 2431 (23.8) | 1721 (16.9) | 2298 (22.5) | 1659 (16.3) | 2091 (20.5) |
| fear_categorical | 2431 (23.8) | 1721 (16.9) | 2298 (22.5) | 1659 (16.3) | 2091 (20.5) |
5 Model 1: Frequentist OLS Replication
This replicates the Ordinary Least Squares (OLS) regression model from Pickett et al. (2022), predicting the 0-100 normed fear scale using race as a predictor. Robust standard errors are used, consistent with common practice in criminology.
# Model 1: Frequentist OLS Replication
# Uses the 'fearpol_norm' variable created in the data preparation script.
ols_model <- lm(fearpol_norm ~ race, data = dat)

# Calculate robust standard errors
# Need 'sandwich' package for vcovHC and 'lmtest' for coeftest
robust_results <- coeftest(ols_model, vcov = vcovHC(ols_model, type = "HC1"))

# Print model summary with robust standard errors
print(robust_results)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 32.19 1.33 24.2 <0.0000000000000002 ***
raceBlack 31.62 1.84 17.1 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Store coefficient for comparison table
ols_coef <- coef(ols_model)["raceBlack"]
ols_se <- sqrt(diag(vcovHC(ols_model, type = "HC1")))["raceBlack"]
6 Model 2: Bayesian Linear Replication
This section provides the Bayesian equivalent of the OLS model, predicting the normed fear scale using rstanarm. This model uses diffuse priors, allowing the likelihood (data) to drive the posterior estimates.
# Model 2: Bayesian Linear Replication
# Uses rstanarm package for Bayesian linear regression.
# The 'fearpol_norm' variable is the 0-100 normed fear scale.
model_path_blm <- here("Models", "bayesian_linear_fearpol_norm.rds")

# Check if cached model exists; if so, load it. Otherwise, fit and save.
if (file.exists(model_path_blm)) {
  blm_model <- readRDS(model_path_blm)
} else {
  blm_model <- stan_glm(
    formula = fearpol_norm ~ race,
    data = dat,
    family = gaussian(),
    chains = 4,
    cores = nCoresphys,
    seed = 1138
  )
  saveRDS(blm_model, model_path_blm) # Save the model
}

# Print summary of the Bayesian linear model
summary(blm_model, probs = c(0.025, 0.975), digits = 2)
Model Info:
function: stan_glm
family: gaussian [identity]
formula: fearpol_norm ~ race
algorithm: sampling
sample: 4000 (posterior sample size)
priors: see help('prior_summary')
observations: 1020
predictors: 2
Estimates:
mean sd 2.5% 97.5%
(Intercept) 32.17 1.32 29.55 34.76
raceBlack 31.65 1.83 28.19 35.28
sigma 29.47 0.65 28.22 30.79
Fit Diagnostics:
mean sd 2.5% 97.5%
mean_PPD 48.18 1.32 45.62 50.80
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.02 1.00 3698
raceBlack 0.03 1.00 3720
sigma 0.01 1.00 3980
mean_PPD 0.02 1.00 3916
log-posterior 0.03 1.00 1716
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# Posterior predictive check to visualize model fit
pp_check(blm_model) +
  labs(title = "Posterior Predictive Check: Bayesian Linear Model")
7 Model 3a: Multilevel Cumulative Probit (Frequentist)
The code below uses the clmm function from the ordinal package to fit a frequentist multilevel cumulative probit model. This frequentist approach mimics the Bayesian ordinal MLM model reported in the paper (see also Model 3b below); it respects the ordinal nature of the fear variable, accounts for the clustering of items within respondents, and specifies random effects across items.
Challenge: These models are notorious for convergence failures due to the complexity of integrating over random effects in ordinal models. We implement a robust three-strategy approach:
Strategy 1: Default settings with high-accuracy integration (nAGQ=7)
Strategy 2: Custom starting values from simpler models with moderate integration (nAGQ=5)
Strategy 3: Simplified settings with fast approximation (nAGQ=1)
# Model 3a: Multilevel Cumulative Probit (Frequentist)
# Uses the 'dat_long' data and 'fear' (ordered factor) variable.
library(ordinal) # Ensure 'ordinal' package is loaded

clmm_model_path <- here("Models", "clmm_fear_race_outcome.rds")

# Initialize success flag
clmm_success <- FALSE

if (file.exists(clmm_model_path)) {
  clmm_model <- readRDS(clmm_model_path)
  clmm_success <- TRUE
} else {
  clmm_model <- NULL

  # Strategy 1: Try without starting values first
  message("Attempting multilevel model without custom starting values...")
  try({
    clmm_model <- clmm(
      fear ~ race + (1 | outcome),
      data = dat_long,
      link = "probit",
      nAGQ = 7,
      control = clmm.control(maxIter = 1000, gradTol = 1e-4)
    )
    clmm_success <- TRUE
    message("Success with default starting values!")
  }, silent = TRUE)

  # Strategy 2: Try with custom starting values if Strategy 1 failed
  if (!clmm_success) {
    message("Default approach failed. Trying with custom starting values...")
    try({
      # Fit simple model for reference
      simple_ord <- clm(fear ~ race, data = dat_long, link = "probit")
      simple_coefs <- coef(simple_ord)

      # Starting values
      beta_race <- simple_coefs["raceBlack"]
      sigma_outcome <- 0.3
      start_vals <- c(beta_race, sigma_outcome)
      names(start_vals) <- c("raceBlack", "outcome.(Intercept)")
      message("Using starting values: ",
              paste(names(start_vals), "=", round(start_vals, 3), collapse = ", "))

      clmm_model <- clmm(
        fear ~ race + (1 | outcome),
        data = dat_long,
        link = "probit",
        nAGQ = 5,
        start = start_vals,
        control = clmm.control(maxIter = 1000, gradTol = 1e-4)
      )
      clmm_success <- TRUE
      message("Success with custom starting values!")
    }, silent = TRUE)
  }

  # Strategy 3: Try simplified settings if Strategy 2 failed
  if (!clmm_success) {
    message("Custom starting values also failed. Trying simplest approach...")
    try({
      clmm_model <- clmm(
        fear ~ race + (1 | outcome),
        data = dat_long,
        link = "probit",
        nAGQ = 1,
        control = clmm.control(maxIter = 100)
      )
      clmm_success <- TRUE
      message("Success with simplified settings!")
    }, silent = TRUE)
  }

  # Save if successful
  if (clmm_success && !is.null(clmm_model)) {
    saveRDS(clmm_model, clmm_model_path)
  }
}

# Report final results
if (clmm_success && !is.null(clmm_model)) {
  cat("\n=== CLMM Model Successfully Fitted ===\n")

  # Check if model has valid results
  try({
    model_summary <- summary(clmm_model)
    print(model_summary)
  }, silent = FALSE)

  # Check convergence using clmm-specific methods
  if ("info" %in% names(clmm_model) && !is.null(clmm_model$info)) {
    conv_code <- clmm_model$info$convergence
    if (!is.null(conv_code) && conv_code == 0) {
      message("Model converged successfully!")
    } else {
      message("Model fitted but convergence may be questionable. Using simplified settings (nAGQ=1).")
      message("Results should be interpreted with caution but are usable for tutorial purposes.")
    }
  } else {
    message("Model fitted with simplified settings. Results are usable for demonstration purposes.")
  }

  # Store success flag for later sections
  assign("clmm_converged", TRUE, envir = .GlobalEnv)
} else {
  cat("\n=== CLMM Model Failed ===\n")
  message("All CLMM strategies failed to converge.")
  message("This is common with complex ordinal multilevel models.")
  message("The item-specific alternative approach will be used instead.")

  # Store failure flag for later sections
  assign("clmm_converged", FALSE, envir = .GlobalEnv)
}
=== CLMM Model Successfully Fitted ===
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: fear ~ race + (1 | outcome)
data: dat_long
link threshold nobs logLik AIC niter max.grad cond.H
probit flexible 10200 -15283.33 30578.66 706(827) 5.73e-05 NaN
Random effects:
Groups Name Variance Std.Dev.
outcome (Intercept) 0.000000000406 0.0000202
Number of groups: outcome 10
Coefficients:
Estimate Std. Error z value Pr(>|z|)
raceBlack 0.988 NaN NaN NaN
Threshold coefficients:
Estimate Std. Error z value
0|1 -0.306 NaN NaN
1|2 0.234 NaN NaN
2|3 0.888 NaN NaN
3|4 1.429 NaN NaN
# Interpretation note
if (clmm_success) {
  cat("\n=== Model Interpretation ===\n")
  cat("Race coefficient (raceBlack):", round(coef(clmm_model)["raceBlack"], 3), "\n")
  cat("This suggests Black respondents have higher latent fear scores.\n")
  cat("WARNING: Random effect variance is essentially zero, which is likely incorrect.\n")
  cat("This suggests the clmm model failed to properly estimate between-item variation.\n")
  cat("The item-specific analysis below reveals substantial variation across items.\n")
}
=== Model Interpretation ===
Race coefficient (raceBlack): 0.99
This suggests Black respondents have higher latent fear scores.
WARNING: Random effect variance is essentially zero, which is likely incorrect.
This suggests the clmm model failed to properly estimate between-item variation.
The item-specific analysis below reveals substantial variation across items.
8 Alternative Frequentist Approach: Item-Specific Models
Given the convergence issues with the multilevel clmm model above, we implement a more robust approach by fitting separate cumulative probit models to each item and meta-analyzing the results. This approach often reveals patterns that complex multilevel models miss due to convergence failures.
# Alternative Model 3a: Item-specific cumulative probit models
# This approach fits separate models to each item and can be meta-analyzed

# Create ordered factor versions if not already done
fearpol_ord <- paste0(fearpol_new, "_ord")

# Fit separate cumulative probit models for each item
item_models <- list()
item_results <- tibble()

for (item in fearpol_new) {
  item_ord <- paste0(item, "_ord")

  # Fit cumulative probit model for this item
  model <- clm(
    formula = as.formula(paste0(item_ord, " ~ race")),
    data = dat,
    link = "probit"
  )
  item_models[[item]] <- model

  # Extract coefficient for race
  race_coef <- coef(model)["raceBlack"]
  race_se <- sqrt(vcov(model)["raceBlack", "raceBlack"])

  # Add to results
  item_results <- bind_rows(item_results, tibble(
    item = item,
    coefficient = race_coef,
    se = race_se,
    z_value = race_coef / race_se,
    p_value = 2 * (1 - pnorm(abs(race_coef / race_se)))
  ))
}

# Meta-analytic summary (simple average)
meta_summary <- item_results %>%
  summarise(
    mean_coefficient = mean(coefficient),
    se_mean = sqrt(sum(se^2)) / n(), # Combined standard error
    items_significant = sum(p_value < 0.05),
    total_items = n()
  )

print("Item-specific cumulative probit results:")
# Note: This approach sacrifices the multilevel structure but provides
# a robust frequentist alternative that mirrors the Bayesian approach
{
  cat("\n=== Key Findings ===\n")
  cat("Range of race effects: ", round(min(item_results$coefficient), 2),
      " to ", round(max(item_results$coefficient), 2), "\n")
  cat("This", round(max(item_results$coefficient) - min(item_results$coefficient), 2),
      "unit range contradicts the clmm finding of no item variation.\n")
  cat("More severe items (kill, shoot, physical violence) show larger race disparities.\n")
  cat("Meta-analytic average effect:", round(meta_summary$mean_coefficient, 2), "\n")
  cat("Note: This", round(meta_summary$mean_coefficient, 2),
      "average is nearly identical to the clmm coefficient of",
      round(coef(clmm_model)["raceBlack"], 2), "\n")
  cat("However, the clmm model incorrectly suggests no variation across items (variance ≈ 0).\n")
  cat("The item-specific approach reveals the true heterogeneity that clmm missed.\n")
}
=== Key Findings ===
Range of race effects: 0.72 to 1.2
This 0.44 unit range contradicts the clmm finding of no item variation.
More severe items (kill, shoot, physical violence) show larger race disparities.
Meta-analytic average effect: 0.99
Note: This 0.99 average is nearly identical to the clmm coefficient of 0.99
However, the clmm model incorrectly suggests no variation across items (variance ≈ 0).
The item-specific approach reveals the true heterogeneity that clmm missed.
9 Model 3b: Multilevel Cumulative Probit (Bayesian)
This model replicates the Bayesian multilevel cumulative probit model from the original supplement, using brms. This is the primary ordinal multilevel model discussed in the main text.
Why we prefer the Bayesian model: As demonstrated by the convergence failures in Model 3a above, frequentist multilevel ordinal models (clmm) are notoriously unstable. The Bayesian approach offers several key advantages:
Superior convergence: MCMC sampling is more robust to complex likelihood surfaces
Maintains multilevel structure: Unlike the item-specific fallback, this preserves the hierarchical data structure
Partial pooling: Estimates are “shrunk” toward the overall mean, providing more stable item-specific effects and better out-of-sample predictive performance
Uncertainty quantification: Full posterior distributions capture both within- and between-item uncertainty
Flexible priors: Can incorporate substantive knowledge and cumulative evidence from previous studies to aid convergence and improve estimation
Comprehensive model checking: Posterior predictive checks allow intuitive assessment of model fit by comparing observed data to model-generated predictions, incorporating parameter uncertainty
The Bayesian model should successfully estimate both the overall race effect AND the item-to-item variation that the clmm model failed to capture, while avoiding the need to abandon the multilevel structure entirely.
# Model 3b: Multilevel Cumulative Probit (Bayesian)
# Re-uses the 'hier_mod' object from the original supplement file.
model_path_bcpm <- here("Models", "hier_fear_fit.rds")

if (file.exists(model_path_bcpm)) {
  bcpm_model <- readRDS(model_path_bcpm)
} else {
  # Bayesian multilevel cumulative probit model
  bf_fear <- bf(fear | thres(4) ~ race + (race | outcome))

  priors_hier <- c(
    prior(normal(-0.84, 1), class = "Intercept", coef = "1"),
    prior(normal(-0.25, 1), class = "Intercept", coef = "2"),
    prior(normal(0.25, 1), class = "Intercept", coef = "3"),
    prior(normal(0.84, 1), class = "Intercept", coef = "4"),
    prior(normal(0, 1), class = "b"),
    prior(exponential(1), class = "sd"),
    prior(lkj(2), class = "cor")
  )

  bcpm_model <- brm(
    bf_fear,
    data = dat_long,
    family = cumulative("probit"),
    prior = priors_hier,
    cores = nCoresphys,
    chains = 4,
    iter = 2000,
    warmup = 1000,
    seed = 1138,
    control = list(adapt_delta = 0.95),
    file = model_path_bcpm,
    file_refit = "on_change"
  )
  saveRDS(bcpm_model, model_path_bcpm)
}

# Print summaries of the Bayesian multilevel cumulative probit model
summary(bcpm_model, probs = c(0.025, 0.975), digits = 2)
Family: cumulative
Links: mu = probit; disc = identity
Formula: fear | thres(4) ~ race + (race | outcome)
Data: dat_long (Number of observations: 10200)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~outcome (Number of levels: 10)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.09 0.03 0.04 0.15 1.00 1720
sd(raceBlack) 0.17 0.05 0.10 0.29 1.00 1583
cor(Intercept,raceBlack) -0.83 0.16 -0.99 -0.40 1.00 1457
Tail_ESS
sd(Intercept) 2303
sd(raceBlack) 2415
cor(Intercept,raceBlack) 2200
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -0.31 0.03 -0.38 -0.24 1.00 1898 2480
Intercept[2] 0.23 0.03 0.17 0.30 1.00 1881 2408
Intercept[3] 0.89 0.03 0.82 0.95 1.00 1867 2440
Intercept[4] 1.43 0.04 1.36 1.50 1.00 1983 2213
raceBlack 0.98 0.06 0.86 1.10 1.00 1557 2135
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
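The interpretation output below, and the comparison chunks that follow, reference several quantities (race_coef_mean, race_coef_lower, race_coef_upper, outcome_var, race_var) whose defining chunk is not shown above. The following is a minimal reconstruction sketch of how they could be extracted from the fitted brms object; the object names match those used later, but the extraction code itself is our assumption, not the original chunk.

# Hypothetical reconstruction (original chunk not shown): posterior summaries
# referenced in the interpretation output and later comparison chunks.
fixef_bcpm <- fixef(bcpm_model, probs = c(0.025, 0.975))
race_coef_mean  <- fixef_bcpm["raceBlack", "Estimate"]
race_coef_lower <- fixef_bcpm["raceBlack", "Q2.5"]
race_coef_upper <- fixef_bcpm["raceBlack", "Q97.5"]

vc_bcpm <- VarCorr(bcpm_model)
outcome_var <- vc_bcpm$outcome$sd["Intercept", "Estimate"] # posterior mean of random intercept SD
race_var    <- vc_bcpm$outcome$sd["raceBlack", "Estimate"] # posterior mean of random race-slope SD

cat("\n=== Bayesian Model Interpretation ===\n")
cat("Race coefficient (raceBlack):", round(race_coef_mean, 2),
    "[95% CI:", round(race_coef_lower, 2), ",", round(race_coef_upper, 2), "]\n")
cat("Random intercept SD:", round(outcome_var, 3), "\n")
cat("Random race slope SD:", round(race_var, 2), "\n")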
=== Bayesian Model Interpretation ===
Race coefficient (raceBlack): 0.98 [95% CI: 0.86, 1.1]
Random intercept SD: 0.086
Random race slope SD: 0.17
{cat("\n=== Key Contrasts with Frequentist Results ===\n")cat("• Race effects: Bayesian (", round(race_coef_mean, 2), ") ≈ clmm (", round(coef(clmm_model)["raceBlack"], 2), ") ≈ item-specific (", round(meta_summary$mean_coefficient, 2), ") - all methods agree on aggregate race effect\n")cat("• Between-item variation: Bayesian SD = ", round(outcome_var, 3), " vs clmm ≈ 0 (failed to detect)\n")cat("• Race effect variation: Bayesian SD = ", round(race_var, 3), " - race effects vary meaningfully across items\n")cat("• Uncertainty quantification: 95% CI = ±", round((race_coef_upper - race_coef_lower)/2, 2), " units around point estimate\n")}
=== Key Contrasts with Frequentist Results ===
• Race effects: Bayesian ( 0.98 ) ≈ clmm ( 0.99 ) ≈ item-specific ( 0.99 ) - all methods agree on aggregate race effect
• Between-item variation: Bayesian SD = 0.086 vs clmm ≈ 0 (failed to detect)
• Race effect variation: Bayesian SD = 0.17 - race effects vary meaningfully across items
• Uncertainty quantification: 95% CI = ± 0.12 units around point estimate
{cat("\n=== Substantive Insights ===\n")cat("• Average race effect (~1.0) represents a large difference (~1SD) on the latent scale\n")cat("• Race effects range from ~", round(race_coef_mean -1.96*race_var, 2), " to ~", round(race_coef_mean +1.96*race_var, 2), " across items (±2SD)\n")cat("• This matches the item-specific range of ", round(min(item_results$coefficient), 2)," to ", round(max(item_results$coefficient), 2), " observed earlier\n")}
=== Substantive Insights ===
• Average race effect (~1.0) represents a large difference (~1SD) on the latent scale
• Race effects range from ~ 0.64 to ~ 1.3 across items (±2SD)
• This matches the item-specific range of 0.72 to 1.2 observed earlier
{cat("\n=== Model Success ===\n")cat("✓ Converged without issues (unlike clmm)\n")cat("✓ Maintained multilevel structure (unlike item-specific approach)\n") cat("✓ Estimated realistic between-item variation (SD = ", round(outcome_var, 3), ")\n")cat("✓ Captured meaningful race effect heterogeneity (SD = ", round(race_var, 3), ")\n")cat("✓ Provided full uncertainty quantification\n")cat("✓ Results align with item-specific findings while adding structure\n")}
=== Model Success ===
✓ Converged without issues (unlike clmm)
✓ Maintained multilevel structure (unlike item-specific approach)
✓ Estimated realistic between-item variation (SD = 0.086 )
✓ Captured meaningful race effect heterogeneity (SD = 0.17 )
✓ Provided full uncertainty quantification
✓ Results align with item-specific findings while adding structure
10 Testing Proportional Odds Assumption
The proportional odds assumption is crucial for cumulative ordinal models. It assumes that the effect of race is the same across all ordinal thresholds (i.e., the race effect is the same for 0|1, 1|2, 2|3, and 3|4 cutpoints).
Testing Strategy: We’ll compare two Bayesian models with different assumptions:
Model 3b (Restrictive): Cumulative probit model that assumes proportional odds - estimates one race coefficient that applies to all ordinal thresholds (0|1, 1|2, 2|3, 3|4)
Category-Specific Model (Flexible): Categorical logit model - estimates separate race coefficients for each response category (categories 1, 2, 3, 4 vs. reference category 0)
Note on Link Functions: We use logit for the categorical model because the categorical family in brms only supports logit links, while our cumulative model uses probit. This difference in link functions doesn’t affect our test of the proportional odds assumption - we’re comparing model structures (one vs. multiple race coefficients), not link functions, and our model comparison method (LOO cross-validation) evaluates which model better predicts out-of-sample data regardless of the specific link functions used.
What “Category-Specific” Means: Instead of modeling ordinal thresholds, the categorical model directly estimates how race affects the probability of each response category relative to a reference category. This allows race to have different effects on being “afraid” vs “very afraid” compared to “very unafraid.”
Model Comparison Method: We’ll use Leave-One-Out Cross-Validation (LOO-CV) to assess which model better predicts out-of-sample data. Better predictive performance suggests the model captures the true data structure more accurately.
# Testing Proportional Odds Assumption using Bayesian Models
# Compare restrictive (proportional odds) vs flexible (category-specific) models
message("Testing proportional odds assumption using Bayesian model comparison...")

# Model 3b (restrictive) assumes proportional odds - we already have this
# Now fit a category-specific model that relaxes this assumption
model_path_nonprop <- here("Models", "nonprop_odds_fear_fit.rds")

if (file.exists(model_path_nonprop)) {
  category_specific_model <- readRDS(model_path_nonprop)
} else {
  message("Fitting Bayesian category-specific model...")

  # Category-specific model: separate effects for each response category
  bf_category_specific <- bf(fear_categorical ~ race + (1 | outcome))

  category_specific_model <- brm(
    bf_category_specific,
    data = dat_long,
    family = categorical("logit"), # No proportional odds assumption
    cores = nCoresphys,
    chains = 4,
    iter = 2000,
    warmup = 1000,
    seed = 1138,
    control = list(adapt_delta = 0.95),
    file = model_path_nonprop,
    file_refit = "on_change"
  )
  saveRDS(category_specific_model, model_path_nonprop)
}

message("Both models fitted successfully. Comparing predictive performance...")

# Cache the LOO computations
loo_path_prop <- here("Models", "loo_proportional.rds")
loo_path_cat <- here("Models", "loo_category_specific.rds")

if (file.exists(loo_path_prop)) {
  loo_proportional <- readRDS(loo_path_prop)
} else {
  loo_proportional <- loo(bcpm_model, cores = nCoresphys)
  saveRDS(loo_proportional, loo_path_prop)
}

if (file.exists(loo_path_cat)) {
  loo_category_specific <- readRDS(loo_path_cat)
} else {
  loo_category_specific <- loo(category_specific_model, cores = nCoresphys)
  saveRDS(loo_category_specific, loo_path_cat)
}

# Statistical comparison
loo_comparison <- loo_compare(loo_proportional, loo_category_specific)
print(loo_comparison)
# Extract key statistics
elpd_diff <- loo_comparison[2, "elpd_diff"] # bcpm_model row, not first row
se_diff <- loo_comparison[2, "se_diff"]
{cat("\n=== Proportional Odds Test Results ===\n")cat("Expected Log Pointwise Density (ELPD) difference: ", round(elpd_diff, 1), " ± ", round(se_diff, 1), "\n")cat("Interpretation: Negative = category-specific model preferred (PO assumption violated)\n")}
=== Proportional Odds Test Results ===
Expected Log Pointwise Density (ELPD) difference: -84 ± 15
Interpretation: Negative = category-specific model preferred (PO assumption violated)
# Statistical interpretation using standard rule
if (abs(elpd_diff) > 2 * se_diff) {
  if (elpd_diff < 0) {
    message("PROPORTIONAL ODDS ASSUMPTION VIOLATED")
    message("The category-specific model has substantially better predictive performance")
    message("Race effects vary significantly across response categories")
    message("ELPD difference of ", round(elpd_diff, 1), " is ",
            round(abs(elpd_diff) / se_diff, 1), " standard errors from zero")
  } else {
    message("PROPORTIONAL ODDS ASSUMPTION SUPPORTED")
    message("The proportional odds model has meaningfully better predictive performance")
  }
} else {
  message("INCONCLUSIVE EVIDENCE")
  message("Both models have similar predictive performance")
}
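The LOO comparison tells us the flexible model predicts better, but it does not by itself show how the race effect differs by response category. As a brief, hedged illustration (assuming the category_specific_model object fitted above), one could inspect the category-specific coefficients directly; with the categorical family, brms labels each non-reference category's linear predictor mu1, mu2, and so on, so coefficient names like "mu1_raceBlack" through "mu4_raceBlack" are expected here, though the exact labels depend on the factor coding.

# Sketch: inspect category-specific race effects from the categorical model.
# Coefficient names such as "mu1_raceBlack" ... "mu4_raceBlack" are assumed
# (one per non-reference category of fear_categorical).
cs_fixef <- fixef(category_specific_model)
cs_fixef[grepl("raceBlack", rownames(cs_fixef)), ]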
11 Interpretation: Model Selection in Practice
The proportional odds test reveals an important finding: the assumption is violated in these data (ELPD difference = -84.5 ± 14.7). Race effects vary substantially across response categories, with the category-specific model providing much better predictive performance. This suggests that racial disparities in fear of police are not uniform across the ordinal scale - for instance, the effect may be larger for extreme fear responses than moderate ones.
Should we therefore prefer the category-specific model? While this model fits the data better, the choice depends on analytical goals:
Arguments for category-specific modeling:
- Better captures the true data structure (as evidenced by superior predictive performance)
- Reveals heterogeneous effects across response levels
- More flexible and data-driven approach
What does this mean if we use cumulative ordinal modeling?
- It still maintains the inherent ordering of responses (very afraid > afraid > neither > unafraid > very unafraid)
- The comparison in our main paper showed cumulative ordinal models vastly outperformed linear models on fit statistics, even if they don't perfectly satisfy proportional odds
- Crucially: even an "imperfect" ordinal model is vastly superior to linear models that ignore the categorical nature of the data entirely
Practical recommendation: The cumulative ordinal approach (Model 3b) represents a substantial improvement over linear modeling while maintaining interpretability. The proportional odds violation suggests caution in interpretation - the single race coefficient represents an average effect across response categories, but researchers should be aware that the actual effects vary. In applications, researchers should explore item-level and category-specific effect heterogeneity patterns revealed by their multilevel models and diagnostic tests.
Bottom line: The choice should be between different ordinal approaches (cumulative vs. category-specific), not between ordinal and linear. Linear models should be abandoned for ordinal outcomes regardless of interpretability concerns.
Speaking of interpretability concerns… these can be directly addressed through predicted probabilities - a major strength of ordinal approaches demonstrated in the next section. Rather than relying on difficult-to-interpret (or, too often, fundamentally meaningless) coefficients on latent scales, ordinal models allow us to generate intuitive, policy-relevant predictions about the probability of each response category for different groups. This provides far richer substantive insights than linear model coefficients while maintaining statistical rigor.
12 Generating Predicted Probabilities
12.1 Predicted Probabilities from Item-Specific Models (Alternative Approach)
Since the clmm model had convergence issues, we'll first demonstrate predicted probabilities using the robust item-specific approach. We'll use the "yellyou" and "killyou" items as exemplars, as these items showed the smallest and largest race effect estimates, respectively.
# Generate predicted probabilities from item-specific models
# Using smallest and largest race effect items to show heterogeneity

# Identify smallest and largest race effect items
min_effect_item <- item_results$item[which.min(item_results$coefficient)]
max_effect_item <- item_results$item[which.max(item_results$coefficient)]

{
  cat("=== Demonstrating Race Effect Heterogeneity ===\n")
  cat("Smallest race effect:", min_effect_item, "(coefficient =",
      round(min(item_results$coefficient), 3), ")\n")
  cat("Largest race effect:", max_effect_item, "(coefficient =",
      round(max(item_results$coefficient), 3), ")\n")
  cat("This", round(max(item_results$coefficient) - min(item_results$coefficient), 2),
      "unit difference shows substantial item-to-item variation\n\n")
}
=== Demonstrating Race Effect Heterogeneity ===
Smallest race effect: yellyou (coefficient = 0.72 )
Largest race effect: killyou (coefficient = 1.2 )
This 0.44 unit difference shows substantial item-to-item variation
# Function to generate predictions for any item
generate_item_predictions <- function(item_name) {
  model <- item_models[[item_name]]
  newdata <- data.frame(race = factor(c("White", "Black")))
  preds <- predict(model, newdata = newdata, type = "prob")

  # Convert to tidy format
  probs <- preds %>%
    as.data.frame() %>%
    mutate(race = c("White", "Black")) %>%
    pivot_longer(cols = -race, names_to = "category", values_to = "probability") %>%
    mutate(
      category = as.numeric(gsub("[^0-9]", "", category)),
      item = item_name
    )

  # Calculate contrasts
  contrasts <- probs %>%
    dplyr::select(race, category, probability, item) %>%
    pivot_wider(names_from = race, values_from = probability, values_fn = list) %>%
    mutate(
      White = as.numeric(White),
      Black = as.numeric(Black),
      contrast = Black - White,
      item = item_name
    )

  return(list(probs = probs, contrasts = contrasts))
}

# Generate predictions for both items
min_results <- generate_item_predictions(min_effect_item)
max_results <- generate_item_predictions(max_effect_item)

# Combine results
combined_probs <- bind_rows(min_results$probs, max_results$probs) %>%
  mutate(
    item_label = case_when(
      item == min_effect_item ~ paste0(item, " (smallest effect: ",
                                       round(min(item_results$coefficient), 2), ")"),
      item == max_effect_item ~ paste0(item, " (largest effect: ",
                                       round(max(item_results$coefficient), 2), ")"),
      TRUE ~ item
    )
  )

combined_contrasts <- bind_rows(min_results$contrasts, max_results$contrasts) %>%
  mutate(
    item_label = case_when(
      item == min_effect_item ~ paste0(item, " (smallest effect)"),
      item == max_effect_item ~ paste0(item, " (largest effect)"),
      TRUE ~ item
    )
  )

# Plot predicted probabilities with point-intervals
p1_comparison <- ggplot(combined_probs, aes(x = category, y = probability, color = race)) +
  geom_point(size = 2, position = position_dodge(width = 0.3)) +
  geom_line(aes(group = race), position = position_dodge(width = 0.3)) +
  facet_wrap(~item_label, ncol = 1) +
  labs(title = "Race-Specific Predicted Probabilities",
       x = "Response Category (0=Very Unafraid, 4=Very Afraid)",
       y = "Probability") +
  scale_x_continuous(breaks = 0:4) +
  scale_color_viridis_d(end = 0.8) + # Use viridis discrete scale
  theme_minimal() +
  theme(text = element_text(family = "serif"),
        legend.position = "bottom",
        title = element_text(size = 10))

# Plot contrasts with point-intervals
p2_comparison <- ggplot(combined_contrasts, aes(x = category, y = contrast)) +
  geom_point(size = 2, alpha = 0.8, color = "#440154FF") + # Viridis dark purple
  geom_line(color = "#440154FF", alpha = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  facet_wrap(~item_label, ncol = 1) +
  labs(title = "Black-White Probability Differences",
       x = "Response Category",
       y = "Probability Difference (Black - White)") +
  scale_x_continuous(breaks = 0:4) +
  theme_minimal() +
  theme(text = element_text(family = "serif"),
        title = element_text(size = 10))

print(p1_comparison + p2_comparison +
        plot_annotation(
          title = "Item-Specific Predicted Probabilities: Demonstrating Effect Heterogeneity",
          subtitle = "Predicted Probabilities (Left) & Contrasts (Right), Smallest & Largest Race Effects"
        ))
# Interpretation
# Compare key statistics
{
  cat("\n=== Comparative Insights ===\n")
  for (item_name in c(min_effect_item, max_effect_item)) {
    results <- if (item_name == min_effect_item) min_results else max_results
    probs <- results$probs

    # Afraid or very afraid (categories 3-4)
    afraid_white <- sum(probs[probs$race == "White" & probs$category >= 3, "probability"])
    afraid_black <- sum(probs[probs$race == "Black" & probs$category >= 3, "probability"])

    cat("\n", toupper(item_name), "item:\n")
    cat("P(Afraid or Very Afraid) - White:", round(afraid_white, 3),
        "Black:", round(afraid_black, 3), "\n")
    cat("Difference:", round((afraid_black - afraid_white) * 100, 1), "percentage points\n")
  }
  cat("\nThis demonstrates how one might use item-level analysis to assess important")
  cat("\nheterogeneity that aggregate or multilevel models might obscure.\n")
}
=== Comparative Insights ===
YELLYOU item:
P(Afraid or Very Afraid) - White: 0.22 Black: 0.48
Difference: 26 percentage points
KILLYOU item:
P(Afraid or Very Afraid) - White: 0.17 Black: 0.58
Difference: 41 percentage points
This demonstrates how one might use item-level analysis to assess important
heterogeneity that aggregate or multilevel models might obscure.
12.2 Predicted Probabilities from Model 3b (Bayesian CP MLM)
The “alternative” frequentist item-specific approach above is informative but has important limitations. Most notably, generating confidence intervals for predicted probabilities from frequentist ordinal models is surprisingly complex - it requires delta method calculations, bootstrap resampling, or other computationally intensive approaches to properly propagate uncertainty from model parameters to probability predictions.
This is a major reason why we emphasize Bayesian approaches from the beginning. With Bayesian models, uncertainty quantification is straightforward and automatic:
Full posterior distributions: Every prediction incorporates parameter uncertainty naturally
No delta method needed: Uncertainty propagates automatically through the MCMC samples
Intuitive intervals: Credible intervals directly represent our uncertainty about the predictions
Complex models, simple inference: Even with multilevel structures and partial pooling, predictions remain straightforward
The Bayesian multilevel model below demonstrates this advantage while also capturing both the overall race effect and item-to-item variation through partial pooling - something the frequentist item-specific approach demonstrated above cannot achieve.
# Prediction grid
nd_bcpm <- tibble(race = factor(c("White", "Black")))

# === PART 1: Population-Level (Pooled) Predictions ===
# Get predictions, marginalizing over random effects
# This gives us the "average" effect across all items with partial pooling
{
  cat("=== Population-Level Predictions (Marginalizing Over Items) ===\n")
  cat("These represent the pooled/aggregated effect across all items\n")
  cat("Equivalent to 'latent fear' but with proper uncertainty quantification\n\n")
}
=== Population-Level Predictions (Marginalizing Over Items) ===
These represent the pooled/aggregated effect across all items
Equivalent to 'latent fear' but with proper uncertainty quantification
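The plotting code below expects two summary data frames, mlm_probs (population-level) and item_probs_mlm (item-specific), which are not created in the chunks shown. A minimal sketch of how they might be built with tidybayes follows; the column names (p, ll, ul, item_label) are inferred from the plotting code, and the use of add_epred_draws() with re_formula = NA for the population-level predictions is our assumption. One might also filter the item-specific grid to a subset of items (e.g., the two exemplar items) to keep the faceted plot readable.

# Hypothetical reconstruction (original chunk not shown): build the prediction
# summaries used by the plots below. Assumes a recent tidybayes with add_epred_draws().

# Population-level: marginalize over items by excluding the group-level effects
mlm_probs <- nd_bcpm %>%
  tidybayes::add_epred_draws(bcpm_model, re_formula = NA) %>%
  group_by(race, .category) %>%
  summarise(p  = median(.epred),
            ll = quantile(.epred, 0.025),
            ul = quantile(.epred, 0.975),
            .groups = "drop")

# Item-specific: include the item random effects (partial pooling)
nd_items <- tidyr::expand_grid(race = factor(c("White", "Black")),
                               outcome = unique(dat_long$outcome))

item_probs_mlm <- nd_items %>%
  tidybayes::add_epred_draws(bcpm_model) %>%
  group_by(race, outcome, .category) %>%
  summarise(p  = median(.epred),
            ll = quantile(.epred, 0.025),
            ul = quantile(.epred, 0.975),
            .groups = "drop") %>%
  mutate(item_label = as.character(outcome))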
# === PLOTTING ===
# Plot 1: Population-level predictions
p1_bcpm <- ggplot(mlm_probs,
                  aes(x = as.numeric(as.character(.category)), y = p, color = race)) +
  geom_pointrange(aes(ymin = ll, ymax = ul),
                  position = position_dodge(width = 0.2), size = .75, linewidth = 1.5) +
  geom_line(aes(group = race), position = position_dodge(width = 0.2)) +
  labs(title = "Population-Level (Pooled across Items)",
       x = "Response Category", y = "Probability") +
  scale_x_continuous(breaks = 0:4) +
  scale_color_viridis_d(end = 0.8) +
  theme_minimal() +
  theme(legend.position = "bottom",
        text = element_text(family = "serif"),
        title = element_text(size = 10))

# Plot 2: Item-specific predictions
p2_bcpm <- ggplot(item_probs_mlm,
                  aes(x = as.numeric(as.character(.category)), y = p, color = race)) +
  geom_pointrange(aes(ymin = ll, ymax = ul),
                  position = position_dodge(width = 0.2), size = .5, linewidth = 1) +
  geom_line(aes(group = race), position = position_dodge(width = 0.2)) +
  facet_wrap(~item_label, ncol = 1) +
  labs(title = "Item-Specific (w/Partial Pooling)",
       x = "Response Category", y = "Probability") +
  scale_x_continuous(breaks = 0:4) +
  scale_color_viridis_d(end = 0.8) +
  theme_minimal() +
  theme(legend.position = "bottom",
        text = element_text(family = "serif"),
        title = element_text(size = 10))

print(p1_bcpm + p2_bcpm +
        plot_annotation(
          title = "Bayesian MLM: Population-Level vs Item-Specific Predicted Probabilities",
          subtitle = "Average (Pooled across Items; Left) & Item-Specific (w/Partial Pooling; Right) Predictions by Race"
        ))
{cat("\n=== Key Insights ===\n")cat("• Population-level estimates show the 'average' effect with partial pooling\n")cat("• Item-specific estimates are 'shrunk' toward the population mean\n") cat("• All estimates include automatic uncertainty quantification\n")cat("• Compare these MLM item-specific estimates to the standalone item-specific models above\n")}
=== Key Insights ===
• Population-level estimates show the 'average' effect with partial pooling
• Item-specific estimates are 'shrunk' toward the population mean
• All estimates include automatic uncertainty quantification
• Compare these MLM item-specific estimates to the standalone item-specific models above
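To make that last comparison concrete, here is a short sketch (assuming the bcpm_model and item_results objects created above) of how one might place the partially pooled item-specific race effects next to the standalone clm estimates; in brms, coef() on a fitted model returns group-level coefficient arrays indexed by level, statistic, and coefficient name.

# Sketch: compare partially pooled item-specific race effects (Bayesian MLM)
# with the standalone item-specific clm estimates from the earlier section.
mlm_item_effects <- coef(bcpm_model)$outcome[, "Estimate", "raceBlack"]

pooling_comparison <- tibble(
  item = names(mlm_item_effects),
  mlm_partial_pooling = as.numeric(mlm_item_effects)
) %>%
  left_join(
    item_results %>% dplyr::select(item, standalone_clm = coefficient),
    by = "item"
  ) %>%
  mutate(difference = standalone_clm - mlm_partial_pooling) # shrinkage toward the pooled effect

print(pooling_comparison)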
13 Conclusion
This tutorial has demonstrated multiple approaches to analyzing ordinal fear of police data, progressing from simple linear methods to sophisticated Bayesian multilevel ordinal models. The key findings and recommendations are:
13.1 Key Methodological Insights
Linear models are inadequate for ordinal data. While Models 1 and 2 replicated conventional linear approaches, the subsequent ordinal analyses revealed their fundamental limitations. The comparison with ordinal models (supported by fit statistics in the main paper) showed that treating ordinal responses as continuous may overlook important insights and can produce misleading interpretations, while the predicted probabilities section demonstrated how ordinal approaches can provide genuinely meaningful, policy-relevant insights.
Frequentist multilevel ordinal models often fail. Model 3a illustrated the notorious convergence issues with frequentist multilevel (clmm) models, which can produce unreliable estimates or fail entirely with complex random effects structures.
Item-specific models provide robust alternatives. When frequentist multilevel models fail, an item-specific approach may offer a reliable fallback that maintains ordinal modeling principles while revealing important effect heterogeneity across items.
Bayesian multilevel models are superior. Model 3b demonstrated that Bayesian approaches more consistently converge, provide automatic uncertainty quantification, and enable sophisticated model checking through posterior predictive checks.
Proportional odds assumptions should be tested. Our diagnostic revealed that race effects vary significantly across ordinal thresholds, highlighting the importance of testing key model assumptions.
Predicted probabilities enhance interpretability. Rather than relying on coefficients on latent scales, ordinal models enable direct prediction of meaningful quantities like “probability of being afraid or very afraid.”
13.2 Practical Recommendations
Abandon linear models for ordinal outcomes regardless of interpretability concerns
Start with Bayesian multilevel ordinal models as the primary approach
Use item-specific models as robust alternatives when multilevel convergence fails
Always test proportional odds assumptions and consider category-specific models when violated
Focus on predicted probabilities rather than model coefficients for substantive interpretation
Leverage automatic uncertainty quantification available in Bayesian frameworks
13.3 Substantive Implications
Beyond methodological considerations, this analysis revealed important substantive findings about racial disparities in fear of police. The race effects vary meaningfully across both items (0.72 to 1.16 on the latent scale) and response categories, with larger effects for more extreme fear responses. Importantly, the item-specific analyses revealed monotonic patterns in these data; that is, race effects consistently increased with item severity. However, such monotonicity cannot be taken for granted across all applications. The scaling and aggregation approaches often used in linear modeling may mask even more consequential heterogeneity patterns that would only be revealed through item-specific, ordinal, and multilevel workflows.
This highlights a critical limitation of composite scale approaches: They assume that relationships are uniform across items and response levels. In reality, predictors may have complex, non-monotonic effects across different components of a scale. Only disaggregated (e.g., item-specific, ordinal, multilevel) approaches can detect whether effects vary by item type, severity level, or response category - patterns that could have profound theoretical and policy implications.
The tutorial demonstrates that methodological rigor and substantive insight are complementary rather than competing goals. Appropriate statistical methods reveal richer, more accurate pictures of social phenomena while maintaining the interpretability that applied researchers require.
13.4 Future Directions
This framework extends naturally to more complex analyses involving multiple predictors, hierarchical data structures, and longitudinal designs. The Bayesian multilevel ordinal approach provides a foundation for sophisticated modeling while avoiding the computational and interpretive pitfalls of linear alternatives.
Most importantly, this tutorial illustrates that the choice between ordinal and linear modeling is not about statistical preferences — it’s about whether we want accurate, meaningful insights or convenient but incomplete and possibly misleading summaries of complex social realities.