Primary Supplement for ‘When the Mean is Misleading’
Authors
Jonathan R. Brauer
Jacob C. Day
Published
July 30, 2025
1 Introduction
This file is the online supplement for the paper “When the Mean is Misleading: A Guide to Ordered Regression for Meaningfully Modeling Ordinal Outcomes” (Jonathan R. Brauer & Jacob C. Day). Our primary aim in this paper is to show how and why to use ordered regression techniques like cumulative probit regression when analyzing ordinal outcomes (and perhaps count and continuous outcomes too!). We illustrate via simulations and then by applying these methods to data from Pickett, Graham, and Cullen’s 2022 Criminology study, “The American racial divide in fear of the police,” which they shared as part of their supplemental materials on OSF.
This primary supplement contains all R code and output reported in the manuscript as well as other supplementary materials. In addition to this primary supplement, we have also provided a standalone tutorial in a separate Quarto file (Appendix A) that provides a step-by-step guide to our progression from frequentist and Bayesian versions of linear, to ordinal, to multilevel ordinal models.
1.1 Why this paper?
We picked Pickett, Graham, & Cullen’s paper for several reasons. First, we wanted to illustrate these methods with data on an important topic. That way, if the tutorial’s analysis improves our understanding at all, even if only slightly, then our methods paper might also help us better understand a topic that a lot of people care about.
Second, we wanted to analyze publicly available data. We searched a LOT of criminology papers that we thought would be good candidates and usually came up empty at this juncture. So, THANK YOU to Pickett, Graham, and Cullen for practicing open science and making this analysis possible!
Third, we wanted to pick a paper that we thought was already well done - that is, a paper that did a good job of carefully and thoroughly applying traditional tools to extract rich, meaningful signals from data. Doing so would make our job easier, as we would already have a good sense of the signals in the data and could avoid taking too many methodological detours. This approach might also help readers better assess what, if anything, might be gained by spending valuable time learning new methods compared to the traditional practices published in our top journals.
Finally, our main goal was to find data from a study in which researchers analyze Likert-type ordinal survey items as outcomes and combine them into composite scales, so we could illustrate how ordinal methods might help us better understand signals in our data compared to traditional approaches in this common scenario. The methods can be - and we think often should be - applied to other types of data, including rare count outcomes (e.g., self-reported crime) and even continuous outcomes with thousands of levels! (See Frank Harrell, e.g. here and here).
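For readers curious about the continuous-outcome case, below is a minimal sketch in the spirit of Harrell's cumulative probability models. It is not code from the manuscript: it assumes the rms package (not loaded elsewhere in this supplement) is installed, and the data and variable names are purely hypothetical.

# Minimal sketch of ordinal ("cumulative probability") regression on a continuous outcome,
# in the spirit of Harrell's rms::orm(). Assumes the rms package is installed;
# the data and variable names here are hypothetical, not from the paper.
library(rms)

set.seed(1138)
d <- data.frame(x = rnorm(500))
d$y <- 2 + 0.5 * d$x + rnorm(500)   # continuous outcome with ~500 distinct values

fit_orm <- orm(y ~ x, data = d)     # one intercept per distinct outcome value
fit_orm                             # slope is interpreted on the (proportional odds) logit scale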
2 Load libraries & settings
I will start by loading various libraries that I often use for such projects.
Show code
# Packages I need
library(here)
library(haven)
library(tidyverse)
library(gt)
library(janitor)
library(vtable)
library(modelsummary)
library(easystats)
library(marginaleffects)
library(ggplot2)
library(GGally)
library(ggeffects)
library(ggdist)
library(patchwork)
library(parallelly)
library(bayesplot)
library(ggdist)
library(correlation)
library(sjmisc)
library(forcats)
library(ggtext)
library(ggmosaic)
library(gtExtras)
library(tidybayes)
library(scales)
library(purrr)
library(viridis)
library(loo)
library(broom)

# need MASS for frequentist polr but it masks common dplyr functions
# use explicit dplyr::select() and dplyr::filter() calls
# or don't load MASS & call MASS::polr() instead
library(MASS)
library(extrafont)
library(stringr)
library(ragg)

# to use brms, do following:
# install rtools 4.4
# install rstan
# https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
# test ability to install packages from sources using Rtools:
# install.packages("Rcpp", type="source")
# it works
# remove any existing rstan installs
# remove.packages("rstan")
# if (file.exists(".RData")) file.remove(".RData")
# run the next line if you already have rstan installed
# remove.packages(c("StanHeaders", "rstan"))
# install rstan
# install.packages("rstan", repos = c('https://stan-dev.r-universe.dev', getOption("repos")))
# test rstan install
# example(stan_model, package = "rstan", run.dontrun = TRUE)
# it works
library(rstan)

# install cmdstanr (alternative brms backend for interfacing w/stan)
# install.packages("cmdstanr", repos = c('https://stan-dev.r-universe.dev', getOption("repos")))
# newest developmental version
# install.packages("remotes")
# remotes::install_github("stan-dev/cmdstanr", force=TRUE)
# Sys.getenv("RTOOLS44_HOME")
# Sys.getenv("PATH")
# readLines("~/.Renviron")
# had to delete old .Renviron file pointing to Rtools40 from Documents folder
# if prefer cmdstanr backend to brms, then:
# cmdstanr::install_cmdstan()
# cmdstanr::check_cmdstan_toolchain(fix = TRUE)
# library(cmdstanr)
library(rstanarm)
library(brms)

print("T/F: Root 'here()' folder contains subfolder 'Models'")
# check for Models folder to save figures, create if one does not exist
ifelse(dir.exists(here("Models")), TRUE, dir.create(here("Models")))
[1] TRUE
I will also set some default settings.
Show code
# Fonts
# First time setup (run this once in console, not in your document):
# Import and register fonts (only needed once per system)
# Uncomment if you haven't imported fonts before
# extrafont::font_import()

# Load the fonts into the R session
extrafont::loadfonts(device = "win", quiet = TRUE) # For Windows
# extrafont::loadfonts(device = "pdf", quiet = TRUE) # For PDF output
# extrafont::loadfonts(device = "postscript", quiet = TRUE) # For PostScript output

# If you only imported Times New Roman, load this instead:
# Load the specific font you need for this session
# if(!("Times New Roman" %in% windowsFonts())) {
#   windowsFonts(`Times New Roman` = windowsFont("Times New Roman"))
# }

# To improve font rendering in viewer:
# Configure knitr to use the ragg device which handles fonts better
knitr::opts_chunk$set(dev = "ragg_png", dpi = 300)

# Set the default theme for all ggplot2 plots
theme_set(theme_minimal(base_family = "Times New Roman"))

# set default knit & scientific notation options
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, fig.align = "center")
options(scipen = 999, digits = 2) # set global option to two decimals

# identify & set # of available cores
# replaced parallel::detectCores()
# https://www.jottr.org/2022/12/05/avoid-detectcores/
nCoresphys <- parallelly::availableCores(logical = FALSE)
options(mc.cores = parallelly::availableCores(logical = FALSE))
# options(mc.cores = 4, logical=FALSE)
# alt: can specify exact number of cores desired here &/or in models/ppchecks

color_scheme_set("viridisA")
# https://mc-stan.org/bayesplot/reference/bayesplot-colors.html
3 Simulation: Figure 1
Before getting into real data, let’s illustrate the underlying issue using a simulation.
I will simulate five different ordinal distributions with distinct shapes. Think of them as five groups, with N=250 observations per distributional group (N=1,250 total).
Show code
# Set seed for reproducibilityset.seed(1138)# Number of observations per groupn <-250# Function to simulate ordinal data based on probabilitiessimulate_ordinal <-function(probs, n) {sample(1:length(probs), size = n, replace =TRUE, prob = probs)}# Distribution 0: Symmetric Normal (peak at 3)probs_symmetric <-c(0.10, 0.20, 0.40, 0.20, 0.10) # Symmetric around 3data_symmetric <-simulate_ordinal(probs_symmetric, n)# Original distributionsprobs_uniform <-rep(0.2, 5)data_uniform <-simulate_ordinal(probs_uniform, n)probs_bimodal <-c(0.3, 0.1333, 0.1333, 0.1333, 0.3)data_bimodal <-simulate_ordinal(probs_bimodal, n)probs_skew_left <-c(0.10, 0.20, 0.15, 0.25, 0.30)data_skew_left <-simulate_ordinal(probs_skew_left, n)probs_skew_right <-c(0.30, 0.25, 0.15, 0.20, 0.10)data_skew_right <-simulate_ordinal(probs_skew_right, n)# Combine data into a data framedf <-data.frame(group =factor(rep(c("1. Symmetric Normal", "2. Uniform", "3. Bimodal", "4. Skewed Left", "5. Skewed Right"), each = n),levels =c("1. Symmetric Normal", "2. Uniform", "3. Bimodal", "4. Skewed Left", "5. Skewed Right")),response =c(data_symmetric, data_uniform, data_bimodal, data_skew_left, data_skew_right))# Function to get normal curve points for each group separatelyget_normal_curve <-function(data, group_name, x_range =seq(0.5, 5.5, length.out =100)) { mean_val <-mean(data) sd_val <-sd(data) y_vals <-dnorm(x_range, mean = mean_val, sd = sd_val) scaling_factor <- (max(table(data))/length(data)) /max(y_vals)data.frame(x = x_range, y = y_vals * scaling_factor *100,group = group_name )}# Function to get ordered probit predicted probabilities with CIsget_probit_probs <-function(data, group_name, n_boot =1000) { model <-polr(ordered(response) ~1, data =data.frame(response = data), method ="probit") probs <-predict(model, newdata =data.frame(response =1), type ="probs") boot_probs <-matrix(nrow = n_boot, ncol =5)for(i in1:n_boot) { boot_data <-sample(data, replace =TRUE) boot_model <-try(polr(ordered(boot_data) ~1, method ="probit"), silent =TRUE)if(!inherits(boot_model, "try-error")) { boot_probs[i,] <-predict(boot_model, newdata =data.frame(response =1), type ="probs") } } ci_lower <-apply(boot_probs, 2, quantile, 0.025, na.rm =TRUE) ci_upper <-apply(boot_probs, 2, quantile, 0.975, na.rm =TRUE)data.frame(x =1:5,y =as.numeric(probs) *100,lower = ci_lower *100,upper = ci_upper *100,group = group_name )}# Get curves for each groupcurves_df <-bind_rows(get_normal_curve(data_symmetric, "1. Symmetric Normal"),get_normal_curve(data_uniform, "2. Uniform"),get_normal_curve(data_bimodal, "3. Bimodal"),get_normal_curve(data_skew_left, "4. Skewed Left"),get_normal_curve(data_skew_right, "5. Skewed Right"))# Get probit probabilities with CIs for each groupprobit_df <-bind_rows(get_probit_probs(data_symmetric, "1. Symmetric Normal"),get_probit_probs(data_uniform, "2. Uniform"),get_probit_probs(data_bimodal, "3. Bimodal"),get_probit_probs(data_skew_left, "4. Skewed Left"),get_probit_probs(data_skew_right, "5. Skewed Right"))# Update the stats_df creation to prepend A. through E.stats_df <- df %>%group_by(group) %>%summarise(mean =round(mean(response), 2),sd =round(sd(response), 2) ) %>%mutate(# Create new labels with A. or B. etc. prependednew_label =case_when( group =="1. Symmetric Normal"~paste0("A. Normal-like\n(mean=", mean, ", sd=", sd, ")"),TRUE~paste0( letters[match(group, unique(group)) +0] %>%toupper(), ". 
", str_remove(group, "^\\d+\\.\\s+"), "\n(mean=", mean, ", sd=", sd, ")" ) ) )# Create two separate plots# Plot 1: Normal-approxp1 <-ggplot(df %>% dplyr::filter(group =="1. Symmetric Normal"), aes(x = response)) +geom_bar(aes(y =after_stat(prop)), fill ="#FEC488FF", color ="#FEC488FF") +geom_line(data = curves_df %>% dplyr::filter(group =="1. Symmetric Normal") %>%mutate(y = y/100), aes(x = x, y = y), color ="#341069FF", linewidth =1.5) +geom_pointrange(data = probit_df %>% dplyr::filter(group =="1. Symmetric Normal") %>%mutate(y = y/100, lower = lower/100, upper = upper/100), aes(x = x, y = y, ymin = lower, ymax = upper),color ="#D3436EFF", size =1.5, linewidth =1, fatten =2) +facet_wrap(~group, ncol =1, labeller =as_labeller(setNames(stats_df$new_label, stats_df$group))) +scale_x_continuous(breaks =1:5) +scale_y_continuous(limits =c(0, 0.45), expand =expansion(mult =c(0.02, 0.08))) +labs(x ="Response Category", y ="Proportion") +theme_minimal() +theme(text =element_text(family ="serif"),axis.title.x =element_text(size=12), axis.title.y =element_text(size=12), axis.text.x =element_text(size=12), axis.text.y =element_text(size=12), strip.text =element_text(hjust =0, size=14), panel.grid =element_blank() )# Plot 2: Other distributions in 2x2 gridp2 <-ggplot(df %>% dplyr::filter(group !="1. Symmetric Normal"), aes(x = response)) +geom_bar(aes(y =after_stat(prop)), fill ="#FEC488FF", color ="#FEC488FF") +geom_line(data = curves_df %>% dplyr::filter(group !="1. Symmetric Normal") %>%mutate(y = y/100), aes(x = x, y = y), color ="#341069FF", linewidth =1.5) +geom_pointrange(data = probit_df %>% dplyr::filter(group !="1. Symmetric Normal") %>%mutate(y = y/100, lower = lower/100, upper = upper/100), aes(x = x, y = y, ymin = lower, ymax = upper),color ="#D3436EFF", size =1.5, linewidth =1, fatten =2) +facet_wrap(~group, ncol =2, labeller =as_labeller(setNames(stats_df$new_label, stats_df$group))) +scale_x_continuous(breaks =1:5) +scale_y_continuous(limits =c(0, 0.45), expand =expansion(mult =c(0.02, 0.08))) +labs(x ="Response Category", y ="") +theme_minimal() +theme(text =element_text(family ="serif"),axis.title.x =element_text(size=12), axis.title.y =element_text(size=12), axis.text.x =element_text(size=12), axis.text.y =element_text(size=12), strip.text =element_text(hjust =0, size=14), panel.grid =element_blank())# Combine plots using patchwork# combined_plot <- p1 + p2 + # plot_layout(widths = c(1, 2)) +# plot_annotation(# title = "FIGURE1. Comparing Metric vs. Ordinal View of Data",# subtitle = "Data: Light bars show observed (simulated) proportions\nMetric view: Dark line shows linear model-implied normal curve\nOrdinal view: Point-intervals show ordered probit model-implied predicted probabilities & 95% CI",# theme = theme(# plot.title = element_text(hjust = 0.5),# plot.subtitle = element_text(hjust = 0)# )# )# Create spacers for above/below normal-liketop_spacer <-plot_spacer()bottom_spacer <-plot_spacer()# Combine plots w/custom layoutcombined_plot <- top_spacer + p2 + p1 + bottom_spacer +plot_layout(design =" AB CB DB ",heights =c(1, 2, 1), widths =c(1, 2) ) +plot_annotation(title ="FIGURE1. Comparing Metric vs. Ordinal View of Data",subtitle ="Data: Light bars show observed (simulated) proportions\nMetric view: Dark line shows linear model-implied normal curve\nOrdinal view: Point-intervals show ordered probit model-implied predicted probabilities & 95% CI",theme =theme(plot.title =element_text(hjust =0.5),plot.subtitle =element_text(hjust =0) ) )
3.1 FIG1
Show code
combined_plot
Clearly, these distributions are very different. However, a linear model would predict no mean differences between three of these groups (A, B, C). Meanwhile, although a linear model might detect differences between the two skewed distributions (D vs. E) with a large enough sample, the model-implied differences on the mean scale are hard to interpret, and predicted differences on the response probability scale may depart drastically from observed differences in the data. For example, the linear model “view” only sees a difference of about one response category between D & E, with means implying the typical observation in D is a 3 or 4 and the typical observation in E is a 2 or 3. Yet, the majority of observations in D were in categories 4 & 5 (not 3 & 4), whereas the majority of observations in E were in categories 1 & 2 (not 2 & 3)!
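To make that gap concrete, here is one simple way to operationalize the “metric view”: bin a normal distribution with each group’s sample mean and SD at the category boundaries and compare the implied proportions against the observed proportions. This minimal sketch assumes the simulation chunk above has already been run, so `data_skew_left` and `data_skew_right` exist in the session.

# Proportions implied by a normal approximation vs. observed proportions
# Assumes data_skew_left and data_skew_right exist from the simulation chunk above.
normal_implied_props <- function(y, categories = 1:5) {
  mu <- mean(y); s <- sd(y)
  cuts <- c(-Inf, categories[-length(categories)] + 0.5, Inf)  # bin boundaries at x.5
  diff(pnorm(cuts, mean = mu, sd = s))                         # implied category proportions
}

round(rbind(
  D_observed       = prop.table(table(factor(data_skew_left,  levels = 1:5))),
  D_normal_implied = normal_implied_props(data_skew_left),
  E_observed       = prop.table(table(factor(data_skew_right, levels = 1:5))),
  E_normal_implied = normal_implied_props(data_skew_right)
), 2)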
Before moving on, it might help to visualize the latent response thresholds.
Show code
# Define the threshold annotation functionadd_probit_thresholds <-function(plot) { plot +# Add vertical lines at each response category boundarygeom_vline(xintercept =1:4+0.5, linetype ="dashed", color ="gray60", alpha =0.5) +annotate("text", x =2.5, y =0.4, label ="Cumulative\nThresholds", color ="gray50", hjust =0.5)}# Modify both p1 and p2 to add threshold labelsp1 <-ggplot(df %>% dplyr::filter(group =="1. Symmetric Normal"), aes(x = response)) +geom_bar(aes(y =after_stat(prop)), fill ="#FEC488FF", color ="#FEC488FF") +geom_line(data = curves_df %>% dplyr::filter(group =="1. Symmetric Normal") %>%mutate(y = y/100), aes(x = x, y = y), color ="#341069FF", linewidth =1.5) +geom_pointrange(data = probit_df %>% dplyr::filter(group =="1. Symmetric Normal") %>%mutate(y = y/100, lower = lower/100, upper = upper/100), aes(x = x, y = y, ymin = lower, ymax = upper),color ="#D3436EFF", size =1.5, linewidth =1, fatten =2) +# Add vertical threshold lines with tau labelsgeom_vline(xintercept =1:4+0.5, linetype ="dashed", color ="gray60", alpha =0.5) +annotate("text", x=1.65, y=0.42, label ="tau[1]", color ="gray50", parse =TRUE, size=6) +annotate("text", x=2.65, y=0.42, label ="tau[2]", color ="gray50", parse =TRUE, size=6) +annotate("text", x=3.65, y=0.42, label ="tau[3]", color ="gray50", parse =TRUE, , size=6) +annotate("text", x=4.65, y=0.42, label ="tau[4]", color ="gray50", parse =TRUE, , size=6) +facet_wrap(~group, ncol =1, labeller =as_labeller(setNames(stats_df$new_label, stats_df$group))) +scale_x_continuous(breaks =1:5) +scale_y_continuous(limits =c(0, 0.45), expand =expansion(mult =c(0.02, 0.08))) +labs(x ="Response Category", y ="Proportion") +theme_minimal() +theme(text =element_text(family ="serif"),strip.text =element_text(hjust =0, size=14), panel.grid =element_blank(), axis.text =element_text(size=14),axis.title =element_text(size=14) )p2 <-ggplot(df %>% dplyr::filter(group !="1. Symmetric Normal"), aes(x = response)) +geom_bar(aes(y =after_stat(prop)), fill ="#FEC488FF", color ="#FEC488FF") +geom_line(data = curves_df %>% dplyr::filter(group !="1. Symmetric Normal") %>%mutate(y = y/100), aes(x = x, y = y), color ="#341069FF", linewidth =1.5) +geom_pointrange(data = probit_df %>% dplyr::filter(group !="1. Symmetric Normal") %>%mutate(y = y/100, lower = lower/100, upper = upper/100), aes(x = x, y = y, ymin = lower, ymax = upper),color ="#D3436EFF", size =1.5, linewidth =1, fatten =2) +# Add vertical threshold lines with tau labelsgeom_vline(xintercept =1:4+0.5, linetype ="dashed", color ="gray60", alpha =0.5) +annotate("text", x=1.65, y=0.42, label ="tau[1]", color ="gray50", parse =TRUE, size=6) +annotate("text", x=2.65, y=0.42, label ="tau[2]", color ="gray50", parse =TRUE, size=6) +annotate("text", x=3.65, y=0.42, label ="tau[3]", color ="gray50", parse =TRUE, , size=6) +annotate("text", x=4.65, y=0.42, label ="tau[4]", color ="gray50", parse =TRUE, , size=6) +facet_wrap(~group, ncol =2, labeller =as_labeller(setNames(stats_df$new_label, stats_df$group))) +scale_x_continuous(breaks =1:5) +scale_y_continuous(limits =c(0, 0.45), expand =expansion(mult =c(0.02, 0.08))) +labs(x ="Response Category", y ="") +theme_minimal() +theme(text =element_text(family ="serif"),strip.text =element_text(hjust =0.5, size=14), panel.grid =element_blank(), axis.text =element_text(size=14),axis.title =element_text(size=14))combined_plot <- p1 + p2 +plot_layout(widths =c(1, 2)) +plot_annotation(title ="Comparing Metric vs. 
Ordinal View of Data",subtitle ="Data: Light bars show observed (simulated) proportions\nMetric view: Dark line shows linear model-implied normal curve\nOrdinal view: Point-intervals show ordered probit model predicted probabilities & 95% CI\n Dashed lines represent category thresholds in latent space",theme =theme(plot.title =element_text(hjust =0.5),plot.subtitle =element_text(hjust =0)) )combined_plot
Following Kurz’s brms adaptation of Kruschke, we can also visualize the latent thresholds by fitting Bayesian threshold-only models and then plotting draws from the posterior distribution.
Show code
# Function to fit and save brms model for each distributionfit_brms_model <-function(data, group_name) {# Prepare the data my_data <-data.frame(response = data)# Fit a Bayesian cumulative probit model fit <-brm(data = my_data,family =cumulative(probit), response ~1,prior =prior(normal(0, 4), class = Intercept),iter =3000, warmup =1000, chains =4, cores =4,seed =123,file =paste0("Models/sim_fit_", group_name),file_refit ="on_change" )return(fit)}# Fit models for each distributionfit_symmetric <-fit_brms_model(data_symmetric, "symmetric")fit_uniform <-fit_brms_model(data_uniform, "uniform")fit_bimodal <-fit_brms_model(data_bimodal, "bimodal")fit_skew_left <-fit_brms_model(data_skew_left, "skew_left")fit_skew_right <-fit_brms_model(data_skew_right, "skew_right")# Create a list of fits for easy referencefits_list <-list(symmetric = fit_symmetric,uniform = fit_uniform,bimodal = fit_bimodal,skew_left = fit_skew_left,skew_right = fit_skew_right)# Function to create threshold plotcreate_threshold_plot <-function(fit, group_name) {# Extract draws draws <-as_draws_df(fit)# Compute the posterior means for each threshold means <- draws %>%summarise(across(starts_with("b_Intercept"), mean)) %>%pivot_longer(everything(), names_to ="name", values_to ="mean")# Prepare the data with explicit threshold calculation plot_data <- draws %>%pivot_longer(starts_with("b_Intercept"), names_to ="name", values_to ="threshold") %>%group_by(.draw) %>%# Calculate theta_bar as the mean of all thresholds for this drawmutate(theta_bar =mean(threshold))# Create the plotggplot(plot_data, aes(x = threshold, y = theta_bar, color = name)) +# Vertical lines at mean thresholdsgeom_vline(data = means,aes(xintercept = mean, color = name),linetype =2) +# Points with low alpha to show densitygeom_point(alpha =1/10) +scale_color_viridis_d(option ="magma", begin =0.2, end =0.8) +# Dynamic x-axis limits based on actual threshold valuesscale_x_continuous(breaks = means$mean,labels =c("τ1", "τ2", "τ3", "τ4") ) +labs(title = group_name,y ="Mean of Thresholds per Draw",x ="Latent Thresholds") +theme_minimal() +theme(legend.position ="none", axis.text.y =element_blank(), axis.ticks.y =element_blank() )}# Create threshold plots for each modelthreshold_plots <-list(create_threshold_plot(fit_symmetric, "A. Normal-like"),create_threshold_plot(fit_uniform, "B. Uniform"),create_threshold_plot(fit_bimodal, "C. Bimodal"),create_threshold_plot(fit_skew_left, "D. Skewed Left"),create_threshold_plot(fit_skew_right, "E. Skewed Right"))# Combine the threshold plotscombined_threshold_plot <-wrap_plots(threshold_plots, ncol =3)combined_threshold_plot
4 Supp. Sim. S1a: Polarization Despite Null Average Treatment Effects
This simulation demonstrates a scenario where linear and ordinal models yield different substantive conclusions despite a null overall average treatment effect. We model a situation where a highly publicized police incident causes polarized reactions that vary by political ideology, illustrating how distributional changes can be obscured when focusing solely on average effects.
The key insight: even when ordinal data appear “continuous-like,” with symmetric distributions across many categories, the underlying categorical social processes can create meaningful distributional shifts that the cumulative ordinal model captures more accurately than the linear model - even when the proportional odds assumption is explicitly violated.
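As a reminder of the machinery behind the “ordinal view,” the minimal sketch below shows how a cumulative probit model turns thresholds plus a latent-scale shift into category probabilities. The threshold and shift values are illustrative only, not estimates from any model in this supplement.

# Cumulative probit mechanics: P(Y = k) = pnorm(tau_k - eta) - pnorm(tau_{k-1} - eta)
# The thresholds (tau) and latent shift (eta) below are illustrative, not fitted values.
cumulative_probit_probs <- function(thresholds, eta = 0) {
  diff(c(0, pnorm(thresholds - eta), 1))  # differences in the normal CDF give category probabilities
}

tau <- c(-1.5, -0.8, -0.25, 0.25, 0.8, 1.5)        # 6 thresholds -> 7 response categories

round(cumulative_probit_probs(tau, eta = 0),   2)  # symmetric, "normal-looking" baseline
round(cumulative_probit_probs(tau, eta = 0.5), 2)  # a +0.5 SD shift on the latent scale

A single latent shift applied against fixed thresholds is exactly the proportional odds (parallel lines) structure; the polarization scenario below deliberately violates it, which is one reason predicted probabilities, not just coefficients, are worth inspecting.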
4.1 Simulation Design
Show code
set.seed(1138)
n_per_group <- 1000

# Create symmetric baseline distributions that look "normal enough" for OLS
# This mimics what researchers see when they think linear regression should be fine
generate_baseline <- function(n, ideology) {
  # Both groups start with nearly identical, symmetric 7-point distributions
  if(ideology == "Conservative") {
    probs <- c(0.06, 0.11, 0.21, 0.28, 0.19, 0.10, 0.05)
  } else {
    probs <- c(0.05, 0.10, 0.19, 0.28, 0.21, 0.11, 0.06)
  }
  responses <- sample(1:7, size = n, replace = TRUE, prob = probs)
  data.frame(
    id = 1:n,
    ideology = ideology,
    time = 0,
    response = responses
  )
}

baseline_data <- rbind(
  generate_baseline(n_per_group, "Conservative"),
  generate_baseline(n_per_group, "Liberal")
)
# Check if models are actually different by comparing log-likelihoods
ll_linear <- log_lik(linear_model)
ll_ordinal <- log_lik(ordinal_model)
cat("\nMean log-likelihood (linear):", round(mean(rowSums(ll_linear)), 2), "\n")
4.7 The Critical Difference: Predicted Probabilities
While both models show similar coefficient patterns, they make fundamentally different predictions about response probabilities. This reveals why focusing solely on coefficient interpretation can be misleading.
Show code
# Generate predictions from both models for post-incident periodnewdata_post <-expand_grid(ideology =c("Conservative", "Liberal"),time =1# Post-incident only)# Get predictions from linear model (need to convert to probabilities)linear_pred_draws <-posterior_epred(linear_model, newdata = newdata_post, re_formula =NA)# Convert linear predictions to "probabilities" by assuming normal distribution# and calculating probability of falling in each category rangeconvert_linear_to_probs <-function(linear_preds, categories =1:7) { n_draws <-nrow(linear_preds) n_conditions <-ncol(linear_preds)# Initialize probability array prob_array <-array(dim =c(n_draws, n_conditions, length(categories)))for(draw in1:n_draws) {for(cond in1:n_conditions) { mean_pred <- linear_preds[draw, cond]# Get residual SD from model using modern syntax sigma <-as_draws_df(linear_model)$sigma[draw]# Calculate probability of each categoryfor(cat in1:length(categories)) {if(cat ==1) { prob_array[draw, cond, cat] <-pnorm(1.5, mean_pred, sigma) } elseif(cat ==length(categories)) { prob_array[draw, cond, cat] <-1-pnorm(cat -0.5, mean_pred, sigma) } else { prob_array[draw, cond, cat] <-pnorm(cat +0.5, mean_pred, sigma) -pnorm(cat -0.5, mean_pred, sigma) } } } }return(prob_array)}# Get ordinal model predictionsordinal_pred_draws <-posterior_epred(ordinal_model, newdata = newdata_post, re_formula =NA)# Convert linear predictions to probabilitieslinear_probs <-convert_linear_to_probs(linear_pred_draws)# Calculate observed proportions for post-incident periodobserved_post <- sim_data %>% dplyr::filter(time ==1) %>%group_by(ideology, response) %>%summarise(count =n(), .groups ="drop") %>%group_by(ideology) %>%mutate(proportion = count /sum(count))# Create comparison datasetprob_comparison <-expand_grid(ideology =c("Conservative", "Liberal"),category =1:7) %>%mutate(condition_idx =case_when( ideology =="Conservative"~1, ideology =="Liberal"~2 ),# Linear model predictionslinear_prob =map2_dbl(condition_idx, category, ~mean(linear_probs[, .x, .y])),linear_lower =map2_dbl(condition_idx, category, ~quantile(linear_probs[, .x, .y], 0.025)),linear_upper =map2_dbl(condition_idx, category, ~quantile(linear_probs[, .x, .y], 0.975)),# Ordinal model predictionsordinal_prob =map2_dbl(condition_idx, category, ~mean(ordinal_pred_draws[, .x, .y])),ordinal_lower =map2_dbl(condition_idx, category, ~quantile(ordinal_pred_draws[, .x, .y], 0.025)),ordinal_upper =map2_dbl(condition_idx, category, ~quantile(ordinal_pred_draws[, .x, .y], 0.975)),category_label =factor(category, levels =1:7,labels =c("Very\nUnafraid", "Unafraid", "Somewhat\nUnafraid", "Neutral", "Somewhat\nAfraid", "Afraid", "Very\nAfraid")) ) %>%# Add observed proportionsleft_join( observed_post %>% dplyr::select(ideology, response, proportion),by =c("ideology", "category"="response") ) %>%# Reshape for plottingpivot_longer(cols =c(linear_prob, ordinal_prob, proportion),names_to ="type", values_to ="probability" ) %>%mutate(lower =case_when( type =="linear_prob"~ linear_lower, type =="ordinal_prob"~ ordinal_lower, type =="proportion"~NA_real_ ),upper =case_when( type =="linear_prob"~ linear_upper, type =="ordinal_prob"~ ordinal_upper, type =="proportion"~NA_real_ ),type_label =factor(type, levels =c("proportion", "linear_prob", "ordinal_prob"),labels =c("Observed", "Linear Model", "Ordinal Model")) )# Plot model comparison with colorblind-friendly palettemodel_comparison_plot <- prob_comparison %>%ggplot(aes(x = category_label, y = probability, color = type_label, shape = type_label)) 
+geom_point(size =2.5, position =position_dodge(width =0.4)) +geom_errorbar(aes(ymin = lower, ymax = upper), width =0.15, position =position_dodge(width =0.4),na.rm =TRUE) +geom_line(aes(group = type_label), position =position_dodge(width =0.4), alpha =0.7) +facet_wrap(~ideology, labeller = label_both) +scale_color_manual(values =c("Observed"="#000000", "Linear Model"="#D55E00", "Ordinal Model"="#0072B2"),name ="") +scale_shape_manual(values =c("Observed"=17, "Linear Model"=15, "Ordinal Model"=16),name ="") +scale_y_continuous(labels = scales::percent_format(accuracy =1)) +labs(title ="Model Accuracy: Observed vs. Predicted Probabilities",subtitle ="Ordinal model captures bimodal reality; linear model predicts impossible intermediate peaks",x ="Fear Response Category",y ="Probability" ) +theme_minimal(base_family ="serif") +theme(plot.title =element_text(hjust =0.5, size =13, face ="bold"),plot.subtitle =element_text(hjust =0.5, size =11),axis.text =element_text(size =10),axis.title =element_text(size =11),legend.position ="bottom",panel.grid.minor =element_blank(),panel.grid.major.x =element_blank(),strip.text =element_text(size =11, face ="bold") )model_comparison_plot
This simulation demonstrates several critical points about ordinal modeling in realistic scenarios:
Coefficient Similarity Masks Fundamental Differences: Both linear and cumulative probit models show similar coefficient direction and significance patterns, potentially leading researchers focused on significance testing to miss dramatic differences in substantive predictions about response probabilities.
Ordinal Superiority Despite Assumption Violations: The heterogeneous within-group responses (minority of conservatives moving toward “very afraid,” minority of liberals toward “very unafraid”) explicitly violate the proportional odds assumption, yet the cumulative probit model still vastly outperforms linear regression because it operates within the appropriate categorical framework.
Realistic Social Complexity: Real social phenomena rarely exhibit perfect proportional odds; rather, groups might often be expected to show heterogeneous responses, with minority segments reacting opposite to the majority. This simulation reflects such realistic complexity while demonstrating that cumulative ordinal models handle these violations better than linear alternatives even when imperfect (and category-specific models would accurately capture even those non-monotonic effects).
The “Looks Normal” Fallacy: Even symmetric 7-category distributions that appear appropriate for linear modeling can undergo categorical polarization processes with threshold-specific effects that linear models cannot represent, regardless of how “continuous” the original data appears.
Predictive vs. Parametric Framework: The cumulative ordinal model’s advantage is not perfect model specification, but rather the use of a modeling approach capable of representing categorical response processes. Linear models fundamentally cannot accurately predict complexities in ordinal distributions, such as bimodal categorical outcomes or threshold-specific effects.
Bottom Line: Researchers who focus solely on interpreting coefficient direction and statistical significance would miss that linear models predict impossible intermediate distributions, while ordinal models (even when their assumptions are violated) better capture the underlying categorical response patterns. The methodological choice determines whether you conclude there was a “gradual population-wide shift” (linear) or “severe polarization” (cumulative ordinal) with minority cross-over effects (via a category-specific ordinal extension); these represent fundamentally different substantive interpretations of the same social phenomenon.
5 Supp. Sim. S1b: Heterogeneous Effects w/in Zero-Inflated Crime Data
5.1 Motivation and Design
The previous simulation (S1a) demonstrated that ordinal and linear models can yield similar average treatment effects while differing in predictive accuracy. However, many criminologists appear to prioritize coefficient interpretation over predictive performance. This supplemental analysis (S1b) addresses a more fundamental concern: scenarios where linear and ordinal models yield different coefficients and substantively different conclusions about the nature and extent of heterogeneous treatment effects, due to their distinct estimands.
We simulate a zero-inflated crime distribution that resembles common criminological outcomes (e.g., self-reported offending, arrest counts). The scenario models heterogeneous treatment effects of an exogenous shock (economic strain) across two subpopulations that operate through completely different social mechanisms yet produce similar conditional means. This design challenges the adequacy of linear modeling frameworks that reduce complex social processes to single mean parameters.
5.2 Mathematical Summary of Data-Generating Process
Baseline: Both groups start with an identical zero-inflated distribution:

$$Y_{i,\text{pre}} \sim \text{Categorical}(\boldsymbol{\pi}_0), \qquad \boldsymbol{\pi}_0 = (0.50,\, 0.25,\, 0.15,\, 0.06,\, 0.03,\, 0.01,\, 0.00)$$

Post-Shock: Group-specific conditional mechanisms:

$$Y_{i,\text{post}} \mid Y_{i,\text{pre}},\, G_i \sim f_{G_i}\!\left(Y_{i,\text{pre}}\right)$$

Where $f_{G_i}$ represents:

High Constraint: Severity effects (existing offenders escalate)
Low Constraint: Prevalence effects (abstainers initiate at low levels)

Key Insight: The true model includes Baseline × Group × Shock interactions, but the fitted models assume simple additive Group × Time effects. Both are misspecified, but ordinal models better capture distributional consequences of the true heterogeneous mechanisms.
5.3 Theoretical Data-Generating Process
Rather than simply specifying probability distributions, we model the true underlying heterogeneous mechanisms by which an exogenous shock affects different populations. This approach creates a realistic scenario where the data arise from complex social processes that simple Group × Time regression models cannot fully capture, yet linear models will conclude there are “no group differences” because the mechanisms happen to produce similar conditional means.
Show code
# Define simulation data generation functions # True latent heterogeneous shock effects functiongenerate_latent_shock_effects <-function(baseline_response, group, shock_intensity =0.75) {# Group-specific latent parameters reflecting different social mechanismsif(group =="High Constraint") {# Severity mechanism: Conditional escalation process severity_threshold <-0.6# Probability of being affected if already offending escalation_rate <-2.0# How much existing offenders escalate initiation_rate <-0.1# Low probability of new initiationif(baseline_response ==0) {# Abstainers: low probability of initiationif(runif(1) < shock_intensity * initiation_rate) { new_response <-sample(1:2, 1, prob =c(0.7, 0.3)) } else { new_response <-0 } } else {# Existing offenders: high probability of escalationif(runif(1) < shock_intensity * severity_threshold) {# Escalate proportionally to current level escalation <-rpois(1, escalation_rate * baseline_response) new_response <-min(6, baseline_response + escalation) } else { new_response <- baseline_response } } } else { # Low Constraint# Prevalence mechanism: Initiation-focused process initiation_threshold <-0.7# High probability of initiation among abstainers severity_dampening <-0.2# Existing offenders less likely to escalate entry_level_bias <-3.0# New offenders cluster at low levelsif(baseline_response ==0) {# Abstainers: high probability of initiation at low levelsif(runif(1) < shock_intensity * initiation_threshold) {# New offenders cluster at levels 1-3 new_response <-sample(1:3, 1, prob =c(0.4, 0.35, 0.25)) } else { new_response <-0 } } else {# Existing offenders: dampened escalation (system is "saturated")if(runif(1) < shock_intensity * severity_dampening) { escalation <-sample(0:2, 1, prob =c(0.6, 0.3, 0.1)) new_response <-min(6, baseline_response + escalation) } else { new_response <- baseline_response } } }return(new_response)}# Generate data using the heterogeneous latent processgenerate_heterogeneous_data <-function(n_per_group, shock_intensity =0.75) {# Define shared baseline zero-inflated crime distribution for both groups baseline_probs <-c(0.50, 0.25, 0.15, 0.06, 0.03, 0.01, 0.00) # Re-define locally if not global# Start with baseline data for both groups baseline_data <-rbind(tibble(id =1:n_per_group,group ="High Constraint",time ="Baseline",response =sample(0:6, n_per_group, replace =TRUE, prob = baseline_probs) ),tibble(id =1:n_per_group,group ="Low Constraint",time ="Baseline",response =sample(0:6, n_per_group, replace =TRUE, prob = baseline_probs) ) )# Apply latent heterogeneous shock effects post_shock_data <- baseline_data %>%rowwise() %>%mutate(post_shock_response =generate_latent_shock_effects(response, group, shock_intensity),time ="Post-Shock",response = post_shock_response ) %>% dplyr::select(-post_shock_response)# Combine baseline and post-shock combined_data <-bind_rows(baseline_data, post_shock_data) %>%mutate(group =factor(group, levels =c("High Constraint", "Low Constraint")),time =factor(time, levels =c("Baseline", "Post-Shock")),shock =ifelse(time =="Post-Shock", 1, 0),crime_ordered =factor(response, levels =0:6, ordered =TRUE),participant_id =ifelse(group =="High Constraint", id, id + n_per_group) )return(combined_data)}set.seed(1138)n_per_group <-1000# Define shared baseline zero-inflated crime distribution for both groupsbaseline_probs <-c(0.50, 0.25, 0.15, 0.06, 0.03, 0.01, 0.00)# Calculate baseline mean for referencebaseline_mean <-sum((0:6) * baseline_probs)# Generate the data for single dataset demonstration# The function 
'generate_heterogeneous_data' is now available globally from 'setup-sims'sim_data <-generate_heterogeneous_data(n_per_group, shock_intensity =0.75)cat("True Data-Generating Process:\n")
True Data-Generating Process:
Show code
cat("- High Constraint: Conditional escalation (severity effects among existing offenders)\n")
- High Constraint: Conditional escalation (severity effects among existing offenders)
Show code
cat("- Low Constraint: Initiation-focused (prevalence effects among abstainers)\n")
- Low Constraint: Initiation-focused (prevalence effects among abstainers)
Show code
cat("- Both mechanisms calibrated to produce identical mean changes (~0.58). The 'true interaction' for the linear model's estimand (difference in mean changes) is near zero.\n")
- Both mechanisms calibrated to produce identical mean changes (~0.58). The 'true interaction' for the linear model's estimand (difference in mean changes) is near zero.
Show code
cat("- Simple Group×Time models are misspecified (missing baseline×group×shock interactions)\n")
- Simple Group×Time models are misspecified (missing baseline×group×shock interactions)
Show code
cat("- Linear models will detect null interaction (similar means = 'no group differences')\n")
- Linear models will detect null interaction (similar means = 'no group differences')
Show code
cat("- Ordinal models may detect non-null interaction (different mechanisms across thresholds, reflecting distributional heterogeneity)\n")
- Ordinal models may detect non-null interaction (different mechanisms across thresholds, reflecting distributional heterogeneity)
5.4 Verification: Mean Similarity Despite Mechanism Differences
Show code
# Verify that we achieve similar means through different mechanisms
observed_summary <- sim_data %>%
  group_by(group, time) %>%
  summarise(
    n = n(),
    mean_crime = mean(response),
    .groups = "drop"
  ) %>%
  pivot_wider(names_from = time, values_from = mean_crime) %>%
  mutate(change = `Post-Shock` - Baseline)

gt(observed_summary) %>%
  tab_header(title = "Observed Means by Group and Time Period") %>%
  fmt_number(columns = c(Baseline, `Post-Shock`, change), decimals = 3) %>%
  cols_label(
    group = "Constraint Group",
    change = "Shock Effect"
  ) %>%
  tab_footnote(
    footnote = "Similar post-shock means despite different underlying mechanisms",
    locations = cells_column_labels(columns = change)
  )
Observed Means by Group and Time Period

Constraint Group     n      Baseline   Post-Shock   Shock Effect¹
High Constraint      1000   0.881      1.462        0.581
Low Constraint       1000   0.847      1.428        0.581

¹ Similar post-shock means despite different underlying mechanisms
Show code
# Calculate true interaction effect from latent process (for the linear model's estimand)
true_group1_effect <- observed_summary$change[observed_summary$group == "High Constraint"]
true_group2_effect <- observed_summary$change[observed_summary$group == "Low Constraint"]
true_interaction <- true_group2_effect - true_group1_effect # This is the "true" mean-based interaction effect.

cat("\nExpected Model Behavior:\n")
Expected Model Behavior:
Show code
cat("Linear model expectation: Interaction ≈ 0 (correctly detecting identical mean changes – its specific estimand)\n") # Clarified "correctly detecting" for its estimand
Linear model expectation: Interaction ≈ 0 (correctly detecting identical mean changes – its specific estimand)
Show code
cat("Ordinal model expectation: Interaction ≠ 0 (detecting distributional heterogeneity, which is a different estimand)\n") # Clarified "different estimand"
Ordinal model expectation: Interaction ≠ 0 (detecting distributional heterogeneity, which is a different estimand)
Show code
cat("This difference reflects different conceptions of 'treatment heterogeneity' based on chosen estimands.\n") # Refined wording
This difference reflects different conceptions of 'treatment heterogeneity' based on chosen estimands.
5.5 Visualization of Theoretical Distributions
Show code
# Calculate actual post-shock distributions from generated dataactual_distributions <- sim_data %>%count(group, time, response) %>%group_by(group, time) %>%mutate(proportion = n /sum(n)) %>%ungroup() %>% dplyr::select(group, time, crime_level = response, proportion) %>%# --- Complete missing combinations with proportion = 0 ---# Ensure all combinations of group, time, and crime_level (0-6) existcomplete(group, time, crime_level =0:6, fill =list(proportion =0)) %>%# Re-order factors for consistent plotting if needed (though group/time should be factors already)mutate(group =factor(group, levels =c("High Constraint", "Low Constraint")),time =factor(time, levels =c("Baseline", "Post-Shock")) )# Calculate means for each distribution for vertical linesdistribution_means <- actual_distributions %>%group_by(group, time) %>%summarise(# Re-calculate mean using completed proportions, so it's consistentmean_crime =sum(crime_level * proportion), .groups ="drop" ) %>%# Only show means for post-shock (since baseline is de-emphasized)filter(time =="Post-Shock")# Create vertical plot with dodged barstheoretical_dist_plot <- actual_distributions %>%ggplot(aes(x = crime_level, y = proportion, fill = time, alpha = time)) +geom_col(position =position_dodge(width =0.8), width =0.7) +# Add vertical lines at post-shock meansgeom_vline(data = distribution_means,aes(xintercept = mean_crime),linetype ="dashed",color ="black",size =1) +# Add mean labels, explicitly setting font family here for geom_textgeom_text(data = distribution_means, aes(x = mean_crime, y =max(actual_distributions$proportion) *0.9, # Use 'proportion' as max y from actual_distributionslabel =paste0("Mean = ", round(mean_crime, 2))),angle =0, vjust =1, hjust =-0.1, size =3.5,color ="black",family ="serif") +facet_wrap(~group, ncol =1, scales ="fixed") +scale_fill_manual(values =c("Baseline"="gray60", "Post-Shock"="#440154"),name ="Period" ) +scale_alpha_manual(values =c("Baseline"=0.3, "Post-Shock"=0.8),guide ="none"# Don't show alpha in legend ) +scale_x_continuous(name ="Crime Frequency (0 = No Crime, 6 = Frequent Crime)",breaks =0:6,labels =0:6 ) +scale_y_continuous(name ="Probability",labels =percent_format(accuracy =1) ) +theme_minimal() +theme(text =element_text(family ="serif"), strip.text =element_text(size =10, face ="bold", family ="serif"), legend.position ="bottom",panel.grid.minor =element_blank(),panel.grid.major.x =element_blank(),# axis.title = element_text(size = 10, family = "serif"), axis.title =element_blank(), axis.text =element_text(size =10, family ="serif"), plot.title =element_text(size =11, face ="bold"),plot.subtitle =element_text(size =10) ) +labs(title ="A. Pre/Post-Shock Crime Distributions by Group",subtitle ="High Constraint: Severity effects (existing offenders escalate)\nLow Constraint: Prevalence effects (abstainers begin offending)")theoretical_dist_plot
As the figure above illustrates, this simulation demonstrates a scenario where linear and ordinal models should yield fundamentally different conclusions about the nature and extent of treatment heterogeneity despite analyzing identical data. The distributions show that both groups experience identical mean increases in crime (~0.58 units) following the shock, but through completely different social mechanisms: High Constraint groups exhibit severity effects where existing offenders escalate to higher crime levels, while Low Constraint groups exhibit prevalence effects where former abstainers begin engaging in lower-level crime.
The Conceptual Challenge: Linear models, by design, can only detect heterogeneity that manifests as differences in conditional means. When complex social mechanisms happen to produce similar average outcomes, linear models conclude there are “no group differences” - not because there is no heterogeneity, but because their methodological framework cannot conceptualize heterogeneity beyond mean comparisons. This represents a fundamental limitation rather than accurate inference.
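A toy numeric example (values chosen for illustration, not taken from the simulated data) shows why: two post-shock distributions can share the same mean while differing at nearly every cumulative threshold.

# Two hypothetical post-shock distributions over crime levels 0-6 with identical means;
# the probabilities are illustrative only, not taken from the simulated data.
severity   <- c(0.46, 0.18, 0.10, 0.08, 0.07, 0.06, 0.05)  # existing offenders escalate
prevalence <- c(0.26, 0.33, 0.20, 0.12, 0.05, 0.03, 0.01)  # abstainers initiate at low levels

c(sum(0:6 * severity), sum(0:6 * prevalence))    # both means equal 1.50
round(cumsum(severity) - cumsum(prevalence), 2)  # but P(Y <= k) differs across groups

A mean-based model sees nothing to distinguish these two groups; a cumulative ordinal model, which works with the P(Y <= k) quantities directly, does.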
The Ordinal Alternative: Ordinal models attempt to capture distributional patterns across multiple response thresholds rather than reducing social processes to single mean parameters. When the underlying mechanisms create different patterns of category probabilities (even with similar means), ordinal models can detect this heterogeneity through threshold-specific effects. The cumulative ordinal model will be imperfect due to proportional odds violations, but it approximates the multi-dimensional nature of the true social processes better than models that assume all meaningful variation occurs in conditional means.
Consequently, we expect linear models to yield interaction estimates near zero (correctly summarizing mean similarity, which is their primary estimand, but missing distributional complexity), while ordinal models should produce non-zero interaction estimates that capture the distinct severity versus prevalence mechanisms through their effect on the latent scale and category probabilities. This illustrates how methodological frameworks shape - and potentially limit - researchers’ ability to detect and understand heterogeneous social processes.
5.6 Model Comparison: Linear vs. Ordinal
Here we fit both linear and ordinal models to our single simulated dataset and compare their coefficient estimates. The linear model treats crime levels as continuous values, while the ordinal model respects the categorical nature of the responses.
Show code
# Fit Bayesian linear and ordinal models with caching
library(rstanarm)
library(brms)

# Set up for parallel processing
options(mc.cores = parallel::detectCores())

# Define cache file paths
linear_model_path <- here("Models", "s1b_linear_bayes.rds")
ordinal_model_path <- here("Models", "s1b_ordinal_bayes.rds")

# Fit Bayesian linear model using rstanarm
if (file.exists(linear_model_path)) {
  linear_model_bayes <- readRDS(linear_model_path)
  cat("Loaded cached Bayesian linear model\n")
} else {
  cat("Fitting Bayesian linear model...\n")
  linear_model_bayes <- stan_glm(
    response ~ group * time,
    data = sim_data,
    family = gaussian(),
    prior_intercept = normal(3, 2),
    prior = normal(0, 1),
    prior_aux = exponential(1),
    chains = 4,
    iter = 2000,
    seed = 1138
  )
  saveRDS(linear_model_bayes, linear_model_path)
  cat("Bayesian linear model fitted and cached\n")
}
Loaded cached Bayesian linear model
Show code
# Fit Bayesian ordinal model using brms
if (file.exists(ordinal_model_path)) {
  ordinal_model_bayes <- readRDS(ordinal_model_path)
  cat("Loaded cached Bayesian ordinal model\n")
} else {
  cat("Fitting Bayesian ordinal model...\n")
  # Prior for ordinal model with equal probability across 7 categories
  ord_prior <-
    prior(normal(-1.07, 1), class = "Intercept", coef = "1") +
    prior(normal(-0.57, 1), class = "Intercept", coef = "2") +
    prior(normal(-0.18, 1), class = "Intercept", coef = "3") +
    prior(normal(0.18, 1), class = "Intercept", coef = "4") +
    prior(normal(0.57, 1), class = "Intercept", coef = "5") +
    prior(normal(1.07, 1), class = "Intercept", coef = "6") +
    prior(normal(0, 1), class = "b")

  ordinal_model_bayes <- brm(
    crime_ordered ~ group * time,
    data = sim_data,
    family = cumulative("probit"),
    prior = ord_prior,
    chains = 4,
    iter = 2000,
    cores = 4,
    seed = 1138,
    control = list(adapt_delta = 0.95)
  )
  saveRDS(ordinal_model_bayes, ordinal_model_path)
  cat("Bayesian ordinal model fitted and cached\n")
}
Loaded cached Bayesian ordinal model
Show code
# Check convergence
cat("Linear model Rhat values:\n")
# Extract coefficients for comparison
library(broom.mixed) # Need this for rstanarm models

linear_coefs_bayes <- linear_model_bayes %>%
  broom.mixed::tidy(conf.int = TRUE, conf.level = 0.95) %>%
  dplyr::filter(str_detect(term, "group|time")) %>%
  mutate(model = "Linear (Bayesian)") %>%
  dplyr::select(term, estimate, conf.low, conf.high, model)

ordinal_coefs_bayes <- ordinal_model_bayes %>%
  broom::tidy(conf.int = TRUE, conf.level = 0.95) %>% # brms uses regular broom
  dplyr::filter(str_detect(term, "group|time")) %>%
  mutate(model = "Ordinal (Bayesian)") %>%
  dplyr::select(term, estimate, conf.low, conf.high, model)

# Combine results
model_comparison_bayes <- bind_rows(linear_coefs_bayes, ordinal_coefs_bayes) %>%
  mutate(
    term_clean = case_when(
      str_detect(term, ":") ~ "Group × Time Interaction",
      str_detect(term, "time") ~ "Time (Post vs. Pre)",
      TRUE ~ term
    )
  ) %>%
  dplyr::filter(term_clean %in% c("Time (Post vs. Pre)", "Group × Time Interaction")) %>%
  arrange(term_clean, model)

# Display coefficient comparison
gt(model_comparison_bayes) %>%
  tab_header(title = "Bayesian Model Coefficient Comparison") %>%
  fmt_number(columns = c(estimate, conf.low, conf.high), decimals = 3) %>%
  cols_label(
    term = "Raw Term",
    term_clean = "Effect",
    estimate = "Posterior Mean",
    conf.low = "95% Credible Interval Lower",
    conf.high = "95% Credible Interval Upper",
    model = "Model"
  ) %>%
  tab_footnote(
    footnote = "Bayesian models with weakly informative priors; credible intervals from posterior distributions. Note that coefficients from linear and ordinal models have different estimands: linear coefficients reflect effects on the conditional mean of the observed outcome, while ordinal coefficients reflect effects on an underlying latent continuous variable.",
    locations = cells_title()
  )
Bayesian Model Coefficient Comparison¹

Raw Term                              Posterior Mean   95% CrI Lower   95% CrI Upper   Model                Effect
groupLow Constraint:timePost-Shock    −0.001           −0.174          0.163           Linear (Bayesian)    Group × Time Interaction
groupLowConstraint:timePostMShock     0.185            0.051           0.319           Ordinal (Bayesian)   Group × Time Interaction
timePost-Shock                        0.582            0.468           0.699           Linear (Bayesian)    Time (Post vs. Pre)
timePostMShock                        0.359            0.263           0.455           Ordinal (Bayesian)   Time (Post vs. Pre)

¹ Bayesian models with weakly informative priors; credible intervals from posterior distributions. Note that coefficients from linear and ordinal models have different estimands: linear coefficients reflect effects on the conditional mean of the observed outcome, while ordinal coefficients reflect effects on an underlying latent continuous variable.
Show code
# Posterior predictive checks for model validation
# Panel C1: Linear model pp check
panel_c1 <- pp_check(linear_model_bayes, ndraws = 50) +
  ggtitle("C1. Linear Model Posterior Predictive Check") +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"),
    plot.title = element_text(hjust = 0.5, face = "bold", size = 11),
    axis.text = element_text(size = 10),
    # axis.title = element_text(size = 10)
    axis.title = element_blank(),
  )

# Panel C2: Ordinal model pp check
panel_c2 <- pp_check(ordinal_model_bayes, ndraws = 50, type = "bars") +
  ggtitle("C2. Ordinal Model Posterior Predictive Check") +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"),
    plot.title = element_text(hjust = 0.5, face = "bold", size = 11),
    axis.text = element_text(size = 10),
    # axis.title = element_text(size = 10)
    axis.title = element_blank(),
  )

# Combine the plots
library(patchwork)
ppcheck_combined <- panel_c1 / panel_c2 +
  plot_annotation(
    title = "Posterior Predictive Checks: Model Fit Comparison",
    subtitle = "Dark line = observed data; Light lines = posterior predictions",
    theme = theme(
      text = element_text(family = "serif"),
      plot.title = element_text(size = 11, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 10, hjust = 0.5)
    )
  )

ppcheck_combined
Show code
# Print summary of model fitcat("Linear Model Summary:\n")
Linear Model Summary:
Show code
cat("- Assumes continuous normal distribution for residuals around the conditional mean\n") # Clarified assumption
- Assumes continuous normal distribution for residuals around the conditional mean
Show code
cat("- Predictions may extend beyond valid range (0-6) of the observed ordinal scale\n") # Clarified "valid range"
- Predictions may extend beyond valid range (0-6) of the observed ordinal scale
Show code
cat("- Poor fit expected for discrete ordinal data, especially in terms of category-specific probabilities\n\n") # Clarified "poor fit"
- Poor fit expected for discrete ordinal data, especially in terms of category-specific probabilities
Show code
cat("Ordinal Model Summary:\n")
Ordinal Model Summary:
Show code
cat("- Respects discrete nature of responses by modeling an underlying latent variable and thresholds\n") # Clarified
- Respects discrete nature of responses by modeling an underlying latent variable and thresholds
Show code
cat("- Predictions constrained to valid categories (0-6) and inherently provide category-specific probabilities\n") # Clarified
- Predictions constrained to valid categories (0-6) and inherently provide category-specific probabilities
Show code
cat("- Better fit expected for ordinal data, especially in capturing distributional patterns\n") # Clarified
- Better fit expected for ordinal data, especially in capturing distributional patterns
5.7 Interactions Represent Different Conceptions of Treatment Heterogeneity
Linear and ordinal models conceptualize and detect treatment heterogeneity differently based on their respective estimands. The linear model asks: “Do groups differ in mean response to treatment?” The ordinal model asks: “Do groups exhibit different patterns of response across ordinal outcome categories by affecting the underlying latent variable and its thresholds?”
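To see how the ordinal estimand translates back to the outcome scale, the small sketch below converts a latent-scale (probit) coefficient into implied shifts in category probabilities. The thresholds are illustrative rather than fitted values, and the 0.36 shift is simply in the neighborhood of the ordinal time coefficient reported in the table above.

# Translate a latent-scale probit coefficient into category-probability changes.
# Thresholds are illustrative; 0.36 is roughly the size of the ordinal time effect above.
probs_from_probit <- function(thresholds, eta) {
  diff(c(0, pnorm(thresholds - eta), 1))
}

tau <- c(0.0, 0.7, 1.2, 1.7, 2.1, 2.5)   # 6 illustrative thresholds -> 7 crime categories (0-6)

round(rbind(
  baseline   = probs_from_probit(tau, eta = 0),
  post_shock = probs_from_probit(tau, eta = 0.36)
), 3)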
The plot below demonstrates how the cumulative ordinal model uses its increased flexibility (via estimation of multiple ordered thresholds rather than a single mean) to better capture the differing distributional patterns across groups and time periods, showing predicted probabilities for each crime level. The cumulative ordinal model’s estimates are not perfect, though, since our treatment heterogeneity approach explicitly violated the proportional odds assumption (meaning a category-specific model would be even more appropriate here).
Show code
# Generate predicted probabilities using posterior samplesnewdata_grid <-expand_grid(group =factor(c("High Constraint", "Low Constraint")),time =factor(c("Baseline", "Post-Shock")))# Get posterior predicted probabilities from Bayesian ordinal modelordinal_preds_bayes <-posterior_epred(ordinal_model_bayes, newdata = newdata_grid, ndraws =1000)# Get posterior linear predictions (means, not individual predictions)linear_pred_means <-posterior_epred(linear_model_bayes, newdata = newdata_grid, ndraws =1000)# Get residual SD for converting means to probabilitiessigma_draws <-as_draws_df(linear_model_bayes)$sigma# Calculate predicted probability summaries for ordinal modelpred_summary_ordinal_bayes <-expand_grid(group =c("High Constraint", "Low Constraint"),time =c("Baseline", "Post-Shock"),category =0:6) %>%mutate(condition_idx =case_when( group =="High Constraint"& time =="Baseline"~1, group =="High Constraint"& time =="Post-Shock"~2, group =="Low Constraint"& time =="Baseline"~3, group =="Low Constraint"& time =="Post-Shock"~4 ),# Extract posterior draws for this condition and categoryposterior_draws =map2(condition_idx, category +1, ~ordinal_preds_bayes[, .x, .y]),# Calculate summary statistics from posteriorpred_prob =map_dbl(posterior_draws, mean),pred_lower =map_dbl(posterior_draws, ~quantile(.x, 0.025)),pred_upper =map_dbl(posterior_draws, ~quantile(.x, 0.975)),model ="Ordinal" ) %>% dplyr::select(group, time, category, pred_prob, pred_lower, pred_upper, model)# Convert linear MEANS to category probabilitiespred_summary_linear_bayes <- newdata_grid %>%mutate(condition_idx =row_number()) %>%crossing(category =0:6) %>%mutate(# For each condition/category, calculate probability across posterior meanscategory_probs =map2(condition_idx, category, function(cond, cat) {# Get linear MEANS for this condition (not individual predictions) linear_means <- linear_pred_means[, cond]# Use a single representative sigma (mean of posterior) sigma_mean <-mean(sigma_draws)# Convert means to category probabilities, assuming intervals are centered at integersmap_dbl(linear_means, function(mu) {case_when( cat ==0~pnorm(0.5, mean = mu, sd = sigma_mean), cat ==6~1-pnorm(5.5, mean = mu, sd = sigma_mean),TRUE~pnorm(cat +0.5, mean = mu, sd = sigma_mean) -pnorm(cat -0.5, mean = mu, sd = sigma_mean) ) }) }),# Summarize posterior for this categorypred_prob =map_dbl(category_probs, mean),pred_lower =map_dbl(category_probs, ~quantile(.x, 0.025)),pred_upper =map_dbl(category_probs, ~quantile(.x, 0.975)),model ="Linear" ) %>% dplyr::select(group, time, category, pred_prob, pred_lower, pred_upper, model)# Combine ordinal and linear predictionsall_predictions_bayes <-bind_rows(pred_summary_ordinal_bayes, pred_summary_linear_bayes)# Calculate observed proportionsobserved_props <- sim_data %>%count(group, time, response) %>%group_by(group, time) %>%mutate(observed_prob = n /sum(n)) %>%ungroup() %>% dplyr::select(group, time, category = response, observed_prob) %>%complete(group, time, category =0:6, fill =list(observed_prob =0))# Create predicted probability plot with Bayesian uncertaintyprob_plot_bayes <-ggplot() +# Observed data as barsgeom_col(data = observed_props,aes(x =factor(category), y = observed_prob),alpha =0.6, width =0.6, fill ="gray70") +# Model predictions - error bars first, then points on topgeom_errorbar(data = all_predictions_bayes,aes(x =factor(category), ymin = pred_lower, ymax = pred_upper, color = model),width =0.2, position =position_dodge(width =0.4), size =0.8) +geom_point(data = 
all_predictions_bayes,aes(x =factor(category), y = pred_prob, color = model, shape = model),size =3, position =position_dodge(width =0.4)) +facet_grid(group ~ time, scales ="free_y") +scale_color_viridis_d(name ="Model Type",option ="plasma",end =0.8,labels =c("Linear", "Ordinal") ) +scale_shape_manual(name ="Model Type",values =c("Linear"=17, "Ordinal"=16),labels =c("Linear", "Ordinal") ) +scale_x_discrete(name ="Crime Frequency (0 = No Crime, 6 = Frequent Crime)") +scale_y_continuous(name ="Probability", labels = scales::percent_format(accuracy =1)) +theme_minimal() +theme(text =element_text(family ="serif"), strip.text =element_text(size =10, face ="bold"),legend.position ="bottom",panel.grid.minor =element_blank(), axis.title =element_blank(), plot.title =element_text(size =11, face ="bold"),plot.subtitle =element_text(size =10) ) +labs(title ="B. Bayesian Model Predicted Probabilities vs. Observed Data",subtitle ="Gray bars = observed;\nPoints = posterior means with 95% credible intervals" )prob_plot_bayes
5.8 Monte Carlo Simulation Results
Now we move beyond the single-dataset demonstration to examine the sampling distribution of interaction effect estimates across many simulated datasets. This should show us how each modeling approach estimates its respective estimand and how their interaction coefficients diverge, which matters if researchers focus primarily on interpreting the sign and significance of coefficients.
Show code
# Monte Carlo simulation with interaction effects, conditional means, & effect sizes# Set simulation parametersn_sims <-1000set.seed(1138)# Check if cached results existmonte_carlo_file <-here("Models", "s1b_monte_carlo_results_enhanced.rds")if (file.exists(monte_carlo_file)) {cat("Loading cached enhanced Monte Carlo results...\n") sim_results <-readRDS(monte_carlo_file)cat("Loaded results for", max(sim_results$sim_id, na.rm =TRUE), "simulations\n")} else {cat("Running enhanced Monte Carlo simulation...\n")# Enhanced storage for results sim_results <-tibble(sim_id =integer(),model_type =character(),interaction_estimate =numeric(),interaction_se =numeric(),interaction_lower =numeric(),interaction_upper =numeric(),significant =logical(),converged =logical(),# New columns for conditional means (on observed 0-6 scale)high_baseline_mean =numeric(),high_postshock_mean =numeric(),low_baseline_mean =numeric(),low_postshock_mean =numeric(),cohens_d =numeric() # This column will now correctly represent the d for each model type )# Progress tracking pb <- progress::progress_bar$new(format ="[:bar] :percent :eta",total = n_sims,clear =FALSE,width =60 )# Run Monte Carlo simulationfor (sim in1:n_sims) { pb$tick() # Corrected progress bar syntax (removed ->)# Generate data using the same latent process sim_data_mc <-generate_heterogeneous_data(n_per_group, shock_intensity =0.75)# Create prediction grid newdata_pred <-expand_grid(group =factor(c("High Constraint", "Low Constraint")),time =factor(c("Baseline", "Post-Shock")) )# Fit linear model linear_mc <-lm(response ~ group * time, data = sim_data_mc) linear_summary <- broom::tidy(linear_mc, conf.int =TRUE)# Get linear predictions linear_preds <-predict(linear_mc, newdata = newdata_pred)# Calculate Cohen's d for linear model (pooled overall treatment effect on observed scale) pooled_baseline_linear <-mean(linear_preds[c(1,3)]) pooled_postshock_linear <-mean(linear_preds[c(2,4)]) cohens_d_linear <- (pooled_postshock_linear - pooled_baseline_linear) /sd(sim_data_mc$response, na.rm =TRUE)# Extract interaction effect for linear model linear_interaction_row <- linear_summary %>% dplyr::filter(str_detect(term, "group.*time|time.*group|:"))if (nrow(linear_interaction_row) >0) { linear_result <-tibble(sim_id = sim,model_type ="Linear",interaction_estimate = linear_interaction_row$estimate[1],interaction_se = linear_interaction_row$std.error[1],interaction_lower = linear_interaction_row$conf.low[1],interaction_upper = linear_interaction_row$conf.high[1],significant = linear_interaction_row$p.value[1] <0.05,converged =TRUE,high_baseline_mean = linear_preds[1],high_postshock_mean = linear_preds[2],low_baseline_mean = linear_preds[3],low_postshock_mean = linear_preds[4],cohens_d = cohens_d_linear ) } else { linear_result <-tibble(sim_id = sim,model_type ="Linear",interaction_estimate =NA_real_,interaction_se =NA_real_,interaction_lower =NA_real_,interaction_upper =NA_real_,significant =FALSE,converged =FALSE,high_baseline_mean =NA_real_,high_postshock_mean =NA_real_,low_baseline_mean =NA_real_,low_postshock_mean =NA_real_,cohens_d =NA_real_ ) }# Fit frequentist ordinal model# Use a placeholder to ensure bind_rows always has a full row even if tryCatch fails ordinal_result_placeholder <-tibble( sim_id = sim,model_type ="Ordinal",interaction_estimate =NA_real_,interaction_se =NA_real_,interaction_lower =NA_real_,interaction_upper =NA_real_,significant =FALSE,converged =FALSE,high_baseline_mean =NA_real_,high_postshock_mean =NA_real_,low_baseline_mean 
=NA_real_,low_postshock_mean =NA_real_,cohens_d =NA_real_ ) ordinal_result <- ordinal_result_placeholder # Initialize with placeholdertryCatch({ ordinal_mc <- ordinal::clm(crime_ordered ~ group * time, data = sim_data_mc) ordinal_tidy <- broom::tidy(ordinal_mc, conf.int =TRUE)# Get ordinal predictions and convert to expected values ordinal_preds_probs <-predict(ordinal_mc, newdata = newdata_pred, type ="prob")$fit ordinal_expected_values <-apply(ordinal_preds_probs, 1, function(probs) sum((0:6) * probs))# Find interaction term for its coefficient ordinal_interaction_row <- ordinal_tidy %>% dplyr::filter(str_detect(term, "groupLow Constraint.*timePost-Shock|timePost-Shock.*groupLow Constraint"))if (nrow(ordinal_interaction_row) >0) {# Construct ordinal_result tibble here, ensuring all columns are assigned ordinal_result <-tibble(sim_id = sim,model_type ="Ordinal",interaction_estimate = ordinal_interaction_row$estimate[1],interaction_se = ordinal_interaction_row$std.error[1],interaction_lower = ordinal_interaction_row$conf.low[1],interaction_upper = ordinal_interaction_row$conf.high[1],significant = ordinal_interaction_row$p.value[1] <0.05,converged =TRUE,high_baseline_mean = ordinal_expected_values[1],high_postshock_mean = ordinal_expected_values[2],low_baseline_mean = ordinal_expected_values[3],low_postshock_mean = ordinal_expected_values[4],cohens_d = ordinal_interaction_row$estimate[1] # Latent coef is Cohen's d equivalent ) } else {# If interaction_row not found, use placeholder defaults and set converged to FALSE ordinal_result <- ordinal_result_placeholder ordinal_result$converged <-FALSE } }, error =function(e) {cat("Simulation", sim, "ordinal model failed:", e$message, "\n")# If error, use placeholder defaults ordinal_result <- ordinal_result_placeholder ordinal_result$converged <-FALSE })# Store results sim_results <-bind_rows(sim_results, linear_result, ordinal_result) }# Save simulation resultssaveRDS(sim_results, monte_carlo_file)cat("Enhanced Monte Carlo simulation complete. Results cached.\n")}
Loading cached enhanced Monte Carlo results...
Loaded results for 1000 simulations
This section examines how well each modeling approach recovers its respective estimand across repeated sampling. The key question: do linear and ordinal models systematically differ in their estimates of treatment heterogeneity, given their distinct conceptualizations of the underlying social processes?
Show code
# Filter to successfully converged modelsconverged_results <- sim_results %>% dplyr::filter(converged ==TRUE, !is.na(interaction_estimate))# Calculate summary statistics for interaction estimatesinteraction_summary <- converged_results %>%group_by(model_type) %>%summarise(n =n(),mean_estimate =mean(interaction_estimate, na.rm =TRUE),median_estimate =median(interaction_estimate, na.rm =TRUE),sd_estimate =sd(interaction_estimate, na.rm =TRUE),proportion_significant =mean(significant, na.rm =TRUE), # FIXED: Use 'significant' not 'interaction_significant'q25 =quantile(interaction_estimate, 0.25, na.rm =TRUE),q75 =quantile(interaction_estimate, 0.75, na.rm =TRUE),.groups ="drop" )gt(interaction_summary) %>%tab_header(title ="Interaction Effect Sampling Distribution Summary",subtitle =paste0("Based on ", n_sims, " Monte Carlo simulations") ) %>%fmt_number(columns =c(mean_estimate, median_estimate, sd_estimate, q25, q75), decimals =4) %>%fmt_percent(columns = proportion_significant, decimals =1) %>%cols_label(model_type ="Model Type",n ="N Converged",mean_estimate ="Mean",median_estimate ="Median",sd_estimate ="SD",proportion_significant ="% Significant",q25 ="Q25",q75 ="Q75" ) %>%tab_footnote(footnote ="Interaction represents group × time effect. For the linear model, this is the effect on the conditional mean. For the ordinal model, it is the effect on the latent scale, reflecting distributional shifts.",locations =cells_column_labels(columns = mean_estimate) )
Interaction Effect Sampling Distribution Summary
Based on 1000 Monte Carlo simulations

Model Type   N Converged   Mean [1]   Median    SD       % Significant   Q25       Q75
Linear       1000          −0.0746    −0.0740   0.0469   1.8%            −0.1070   −0.0440
Ordinal      1000          0.4065     0.4076    0.0587   99.9%           0.3672    0.4454

[1] Interaction represents group × time effect. For the linear model, this is the effect on the conditional mean. For the ordinal model, it is the effect on the latent scale, reflecting distributional shifts.
Show code
# Reference the mean changes from our latent process (the "true" linear estimand)
cat("Mean Changes from Latent Process (True Linear Estimand):\n")
Mean Changes from Latent Process (True Linear Estimand):
cat("Difference in mean changes (True Linear Interaction):", round(true_interaction, 4), "\n\n")
Difference in mean changes (True Linear Interaction): 0
Show code
cat("Expected vs Observed Model Behavior:\n")
Expected vs Observed Model Behavior:
Show code
cat("- Linear models should estimate interaction ≈", round(true_interaction, 4), " (unbiased for its estimand)\n")
- Linear models should estimate interaction ≈ 0 (unbiased for its estimand)
Show code
cat("- Ordinal models may estimate interaction ≠", round(true_interaction, 4), " because they quantify a different aspect of heterogeneity (latent scale shift)\n")
- Ordinal models may estimate interaction ≠ 0 because they quantify a different aspect of heterogeneity (latent scale shift)
Show code
cat("- This highlights that depending on the research question, different estimands and models may be more appropriate for capturing complex social processes.\n\n")
- This highlights that depending on the research question, different estimands and models may be more appropriate for capturing complex social processes.
Show code
# Test if model types differ significantly in their estimatesif (nrow(converged_results) >0) { linear_estimates <- converged_results %>% dplyr::filter(model_type =="Linear") %>%pull(interaction_estimate) ordinal_estimates <- converged_results %>% dplyr::filter(model_type =="Ordinal") %>%pull(interaction_estimate)# Welch's t-test for difference in meansif (length(linear_estimates) >1&length(ordinal_estimates) >1) { estimate_comparison <-t.test(ordinal_estimates, linear_estimates)cat("Difference between model estimates (Ordinal - Linear):\n")cat("Mean difference:",round(estimate_comparison$estimate[1] - estimate_comparison$estimate[2], 4), "\n")cat("95% CI:", round(estimate_comparison$conf.int[1], 4),"to", round(estimate_comparison$conf.int[2], 4), "\n")cat("p-value:", format.pval(estimate_comparison$p.value, digits =4), "\n") }}
Difference between model estimates (Ordinal - Linear):
Mean difference: 0.48
95% CI: 0.48 to 0.49
p-value: < 0.00000000000000022
This visualization shows the sampling distributions of interaction effect estimates from both modeling approaches. It highlights how linear models accurately recover the null mean-based interaction, while ordinal models detect a statistically significant latent-scale interaction that reflects the underlying distributional heterogeneity.
Show code
# Find the significance transition thresholds for each modeltransition_thresholds <- converged_results %>%group_by(model_type) %>%arrange(interaction_estimate) %>%summarise(# Find where significance changes from TRUE to FALSE (upper threshold)sig_upper_threshold =max(interaction_estimate[significant], na.rm =TRUE), # # Find where significance changes from FALSE to TRUE (lower threshold) sig_lower_threshold =min(interaction_estimate[significant], na.rm =TRUE), # .groups = "drop" )# Print the transition thresholdscat("Significance Transition Thresholds:\n")
Significance Transition Thresholds:
Show code
print(transition_thresholds)
# A tibble: 2 × 3
model_type sig_upper_threshold sig_lower_threshold
<chr> <dbl> <dbl>
1 Linear -0.174 -0.233
2 Ordinal 0.611 0.233
Show code
# Find the overall range where estimates are non-significant for BOTH models
overall_nonsig_min <- max(transition_thresholds$sig_lower_threshold)  # Right edge of leftmost significant region
overall_nonsig_max <- min(transition_thresholds$sig_upper_threshold)  # Left edge of rightmost significant region
cat("\nOverall non-significant region (both models):\n")
# Calculate density datacreate_density_data <-function(data, model_name) { estimates <- data$interaction_estimate[data$model_type == model_name] dens <-density(estimates, n =512)tibble(x = dens$x, y = dens$y, model_type = model_name)}linear_dens <-create_density_data(converged_results, "Linear")ordinal_dens <-create_density_data(converged_results, "Ordinal")all_dens <-bind_rows(linear_dens, ordinal_dens)# Create the plot with grey box in the middle (non-significant for both)sampling_dist_plot <-ggplot() +# Full density curvesgeom_line(data = all_dens,aes(x = x, y = y, color = model_type),size =1 ) +geom_area(data = all_dens,aes(x = x, y = y, fill = model_type),alpha =0.5 ) +# Grey box covering the region where BOTH models are non-significantgeom_rect(aes(xmin = overall_nonsig_min, xmax = overall_nonsig_max, ymin =-Inf, ymax =Inf),fill ="grey50",alpha =0.2 ) +# Reference linesgeom_vline(xintercept =0, linetype ="dashed", size =1, color ="black") +scale_fill_viridis_d(name ="Model Type", option ="plasma", end =0.8) +scale_color_viridis_d(name ="Model Type", option ="plasma", end =0.8) +scale_x_continuous(name ="Group × Time Interaction Estimate", breaks =seq(-0.2, 0.7, 0.10)) +scale_y_continuous(name ="Density") +theme_minimal() +theme(text =element_text(family ="serif"), legend.position ="bottom", axis.title =element_blank(), plot.title =element_text(size =11, face ="bold"),plot.subtitle =element_text(size =10) ) +labs(title ="D. Sampling Distributions of Interaction Coefficient",subtitle ="Based on N=1,000 Monte Carlo simulations per model type\nGrey box = region where both models yield non-significant estimates" )sampling_dist_plot
Show code
# Create significance detection plotsignificance_data <- converged_results %>%group_by(model_type, significant) %>%# FIXED: Use 'significant'summarise(count =n(), .groups ="drop") %>%group_by(model_type) %>%mutate(total =sum(count),proportion = count / total,significance_label =ifelse(significant, "Significant", "Non-significant") )sig_plot <-ggplot(significance_data, aes(x = model_type, y = proportion, fill = significance_label)) +geom_col(position ="stack", alpha =0.8) +geom_text(aes(label =paste0(round(proportion *100, 1), "%")),position =position_stack(vjust =0.5), size =4, fontface ="bold") +scale_fill_manual(name ="Effect Detection",values =c("Significant"="#440154", "Non-significant"="#fde725"),guide =guide_legend(reverse =TRUE) ) +scale_y_continuous(name ="Proportion", labels =percent_format()) +scale_x_discrete(name ="Model Type") +theme_minimal() +theme(legend.position ="bottom",axis.title =element_text(size =10),plot.title =element_text(size =11, face ="bold", hjust =0.5) ) +labs(title ="Proportion of Simulations Detecting Significant Interaction")# Combine plotscombined_mc_plot <- sampling_dist_plot / sig_plot +plot_layout(heights =c(2, 1)) +plot_annotation(title ="Figure S1B. Monte Carlo Simulation Results: Linear vs. Ordinal Models",subtitle ="Heterogeneous treatment effects in zero-inflated crime data: Different estimands lead to different conclusions",theme =theme(plot.title =element_text(size =11, face ="bold", hjust =0.5),plot.subtitle =element_text(size =10, hjust =0.5) ) )# combined_mc_plot
Show code
# Calculate summary statistics using the Monte Carlo resultsenhanced_mc_summary <- sim_results %>% dplyr::filter(converged ==TRUE) %>%group_by(model_type) %>%summarise(N_estimates =n(),Mean_Interaction_Est =mean(interaction_estimate, na.rm =TRUE),Prop_Interaction_Sig =mean(significant, na.rm =TRUE),# These are now true Monte Carlo averagesAvg_High_Pre =mean(high_baseline_mean, na.rm =TRUE),Avg_High_Post =mean(high_postshock_mean, na.rm =TRUE),Avg_Low_Pre =mean(low_baseline_mean, na.rm =TRUE),Avg_Low_Post =mean(low_postshock_mean, na.rm =TRUE),Avg_Cohens_D =mean(cohens_d, na.rm =TRUE),# Add variability measuresSD_Cohens_D =sd(cohens_d, na.rm =TRUE),.groups ="drop" )gt(enhanced_mc_summary) %>%tab_header(title ="Monte Carlo Simulation: Comprehensive Summary",subtitle ="Model Coefficients and Predicted Conditional Means (Averaged Across 1000 Simulations)" ) %>%fmt_number(columns =c(Mean_Interaction_Est, Avg_High_Pre, Avg_High_Post, Avg_Low_Pre, Avg_Low_Post, Avg_Cohens_D, SD_Cohens_D), decimals =3) %>%fmt_percent(columns = Prop_Interaction_Sig, decimals =1) %>%cols_label(model_type ="Model Type",N_estimates ="N",Mean_Interaction_Est ="Interaction Coef.",Prop_Interaction_Sig ="% Sig.",Avg_High_Pre ="High/Pre",Avg_High_Post ="High/Post",Avg_Low_Pre ="Low/Pre",Avg_Low_Post ="Low/Post",Avg_Cohens_D ="Cohen's d",SD_Cohens_D ="SD(d)" ) %>%tab_spanner(label ="Average Predicted Conditional Means",columns =c(Avg_High_Pre, Avg_High_Post, Avg_Low_Pre, Avg_Low_Post) ) %>%tab_spanner(label ="Overall Treatment Effect Size",columns =c(Avg_Cohens_D, SD_Cohens_D) ) %>%tab_footnote(footnote ="Interaction coefficient from model. For the Linear Model, this reflects the average difference in conditional means. For the Ordinal Model, this reflects the average difference in latent means (on a standardized scale where latent SD=1).",locations =cells_column_labels(columns = Mean_Interaction_Est) ) %>%tab_footnote(footnote ="Averaged across 1000 Monte Carlo simulations. Shows both models recover similar conditional means when extracted from predicted distributions.",locations =cells_column_spanners(spanners ="Average Predicted Conditional Means") ) %>%tab_footnote(footnote ="Cohen's d: For the Linear Model, this is Cohen's d for the pooled overall treatment effect, calculated on the observed outcome scale. For the Ordinal Model, this is the latent Cohen's d for the interaction, which is the direct estimate of the interaction coefficient on the standardized latent scale (where latent SD=1). For the Ordinal Model, the 'Interaction Coef.' and 'Cohen's d' columns will display the same average value, as the coefficient itself is directly interpretable as Cohen's d on the latent scale. SD(d) shows variability across simulations.",locations =cells_column_spanners(spanners ="Overall Treatment Effect Size") )
Monte Carlo Simulation: Comprehensive Summary
Model Coefficients and Predicted Conditional Means (Averaged Across 1000 Simulations)

Model Type   N      Interaction Coef. [3]   % Sig.   High/Pre   High/Post   Low/Pre   Low/Post   Cohen's d   SD(d)
Linear       1000   −0.075                  1.8%     0.902      1.500       0.899     1.422      0.397       0.015
Ordinal      1000   0.406                   99.9%    0.948      1.248       0.944     1.559      0.406       0.059

Columns High/Pre through Low/Post are the Average Predicted Conditional Means [1]; Cohen's d and SD(d) summarize the Overall Treatment Effect Size [2].

[1] Averaged across 1000 Monte Carlo simulations. Shows both models recover similar conditional means when extracted from predicted distributions.
[2] Cohen's d: For the Linear Model, this is Cohen's d for the pooled overall treatment effect, calculated on the observed outcome scale. For the Ordinal Model, this is the latent Cohen's d for the interaction, which is the direct estimate of the interaction coefficient on the standardized latent scale (where latent SD = 1). For the Ordinal Model, the 'Interaction Coef.' and 'Cohen's d' columns will display the same average value, as the coefficient itself is directly interpretable as Cohen's d on the latent scale. SD(d) shows variability across simulations.
[3] Interaction coefficient from model. For the Linear Model, this reflects the average difference in conditional means. For the Ordinal Model, this reflects the average difference in latent means (on a standardized scale where latent SD = 1).
5.11 Combined Visualization
Show code
# Define the layout for the 4 panelsdesign <-" AB CD"# Combine the plots using their generated objects from previous chunks# Apply labels (A, B, C, D) directly within the wrap_plots call for consistent placement.combined_s1_plot <-wrap_plots(A = theoretical_dist_plot, B = prob_plot_bayes, C = ppcheck_combined, D = sampling_dist_plot ) +plot_layout(design = design) +# Apply the defined layoutplot_annotation(title ="Figure S1B. Linear & Ordinal Modeling of Heterogeneous Effects in Simulated Crime Data",theme =theme(plot.title =element_text(hjust =0.5, size =12, face ="bold") ) ) combined_s1_plot
This simulation demonstrates several critical points about ordinal modeling for criminological research:
Mean Equivalence Can Mask Underlying Heterogeneity: Linear models, which focus on conditional means, can entirely miss fundamental differences in how effects manifest. Even when complex social mechanisms lead to similar average changes in outcomes across groups, the linear model will accurately report a near-null mean-based interaction while failing to detect the underlying (categorically) divergent processes.
Divergent Inferential Conclusions on Heterogeneity: As a direct consequence, linear models may yield non-significant or misleadingly small (or even mis-signed) coefficients for heterogeneous effects, while ordinal models (whose coefficients reflect shifts on the latent scale) consistently detect a statistically significant and substantively meaningful interaction, leading to fundamentally different inferential conclusions about the presence and nature of heterogeneity.
Ordinal Models Capture Distinct Mechanisms: By explicitly modeling the categorical nature of the data and an underlying latent variable, the cumulative ordinal model in this example was sensitive to different types of change (e.g., prevalence effects where abstainers begin offending vs. severity effects where existing offenders escalate). This provides a richer and more accurate understanding of the social processes, which are obscured by mean-focused linear models.
Superiority Despite Assumption Violations: The design explicitly encodes heterogeneous treatment effects that violate the cumulative ordinal model’s proportional odds assumption. However, the cumulative ordinal model still offers a more informative signal about underlying heterogeneity than the linear model, which fundamentally mischaracterizes the ordinal outcome distribution.
Implications for Causal Inference and Robustness Checks: Relying on linear models as “robustness checks” by simply comparing coefficient signs and significance can be deeply misleading. This simulation demonstrates that models sensitive to distributional patterns can provide better insights into differential causal mechanisms, thereby enhancing the accuracy of causal inference and substantive interpretation.
Bottom Line: Researchers who focus solely on coefficient direction and statistical significance would miss significant underlying heterogeneity, concluding “no group difference” when distinct social mechanisms are at play. The methodological choice determines whether you overlook fundamental differences in how groups experience an effect (linear model) versus accurately identify divergent underlying causal processes (cumulative ordinal model), leading to profoundly different substantive conclusions from the same data.
Now, let’s move on to illustrate some of these issues with real data. We are ready to load and skim data shared by Pickett and colleagues from their study on racial differences in fear of police.
6 PGC Data
In this tutorial, we are not trying to reproduce the numerous models or results for all outcomes reported in the authors’ original manuscript (though we certainly encourage that as a laudable goal). Rather, we want to dig deeply into the application of cumulative probit models, so we will restrict our focus primarily to describing relationships between participants’ self-reported “race” and their “personal fear” of the police.
6.1 Load data
Pickett, Graham, and Cullen (hereafter PGC) provided Stata data and code (.do) files. Let’s start by reading in their data file.
Show code
dat <- read_dta(file = here("Data", "feardata(12).dta"))
Initially, Jake loaded data and translated all of PGC’s data wrangling code from their Stata .do file into R in another file. Here, since we are focusing solely on a handful of key items, we will filter the data to keep only those relevant items and then skim the data.
Show code
dat <- dat %>% dplyr::select(caseid, race, Q2_1, Q2_2, Q2_3, Q2_4, Q2_5, Q2_6, Q2_7, Q2_8, Q2_9, Q2_10)
Now, let’s use haven to zap Stata formats and attributes, then skim the data.
Show code
# str(dat)
dat <- zap_formats(dat)
dat <- zap_labels(dat)
dat <- zap_label(dat)
# str(dat)
dat %>% dplyr::select(-caseid) %>% datasummary_skim()
Variable   Unique   Missing Pct.   Mean   SD    Min   Median   Max
race       7        0              1.8    1.1   1.0   2.0      7.0
Q2_1       5        0              3.0    1.3   1.0   3.0      5.0
Q2_2       5        0              3.1    1.3   1.0   3.0      5.0
Q2_3       6        0              3.0    1.3   1.0   3.0      5.0
Q2_4       5        0              3.1    1.4   1.0   3.0      5.0
Q2_5       6        0              3.1    1.5   1.0   3.0      5.0
Q2_6       5        0              3.0    1.5   1.0   3.0      5.0
Q2_7       5        0              3.1    1.5   1.0   3.0      5.0
Q2_8       5        0              3.0    1.5   1.0   3.0      5.0
Q2_9       5        0              3.0    1.5   1.0   3.0      5.0
Q2_10      5        0              3.0    1.6   1.0   3.0      5.0
(Histogram sparklines from datasummary_skim() omitted.)
6.2 Race/Ethnicity
We will restrict our focus to fear contrasts between Black and White participants; this was the central contrast reported throughout PGC’s paper. In some figures, PGC also reported results for a combined “other” race/ethnicity group; however, that group’s small subsample size generated much noisier estimates (i.e., very wide confidence bounds).
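A minimal sketch of how this restriction might be coded is below. The numeric codes (1 = White, 2 = Black) are assumptions for illustration only; check PGC's codebook and .do file for the actual value labels.

# Hedged sketch: restrict to Black and White respondents and label the factor
# (codes 1 = White and 2 = Black are assumed here, not confirmed from the codebook)
dat <- dat %>%
  dplyr::filter(race %in% c(1, 2)) %>%
  mutate(race = factor(race, levels = c(1, 2), labels = c("White", "Black")))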
Here is PGC’s description of the “personal fear” measure.
“For our survey, we developed original measures of both personal and altruistic fear of the police (see the appendix at the end of the article). In so doing, we followed best practices for measuring fear (Ferraro, 1995), and we modeled our measures on those used in recent studies to examine similar emotions in other contexts (e.g., Burton et al., 2021; Pickett, Roche, et al., 2018). To measure personal fear, we asked respondents about their emotional fear “that the police will do the following things to you without good reason in the next five years.” They rated how afraid they were (0 = very unafraid, 4 = very afraid) of falling victim to ten types of police mistreatment (e.g., “punch or kick you,” “pepper spray you,” or “kill you”). The full list of items and question wording are provided in the appendix. We averaged the ten items to construct an index (loadings: .823 to .958, α = .98) on which higher values indicated greater personal fear.
Here is the precise description of the items from their appendix:
Stem: “Now, we want to ask about EMOTIONAL FEAR. How afraid or unafraid are you that the POLICE will do the following things to you WITHOUT GOOD REASON in the next five years?”
Items: 1. Stop you 2. Search you 3. Yell at you 4. Handcuff you 5. Punch or kick you 6. Pin you to the ground 7. Pepper spray you 8. Use a taser on you 9. Shoot at you with a gun 10. Kill you
Response Scale: Very afraid, afraid, neither afraid nor unafraid, unafraid, very unafraid.
Following PGC, we will recode each of the 10 items so that higher values indicate greater fear. Specifically, the recoded items are: 0 = very unafraid; 1 = unafraid; 2 = neither afraid nor unafraid; 3 = afraid; 4 = very afraid. We will also follow their column naming conventions.
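A minimal sketch of one way such a recode could be written, assuming the original 1-5 coding runs from very afraid (1) to very unafraid (5) and using the fearpol_old/fearpol_new naming that appears in later chunks:

# Hedged sketch of the item recode (assumed, not the exact code used for this supplement)
fearpol_old <- paste0("Q2_", 1:10)
fearpol_new <- c("stopyou", "searchyou", "yellyou", "handcyou", "kickyou",
                 "pinyou", "sprayyou", "taseyou", "shootyou", "killyou")

# Reverse-score so higher values indicate greater fear (original 1-5 becomes 4-0),
# creating recoded copies under the new names (originals are dropped later)
for (i in seq_along(fearpol_old)) {
  dat[[fearpol_new[i]]] <- 5 - dat[[fearpol_old[i]]]
}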
Recoding worked as intended. It appears there is one missing (NA) observation on Q2_5 (“kickyou”). It is not completely clear to me how PGC handled this particular NA value or missing data in general. Jake addressed missingness in more detail (regarding an “income” variable) in a different file where he was adapting PGC’s code to R, but I did not see any info related to how this one case was treated. Given our aims, since there is only one NA value, I am going to drop it and move on, with the expectation that any differences in reproducing estimates resulting from this decision will be trivial. However, it is worth noting that ordinal regression methods are fully compatible with multiple imputation approaches to missing data.
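If one preferred to retain that case instead, a hedged sketch of combining multiple imputation with a cumulative probit model in brms (assuming the mice package, which is not loaded above; the outcome and predictor names refer to variables constructed later in this supplement) might look like:

library(mice)                                   # assumption: mice is installed
imp <- mice(dat, m = 5, seed = 1138)            # five imputed datasets
fit_mi <- brm_multiple(
  stopyou_ord ~ race,                           # variables created later in this supplement
  data = imp,                                   # brm_multiple fits the model to each imputation
  family = cumulative("probit")                 # and combines the posterior draws
)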
Show code
dat <- dat %>% dplyr::select(-all_of(fearpol_old)) %>% drop_na()
dat %>% dplyr::select(-caseid) %>% datasummary_skim()
Variable    Unique   Missing Pct.   Mean   SD    Min   Median   Max
stopyou     5        0              2.0    1.3   0.0   2.0      4.0
searchyou   5        0              1.9    1.3   0.0   2.0      4.0
yellyou     5        0              1.9    1.3   0.0   2.0      4.0
handcyou    5        0              1.9    1.4   0.0   2.0      4.0
kickyou     5        0              1.9    1.5   0.0   2.0      4.0
pinyou      5        0              1.9    1.5   0.0   2.0      4.0
sprayyou    5        0              1.9    1.5   0.0   2.0      4.0
taseyou     5        0              1.9    1.5   0.0   2.0      4.0
shootyou    5        0              1.9    1.5   0.0   2.0      4.0
killyou     5        0              2.0    1.6   0.0   2.0      4.0
(Histogram sparklines from datasummary_skim() omitted.)

race     N     %
White    504   49.4
Black    516   50.6
Now that we have data on 10 ordinal outcomes and one dichotomous categorical predictor, we can move on to modeling the data.
7 (Not-so-)Normal models
A primary purpose of statistical modeling is to accurately estimate relationships between observed variables in a data set; if the data and model are good, then a model can help extract accurate “signals,” like causal inferences or predictions about future outcomes, from observed data. We learn in introductory statistics courses that all models require assumptions. Some of the assumptions we make are warranted; others are unwarranted yet carry negligible penalties when violated. Violating still other unwarranted assumptions, however, puts our model at risk of failing to meet our primary goal of accurately estimating relationships among observed variables.
One such assumption is that popular linear modeling techniques designed for metric, normally distributed data will perform well when applied to items or scales created from ordinal items. Liddell and Kruschke (2018) convincingly show how such assumptions can go wrong, generating model predictions that fit the data poorly and producing substantial errors of magnitude, and even of direction, in inference. Rather than duplicate those efforts, we refer readers to their paper and other work on the topic (e.g., Harrell; Bürkner & Vuorre). Instead, in this tutorial, we introduce criminologists to this important topic by illustrating how cumulative probit regression models, and visualization of predicted probabilities from those models, can improve the signals we extract from ordinal data.
We will use R packages to apply Bayesian ordinal regression techniques to these data. The posterior distributions that Bayesian models generate are advantageous for subsequent model checking and flexible visualization of results. However, traditional frequentist approaches for fitting ordinal regressions are available in popular statistical programs (e.g., R: ordinal, MASS::polr (https://rdrr.io/cran/MASS/man/polr.html), rms::orm; SAS: proc probit, proc logistic; Stata: ologit, oprobit; continuous Y outcomes: the R package rms::orm or SAS JMP; quick-start tutorials: R, Stata, SAS).
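For readers who want a frequentist quick start, here is a minimal sketch of a cumulative probit fit with MASS::polr. It is illustrative only and is not one of the models reported below; the outcome and predictor names refer to variables constructed later in this supplement.

# Frequentist cumulative probit via MASS::polr (Hess = TRUE stores the Hessian for SEs)
freq_cp <- MASS::polr(stopyou_ord ~ race, data = dat, method = "probit", Hess = TRUE)
summary(freq_cp)   # slope on the latent scale plus estimated thresholds (stored in freq_cp$zeta)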
Topics covered:
Introduce the mosaic plot as an alternative to a scatterplot for visualizing bivariate ordinal response distributions
Compare how accurately intercept-only models recover underlying data distributions across linear and cumulative probit regression techniques.
Compare linear and cumulative probit regression results of item-specific analyses estimating race differences in fear of police on latent item scales (regression betas)
Briefly introduce application of category-specific probit regression techniques for even more accurate recovery of underlying distributions (for online supplement only).
Compare results from a linear model applied to PGC’s multi-item composite scale measuring latent fear of police to those from an ordinal multilevel model predicting latent fear of police.
Extract race-specific predicted probabilities, calculate Black-White contrasts, and illustrate visualizations for conveying these estimates.
8 Descriptive visualizations
Before we start modeling, it is a good idea to visualize our data. Let’s start by viewing the ten observed item distributions using frequency bar graphs. Remember, we want our models to accurately recover and then conditionally predict meaningful characteristics of these distributions (e.g., their means or ordinal thresholds in linear and cumulative probit models, respectively).
8.1 Univariate bar charts
Show code
# Get colors from viridis magma scale (between 0.2 and 0.8)magma_colors <-viridis(n =5, option ="magma", begin =0.2, end =0.8)# show_col(magma(50))# Create outcome labels fear_labels <-c("stopyou"="Stop You","searchyou"="Search You", "yellyou"="Yell at You","handcyou"="Handcuff You","kickyou"="Kick/Hit You","pinyou"="Pin You Down","sprayyou"="Pepper Spray You","taseyou"="Tase You","shootyou"="Shoot at You","killyou"="Kill You")# Create response level labelsresponse_labels <-c("0"="Very Unafraid","1"="Unafraid","2"="Neither","3"="Afraid","4"="Very Afraid")# Modified function to show reference lines at 25% and 50%plot_outcome <-function(data, var, label) {ggplot(data, aes(y =factor(.data[[var]], levels =0:4,labels = response_labels,ordered =TRUE))) +geom_vline(xintercept =c(102, 204, 306), color ="gray80", linetype ="dashed") +geom_bar(aes(fill =factor(.data[[var]], levels =0:4,labels = response_labels,ordered =TRUE))) +coord_cartesian(expand =FALSE) +scale_fill_manual(values = magma_colors) +labs(y =NULL,title = label ) +theme_minimal() +theme(plot.title =element_text(size =10, hjust =0.5),axis.title.x =element_blank(),legend.position ="none",panel.grid.major.y =element_blank(),plot.margin =margin(t =5, r =5, b =15, l =5) ) +scale_x_continuous(breaks =c(102, 204, 306),labels =c(".1", ".2", ".3") )}# Create all plotsplots <-list()for(i inseq_along(fearpol_new)) { plots[[i]] <-plot_outcome( dat, fearpol_new[i], fear_labels[fearpol_new[i]] )}# Combine plots using patchworkfear_fig <-wrap_plots(plots, ncol =5, nrow =2) +plot_annotation(title ="Fear of Police Item-Specific Distributions",subtitle ="Response distribution from Very Unafraid (0) to Very Afraid (4)",caption ="Note: Higher values indicate greater fear (N = 1,020)" ) &theme(plot.title =element_text(size =12),plot.subtitle =element_text(size =10),plot.caption =element_text(size =10, hjust =0) )fear_fig
At first glance, PGC’s new fear measures appear to conflate fear with the perceived risk or probability of experiencing the actions, which is something they explicitly intended to avoid (see p. 293). For instance, more people report being very unafraid that the police will shoot at them or kill them than report being very unafraid that the police will stop them or search them. Maybe this is not really a problem; perhaps the degree of fear one feels about an action intrinsically depends upon one’s perceived probability of experiencing the action (aka the vulnerability model).
Now, let’s illustrate how we can use mosaic plots to visualize bivariate ordinal distributions.
8.2 Mosaic plots
Goal: Introduce the mosaic plot as an alternative to a scatterplot for visualizing bivariate ordinal response distributions
Scatterplots are notoriously bad at generating useful visual descriptions of bivariate relationships between ordinal or categorical variables. In contrast, mosaic plots are designed for this task. Will Ball describes a mosaic plot as “similar to a heatmap, except that the frequency of an observation… is proportional to the area of a tile.”
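A minimal ggmosaic sketch of a single item is below (illustrative only, not the exact figure code used for this supplement); the tile areas are proportional to the joint frequencies of race and the response level.

# Mosaic plot for one item using ggmosaic (loaded earlier)
ggplot(dat) +
  geom_mosaic(aes(x = product(race), fill = factor(stopyou, levels = 0:4))) +
  labs(x = "Race", fill = "Fear: Stop You (0 = Very Unafraid, 4 = Very Afraid)") +
  theme_minimal()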
For those who prefer numbers, we can also summarize these race-specific proportions in a table. However, including all estimates would be a lot of information, so for simplicity let’s focus on summarizing the extreme categories of “very unafraid” (y=0) and “very afraid” (y=4).
8.2.2 APP-B
Show code
# List fear variablesfear_vars <-names(fear_labels)# Calculate proportions for each fear level by racefear_props <- datp %>%pivot_longer(cols =all_of(fear_vars),names_to ="item",values_to ="response" ) %>%# Add item labelsmutate(item_label = fear_labels[item]) %>%# Group by race, item, and response levelgroup_by(race, item, item_label, response) %>%# Count occurrencessummarize(n =n(), .groups ="drop") %>%# Calculate proportions within each race and itemgroup_by(race, item) %>%mutate(prop = n /sum(n, na.rm =TRUE)) %>%ungroup()# Create summary for the extreme values tablefear_summary <- fear_props %>% dplyr::filter(response %in%c("Very Unafraid", "Very Afraid")) %>% dplyr::select(race, item_label, response, prop) %>% tidyr::pivot_wider(names_from =c(race, response),values_from = prop ) %>% dplyr::mutate(diff_unafraid =`Black_Very Unafraid`-`White_Very Unafraid`,diff_afraid =`Black_Very Afraid`-`White_Very Afraid`,item_label =factor(item_label, levels =c("Stop You", "Search You", "Yell at You","Handcuff You", "Kick/Hit You", "Pin You Down", "Pepper Spray You", "Tase You", "Shoot at You", "Kill You")) ) %>% dplyr::arrange(item_label) %>% dplyr::rename(white_unafraid =`White_Very Unafraid`,black_unafraid =`Black_Very Unafraid`,white_afraid =`White_Very Afraid`,black_afraid =`Black_Very Afraid` )# Create gt tablegt_fear <- fear_summary %>%gt() %>%# Column labelscols_label(item_label ="Police Action",white_unafraid ="White",black_unafraid ="Black",diff_unafraid ="Diff",white_afraid ="White",black_afraid ="Black", diff_afraid ="Diff" ) %>%fmt_percent(columns =c(white_unafraid, black_unafraid, diff_unafraid, white_afraid, black_afraid, diff_afraid),decimals =1 ) %>%tab_spanner(label ="Very Unafraid (y=0)",columns =c(white_unafraid, black_unafraid, diff_unafraid) ) %>%tab_spanner(label ="Very Afraid (y=4)",columns =c(white_afraid, black_afraid, diff_afraid) ) %>%tab_header(title ="APPENDIX B. Extreme Fear Responses by Race and Police Action",subtitle ="Percentages reporting highest and lowest levels of fear" ) %>%# Stylingtab_style(style =cell_text(weight ="bold"),locations =cells_column_labels() ) %>%tab_style(style =cell_text(weight ="bold"),locations =cells_column_spanners() ) %>%tab_style(style =cell_text(align ="left"),locations =cells_column_labels(columns = item_label) ) %>%cols_align(align ="left",columns = item_label ) %>%cols_align(align ="center",columns =c(white_unafraid, black_unafraid, diff_unafraid, white_afraid, black_afraid, diff_afraid) )
Show code
gt_fear
APPENDIX B. Extreme Fear Responses by Race and Police Action
Percentages reporting highest and lowest levels of fear

                     Very Unafraid (y=0)            Very Afraid (y=4)
Police Action        White    Black    Diff         White    Black    Diff
Stop You             28.8%    8.9%     −19.9%       6.3%     25.4%    19.0%
Search You           31.5%    9.9%     −21.7%       6.5%     24.6%    18.1%
Yell at You          29.2%    10.3%    −18.9%       7.9%     24.6%    16.7%
Handcuff You         33.1%    9.7%     −23.4%       7.3%     28.7%    21.3%
Kick/Hit You         41.9%    10.5%    −31.4%       8.7%     32.6%    23.8%
Pin You Down         41.1%    10.5%    −30.6%       9.5%     34.7%    25.2%
Pepper Spray You     41.7%    10.5%    −31.2%       8.9%     32.6%    23.6%
Tase You             41.9%    10.3%    −31.6%       9.5%     34.3%    24.8%
Shoot at You         44.2%    10.5%    −33.8%       10.7%    40.3%    29.6%
Kill You             44.2%    11.4%    −32.8%       10.3%    43.6%    33.3%
These mosaic plots help us clearly see that substantial race differences exist in the ordinal response distribution for each fear of police item.
Of course, we want a precise way to describe and estimate these effects, and we want to do so using a modeling technique that would permit adjustment for potential confounders and communicate the degree of uncertainty in our estimates.
So, let’s move on to modeling joint distributions.
9 Intercept-only models
Goal: Compare how accurately intercept-only models recover underlying data distributions across linear and cumulative probit regression techniques.
We start by comparing linear and ordinal Bayesian “intercept-only” models to illustrate the poor fit of linear models to ordinal data and the substantial improvement in predictive fit to the underlying data distribution that we gain by using ordinal (e.g., cumulative probit) modeling techniques instead.
We begin with the Bayesian equivalent of simple means or intercept-only models for each of the ten outcome items. As implied by the name, each intercept-only model unconditionally estimates only the intercepts (e.g., mean values or ordinal thresholds) for each fear item without predictors.
Before we do this, let’s briefly discuss prior distributions.
9.1 Prior distributions
If we were using classical (frequentist) techniques, these intercept estimates would be computed directly from the data or likelihood. In a Bayesian model, these estimates represent the center of a posterior sample of estimates that are each generated by combining the likelihood (e.g., the observed mean or ordinal thresholds in the data) with a prior distribution of plausible values for the parameter in question. In constructing our Bayesian models, we specify highly diffuse or “flat” prior distributions so that our posterior estimates are driven primarily by the likelihood and results will converge with estimates generated from comparable frequentist statistical models.
Alternatively, one might specify stronger or “more informative” prior distributions that, for instance, specify a certain range of values as highly implausible or even as impossible; doing so will result in posterior estimates that deviate from likelihood-only estimates in predictable, prior-specified ways. For a simple example, a classical model of self-reported crime counts might sometimes generate negative predicted counts for some subgroups and improbably extreme positive counts for other subgroups; in a Bayesian model, one could specify an a priori distribution of plausible values for which negative counts are impossible and extreme positive counts are improbable.
We will use the rstanarm package for linear models, as it specifies diffuse priors that are scaled by default to the outcome’s response distribution. We will use the brms package for cumulative probit models and will manually specify flat (equal probability) priors for the ordinal thresholds. These priors allow for a wide range of likelihood-driven estimates and ensure that results will be comparable to those generated using similar frequentist techniques.
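For reference, the prior means used for the ordinal thresholds below correspond to the latent-normal cutpoints that would place equal probability (0.2) in each of the five response categories:

# Cumulative probabilities 0.2, 0.4, 0.6, 0.8 mapped to the standard-normal scale
qnorm(c(0.2, 0.4, 0.6, 0.8))
# approximately -0.84, -0.25, 0.25, 0.84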
9.2 Linear: Lin~1 model (poor fit)
I start by estimating a simple linear intercept model of stopyou with no predictors to illustrate the poor fit of a linear model to the ordinal outcome. Here, I use Bayesian estimation with default priors; again, Bayesian models will be beneficial for model building and visualization later.
Show code
# List to store intercept-only linear modelsint_lin_mods <-list()# Run models for each outcome with manual caching to save model fitsfor(outcome in fearpol_new) {# Create file path for cached model model_path <-paste0("Models/int_lin_mod_", outcome, ".rds")# Check if cached model existsif (file.exists(model_path)) { int_lin_mods[[outcome]] <-readRDS(model_path) } else { formula <-as.formula(paste0(outcome, " ~ 1")) int_lin_mods[[outcome]] <-stan_glm(formula = formula,data = dat,family = gaussian,# rstanarm defaults:# prior_intercept = normal(location = y_mean, scale = 2.5 * y_sd)# prior_aux = exponential(rate = 1/y_sd)chains =4,cores = nCoresphys,seed =1138 )# Save modelsaveRDS(int_lin_mods[[outcome]], model_path) }}
We ran intercept-only linear models for all ten outcomes. Now, let’s examine detailed output for the first model.
Model Info:
function: stan_glm
family: gaussian [identity]
formula: stopyou ~ 1
algorithm: sampling
sample: 4000 (posterior sample size)
priors: see help('prior_summary')
observations: 1020
predictors: 1
Estimates:
mean sd 2.5% 97.5%
(Intercept) 1.99 0.04 1.91 2.07
sigma 1.33 0.03 1.27 1.39
Fit Diagnostics:
mean sd 2.5% 97.5%
mean_PPD 1.99 0.06 1.87 2.11
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.00 1.00 2355
sigma 0.00 1.00 2732
mean_PPD 0.00 1.00 2932
log-posterior 0.02 1.00 1630
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Priors for model 'int_lin_mods[["stopyou"]]'
------
Intercept (after predictors centered)
Specified prior:
~ normal(location = 2, scale = 2.5)
Adjusted prior:
~ normal(location = 2, scale = 3.3)
Auxiliary (sigma)
Specified prior:
~ exponential(rate = 1)
Adjusted prior:
~ exponential(rate = 0.75)
------
See help('prior_summary.stanreg') for more details
The linear (metric normal) model intercept estimate accurately recovers the mean of stopyou (=1.99), which is what it is designed to do. However, the posterior predictive check plots (e.g., density; histogram; box) show that the underlying normal distribution assumption results in model predictions that fail to recover the observed ordinal item values. Let’s extract the density plots for all ten items and plot them together.
9.2.1 All PPCheck density plots
Show code
# Create list to store individual density plotsint_lin_pp_plots <-list()# Generate pp_check plot for each outcomefor(outcome in fearpol_new) { int_lin_pp_plots[[outcome]] <-pp_check(int_lin_mods[[outcome]]) +ggtitle(gsub("you", " You", tools::toTitleCase(outcome))) +theme(plot.title =element_text(size =8, hjust =0.5))}# Combine all plots with explicit 2x5 layoutint_lin_pp_grid <-wrap_plots(int_lin_pp_plots, ncol =2, nrow =5) +plot_annotation(title ="Posterior Predictive Checks for Linear Models",subtitle ="Dark line = observed data, Light lines = posterior predictions",theme =theme(plot.title =element_text(size =10),plot.subtitle =element_text(size =8) ) )int_lin_pp_grid
Let’s replicate this process for cumulative probit intercept-only models.
9.3 Cumulative probit: Ord~1 (good fit)
Show code
# Create ordered factor versions with suffix _orddat <- dat %>%mutate(across(all_of(fearpol_new), ~factor(., levels =0:4, ordered =TRUE),.names ="{.col}_ord"))# Create vector of ordinal variable namesfearpol_ord <-paste0(fearpol_new, "_ord")# Now dat contains both original numeric variables (stopyou, searchyou, etc.)# and ordered factor versions (stopyou_ord, searchyou_ord, etc.)# List to store intercept-only cumulative probit modelsint_ord_mods <-list()# Prior for ordinal models# For equal probability (0.2) across 5 categories, thresholds should be:# qnorm(c(0.2, 0.4, 0.6, 0.8)) ≈ c(-0.84, -0.25, 0.25, 0.84)ord_prior <-prior(normal(-0.84, 1), class ="Intercept", coef =1) +prior(normal(-0.25, 1), class ="Intercept", coef =2) +prior(normal(0.25, 1), class ="Intercept", coef =3) +prior(normal(0.84, 1), class ="Intercept", coef =4)# Run models for each outcomefor(outcome in fearpol_ord) { formula <-bf(paste0(outcome, " ~ 1")) int_ord_mods[[outcome]] <-brm(formula = formula,data = dat,family =cumulative("probit"),prior = ord_prior,iter =2000, warmup =1000, chains =4,cores = nCoresphys, seed =1138,file =paste0("Models/int_ord_mod_", outcome),file_refit ="on_change" )}
Again, after running threshold-only models for all ten outcomes, let’s start by examining detailed output for the first model.
Family: cumulative
Links: mu = probit; disc = identity
Formula: stopyou_ord ~ 1
Data: dat (Number of observations: 1020)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -0.89 0.05 -0.98 -0.80 1.00 3346 2903
Intercept[2] -0.38 0.04 -0.46 -0.30 1.00 4253 3452
Intercept[3] 0.33 0.04 0.25 0.41 1.00 4912 3396
Intercept[4] 1.00 0.05 0.90 1.09 1.00 4856 3722
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
prior class coef group resp dpar nlpar lb ub source
student_t(3, 0, 2.5) Intercept default
normal(-0.84, 1) Intercept 1 user
normal(-0.25, 1) Intercept 2 user
normal(0.25, 1) Intercept 3 user
normal(0.84, 1) Intercept 4 user
Notice how the posterior predictions generated by the cumulative probit model for the first outcome are a much better fit to the ordinal response data.
9.3.1 All PPCheck density plots
Show code
# Create list to store pp_check plotsint_ord_pp_plots <-list()# Generate pp_check plot for each outcomefor(outcome in fearpol_ord) { int_ord_pp_plots[[outcome]] <-pp_check(int_ord_mods[[outcome]]) +ggtitle(gsub("you", " You", tools::toTitleCase(outcome))) +theme(plot.title =element_text(size =8, hjust =0.5))}# Combine all plots in a 5x2 layoutint_ord_pp_grid <-wrap_plots(int_ord_pp_plots, ncol =2, nrow =5) +plot_annotation(title ="Posterior Predictive Checks for Cumulative Probit Models",subtitle ="Dark line = observed data, Light lines = posterior predictions",theme =theme(plot.title =element_text(size =10),plot.subtitle =element_text(size =8) ) )int_ord_pp_grid
9.4 Comparing int-only lin & ord mods
Now, let’s pull the ppcheck plots together. We will go with the density plots for the linear models, as they effectively display the underlying normal curve assumptions of those models. However, the bar charts are more effective at showing how well the ordinal models recover the data.
Show code
# Function to generate ppcheck plots for(outcome in fearpol_new) { ord_outcome <-paste0(outcome, "_ord")# Linear model density plot int_lin_pp_plots[[outcome]] <-pp_check(int_lin_mods[[outcome]]) +ggtitle(fear_labels[outcome]) +coord_cartesian(xlim =c(-3, 7)) +scale_x_continuous(breaks =seq(0, 4, 1)) +scale_color_manual(values =c("y"="#FD9B6BFF", "yrep"="#782281FF"),labels =c("y"="Observed", "yrep"="Replicated") ) +theme(text =element_text(family ="serif"),plot.title =element_text(size =14, hjust =0.5),axis.text.x =if(outcome %in%c("shootyou", "killyou")) element_text(size=14) elseelement_blank(),axis.text.y =element_blank(),legend.position ="bottom",legend.title =element_text(size =14), legend.text =element_text(size =14) ) +guides(color =guide_legend(title ="Linear:"))# Ordinal model bar plot int_ord_pp_plots[[outcome]] <-pp_check( int_ord_mods[[ord_outcome]], type ="bars", ndraws =1000, linewidth =1.5, size =3/4) +scale_x_continuous(breaks =1:5, labels =0:4) +scale_fill_manual(values =c("y"="#FD9B6BFF", "yrep"="#782281FF"),labels =c("y"="Observed", "yrep"="Replicated") ) +scale_colour_manual(values =c("y"="#FD9B6BFF", "yrep"="#782281FF"), labels =c("y"="Observed", "yrep"="Replicated") ) +theme(text =element_text(family ="serif"),plot.title =element_blank(),axis.text.x =if(outcome %in%c("shootyou", "killyou")) element_text(size=14) elseelement_blank(),axis.text.y =element_blank(),axis.title.y =element_blank(),legend.position ="bottom",legend.title =element_text(size =14), legend.text =element_text(size =14) ) +guides(fill =guide_legend(title ="Ordinal:"),color =guide_legend(title =element_blank()))}# Define layout (last extra row for model labels)layout <-"ABCDEFGHIJKLMNOPQRSTUVWX"# Added row for model labels# Add text annotations for model typesmodel_labels <-ggplot() +annotate("text", x =0.5, y =0.5, label ="Linear Model", size =5,family ="serif") +theme_void()model_labels2 <-ggplot() +annotate("text", x =0.5, y =0.5, label ="Ordinal Model", size =5,family ="serif") +theme_void()# Combine all plots in sequencepp_comparison <- (int_lin_pp_plots[["stopyou"]] +# A int_ord_pp_plots[["stopyou"]] +# B int_lin_pp_plots[["searchyou"]] +# C int_ord_pp_plots[["searchyou"]] +# D int_lin_pp_plots[["yellyou"]] +# E int_ord_pp_plots[["yellyou"]] +# F int_lin_pp_plots[["handcyou"]] +# G int_ord_pp_plots[["handcyou"]] +# H int_lin_pp_plots[["kickyou"]] +# I int_ord_pp_plots[["kickyou"]] +# J int_lin_pp_plots[["pinyou"]] +# K int_ord_pp_plots[["pinyou"]] +# L int_lin_pp_plots[["sprayyou"]] +# M int_ord_pp_plots[["sprayyou"]] +# N int_lin_pp_plots[["taseyou"]] +# O int_ord_pp_plots[["taseyou"]] +# P int_lin_pp_plots[["shootyou"]] +# Q int_ord_pp_plots[["shootyou"]] +# R int_lin_pp_plots[["killyou"]] +# S int_ord_pp_plots[["killyou"]] +# T model_labels +# U model_labels2 +# V model_labels +# W model_labels2) +# Xplot_layout(design = layout, guides ='collect', heights =c(1, 1, 1, 1, 1, 0.3)) +plot_annotation(title ="FIGURE 3: Posterior Predictive Checks for Linear vs. Ordinal Models",subtitle ="Pairs show linear model density checks (left) and ordinal model bar checks (right)",theme =theme(text =element_text(family ="serif"),plot.title =element_text(size =12),plot.subtitle =element_text(size =11),legend.position ="bottom" ) )
9.4.1 FIG3
Show code
pp_comparison
As Figure 3 shows, the threshold-only cumulative probit model predictions fit the data much better than those from the comparable intercept-only linear models. This is unsurprising, as the CP model estimates additional parameters to summarize each item’s ordinal distribution.
We can quantify and compare the relative fit of the intercept-only linear regression and cumulative probit models for each fear of police item using Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV) and calculating the expected log predictive density (ELPD) for each model. The ELPD provides a measure of predictive accuracy, with higher values indicating better predictive performance; because it approximates predictive performance on unseen data, it also penalizes models for extra parameters.
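A standard alternative to the custom pointwise comparison used below is loo_compare() from the loo package; a minimal sketch for a single item (using the model objects created earlier) might look like:

loo_lin <- loo(int_lin_mods[["stopyou"]])        # PSIS-LOO for the linear model
loo_ord <- loo(int_ord_mods[["stopyou_ord"]])    # PSIS-LOO for the cumulative probit model
loo_compare(loo_ord, loo_lin)                    # reports elpd_diff and se_diff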
Now, let’s use PSIS-LOO-CV to compare model fits with ELPD.
9.4.2 ELPD comparisons
Show code
# Check if cached comparisons existif (file.exists("Models/fit_comparisons.rds")) { fit_comparisons <-readRDS("Models/fit_comparisons.rds")} else {# Function to compare linear and ordinal models for one outcome compare_fits <-function(lin_mod, ord_mod, outcome_name) {# Get LOO for each model loo_lin <-loo(lin_mod) loo_ord <-loo(ord_mod)# Get pointwise log-likelihood values ll_lin <- loo_lin$pointwise[,"elpd_loo"] ll_ord <- loo_ord$pointwise[,"elpd_loo"]# Compare total log-likelihooddata.frame(outcome = outcome_name,elpd_lin =sum(ll_lin),elpd_ord =sum(ll_ord),diff =sum(ll_ord) -sum(ll_lin),se =sqrt(length(ll_lin) *var(ll_ord - ll_lin)) ) }# Create empty dataframe to store all comparisons fit_comparisons <-data.frame()# Loop through all outcomes and compare modelsfor(outcome in fearpol_new) {# Get linear model comparison for ordinal outcome ord_outcome <-paste0(outcome, "_ord") comp <-compare_fits(lin_mod = int_lin_mods[[outcome]],ord_mod = int_ord_mods[[ord_outcome]],outcome_name = outcome ) fit_comparisons <-rbind(fit_comparisons, comp) } fit_comparisons$outcome <- fear_labels[fit_comparisons$outcome]# Save the comparisonssaveRDS(fit_comparisons, "Models/fit_comparisons.rds")}
As noted above, the ELPD provides a measure of predictive accuracy, with higher values indicating better predictive performance. The ‘diff’ column shows the difference in ELPD between the ordinal and linear models (ordinal minus linear), with positive values favoring the ordinal model. The standard error (se) quantifies our uncertainty in these differences.
These tables confirm what we observed in the posterior predictive checks (Figure 3), where ordinal models better captured the discrete nature of the response categories compared to the continuous approximation used in linear models. The ordinal models showed consistently better predictive performance across all items, with ELPD differences ranging from about 115 to 319 units. The improvements were particularly substantial for the more severe items like “Shoot at You” (diff ≈ 289) and “Kill You” (diff ≈ 317), which exhibited more extreme bimodal distributions. In general, modeling ordinal variables with a continuous Gaussian approximation performs better for variables with several response categories and a roughly symmetric, mean-centered distribution.
For statistical testing with LOO-CV comparisons, one might consider differences meaningful when they exceed twice their standard error (similar to a z-test). In this case, all differences are well over 10 standard errors from zero (around 12-20), providing very strong evidence for the superior fit of the ordinal models.
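A quick way to apply that rule of thumb to the cached comparison table (using the columns constructed above) is:

# Express each ELPD difference in standard-error units (values around 12-20 here)
fit_comparisons %>%
  mutate(z_like_ratio = diff / se) %>%
  dplyr::select(outcome, diff, se, z_like_ratio)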
Now let’s add race as a predictor in our models.
10 Item-specific race differences in fear
Goal: Compare linear and cumulative probit regression results of item-specific analyses estimating race differences in fear of police on latent item scales (regression betas)
10.1 Linear & CP models
Show code
# List to store race linear modelsrace_lin_mods <-list()# Run linear models for each outcomefor(outcome in fearpol_new) {# Create file path for cached model model_path <-paste0("Models/race_lin_mod_", outcome, ".rds")# Check if cached model existsif (file.exists(model_path)) { race_lin_mods[[outcome]] <-readRDS(model_path) } else {# Linear model with race race_lin_mods[[outcome]] <-stan_glm(formula =as.formula(paste0(outcome, " ~ race")),data = dat,family = gaussian,chains =4,cores = nCoresphys,seed =1138 )# Save modelsaveRDS(race_lin_mods[[outcome]], model_path) }}# List to store ordinal race modelsrace_ord_mods <-list()# Prior for ordinal race modelsord_prior <-prior(normal(-0.84, 1), class ="Intercept", coef =1) +prior(normal(-0.25, 1), class ="Intercept", coef =2) +prior(normal(0.25, 1), class ="Intercept", coef =3) +prior(normal(0.84, 1), class ="Intercept", coef =4) +prior(normal(0, 1), class ="b") # Prior for race effect# Run ordinal models for each outcomefor(outcome in fearpol_new) { ord_outcome <-paste0(outcome, "_ord")# Fit and store model with proper naming race_ord_mods[[ord_outcome]] <-brm(formula =bf(paste0(ord_outcome, " ~ race")),data = dat,family =cumulative("probit"),prior = ord_prior,iter =2000, warmup =1000,chains =4,cores = nCoresphys,seed =1138,file =paste0("Models/race_ord_mod_", outcome),file_refit ="on_change" )}
As before, let’s check detailed output from the linear and ordinal models of the first outcome item.
Model Info:
function: stan_glm
family: gaussian [identity]
formula: stopyou ~ race
algorithm: sampling
sample: 4000 (posterior sample size)
priors: see help('prior_summary')
observations: 1020
predictors: 2
Estimates:
mean sd 2.5% 97.5%
(Intercept) 1.44 0.05 1.33 1.55
raceBlack 1.09 0.08 0.94 1.24
sigma 1.21 0.03 1.16 1.27
Fit Diagnostics:
mean sd 2.5% 97.5%
mean_PPD 1.99 0.05 1.88 2.09
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.00 1.00 3823
raceBlack 0.00 1.00 3809
sigma 0.00 1.00 4261
mean_PPD 0.00 1.00 3956
log-posterior 0.03 1.00 1717
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
 prior                class      coef       group resp dpar nlpar lb ub  source
 normal(0, 1)         b                                                   user
 normal(0, 1)         b          raceBlack                                (vectorized)
 student_t(3, 0, 2.5) Intercept                                           default
 normal(-0.84, 1)     Intercept  1                                        user
 normal(-0.25, 1)     Intercept  2                                        user
 normal(0.25, 1)      Intercept  3                                        user
 normal(0.84, 1)      Intercept  4                                        user
Again, the linear model recovers conditional group means, which is what it is designed to do. However, it is an otherwise poor fit to the data. For instance, the model-implied medians are very far from the observed medians for both White and Black participants.
Note that the cumulative probit still recovers the data far better than the linear model, though its simplifying assumptions result in some mismatch between the predictions and the data at some thresholds in the grouped bar chart. If we had sufficient data and were worried about this slippage in fit (and more worried about it than about overfitting with a more complex model), we could estimate category-specific ordinal models instead; we illustrate how to do that later in this supplement (Section 12.2). For now, though, let's see if we can plot the effects of race (Black) on the latent y scale for each outcome by model.
10.1.1 FIG4
Plot latent y scale contrasts, all outcomes
Show code
# Check coefficient names in first models# names(as_draws_df(race_lin_mods[["stopyou"]]))# names(as_draws_df(race_ord_mods[["stopyou_ord"]]))# Create ordered factor for outcomesoutcome_order <- fear_labels[fearpol_new]# Extract posterior samples for race effects from both model typesall_effects <-map2_dfr(fearpol_new, fear_labels, ~{# Get linear model effects lin_draws <-as_draws_df(race_lin_mods[[.x]]) %>%transmute(effect = raceBlack) %>%median_qi(.width = .95) %>%mutate(model ="Linear",outcome = .y )# Get ordinal model effects ord_draws <-as_draws_df(race_ord_mods[[paste0(.x, "_ord")]]) %>%transmute(effect = b_raceBlack) %>%median_qi(.width = .95) %>%mutate(model ="Ordinal",outcome = .y )# Combinebind_rows(lin_draws, ord_draws)})# Create plotggplot(all_effects, aes(x = effect, y = outcome, color = model)) +geom_linerange(aes(xmin = .lower, xmax = .upper), linewidth =1.5,position =position_dodge(width =0.4)) +geom_point(shape =124, size =5,position =position_dodge(width =0.4)) +geom_vline(xintercept =0, linetype ="dashed") +scale_color_manual(values =c("Linear"="#FD9B6BFF", "Ordinal"="#782281FF")) +scale_y_discrete(limits =rev(outcome_order)) +scale_x_continuous(lim =c(-.05, 2.05), breaks =seq(0, 2, .25)) +labs(x ="Effect of race (Black) on latent scale (beta est + 95% CrI)",y ="", color ="") +theme_minimal() +theme(text =element_text(family ="serif"),legend.position ="bottom",panel.grid.major.y =element_blank(),panel.grid.minor =element_blank(),axis.line =element_line(color ="black"),axis.ticks =element_line(color ="black"), axis.text =element_text(size =14),axis.title.x =element_text(size =14),legend.text =element_text(size =14) )
At first glance, this plot seems to suggest the linear model overestimates the effects of race on the latent y scale for all 10 items. But by how much? Or, more importantly, since the linear and cumulative probit models assume very different scales for each outcome, what do these differences really mean?
Recall that the linear model assumes an unbounded, metric, continuous outcome, so its smaller effect estimate is relative to a much larger set of model-implied outcome values (i.e., with "plausible" predictions extending below y=0 and above y=4). Meanwhile, the cumulative probit model accurately assumes a discrete distribution with y-values bounded between 0 and 4, so its larger estimates on the latent y scale are interpreted relative to a more narrowly restricted range of y values. In short, although the CP beta coefficient is intended to be somewhat comparable to the beta from a linear model - where a one-unit increase in X (or the Black vs. White contrast) is associated with a beta-unit increase in y - it is hard to precisely interpret differences in results across these models. It is therefore understandable that researchers might see estimates from different model specifications that point in the same direction, imply similar statistical inferential conclusions (e.g., all 95% CrIs exclude zero), and appear roughly similar in magnitude, and then conclude that the two modeling techniques generate "similar substantive conclusions." But do they? Perhaps, or perhaps not - it depends on how precise we want or need our scientific conclusions to be. Let's see if we can do better.
With ordinal models, we can calculate and plot predicted probability contrasts (Black - White) for specific response categories and for each outcome. For now, we will simply focus on the predicted probability of reporting being "afraid" or "very afraid" (y>=3); we will return to generating predictions for all thresholds later. This focus also maps onto the implied estimand of the exemplar study, which aimed to estimate group-based differences in "fear" of police.
Interpretations of linear regression results often stop at effects on latent scales - e.g., interpreting beta coefficients as differences in conditional means of latent (though often fundamentally non-metric) distributions. Yet, since our Bayesian linear models generate a full posterior distribution, we can also use knowledge of the model’s assumptions (e.g., about standard normal distributions) and our model-based posterior predictions to similarly estimate the conditional predicted probabilities of responding “Afraid” or “Very afraid” for each group (i.e., y>=3) and uncertainty around those estimates.
10.2 Extracting useful signals
Goal: Extract race-specific predicted probabilities and calculate black-white contrasts and illustrate visualizations for conveying these estimates.
Let’s start by extracting predicted probabilities and contrasts (black - white) of “afraid” or “very afraid” (y>=3) from the ordinal model for the first outcome, then iterate across ordinal models for all outcomes.
Show code
# Function to get probabilities and contrasts for one outcomeget_prob_contrasts <-function(model, outcome_name) {# Get predictions for both race groups newdata <-data.frame(race =factor(c("White", "Black")))# Get posterior predictions preds <-posterior_epred(model, newdata = newdata)# Calculate P(y >= 3) for each draw p_high <-1- (preds[,,1] + preds[,,2] + preds[,,3])# Get probabilities for each group probs <-data.frame(race =c("White", "Black"),prob =c(mean(p_high[,1]), mean(p_high[,2])),lower =c(quantile(p_high[,1], 0.025), quantile(p_high[,2], 0.025)),upper =c(quantile(p_high[,1], 0.975), quantile(p_high[,2], 0.975)) )# Calculate contrasts (Black - White) contrasts <- p_high[,2] - p_high[,1]# Summarize contrasts contrast_summary <-data.frame(outcome = outcome_name,estimate =median(contrasts),lower =quantile(contrasts, 0.025),upper =quantile(contrasts, 0.975) )# Return bothlist(probabilities = probs,contrasts = contrast_summary )}# Try it with one modelresults <-get_prob_contrasts( race_ord_mods[["stopyou_ord"]], "Stop You")print("Predicted Probabilities of Afraid/Very Afraid:")
[1] "Predicted Probabilities of Afraid/Very Afraid:"
Show code
print(results$probabilities)
race prob lower upper
1 White 0.20 0.17 0.23
2 Black 0.53 0.49 0.57
Show code
print("\nBlack-White Contrast:")
[1] "\nBlack-White Contrast:"
Show code
print(results$contrasts)
outcome estimate lower upper
2.5% Stop You 0.33 0.28 0.37
Looks good. Let’s iterate across all ten ordinal models.
Show code
# Calculate probabilities and contrasts for all outcomesall_results <-map2_dfr( fearpol_new, # outcomes fear_labels, # labels~get_prob_contrasts( race_ord_mods[[paste0(.x, "_ord")]], .y )$contrasts # Just get contrasts for now)# Create plotggplot(all_results, aes(y =fct_rev(fct_reorder(outcome, estimate)), x = estimate, xmin = lower, xmax = upper)) +geom_pointrange(color ="#782281FF",size =0.5) +geom_vline(xintercept =0, linetype ="dashed") +labs(x ="Black-White Contrast in P(Afraid or Very Afraid)",y ="") +theme_minimal() +theme(panel.grid.major.y =element_blank(),panel.grid.minor =element_blank(),axis.line =element_line(color ="black"),axis.ticks =element_line(color ="black") )
Show code
# Also save probabilities tableall_probs <-map2_dfr( fearpol_new, fear_labels,~get_prob_contrasts( race_ord_mods[[paste0(.x, "_ord")]], .y )$probabilities %>%mutate(outcome = .y))all_probs %>%gt()
race   prob  lower  upper  outcome
White  0.20  0.17   0.23   Stop You
Black  0.53  0.49   0.57   Stop You
White  0.17  0.15   0.20   Search You
Black  0.46  0.42   0.50   Search You
White  0.22  0.19   0.25   Yell at You
Black  0.48  0.44   0.52   Yell at You
White  0.19  0.16   0.23   Handcuff You
Black  0.52  0.48   0.56   Handcuff You
White  0.18  0.15   0.21   Kick/Hit You
Black  0.54  0.50   0.58   Kick/Hit You
White  0.18  0.15   0.22   Pin You Down
Black  0.56  0.52   0.60   Pin You Down
White  0.19  0.16   0.22   Pepper Spray You
Black  0.55  0.51   0.59   Pepper Spray You
White  0.19  0.16   0.23   Tase You
Black  0.57  0.53   0.61   Tase You
White  0.18  0.15   0.21   Shoot at You
Black  0.58  0.54   0.62   Shoot at You
White  0.17  0.14   0.20   Kill You
Black  0.58  0.54   0.62   Kill You
Now let's see if we can use our posterior predictions to estimate comparable quantities for the linear models. As before, we will start with one model and then iterate across all outcomes. First, though, we need a function to help us calculate the predicted probability of a y>=3 response given the linear model-implied predictions. Let's write out the logic of what we need.
For a linear model with normal errors, to get P(y ≥ 3) for each group:
For Whites:
Mean = intercept; SD = sigma
We want P(y ≥ 3), which should translate to: 1 - Φ((3 - intercept)/sigma)
For Blacks:
Mean = intercept + beta(race); SD = sigma
We want P(y ≥ 3), which is: 1 - Φ((3 - [intercept + beta])/sigma)
Where Φ is the standard normal CDF.
So, for each posterior draw we want to:
Calculate the mean for each group using the intercept and race effect
Use the sigma from that draw
Calculate P(y ≥ 3) using the normal CDF
Then get contrasts between groups
Show code
# Function to get probabilities and contrasts for one linear modelget_prob_contrasts_lin <-function(model, outcome_name) {# Get posterior draws draws <-as_draws_df(model)# Calculate P(y >= 3) for each draw p_white <-1-pnorm(3, mean = draws$`(Intercept)`, sd = draws$sigma) p_black <-1-pnorm(3, mean = draws$`(Intercept)`+ draws$raceBlack, sd = draws$sigma)# Summarize probabilities probs <-data.frame(race =c("White", "Black"),prob =c(mean(p_white), mean(p_black)),lower =c(quantile(p_white, 0.025), quantile(p_black, 0.025)),upper =c(quantile(p_white, 0.975), quantile(p_black, 0.975)) )# Calculate contrasts from each draw contrasts <- p_black - p_white contrast_summary <-data.frame(outcome = outcome_name,estimate =mean(contrasts),lower =quantile(contrasts, 0.025),upper =quantile(contrasts, 0.975) )list(probabilities = probs, contrasts = contrast_summary)}
Let’s try our function with one model.
Show code
# Try with stopyouresults_lin <-get_prob_contrasts_lin( race_lin_mods[["stopyou"]], "Stop You")print("Linear Model Results for 'Stop You':")
[1] "Linear Model Results for 'Stop You':"
Show code
print("\nPredicted Probabilities of Afraid/Very Afraid by Race:")
[1] "\nPredicted Probabilities of Afraid/Very Afraid by Race:"
Show code
print(results_lin$probabilities)
race prob lower upper
1 White 0.099 0.082 0.12
2 Black 0.349 0.317 0.38
Show code
print("\nBlack-White Contrast:")
[1] "\nBlack-White Contrast:"
Show code
print(results_lin$contrasts)
outcome estimate lower upper
2.5% Stop You 0.25 0.21 0.29
Show code
# Let's also print mean estimates from model for referenceprint("\nModel Coefficients:")
[1] "\nModel Coefficients:"
Show code
print(summary(race_lin_mods[["stopyou"]]))
Model Info:
function: stan_glm
family: gaussian [identity]
formula: stopyou ~ race
algorithm: sampling
sample: 4000 (posterior sample size)
priors: see help('prior_summary')
observations: 1020
predictors: 2
Estimates:
mean sd 10% 50% 90%
(Intercept) 1.4 0.1 1.4 1.4 1.5
raceBlack 1.1 0.1 1.0 1.1 1.2
sigma 1.2 0.0 1.2 1.2 1.2
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 2.0 0.1 1.9 2.0 2.1
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 3823
raceBlack 0.0 1.0 3809
sigma 0.0 1.0 4261
mean_PPD 0.0 1.0 3956
log-posterior 0.0 1.0 1717
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Our function appears to have worked. Here is a breakdown from the first model (stopyou).
For Whites:
Mean = 1.4; SD = 1.2
We want P(y ≥ 3)
To standardize: (3 - 1.4)/1.2 = 1.33 standard deviations
P(Z ≥ 1.33) = 1 - Φ(1.33) ≈ 0.092
This matches our calculated probability of about 0.10
For Blacks:
Mean = 2.5; SD = 1.2
We want P(y ≥ 3)
To standardize: (3 - 2.5)/1.2 = 0.417 standard deviations
P(Z ≥ 0.417) = 1 - Φ(0.417) ≈ 0.338
This matches our calculated probability of about 0.35
We can verify these in R using model estimates.
Show code
# For Whites1-pnorm((3-1.4)/1.2)
[1] 0.091
Show code
# For Blacks 1-pnorm((3-2.5)/1.2)
[1] 0.34
The contrast of 0.25 (95% CI: 0.21, 0.29) represents the difference in these probabilities.
Now, let’s iterate across the remaining outcomes.
Show code
# Get all linear model resultsall_probs_lin <-map2_dfr( fearpol_new, fear_labels,~get_prob_contrasts_lin( race_lin_mods[[.x]], .y )$probabilities %>%mutate(outcome = .y))all_contrasts_lin <-map2_dfr( fearpol_new, fear_labels,~get_prob_contrasts_lin( race_lin_mods[[.x]], .y )$contrasts)# Get all ordinal model resultsall_probs_ord <-map2_dfr( fearpol_new, fear_labels,~get_prob_contrasts( race_ord_mods[[paste0(.x, "_ord")]], .y )$probabilities %>%mutate(outcome = .y))all_contrasts_ord <-map2_dfr( fearpol_new, fear_labels,~get_prob_contrasts( race_ord_mods[[paste0(.x, "_ord")]], .y )$contrasts)# Add model type to contrasts for plotting laterall_contrasts_lin$model <-"Linear"all_contrasts_ord$model <-"Ordinal"
# Linear Model Probabilitiesall_probs_lin %>%gt() %>%fmt_number(columns =c(prob, lower, upper), decimals =3) %>%tab_header(title ="TABLE 1. Linear Model Predicted Probabilities",subtitle ="Probability of Afraid or Very Afraid Response by Race" )
TABLE 1. Linear Model Predicted Probabilities
Probability of Afraid or Very Afraid Response by Race
race   prob   lower  upper  outcome
White  0.099  0.082  0.118  Stop You
Black  0.349  0.317  0.382  Stop You
White  0.091  0.075  0.110  Search You
Black  0.307  0.277  0.339  Search You
White  0.118  0.099  0.138  Yell at You
Black  0.313  0.283  0.345  Yell at You
White  0.096  0.079  0.114  Handcuff You
Black  0.344  0.312  0.377  Handcuff You
White  0.086  0.070  0.103  Kick/Hit You
Black  0.368  0.336  0.400  Kick/Hit You
White  0.091  0.075  0.109  Pin You Down
Black  0.385  0.351  0.420  Pin You Down
White  0.094  0.077  0.113  Pepper Spray You
Black  0.368  0.335  0.401  Pepper Spray You
White  0.094  0.077  0.113  Tase You
Black  0.383  0.350  0.416  Tase You
White  0.088  0.072  0.105  Shoot at You
Black  0.412  0.378  0.447  Shoot at You
White  0.088  0.072  0.105  Kill You
Black  0.419  0.388  0.453  Kill You
Show code
# Ordinal Model Probabilitiesall_probs_ord %>%gt() %>%fmt_number(columns =c(prob, lower, upper), decimals =3) %>%tab_header(title ="TABLE 2. Ordinal Model Predicted Probabilities",subtitle ="Probability of Afraid or Very Afraid Response by Race" )
TABLE 2. Ordinal Model Predicted Probabilities
Probability of Afraid or Very Afraid Response by Race
race   prob   lower  upper  outcome
White  0.203  0.173  0.235  Stop You
Black  0.533  0.493  0.571  Stop You
White  0.174  0.147  0.204  Search You
Black  0.464  0.425  0.504  Search You
White  0.222  0.190  0.254  Yell at You
Black  0.482  0.441  0.522  Yell at You
White  0.193  0.165  0.225  Handcuff You
Black  0.522  0.483  0.562  Handcuff You
White  0.176  0.148  0.206  Kick/Hit You
Black  0.545  0.504  0.584  Kick/Hit You
White  0.185  0.155  0.216  Pin You Down
Black  0.559  0.520  0.600  Pin You Down
White  0.191  0.162  0.223  Pepper Spray You
Black  0.551  0.510  0.591  Pepper Spray You
White  0.192  0.163  0.225  Tase You
Black  0.565  0.525  0.605  Tase You
White  0.175  0.147  0.206  Shoot at You
Black  0.581  0.540  0.620  Shoot at You
White  0.172  0.145  0.202  Kill You
Black  0.583  0.544  0.623  Kill You
Show code
# Linear Model Contrastsall_contrasts_lin %>%gt() %>%fmt_number(columns =c(estimate, lower, upper), decimals =3) %>%tab_header(title ="TABLE 3. Linear Model Black-White Contrasts",subtitle ="Difference in Probability of Afraid or Very Afraid Response" )
TABLE 3. Linear Model Black-White Contrasts
Difference in Probability of Afraid or Very Afraid Response
outcome           estimate  lower  upper  model
Stop You          0.250     0.214  0.286  Linear
Search You        0.215     0.183  0.250  Linear
Yell at You       0.195     0.161  0.230  Linear
Handcuff You      0.249     0.212  0.284  Linear
Kick/Hit You      0.283     0.249  0.317  Linear
Pin You Down      0.293     0.256  0.329  Linear
Pepper Spray You  0.274     0.238  0.310  Linear
Tase You          0.289     0.254  0.324  Linear
Shoot at You      0.324     0.287  0.362  Linear
Kill You          0.331     0.296  0.367  Linear
Show code
# Ordinal Model Contrastsall_contrasts_ord %>%gt() %>%fmt_number(columns =c(estimate, lower, upper), decimals =3) %>%tab_header(title ="TABLE 4. Ordinal Model Black-White Contrasts",subtitle ="Difference in Probability of Afraid or Very Afraid Response" )
TABLE 4. Ordinal Model Black-White Contrasts
Difference in Probability of Afraid or Very Afraid Response
outcome           estimate  lower  upper  model
Stop You          0.331     0.284  0.375  Ordinal
Search You        0.290     0.244  0.335  Ordinal
Yell at You       0.260     0.214  0.306  Ordinal
Handcuff You      0.328     0.283  0.373  Ordinal
Kick/Hit You      0.370     0.323  0.414  Ordinal
Pin You Down      0.375     0.329  0.421  Ordinal
Pepper Spray You  0.359     0.314  0.404  Ordinal
Tase You          0.373     0.327  0.418  Ordinal
Shoot at You      0.406     0.360  0.451  Ordinal
Kill You          0.410     0.366  0.456  Ordinal
Now, let’s generate a plot like before, but this time we are plotting predicted probabilities of y>=3 from linear and ordinal models instead of beta-based effects on latent outcome scale.
10.2.1 Plot Black-White contrasts by outcome & model
10.2.2 Plot group-specific predicted probabilities by outcome & model
Note that the contrasts are larger in the ordinal model. This plot shows the opposite of the pattern seemingly implied by the plotted effects on the latent scale above; I will try to explain why later. First, let's plot the group-specific predicted probabilities themselves rather than the contrasts so we can see how these contrasts are estimated. (A rough sketch of how the contrast plot can be constructed appears below.)
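The original chunks that build race_pp_plot and race_contrast_plot are not displayed here. As a minimal sketch - assuming only the contrast summaries (all_contrasts_lin, all_contrasts_ord) created above, and not the authors' exact aesthetics - the contrast plot might be constructed like this (the corresponding race_pp_plot can be built analogously from all_probs_lin and all_probs_ord):

# Sketch only: combine the linear and ordinal contrast summaries and plot them
all_contrasts <- bind_rows(all_contrasts_lin, all_contrasts_ord)

race_contrast_plot <- ggplot(all_contrasts,
                             aes(y = outcome, x = estimate,
                                 xmin = lower, xmax = upper, color = model)) +
  geom_pointrange(position = position_dodge(width = 0.4),
                  shape = 21, fill = "white", linewidth = 1) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  scale_color_manual(values = c("Linear" = "#FD9B6BFF", "Ordinal" = "#782281FF")) +
  scale_y_discrete(limits = rev(outcome_order)) +
  labs(x = "Black-White contrast in P(\"afraid\" or \"very afraid\"), by model",
       y = "", color = "") +
  theme_minimal() +
  theme(legend.position = "bottom",
        panel.grid.major.y = element_blank(),
        axis.line = element_line(color = "black"))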
race_pp_plot + race_contrast_plot +plot_annotation(title ="FIGURE 3: Race-Specific Predicted Probabilities of \"Afraid\" or \"Very Afraid\" Responses and Black-White Probability Contrasts, by Linear or Ordinal Models",theme =theme(plot.title =element_text(size =12) ))
Before we move on, let’s make a version that includes the observed group-specific proportions for y>=3 and contrasts. We will also use different point shapes and fills to make it easier to tell which groups are represented by each interval estimate and clean up other aesthetics.
10.2.3 FIG5
Show code
# First calculate observed proportions for each outcome by raceobs_props <- dat %>%group_by(race) %>%summarize(across(all_of(fearpol_new), ~mean(. >=3, na.rm =TRUE) # P(y >= 3) ) ) %>%pivot_longer(cols =all_of(fearpol_new),names_to ="var",values_to ="prop" ) %>%mutate(outcome =factor(fear_labels[var], levels =rev(outcome_order)) )# Calculate observed contrasts obs_contrasts <- obs_props %>%pivot_wider(names_from = race,values_from = prop ) %>%mutate(contrast = Black - White,model ="Observed"# Add model label for legend matching )# Define colors with desired legend ordermodel_colors <-c("Linear.White"="#FD9B6BFF", "Linear.Black"="#FD9B6BFF", "Ordinal.White"="#782281FF","Ordinal.Black"="#782281FF","Observed.White"="black", "Observed.Black"="black")# Create ordered factor levels for guideguide_order <-c("Linear.White", "Linear.Black", "Ordinal.White", "Ordinal.Black", "Observed.White", "Observed.Black")# Combine probabilities from both models with new orderall_probs <-bind_rows( all_probs_lin %>%mutate(model ="Linear"), all_probs_ord %>%mutate(model ="Ordinal")) %>%mutate(outcome =factor(outcome, levels =rev(outcome_order)),race =factor(race, levels =c("White", "Black")),# Create ordered interaction for legend with new ordermodel_race =factor(interaction(model, race), levels = guide_order[guide_order !="Observed.White"& guide_order !="Observed.Black"]) )# Create base plot with new orderrace_pp_plot <-ggplot(all_probs, aes(y = outcome, x = prob, xmin = lower, xmax = upper,color = model_race,shape = race)) +geom_pointrange(aes(y =as.numeric(outcome) +if_else(model =="Ordinal", 0.3, 0),fill =interaction(model, race)), size =3,linewidth =1.5, fatten =1,shape =21) +scale_color_manual(values = model_colors) +scale_shape_manual(values =c("White"=21, "Black"=21)) +scale_fill_manual(values =c("Linear.White"="white", "Linear.Black"="#FD9B6BFF","Ordinal.White"="white", "Ordinal.Black"="#782281FF")) +scale_y_discrete(limits =rev(outcome_order)) +guides(shape ="none",fill ="none",color =guide_legend(override.aes =list(shape =c(21, 21, 21, 21),fill =c("white", "#FD9B6BFF", "white", "#782281FF") )) ) +coord_cartesian(xlim =c(0, .7)) +scale_x_continuous(breaks =seq(0, .7, .1)) +labs(x ="Pred. prob. 
of \"afraid\" or \"very afraid\"\n(P(y>=3|race)), by model",y ="",color ="") +theme_minimal() +theme(text =element_text(family ="serif"),legend.position ="bottom",panel.grid.major.y =element_blank(),panel.grid.minor =element_blank(),axis.line =element_line(color ="black"),axis.ticks =element_line(color ="black"),axis.text =element_text(size =14),axis.title.x =element_text(size =14),legend.text =element_text(size =14) )# Add observed points to prop plotrace_pp_plot2 <- race_pp_plot +geom_point(data = obs_props,mapping =aes(x = prop, y = outcome,color =factor(if_else(race =="White", "Observed.White", "Observed.Black"),levels = guide_order) ),shape =23, size =3,fill =if_else(obs_props$race =="White", "white", "black"),inherit.aes =FALSE ) +scale_color_manual(values = model_colors,breaks = guide_order ) +guides(shape ="none",color =guide_legend(override.aes =list(shape =c(21, 21, 21, 21, 23, 23),fill =c("white", "#FD9B6BFF", "white", "#782281FF", "white", "black"), size =c(1, 1, 1, 1, 3, 3) )) )# Add observed contrasts to contrast plotrace_contrast_plot2 <- race_contrast_plot +geom_point(data = obs_contrasts,aes(x = contrast,y = outcome,color = model), # Add to color aesthetic for legendshape =23, # Diamond shapefill ="black",size =3,inherit.aes =FALSE ) +scale_color_manual(values =c("Linear"="#FD9B6BFF", "Ordinal"="#782281FF","Observed"="black"),breaks =c("Linear", "Ordinal", "Observed") ) +guides(color =guide_legend(override.aes =list(shape =c(21, 21, 23),fill =c("#FD9B6BFF", "#782281FF", "black"),size =c(1, 1, 3) )) )race_pp_plot2 + race_contrast_plot2 +plot_annotation(title ="FIGURE 5: Race-Specific Predicted Probabilities of \"Afraid\" or \"Very Afraid\" Responses and Black-White Probability Contrasts, by Linear or Ordinal Models",theme =theme(plot.title =element_text(size =12) ))
First, as expected, ordinal point-intervals accurately recover observed data. Second, though a naive interpretation of the beta estimates might suggest larger race estimates in the linear model, conversion to predicted probabilities shows this would be an inaccurate interpretation (more on this soon). All linear probability and contrast predictions substantially underestimate observed values. In fact, none of the linear model intervals successfully recovers the observed race-specific proportions or Black-White contrasts from any of the outcome items.
Let’s see if we can quantify and visualize the degree of underestimation in these linear models.
Show code
# calculate avg & range of linear underestimation of observed data# print("Structure of obs_props:")# str(obs_props)# # print("\nStructure of all_probs_lin:")# str(all_probs_lin)# # print("\nStructure of obs_contrasts:")# str(obs_contrasts)# # print("\nStructure of all_contrasts_lin:")# str(all_contrasts_lin)# 1. Calculate probability underestimates# Match on both outcome and raceprob_underestimates <- obs_props %>%mutate(outcome_str =as.character(outcome)) %>%left_join( all_probs_lin %>%mutate(outcome_str =gsub(" ", " ", outcome)), by =c("outcome_str"="outcome", "race") ) %>%mutate(difference = prop - prob )# 2. Summary statistics for probability underestimatesprob_summary <- prob_underestimates %>%summarize(avg_underestimate =mean(difference, na.rm =TRUE),min_underestimate =min(difference, na.rm =TRUE),max_underestimate =max(difference, na.rm =TRUE) )# 3. Summary by raceprob_by_race <- prob_underestimates %>%group_by(race) %>%summarize(avg_underestimate =mean(difference, na.rm =TRUE),min_underestimate =min(difference, na.rm =TRUE),max_underestimate =max(difference, na.rm =TRUE) )# 4. Calculate contrast underestimatescontrast_underestimates <- obs_contrasts %>%mutate(outcome_str =as.character(outcome)) %>%left_join( all_contrasts_lin %>%mutate(outcome_str =gsub(" ", " ", outcome)), by =c("outcome_str"="outcome") ) %>%mutate(difference = contrast - estimate )# 5. Summary statistics for contrast underestimatescontrast_summary <- contrast_underestimates %>%summarize(avg_underestimate =mean(difference, na.rm =TRUE),min_underestimate =min(difference, na.rm =TRUE),max_underestimate =max(difference, na.rm =TRUE) )# 6. Print resultscat("Probability Underestimates (Observed - Linear):\n")
# A tibble: 20 × 5
outcome_str race observed predicted difference
<chr> <chr> <dbl> <dbl> <dbl>
1 Stop You White 0.192 0.0995 0.0930
2 Search You White 0.179 0.0915 0.0871
3 Yell at You White 0.234 0.118 0.116
4 Handcuff You White 0.196 0.0959 0.101
5 Kick/Hit You White 0.181 0.0857 0.0948
6 Pin You Down White 0.187 0.0915 0.0950
7 Pepper Spray You White 0.198 0.0942 0.104
8 Tase You White 0.196 0.0939 0.103
9 Shoot at You White 0.173 0.0879 0.0847
10 Kill You White 0.167 0.0880 0.0787
11 Stop You Black 0.545 0.349 0.195
12 Search You Black 0.461 0.307 0.154
13 Yell at You Black 0.471 0.313 0.158
14 Handcuff You Black 0.521 0.344 0.177
15 Kick/Hit You Black 0.547 0.368 0.178
16 Pin You Down Black 0.564 0.385 0.179
17 Pepper Spray You Black 0.548 0.368 0.180
18 Tase You Black 0.568 0.383 0.185
19 Shoot at You Black 0.591 0.412 0.179
20 Kill You Black 0.593 0.419 0.174
Show code
cat("\nAverage Probability Underestimate:", round(prob_summary$avg_underestimate, 3), "\n")
Average Probability Underestimate: 0.14
Show code
cat("Range of Probability Underestimates:", round(prob_summary$min_underestimate, 3), "to", round(prob_summary$max_underestimate, 3), "\n")
Range of Probability Underestimates: 0.079 to 0.2
Show code
cat("\nProbability Underestimates by Race:\n")
Probability Underestimates by Race:
Show code
print(prob_by_race)
# A tibble: 2 × 4
race avg_underestimate min_underestimate max_underestimate
<chr> <dbl> <dbl> <dbl>
1 Black 0.176 0.154 0.195
2 White 0.0957 0.0787 0.116
# 8. Visualize probability underestimates by racep2 <-ggplot(prob_underestimates, aes(x =reorder(outcome_str, difference), y = difference, fill = race)) +geom_col(position ="dodge") +geom_hline(yintercept =0, linetype ="dashed") +scale_fill_manual(values =c("White"="#FECD90FF", "Black"="#FD9B6BFF")) +labs(title ="Probability Underestimates by Outcome and Race",subtitle ="Observed - Linear Model Prediction",x ="",y ="Difference (Observed - Predicted)" ) +coord_flip() +theme_minimal() +theme(legend.position ="bottom",panel.grid.minor =element_blank(),axis.text =element_text(size =12),axis.title =element_text(size =14) )print(p2)
The linear model, on average, underestimated the probability of “afraid” or “very afraid” responses by 0.14, with underestimates ranging from 0.079 to 0.2. Similarly, the linear model underestimated the Black-White contrast by an average of 0.08, with underestimates ranging from 0.042 to 0.1. This pattern aligns with the ceiling and floor effects described by Kruschke and Liddell (2018), where linear models struggle to represent data clustered at extreme values due to their inherent assumption of normality. Overall, these plots clearly demonstrate the linear model’s limitations in representing the observed data.
10.3 Explaining differences between latent & predicted probability scales
First, let’s review why we might generally expect to see different patterns for latent and predicted probability scales across linear and ordinal models.
Differences between Linear and Ordinal Models:
Linear models and ordinal models handle categorical outcomes differently. Linear models (like linear regression) treat the ordinal outcome as a continuous variable, which can lead to problematic assumptions when the underlying scale isn't truly continuous.
Latent Scale vs. Predicted Probability:
Latent Scale (Betas): This represents the underlying linear predictor. In ordinal models (like probit or logit ordinal regression), this scale is transformed through a cumulative link function.
Predicted Probabilities: This is the actual (estimated) probability of being in a specific category or above a certain threshold.
Apparent discrepancies can occur because of the non-linear transformation of the latent scale in ordinal models.
Linear Model: Assumes a linear relationship across the entire scale, which can overestimate effects by treating the ordinal categories as equidistant.
Ordinal Model: Uses a cumulative link function that can capture non-linear relationships more accurately.
So, different patterns on the latent and predicted probability scales can emerge because the ordinal model's transformation to predicted probabilities handles category boundaries explicitly, making differences more or less pronounced on the probability scale than the latent scale suggests. This is a classic example of why ordinal models should be preferred for categorical ordered outcomes. As we saw in the intercept-only models above, the non-linear transformation in ordinal models allows for:
More accurate representation of category boundaries
Better handling of the inherent ordering in categorical responses
Potentially more realistic and directly interpretable modeling of the underlying psychological process
Now, let me try to specifically explain what is happening in this specific case.
Normal Distribution Assumptions: In a linear model with normal distribution assumptions, the probability density is highest near the mean and symmetrically decreases as you move away from the center. This means:
Extreme values (very unafraid or very afraid) get “pulled” toward the central tendency
The model assumes roughly equal spacing between categories
Probabilities at the tails are compressed due to the normal distribution’s shape
Bimodal and Skewed Distribution: With the observed item distribution (flat or bimodal with clustering at extremes), the linear model’s normal assumption becomes particularly problematic:
The symmetric normal distribution doesn’t match the actual data distribution
Extreme group-specific clusters (whites very unafraid, blacks very afraid) are not well-represented
Linear model tries to “average out” these extreme clusters
Why Larger Latent Scale Effects:
The linear model compensates for the mismatch between its assumptions and data by increasing the coefficient magnitude on the latent scale
This allows the model to try to capture the group differences, but does so by inflating the linear predictor
However, due to normal distribution constraints, this doesn’t translate directly to extreme probability predictions
Ordinal Model Advantages: The ordinal model (cumulative link model) is more flexible:
Uses cumulative probabilities across category thresholds
Can more accurately represent non-linear relationships
Doesn’t assume equal spacing or symmetric distribution
Directly models the probability of being at or above each threshold
Demonstration of the differences in effects by scaling:
Linear model, effect on latent “stopyou”: Coefficient of 1.09 for Black race
Ordinal model, effect on latent “stopyou”: Coefficient of 0.92 for Black race
Despite the slightly smaller coefficient magnitude on the latent scale, the probability contrast for y>=3 is larger in the ordinal model because of the different distributional assumptions, as the short calculation below illustrates.
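To make this concrete, here is a rough back-of-the-envelope calculation using the reported point estimates only (not full posteriors). The ordinal model's third threshold is not printed above, so the value 0.83 used below is an assumption chosen to be consistent with the White predicted probability of about 0.20:

# Linear model: P(y >= 3) under a normal distribution with mean = Xb and sd = sigma
lin_white <- 1 - pnorm(3, mean = 1.44, sd = 1.21)        # approx 0.10
lin_black <- 1 - pnorm(3, mean = 1.44 + 1.09, sd = 1.21) # approx 0.35
lin_black - lin_white                                    # contrast approx 0.25

# Cumulative probit: P(y >= 3) = 1 - pnorm(threshold_3 - eta), latent SD fixed at 1
tau3 <- 0.83                          # assumed third threshold (not shown in output above)
ord_white <- 1 - pnorm(tau3 - 0)      # approx 0.20
ord_black <- 1 - pnorm(tau3 - 0.92)   # approx 0.53
ord_black - ord_white                 # contrast approx 0.33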
11 Race differences in latent fear
Goal: Compare results from a linear model applied to PGC’s multi-item composite scale measuring latent fear of police to those from an ordinal multilevel model predicting latent fear of police.
11.1 Is a composite “latent” fear measure appropriate?
In PGC’s study, they did not estimate item-specific models. Rather, after reporting some common indicators of unidimensionality (e.g., high Cronbach’s alpha; significant factor loadings), they created a composite scale by averaging all ten outcome items (0 to 4, by 0.1) to essentially predict “latent” fear of police. Then, they normed or re-scaled the measure to range from 0 to 100 so they could interpret coefficients as effects on a “percentage” of the outcome scale.
Unfortunately, such transformations only provide the illusion of interpretability. We cannot take ordinal items, add them up, rescale them, and then accurately interpret results in metric scaling (e.g., as percentages). Averaging ordinal items assumes equal intervals between response categories, which is unlikely to be true (e.g., the “distance” between “very unafraid” and “unafraid” may not equal the distance between “unafraid” and “neither”). Also, creating a composite score by averaging loses information about item-specific response patterns and assumes all items equally represent the underlying construct.
Meanwhile, rescaling to 0-100 and interpreting as “percentages” is problematic because it implies metric properties that do not exist in the original ordinal data. The “percentage” interpretation suggests a meaningful zero point and equal units, but the transformation does not address the fundamental ordinal nature of the data.
Fortunately, there are alternatives. We can use ordinal models that respect the ordinal nature of the items and allow for unequal distances between thresholds. Also, we can estimate item-specific effects (like we did) or use multilevel modeling to account for item clustering while estimating overall effects on “latent” fear of police.
Before we illustrate such a multilevel modeling approach, it is helpful to visualize whether predicted probabilities for all five response categories (from 0 to 4) actually cluster across all ten outcome items, and whether response-specific contrasts also cluster across outcomes. If so, this would bolster our confidence in assuming the appropriateness of predicting overall effects on fear as a latent construct.
11.2 Do all ten outcome items appear to assess “latent” fear?
Let’s start by generating race-specific predicted probabilities and black-white contrasts for all thresholds and for all ten outcomes from cumulative probit models.
Show code
# Get predictions for both racesnd <-tibble(race =c("White", "Black"))# Get predictions for all modelsrace_preds <- purrr::map_dfr( race_ord_mods, ~add_epred_draws(.x, newdata = nd), .id =".model") %>%ungroup()# Calculate predicted probabilities by racerace_probs <- race_preds %>%group_by(.model, race, .category) %>%summarise(p =mean(.epred),ll =quantile(.epred, 0.025),ul =quantile(.epred, 0.975) )# Calculate Black-White contrastsrace_contrasts <- race_preds %>% dplyr::select(.draw, .model, race, .category, .epred) %>%pivot_wider(names_from = race, values_from = .epred) %>%mutate(diff = Black - White) %>%group_by(.model, .category) %>%summarise(con =mean(diff),ll =quantile(diff, 0.025),ul =quantile(diff, 0.975) )
Now, let’s assume responses to each outcome represent indicators of a single underlying latent fear of police construct. If so, when we “stack” the race-specific probabilities and black-white contrasts from all models together, they should cluster across outcomes. Let’s see what happens when we plot probabilities and contrasts for all response categories, overlapping interval estimates for all 10 items.
Show code
# Predicted probabilities plotp1 <-ggplot(race_probs, aes(x = .category, y = p, ymin = ll, ymax = ul, color = race)) +geom_hline(yintercept = scales::breaks_pretty()(range(0, .6)), color ="lightgrey") +geom_linerange(linewidth =1, position =position_dodge(width =1/3)) +geom_point(aes(shape = race), position =position_dodge(width =1/3), size =6) +scale_color_manual(NULL, values =c("Black"="#100B2DFF", "White"="#FEC488FF")) +scale_shape_manual(NULL, values =c(3, 3)) +scale_y_continuous(limits =c(0, .61), breaks = scales::breaks_pretty()) +labs(title ="Predicted Category Probabilities by Race",x ="Response Category",y ="Predicted Probability") +theme_minimal() +theme(legend.background =element_blank(),legend.position ="bottom",panel.grid =element_blank(),axis.line =element_line(color ="black"),axis.ticks =element_line(color ="black") )# Contrast plotp2 <-ggplot(race_contrasts, aes(x = .category, y = con)) +geom_hline(yintercept = scales::breaks_pretty()(range(-0.4, 0.4)), color ="lightgrey") +geom_hline(yintercept =0, color ="black") +geom_pointrange(aes(ymin = ll, ymax = ul), linewidth =1, size =1.5, color ="#B2357BFF",shape =3) +scale_y_continuous(limits =c(-0.42, 0.42), breaks = scales::breaks_pretty()) +labs(title ="Black-White Contrasts in Predicted Probabilities",x ="Response Category",y ="Contrast") +theme_minimal() +theme(legend.background =element_blank(),panel.grid =element_blank(),axis.line =element_line(color ="black"),axis.ticks =element_line(color ="black") )# Combine plotsp1 + p2 +plot_annotation(title ="Visualizing Clustering of Estimates by Response Category Across Ordinal Models of All Ten Fear Items",theme =theme(plot.title =element_text(size =12) ))
Overall, there is substantial clustering, though the extreme response categories (0 and 4) show more variation across items than the middle categories (1, 2, 3).
One of our goals here is to provide an intuitive, visualized sense of what our multilevel model will be doing later - essentially it will use each of these threshold-specific and item-specific (single-level) model estimates to generate overall aggregated estimates. Let’s try it.
11.3 CP: Multilevel model
Show code
# Data prep - need to convert to long formatdat_long <- dat %>%pivot_longer(all_of(fearpol_new), # Use our vector of 10 fear itemsnames_to ="outcome",values_to ="fear" ) %>%mutate(fear =factor(fear, ordered =TRUE), # Ensure ordered factoroutcome =factor(outcome) # Factor for random effects grouping )# Formula for hierarchical ordinal modelbf_fear <-bf(fear |thres(4) ~ race + (race | outcome))# Priors (same thresholds as before, add random effects)priors_hier <-c(# Fixed effect priors (same as before)prior(normal(-0.84, 1), class ="Intercept", coef ="1"),prior(normal(-0.25, 1), class ="Intercept", coef ="2"),prior(normal(0.25, 1), class ="Intercept", coef ="3"),prior(normal(0.84, 1), class ="Intercept", coef ="4"),prior(normal(0, 1), class ="b"),# Random effects priorsprior(exponential(1), class ="sd"),prior(lkj(2), class ="cor") # Prior for correlation of random effects)# Fit hierarchical modelhier_mod <-brm( bf_fear,data = dat_long,family =cumulative("probit"),prior = priors_hier,cores = nCoresphys,chains =4,iter =2000,warmup =1000,seed =1138,control =list(adapt_delta =0.95),file ="Models/hier_fear_fit",file_refit ="on_change")
# Define paths for cached LOO resultshier_mod_loo_path <-"Models/hier_mod_loo.rds"# Check if cached LOO existsif (file.exists(hier_mod_loo_path)) {# Read cached LOO hier_mod_loo <-readRDS(hier_mod_loo_path)} else {# Compute LOO if not cached hier_mod_loo <-loo(hier_mod)# Save LOOsaveRDS(hier_mod_loo, hier_mod_loo_path)}print(hier_mod_loo$estimates)
The multilevel model (MLM) appears to have fit without issue and recovers the underlying data reasonably well. Now we will extract population-level predictions using re_formula = NA to get predictions that marginalize over the random effects of outcomes, effectively giving us estimates for the latent “fear of police” construct. Then, we can calculate and plot race-specific posterior probabilities and black-white contrasts for each response category.
Show code
# Prediction grid nd <-tibble(race =factor(c("White", "Black")))# Get predictions, marginalizing over random effectsmlm_preds <-add_epred_draws(hier_mod, newdata = nd, re_formula =NA) %>%ungroup()# Get predicted probabilities for latent fearmlm_probs <- mlm_preds %>%group_by(race, .category) %>%summarise(p =mean(.epred),ll =quantile(.epred, .025),ul =quantile(.epred, .975) )# Calculate contrasts for latent fearmlm_contrasts <- mlm_preds %>% dplyr::select(.draw, .category, race, .epred) %>%pivot_wider(names_from = race, values_from = .epred) %>%mutate(diff = Black - White) %>%group_by(.category) %>%summarise(con =mean(diff),ll =quantile(diff, .025),ul =quantile(diff, .975) )
Most of the 95% CrI error bars are relatively small, implying relatively precise estimates. This is the case even though we use re_formula = NA to marginalize over random effects, which means these interval estimates incorporate uncertainty from both fixed and random effects in the predictions. The narrower intervals may reflect a relatively large sample size (N=1,020), the examination of population-level effects averaged across all ten items, and the items showing relatively consistent effects (i.e., small variation in random effects; recall clustering of estimates in Figure 4).
Let’s pull all this together.
11.3.1 FIG6
Show code
# For mlm predictions, get full posterior for violin plotsmlm_preds_violin <- mlm_preds %>%group_by(race, .category, .draw) %>%summarise(p =mean(.epred),.groups ="drop" )# Combined predictions plotp1 <-ggplot() +# Individual model estimates (semi-transparent, jittered)geom_pointrange(data = race_probs,aes(x = .category, y = p, ymin = ll, ymax = ul, color = race),position =position_jitterdodge(jitter.width =0.2,jitter.height =0,dodge.width =0.4,seed =1138 ), alpha =0.8,linewidth =1,shape =3) +# MLM estimates as violin plotsgeom_violin(data = mlm_preds_violin,aes(x = .category, y = p, fill = race),position =position_dodge(width =0.4),alpha =0.2,scale ="width") +scale_color_manual(NULL, values =c("Black"="#100B2DFF", "White"="#FEC488FF")) +scale_fill_manual(NULL,values =c("Black"="#100B2DFF", "White"="#FEC488FF")) +scale_y_continuous(limits =c(0, .61), breaks = scales::breaks_pretty()) +labs(title ="Predicted Category Probabilities by Race",subtitle ="Semi-transparent intervals show individual items; violin plots show MLM estimates",x ="Response Category",y ="Predicted Probability") +theme_minimal() +theme(text =element_text(family ="serif"),legend.background =element_blank(),legend.position ="bottom",panel.grid =element_blank(),axis.line =element_line(color ="black"),axis.ticks =element_line(color ="black"), axis.title =element_text(size =14),axis.text =element_text(size =14),legend.text =element_text(size =14) )# Similar approach for contrastsmlm_contrasts_violin <- mlm_preds %>% dplyr::select(.draw, race, .category, .epred) %>%pivot_wider(names_from = race, values_from = .epred) %>%mutate(diff = Black - White) %>%group_by(.category, .draw) %>%summarise(con =mean(diff),.groups ="drop" )p2 <-ggplot() +# Individual model contrastsgeom_pointrange(data = race_contrasts,aes(x = .category, y = con, ymin = ll, ymax = ul),position =position_jitter(width =0.2),alpha =0.8,linewidth =1,color ="#B2357BFF") +# MLM contrasts as violingeom_violin(data = mlm_contrasts_violin,aes(x = .category, y = con),fill ="#B2357BFF",alpha =0.2) +geom_hline(yintercept =0, color ="black") +labs(title ="Black-White Contrasts in Predicted Probabilities",subtitle ="Semi-transparent intervals show individual items; violin plots show MLM estimates",x ="Response Category",y ="Contrast") +theme_minimal() +theme(text =element_text(family ="serif"),legend.background =element_blank(),panel.grid =element_blank(),axis.line =element_line(color ="black"),axis.ticks =element_line(color ="black"),axis.title =element_text(size =14),axis.text =element_text(size =14),legend.text =element_text(size =14) )# Combine plotsp1 + p2 +plot_annotation(title ="FIGURE 6: Comparing Individual Item and MLM Estimates",theme =theme(plot.title =element_text(size =12) ))
Overall, item-specific models generate probability and contrast estimates that closely cluster across the ordinal response distribution, which is what we would expect to see if the individual items indeed measured a latent “fear of police” attitudinal construct. Two notable exceptions emerged: there is substantial heterogeneity across items in the predicted probabilities of “very unafraid” for Whites and of “very afraid” for Blacks, which results in item-specific heterogeneity in the predicted contrasts for these ordinal response categories.
Likewise, the multilevel model estimates are very precise where underlying item-specific predictions are closely clustered (e.g., for 8 of 10 predicted probabilities and 3 of 5 contrasts), while uncertainty intervals are wider where there is more variation in item-specific predictions.
Examination of observed proportions in these extreme columns (Appendix A) provides a clue as to the sources of heterogeneity and uncertain predictions.
First, a larger proportion of White participants report being very unafraid (y=0) of physically intensive (and “objectively” scarier) police use of force actions, while a smaller proportion report being very unafraid of more common and less physically intrusive actions. This pattern suggests that the fear items may in part be measuring perceived probability of experiencing these actions, at least for White participants.
Second, and in direct contrast to these findings for Whites, a larger proportion of Black participants report being very afraid (y=4) of the physically intensive (and “objectively” scarier) police use of force actions, while a smaller proportion report being very afraid of more common and less physically intrusive actions.
Collectively, these patterns reveal important race-specific and outcome-specific complexities in racial differences in fear of police - patterns that we are unable to detect if we start by assuming the appropriateness of latent measurement and analysis.
11.4 Lin: Latent effects on normed mean scale
Show code
#DV: Personal Fear of Police - Normed Scaledat <- dat %>%mutate(fearpol = datawizard::row_means(pick(all_of(fearpol_new))),fearpol_norm = (fearpol -min(fearpol, na.rm =TRUE)) / (max(fearpol, na.rm =TRUE) -min(fearpol, na.rm =TRUE)) *100 )# summary(dat$fearpol)# summary(dat$fearpol_norm)# Model path for the cached modelmodel_path <-"Models/race_lin_mod_fearpol_norm.rds"# Check if cached model existsif (file.exists(model_path)) { race_lin_mod_fearpol_norm <-readRDS(model_path)} else {# Linear model with race for normed fear of police scale race_lin_mod_fearpol_norm <-stan_glm(formula = fearpol_norm ~ race,data = dat,family = gaussian,chains =4,cores = nCoresphys,seed =1138 )# Save modelsaveRDS(race_lin_mod_fearpol_norm, model_path)}
Model Info:
function: stan_glm
family: gaussian [identity]
formula: fearpol_norm ~ race
algorithm: sampling
sample: 4000 (posterior sample size)
priors: see help('prior_summary')
observations: 1020
predictors: 2
Estimates:
mean sd 2.5% 97.5%
(Intercept) 32.17 1.32 29.55 34.76
raceBlack 31.65 1.83 28.19 35.28
sigma 29.47 0.65 28.22 30.79
Fit Diagnostics:
mean sd 2.5% 97.5%
mean_PPD 48.18 1.32 45.62 50.80
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.02 1.00 3698
raceBlack 0.03 1.00 3720
sigma 0.01 1.00 3980
mean_PPD 0.02 1.00 3916
log-posterior 0.03 1.00 1716
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Before interpreting, let’s add an exact frequentist replication.
Show code
# Exact frequentist replicationlibrary(sandwich)library(lmtest)ols_model <-lm(fearpol_norm ~ race, data = dat)robust_results <-coeftest(ols_model, vcov =vcovHC(ols_model, type ="HC1"))# This should yield very close to our Bayesian linear estimate & Pickett et al.'s 32.088 estimaterobust_results
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 32.19 1.33 24.2 <0.0000000000000002 ***
raceBlack 31.62 1.84 17.1 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
After averaging across the ten fear items and rescaling to 0-100, our Bayesian linear model shows that Black respondents reported substantially higher fear of police than White respondents. Specifically, the raceBlack coefficient of 31.65 (95% CI: 28.19, 35.28) represents the estimated mean difference between Black and White respondents on the normed fear scale (0-100): Black respondents scored an average of about 32 points higher than White respondents, whose own mean score (the intercept) was itself about 32 points.
However, remember the following limitations in interpreting this coefficient:
While the rescaling to 0-100 makes the coefficient seem more interpretable (e.g., “32 percentage points”), this interpretation assumes equal intervals between original response categories.
The scale transformation doesn’t address the ordinal nature of the underlying items - the distance between “very unafraid” and “unafraid” may not equal the distance between “unafraid” and “neither”
The linear model assumes normally distributed errors around predicted means, which may not be appropriate for bounded scores derived from ordinal items
What if we were to back-transform these predictions from the 0-100 scale to the original 0-4 scale?
Back-transformed for Black respondents: ((32.17 + 31.65)/100) * 4 = (63.82/100) * 4 ≈ 2.55 on the 0-4 scale; for White respondents: (32.17/100) * 4 ≈ 1.29. (A quick R check follows the list below.)
So on the original ordinal scale where:
0 = very unafraid
1 = unafraid
2 = neither afraid nor unafraid
3 = afraid
4 = very afraid
The model predicts that on average:
White respondents fall between “unafraid” and “neither” (closer to “unafraid”)
Black respondents fall between “neither” and “afraid” (closer to “afraid”)
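As a quick check, here is how those back-transformed means can be computed from the reported posterior means (point estimates only; a fuller treatment would use the posterior draws):

# Back-transform the reported posterior means from the 0-100 normed scale to the 0-4 scale
white_norm <- 32.17               # predicted mean for White respondents (intercept)
black_norm <- 32.17 + 31.65       # predicted mean for Black respondents (intercept + raceBlack)
c(White = white_norm / 100 * 4,   # approx 1.29 on the original 0-4 scale
  Black = black_norm / 100 * 4)   # approx 2.55 on the original 0-4 scale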
This back-transformation helps illustrate what these predictions mean in terms of mean values on the original response categories, though we should still note the limitations of treating ordinal responses as continuous. After all, what do these mean values even mean? As noted earlier, precise response-category predictions from the item-specific linear models differed substantially from the ordinal predictions, and the linear estimates consistently failed to recover the underlying data where the ordinal estimates reasonably succeeded. Now, let's compare precise model-based category predictions for "latent" fear of police, focusing on race-specific predicted probabilities of reportedly being "very afraid" of police overall.
11.4.1 Comparing MLM and linear latent model predictions for “very afraid”
What are the predicted probabilities and contrasts for “very afraid” implied by the linear model? This is not a question with a straightforward or obvious answer. Let’s consider two different answers using either the precise “very afraid” category cutoff (y>=4) or a more generous cutoff falling between “afraid” and “very afraid” (y>=3.5).
First, let's calculate the model-implied probability of scoring at or above the "very afraid" value (4 on the original scale, or 100 on the normed scale).
Show code
get_prob_contrasts_lin_fearpol_norm <-function(model) {# Get posterior draws draws <-as_draws_df(model)# Use 100 (=4) as the cutoff for "very afraid" p_white <-1-pnorm(100, mean = draws$`(Intercept)`, sd = draws$sigma) p_black <-1-pnorm(100, mean = draws$`(Intercept)`+ draws$raceBlack, sd = draws$sigma)# Summarize probabilities probs <-data.frame(race =c("White", "Black"),prob =c(mean(p_white), mean(p_black)),lower =c(quantile(p_white, 0.025), quantile(p_black, 0.025)),upper =c(quantile(p_white, 0.975), quantile(p_black, 0.975)) )# Calculate contrasts from each draw contrasts <- p_black - p_white contrast_summary <-data.frame(estimate =mean(contrasts),lower =quantile(contrasts, 0.025),upper =quantile(contrasts, 0.975) )list(probabilities = probs, contrasts = contrast_summary)}# Run function on cached modelfearpol_norm_results <-get_prob_contrasts_lin_fearpol_norm(race_lin_mod_fearpol_norm)fearpol_norm_results$probabilities %>%gt()
race   prob   lower   upper
White  0.011  0.0074  0.015
Black  0.110  0.0917  0.131
Show code
fearpol_norm_results$contrasts %>%gt()
estimate  lower  upper
0.099     0.082  0.12
Now, let's try a more generous cutoff equivalent to >=3.5 on the original scale, since a composite score that high implies the respondent reported being "very afraid" on at least half of the items.
Show code
# Mapping 3.5 to the normed scale# Original scale: 0 to 4# Normed scale: 0 to 100# 3.5 / 4 * 100 = 88 (equivalent point on 0 to 100 scale)get_prob_contrasts_lin_fearpol_norm <-function(model) {# Get posterior draws draws <-as_draws_df(model)# Use 87.5 as generous cutoff for "very afraid" p_white <-1-pnorm(87.5, mean = draws$`(Intercept)`, sd = draws$sigma) p_black <-1-pnorm(87.5, mean = draws$`(Intercept)`+ draws$raceBlack, sd = draws$sigma)# Summarize probabilities probs <-data.frame(race =c("White", "Black"),prob =c(mean(p_white), mean(p_black)),lower =c(quantile(p_white, 0.025), quantile(p_black, 0.025)),upper =c(quantile(p_white, 0.975), quantile(p_black, 0.975)) )# Calculate contrasts from each draw contrasts <- p_black - p_white contrast_summary <-data.frame(estimate =mean(contrasts),lower =quantile(contrasts, 0.025),upper =quantile(contrasts, 0.975) )list(probabilities = probs, contrasts = contrast_summary)}# Run the function on cached modelfearpol_norm_results2 <-get_prob_contrasts_lin_fearpol_norm(race_lin_mod_fearpol_norm)fearpol_norm_results2$probabilities %>%gt()
race   prob  lower  upper
White  0.03  0.023  0.04
Black  0.21  0.185  0.24
Show code
fearpol_norm_results2$contrasts %>%gt()
estimate  lower  upper
0.18      0.16   0.21
How do these compare with estimates from the ordinal MLM model?
While we are at it, let’s add observed proportions too. First, we will calculate the race-specific proportion of “very afraid” (y=4) responses for each item. Next, we will calculate the mean of the 10 proportions (one per item) for each race group. This gives us an average race-specific proportion of “very afraid” responses across all items. Finally, we will calculate the black-white contrast in these proportions.
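The chunk that computes these observed benchmarks is not shown above; a minimal sketch of the calculation just described (using the dat and fearpol_new objects from earlier) might look like this:

# Race-specific proportion of "very afraid" (y = 4) responses for each item,
# averaged across the ten items, plus the Black-White contrast in those averages
obs_very_afraid <- dat %>%
  group_by(race) %>%
  summarize(across(all_of(fearpol_new), ~ mean(. == 4, na.rm = TRUE))) %>%
  pivot_longer(all_of(fearpol_new), names_to = "item", values_to = "prop") %>%
  group_by(race) %>%
  summarize(mean_prop = mean(prop))

obs_very_afraid

# Black - White contrast in the averaged proportions
obs_very_afraid$mean_prop[obs_very_afraid$race == "Black"] -
  obs_very_afraid$mean_prop[obs_very_afraid$race == "White"]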
Note that none of the 95% credible intervals from either the "precise" or the "generous" linear model estimates of race-specific "very afraid" probabilities or Black-White probability contrasts contained the observed values. In contrast, all three of the 95% CrIs from the ordinal model estimates contain the observed values.
12 Supplemental Analyses
12.1 Unequal group variances
The cumulative probit models estimated above assume equal group variances (i.e., equal variances for the White and Black response distributions). Yet sometimes this assumption is not appropriate, and it can result in poor recovery of the underlying ordinal data. Below, we modify the specification of our cumulative probit multilevel model to include a variance structure that allows for unequal group variances by race. In brms, we achieve this by adding a group-level variance parameter with the (1 | race) syntax in the random effects specification.
Show code
# Supplemental analysis allowing unequal variances by racebf_fear_unequal_var <-bf(fear |thres(4) ~ race + (1| race) + (race | outcome))# Modify priors to include the new variance structurepriors_hier_unequal_var <-c(# Fixed effect priors (same as before)prior(normal(-0.84, 1), class ="Intercept", coef ="1"),prior(normal(-0.25, 1), class ="Intercept", coef ="2"),prior(normal(0.25, 1), class ="Intercept", coef ="3"),prior(normal(0.84, 1), class ="Intercept", coef ="4"),prior(normal(0, 1), class ="b"),# Random effects priorsprior(exponential(1), class ="sd"),prior(lkj(2), class ="cor") # Prior for correlation of random effects)# Fit model with unequal varianceshier_mod_unequal_var <-brm( bf_fear_unequal_var,data = dat_long,family =cumulative("probit"),prior = priors_hier_unequal_var,cores = nCoresphys,chains =4,iter =2000,warmup =1000,seed =1138,control =list(adapt_delta =0.95),file ="Models/hier_fear_fit_unequal_var",file_refit ="on_change")
Note: this model produced 49 divergent transitions. To address them, we could increase the warmup and total number of draws and perhaps raise adapt_delta to 0.99.
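As an aside, brms also supports letting the latent scale differ by group through the disc (discrimination) distributional parameter of cumulative models. The sketch below shows that alternative specification under our data and formula; it uses brms default priors for simplicity, and the object and file names are ours rather than part of the analysis above.
Show code
# Alternative unequal-variance specification via the disc parameter
# (disc is modeled on the log scale; cmc = FALSE fixes one group's disc at 1)
bf_fear_disc <- bf(fear | thres(4) ~ race + (race | outcome)) +
  lf(disc ~ 0 + race, cmc = FALSE)

hier_mod_disc <- brm(
  bf_fear_disc,
  data = dat_long,
  family = cumulative("probit"),
  cores = nCoresphys,
  chains = 4,
  iter = 2000,
  warmup = 1000,
  seed = 1138,
  control = list(adapt_delta = 0.95),
  file = "Models/hier_fear_fit_disc",   # hypothetical cache file
  file_refit = "on_change"
)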
# Define paths for cached LOO results
hier_mod_unequal_var_loo_path <- "Models/hier_mod_unequal_var_loo.rds"

# Check if cached LOO exists
if (file.exists(hier_mod_unequal_var_loo_path)) {
  # Read cached LOO
  hier_mod_unequal_var_loo <- readRDS(hier_mod_unequal_var_loo_path)
} else {
  # Compute LOO if not cached
  hier_mod_unequal_var_loo <- loo(hier_mod_unequal_var)
  # Save LOO
  saveRDS(hier_mod_unequal_var_loo, hier_mod_unequal_var_loo_path)
}

print(hier_mod_unequal_var_loo$estimates)
Let’s use LOO (leave-one-out) cross-validation to compare this model with the original multilevel model that assumes equal variances and see whether allowing unequal variances improves model fit.
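The comparison itself is a single call to loo_compare(); a minimal sketch is below, assuming hier_mod_loo (the cached LOO object for the original equal-variance model) was computed earlier in this supplement.
Show code
# Compare equal- vs. unequal-variance cumulative probit MLMs
loo_compare(hier_mod_loo, hier_mod_unequal_var_loo)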
The model with unequal variances (hier_mod_unequal_var) and the original hierarchical model (hier_mod) perform almost identically in terms of predictive accuracy. The ELPD difference of about -0.2 is roughly twice its standard error, which suggests that allowing for unequal variances might capture slightly more variation in the data, but the absolute improvement over the simpler model is trivial. In this case, the added model complexity and computational cost are not worth the essentially equivalent predictive performance.
12.2 Category-specific (categorical logit) models (supplement)
Goal: Briefly introduce category-specific (categorical logit) regression as a way to recover the underlying response distributions even more accurately (for online supplement only).
Show code
# Prepare data (similar to before, but now as a categorical outcome)
dat_long <- dat %>%
  pivot_longer(
    all_of(fearpol_new),  # Use vector of 10 fear items
    names_to = "outcome",
    values_to = "fear"
  ) %>%
  mutate(
    fear = factor(fear, ordered = FALSE),  # Key change: NOT ordered
    outcome = factor(outcome)              # Factor for random effects grouping
  )

# Formula for category-specific (categorical logit) model
bf_fear_categorical <- bf(fear ~ race + (1 | outcome))

# Fit categorical model
cat_mod <- brm(
  bf_fear_categorical,
  data = dat_long,
  family = categorical("logit"),
  cores = nCoresphys,
  chains = 4,
  iter = 4000,
  warmup = 1000,
  seed = 1138,
  control = list(adapt_delta = 0.95),
  file = "Models/cat_fear_fit",
  file_refit = "on_change"
)
# Define paths for cached LOO results
cat_mod_loo_path <- "Models/cat_mod_loo.rds"

# Check if cached LOO exists
if (file.exists(cat_mod_loo_path)) {
  # Read cached LOO
  cat_mod_loo <- readRDS(cat_mod_loo_path)
} else {
  # Compute LOO if not cached
  cat_mod_loo <- loo(cat_mod)
  # Save LOO
  saveRDS(cat_mod_loo, cat_mod_loo_path)
}

print(cat_mod_loo$estimates)
Again, let’s compare this categorical model with the original cumulative probit multilevel model using LOO to see whether category-specific predictions substantially improve model fit.
This LOO (leave-one-out) cross-validation comparison shows a substantial difference between the categorical model and the hierarchical cumulative probit model. The ELPD difference of -84.8 is more than five times larger than its standard error and well beyond the conventional threshold of twice the standard error (2 × 14.7 ≈ 29.4).
This suggests a statistically meaningful difference in model predictive performance. Specifically, the categorical model (cat_mod) appears to have substantially better predictive accuracy compared to the hierarchical cumulative probit model.
In this case, we would be wise to consider using a categorical model for the analysis, as it demonstrates a clear improvement in predictive performance. Put differently, although the cumulative probit model offers a dramatic improvement over the linear model, the cumulative ordered structure may not fit the observed fear responses in these data as well as initially assumed, or at least not as well as a much more complex and computationally intensive model in which every response category is predicted separately for both race groups and all ten outcomes.
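To see what "every category predicted separately" looks like in practice, one can extract category-specific probabilities from the fitted categorical model. A minimal sketch, assuming cat_mod and dat_long exist as fit above (the helper objects and summary step are ours):
Show code
# Posterior category probabilities by race from the categorical model.
# posterior_epred() returns a draws x observations x categories array
# for categorical families; re_formula = NA averages over the outcome
# random effects so newdata only needs the population-level predictor.
newdat <- expand.grid(race = c("White", "Black"))
epred <- posterior_epred(cat_mod, newdata = newdat, re_formula = NA)

# Posterior mean probability of each response category (0-4) by race
cat_probs <- apply(epred, c(2, 3), mean)
rownames(cat_probs) <- newdat$race
round(cat_probs, 3)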
13 Linear MLM
We cannot use LOO cross-validation to directly compare ELPD estimates of model fit between the linear model predicting a composite normed mean scale (which combines 10 items to measure latent fear of police) and the cumulative probit multilevel model predicting latent fear of police aggregated from the 10 item-specific estimates. However, we can use it to compare a linear multilevel model with the cumulative probit model.
Show code
# Data prep - similar to before, but now using fear values directly
dat_long <- dat %>%
  pivot_longer(
    all_of(fearpol_new),  # Use vector of 10 fear items
    names_to = "outcome",
    values_to = "fear"
  ) %>%
  mutate(
    outcome = factor(outcome)  # Factor for random effects grouping
  )

# Formula for hierarchical linear model
bf_fear_linear <- bf(fear ~ race + (1 | outcome))

# Priors for hierarchical linear model
priors_hier_linear <- c(
  prior(normal(0, 1), class = "Intercept"),
  prior(normal(0, 1), class = "b"),
  prior(exponential(1), class = "sd")
)

# Fit hierarchical linear model
hier_mod_linear <- brm(
  bf_fear_linear,
  data = dat_long,
  family = gaussian(),
  prior = priors_hier_linear,
  cores = nCoresphys,
  chains = 4,
  iter = 2000,
  warmup = 1000,
  seed = 1138,
  control = list(adapt_delta = 0.95),
  file = "Models/hier_fear_linear_fit",
  file_refit = "on_change"
)
# Define paths for cached LOO results
hier_mod_linear_loo_path <- "Models/hier_mod_linear_loo.rds"

# Check if cached LOO exists
if (file.exists(hier_mod_linear_loo_path)) {
  # Read cached LOO
  hier_mod_linear_loo <- readRDS(hier_mod_linear_loo_path)
} else {
  # Compute LOO if not cached
  hier_mod_linear_loo <- loo(hier_mod_linear)
  # Save LOO
  saveRDS(hier_mod_linear_loo, hier_mod_linear_loo_path)
}

print(hier_mod_linear_loo$estimates)
Does this model do a better job predicting “very afraid” responses? Let’s extract “precise” and “generous” predictions again and then add them to our Table 2.
As before, let’s calculate model-implied probabilities of scoring 4 or higher on the original scale, corresponding to "very afraid". Like before, we will also calculate a more generous estimate using a cutoff of >= 3.5 on the original scale, as it implies reporting being very afraid on at least half of the ten items. Comparable to the ordinal MLM, we will generate population-level estimates with intervals that reflect uncertainty in the fixed effects, random-effects variance, and residual variance.
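A minimal sketch of those population-level calculations for the linear MLM is below. It assumes hier_mod_linear from above, uses brms's default parameter names (b_Intercept, b_raceBlack, sd_outcome__Intercept, sigma), and folds the outcome-level random-intercept SD and residual SD into the predictive SD as one way to propagate that uncertainty; the helper function name is ours.
Show code
# Population-level P(fear >= cutoff) from the linear MLM
draws_lin <- as_draws_df(hier_mod_linear)

p_very_afraid_lin <- function(cutoff) {
  # Combine random-intercept and residual SDs for population-level predictions
  total_sd <- sqrt(draws_lin$sd_outcome__Intercept^2 + draws_lin$sigma^2)
  p_white <- 1 - pnorm(cutoff, mean = draws_lin$b_Intercept, sd = total_sd)
  p_black <- 1 - pnorm(cutoff, mean = draws_lin$b_Intercept + draws_lin$b_raceBlack, sd = total_sd)
  data.frame(
    group = c("White", "Black", "Contrast (Black - White)"),
    estimate = c(mean(p_white), mean(p_black), mean(p_black - p_white)),
    lower = c(quantile(p_white, .025), quantile(p_black, .025), quantile(p_black - p_white, .025)),
    upper = c(quantile(p_white, .975), quantile(p_black, .975), quantile(p_black - p_white, .975))
  )
}

p_very_afraid_lin(4)    # "precise" cutoff
p_very_afraid_lin(3.5)  # "generous" cutoff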
Now, let’s compare this linear multilevel model with the original cumulative probit multilevel model using LOO cross-validation to see if there are substantial differences in the fit of each model’s predictions.
This LOO comparison reveals a dramatic difference between the two models. The extremely large ELPD difference of -1919.0 is far beyond the standard error of 48.6 (more than 39 standard errors).
As before when comparing linear and cumulative probit models, we received a warning that the ‘yhash’ attributes of the two LOO objects do not match. This is because the outcome is formatted differently in the two models - numeric for the linear model and an ordered factor for the cumulative probit model. Other than this formatting difference and the different distributional families (Gaussian vs. cumulative probit), the underlying data structure is essentially the same, just represented differently.
So, the LOO comparison should still provide meaningful insights. Key considerations are:
Are the underlying response variables fundamentally the same?
Do the models capture the same latent construct?
Are the differences in model specification minor?
In our case:
Both models use the same 10 fear items
The only difference is factor representation (numeric vs. ordered)
Posterior predictive checks suggest poor fit for the linear model
Given these points, the LOO comparison is likely informative. Under this assumption, the dramatically worse fit of the linear model (an ELPD difference of -1919.0) suggests:
The ordinal nature of the data is crucial
A linear model fails to capture the underlying response structure
The cumulative probit model better represents the inherent ordered categorical nature of the data
15 LOO ELPD differences
Let’s pull these various ELPD estimates and differences into a single supplementary table comparing relative model fit.
15.1 APPC (Appendix C table)
Show code
# Combine LOO results
loo_results <- list(
  hier_mod_loo,
  hier_mod_unequal_var_loo,
  cat_mod_loo,
  hier_mod_linear_loo
)

# Model names in desired order
model_names <- c(
  "Cumulative Probit",
  "Cumulative Probit w/Unequal Variances",
  "Categorical Logit",
  "Linear"
)

# Create mapping between model names & indices in loo_diffs
model_mapping <- c(
  "Cumulative Probit" = "hier_mod",
  "Cumulative Probit w/Unequal Variances" = "hier_mod_unequal_var",
  "Categorical Logit" = "cat_mod",
  "Linear" = "hier_mod_linear"
)

# Extract LOO estimates
loo_df <- do.call(rbind, lapply(loo_results, function(x) x$estimates["elpd_loo", ]))
colnames(loo_df) <- c("Estimate", "SE")
loo_df <- as.data.frame(loo_df)

# Compute LOO differences with categorical model as reference
loo_diffs <- loo_compare(cat_mod_loo, hier_mod_loo, hier_mod_unequal_var_loo, hier_mod_linear_loo)

# Create data frame with differences in the correct order
loo_diffs_df <- data.frame(
  Model = model_names,
  elpd_diff = numeric(length(model_names)),
  se_diff = numeric(length(model_names))
)

# Fill in differences based on model mapping
for (i in seq_along(model_names)) {
  if (model_names[i] == "Categorical Logit") {
    loo_diffs_df$elpd_diff[i] <- 0
    loo_diffs_df$se_diff[i] <- 0
  } else {
    model_key <- model_mapping[model_names[i]]
    loo_diffs_df$elpd_diff[i] <- loo_diffs[model_key, "elpd_diff"]
    loo_diffs_df$se_diff[i] <- loo_diffs[model_key, "se_diff"]
  }
}

# Create final table
elpd_table <- loo_df %>%
  mutate(
    Model = model_names,
    `ELPD Difference` = loo_diffs_df$elpd_diff,
    `SE of Difference` = loo_diffs_df$se_diff
  ) %>%
  dplyr::select(Model, everything()) %>%
  gt() %>%
  fmt_number(
    columns = c(Estimate, SE, `ELPD Difference`, `SE of Difference`),
    decimals = 1
  ) %>%
  cols_label(
    Estimate = "ELPD LOO",
    SE = "SE of ELPD",
    `ELPD Difference` = "ELPD Difference",
    `SE of Difference` = "SE of Difference"
  ) %>%
  tab_header(
    title = "APPENDIX C. Multilevel Model Comparisons Using Leave-One-Out Cross-Validation"
  ) %>%
  tab_footnote(
    footnote = "ELPD LOO: Expected Log Pointwise Predictive Density",
    locations = cells_column_labels(columns = Estimate)
  ) %>%
  tab_footnote(
    footnote = "Difference compared to category-specific logit model",
    locations = cells_column_labels(columns = `ELPD Difference`)
  )

elpd_table
APPENDIX C. Multilevel Model Comparisons Using Leave-One-Out Cross-Validation

Model                                   ELPD LOO1    SE of ELPD   ELPD Difference2   SE of Difference
Cumulative Probit                       −15,263.0    47.0         −84.8              14.7
Cumulative Probit w/Unequal Variances   −15,262.8    47.0         −84.7              14.7
Categorical Logit                       −15,178.2    47.9         0.0                0.0
Linear                                  −17,182.0    56.6         −2,003.8           50.4

1 ELPD LOO: Expected Log Pointwise Predictive Density
2 Difference compared to category-specific logit model
16 APP-A: Replication data
This code chunk saves the pre-processed replication datasets (the wide-format dat and long-format dat_long objects) for use with the tutorial code provided in the standalone Appendix A Quarto file.
Show code
# Setup for data loading and processing
library(here)
library(haven)
library(tidyverse)
library(datawizard)  # For row_means

# --- Data Loading and Preprocessing from Original Quarto File ---

# Load original data
dat <- read_dta(file = here("Data", "feardata(12).dta"))

# Select relevant columns
dat <- dat %>%
  dplyr::select(caseid, race, Q2_1, Q2_2, Q2_3, Q2_4, Q2_5,
                Q2_6, Q2_7, Q2_8, Q2_9, Q2_10)

# Zap Stata formats and labels
dat <- zap_formats(dat)
dat <- zap_labels(dat)
dat <- zap_label(dat)

# Filter for Black and White participants and re-factor 'race'
dat <- dat %>%
  dplyr::filter(race %in% c(1, 2)) %>%
  mutate(
    caseid = as_factor(caseid),
    race = as_factor(if_else(race == 2, "Black", "White"))
  )

# Recode fear items (0=very unafraid to 4=very afraid)
fearpol_old <- c(paste0("Q2_", 1:10))
fearpol_new <- c("stopyou", "searchyou", "yellyou", "handcyou",
                 "kickyou", "pinyou", "sprayyou", "taseyou",
                 "shootyou", "killyou")

dat <- dat %>%
  mutate(
    across(all_of(fearpol_old), ~ 5 - .x, .names = "{.col}_new") %>%
      setNames(fearpol_new)
  )

# Drop old fear columns and remove NA
dat <- dat %>%
  dplyr::select(-all_of(fearpol_old)) %>%
  drop_na()

# Create ordered factor versions for ordinal models
dat <- dat %>%
  mutate(across(
    all_of(fearpol_new),
    ~ factor(., levels = 0:4, ordered = TRUE),
    .names = "{.col}_ord"
  ))

# Create long format data for multilevel models
dat_long <- dat %>%
  pivot_longer(
    all_of(fearpol_new),
    names_to = "outcome",
    values_to = "fear_numeric"  # Keep as numeric for linear MLM
  ) %>%
  mutate(
    fear = factor(fear_numeric, levels = 0:4, ordered = TRUE),              # Ordered factor for CP MLM
    fear_categorical = factor(fear_numeric, levels = 0:4, ordered = FALSE), # Unordered factor for categorical logit
    outcome = factor(outcome)                                               # Factor for random effects grouping
  )

# Create the normed fearpol scale for linear replication
dat <- dat %>%
  mutate(
    fearpol = datawizard::row_means(pick(all_of(fearpol_new))),
    fearpol_norm = (fearpol - min(fearpol, na.rm = TRUE)) /
      (max(fearpol, na.rm = TRUE) - min(fearpol, na.rm = TRUE)) * 100
  )

# --- Save Replication Data for Appendix 3 ---

# Ensure the 'Data' directory exists within 'here()'
if (!dir.exists(here("Data"))) {
  dir.create(here("Data"))
}

# Save the main 'dat' object (with fearpol_norm and ordered factors)
save(dat, file = here("Data", "replication_data.RData"))

# Save the 'dat_long' object (for multilevel models)
save(dat_long, file = here("Data", "replication_data_long.RData"))

message("Replication datasets 'replication_data.RData' and 'replication_data_long.RData' saved successfully in the 'Data' folder.")