Primary Supplement for ‘When the Mean is Meaningless’

Authors

Jonathan R. Brauer

Jacob C. Day

Published

July 30, 2025

1 Introduction

This file is the online supplement for the paper “When the Mean is Misleading: A Guide to Ordered Regression for Meaningfully Modeling Ordinal Outcomes” (Jonathan R. Brauer & Jacob C. Day). Our primary aim with this paper is to show how and why to use ordered regression techniques like cumulative probit regression when analyzing ordinal outcomes (and perhaps count and continuous outcomes too!). We illustrate via simulations and then by applying these methods in an analysis of data from Pickett, Graham, and Cullen’s 2022 study published in Criminology, “The American racial divide in fear of the police,” which they shared as part of their supplemental materials on OSF.

This primary supplement contains all R code and output reported in the manuscript as well as other supplementary materials. In addition to this primary supplement, we have also provided a standalone tutorial in a separate Quarto file (Appendix A) that provides a step-by-step guide to our progression from frequentist and Bayesian versions of linear, to ordinal, to multilevel ordinal models.

1.1 Why this paper?

We picked Pickett, Graham, & Cullen’s paper for several reasons. First, we wanted to illustrate these methods with data on an important topic. That way, if the tutorial’s analysis improves our understanding at all, even if only slightly, then our methods paper might also help us better understand a topic that a lot of people care about.

Second, we wanted to analyze publicly available data. We searched a LOT of criminology papers that we thought would be good candidates and usually came up empty at this juncture. So, THANK YOU to Pickett, Graham, and Cullen for practicing open science and making this analysis possible!

Third, we wanted to pick a paper that we thought was already well done - that is, a paper that did a good job of carefully and thoroughly applying traditional tools to extract rich, meaningful signals from data. In doing so, it would make our job easier as we would already have a good sense of signals in the data and we could avoid taking too many methodological detours. This approach might also help readers better assess what, if anything, might be gained by spending valuable time learning new methods compared to traditional practices published in our top journals.

Finally, our main goal was to find data for a study in which researchers analyze Likert-type ordinal survey items as outcomes and combine them into composite scales so we could illustrate how ordinal methods might help us better understand signals in our data compared to traditional approaches in this common scenario. The methods can be - and we think often should be - applied to other types of data, including rare count outcomes (e.g., self-reported crime) and even continuous outcomes with thousands of levels! (See Frank Harrell, e.g. here and here).

2 Load libraries & settings

I will start by loading various libraries that I often use for such projects.

Show code
# Packages I need
library(here)
library(haven)
library(tidyverse)
library(gt)
library(janitor)
library(vtable)
library(modelsummary)
library(easystats)
library(marginaleffects)
library(ggplot2)
library(GGally)
library(ggeffects)
library(ggdist)
library(patchwork)
library(parallelly)
library(bayesplot)
library(correlation)
library(sjmisc)
library(forcats)
library(ggtext)
library(ggmosaic)
library(gtExtras)
library(tidybayes)
library(scales)
library(purrr)
library(viridis)
library(loo)
library(broom)  

#need MASS for frequentist polr but it masks common dplyr functions 
#use explicit dplyr::select() and dplyr::filter() calls 
# or don't load MASS & call MASS::polr() instead 
library(MASS)  

library(extrafont)
library(stringr)
library(ragg)

# to use brms, do following:

# install rtools 4.4 

# install rstan
# https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started

# test ability to install packages from sources using Rtools: 
# install.packages("Rcpp", type="source")
  # it works

# remove any existing rstan installs
# remove.packages("rstan")
# if (file.exists(".RData")) file.remove(".RData")

# run the next line if you already have rstan installed
# remove.packages(c("StanHeaders", "rstan"))

# install rstan
# install.packages("rstan", repos = c('https://stan-dev.r-universe.dev', getOption("repos")))

# test rstan install
# example(stan_model, package = "rstan", run.dontrun = TRUE)
  # it works 

library(rstan)

# install cmdstanr (alternative brms backend for interfacing w/stan)
# install.packages("cmdstanr", repos = c('https://stan-dev.r-universe.dev', getOption("repos")))

# newest developmental version
# install.packages("remotes")
# remotes::install_github("stan-dev/cmdstanr", force=TRUE)

# Sys.getenv("RTOOLS44_HOME")
# Sys.getenv("PATH")
# readLines("~/.Renviron")
  # had to delete old .Renviron file pointing to Rtools40 from Documents folder 

# if prefer cmdstanr backend to brms, then: 
# cmdstanr::install_cmdstan()
# cmdstanr::check_cmdstan_toolchain(fix = TRUE)

# library(cmdstanr)


library(rstanarm)
library(brms)

print("T/F: Root 'here()' folder contains subfolder 'Models'")
[1] "T/F: Root 'here()' folder contains subfolder 'Models'"
Show code
#check for Models folder to save fitted model objects, create if one does not exist
ifelse(dir.exists(here("Models")), TRUE, dir.create(here("Models")))
[1] TRUE

I will also set some default settings.

Show code
#Fonts
# First time setup (run this once in console, not in your document):
# Import and register fonts (only needed once per system)
# Uncomment if you haven't imported fonts before
# extrafont::font_import()

# Load the fonts into the R session
extrafont::loadfonts(device = "win", quiet = TRUE)  # For Windows
# extrafont::loadfonts(device = "pdf", quiet = TRUE)  # For PDF output
# extrafont::loadfonts(device = "postscript", quiet = TRUE)  # For PostScript output

# If you only imported Times New Roman, load this instead: 
# Load the specific font you need for this session
# if(!("Times New Roman" %in% windowsFonts())) {
#   windowsFonts(`Times New Roman` = windowsFont("Times New Roman"))
# }

#To improve font rendering in viewer:
# Configure knitr to use the ragg device which handles fonts better
knitr::opts_chunk$set(
  dev = "ragg_png",
  dpi = 300
)

# Set the default theme for all ggplot2 plots
theme_set(theme_minimal(base_family = "Times New Roman"))


#set default knit & scientific notation options
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE, fig.align="center")
options(scipen = 999, digits = 2) #set global option to two decimals 

# identify & set # of available cores
  # replaced parallel::detectCores()
  # https://www.jottr.org/2022/12/05/avoid-detectcores/
nCoresphys <- parallelly::availableCores(logical = FALSE) 

options(mc.cores = parallelly::availableCores(logical=FALSE))
#options(mc.cores = 4, logical=FALSE)
  #alt: can specify exact number of cores desired here &/or in models/ppchecks   

color_scheme_set("viridisA")
# https://mc-stan.org/bayesplot/reference/bayesplot-colors.html

3 Simulation: Figure 1

Before getting into real data, let’s illustrate the underlying issue using a simulation.

I will simulate five different ordinal distributions with distinct shapes. Think of them as five groups, with N=250 observations per distributional group (N=1,250 total).

Show code
# Set seed for reproducibility
set.seed(1138)

# Number of observations per group
n <- 250

# Function to simulate ordinal data based on probabilities
simulate_ordinal <- function(probs, n) {
  sample(1:length(probs), size = n, replace = TRUE, prob = probs)
}

# Distribution 0: Symmetric Normal (peak at 3)
probs_symmetric <- c(0.10, 0.20, 0.40, 0.20, 0.10)  # Symmetric around 3
data_symmetric <- simulate_ordinal(probs_symmetric, n)

# Original distributions
probs_uniform <- rep(0.2, 5)
data_uniform <- simulate_ordinal(probs_uniform, n)

probs_bimodal <- c(0.3, 0.1333, 0.1333, 0.1333, 0.3)
data_bimodal <- simulate_ordinal(probs_bimodal, n)

probs_skew_left <- c(0.10, 0.20, 0.15, 0.25, 0.30)
data_skew_left <- simulate_ordinal(probs_skew_left, n)

probs_skew_right <- c(0.30, 0.25, 0.15, 0.20, 0.10)
data_skew_right <- simulate_ordinal(probs_skew_right, n)

# Combine data into a data frame
df <- data.frame(
  group = factor(rep(c("1. Symmetric Normal", "2. Uniform", "3. Bimodal", 
                      "4. Skewed Left", "5. Skewed Right"), 
                    each = n),
                levels = c("1. Symmetric Normal", "2. Uniform", "3. Bimodal", 
                          "4. Skewed Left", "5. Skewed Right")),
  response = c(data_symmetric, data_uniform, data_bimodal, data_skew_left, data_skew_right)
)

# Function to get normal curve points for each group separately
get_normal_curve <- function(data, group_name, x_range = seq(0.5, 5.5, length.out = 100)) {
  mean_val <- mean(data)
  sd_val <- sd(data)
  y_vals <- dnorm(x_range, mean = mean_val, sd = sd_val)
  scaling_factor <- (max(table(data))/length(data)) / max(y_vals)
  data.frame(
    x = x_range, 
    y = y_vals * scaling_factor * 100,
    group = group_name
  )
}

# Function to get ordered probit predicted probabilities with CIs
get_probit_probs <- function(data, group_name, n_boot = 1000) {
  model <- polr(ordered(response) ~ 1, data = data.frame(response = data), method = "probit")
  probs <- predict(model, newdata = data.frame(response = 1), type = "probs")
  
  boot_probs <- matrix(nrow = n_boot, ncol = 5)
  for(i in 1:n_boot) {
    boot_data <- sample(data, replace = TRUE)
    boot_model <- try(polr(ordered(boot_data) ~ 1, method = "probit"), silent = TRUE)
    if(!inherits(boot_model, "try-error")) {
      boot_probs[i,] <- predict(boot_model, newdata = data.frame(response = 1), type = "probs")
    }
  }
  
  ci_lower <- apply(boot_probs, 2, quantile, 0.025, na.rm = TRUE)
  ci_upper <- apply(boot_probs, 2, quantile, 0.975, na.rm = TRUE)
  
  data.frame(
    x = 1:5,
    y = as.numeric(probs) * 100,
    lower = ci_lower * 100,
    upper = ci_upper * 100,
    group = group_name
  )
}

# Get curves for each group
curves_df <- bind_rows(
  get_normal_curve(data_symmetric, "1. Symmetric Normal"),
  get_normal_curve(data_uniform, "2. Uniform"),
  get_normal_curve(data_bimodal, "3. Bimodal"),
  get_normal_curve(data_skew_left, "4. Skewed Left"),
  get_normal_curve(data_skew_right, "5. Skewed Right")
)

# Get probit probabilities with CIs for each group
probit_df <- bind_rows(
  get_probit_probs(data_symmetric, "1. Symmetric Normal"),
  get_probit_probs(data_uniform, "2. Uniform"),
  get_probit_probs(data_bimodal, "3. Bimodal"),
  get_probit_probs(data_skew_left, "4. Skewed Left"),
  get_probit_probs(data_skew_right, "5. Skewed Right")
)


# Update the stats_df creation to prepend A. through E.
stats_df <- df %>%
  group_by(group) %>%
  summarise(
    mean = round(mean(response), 2),
    sd = round(sd(response), 2)
  ) %>%
  mutate(
    # Create new labels with A. or B. etc. prepended
    new_label = case_when(
      group == "1. Symmetric Normal" ~ paste0("A. Normal-like\n(mean=", mean, ", sd=", sd, ")"),
      TRUE ~ paste0(
        letters[match(group, unique(group)) + 0] %>% toupper(), 
        ". ", 
        str_remove(group, "^\\d+\\.\\s+"), 
        "\n(mean=", mean, ", sd=", sd, ")"
      )
    )
  )


# Create two separate plots

# Plot 1: Normal-approx
p1 <- ggplot(df %>% dplyr::filter(group == "1. Symmetric Normal"), aes(x = response)) +
  geom_bar(aes(y = after_stat(prop)), fill = "#FEC488FF", color = "#FEC488FF") +
  geom_line(data = curves_df %>% dplyr::filter(group == "1. Symmetric Normal") %>% mutate(y = y/100), 
            aes(x = x, y = y), color = "#341069FF", linewidth = 1.5) +
  geom_pointrange(data = probit_df %>% dplyr::filter(group == "1. Symmetric Normal") %>% 
                   mutate(y = y/100, lower = lower/100, upper = upper/100), 
                 aes(x = x, y = y, ymin = lower, ymax = upper),
                 color = "#D3436EFF", size = 1.5, linewidth = 1, fatten = 2) +
  facet_wrap(~group, ncol = 1, labeller = as_labeller(setNames(stats_df$new_label, stats_df$group))) +
  scale_x_continuous(breaks = 1:5) +
  scale_y_continuous(limits = c(0, 0.45), 
                    expand = expansion(mult = c(0.02, 0.08))) +
  labs(x = "Response Category", y = "Proportion") +
  theme_minimal() +
  theme(
        text = element_text(family = "serif"),
        axis.title.x = element_text(size=12),  
        axis.title.y = element_text(size=12),  
        axis.text.x = element_text(size=12),  
        axis.text.y = element_text(size=12),  
        strip.text = element_text(hjust = 0, size=14), 
        panel.grid = element_blank()
        )

# Plot 2: Other distributions in 2x2 grid
p2 <- ggplot(df %>% dplyr::filter(group != "1. Symmetric Normal"), aes(x = response)) +
  geom_bar(aes(y = after_stat(prop)), fill = "#FEC488FF", color = "#FEC488FF") +
  geom_line(data = curves_df %>% dplyr::filter(group != "1. Symmetric Normal") %>% mutate(y = y/100), 
            aes(x = x, y = y), color = "#341069FF", linewidth = 1.5) +
  geom_pointrange(data = probit_df %>% dplyr::filter(group != "1. Symmetric Normal") %>% 
                   mutate(y = y/100, lower = lower/100, upper = upper/100), 
                 aes(x = x, y = y, ymin = lower, ymax = upper),
                 color = "#D3436EFF", size = 1.5, linewidth = 1, fatten = 2) +
  facet_wrap(~group, ncol = 2, labeller = as_labeller(setNames(stats_df$new_label, stats_df$group))) +
  scale_x_continuous(breaks = 1:5) +
  scale_y_continuous(limits = c(0, 0.45), 
                    expand = expansion(mult = c(0.02, 0.08))) +
  labs(x = "Response Category", y = "") +
  theme_minimal() +
  theme(
        text = element_text(family = "serif"),
        axis.title.x = element_text(size=12),  
        axis.title.y = element_text(size=12),  
        axis.text.x = element_text(size=12),  
        axis.text.y = element_text(size=12),  
        strip.text = element_text(hjust = 0, size=14), 
        panel.grid = element_blank())

# Combine plots using patchwork
# combined_plot <- p1 + p2 + 
#   plot_layout(widths = c(1, 2)) +
#   plot_annotation(
#     title = "FIGURE1. Comparing Metric vs. Ordinal View of Data",
#     subtitle = "Data: Light bars show observed (simulated) proportions\nMetric view: Dark line shows linear model-implied normal curve\nOrdinal view: Point-intervals show ordered probit model-implied predicted probabilities & 95% CI",
#     theme = theme(
#       plot.title = element_text(hjust = 0.5),
#       plot.subtitle = element_text(hjust = 0)
#     )
#   )


# Create spacers for above/below normal-like
top_spacer <- plot_spacer()
bottom_spacer <- plot_spacer()

# Combine plots w/custom layout
combined_plot <- top_spacer + p2 + 
                 p1 + bottom_spacer + 
  plot_layout(
    design = "
    AB
    CB
    DB
    ",
    heights = c(1, 2, 1), 
    widths = c(1, 2)
  ) +
  plot_annotation(
    title = "FIGURE1. Comparing Metric vs. Ordinal View of Data",
    subtitle = "Data: Light bars show observed (simulated) proportions\nMetric view: Dark line shows linear model-implied normal curve\nOrdinal view: Point-intervals show ordered probit model-implied predicted probabilities & 95% CI",
    theme = theme(
      plot.title = element_text(hjust = 0.5),
      plot.subtitle = element_text(hjust = 0)
    )
  )

3.1 FIG1

Show code
combined_plot

Clearly, these distributions are very different. However, a linear model would predict no mean differences between three of these groups (A, B, C). Meanwhile, although a linear model might detect differences between the two skewed distributions (D vs. E) with a large enough sample, the model-implied differences on the mean scale are hard to interpret, and predicted differences on the response probability scale may depart drastically from observed differences in the data. For example, the linear model “view” only sees about a one-category difference between D & E, with means implying the typical observation in D is a 3 or 4 and the typical observation in E is a 2 or 3. Yet the majority of observations in D fall in categories 4 & 5 (not 3 & 4), whereas the majority of observations in E fall in categories 1 & 2 (not 2 & 3)!
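As a quick numerical check on this point, we can compare the group means (the quantity a linear model targets) with the observed category proportions for the two skewed groups. This is a minimal sketch that simply reuses the simulated vectors created above; exact values will vary slightly with the seed.

Show code
# Means: what the linear "view" summarizes for groups D and E
mean(data_skew_left)    # roughly 3.4-3.5, implying a "typical" response of 3 or 4
mean(data_skew_right)   # roughly 2.5-2.6, implying a "typical" response of 2 or 3

# Observed category proportions tell a different story
prop.table(table(data_skew_left))   # most responses fall in categories 4 & 5
prop.table(table(data_skew_right))  # most responses fall in categories 1 & 2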

Before moving on, it might help to visualize the latent response thresholds.

Show code
# Define the threshold annotation function
add_probit_thresholds <- function(plot) {
  plot +
    # Add vertical lines at each response category boundary
    geom_vline(xintercept = 1:4 + 0.5, linetype = "dashed", color = "gray60", alpha = 0.5) +
    annotate("text", x = 2.5, y = 0.4, 
             label = "Cumulative\nThresholds", 
             color = "gray50", 
             hjust = 0.5)
}

# Modify both p1 and p2 to add threshold labels
p1 <- ggplot(df %>% dplyr::filter(group == "1. Symmetric Normal"), aes(x = response)) +
  geom_bar(aes(y = after_stat(prop)), fill = "#FEC488FF", color = "#FEC488FF") +
  geom_line(data = curves_df %>% dplyr::filter(group == "1. Symmetric Normal") %>% mutate(y = y/100), 
            aes(x = x, y = y), color = "#341069FF", linewidth = 1.5) +
  geom_pointrange(data = probit_df %>% dplyr::filter(group == "1. Symmetric Normal") %>% 
                   mutate(y = y/100, lower = lower/100, upper = upper/100), 
                 aes(x = x, y = y, ymin = lower, ymax = upper),
                 color = "#D3436EFF", size = 1.5, linewidth = 1, fatten = 2) +
  # Add vertical threshold lines with tau labels
  geom_vline(xintercept = 1:4 + 0.5, linetype = "dashed", color = "gray60", alpha = 0.5) +
  annotate("text", x=1.65, y=0.42, label = "tau[1]", color = "gray50", 
           parse = TRUE, size=6) +
  annotate("text", x=2.65, y=0.42, label = "tau[2]", color = "gray50", 
           parse = TRUE, size=6) +
  annotate("text", x=3.65, y=0.42, label = "tau[3]", color = "gray50", 
           parse = TRUE, size=6) +
  annotate("text", x=4.65, y=0.42, label = "tau[4]", color = "gray50", 
           parse = TRUE, size=6) +
  facet_wrap(~group, ncol = 1, labeller = as_labeller(setNames(stats_df$new_label, stats_df$group))) +
  scale_x_continuous(breaks = 1:5) +
  scale_y_continuous(limits = c(0, 0.45), 
                    expand = expansion(mult = c(0.02, 0.08))) +
  labs(x = "Response Category", y = "Proportion") +
  theme_minimal() +
  theme(
        text = element_text(family = "serif"),
        strip.text = element_text(hjust = 0, size=14), 
        panel.grid = element_blank(), 
        axis.text = element_text(size=14),
        axis.title = element_text(size=14)
        )

p2 <- ggplot(df %>% dplyr::filter(group != "1. Symmetric Normal"), aes(x = response)) +
  geom_bar(aes(y = after_stat(prop)), fill = "#FEC488FF", color = "#FEC488FF") +
  geom_line(data = curves_df %>% dplyr::filter(group != "1. Symmetric Normal") %>% mutate(y = y/100), 
            aes(x = x, y = y), color = "#341069FF", linewidth = 1.5) +
  geom_pointrange(data = probit_df %>% dplyr::filter(group != "1. Symmetric Normal") %>% 
                   mutate(y = y/100, lower = lower/100, upper = upper/100), 
                 aes(x = x, y = y, ymin = lower, ymax = upper),
                 color = "#D3436EFF", size = 1.5, linewidth = 1, fatten = 2) +
  # Add vertical threshold lines with tau labels
  geom_vline(xintercept = 1:4 + 0.5, linetype = "dashed", color = "gray60", alpha = 0.5) +
  annotate("text", x=1.65, y=0.42, label = "tau[1]", color = "gray50", 
           parse = TRUE, size=6) +
  annotate("text", x=2.65, y=0.42, label = "tau[2]", color = "gray50", 
           parse = TRUE, size=6) +
  annotate("text", x=3.65, y=0.42, label = "tau[3]", color = "gray50", 
           parse = TRUE, size=6) +
  annotate("text", x=4.65, y=0.42, label = "tau[4]", color = "gray50", 
           parse = TRUE, size=6) +
facet_wrap(~group, ncol = 2, labeller = as_labeller(setNames(stats_df$new_label, stats_df$group))) +
  scale_x_continuous(breaks = 1:5) +
  scale_y_continuous(limits = c(0, 0.45), 
                    expand = expansion(mult = c(0.02, 0.08))) +
  labs(x = "Response Category", y = "") +
  theme_minimal() +
  theme(
        text = element_text(family = "serif"),
        strip.text = element_text(hjust = 0.5, size=14), 
        panel.grid = element_blank(), 
        axis.text = element_text(size=14),
        axis.title = element_text(size=14)
)

combined_plot <- p1 + p2 + 
  plot_layout(widths = c(1, 2)) +
  plot_annotation(
    title = "Comparing Metric vs. Ordinal View of Data",
    subtitle = "Data: Light bars show observed (simulated) proportions\nMetric view: Dark line shows linear model-implied normal curve\nOrdinal view: Point-intervals show ordered probit model predicted probabilities & 95% CI\n                      Dashed lines represent category thresholds in latent space",
    theme = theme(
      plot.title = element_text(hjust = 0.5),
      plot.subtitle = element_text(hjust = 0))
  )

combined_plot

Following Kurz’s brms adaptation of Kruschke, we can also visualize the latent thresholds by fitting Bayesian threshold-only models and then plotting draws from the posterior distribution.

Show code
# Function to fit and save brms model for each distribution
fit_brms_model <- function(data, group_name) {
  # Prepare the data
  my_data <- data.frame(response = data)
  
  # Fit a Bayesian cumulative probit model
  fit <- brm(
    data = my_data,
    family = cumulative(probit),
    response ~ 1,
    prior = prior(normal(0, 4), class = Intercept),
    iter = 3000, warmup = 1000, chains = 4, cores = 4,
    seed = 123,
    file = paste0("Models/sim_fit_", group_name),
    file_refit = "on_change"
  )
  
  return(fit)
}

# Fit models for each distribution
fit_symmetric <- fit_brms_model(data_symmetric, "symmetric")
fit_uniform <- fit_brms_model(data_uniform, "uniform")
fit_bimodal <- fit_brms_model(data_bimodal, "bimodal")
fit_skew_left <- fit_brms_model(data_skew_left, "skew_left")
fit_skew_right <- fit_brms_model(data_skew_right, "skew_right")

# Create a list of fits for easy reference
fits_list <- list(
  symmetric = fit_symmetric,
  uniform = fit_uniform,
  bimodal = fit_bimodal,
  skew_left = fit_skew_left,
  skew_right = fit_skew_right
)

# Function to create threshold plot
create_threshold_plot <- function(fit, group_name) {
  # Extract draws 
  draws <- as_draws_df(fit)
  
  # Compute the posterior means for each threshold
  means <- draws %>% 
    summarise(across(starts_with("b_Intercept"), mean)) %>% 
    pivot_longer(everything(), names_to = "name", values_to = "mean")
  
  # Prepare the data with explicit threshold calculation
  plot_data <- draws %>% 
    pivot_longer(starts_with("b_Intercept"), names_to = "name", values_to = "threshold") %>% 
    group_by(.draw) %>% 
    # Calculate theta_bar as the mean of all thresholds for this draw
    mutate(theta_bar = mean(threshold))
  
  # Create the plot
  ggplot(plot_data, aes(x = threshold, y = theta_bar, color = name)) +
    # Vertical lines at mean thresholds
    geom_vline(data = means,
               aes(xintercept = mean, color = name),
               linetype = 2) +
    # Points with low alpha to show density
    geom_point(alpha = 1/10) +
    scale_color_viridis_d(option = "magma", begin = 0.2, end = 0.8) +
    # Dynamic x-axis limits based on actual threshold values
    scale_x_continuous(
      breaks = means$mean,
      labels = c("τ1", "τ2", "τ3", "τ4")
    ) +
    labs(title = group_name,
         y = "Mean of Thresholds per Draw",
         x = "Latent Thresholds") +
    theme_minimal() +
    theme(
      legend.position = "none", 
      axis.text.y = element_blank(), 
      axis.ticks.y = element_blank()
    )
}
# Create threshold plots for each model
threshold_plots <- list(
  create_threshold_plot(fit_symmetric, "A. Normal-like"),
  create_threshold_plot(fit_uniform, "B. Uniform"),
  create_threshold_plot(fit_bimodal, "C. Bimodal"),
  create_threshold_plot(fit_skew_left, "D. Skewed Left"),
  create_threshold_plot(fit_skew_right, "E. Skewed Right")
)

# Combine the threshold plots
combined_threshold_plot <- wrap_plots(threshold_plots, ncol = 3)
combined_threshold_plot

4 Supp. Sim. S1a: Distributional Change w/Ideological Polarization

This simulation demonstrates a scenario where linear and ordinal models yield different substantive conclusions despite a zero overall average treatment effect. We model a situation where a highly publicized police incident causes polarized reactions that vary by political ideology, illustrating how distributional changes can be obscured when focusing solely on average effects.

The key insight: Even when ordinal data appears “continuous-like,” with symmetric distributions across many categories, the underlying categorical social processes can create meaningful distributional shifts that the cumulative ordinal model captures more accurately than the linear model - even with an explicit violation of the proportional odds assumption.

4.1 Simulation Design

Show code
set.seed(1138)
n_per_group <- 1000

# Create symmetric baseline distributions that look "normal enough" for OLS
# This mimics what researchers see when they think linear regression should be fine
generate_baseline <- function(n, ideology) {
  # Both groups start with nearly identical, symmetric 7-point distributions
  if(ideology == "Conservative") {
    probs <- c(0.06, 0.11, 0.21, 0.28, 0.19, 0.10, 0.05)
  } else {
    probs <- c(0.05, 0.10, 0.19, 0.28, 0.21, 0.11, 0.06)
  }
  
  responses <- sample(1:7, size = n, replace = TRUE, prob = probs)
  data.frame(
    id = 1:n,
    ideology = ideology,
    time = 0,
    response = responses
  )
}

baseline_data <- rbind(
  generate_baseline(n_per_group, "Conservative"),
  generate_baseline(n_per_group, "Liberal")
)
Show code
# Generate post-incident responses with BALANCED group-specific polarization
generate_post_incident <- function(baseline_df, shock_intensity = 0.75) {
  post_data <- baseline_df
  post_data$time <- 1
  
  for(i in 1:nrow(post_data)) {
    if(runif(1) < shock_intensity) {
      current_response <- post_data$response[i]
      ideology <- post_data$ideology[i]
      
      # BALANCED polarization: same proportions, opposite directions
      if(ideology == "Conservative") {
        # Conservatives: 80% move toward "unafraid", 20% toward "afraid"
        move_toward_unafraid <- runif(1) < 0.80
      } else {
        # Liberals: 20% move toward "unafraid", 80% toward "afraid"  
        move_toward_unafraid <- runif(1) < 0.20
      }
      
      if(move_toward_unafraid) {
        # Push toward extreme "unafraid" categories (1-2)
        if(current_response >= 5) {
          post_data$response[i] <- ifelse(runif(1) < 0.7, 1, 2)
        } else if(current_response == 4) {
          post_data$response[i] <- ifelse(runif(1) < 0.6, 1, 2)
        } else {
          post_data$response[i] <- 1
        }
      } else {
        # Push toward extreme "afraid" categories (6-7)
        if(current_response <= 3) {
          post_data$response[i] <- ifelse(runif(1) < 0.7, 7, 6)
        } else if(current_response == 4) {
          post_data$response[i] <- ifelse(runif(1) < 0.6, 7, 6)
        } else {
          post_data$response[i] <- 7
        }
      }
    }
  }
  return(post_data)
}

post_data <- generate_post_incident(baseline_data, shock_intensity = 0.75)

# Combine datasets
sim_data <- rbind(baseline_data, post_data) %>%
  mutate(
    ideology = factor(ideology, levels = c("Conservative", "Liberal")),
    response_ord = factor(response, levels = 1:7, ordered = TRUE)
  )

4.2 Descriptive Analysis

Show code
# Calculate group-specific means
means_summary <- sim_data %>%
  group_by(ideology, time) %>%
  summarise(mean_response = mean(response), .groups = "drop") %>%
  pivot_wider(names_from = time, values_from = mean_response, names_prefix = "time_") %>%
  mutate(change = time_1 - time_0)

print(means_summary)
# A tibble: 2 × 4
  ideology     time_0 time_1 change
  <fct>         <dbl>  <dbl>  <dbl>
1 Conservative   3.88   2.71  -1.17
2 Liberal        4.07   5.25   1.18
Show code
# Overall treatment effect
overall_ate <- sim_data %>%
  group_by(time) %>%
  summarise(mean_resp = mean(response), .groups = "drop") %>%
  summarise(ate = diff(mean_resp)) %>%
  pull(ate)

cat("Overall Average Treatment Effect:", round(overall_ate, 3), "\n")
Overall Average Treatment Effect: 0.005 
Show code
# Show distributional changes by group
dist_summary <- sim_data %>%
  group_by(ideology, time, response) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(ideology, time) %>%
  mutate(proportion = count / sum(count)) %>%
  ungroup()

# Create distribution comparison table
dist_table <- dist_summary %>%
  dplyr::select(ideology, time, response, proportion) %>%
  pivot_wider(names_from = c(ideology, time), values_from = proportion, values_fill = 0) %>%
  mutate(
    conservative_change = Conservative_1 - Conservative_0,
    liberal_change = Liberal_1 - Liberal_0
  )

print(round(dist_table, 3))
# A tibble: 7 × 7
  response Conservative_0 Conservative_1 Liberal_0 Liberal_1 conservative_change
     <dbl>          <dbl>          <dbl>     <dbl>     <dbl>               <dbl>
1        1          0.054          0.488     0.045     0.131               0.434
2        2          0.118          0.155     0.096     0.062               0.037
3        3          0.228          0.054     0.191     0.041              -0.174
4        4          0.278          0.08      0.294     0.071              -0.198
5        5          0.189          0.043     0.22      0.066              -0.146
6        6          0.079          0.045     0.095     0.143              -0.034
7        7          0.054          0.135     0.059     0.486               0.081
# ℹ 1 more variable: liberal_change <dbl>

4.3 Visualization

Show code
# Create visualization showing distributional changes by group
plot_data <- dist_summary %>%
  mutate(
    time_label = factor(time, levels = c(0, 1), labels = c("Pre-Incident", "Post-Incident")),
    response_label = factor(response, levels = 1:7, 
                           labels = c("Very\nUnafraid", "Unafraid", "Somewhat\nUnafraid", 
                                    "Neutral", "Somewhat\nAfraid", "Afraid", "Very\nAfraid"))
  )

polarization_viz <- ggplot(plot_data, aes(x = response_label, y = proportion, fill = time_label)) +
  geom_col(position = "dodge", alpha = 0.8, color = "white", linewidth = 0.3) +
  facet_wrap(~ideology, labeller = label_both) +
  scale_fill_manual(values = c("Pre-Incident" = "#FD9B6BFF", "Post-Incident" = "#782281FF"),
                   name = "Period") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1),
                    expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "Ideological Polarization Following Police Incident",
    subtitle = paste0("Overall ATE = ", round(overall_ate, 3), 
                     " | Conservative Δ = ", round(means_summary$change[means_summary$ideology == "Conservative"], 3),
                     " | Liberal Δ = ", round(means_summary$change[means_summary$ideology == "Liberal"], 3)),
    x = "Fear of Police Response",
    y = "Proportion of Responses"
  ) +
  theme_minimal(base_family = "serif") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 13, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 11),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 11),
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    strip.text = element_text(size = 11, face = "bold")
  )

polarization_viz

4.4 Model Comparison

Show code
# Check if cached models exist
linear_model_path <- here("Models", "polarization_linear_model.rds")
ordinal_model_path <- here("Models", "polarization_ordinal_model.rds")

if (file.exists(linear_model_path)) {
  linear_model <- readRDS(linear_model_path)
} else {
  # Fit linear model (using numeric version for proper LOO comparison)
  linear_model <- brm(
    response ~ time * ideology,
    data = sim_data,
    family = gaussian(),
    prior = c(
      prior(normal(4, 1), class = "Intercept"),
      prior(normal(0, 0.5), class = "b"),
      prior(exponential(1), class = "sigma")
    ),
    chains = 4, iter = 2000, warmup = 1000,
    cores = 4, seed = 1138, refresh = 0
  )
  saveRDS(linear_model, linear_model_path)
}

if (file.exists(ordinal_model_path)) {
  ordinal_model <- readRDS(ordinal_model_path)
} else {
  # Fit ordinal model (using same numeric outcome for LOO comparison)
  ordinal_model <- brm(
    response ~ time * ideology,
    data = sim_data,
    family = cumulative("probit"),
    prior = c(
      prior(normal(-1.28, 1), class = "Intercept", coef = 1),
      prior(normal(-0.67, 1), class = "Intercept", coef = 2),
      prior(normal(-0.25, 1), class = "Intercept", coef = 3),
      prior(normal(0.25, 1), class = "Intercept", coef = 4),
      prior(normal(0.67, 1), class = "Intercept", coef = 5),
      prior(normal(1.28, 1), class = "Intercept", coef = 6),
      prior(normal(0, 0.5), class = "b")
    ),
    chains = 4, iter = 2000, warmup = 1000,
    cores = 4, seed = 1138, refresh = 0
  )
  saveRDS(ordinal_model, ordinal_model_path)
}

# Extract key effects
linear_effects <- fixef(linear_model)
ordinal_effects <- fixef(ordinal_model)

# Compare main effects and interactions
effect_comparison <- tibble(
  Effect = c("Time (Main)", "Ideology (Liberal)", "Time × Ideology Interaction"),
  Linear_Est = round(c(linear_effects["time", "Estimate"],
                      linear_effects["ideologyLiberal", "Estimate"], 
                      linear_effects["time:ideologyLiberal", "Estimate"]), 3),
  Linear_CI = paste0("[", round(c(linear_effects["time", "Q2.5"],
                                 linear_effects["ideologyLiberal", "Q2.5"],
                                 linear_effects["time:ideologyLiberal", "Q2.5"]), 3), ", ",
                    round(c(linear_effects["time", "Q97.5"],
                           linear_effects["ideologyLiberal", "Q97.5"], 
                           linear_effects["time:ideologyLiberal", "Q97.5"]), 3), "]"),
  Ordinal_Est = round(c(ordinal_effects["time", "Estimate"],
                       ordinal_effects["ideologyLiberal", "Estimate"],
                       ordinal_effects["time:ideologyLiberal", "Estimate"]), 3),
  Ordinal_CI = paste0("[", round(c(ordinal_effects["time", "Q2.5"],
                                  ordinal_effects["ideologyLiberal", "Q2.5"],
                                  ordinal_effects["time:ideologyLiberal", "Q2.5"]), 3), ", ",
                     round(c(ordinal_effects["time", "Q97.5"],
                            ordinal_effects["ideologyLiberal", "Q97.5"],
                            ordinal_effects["time:ideologyLiberal", "Q97.5"]), 3), "]")
)

gt(effect_comparison) %>%
  tab_header(
    title = "Model Comparison: Linear vs. Ordinal Estimates",
    subtitle = "Coefficients and 95% credible intervals"
  ) %>%
  cols_label(
    Linear_Est = "Estimate",
    Linear_CI = "95% CrI",
    Ordinal_Est = "Estimate", 
    Ordinal_CI = "95% CrI"
  ) %>%
  tab_spanner(label = "Linear Model", columns = c(Linear_Est, Linear_CI)) %>%
  tab_spanner(label = "Ordinal Model", columns = c(Ordinal_Est, Ordinal_CI)) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels()) %>%
  tab_style(style = cell_text(align = "center"), 
           locations = cells_body(columns = c(Linear_Est, Linear_CI, Ordinal_Est, Ordinal_CI)))
Model Comparison: Linear vs. Ordinal Estimates
Coefficients and 95% credible intervals

                              Linear Model                 Ordinal Model
Effect                        Estimate  95% CrI            Estimate  95% CrI
Time (Main)                   -1.08     [-1.238, -0.923]   -0.69     [-0.789, -0.602]
Ideology (Liberal)             0.26     [0.1, 0.419]        0.10     [0.01, 0.191]
Time × Ideology Interaction    2.20     [1.983, 2.413]      1.37     [1.235, 1.504]

4.5 Predicted Probabilities: Revealing the Polarization

Show code
# Generate predictions for all ideology × time combinations
newdata <- expand_grid(
  ideology = c("Conservative", "Liberal"),
  time = c(0, 1)
)

# Get predicted probabilities from ordinal model
ordinal_preds <- posterior_epred(ordinal_model, newdata = newdata, re_formula = NA)

# Calculate summary statistics
pred_summary <- expand_grid(
  ideology = c("Conservative", "Liberal"),
  time = c(0, 1),
  category = 1:7
) %>%
  mutate(
    row_idx = case_when(
      ideology == "Conservative" & time == 0 ~ 1,
      ideology == "Conservative" & time == 1 ~ 2,
      ideology == "Liberal" & time == 0 ~ 3,
      ideology == "Liberal" & time == 1 ~ 4
    ),
    prob = map2_dbl(row_idx, category, ~mean(ordinal_preds[, .x, .y])),
    lower = map2_dbl(row_idx, category, ~quantile(ordinal_preds[, .x, .y], 0.025)),
    upper = map2_dbl(row_idx, category, ~quantile(ordinal_preds[, .x, .y], 0.975)),
    time_label = factor(time, levels = c(0, 1), labels = c("Pre-Incident", "Post-Incident")),
    category_label = factor(category, levels = 1:7,
                           labels = c("Very\nUnafraid", "Unafraid", "Somewhat\nUnafraid", 
                                    "Neutral", "Somewhat\nAfraid", "Afraid", "Very\nAfraid"))
  )

# Plot predicted probabilities
prob_plot <- pred_summary %>%
  ggplot(aes(x = category_label, y = prob, color = time_label, shape = time_label)) +
  geom_point(size = 2.5, position = position_dodge(width = 0.4)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), 
                width = 0.15, position = position_dodge(width = 0.4)) +
  facet_wrap(~ideology, labeller = label_both) +
  scale_color_manual(values = c("Pre-Incident" = "#FD9B6BFF", "Post-Incident" = "#782281FF"),
                    name = "Period") +
  scale_shape_manual(values = c("Pre-Incident" = 16, "Post-Incident" = 17),
                    name = "Period") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Ordinal Model Predicted Probabilities",
    subtitle = "Ideology-specific polarization patterns revealed through category probabilities",
    x = "Fear Response Category",
    y = "Predicted Probability"
  ) +
  theme_minimal(base_family = "serif") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 13, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 11),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 11),
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    strip.text = element_text(size = 11, face = "bold")
  )

prob_plot

4.6 Model Fit Assessment

Show code
# Compare predictive performance
loo_linear <- loo(linear_model)
loo_ordinal <- loo(ordinal_model)

# Manual comparison since loo_compare might have issues
elpd_diff <- loo_ordinal$estimates["elpd_loo", "Estimate"] - loo_linear$estimates["elpd_loo", "Estimate"]
se_diff <- sqrt(loo_ordinal$estimates["elpd_loo", "SE"]^2 + loo_linear$estimates["elpd_loo", "SE"]^2)

cat("ELPD Linear:", round(loo_linear$estimates["elpd_loo", "Estimate"], 1), "\n")
ELPD Linear: -8190 
Show code
cat("ELPD Ordinal:", round(loo_ordinal$estimates["elpd_loo", "Estimate"], 1), "\n")
ELPD Ordinal: -7229 
Show code
cat("ELPD Difference (Ordinal - Linear):", round(elpd_diff, 1), "\n")
ELPD Difference (Ordinal - Linear): 962 
Show code
cat("Standard Error:", round(se_diff, 1), "\n")
Standard Error: 58 
Show code
cat("Difference in standard errors:", round(abs(elpd_diff)/se_diff, 1), "\n")
Difference in standard errors: 17 
Show code
if(abs(elpd_diff) > 2 * se_diff) {
  cat("\n*** SUBSTANTIAL EVIDENCE favoring", ifelse(elpd_diff > 0, "ORDINAL", "LINEAR"), "model ***\n")
} else {
  cat("\nNo strong evidence favoring either model\n")
}

*** SUBSTANTIAL EVIDENCE favoring ORDINAL model ***
Show code
# Additional diagnostics
cat("\nDetailed LOO comparison:\n")

Detailed LOO comparison:
Show code
loo_comp <- loo_compare(loo_linear, loo_ordinal)
print(loo_comp)
              elpd_diff se_diff
ordinal_model    0.0       0.0 
linear_model  -961.5      36.7 
Show code
# Check if models are actually different by comparing log-likelihoods
ll_linear <- log_lik(linear_model)
ll_ordinal <- log_lik(ordinal_model)

cat("\nMean log-likelihood (linear):", round(mean(rowSums(ll_linear)), 2), "\n")

Mean log-likelihood (linear): -8188 
Show code
cat("Mean log-likelihood (ordinal):", round(mean(rowSums(ll_ordinal)), 2), "\n")
Mean log-likelihood (ordinal): -7224 
Show code
# Alternative: Compare WAIC
waic_linear <- waic(linear_model)
waic_ordinal <- waic(ordinal_model)
waic_comp <- loo_compare(waic_linear, waic_ordinal)

cat("\nWAIC comparison:\n")

WAIC comparison:
Show code
print(waic_comp)
              elpd_diff se_diff
ordinal_model    0.0       0.0 
linear_model  -961.5      36.7 
Show code
# Check actual prediction accuracy using posterior predictive
# Generate predictions for a random subset of observations
# (note: both models were fit to the full dataset, so this is not a true held-out test set)
set.seed(42)
test_indices <- sample(nrow(sim_data), size = 200)
test_data <- sim_data[test_indices, ]
train_data <- sim_data[-test_indices, ]

# Calculate prediction accuracy manually
linear_preds <- posterior_predict(linear_model, newdata = test_data)
ordinal_preds <- posterior_predict(ordinal_model, newdata = test_data)

# Round linear predictions to nearest integer for comparison
linear_preds_rounded <- round(linear_preds)
linear_preds_rounded[linear_preds_rounded < 1] <- 1
linear_preds_rounded[linear_preds_rounded > 7] <- 7

# Calculate accuracy
linear_accuracy <- mean(apply(linear_preds_rounded, 1, function(x) mean(x == test_data$response)))
ordinal_accuracy <- mean(apply(ordinal_preds, 1, function(x) mean(x == test_data$response)))

cat("\nPrediction accuracy on test set:\n")

Prediction accuracy on test set:
Show code
cat("Linear model:", round(linear_accuracy, 3), "\n")
Linear model: 0.17 
Show code
cat("Ordinal model:", round(ordinal_accuracy, 3), "\n")
Ordinal model: 0.19 
Show code
# Posterior predictive checks
pp_linear <- pp_check(linear_model, ndraws = 50) + 
  ggtitle("Linear Model: Posterior Predictive Check") +
  labs(subtitle = "Model assumes continuous normal distribution") +
  theme_minimal(base_family = "serif") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

pp_ordinal <- pp_check(ordinal_model, ndraws = 50, type = "bars") +
  ggtitle("Ordinal Model: Posterior Predictive Check") +
  labs(subtitle = "Model captures discrete categorical structure") +
  theme_minimal(base_family = "serif") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

# Combine pp checks
pp_combined <- pp_linear / pp_ordinal
pp_combined

4.7 The Critical Difference: Predicted Probabilities

While both models show similar coefficient patterns, they make fundamentally different predictions about response probabilities. This reveals why focusing solely on coefficient interpretation can be misleading.

Show code
# Generate predictions from both models for post-incident period
newdata_post <- expand_grid(
  ideology = c("Conservative", "Liberal"),
  time = 1  # Post-incident only
)

# Get predictions from linear model (need to convert to probabilities)
linear_pred_draws <- posterior_epred(linear_model, newdata = newdata_post, re_formula = NA)

# Convert linear predictions to "probabilities" by assuming normal distribution
# and calculating probability of falling in each category range
convert_linear_to_probs <- function(linear_preds, categories = 1:7) {
  n_draws <- nrow(linear_preds)
  n_conditions <- ncol(linear_preds)
  
  # Initialize probability array
  prob_array <- array(dim = c(n_draws, n_conditions, length(categories)))
  
  for(draw in 1:n_draws) {
    for(cond in 1:n_conditions) {
      mean_pred <- linear_preds[draw, cond]
      # Get residual SD from model using modern syntax
      sigma <- as_draws_df(linear_model)$sigma[draw]
      
      # Calculate probability of each category
      for(cat in 1:length(categories)) {
        if(cat == 1) {
          prob_array[draw, cond, cat] <- pnorm(1.5, mean_pred, sigma)
        } else if(cat == length(categories)) {
          prob_array[draw, cond, cat] <- 1 - pnorm(cat - 0.5, mean_pred, sigma)
        } else {
          prob_array[draw, cond, cat] <- pnorm(cat + 0.5, mean_pred, sigma) - 
                                       pnorm(cat - 0.5, mean_pred, sigma)
        }
      }
    }
  }
  return(prob_array)
}

# Get ordinal model predictions
ordinal_pred_draws <- posterior_epred(ordinal_model, newdata = newdata_post, re_formula = NA)

# Convert linear predictions to probabilities
linear_probs <- convert_linear_to_probs(linear_pred_draws)

# Calculate observed proportions for post-incident period
observed_post <- sim_data %>%
  dplyr::filter(time == 1) %>%
  group_by(ideology, response) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(ideology) %>%
  mutate(proportion = count / sum(count))

# Create comparison dataset
prob_comparison <- expand_grid(
  ideology = c("Conservative", "Liberal"),
  category = 1:7
) %>%
  mutate(
    condition_idx = case_when(
      ideology == "Conservative" ~ 1,
      ideology == "Liberal" ~ 2
    ),
    # Linear model predictions
    linear_prob = map2_dbl(condition_idx, category, 
                          ~mean(linear_probs[, .x, .y])),
    linear_lower = map2_dbl(condition_idx, category, 
                           ~quantile(linear_probs[, .x, .y], 0.025)),
    linear_upper = map2_dbl(condition_idx, category, 
                           ~quantile(linear_probs[, .x, .y], 0.975)),
    # Ordinal model predictions
    ordinal_prob = map2_dbl(condition_idx, category, 
                           ~mean(ordinal_pred_draws[, .x, .y])),
    ordinal_lower = map2_dbl(condition_idx, category, 
                            ~quantile(ordinal_pred_draws[, .x, .y], 0.025)),
    ordinal_upper = map2_dbl(condition_idx, category, 
                            ~quantile(ordinal_pred_draws[, .x, .y], 0.975)),
    category_label = factor(category, levels = 1:7,
                           labels = c("Very\nUnafraid", "Unafraid", "Somewhat\nUnafraid", 
                                    "Neutral", "Somewhat\nAfraid", "Afraid", "Very\nAfraid"))
  ) %>%
  # Add observed proportions
  left_join(
    observed_post %>% dplyr::select(ideology, response, proportion),
    by = c("ideology", "category" = "response")
  ) %>%
  # Reshape for plotting
  pivot_longer(
    cols = c(linear_prob, ordinal_prob, proportion),
    names_to = "type", 
    values_to = "probability"
  ) %>%
  mutate(
    lower = case_when(
      type == "linear_prob" ~ linear_lower,
      type == "ordinal_prob" ~ ordinal_lower,
      type == "proportion" ~ NA_real_
    ),
    upper = case_when(
      type == "linear_prob" ~ linear_upper,
      type == "ordinal_prob" ~ ordinal_upper,
      type == "proportion" ~ NA_real_
    ),
    type_label = factor(type, 
                       levels = c("proportion", "linear_prob", "ordinal_prob"),
                       labels = c("Observed", "Linear Model", "Ordinal Model"))
  )

# Plot model comparison with colorblind-friendly palette
model_comparison_plot <- prob_comparison %>%
  ggplot(aes(x = category_label, y = probability, color = type_label, shape = type_label)) +
  geom_point(size = 2.5, position = position_dodge(width = 0.4)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), 
                width = 0.15, position = position_dodge(width = 0.4),
                na.rm = TRUE) +
  geom_line(aes(group = type_label), position = position_dodge(width = 0.4), alpha = 0.7) +
  facet_wrap(~ideology, labeller = label_both) +
  scale_color_manual(values = c("Observed" = "#000000", "Linear Model" = "#D55E00", "Ordinal Model" = "#0072B2"),
                    name = "") +
  scale_shape_manual(values = c("Observed" = 17, "Linear Model" = 15, "Ordinal Model" = 16),
                    name = "") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Model Accuracy: Observed vs. Predicted Probabilities",
    subtitle = "Ordinal model captures bimodal reality; linear model predicts impossible intermediate peaks",
    x = "Fear Response Category",
    y = "Probability"
  ) +
  theme_minimal(base_family = "serif") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 13, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 11),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 11),
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    strip.text = element_text(size = 11, face = "bold")
  )

model_comparison_plot

4.8 Combined Visualization

Show code
# Panel A: Overall symmetric distribution of response variable
overall_dist <- sim_data %>%
  dplyr::filter(time == 0) %>%  # Pre-incident baseline
  group_by(response) %>%
  summarise(count = n(), .groups = "drop") %>%
  mutate(
    proportion = count / sum(count),
    response_label = factor(response, levels = 1:7,
                       labels = c("1\nVery\nUnafraid", "2\nUnafraid",
                                  "3\nSomewhat\nUnafraid", "4\nNeutral",
                                  "5\nSomewhat\nAfraid", "6\nAfraid",
                                  "7\nVery\nAfraid"))
  )

panel_a <- overall_dist %>%
  ggplot(aes(x = response_label, y = proportion)) +
  geom_col(fill = "#756bb1", alpha = 0.8, color = "white", linewidth = 0.3) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1),
                    expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "A. Baseline Distribution",
    subtitle = "Symmetric 7-category response distribution",
    x = "Fear Response Category",
    y = "Proportion"
  ) +
  theme_minimal(base_family = "serif") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 11, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 10),
    axis.text = element_text(size = 9),
    axis.title = element_text(size = 10),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank()
  )

# Panel B: Use the original plot_data that should have response_label defined
# If plot_data doesn't have proper labels, recreate it here
plot_data_fixed <- dist_summary %>%
  mutate(
    time_label = factor(time, levels = c(0, 1), labels = c("Pre-Incident", "Post-Incident")),
    response_label = factor(response, levels = 1:7, 
                       labels = c("1", "2", "3", "4", "5", "6", "7"))
  )

panel_b <- plot_data_fixed %>%
  ggplot(aes(x = response_label, y = proportion, fill = time_label)) +
  geom_col(position = "dodge", alpha = 0.8, color = "white", linewidth = 0.3) +
  facet_wrap(~ideology, labeller = label_both) +
  scale_fill_manual(values = c("Pre-Incident" = "#FD9B6BFF", "Post-Incident" = "#782281FF"),
                   name = "Period") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1),
                    expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "B. Post-Incident Polarization by Ideology",
    subtitle = paste0("Overall ATE = ", round(overall_ate, 3)),
    x = "Fear of Police Response",
    y = "Proportion of Responses"
  ) +
  theme_minimal(base_family = "serif") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 11, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 10),
    axis.text = element_text(size = 9),
    axis.title = element_text(size = 10),
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    strip.text = element_text(size = 11, face = "bold")
  )

# Panel C1: Linear model pp check
panel_c1 <- pp_check(linear_model, ndraws = 50) + 
  ggtitle("C1. Linear Model PP Check") +
  theme_minimal(base_family = "serif") +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 10),
    axis.text = element_text(size = 8),
    axis.title = element_text(size = 9)
  )

# Panel C2: Ordinal model pp check
panel_c2 <- pp_check(ordinal_model, ndraws = 50, type = "bars") +
  ggtitle("C2. Ordinal Model PP Check") +
  theme_minimal(base_family = "serif") +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 10),
    axis.text = element_text(size = 8),
    axis.title = element_text(size = 9)
  )

# Panel D: Create comparison data with proper factor labels preserved
newdata_post <- expand_grid(
  ideology = c("Conservative", "Liberal"),
  time = 1
)

linear_pred_draws <- posterior_epred(linear_model, newdata = newdata_post, re_formula = NA)

convert_linear_to_probs <- function(linear_preds, categories = 1:7) {
  n_draws <- nrow(linear_preds)
  n_conditions <- ncol(linear_preds)
  prob_array <- array(dim = c(n_draws, n_conditions, length(categories)))
  
  for(draw in 1:n_draws) {
    for(cond in 1:n_conditions) {
      mean_pred <- linear_preds[draw, cond]
      sigma <- as_draws_df(linear_model)$sigma[draw]
      
      for(cat in 1:length(categories)) {
        if(cat == 1) {
          prob_array[draw, cond, cat] <- pnorm(1.5, mean_pred, sigma)
        } else if(cat == length(categories)) {
          prob_array[draw, cond, cat] <- 1 - pnorm(cat - 0.5, mean_pred, sigma)
        } else {
          prob_array[draw, cond, cat] <- pnorm(cat + 0.5, mean_pred, sigma) - 
                                       pnorm(cat - 0.5, mean_pred, sigma)
        }
      }
    }
  }
  return(prob_array)
}

ordinal_pred_draws <- posterior_epred(ordinal_model, newdata = newdata_post, re_formula = NA)
linear_probs <- convert_linear_to_probs(linear_pred_draws)

observed_post <- sim_data %>%
  filter(time == 1) %>%
  group_by(ideology, response) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(ideology) %>%
  mutate(proportion = count / sum(count))

# Create comparison dataset with explicit factor labels
prob_comparison_fixed <- expand_grid(
  ideology = c("Conservative", "Liberal"),
  category = 1:7
) %>%
  mutate(
    condition_idx = case_when(
      ideology == "Conservative" ~ 1,
      ideology == "Liberal" ~ 2
    ),
    linear_prob = map2_dbl(condition_idx, category, ~mean(linear_probs[, .x, .y])),
    ordinal_prob = map2_dbl(condition_idx, category, ~mean(ordinal_pred_draws[, .x, .y])),
    # Explicitly create factor with labels
    category_label = factor(category, levels = 1:7,
                       labels = c("1", "2", "3", "4", "5", "6", "7"))
    ) %>%
  left_join(
    observed_post %>% dplyr::select(ideology, response, proportion),
    by = c("ideology", "category" = "response")
  ) %>%
  pivot_longer(
    cols = c(linear_prob, ordinal_prob, proportion),
    names_to = "type", 
    values_to = "probability"
  ) %>%
  mutate(
    type_label = factor(type, 
                       levels = c("proportion", "linear_prob", "ordinal_prob"),
                       labels = c("Observed", "Linear Model", "Ordinal Model"))
  )

panel_d <- prob_comparison_fixed %>%
  ggplot(aes(x = category_label, y = probability, color = type_label, shape = type_label)) +
  geom_point(size = 2.5, position = position_dodge(width = 0.4)) +
  geom_line(aes(group = type_label), position = position_dodge(width = 0.4), alpha = 0.7) +
  facet_wrap(~ideology, labeller = label_both) +
  scale_color_manual(values = c("Observed" = "#000000", "Linear Model" = "#D55E00", "Ordinal Model" = "#0072B2"),
                    name = "") +
  scale_shape_manual(values = c("Observed" = 17, "Linear Model" = 15, "Ordinal Model" = 16),
                    name = "") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "D. Model Predictions vs. Observed Data",
    subtitle = "Ordinal model captures bimodal reality",
    x = "Fear Response Category",
    y = "Probability"
  ) +
  theme_minimal(base_family = "serif") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 11, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 10),
    axis.text = element_text(size = 9),
    axis.title = element_text(size = 10),
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    strip.text = element_text(size = 10, face = "bold")
  )

# Create the layout
design <- "
  AB
  AB
  CE
  DE
"

comprehensive_figure <- wrap_plots(
  A = panel_a, 
  B = panel_b,
  C = panel_c1, 
  D = panel_c2,
  E = panel_d,
  design = design,
  heights = c(1, 0.5, 1)
) + 
  plot_annotation(
    title = "Figure S1A. Ordinal Model Superiority Despite Symmetric Baseline Distributions and Assumption Violations", 
    theme = theme(
      plot.title = element_text(hjust = 0.5, size = 12, face = "bold")
      )
    )
  

comprehensive_figure

4.9 Key Insights

This simulation demonstrates several critical points about ordinal modeling in realistic scenarios:

  1. Coefficient Similarity Masks Fundamental Differences: Both linear and cumulative probit models show similar coefficient direction and significance patterns, potentially leading researchers focused on significance testing to miss dramatic differences in substantive predictions about response probabilities.
  2. Ordinal Superiority Despite Assumption Violations: The heterogeneous within-group responses (minority of conservatives moving toward “very afraid,” minority of liberals toward “very unafraid”) explicitly violate the proportional odds assumption, yet the cumulative probit model still vastly outperforms linear regression because it operates within the appropriate categorical framework.
  3. Realistic Social Complexity: Real social phenomena rarely exhibit perfect proportional odds; rather, groups often might be expected to show heterogeneous responses, with minority segments reacting opposite to the majority. This simulation reflects such realistic complexity while demonstrating that cumulative ordinal models handle these violations better than linear alternatives even when imperfect (and category-specific models would accurately capture even those non-monotonic effects).
  4. The “Looks Normal” Fallacy: Even symmetric 7-category distributions that appear appropriate for linear modeling can undergo categorical polarization processes with threshold-specific effects that linear models cannot represent, regardless of how “continuous” the original data appears.
  5. Predictive vs. Parametric Framework: The cumulative ordinal model's advantage lies not in perfect model specification, but in using a modeling approach capable of representing categorical response processes. Linear models fundamentally cannot accurately predict complexities in ordinal distributions, such as bimodal categorical outcomes or threshold-specific effects (a brief numeric sketch follows this list).
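To make point 5 concrete, here is a small hypothetical illustration (the probabilities below are invented for illustration, not taken from the simulation): when a group's true post-treatment distribution is bimodal, a linear model can only represent that group through its mean and a common residual SD, so the category probabilities it implies pile up between the two modes.

# Hypothetical bimodal distribution over a 1-7 response scale (illustration only)
p_true <- c(0.10, 0.35, 0.15, 0.05, 0.05, 0.12, 0.18)
mu <- sum(1:7 * p_true)                        # the mean a linear model would target
sigma <- sqrt(sum((1:7 - mu)^2 * p_true))      # the residual SD it would estimate
# Category probabilities implied by slicing normal(mu, sigma) at half-integer cutpoints
cuts <- c(-Inf, seq(1.5, 6.5, by = 1), Inf)
p_linear <- diff(pnorm(cuts, mean = mu, sd = sigma))
round(rbind(True = p_true, `Linear-implied` = p_linear), 2)
# The linear-implied probabilities concentrate in the middle categories and
# cannot reproduce the peaks at categories 2 and 7.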

Bottom Line: Researchers who focus solely on coefficient direction and statistical significance would miss that linear models predict impossible intermediate distributions, while ordinal models (even when their assumptions are violated) better capture the underlying categorical response patterns. The methodological choice determines whether you conclude “gradual population-wide shift” (linear) versus “severe polarization” (cumulative ordinal), with the minority cross-over effects additionally recoverable via a category-specific ordinal extension; these represent fundamentally different substantive interpretations of the same social phenomenon.

5 Supp. Sim. S1b: Heterogeneous Effects w/in Zero-Inflated Crime Data

5.1 Motivation and Design

The previous simulation (S1a) demonstrated that ordinal and linear models can yield similar average treatment effects while differing in predictive accuracy. However, many criminologists appear to prioritize coefficient interpretation over predictive performance. This supplemental analysis (S1b) therefore addresses a more fundamental concern: scenarios where linear and ordinal models yield different coefficients and substantively different conclusions about the nature and extent of heterogeneous treatment effects, because the two models target distinct estimands.

We simulate a zero-inflated crime distribution that resembles common criminological outcomes (e.g., self-reported offending, arrest counts). The scenario models heterogeneous treatment effects of an exogenous shock (economic strain) across two subpopulations that operate through completely different social mechanisms yet produce similar conditional means. This design challenges the adequacy of linear modeling frameworks that reduce complex social processes to single mean parameters.

5.2 Mathematical Summary of Data-Generating Process

Baseline: Both groups start with an identical zero-inflated distribution, $Y_{i,t=0} \sim \text{Multinomial}(\pi_0)$

Post-Shock: Group-specific conditional mechanisms, $Y_{i,t=1} = f(Y_{i,t=0}, G_i, \text{shock}, \epsilon_i)$

Where $f(\cdot)$ represents:

  • High Constraint: Severity effects (existing offenders escalate)
  • Low Constraint: Prevalence effects (abstainers initiate)

Fitted Models (Both Misspecified):

  • Linear: $E[Y_{it}] = \beta_0 + \beta_1 G_i + \beta_2 T_t + \beta_3 (G_i \times T_t)$
  • Ordinal: $P(Y_{it} \leq k) = \Phi(\tau_k - [\gamma_1 G_i + \gamma_2 T_t + \gamma_3 (G_i \times T_t)])$

Key Insight: The true model includes $f(Y_{i,t-1}, G_i, T_t, G_i \times Y_{i,t-1} \times T_t, \epsilon)$, but the fitted models assume simple additive effects. Both are misspecified, but ordinal models better capture the distributional consequences of the true heterogeneous mechanisms.
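For reference, the shared baseline probabilities used in the simulation code below imply a baseline mean of 0.90, consistent with the observed baseline means (~0.85-0.88) reported in Section 5.4. A quick check:

# Implied mean of the shared baseline distribution (re-defined in the simulation code below)
baseline_probs <- c(0.50, 0.25, 0.15, 0.06, 0.03, 0.01, 0.00)
sum((0:6) * baseline_probs)  # 0.25 + 0.30 + 0.18 + 0.12 + 0.05 = 0.90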

5.3 Theoretical Data-Generating Process

Rather than simply specifying probability distributions, we model the true underlying heterogeneous mechanisms by which an exogenous shock affects different populations. This approach creates a realistic scenario where the data arise from complex social processes that simple Group × Time regression models cannot fully capture, yet linear models will conclude there are “no group differences” because the mechanisms happen to produce similar conditional means.

Show code
# Define simulation data generation functions 

# True latent heterogeneous shock effects function
generate_latent_shock_effects <- function(baseline_response, group, shock_intensity = 0.75) {

  # Group-specific latent parameters reflecting different social mechanisms
  if(group == "High Constraint") {
    # Severity mechanism: Conditional escalation process
    severity_threshold <- 0.6  # Probability of being affected if already offending
    escalation_rate <- 2.0    # How much existing offenders escalate
    initiation_rate <- 0.1    # Low probability of new initiation

    if(baseline_response == 0) {
      # Abstainers: low probability of initiation
      if(runif(1) < shock_intensity * initiation_rate) {
        new_response <- sample(1:2, 1, prob = c(0.7, 0.3))
      } else {
        new_response <- 0
      }
    } else {
      # Existing offenders: high probability of escalation
      if(runif(1) < shock_intensity * severity_threshold) {
        # Escalate proportionally to current level
        escalation <- rpois(1, escalation_rate * baseline_response)
        new_response <- min(6, baseline_response + escalation)
      } else {
        new_response <- baseline_response
      }
    }

  } else {  # Low Constraint
    # Prevalence mechanism: Initiation-focused process
    initiation_threshold <- 0.7  # High probability of initiation among abstainers
    severity_dampening <- 0.2    # Existing offenders less likely to escalate
    entry_level_bias <- 3.0      # New offenders cluster at low levels

    if(baseline_response == 0) {
      # Abstainers: high probability of initiation at low levels
      if(runif(1) < shock_intensity * initiation_threshold) {
        # New offenders cluster at levels 1-3
        new_response <- sample(1:3, 1, prob = c(0.4, 0.35, 0.25))
      } else {
        new_response <- 0
      }
    } else {
      # Existing offenders: dampened escalation (system is "saturated")
      if(runif(1) < shock_intensity * severity_dampening) {
        escalation <- sample(0:2, 1, prob = c(0.6, 0.3, 0.1))
        new_response <- min(6, baseline_response + escalation)
      } else {
        new_response <- baseline_response
      }
    }
  }

  return(new_response)
}


# Generate data using the heterogeneous latent process
generate_heterogeneous_data <- function(n_per_group, shock_intensity = 0.75) {

  # Define shared baseline zero-inflated crime distribution for both groups
  baseline_probs <- c(0.50, 0.25, 0.15, 0.06, 0.03, 0.01, 0.00) # Re-define locally if not global

  # Start with baseline data for both groups
  baseline_data <- rbind(
    tibble(
      id = 1:n_per_group,
      group = "High Constraint",
      time = "Baseline",
      response = sample(0:6, n_per_group, replace = TRUE, prob = baseline_probs)
    ),
    tibble(
      id = 1:n_per_group,
      group = "Low Constraint",
      time = "Baseline",
      response = sample(0:6, n_per_group, replace = TRUE, prob = baseline_probs)
    )
  )

  # Apply latent heterogeneous shock effects
  post_shock_data <- baseline_data %>%
    rowwise() %>%
    mutate(
      post_shock_response = generate_latent_shock_effects(response, group, shock_intensity),
      time = "Post-Shock",
      response = post_shock_response
    ) %>%
    dplyr::select(-post_shock_response)

  # Combine baseline and post-shock
  combined_data <- bind_rows(baseline_data, post_shock_data) %>%
    mutate(
      group = factor(group, levels = c("High Constraint", "Low Constraint")),
      time = factor(time, levels = c("Baseline", "Post-Shock")),
      shock = ifelse(time == "Post-Shock", 1, 0),
      crime_ordered = factor(response, levels = 0:6, ordered = TRUE),
      participant_id = ifelse(group == "High Constraint", id, id + n_per_group)
    )

  return(combined_data)
}


set.seed(1138)
n_per_group <- 1000

# Define shared baseline zero-inflated crime distribution for both groups
baseline_probs <- c(0.50, 0.25, 0.15, 0.06, 0.03, 0.01, 0.00)

# Calculate baseline mean for reference
baseline_mean <- sum((0:6) * baseline_probs)

# Generate the data for single dataset demonstration
# The function 'generate_heterogeneous_data' is now available globally from 'setup-sims'
sim_data <- generate_heterogeneous_data(n_per_group, shock_intensity = 0.75)

cat("True Data-Generating Process:\n")
True Data-Generating Process:
Show code
cat("- High Constraint: Conditional escalation (severity effects among existing offenders)\n")
- High Constraint: Conditional escalation (severity effects among existing offenders)
Show code
cat("- Low Constraint: Initiation-focused (prevalence effects among abstainers)\n")
- Low Constraint: Initiation-focused (prevalence effects among abstainers)
Show code
cat("- Both mechanisms calibrated to produce identical mean changes (~0.58). The 'true interaction' for the linear model's estimand (difference in mean changes) is near zero.\n") 
- Both mechanisms calibrated to produce identical mean changes (~0.58). The 'true interaction' for the linear model's estimand (difference in mean changes) is near zero.
Show code
cat("- Simple Group×Time models are misspecified (missing baseline×group×shock interactions)\n")
- Simple Group×Time models are misspecified (missing baseline×group×shock interactions)
Show code
cat("- Linear models will detect null interaction (similar means = 'no group differences')\n")
- Linear models will detect null interaction (similar means = 'no group differences')
Show code
cat("- Ordinal models may detect non-null interaction (different mechanisms across thresholds, reflecting distributional heterogeneity)\n") 
- Ordinal models may detect non-null interaction (different mechanisms across thresholds, reflecting distributional heterogeneity)

5.4 Verification: Mean Similarity Despite Mechanism Differences

Show code
# Verify that we achieve similar means through different mechanisms
observed_summary <- sim_data %>%
  group_by(group, time) %>%
  summarise(
    n = n(),
    mean_crime = mean(response),
    .groups = "drop"
  ) %>%
  pivot_wider(names_from = time, values_from = mean_crime) %>%
  mutate(change = `Post-Shock` - Baseline)

gt(observed_summary) %>%
  tab_header(title = "Observed Means by Group and Time Period") %>%
  fmt_number(columns = c(Baseline, `Post-Shock`, change), decimals = 3) %>%
  cols_label(
    group = "Constraint Group",
    change = "Shock Effect"
  ) %>%
  tab_footnote(
    footnote = "Similar post-shock means despite different underlying mechanisms",
    locations = cells_column_labels(columns = change)
  )
Observed Means by Group and Time Period
Constraint Group n Baseline Post-Shock Shock Effect1
High Constraint 1000 0.881 1.462 0.581
Low Constraint 1000 0.847 1.428 0.581
1 Similar post-shock means despite different underlying mechanisms
Show code
# Calculate true interaction effect from latent process (for the linear model's estimand)
true_group1_effect <- observed_summary$change[observed_summary$group == "High Constraint"]
true_group2_effect <- observed_summary$change[observed_summary$group == "Low Constraint"]
true_interaction <- true_group2_effect - true_group1_effect # This is the "true" mean-based interaction effect.

cat("\nExpected Model Behavior:\n")

Expected Model Behavior:
Show code
cat("Linear model expectation: Interaction ≈ 0 (correctly detecting identical mean changes – its specific estimand)\n") # Clarified "correctly detecting" for its estimand
Linear model expectation: Interaction ≈ 0 (correctly detecting identical mean changes – its specific estimand)
Show code
cat("Ordinal model expectation: Interaction ≠ 0 (detecting distributional heterogeneity, which is a different estimand)\n") # Clarified "different estimand"
Ordinal model expectation: Interaction ≠ 0 (detecting distributional heterogeneity, which is a different estimand)
Show code
cat("This difference reflects different conceptions of 'treatment heterogeneity' based on chosen estimands.\n") # Refined wording
This difference reflects different conceptions of 'treatment heterogeneity' based on chosen estimands.
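In estimand terms, the linear interaction targets the observed-scale difference-in-differences, $\beta_3 = \{E[Y \mid \text{Low, Post}] - E[Y \mid \text{Low, Base}]\} - \{E[Y \mid \text{High, Post}] - E[Y \mid \text{High, Base}]\}$, which is approximately zero here by construction, whereas the ordinal interaction $\gamma_3$ is the analogous contrast on the latent standard-normal scale and can be far from zero whenever the two groups' category probabilities shift in different ways.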

5.5 Visualization of Theoretical Distributions

Show code
# Calculate actual post-shock distributions from generated data
actual_distributions <- sim_data %>%
  count(group, time, response) %>%
  group_by(group, time) %>%
  mutate(proportion = n / sum(n)) %>%
  ungroup() %>%
  dplyr::select(group, time, crime_level = response, proportion) %>%
  
  # --- Complete missing combinations with proportion = 0 ---
  # Ensure all combinations of group, time, and crime_level (0-6) exist
  complete(group, time, crime_level = 0:6, fill = list(proportion = 0)) %>%
  # Re-order factors for consistent plotting if needed (though group/time should be factors already)
  mutate(
    group = factor(group, levels = c("High Constraint", "Low Constraint")),
    time = factor(time, levels = c("Baseline", "Post-Shock"))
  )


# Calculate means for each distribution for vertical lines
distribution_means <- actual_distributions %>%
  group_by(group, time) %>%
  summarise(
    # Re-calculate mean using completed proportions, so it's consistent
    mean_crime = sum(crime_level * proportion), 
    .groups = "drop"
  ) %>%
  # Only show means for post-shock (since baseline is de-emphasized)
  filter(time == "Post-Shock")

# Create vertical plot with dodged bars
theoretical_dist_plot <- actual_distributions %>%
  ggplot(aes(x = crime_level, y = proportion, fill = time, alpha = time)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  # Add vertical lines at post-shock means
  geom_vline(data = distribution_means,
             aes(xintercept = mean_crime),
             linetype = "dashed",
             color = "black",
             size = 1) +
  # Add mean labels, explicitly setting font family here for geom_text
  geom_text(data = distribution_means, 
            aes(x = mean_crime, y = max(actual_distributions$proportion) * 0.9, # Use 'proportion' as max y from actual_distributions
                label = paste0("Mean = ", round(mean_crime, 2))),
            angle = 0, 
            vjust = 1, 
            hjust = -0.1, 
            size = 3.5,
            color = "black",
            family = "serif") + 
  facet_wrap(~group, ncol = 1, scales = "fixed") +
  scale_fill_manual(
    values = c("Baseline" = "gray60", "Post-Shock" = "#440154"),
    name = "Period"
  ) +
  scale_alpha_manual(
    values = c("Baseline" = 0.3, "Post-Shock" = 0.8),
    guide = "none"  # Don't show alpha in legend
  ) +
  scale_x_continuous(
    name = "Crime Frequency (0 = No Crime, 6 = Frequent Crime)",
    breaks = 0:6,
    labels = 0:6
  ) +
  scale_y_continuous(
    name = "Probability",
    labels = percent_format(accuracy = 1)
  ) +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"), 
    strip.text = element_text(size = 10, face = "bold", family = "serif"), 
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    # axis.title = element_text(size = 10, family = "serif"), 
    axis.title = element_blank(), 
    axis.text = element_text(size = 10, family = "serif"), 
    plot.title = element_text(size = 11, face = "bold"),
    plot.subtitle = element_text(size = 10)
  ) +
  labs(
    title = "A. Pre/Post-Shock Crime Distributions by Group",
    subtitle = "High Constraint: Severity effects (existing offenders escalate)\nLow Constraint: Prevalence effects (abstainers begin offending)")

theoretical_dist_plot

As the figure above illustrates, this simulation demonstrates a scenario where linear and ordinal models should yield fundamentally different conclusions about the nature and extent of treatment heterogeneity despite analyzing identical data. The distributions show that both groups experience identical mean increases in crime (~0.58 units) following the shock, but through completely different social mechanisms: High Constraint groups exhibit severity effects where existing offenders escalate to higher crime levels, while Low Constraint groups exhibit prevalence effects where former abstainers begin engaging in lower-level crime.

The Conceptual Challenge: Linear models, by design, can only detect heterogeneity that manifests as differences in conditional means. When complex social mechanisms happen to produce similar average outcomes, linear models conclude there are “no group differences” - not because there is no heterogeneity, but because their methodological framework cannot conceptualize heterogeneity beyond mean comparisons. This represents a fundamental limitation rather than accurate inference.

The Ordinal Alternative: Ordinal models attempt to capture distributional patterns across multiple response thresholds rather than reducing social processes to single mean parameters. When the underlying mechanisms create different patterns of category probabilities (even with similar means), ordinal models can detect this heterogeneity through threshold-specific effects. The cumulative ordinal model will be imperfect due to proportional odds violations, but it approximates the multi-dimensional nature of the true social processes better than models that assume all meaningful variation occurs in conditional means.
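Concretely, under the cumulative probit parameterization, each category probability is a difference of normal CDFs evaluated at adjacent thresholds, $P(Y_{it} = k) = \Phi(\tau_k - \eta_{it}) - \Phi(\tau_{k-1} - \eta_{it})$, where $\eta_{it}$ is the linear predictor on the latent scale. A single shift in $\eta_{it}$ therefore reallocates probability mass across all thresholds at once, which is how group differences can surface even when the implied means barely move.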

Consequently, we expect linear models to yield interaction estimates near zero (correctly summarizing mean similarity, which is their primary estimand, but missing distributional complexity), while ordinal models should produce non-zero interaction estimates that capture the distinct severity versus prevalence mechanisms through their effect on the latent scale and category probabilities. This illustrates how methodological frameworks shape - and potentially limit - researchers’ ability to detect and understand heterogeneous social processes.

5.6 Model Comparison: Linear vs. Ordinal

Here we fit both linear and ordinal models to our single simulated dataset and compare their coefficient estimates. The linear model treats crime levels as continuous values, while the ordinal model respects the categorical nature of the responses.

Show code
# Fit Bayesian linear and ordinal models with caching
library(rstanarm)
library(brms)

# Set up for parallel processing
options(mc.cores = parallel::detectCores())

# Define cache file paths
linear_model_path <- here("Models", "s1b_linear_bayes.rds")
ordinal_model_path <- here("Models", "s1b_ordinal_bayes.rds")

# Fit Bayesian linear model using rstanarm
if (file.exists(linear_model_path)) {
  linear_model_bayes <- readRDS(linear_model_path)
  cat("Loaded cached Bayesian linear model\n")
} else {
  cat("Fitting Bayesian linear model...\n")
  linear_model_bayes <- stan_glm(
    response ~ group * time,
    data = sim_data,
    family = gaussian(),
    prior_intercept = normal(3, 2),
    prior = normal(0, 1),
    prior_aux = exponential(1),
    chains = 4,
    iter = 2000,
    seed = 1138
  )
  saveRDS(linear_model_bayes, linear_model_path)
  cat("Bayesian linear model fitted and cached\n")
}
Loaded cached Bayesian linear model
Show code
# Fit Bayesian ordinal model using brms
if (file.exists(ordinal_model_path)) {
  ordinal_model_bayes <- readRDS(ordinal_model_path)
  cat("Loaded cached Bayesian ordinal model\n")
} else {
  cat("Fitting Bayesian ordinal model...\n")

  # Prior for ordinal model with equal probability across 7 categories
  ord_prior <- prior(normal(-1.07, 1), class = "Intercept", coef = "1") +
               prior(normal(-0.57, 1), class = "Intercept", coef = "2") +
               prior(normal(-0.18, 1), class = "Intercept", coef = "3") +
               prior(normal(0.18, 1), class = "Intercept", coef = "4") +
               prior(normal(0.57, 1), class = "Intercept", coef = "5") +
               prior(normal(1.07, 1), class = "Intercept", coef = "6") +
               prior(normal(0, 1), class = "b")

  ordinal_model_bayes <- brm(
    crime_ordered ~ group * time,
    data = sim_data,
    family = cumulative("probit"),
    prior = ord_prior,
    chains = 4,
    iter = 2000,
    cores = 4,
    seed = 1138,
    control = list(adapt_delta = 0.95)
  )
  saveRDS(ordinal_model_bayes, ordinal_model_path)
  cat("Bayesian ordinal model fitted and cached\n")
}
Loaded cached Bayesian ordinal model
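The threshold prior means specified above correspond to equal prior mass in each of the seven crime categories. A quick check of where those cutpoints come from:

# Standard-normal cutpoints that divide the latent scale into 7 equally probable categories
round(qnorm(cumsum(rep(1/7, 6))), 2)
# -1.07 -0.57 -0.18  0.18  0.57  1.07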
Show code
# Check convergence
cat("Linear model Rhat values:\n")
Linear model Rhat values:
Show code
linear_rhats <- summary(linear_model_bayes)[, "Rhat"]
problematic_linear <- linear_rhats[linear_rhats > 1.01]
if(length(problematic_linear) > 0) {
  print(problematic_linear)
} else {
  cat("All Rhat values <= 1.01 (good convergence)\n")
}
All Rhat values <= 1.01 (good convergence)
Show code
cat("\nOrdinal model Rhat values:\n")

Ordinal model Rhat values:
Show code
tryCatch({
  ordinal_rhats <- rhat(ordinal_model_bayes)
  problematic_ordinal <- ordinal_rhats[ordinal_rhats > 1.01 & !is.na(ordinal_rhats)]
  if(length(problematic_ordinal) > 0) {
    print(problematic_ordinal)
  } else {
    cat("All Rhat values <= 1.01 (good convergence)\n")
  }
}, error = function(e) {
  cat("Could not extract Rhat values (likely cached model)\n")
})
All Rhat values <= 1.01 (good convergence)
Show code
# Extract coefficients for comparison
library(broom.mixed)  # Need this for rstanarm models

linear_coefs_bayes <- linear_model_bayes %>%
  broom.mixed::tidy(conf.int = TRUE, conf.level = 0.95) %>%
  dplyr::filter(str_detect(term, "group|time")) %>%
  mutate(model = "Linear (Bayesian)") %>%
  dplyr::select(term, estimate, conf.low, conf.high, model)

ordinal_coefs_bayes <- ordinal_model_bayes %>%
  broom::tidy(conf.int = TRUE, conf.level = 0.95) %>%  # generic dispatches to broom.mixed's brmsfit method
  dplyr::filter(str_detect(term, "group|time")) %>%
  mutate(model = "Ordinal (Bayesian)") %>%
  dplyr::select(term, estimate, conf.low, conf.high, model)

# Combine results
model_comparison_bayes <- bind_rows(linear_coefs_bayes, ordinal_coefs_bayes) %>%
  mutate(
    term_clean = case_when(
      str_detect(term, ":") ~ "Group × Time Interaction",
      str_detect(term, "time") ~ "Time (Post vs. Pre)",
      TRUE ~ term
    )
  ) %>%
  dplyr::filter(term_clean %in% c("Time (Post vs. Pre)", "Group × Time Interaction")) %>%
  arrange(term_clean, model)

# Display coefficient comparison
gt(model_comparison_bayes) %>%
  tab_header(title = "Bayesian Model Coefficient Comparison") %>%
  fmt_number(columns = c(estimate, conf.low, conf.high), decimals = 3) %>%
  cols_label(
    term = "Raw Term",
    term_clean = "Effect",
    estimate = "Posterior Mean",
    conf.low = "95% Credible Interval Lower",
    conf.high = "95% Credible Interval Upper",
    model = "Model"
  ) %>%
  tab_footnote(
    footnote = "Bayesian models with weakly informative priors; credible intervals from posterior distributions. Note that coefficients from linear and ordinal models have different estimands: linear coefficients reflect effects on the conditional mean of the observed outcome, while ordinal coefficients reflect effects on an underlying latent continuous variable.", # Added estimand clarification
    locations = cells_title()
  )
Bayesian Model Coefficient Comparison1
Raw Term Posterior Mean 95% Credible Interval Lower 95% Credible Interval Upper Model Effect
groupLow Constraint:timePost-Shock −0.001 −0.174 0.163 Linear (Bayesian) Group × Time Interaction
groupLowConstraint:timePostMShock 0.185 0.051 0.319 Ordinal (Bayesian) Group × Time Interaction
timePost-Shock 0.582 0.468 0.699 Linear (Bayesian) Time (Post vs. Pre)
timePostMShock 0.359 0.263 0.455 Ordinal (Bayesian) Time (Post vs. Pre)
1 Bayesian models with weakly informative priors; credible intervals from posterior distributions. Note that coefficients from linear and ordinal models have different estimands: linear coefficients reflect effects on the conditional mean of the observed outcome, while ordinal coefficients reflect effects on an underlying latent continuous variable.
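Because the two models' coefficients live on different scales, one way to compare them on common ground is to push both onto the probability scale; Section 5.7 does this for every category. As a minimal sketch (using the fitted ordinal_model_bayes object from above), the model-implied change in the probability of abstaining (response = 0) from Baseline to Post-Shock can be computed directly from the posterior draws:

# Model-implied change in P(response = 0) from Baseline to Post-Shock, by group
nd <- expand_grid(
  group = factor(c("High Constraint", "Low Constraint")),
  time  = factor(c("Baseline", "Post-Shock"))
)
ep <- posterior_epred(ordinal_model_bayes, newdata = nd)  # draws x 4 conditions x 7 categories
p_zero <- ep[, , 1]                                       # probability of the lowest category (0)
# Rows of nd: 1 = High/Baseline, 2 = High/Post, 3 = Low/Baseline, 4 = Low/Post
colMeans(p_zero[, c(2, 4)] - p_zero[, c(1, 3)])           # posterior mean change, per group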
Show code
# Posterior predictive checks for model validation

# Panel C1: Linear model pp check
panel_c1 <- pp_check(linear_model_bayes, ndraws = 50) +
  ggtitle("C1. Linear Model Posterior Predictive Check") +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"), 
    plot.title = element_text(hjust = 0.5, face = "bold", size = 11),
    axis.text = element_text(size = 10),
    # axis.title = element_text(size = 10)
    axis.title = element_blank(), 
  )

# Panel C2: Ordinal model pp check
panel_c2 <- pp_check(ordinal_model_bayes, ndraws = 50, type = "bars") +
  ggtitle("C2. Ordinal Model Posterior Predictive Check") +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"), 
    plot.title = element_text(hjust = 0.5, face = "bold", size = 11),
    axis.text = element_text(size = 10),
    # axis.title = element_text(size = 10)
    axis.title = element_blank(), 
  )

# Combine the plots
library(patchwork)
ppcheck_combined <- panel_c1 / panel_c2 +
  plot_annotation(
    title = "Posterior Predictive Checks: Model Fit Comparison",
    subtitle = "Dark line = observed data; Light lines = posterior predictions",
    theme = theme(
      text = element_text(family = "serif"), 
      plot.title = element_text(size = 11, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 10, hjust = 0.5)
    )
  )

ppcheck_combined

Show code
# Print summary of model fit
cat("Linear Model Summary:\n")
Linear Model Summary:
Show code
cat("- Assumes continuous normal distribution for residuals around the conditional mean\n") # Clarified assumption
- Assumes continuous normal distribution for residuals around the conditional mean
Show code
cat("- Predictions may extend beyond valid range (0-6) of the observed ordinal scale\n") # Clarified "valid range"
- Predictions may extend beyond valid range (0-6) of the observed ordinal scale
Show code
cat("- Poor fit expected for discrete ordinal data, especially in terms of category-specific probabilities\n\n") # Clarified "poor fit"
- Poor fit expected for discrete ordinal data, especially in terms of category-specific probabilities
Show code
cat("Ordinal Model Summary:\n")
Ordinal Model Summary:
Show code
cat("- Respects discrete nature of responses by modeling an underlying latent variable and thresholds\n") # Clarified
- Respects discrete nature of responses by modeling an underlying latent variable and thresholds
Show code
cat("- Predictions constrained to valid categories (0-6) and inherently provide category-specific probabilities\n") # Clarified
- Predictions constrained to valid categories (0-6) and inherently provide category-specific probabilities
Show code
cat("- Better fit expected for ordinal data, especially in capturing distributional patterns\n") # Clarified
- Better fit expected for ordinal data, especially in capturing distributional patterns

5.7 Interactions Represent Different Conceptions of Treatment Heterogeneity

Linear and ordinal models conceptualize and detect treatment heterogeneity differently based on their respective estimands. The linear model asks: “Do groups differ in mean response to treatment?” The ordinal model asks: “Do groups exhibit different patterns of response across ordinal outcome categories by affecting the underlying latent variable and its thresholds?”

The plot below demonstrates how the cumulative ordinal model uses its increased flexibility (via estimation of multiple ordered thresholds instead of a single mean) to better capture the differing distributional patterns across groups and time periods, showing predicted probabilities for each crime level. The cumulative ordinal model's estimates are not perfect, though, since our treatment heterogeneity design explicitly violates the proportional odds assumption (meaning a category-specific model would be even more appropriate here).
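For readers who want to relax that constraint, here is a hedged sketch (not fit or reported in this supplement) of a category-specific ordinal model. In brms, category-specific effects via cs() are available for the acat() and sratio() families rather than cumulative(), so the sketch below swaps in an adjacent-category probit:

# Sketch only: category-specific (non-parallel) effects for group, time, and their interaction
cs_model <- brm(
  crime_ordered ~ cs(group * time),
  data = sim_data,
  family = acat("probit"),
  chains = 4, iter = 2000, cores = 4, seed = 1138
)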

Show code
# Generate predicted probabilities using posterior samples

newdata_grid <- expand_grid(
  group = factor(c("High Constraint", "Low Constraint")),
  time = factor(c("Baseline", "Post-Shock"))
)

# Get posterior predicted probabilities from Bayesian ordinal model
ordinal_preds_bayes <- posterior_epred(ordinal_model_bayes, newdata = newdata_grid, ndraws = 1000)

# Get posterior linear predictions (means, not individual predictions)
linear_pred_means <- posterior_epred(linear_model_bayes, newdata = newdata_grid, ndraws = 1000)

# Get residual SD for converting means to probabilities
sigma_draws <- as_draws_df(linear_model_bayes)$sigma

# Calculate predicted probability summaries for ordinal model
pred_summary_ordinal_bayes <- expand_grid(
  group = c("High Constraint", "Low Constraint"),
  time = c("Baseline", "Post-Shock"),
  category = 0:6
) %>%
  mutate(
    condition_idx = case_when(
      group == "High Constraint" & time == "Baseline" ~ 1,
      group == "High Constraint" & time == "Post-Shock" ~ 2,
      group == "Low Constraint" & time == "Baseline" ~ 3,
      group == "Low Constraint" & time == "Post-Shock" ~ 4
    ),
    # Extract posterior draws for this condition and category
    posterior_draws = map2(condition_idx, category + 1, ~ordinal_preds_bayes[, .x, .y]),
    # Calculate summary statistics from posterior
    pred_prob = map_dbl(posterior_draws, mean),
    pred_lower = map_dbl(posterior_draws, ~quantile(.x, 0.025)),
    pred_upper = map_dbl(posterior_draws, ~quantile(.x, 0.975)),
    model = "Ordinal"
  ) %>%
  dplyr::select(group, time, category, pred_prob, pred_lower, pred_upper, model)

# Convert linear MEANS to category probabilities
pred_summary_linear_bayes <- newdata_grid %>%
  mutate(condition_idx = row_number()) %>%
  crossing(category = 0:6) %>%
  mutate(
    # For each condition/category, calculate probability across posterior means
    category_probs = map2(condition_idx, category, function(cond, cat) {
      # Get linear MEANS for this condition (not individual predictions)
      linear_means <- linear_pred_means[, cond]
      # Use a single representative sigma (mean of posterior)
      sigma_mean <- mean(sigma_draws)
      # Convert means to category probabilities, assuming intervals are centered at integers
      map_dbl(linear_means, function(mu) {
        case_when(
          cat == 0 ~ pnorm(0.5, mean = mu, sd = sigma_mean),
          cat == 6 ~ 1 - pnorm(5.5, mean = mu, sd = sigma_mean),
          TRUE ~ pnorm(cat + 0.5, mean = mu, sd = sigma_mean) - pnorm(cat - 0.5, mean = mu, sd = sigma_mean)
        )
      })
    }),
    # Summarize posterior for this category
    pred_prob = map_dbl(category_probs, mean),
    pred_lower = map_dbl(category_probs, ~quantile(.x, 0.025)),
    pred_upper = map_dbl(category_probs, ~quantile(.x, 0.975)),
    model = "Linear"
  ) %>%
  dplyr::select(group, time, category, pred_prob, pred_lower, pred_upper, model)

# Combine ordinal and linear predictions
all_predictions_bayes <- bind_rows(pred_summary_ordinal_bayes, pred_summary_linear_bayes)

# Calculate observed proportions
observed_props <- sim_data %>%
  count(group, time, response) %>%
  group_by(group, time) %>%
  mutate(observed_prob = n / sum(n)) %>%
  ungroup() %>%
  dplyr::select(group, time, category = response, observed_prob) %>%
  complete(group, time, category = 0:6, fill = list(observed_prob = 0))

# Create predicted probability plot with Bayesian uncertainty
prob_plot_bayes <- ggplot() +
  # Observed data as bars
  geom_col(data = observed_props,
           aes(x = factor(category), y = observed_prob),
           alpha = 0.6, width = 0.6, fill = "gray70") +
  # Model predictions - error bars first, then points on top
  geom_errorbar(data = all_predictions_bayes,
                aes(x = factor(category), ymin = pred_lower, ymax = pred_upper, color = model),
                width = 0.2, position = position_dodge(width = 0.4), size = 0.8) +
  geom_point(data = all_predictions_bayes,
             aes(x = factor(category), y = pred_prob, color = model, shape = model),
             size = 3, position = position_dodge(width = 0.4)) +
  facet_grid(group ~ time, scales = "free_y") +
  scale_color_viridis_d(
    name = "Model Type",
    option = "plasma",
    end = 0.8,
    labels = c("Linear", "Ordinal")
  ) +
  scale_shape_manual(
    name = "Model Type",
    values = c("Linear" = 17, "Ordinal" = 16),
    labels = c("Linear", "Ordinal")
  ) +
  scale_x_discrete(name = "Crime Frequency (0 = No Crime, 6 = Frequent Crime)") +
  scale_y_continuous(name = "Probability", labels = scales::percent_format(accuracy = 1)) +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"), 
    strip.text = element_text(size = 10, face = "bold"),
    legend.position = "bottom",
    panel.grid.minor = element_blank(), 
    axis.title = element_blank(), 
    plot.title = element_text(size = 11, face = "bold"),
    plot.subtitle = element_text(size = 10)
  ) +
  labs(
    title = "B. Bayesian Model Predicted Probabilities vs. Observed Data",
    subtitle = "Gray bars = observed;\nPoints = posterior means with 95% credible intervals"
  )

prob_plot_bayes
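For reference, the linear model's category probabilities in panel B come from discretizing each posterior mean with the residual SD at half-integer cutpoints, $P(Y = k \mid \mu, \sigma) = \Phi\big((k + 0.5 - \mu)/\sigma\big) - \Phi\big((k - 0.5 - \mu)/\sigma\big)$ for interior categories, with all mass below 0.5 assigned to category 0 and all mass above 5.5 assigned to category 6; this is the mapping implemented in the chunk above.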

5.8 Monte Carlo Simulation Results

Now we move beyond the single dataset demonstration to examine the sampling distribution of interaction effect size estimates across many simulated datasets. This should show us how each modeling approach estimates its respective estimand and how their interaction coefficients diverge, which matters when researchers focus primarily on interpreting the sign and significance of coefficients.

Show code
# Monte Carlo simulation with interaction effects, conditional means, & effect sizes

# Set simulation parameters
n_sims <- 1000
set.seed(1138)

# Check if cached results exist
monte_carlo_file <- here("Models", "s1b_monte_carlo_results_enhanced.rds")

if (file.exists(monte_carlo_file)) {
  cat("Loading cached enhanced Monte Carlo results...\n")
  sim_results <- readRDS(monte_carlo_file)
  cat("Loaded results for", max(sim_results$sim_id, na.rm = TRUE), "simulations\n")
} else {
  cat("Running enhanced Monte Carlo simulation...\n")

  # Enhanced storage for results
  sim_results <- tibble(
    sim_id = integer(),
    model_type = character(),
    interaction_estimate = numeric(),
    interaction_se = numeric(),
    interaction_lower = numeric(),
    interaction_upper = numeric(),
    significant = logical(),
    converged = logical(),
    # New columns for conditional means (on observed 0-6 scale)
    high_baseline_mean = numeric(),
    high_postshock_mean = numeric(),
    low_baseline_mean = numeric(),
    low_postshock_mean = numeric(),
    cohens_d = numeric() # This column will now correctly represent the d for each model type
  )

  # Progress tracking
  pb <- progress::progress_bar$new(
    format = "[:bar] :percent :eta",
    total = n_sims,
    clear = FALSE,
    width = 60
  )

  # Run Monte Carlo simulation
  for (sim in 1:n_sims) {
    pb$tick() # Corrected progress bar syntax (removed ->)

    # Generate data using the same latent process
    sim_data_mc <- generate_heterogeneous_data(n_per_group, shock_intensity = 0.75)
    
    # Create prediction grid
    newdata_pred <- expand_grid(
      group = factor(c("High Constraint", "Low Constraint")),
      time = factor(c("Baseline", "Post-Shock"))
    )

    # Fit linear model
    linear_mc <- lm(response ~ group * time, data = sim_data_mc)
    linear_summary <- broom::tidy(linear_mc, conf.int = TRUE)

    # Get linear predictions
    linear_preds <- predict(linear_mc, newdata = newdata_pred)
    
    # Calculate Cohen's d for linear model (pooled overall treatment effect on observed scale)
    pooled_baseline_linear <- mean(linear_preds[c(1,3)])  
    pooled_postshock_linear <- mean(linear_preds[c(2,4)])  
    cohens_d_linear <- (pooled_postshock_linear - pooled_baseline_linear) / sd(sim_data_mc$response, na.rm = TRUE)

    # Extract interaction effect for linear model
    linear_interaction_row <- linear_summary %>%
      dplyr::filter(str_detect(term, "group.*time|time.*group|:"))

    if (nrow(linear_interaction_row) > 0) {
      linear_result <- tibble(
        sim_id = sim,
        model_type = "Linear",
        interaction_estimate = linear_interaction_row$estimate[1],
        interaction_se = linear_interaction_row$std.error[1],
        interaction_lower = linear_interaction_row$conf.low[1],
        interaction_upper = linear_interaction_row$conf.high[1],
        significant = linear_interaction_row$p.value[1] < 0.05,
        converged = TRUE,
        high_baseline_mean = linear_preds[1],
        high_postshock_mean = linear_preds[2],
        low_baseline_mean = linear_preds[3],
        low_postshock_mean = linear_preds[4],
        cohens_d = cohens_d_linear
      )
    } else {
      linear_result <- tibble(
        sim_id = sim,
        model_type = "Linear",
        interaction_estimate = NA_real_,
        interaction_se = NA_real_,
        interaction_lower = NA_real_,
        interaction_upper = NA_real_,
        significant = FALSE,
        converged = FALSE,
        high_baseline_mean = NA_real_,
        high_postshock_mean = NA_real_,
        low_baseline_mean = NA_real_,
        low_postshock_mean = NA_real_,
        cohens_d = NA_real_
      )
    }

    # Fit frequentist ordinal model
    # Use a placeholder to ensure bind_rows always has a full row even if tryCatch fails
    ordinal_result_placeholder <- tibble( 
      sim_id = sim,
      model_type = "Ordinal",
      interaction_estimate = NA_real_,
      interaction_se = NA_real_,
      interaction_lower = NA_real_,
      interaction_upper = NA_real_,
      significant = FALSE,
      converged = FALSE,
      high_baseline_mean = NA_real_,
      high_postshock_mean = NA_real_,
      low_baseline_mean = NA_real_,
      low_postshock_mean = NA_real_,
      cohens_d = NA_real_ 
    )
    ordinal_result <- ordinal_result_placeholder # Initialize with placeholder

    tryCatch({
      ordinal_mc <- ordinal::clm(crime_ordered ~ group * time, data = sim_data_mc)
      ordinal_tidy <- broom::tidy(ordinal_mc, conf.int = TRUE)

      # Get ordinal predictions and convert to expected values
      ordinal_preds_probs <- predict(ordinal_mc, newdata = newdata_pred, type = "prob")$fit
      ordinal_expected_values <- apply(ordinal_preds_probs, 1, function(probs) sum((0:6) * probs))
      
      # Find interaction term for its coefficient
      ordinal_interaction_row <- ordinal_tidy %>%
        dplyr::filter(str_detect(term, "groupLow Constraint.*timePost-Shock|timePost-Shock.*groupLow Constraint"))

      if (nrow(ordinal_interaction_row) > 0) {
        # Construct ordinal_result tibble here, ensuring all columns are assigned
        ordinal_result <- tibble(
          sim_id = sim,
          model_type = "Ordinal",
          interaction_estimate = ordinal_interaction_row$estimate[1],
          interaction_se = ordinal_interaction_row$std.error[1],
          interaction_lower = ordinal_interaction_row$conf.low[1],
          interaction_upper = ordinal_interaction_row$conf.high[1],
          significant = ordinal_interaction_row$p.value[1] < 0.05,
          converged = TRUE,
          high_baseline_mean = ordinal_expected_values[1],
          high_postshock_mean = ordinal_expected_values[2],
          low_baseline_mean = ordinal_expected_values[3],
          low_postshock_mean = ordinal_expected_values[4],
          cohens_d = ordinal_interaction_row$estimate[1] # Latent coef is Cohen's d equivalent
        )
      } else {
        # If interaction_row not found, use placeholder defaults and set converged to FALSE
        ordinal_result <- ordinal_result_placeholder 
        ordinal_result$converged <- FALSE 
      }
    }, error = function(e) {
      cat("Simulation", sim, "ordinal model failed:", e$message, "\n")
      # If error, use placeholder defaults
      ordinal_result <- ordinal_result_placeholder 
      ordinal_result$converged <- FALSE 
    })

    # Store results
    sim_results <- bind_rows(sim_results, linear_result, ordinal_result)
  }

  # Save simulation results
  saveRDS(sim_results, monte_carlo_file)
  cat("Enhanced Monte Carlo simulation complete. Results cached.\n")
}
Loading cached enhanced Monte Carlo results...
Loaded results for 1000 simulations
Show code
# Display convergence summary
convergence_summary <- sim_results %>%
  group_by(model_type) %>%
  summarise(
    n_sims = n(),
    n_converged = sum(converged, na.rm = TRUE),
    convergence_rate = mean(converged, na.rm = TRUE),
    .groups = "drop"
  )

gt(convergence_summary) %>%
  tab_header(title = "Model Convergence Summary") %>%
  fmt_percent(columns = convergence_rate, decimals = 1) %>%
  cols_label(
    model_type = "Model Type",
    n_sims = "Simulations",
    n_converged = "Converged",
    convergence_rate = "Convergence Rate"
  )
Model Convergence Summary
Model Type Simulations Converged Convergence Rate
Linear 1000 1000 100.0%
Ordinal 1000 1000 100.0%

5.9 Monte Carlo Results: Sampling Distributions

This section examines how well each modeling approach recovers its respective estimand across repeated sampling. The key question: do linear and ordinal models systematically differ in their estimates of treatment heterogeneity, given their distinct conceptualizations of the underlying social processes?

Show code
# Filter to successfully converged models
converged_results <- sim_results %>%
  dplyr::filter(converged == TRUE, !is.na(interaction_estimate))

# Calculate summary statistics for interaction estimates
interaction_summary <- converged_results %>%
  group_by(model_type) %>%
  summarise(
    n = n(),
    mean_estimate = mean(interaction_estimate, na.rm = TRUE),
    median_estimate = median(interaction_estimate, na.rm = TRUE),
    sd_estimate = sd(interaction_estimate, na.rm = TRUE),
    proportion_significant = mean(significant, na.rm = TRUE), # FIXED: Use 'significant' not 'interaction_significant'
    q25 = quantile(interaction_estimate, 0.25, na.rm = TRUE),
    q75 = quantile(interaction_estimate, 0.75, na.rm = TRUE),
    .groups = "drop"
  )

gt(interaction_summary) %>%
  tab_header(
    title = "Interaction Effect Sampling Distribution Summary",
    subtitle = paste0("Based on ", n_sims, " Monte Carlo simulations")
  ) %>%
  fmt_number(columns = c(mean_estimate, median_estimate, sd_estimate, q25, q75), decimals = 4) %>%
  fmt_percent(columns = proportion_significant, decimals = 1) %>%
  cols_label(
    model_type = "Model Type",
    n = "N Converged",
    mean_estimate = "Mean",
    median_estimate = "Median",
    sd_estimate = "SD",
    proportion_significant = "% Significant",
    q25 = "Q25",
    q75 = "Q75"
  ) %>%
  tab_footnote(
    footnote = "Interaction represents group × time effect. For the linear model, this is the effect on the conditional mean. For the ordinal model, it is the effect on the latent scale, reflecting distributional shifts.",
    locations = cells_column_labels(columns = mean_estimate)
  )
Interaction Effect Sampling Distribution Summary
Based on 1000 Monte Carlo simulations
Model Type N Converged Mean1 Median SD % Significant Q25 Q75
Linear 1000 −0.0746 −0.0740 0.0469 1.8% −0.1070 −0.0440
Ordinal 1000 0.4065 0.4076 0.0587 99.9% 0.3672 0.4454
1 Interaction represents group × time effect. For the linear model, this is the effect on the conditional mean. For the ordinal model, it is the effect on the latent scale, reflecting distributional shifts.
Show code
# Reference the mean changes from our latent process (the "true" linear estimand)
cat("Mean Changes from Latent Process (True Linear Estimand):\n")
Mean Changes from Latent Process (True Linear Estimand):
Show code
cat("High Constraint change:", round(true_group1_effect, 4), "\n")
High Constraint change: 0.58 
Show code
cat("Low Constraint change:", round(true_group2_effect, 4), "\n")
Low Constraint change: 0.58 
Show code
cat("Difference in mean changes (True Linear Interaction):", round(true_interaction, 4), "\n\n")
Difference in mean changes (True Linear Interaction): 0 
Show code
cat("Expected vs Observed Model Behavior:\n")
Expected vs Observed Model Behavior:
Show code
cat("- Linear models should estimate interaction ≈", round(true_interaction, 4), " (unbiased for its estimand)\n")
- Linear models should estimate interaction ≈ 0  (unbiased for its estimand)
Show code
cat("- Ordinal models may estimate interaction ≠", round(true_interaction, 4), " because they quantify a different aspect of heterogeneity (latent scale shift)\n")
- Ordinal models may estimate interaction ≠ 0  because they quantify a different aspect of heterogeneity (latent scale shift)
Show code
cat("- This highlights that depending on the research question, different estimands and models may be more appropriate for capturing complex social processes.\n\n")
- This highlights that depending on the research question, different estimands and models may be more appropriate for capturing complex social processes.
Show code
# Test if model types differ significantly in their estimates
if (nrow(converged_results) > 0) {
  linear_estimates <- converged_results %>%
    dplyr::filter(model_type == "Linear") %>%
    pull(interaction_estimate)

  ordinal_estimates <- converged_results %>%
    dplyr::filter(model_type == "Ordinal") %>%
    pull(interaction_estimate)

  # Welch's t-test for difference in means
  if (length(linear_estimates) > 1 & length(ordinal_estimates) > 1) {
    estimate_comparison <- t.test(ordinal_estimates, linear_estimates)

    cat("Difference between model estimates (Ordinal - Linear):\n")
    cat("Mean difference:",
        round(estimate_comparison$estimate[1] - estimate_comparison$estimate[2], 4), "\n")
    cat("95% CI:", round(estimate_comparison$conf.int[1], 4),
        "to", round(estimate_comparison$conf.int[2], 4), "\n")
    cat("p-value:", format.pval(estimate_comparison$p.value, digits = 4), "\n")
  }
}
Difference between model estimates (Ordinal - Linear):
Mean difference: 0.48 
95% CI: 0.48 to 0.49 
p-value: < 0.00000000000000022 

5.10 Visualization: Interaction Coefficient Sampling Distributions

This visualization shows the sampling distributions of interaction effect estimates from both modeling approaches. It highlights how linear models accurately recover the null mean-based interaction, while ordinal models capture a significant latent scale interaction that reflects the underlying distributional heterogeneity.

Show code
# Find the significance transition thresholds for each model
transition_thresholds <- converged_results %>%
  group_by(model_type) %>%
  arrange(interaction_estimate) %>%
  summarise(
    # Largest estimate flagged significant (upper edge of the model's significant region)
    sig_upper_threshold = max(interaction_estimate[significant], na.rm = TRUE),
    # Smallest estimate flagged significant (lower edge of the model's significant region)
    sig_lower_threshold = min(interaction_estimate[significant], na.rm = TRUE),
    .groups = "drop"
  )

# Print the transition thresholds
cat("Significance Transition Thresholds:\n")
Significance Transition Thresholds:
Show code
print(transition_thresholds)
# A tibble: 2 × 3
  model_type sig_upper_threshold sig_lower_threshold
  <chr>                    <dbl>               <dbl>
1 Linear                  -0.174              -0.233
2 Ordinal                  0.611               0.233
Show code
# Find the overall range where estimates are non-significant for BOTH models
overall_nonsig_min <- max(transition_thresholds$sig_lower_threshold)  # larger of the two models' smallest significant estimates
overall_nonsig_max <- min(transition_thresholds$sig_upper_threshold)  # smaller of the two models' largest significant estimates

cat("\nOverall non-significant region (both models):\n")

Overall non-significant region (both models):
Show code
cat("From", overall_nonsig_min, "to", overall_nonsig_max, "\n")
From 0.23 to -0.17 
Show code
# Calculate density data
create_density_data <- function(data, model_name) {
  estimates <- data$interaction_estimate[data$model_type == model_name]
  dens <- density(estimates, n = 512)
  tibble(x = dens$x, y = dens$y, model_type = model_name)
}

linear_dens <- create_density_data(converged_results, "Linear")
ordinal_dens <- create_density_data(converged_results, "Ordinal")
all_dens <- bind_rows(linear_dens, ordinal_dens)

# Create the plot with grey box in the middle (non-significant for both)
sampling_dist_plot <- ggplot() +
  # Full density curves
  geom_line(
    data = all_dens,
    aes(x = x, y = y, color = model_type),
    size = 1
  ) +
  geom_area(
    data = all_dens,
    aes(x = x, y = y, fill = model_type),
    alpha = 0.5
  ) +
  # Grey box covering the region where BOTH models are non-significant
  geom_rect(
    aes(xmin = overall_nonsig_min, xmax = overall_nonsig_max, ymin = -Inf, ymax = Inf),
    fill = "grey50",
    alpha = 0.2
  ) +
  # Reference lines
  geom_vline(xintercept = 0, linetype = "dashed", size = 1, color = "black") +
  scale_fill_viridis_d(name = "Model Type", option = "plasma", end = 0.8) +
  scale_color_viridis_d(name = "Model Type", option = "plasma", end = 0.8) +
  scale_x_continuous(name = "Group × Time Interaction Estimate", 
                     breaks = seq(-0.2, 0.7, 0.10)) +
  scale_y_continuous(name = "Density") +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"), 
    legend.position = "bottom", 
    axis.title = element_blank(), 
    plot.title = element_text(size = 11, face = "bold"),
    plot.subtitle = element_text(size = 10)
    ) +
  labs(
    title = "D. Sampling Distributions of Interaction Coefficient",
    subtitle = "Based on N=1,000 Monte Carlo simulations per model type\nGrey box = region where both models yield non-significant estimates" 
  )

sampling_dist_plot

Show code
# Create significance detection plot
significance_data <- converged_results %>%
  group_by(model_type, significant) %>% # FIXED: Use 'significant'
  summarise(count = n(), .groups = "drop") %>%
  group_by(model_type) %>%
  mutate(
    total = sum(count),
    proportion = count / total,
    significance_label = ifelse(significant, "Significant", "Non-significant") 
  )

sig_plot <- ggplot(significance_data, aes(x = model_type, y = proportion, fill = significance_label)) +
  geom_col(position = "stack", alpha = 0.8) +
  geom_text(aes(label = paste0(round(proportion * 100, 1), "%")),
            position = position_stack(vjust = 0.5), size = 4, fontface = "bold") +
  scale_fill_manual(
    name = "Effect Detection",
    values = c("Significant" = "#440154", "Non-significant" = "#fde725"),
    guide = guide_legend(reverse = TRUE)
  ) +
  scale_y_continuous(name = "Proportion", labels = percent_format()) +
  scale_x_discrete(name = "Model Type") +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.title = element_text(size = 10),
    plot.title = element_text(size = 11, face = "bold", hjust = 0.5)
  ) +
  labs(title = "Proportion of Simulations Detecting Significant Interaction")

# Combine plots
combined_mc_plot <- sampling_dist_plot / sig_plot +
  plot_layout(heights = c(2, 1)) +
  plot_annotation(
    title = "Figure S1B. Monte Carlo Simulation Results: Linear vs. Ordinal Models",
    subtitle = "Heterogeneous treatment effects in zero-inflated crime data: Different estimands lead to different conclusions",
    theme = theme(
      plot.title = element_text(size = 11, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 10, hjust = 0.5)
    )
  )

# combined_mc_plot
Show code
# Calculate summary statistics using the Monte Carlo results
enhanced_mc_summary <- sim_results %>%
  dplyr::filter(converged == TRUE) %>%
  group_by(model_type) %>%
  summarise(
    N_estimates = n(),
    Mean_Interaction_Est = mean(interaction_estimate, na.rm = TRUE),
    Prop_Interaction_Sig = mean(significant, na.rm = TRUE),
    # These are now true Monte Carlo averages
    Avg_High_Pre = mean(high_baseline_mean, na.rm = TRUE),
    Avg_High_Post = mean(high_postshock_mean, na.rm = TRUE),
    Avg_Low_Pre = mean(low_baseline_mean, na.rm = TRUE),
    Avg_Low_Post = mean(low_postshock_mean, na.rm = TRUE),
    Avg_Cohens_D = mean(cohens_d, na.rm = TRUE),
    # Add variability measures
    SD_Cohens_D = sd(cohens_d, na.rm = TRUE),
    .groups = "drop"
  )

gt(enhanced_mc_summary) %>%
  tab_header(
    title = "Monte Carlo Simulation: Comprehensive Summary",
    subtitle = "Model Coefficients and Predicted Conditional Means (Averaged Across 1000 Simulations)"
  ) %>%
  fmt_number(columns = c(Mean_Interaction_Est, Avg_High_Pre, Avg_High_Post,
                         Avg_Low_Pre, Avg_Low_Post, Avg_Cohens_D, SD_Cohens_D), decimals = 3) %>%
  fmt_percent(columns = Prop_Interaction_Sig, decimals = 1) %>%
  cols_label(
    model_type = "Model Type",
    N_estimates = "N",
    Mean_Interaction_Est = "Interaction Coef.",
    Prop_Interaction_Sig = "% Sig.",
    Avg_High_Pre = "High/Pre",
    Avg_High_Post = "High/Post",
    Avg_Low_Pre = "Low/Pre",
    Avg_Low_Post = "Low/Post",
    Avg_Cohens_D = "Cohen's d",
    SD_Cohens_D = "SD(d)"
  ) %>%
  tab_spanner(
    label = "Average Predicted Conditional Means",
    columns = c(Avg_High_Pre, Avg_High_Post, Avg_Low_Pre, Avg_Low_Post)
  ) %>%
  tab_spanner(
    label = "Overall Treatment Effect Size",
    columns = c(Avg_Cohens_D, SD_Cohens_D)
  ) %>%
  tab_footnote(
    footnote = "Interaction coefficient from model. For the Linear Model, this reflects the average difference in conditional means. For the Ordinal Model, this reflects the average difference in latent means (on a standardized scale where latent SD=1).",
    locations = cells_column_labels(columns = Mean_Interaction_Est)
  ) %>%
  tab_footnote(
    footnote = "Averaged across 1000 Monte Carlo simulations. Shows both models recover similar conditional means when extracted from predicted distributions.",
    locations = cells_column_spanners(spanners = "Average Predicted Conditional Means")
  ) %>%
  tab_footnote(
    footnote = "Cohen's d: For the Linear Model, this is Cohen's d for the pooled overall treatment effect, calculated on the observed outcome scale. For the Ordinal Model, this is the latent Cohen's d for the interaction, which is the direct estimate of the interaction coefficient on the standardized latent scale (where latent SD=1). For the Ordinal Model, the 'Interaction Coef.' and 'Cohen's d' columns will display the same average value, as the coefficient itself is directly interpretable as Cohen's d on the latent scale. SD(d) shows variability across simulations.",
    locations = cells_column_spanners(spanners = "Overall Treatment Effect Size")
  )
Monte Carlo Simulation: Comprehensive Summary
Model Coefficients and Predicted Conditional Means (Averaged Across 1000 Simulations)
Model Type N Interaction Coef.3 % Sig.
Average Predicted Conditional Means1
Overall Treatment Effect Size2
High/Pre High/Post Low/Pre Low/Post Cohen's d SD(d)
Linear 1000 −0.075 1.8% 0.902 1.500 0.899 1.422 0.397 0.015
Ordinal 1000 0.406 99.9% 0.948 1.248 0.944 1.559 0.406 0.059
1 Averaged across 1000 Monte Carlo simulations. Shows both models recover similar conditional means when extracted from predicted distributions.
2 Cohen's d: For the Linear Model, this is Cohen's d for the pooled overall treatment effect, calculated on the observed outcome scale. For the Ordinal Model, this is the latent Cohen's d for the interaction, which is the direct estimate of the interaction coefficient on the standardized latent scale (where latent SD=1). For the Ordinal Model, the 'Interaction Coef.' and 'Cohen's d' columns will display the same average value, as the coefficient itself is directly interpretable as Cohen's d on the latent scale. SD(d) shows variability across simulations.
3 Interaction coefficient from model. For the Linear Model, this reflects the average difference in conditional means. For the Ordinal Model, this reflects the average difference in latent means (on a standardized scale where latent SD=1).

5.11 Combined Visualization

Show code
# Define the layout for the 4 panels
design <- "
  AB
  CD
"

# Combine the plots using their generated objects from previous chunks
# Apply labels (A, B, C, D) directly within the wrap_plots call for consistent placement.
combined_s1_plot <- wrap_plots(
  A = theoretical_dist_plot, 
  B = prob_plot_bayes,       
  C = ppcheck_combined,      
  D = sampling_dist_plot     
) +
  plot_layout(design = design) + # Apply the defined layout
  plot_annotation(
    title = "Figure S1B. Linear & Ordinal Modeling of Heterogeneous Effects in Simulated Crime Data",
    theme = theme(
      plot.title = element_text(hjust = 0.5, size = 12, face = "bold")
    )
  ) 

combined_s1_plot

5.12 Key Insights (for Supplementary Simulation 1B)

This simulation demonstrates several critical points about ordinal modeling for criminological research:

  1. Mean Equivalence Can Mask Underlying Heterogeneity: Linear models, which focus on conditional means, can entirely miss fundamental differences in how effects manifest. Even when complex social mechanisms lead to similar average changes in outcomes across groups, the linear model will accurately report a near-null mean-based interaction, failing to detect underlying (categorically) divergent processes.
  2. Divergent Inferential Conclusions on Heterogeneity: As a direct consequence, linear models may yield non-significant or misleadingly small (or even mis-signed) coefficients for heterogeneous effects, while ordinal models (whose coefficients reflect shifts on the latent scale) consistently detect a statistically significant and substantively meaningful interaction, leading to fundamentally different inferential conclusions about the presence and nature of heterogeneity.
  3. Ordinal Models Capture Distinct Mechanisms: By explicitly modeling the categorical nature of the data and an underlying latent variable, the cumulative ordinal model in this example was sensitive to different types of change (e.g., prevalence effects where abstainers begin offending vs. severity effects where existing offenders escalate). This provides a richer and more accurate understanding of the social processes, which are obscured by mean-focused linear models.
  4. Superiority Despite Assumption Violations: The design explicitly encodes heterogeneous treatment effects that violate the cumulative ordinal model’s proportional odds assumption. However, the cumulative ordinal model still offers a more informative signal about underlying heterogeneity than the linear model, which fundamentally mischaracterizes the ordinal outcome distribution.
  5. Implications for Causal Inference and Robustness Checks: Relying on linear models as “robustness checks” by simply comparing coefficient signs and significance can be deeply misleading. This simulation demonstrates that models sensitive to distributional patterns can provide better insights into differential causal mechanisms, thereby enhancing the accuracy of causal inference and substantive interpretation.

Bottom Line: Researchers who focus solely on coefficient direction and statistical significance would miss substantial underlying heterogeneity, concluding "no group difference" when distinct social mechanisms are at play. The methodological choice determines whether you overlook fundamental differences in how groups experience an effect (linear model) or accurately identify divergent underlying causal processes (cumulative ordinal model), leading to profoundly different substantive conclusions from the same data.

Now, let’s move on to illustrate some of these issues with real data. We are ready to load and skim data shared by Pickett and colleagues from their study on racial differences in fear of police.

6 PGC Data

In this tutorial, we are not trying to reproduce the numerous models or results for all outcomes reported in the authors’ original manuscript (though we certainly encourage that as a laudable goal). Rather, we want to dig deeply into the application of cumulative probit models, so we will restrict our focus primarily to describing relationships between participants’ self-reported “race” and their “personal fear” of the police.

6.1 Load data

Pickett, Graham, and Cullen (hereafter PGC) provided Stata data and code (.do) files. Let’s start by reading in their data file.

Show code
dat <- read_dta(file = here("Data", "feardata(12).dta"))

Initially, Jake loaded data and translated all of PGC’s data wrangling code from their Stata .do file into R in another file. Here, since we are focusing solely on a handful of key items, we will filter the data to keep only those relevant items and then skim the data.

Show code
dat <- dat %>%
  dplyr::select(caseid, 
                race, 
                Q2_1, Q2_2, Q2_3, Q2_4, Q2_5, 
                Q2_6, Q2_7, Q2_8, Q2_9, Q2_10) 

Now, let’s use haven to zap Stata formats and attributes, then skim the data.

Show code
# str(dat)
dat <- zap_formats(dat) 
dat <- zap_labels(dat) 
dat <- zap_label(dat) 
# str(dat)

dat %>% dplyr::select(-caseid) %>% datasummary_skim()
Unique Missing Pct. Mean SD Min Median Max Histogram
race 7 0 1.8 1.1 1.0 2.0 7.0
Q2_1 5 0 3.0 1.3 1.0 3.0 5.0
Q2_2 5 0 3.1 1.3 1.0 3.0 5.0
Q2_3 6 0 3.0 1.3 1.0 3.0 5.0
Q2_4 5 0 3.1 1.4 1.0 3.0 5.0
Q2_5 6 0 3.1 1.5 1.0 3.0 5.0
Q2_6 5 0 3.0 1.5 1.0 3.0 5.0
Q2_7 5 0 3.1 1.5 1.0 3.0 5.0
Q2_8 5 0 3.0 1.5 1.0 3.0 5.0
Q2_9 5 0 3.0 1.5 1.0 3.0 5.0
Q2_10 5 0 3.0 1.6 1.0 3.0 5.0

6.2 Race/Ethnicity

We will restrict our focus to fear contrasts between Black and White participants; this was the central contrast reported throughout PGC’s paper. In some figures, PGC also reported results for a combined “other” race/ethnicity group; however, that group’s small subsample size generated much noisier estimates (i.e., very wide confidence bounds).

Show code
# table(dat$race)

dat <- dat %>%
  dplyr::filter(race %in% c(1,2)) %>%
  mutate(
    caseid = as_factor(caseid), 
    race = as_factor(if_else(race == 2, "Black", "White"))
  ) 

tabyl(dat$race) %>% gt()
dat$race n percent
White 504 0.49
Black 517 0.51

6.3 Personal fear of police

Here is PGC’s description of the “personal fear” measure.

“For our survey, we developed original measures of both personal and altruistic fear of the police (see the appendix at the end of the article). In so doing, we followed best practices for measuring fear (Ferraro, 1995), and we modeled our measures on those used in recent studies to examine similar emotions in other contexts (e.g., Burton et al., 2021; Pickett, Roche, et al., 2018). To measure personal fear, we asked respondents about their emotional fear “that the police will do the following things to you without good reason in the next five years.” They rated how afraid they were (0 = very unafraid, 4 = very afraid) of falling victim to ten types of police mistreatment (e.g., “punch or kick you,” “pepper spray you,” or “kill you”). The full list of items and question wording are provided in the appendix. We averaged the ten items to construct an index (loadings: .823 to .958, α = .98) on which higher values indicated greater personal fear.

Here is the precise description of the items from their appendix:

Stem: “Now, we want to ask about EMOTIONAL FEAR. How afraid or unafraid are you that the POLICE will do the following things to you WITHOUT GOOD REASON in the next five years?”

Items: 1. Stop you 2. Search you 3. Yell at you 4. Handcuff you 5. Punch or kick you 6. Pin you to the ground 7. Pepper spray you 8. Use a taser on you 9. Shoot at you with a gun 10. Kill you

Response Scale: Very afraid, afraid, neither afraid nor unafraid, unafraid, very unafraid.

Following PGC, we will recode each of the 10 items so that higher values indicate greater fear. Specifically, the items are coded as: 0 = very unafraid; 1 = unafraid; 2 = neither afraid nor unafraid; 3 = afraid; 4 = very afraid. We will also follow their column naming conventions.

Show code
fearpol_old <- c(paste0("Q2_", 1:10))
fearpol_new <- c("stopyou", "searchyou", "yellyou", "handcyou", 
             "kickyou", "pinyou", "sprayyou", "taseyou", 
             "shootyou", "killyou")

dat <- dat %>% 
  mutate(
    across(all_of(fearpol_old), ~ 5 - .x, .names = "{.col}_new") %>%
      setNames(fearpol_new)
  ) 

Let’s check that it worked properly, then drop the old fear columns.

Show code
# dat %>% dplyr::select(-caseid) %>% datasummary_skim()

#Function to check recodes
tabcheck <- function(dat, var1, var2){
  tabyl(dat, {{var1}}, {{var2}}) %>% 
    adorn_title() 
}

map2(fearpol_old, fearpol_new, ~ tabcheck(dat, !!sym(.x), !!sym(.y))) 
[[1]]
      stopyou                
 Q2_1       0   1   2   3   4
    1       0   0   0   0 164
    2       0   0   0 215   0
    3       0   0 283   0   0
    4       0 168   0   0   0
    5     191   0   0   0   0

[[2]]
      searchyou                
 Q2_2         0   1   2   3   4
    1         0   0   0   0 160
    2         0   0   0 169   0
    3         0   0 277   0   0
    4         0 205   0   0   0
    5       210   0   0   0   0

[[3]]
      yellyou                
 Q2_3       0   1   2   3   4
    1       0   0   0   0 167
    2       0   0   0 195   0
    3       0   0 272   0   0
    4       0 187   0   0   0
    5     200   0   0   0   0

[[4]]
      handcyou                
 Q2_4        0   1   2   3   4
    1        0   0   0   0 186
    2        0   0   0 183   0
    3        0   0 235   0   0
    4        0 200   0   0   0
    5      217   0   0   0   0

[[5]]
      kickyou                    
 Q2_5       0   1   2   3   4 NA_
    1       0   0   0   0 212   0
    2       0   0   0 161   0   0
    3       0   0 212   0   0   0
    4       0 170   0   0   0   0
    5     265   0   0   0   0   0
 <NA>       0   0   0   0   0   1

[[6]]
      pinyou                
 Q2_6      0   1   2   3   4
    1      0   0   0   0 227
    2      0   0   0 159   0
    3      0   0 213   0   0
    4      0 161   0   0   0
    5    261   0   0   0   0

[[7]]
      sprayyou                
 Q2_7        0   1   2   3   4
    1        0   0   0   0 213
    2        0   0   0 171   0
    3        0   0 209   0   0
    4        0 164   0   0   0
    5      264   0   0   0   0

[[8]]
      taseyou                
 Q2_8       0   1   2   3   4
    1       0   0   0   0 225
    2       0   0   0 168   0
    3       0   0 204   0   0
    4       0 160   0   0   0
    5     264   0   0   0   0

[[9]]
      shootyou                
 Q2_9        0   1   2   3   4
    1        0   0   0   0 262
    2        0   0   0 131   0
    3        0   0 198   0   0
    4        0 153   0   0   0
    5      277   0   0   0   0

[[10]]
       killyou                
 Q2_10       0   1   2   3   4
     1       0   0   0   0 277
     2       0   0   0 114   0
     3       0   0 195   0   0
     4       0 153   0   0   0
     5     282   0   0   0   0

Recoding worked as intended. It appears there is one missing (NA) observation on Q2_5 (“kickyou”). It is not completely clear to me how PGC handled this particular NA value or missing data in general. Jake addressed missingness in more detail (regarding an “income” variable) in a different file where he was adapting PGC’s code to R, but I did not see any info related to how this one case was treated. Given our aims, since there is only one NA value, I am going to drop it and move on, with the expectation that any differences in reproducing estimates resulting from this decision will be trivial. However, it is worth noting that ordinal regression methods are fully compatible with multiple imputation approaches to missing data.
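For completeness, here is a minimal sketch (not run here) of how one might combine multiple imputation with the cumulative probit models used later, assuming the mice package and brms::brm_multiple(); with a single missing value this would change little of substance, but the same workflow scales to heavier missingness.

# Minimal sketch (not run): multiple imputation + cumulative probit via brms.
# Assumes the mice package; variable names follow the recoded data above.
library(mice)
library(brms)

# Work with ordered-factor versions of the items so mice imputes them as ordinal
dat_mi <- dat %>%
  dplyr::select(race, all_of(fearpol_new)) %>%
  mutate(across(all_of(fearpol_new), ~ factor(.x, levels = 0:4, ordered = TRUE)))

# Impute m = 5 complete data sets (mice's default method for ordered factors is polr)
imp <- mice(dat_mi, m = 5, seed = 1138, printFlag = FALSE)

# Fit the same cumulative probit model to each imputed data set and pool the draws
fit_mi <- brm_multiple(
  kickyou ~ race,                  # kickyou is the item with the missing value
  data = imp,
  family = cumulative("probit"),
  chains = 2, cores = 2, seed = 1138
)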

Show code
dat <- dat %>% 
  dplyr::select(-all_of(fearpol_old)) %>% 
  drop_na()

dat %>% dplyr::select(-caseid) %>% datasummary_skim()
Unique Missing Pct. Mean SD Min Median Max Histogram
stopyou 5 0 2.0 1.3 0.0 2.0 4.0
searchyou 5 0 1.9 1.3 0.0 2.0 4.0
yellyou 5 0 1.9 1.3 0.0 2.0 4.0
handcyou 5 0 1.9 1.4 0.0 2.0 4.0
kickyou 5 0 1.9 1.5 0.0 2.0 4.0
pinyou 5 0 1.9 1.5 0.0 2.0 4.0
sprayyou 5 0 1.9 1.5 0.0 2.0 4.0
taseyou 5 0 1.9 1.5 0.0 2.0 4.0
shootyou 5 0 1.9 1.5 0.0 2.0 4.0
killyou 5 0 2.0 1.6 0.0 2.0 4.0
race N %
White 504 49.4
Black 516 50.6

Now that we have data on 10 ordinal outcomes and one dichotomous categorical predictor, we can move on to modeling the data.

7 (Not-so-)Normal models

A primary purpose of statistical modeling is to accurately estimate relationships between observed variables in a data set; if the data and model are good, a model can help extract accurate “signals” from observed data, such as causal inferences or predictions about future outcomes. We learn in introductory statistics courses that all models require assumptions. Some of the assumptions we make are warranted; others are unwarranted but carry negligible penalties when violated. Violating still other unwarranted assumptions, however, puts our model at risk of failing to meet our primary goal of accurately estimating relationships among observed variables.

One such assumption is that popular linear modeling techniques designed for metric normal data will perform well when applied to items or scales created using ordinal items. Liddell and Kruschke (2018) convincingly show how such assumptions can go wrong, generating model predictions that exhibit poor fit to the data and resulting in substantial magnitude and even directional inference errors. Rather than duplicate those efforts, we refer readers to their paper and other work on the topic (e.g., Harrell; Bürkner & Vuorre). Instead, in this tutorial, we wish to introduce criminologists to this important topic by illustrating how cumulative probit regression models and visualization of predicted probabilities from those models can improve the signals that we extract from ordinal data.

We will use R packages to apply Bayesian ordinal regression techniques to these data. The posterior distributions that Bayesian models generate are advantageous for subsequent model checking and flexible visualization of results. However, traditional frequentist approaches to ordinal regression are available in popular statistical programs (e.g., R: ordinal, MASS::polr (https://rdrr.io/cran/MASS/man/polr.html), rms::orm; SAS: proc probit, proc logistic; Stata: ologit, oprobit; for continuous Y outcomes: the R package rms::orm or SAS JMP; quick-start tutorials: R, Stata, SAS).

Topics covered:

  • Introduce the mosaic plot as an alternative to a scatterplot for visualizing bivariate ordinal response distributions
  • Compare how accurately intercept-only models recover underlying data distributions across linear and cumulative probit regression techniques.
  • Compare linear and cumulative probit regression results of item-specific analyses estimating race differences in fear of police on latent item scales (regression betas)
  • Briefly introduce application of category-specific probit regression techniques for even more accurate recovery of underlying distributions (for online supplement only).
  • Compare results from a linear model applied to PGC’s multi-item composite scale measuring latent fear of police to those from an ordinal multilevel model predicting latent fear of police.
  • Extract race-specific predicted probabilities, calculate Black-White contrasts, and illustrate visualizations for conveying these estimates.

8 Descriptive visualizations

Before we start modeling, it is a good idea to visualize our data. Let’s start by viewing the ten observed item distributions using frequency bar graphs. Remember, we want our models to accurately recover and then conditionally predict meaningful characteristics of these distributions (e.g., their means or ordinal thresholds in linear and cumulative probit models, respectively).

8.1 Univariate bar charts

Show code
# Get colors from viridis magma scale (between 0.2 and 0.8)
library(viridis)  # provides viridis(); included in case it is not already loaded
magma_colors <- viridis(n = 5, option = "magma", begin = 0.2, end = 0.8)
# show_col(magma(50))

# Create outcome labels 
fear_labels <- c(
  "stopyou" = "Stop You",
  "searchyou" = "Search You", 
  "yellyou" = "Yell at You",
  "handcyou" = "Handcuff You",
  "kickyou" = "Kick/Hit You",
  "pinyou" = "Pin You Down",
  "sprayyou" = "Pepper Spray You",
  "taseyou" = "Tase You",
  "shootyou" = "Shoot at You",
  "killyou" = "Kill You"
)


# Create response level labels
response_labels <- c(
  "0" = "Very Unafraid",
  "1" = "Unafraid",
  "2" = "Neither",
  "3" = "Afraid",
  "4" = "Very Afraid"
)

# Function to plot each item's distribution with reference lines at 10%, 20%, and 30% of responses (N = 1,020)
plot_outcome <- function(data, var, label) {
  ggplot(data, aes(y = factor(.data[[var]], 
                             levels = 0:4,
                             labels = response_labels,
                             ordered = TRUE))) +
    geom_vline(xintercept = c(102, 204, 306), color = "gray80", linetype = "dashed") +
    geom_bar(aes(fill = factor(.data[[var]], 
                              levels = 0:4,
                              labels = response_labels,
                              ordered = TRUE))) + 
    coord_cartesian(expand = FALSE) +
    scale_fill_manual(values = magma_colors) +
    labs(
      y = NULL,
      title = label
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 10, hjust = 0.5),
      axis.title.x = element_blank(),
      legend.position = "none",
      panel.grid.major.y = element_blank(),
      plot.margin = margin(t = 5, r = 5, b = 15, l = 5)
    ) +
    scale_x_continuous(
      breaks = c(102, 204, 306),
      labels = c(".1", ".2", ".3")
    )
}

# Create all plots
plots <- list()
for(i in seq_along(fearpol_new)) {
  plots[[i]] <- plot_outcome(
    dat, 
    fearpol_new[i], 
    fear_labels[fearpol_new[i]]
  )
}

# Combine plots using patchwork
fear_fig <- wrap_plots(plots, ncol = 5, nrow = 2) +
  plot_annotation(
    title = "Fear of Police Item-Specific Distributions",
    subtitle = "Response distribution from Very Unafraid (0) to Very Afraid (4)",
    caption = "Note: Higher values indicate greater fear (N = 1,020)"
  ) &
  theme(
    plot.title = element_text(size = 12),
    plot.subtitle = element_text(size = 10),
    plot.caption = element_text(size = 10, hjust = 0)
  )

fear_fig

At first glance, PGC’s new fear measures appear to conflate fear with the perceived risk or probability of experiencing these actions, which is something they explicitly intended to avoid (see p.293). For instance, more people report being very unafraid that the police will shoot at them or kill them than report being very unafraid that the police will stop them or search them. Maybe this is not really a problem - perhaps the degree of fear one feels about an action intrinsically depends upon one’s perceived probability of experiencing that action (a.k.a. the vulnerability model).

Now, let’s illustrate how we can use mosaic plots to visualize bivariate ordinal distributions.

8.2 Mosaic plots

Goal: Introduce the mosaic plot as an alternative to a scatterplot for visualizing bivariate ordinal response distributions

Scatterplots are notoriously bad at generating useful visual descriptions of bivariate relationships between ordinal or categorical variables. In contrast, mosaic plots are designed for this task. Will Ball describes a mosaic plot as “similar to a heatmap, except that the frequency of an observation… is proportional to the area of a tile.”

Show code
# Prepare data
datp <- dat %>%
  mutate(
    # Convert all fear variables to ordered factors
    across(c(stopyou, searchyou, yellyou, handcyou, 
             kickyou, pinyou, sprayyou, taseyou, 
             shootyou, killyou), 
           ~factor(., levels = 0:4, 
                  labels = c("Very Unafraid", "Unafraid", 
                           "Neither", "Afraid", "Very Afraid"),
                  ordered = TRUE))
  )

# Function to generate mosaic plots
mosaic_fear <- function(df, yvar, xlab, show_legend = TRUE, show_lab = TRUE) {
  ggplot(df) +
    geom_mosaic(aes(x = product(!!as.name(yvar), race), 
                    fill = !!as.name(yvar)),
                na.rm = TRUE) +
    scale_fill_viridis_d(begin = 0.2, end = 0.8,  
                        direction = 1, 
                        option = "magma") +
    guides(fill = if(show_legend) 
           guide_legend(reverse = TRUE, title = "Fear\nLevel") 
           else "none") +
    labs(x = if(show_lab) xlab else "",
         y = "") +  
    theme_minimal() +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          axis.title.x = element_text(size = 10, vjust = 3),
          axis.title.y = element_blank(),  
          axis.text.y = element_blank(),  
          legend.key.height = unit(.5, 'cm'),
          legend.key.width = unit(.5, 'cm'),
          legend.position = "right",
          legend.text = element_text(size = 10))
}

# Create plots for each outcome
mos_stop <- mosaic_fear(datp, "stopyou", "Stop", show_legend = FALSE)
mos_search <- mosaic_fear(datp, "searchyou", "Search", show_legend = FALSE)
mos_yell <- mosaic_fear(datp, "yellyou", "Yell", show_legend = FALSE)
mos_handc <- mosaic_fear(datp, "handcyou", "Handcuff", show_legend = FALSE)
mos_kick <- mosaic_fear(datp, "kickyou", "Kick/Hit", show_legend = FALSE)
mos_pin <- mosaic_fear(datp, "pinyou", "Pin Down", show_legend = FALSE)
mos_spray <- mosaic_fear(datp, "sprayyou", "Pepper Spray", show_legend = FALSE)
mos_tase <- mosaic_fear(datp, "taseyou", "Tase", show_legend = FALSE)
mos_shoot <- mosaic_fear(datp, "shootyou", "Shoot", show_legend = FALSE)
mos_kill <- mosaic_fear(datp, "killyou", "Kill", show_legend = TRUE)

# Combine plots 
fear_mosaic <- (mos_stop + mos_search) /
               (mos_yell + mos_handc) /
               (mos_kick + mos_pin) /
               (mos_spray + mos_tase) /
               (mos_shoot + mos_kill) +
  plot_layout(guides = 'collect') +
  plot_annotation(
    title = "Fear of Police by Race",
    subtitle = "Distribution of responses for White (n=504) and Black (n=516) respondents",
    caption = "Note: Higher values indicate greater fear"
  ) &
  theme(
    plot.title = element_text(size = 12),
    plot.subtitle = element_text(size = 10),
    plot.caption = element_text(size = 10, hjust = 0)
  )

fear_mosaic

Let’s try a custom layout.

8.2.1 FIG2

Show code
# Modify mosaic_fear function 
mosaic_fear <- function(df, yvar, show_legend = TRUE, show_lab = TRUE) {
  ggplot(df) +
    geom_mosaic(aes(x = product(!!as.name(yvar), race), 
                    fill = !!as.name(yvar)),
                na.rm = TRUE) +
    scale_fill_viridis_d(begin = 0.8, end = 0.2, 
                        direction = -1, 
                        option = "magma") +
    guides(fill = if(show_legend) 
           guide_legend(reverse = TRUE, 
                       title = element_blank())
           else "none") +
    labs(x = "",
         y = "",
         title = if(show_lab) fear_labels[yvar] else "") +
    theme_minimal() +
    theme(
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          axis.title = element_blank(),
          axis.text.y = element_blank(),
          axis.text.x = element_blank(),  
          legend.key.height = unit(.5, 'cm'),
          legend.key.width = unit(.5, 'cm'),
          legend.title = element_text(angle = 90),
          legend.position = "right",
          legend.text = element_text(size = 10),
          plot.title = element_text(size = 10, hjust = 0.5),
          plot.title.position = "plot"
    )
}

# Create most plots without race labels
mos_stop <- mosaic_fear(datp, "stopyou", show_legend = FALSE) 
mos_search <- mosaic_fear(datp, "searchyou", show_legend = FALSE)
mos_yell <- mosaic_fear(datp, "yellyou", show_legend = FALSE)
mos_handc <- mosaic_fear(datp, "handcyou", show_legend = FALSE)
mos_kick <- mosaic_fear(datp, "kickyou", show_legend = FALSE)
mos_pin <- mosaic_fear(datp, "pinyou", show_legend = FALSE)
mos_spray <- mosaic_fear(datp, "sprayyou", show_legend = FALSE)
mos_tase <- mosaic_fear(datp, "taseyou", show_legend = FALSE)

# Create bottom plots with race labels
mos_shoot <- mosaic_fear(datp, "shootyou", show_legend = FALSE) + 
  theme(axis.text.x = element_text(size = 14))

mos_kill <- mosaic_fear(datp, "killyou", show_legend = TRUE) + 
  theme(
    axis.text.x = element_text(size = 14), 
    panel.border = element_rect(color = "black", fill = NA, linewidth = 2)
    )

# Custom layout
layout <- "
ABKK
CDKK
EFKK
GHKK
IJL#
"

fear_mosaic2 <- mos_stop + 
                mos_search +
                mos_yell +
                mos_handc +
                mos_kick +
                mos_pin +
                mos_spray +
                mos_tase +
                mos_shoot +
                mos_kill +
                mos_kill +
                guide_area() +
  plot_layout(design = layout, 
             guides = 'collect',
             heights = c(1, 1, 1, 1, 1)) +
  plot_annotation(
    title = "FIGURE 2. Fear of Police by Race",
    subtitle = "Distribution of responses for White (n=504) and Black (n=516) respondents",
    caption = "Note: Higher values indicate greater fear"
  ) &
  theme(
    text = element_text(family = "serif"),
    plot.title = element_text(size = 14),
    plot.subtitle = element_text(size = 14),
    plot.caption = element_text(size = 14, hjust = 0),
    legend.text = element_text(size = 14),
    legend.position = "right",
    legend.title.position = "left"
  )

fear_mosaic2

For those who prefer numbers, we can also summarize these race-specific proportions in a table. However, including all estimates would be a lot of information, so for simplicity let’s focus on summarizing the extreme categories of “very unafraid” (y=0) and “very afraid” (y=4).

8.2.2 APP-B

Show code
# List fear variables
fear_vars <- names(fear_labels)

# Calculate proportions for each fear level by race
fear_props <- datp %>%
  pivot_longer(
    cols = all_of(fear_vars),
    names_to = "item",
    values_to = "response"
  ) %>%
  # Add item labels
  mutate(item_label = fear_labels[item]) %>%
  # Group by race, item, and response level
  group_by(race, item, item_label, response) %>%
  # Count occurrences
  summarize(n = n(), .groups = "drop") %>%
  # Calculate proportions within each race and item
  group_by(race, item) %>%
  mutate(prop = n / sum(n, na.rm = TRUE)) %>%
  ungroup()

# Create summary for the extreme values table
fear_summary <- fear_props %>%
  dplyr::filter(response %in% c("Very Unafraid", "Very Afraid")) %>%
  dplyr::select(race, item_label, response, prop) %>%
  tidyr::pivot_wider(
    names_from = c(race, response),
    values_from = prop
  ) %>%
  dplyr::mutate(
    diff_unafraid = `Black_Very Unafraid` - `White_Very Unafraid`,
    diff_afraid = `Black_Very Afraid` - `White_Very Afraid`,
    item_label = factor(item_label, 
                       levels = c("Stop You", "Search You", "Yell at You",
                                "Handcuff You", "Kick/Hit You", 
                                "Pin You Down", "Pepper Spray You", 
                                "Tase You", "Shoot at You", "Kill You"))
  ) %>%
  dplyr::arrange(item_label) %>%
  dplyr::rename(
    white_unafraid = `White_Very Unafraid`,
    black_unafraid = `Black_Very Unafraid`,
    white_afraid = `White_Very Afraid`,
    black_afraid = `Black_Very Afraid`
  )

# Create gt table
gt_fear <- fear_summary %>%
  gt() %>%
  # Column labels
  cols_label(
    item_label = "Police Action",
    white_unafraid = "White",
    black_unafraid = "Black",
    diff_unafraid = "Diff",
    white_afraid = "White",
    black_afraid = "Black", 
    diff_afraid = "Diff"
  ) %>%
  fmt_percent(
    columns = c(white_unafraid, black_unafraid, diff_unafraid,
               white_afraid, black_afraid, diff_afraid),
    decimals = 1
  ) %>%
  tab_spanner(
    label = "Very Unafraid (y=0)",
    columns = c(white_unafraid, black_unafraid, diff_unafraid)
  ) %>%
  tab_spanner(
    label = "Very Afraid (y=4)",
    columns = c(white_afraid, black_afraid, diff_afraid)
  ) %>%
  tab_header(
    title = "APPENDIX B. Extreme Fear Responses by Race and Police Action",
    subtitle = "Percentages reporting highest and lowest levels of fear"
  ) %>%
  # Styling
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_spanners()
  ) %>%
  tab_style(
    style = cell_text(align = "left"),
    locations = cells_column_labels(columns = item_label)
  ) %>%
  cols_align(
    align = "left",
    columns = item_label
  ) %>%
  cols_align(
    align = "center",
    columns = c(white_unafraid, black_unafraid, diff_unafraid,
               white_afraid, black_afraid, diff_afraid)
  )
Show code
gt_fear
APPENDIX B. Extreme Fear Responses by Race and Police Action
Percentages reporting highest and lowest levels of fear
Police Action
Very Unafraid (y=0)
Very Afraid (y=4)
White Black Diff White Black Diff
Stop You 28.8% 8.9% −19.9% 6.3% 25.4% 19.0%
Search You 31.5% 9.9% −21.7% 6.5% 24.6% 18.1%
Yell at You 29.2% 10.3% −18.9% 7.9% 24.6% 16.7%
Handcuff You 33.1% 9.7% −23.4% 7.3% 28.7% 21.3%
Kick/Hit You 41.9% 10.5% −31.4% 8.7% 32.6% 23.8%
Pin You Down 41.1% 10.5% −30.6% 9.5% 34.7% 25.2%
Pepper Spray You 41.7% 10.5% −31.2% 8.9% 32.6% 23.6%
Tase You 41.9% 10.3% −31.6% 9.5% 34.3% 24.8%
Shoot at You 44.2% 10.5% −33.8% 10.7% 40.3% 29.6%
Kill You 44.2% 11.4% −32.8% 10.3% 43.6% 33.3%

These mosaic plots make it easy to see that substantial race differences exist in the ordinal response distributions for each fear of police item.

Of course, we want a precise way to describe and estimate these effects, and we want to do so using a modeling technique that would permit adjustment for potential confounders and communicate the degree of uncertainty in our estimates.

So, let’s move onto modeling joint distributions.

9 Intercept-only models

Goal: Compare how accurately intercept-only models recover underlying data distributions across linear and cumulative probit regression techniques.

We start by comparing linear and ordinal Bayesian “intercept-only” models to illustrate the poor fit of linear models to ordinal data and the substantial improvement in predictive fit to the underlying data distributions that we gain by using ordinal (e.g., cumulative probit) modeling techniques instead.

We begin with the Bayesian equivalent of simple means or intercept-only models for each of the ten outcome items. As implied by the name, each intercept-only model unconditionally estimates only the intercepts (e.g., mean values or ordinal thresholds) for each fear item without predictors.

Before we do this, let’s briefly discuss prior distributions.

9.1 Prior distributions

If we were using classical (frequentist) techniques, these intercept estimates would be computed directly from the data or likelihood. In a Bayesian model, these estimates represent the center of a posterior sample of estimates, each generated by combining the likelihood (e.g., the observed mean or ordinal thresholds in the data) with a prior distribution of plausible values for the parameter in question. In constructing our Bayesian models, we specify highly diffuse or “flat” prior distributions so that our posterior estimates are driven primarily by the likelihood and our results will converge with estimates generated from comparable frequentist statistical models.

Alternatively, one might specify stronger or “more informative” prior distributions that, for instance, specify a certain range of values as highly implausible or even as impossible; doing so will result in posterior estimates that deviate from likelihood-only estimates in predictable, prior-specified ways. For a simple example, a classical model of self-reported crime counts might sometimes generate negative predicted counts for some subgroups and improbably extreme positive counts for other subgroups; in a Bayesian model, one could specify an a priori distribution of plausible values for which negative counts are impossible and extreme positive counts are improbable.

We will use the rstanarm package for linear models, as it specifies diffuse priors that are scaled by default to the outcome’s response distribution. We will use the brms package for cumulative probit models and will manually specify diffuse priors centered on the thresholds implied by equal (0.2) category probabilities. These priors allow a wide range of likelihood-driven estimates and ensure that results will be comparable to those that would be generated using similar frequentist techniques.
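As a quick illustration, brms also lets us inspect the default priors it would otherwise use before we override them; here is a minimal sketch for one item, using an ordered-factor version of the outcome as in the model code below.

# Sketch: inspect brms' default priors for an intercept-only cumulative probit model
# (the thresholds default to student_t(3, 0, 2.5); we replace these below)
library(brms)

dat_tmp <- dplyr::mutate(dat, stopyou_ord = factor(stopyou, levels = 0:4, ordered = TRUE))
get_prior(stopyou_ord ~ 1, data = dat_tmp, family = cumulative("probit"))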

9.2 Linear: Lin~1 model (poor fit)

I start by estimating a simple linear intercept model of stopyou with no predictors to illustrate the poor fit of a linear model to the ordinal outcome. Here, I use Bayesian estimation with default priors; again, Bayesian models will be beneficial for model building and visualization later.

Show code
# List to store intercept-only linear models
int_lin_mods <- list()

# Run models for each outcome with manual caching to save model fits
for(outcome in fearpol_new) {
  # Create file path for cached model
  model_path <- paste0("Models/int_lin_mod_", outcome, ".rds")
  
  # Check if cached model exists
  if (file.exists(model_path)) {
    int_lin_mods[[outcome]] <- readRDS(model_path)
  } else {
    formula <- as.formula(paste0(outcome, " ~ 1"))
    
    int_lin_mods[[outcome]] <- stan_glm(
      formula = formula,
      data = dat,
      family = gaussian,
      # rstanarm defaults:
      # prior_intercept = normal(location = y_mean, scale = 2.5 * y_sd)
      # prior_aux = exponential(rate = 1/y_sd)
      chains = 4,
      cores = nCoresphys,
      seed = 1138
    )
    # Save model
    saveRDS(int_lin_mods[[outcome]], model_path)
  }
}

We ran intercept-only linear models for all ten outcomes. Now, let’s examine detailed output for the first model.

Show code
pp_check(int_lin_mods[["stopyou"]])

Show code
pp_check(int_lin_mods[["stopyou"]], 
         plotfun = "hist")

Show code
pp_check(int_lin_mods[["stopyou"]], 
         plotfun = "boxplot", 
         nreps = 10, notch = FALSE)

Show code
summary(int_lin_mods[["stopyou"]], 
        probs = c(0.025, 0.975), 
        digits = 2) 

Model Info:
 function:     stan_glm
 family:       gaussian [identity]
 formula:      stopyou ~ 1
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 1020
 predictors:   1

Estimates:
              mean   sd   2.5%   97.5%
(Intercept) 1.99   0.04 1.91   2.07   
sigma       1.33   0.03 1.27   1.39   

Fit Diagnostics:
           mean   sd   2.5%   97.5%
mean_PPD 1.99   0.06 1.87   2.11   

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
              mcse Rhat n_eff
(Intercept)   0.00 1.00 2355 
sigma         0.00 1.00 2732 
mean_PPD      0.00 1.00 2932 
log-posterior 0.02 1.00 1630 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Show code
prior_summary(int_lin_mods[["stopyou"]]) #check priors  
Priors for model 'int_lin_mods[["stopyou"]]' 
------
Intercept (after predictors centered)
  Specified prior:
    ~ normal(location = 2, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 2, scale = 3.3)

Auxiliary (sigma)
  Specified prior:
    ~ exponential(rate = 1)
  Adjusted prior:
    ~ exponential(rate = 0.75)
------
See help('prior_summary.stanreg') for more details

The linear (metric normal) model intercept estimate accurately recovers the mean of stopyou (=1.99), which is what it is designed to do. However, the posterior predictive check plots (e.g., density; histogram; box) show that the underlying normal distribution assumption results in model predictions that fail to recover the observed ordinal item values. Let’s extract the density plots for all ten items and plot them together.
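As a quick sanity check, the sample mean computed directly from the data matches the model’s intercept estimate:

# The intercept-only linear model simply recovers the sample mean
mean(dat$stopyou)   # ~1.99, matching the posterior mean of the intercept above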

9.2.1 All PPCheck density plots

Show code
# Create list to store individual density plots
int_lin_pp_plots <- list()

# Generate pp_check plot for each outcome
for(outcome in fearpol_new) {
  int_lin_pp_plots[[outcome]] <- pp_check(int_lin_mods[[outcome]]) +
    ggtitle(gsub("you", " You", tools::toTitleCase(outcome))) +
    theme(plot.title = element_text(size = 8, hjust = 0.5))
}

# Combine all plots with explicit 2x5 layout
int_lin_pp_grid <- wrap_plots(int_lin_pp_plots, ncol = 2, nrow = 5) +
  plot_annotation(
    title = "Posterior Predictive Checks for Linear Models",
    subtitle = "Dark line = observed data, Light lines = posterior predictions",
    theme = theme(
      plot.title = element_text(size = 10),
      plot.subtitle = element_text(size = 8)
    )
  )

int_lin_pp_grid

Let’s replicate this process for cumulative probit intercept-only models.

9.3 Cumulative probit: Ord~1 (good fit)

Show code
# Create ordered factor versions with suffix _ord
dat <- dat %>%
  mutate(across(all_of(fearpol_new), 
                ~factor(., levels = 0:4, ordered = TRUE),
                .names = "{.col}_ord"))

# Create vector of ordinal variable names
fearpol_ord <- paste0(fearpol_new, "_ord")

# Now dat contains both original numeric variables (stopyou, searchyou, etc.)
# and ordered factor versions (stopyou_ord, searchyou_ord, etc.)
# List to store intercept-only cumulative probit models
int_ord_mods <- list()

# Prior for ordinal models
# For equal probability (0.2) across 5 categories, thresholds should be:
# qnorm(c(0.2, 0.4, 0.6, 0.8)) ≈ c(-0.84, -0.25, 0.25, 0.84)
ord_prior <- prior(normal(-0.84, 1), class = "Intercept", coef = 1) +
             prior(normal(-0.25, 1), class = "Intercept", coef = 2) +
             prior(normal(0.25, 1), class = "Intercept", coef = 3) +
             prior(normal(0.84, 1), class = "Intercept", coef = 4)

# Run models for each outcome
for(outcome in fearpol_ord) {
  formula <- bf(paste0(outcome, " ~ 1"))
  
  int_ord_mods[[outcome]] <- brm(
    formula = formula,
    data = dat,
    family = cumulative("probit"),
    prior = ord_prior,
    iter = 2000, warmup = 1000, 
    chains = 4,
    cores = nCoresphys, 
    seed = 1138,
    file = paste0("Models/int_ord_mod_", outcome),
    file_refit = "on_change"
  )
}

Again, after running threshold-only models for all ten outcomes, let’s start by examining detailed output for the first model.

Show code
pp_check(int_ord_mods[["stopyou_ord"]])

Show code
pp_check(int_ord_mods[["stopyou_ord"]], 
         plotfun = "hist")

Show code
pp_check(
  int_ord_mods[["stopyou_ord"]], 
  type = "bars", 
  ndraws = 1000, 
  linewidth = 1/2, 
  size = 1/4) +
  labs(subtitle = "Intercept model") 

Show code
summary(int_ord_mods[["stopyou_ord"]], 
        probs = c(0.025, 0.975), 
        digits = 2) #summarize results
 Family: cumulative 
  Links: mu = probit; disc = identity 
Formula: stopyou_ord ~ 1 
   Data: dat (Number of observations: 1020) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -0.89      0.05    -0.98    -0.80 1.00     3346     2903
Intercept[2]    -0.38      0.04    -0.46    -0.30 1.00     4253     3452
Intercept[3]     0.33      0.04     0.25     0.41 1.00     4912     3396
Intercept[4]     1.00      0.05     0.90     1.09 1.00     4856     3722

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Show code
prior_summary(int_ord_mods[["stopyou_ord"]]) #check priors  
                prior     class coef group resp dpar nlpar lb ub  source
 student_t(3, 0, 2.5) Intercept                                  default
     normal(-0.84, 1) Intercept    1                                user
     normal(-0.25, 1) Intercept    2                                user
      normal(0.25, 1) Intercept    3                                user
      normal(0.84, 1) Intercept    4                                user

Notice how the posterior predictions generated by the cumulative probit model for the first outcome are a much better fit to the ordinal response data.
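To see why the fit is so good, recall that the cumulative probit thresholds map directly onto category probabilities through the standard normal CDF. A quick back-of-the-envelope check using the posterior means reported above:

# Rough check: category probabilities implied by the estimated thresholds
# (posterior means from the summary above)
thresholds <- c(-0.89, -0.38, 0.33, 1.00)
probs <- diff(c(0, pnorm(thresholds), 1))
round(probs, 3)
# ~ 0.187 0.165 0.277 0.212 0.159, closely matching the observed proportions
# of stopyou = 0, 1, 2, 3, 4 (roughly .19, .16, .28, .21, .16)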

9.3.1 All PPCheck density plots

Show code
# Create list to store pp_check plots
int_ord_pp_plots <- list()

# Generate pp_check plot for each outcome
for(outcome in fearpol_ord) {
  int_ord_pp_plots[[outcome]] <- pp_check(int_ord_mods[[outcome]]) +
    ggtitle(gsub("you", " You", tools::toTitleCase(outcome))) +
    theme(plot.title = element_text(size = 8, hjust = 0.5))
}

# Combine all plots in a 5x2 layout
int_ord_pp_grid <- wrap_plots(int_ord_pp_plots, ncol = 2, nrow = 5) +
  plot_annotation(
    title = "Posterior Predictive Checks for Cumulative Probit Models",
    subtitle = "Dark line = observed data, Light lines = posterior predictions",
    theme = theme(
      plot.title = element_text(size = 10),
      plot.subtitle = element_text(size = 8)
    )
  )

int_ord_pp_grid

9.4 Comparing int-only lin & ord mods

Now, let’s pull the ppcheck plots together. We will use the density plots for the linear models, as they effectively display the normal curve assumption underlying those models; for the ordinal models, the bar charts are more effective at showing how well the model recovers the data.

Show code
# Function to generate ppcheck plots 
for(outcome in fearpol_new) {
  ord_outcome <- paste0(outcome, "_ord")
  # Linear model density plot 
  int_lin_pp_plots[[outcome]] <- pp_check(int_lin_mods[[outcome]]) +
    ggtitle(fear_labels[outcome]) +
    coord_cartesian(xlim = c(-3, 7)) +
    scale_x_continuous(breaks = seq(0, 4, 1)) +
    scale_color_manual(
      values = c("y" = "#FD9B6BFF", "yrep" = "#782281FF"),
      labels = c("y" = "Observed", "yrep" = "Replicated")
    ) +
    theme(
      text = element_text(family = "serif"),
      plot.title = element_text(size = 14, hjust = 0.5),
      axis.text.x = if(outcome %in% c("shootyou", "killyou")) 
                    element_text(size=14) else element_blank(),
      axis.text.y = element_blank(),
      legend.position = "bottom",
      legend.title = element_text(size = 14), 
      legend.text = element_text(size = 14)
    ) +
    guides(color = guide_legend(title = "Linear:"))

  # Ordinal model bar plot 
  int_ord_pp_plots[[outcome]] <- pp_check(
    int_ord_mods[[ord_outcome]], 
    type = "bars", 
    ndraws = 1000, 
    linewidth = 1.5, 
    size = 3/4) +
    scale_x_continuous(breaks = 1:5, labels = 0:4) +  
    scale_fill_manual(
      values = c("y" = "#FD9B6BFF", "yrep" = "#782281FF"),
      labels = c("y" = "Observed", "yrep" = "Replicated")
    ) +
    scale_colour_manual(
      values = c("y" = "#FD9B6BFF", "yrep" = "#782281FF"), 
      labels = c("y" = "Observed", "yrep" = "Replicated")
    ) +
    theme(
      text = element_text(family = "serif"),
      plot.title = element_blank(),
      axis.text.x = if(outcome %in% c("shootyou", "killyou")) 
                    element_text(size=14) else element_blank(),
      axis.text.y = element_blank(),
      axis.title.y = element_blank(),
      legend.position = "bottom",
      legend.title = element_text(size = 14), 
      legend.text = element_text(size = 14)
    ) +
    guides(fill = guide_legend(title = "Ordinal:"),
           color = guide_legend(title = element_blank()))
}

# Define layout (last  extra row for model labels)
layout <- "
ABCD
EFGH
IJKL
MNOP
QRST
UVWX"  # Added row for model labels

# Add text annotations for model types
model_labels <- ggplot() + 
  annotate("text", x = 0.5, y = 0.5, 
           label = "Linear Model", 
           size = 5,
           family = "serif") +  
  theme_void()

model_labels2 <- ggplot() + 
  annotate("text", x = 0.5, y = 0.5, 
           label = "Ordinal Model", 
           size = 5,
           family = "serif") +  
  theme_void()

# Combine all plots in sequence
pp_comparison <- (int_lin_pp_plots[["stopyou"]] + # A
                int_ord_pp_plots[["stopyou"]] + # B
                int_lin_pp_plots[["searchyou"]] + # C
                int_ord_pp_plots[["searchyou"]] + # D
                int_lin_pp_plots[["yellyou"]] + # E
                int_ord_pp_plots[["yellyou"]] + # F
                int_lin_pp_plots[["handcyou"]] + # G
                int_ord_pp_plots[["handcyou"]] + # H
                int_lin_pp_plots[["kickyou"]] + # I
                int_ord_pp_plots[["kickyou"]] + # J
                int_lin_pp_plots[["pinyou"]] + # K
                int_ord_pp_plots[["pinyou"]] + # L
                int_lin_pp_plots[["sprayyou"]] + # M
                int_ord_pp_plots[["sprayyou"]] + # N
                int_lin_pp_plots[["taseyou"]] + # O
                int_ord_pp_plots[["taseyou"]] + # P
                int_lin_pp_plots[["shootyou"]] + # Q
                int_ord_pp_plots[["shootyou"]] + # R
                int_lin_pp_plots[["killyou"]] + # S
                int_ord_pp_plots[["killyou"]] + # T
                model_labels + # U
                model_labels2 + # V
                model_labels + # W
                model_labels2) + # X
  plot_layout(design = layout, 
              guides = 'collect', 
              heights = c(1, 1, 1, 1, 1, 0.3)) +
  plot_annotation(
    title = "FIGURE 3: Posterior Predictive Checks for Linear vs. Ordinal Models",
    subtitle = "Pairs show linear model density checks (left) and ordinal model bar checks (right)",
    theme = theme(
      text = element_text(family = "serif"),
      plot.title = element_text(size = 12),
      plot.subtitle = element_text(size = 11),
      legend.position = "bottom"
    )
  )

9.4.1 FIG3

Show code
pp_comparison

As Figure 3 shows, the threshold-only cumulative probit model predictions are a much better fit to the data than are the comparable intercept-only linear models. This is unsurprising, as additional parameters are estimated in the CP model to summarize each item’s ordinal distribution.

We can quantify and compare the relative fit of the intercept-only linear regression and cumulative probit models for each fear of police item using Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV) and calculating the expected log predictive density (ELPD) for each model. The ELPD provides a measure of predictive accuracy, with higher values indicating better predictive performance; because it estimates predictive performance for unseen data, it also penalizes models that add parameters without improving out-of-sample prediction.

Now, let’s use PSIS-LOO-CV to compare model fits with ELPD.

9.4.2 ELPD comparisons

Show code
# Check if cached comparisons exist
if (file.exists("Models/fit_comparisons.rds")) {
  fit_comparisons <- readRDS("Models/fit_comparisons.rds")
} else {
  # Function to compare linear and ordinal models for one outcome
  compare_fits <- function(lin_mod, ord_mod, outcome_name) {
    # Get LOO for each model
    loo_lin <- loo(lin_mod)
    loo_ord <- loo(ord_mod)
    
    # Get pointwise log-likelihood values
    ll_lin <- loo_lin$pointwise[,"elpd_loo"]
    ll_ord <- loo_ord$pointwise[,"elpd_loo"]
    
    # Compare total log-likelihood
    data.frame(
      outcome = outcome_name,
      elpd_lin = sum(ll_lin),
      elpd_ord = sum(ll_ord),
      diff = sum(ll_ord) - sum(ll_lin),
      se = sqrt(length(ll_lin) * var(ll_ord - ll_lin))
    )
  }
  
  # Create empty dataframe to store all comparisons
  fit_comparisons <- data.frame()
  
  # Loop through all outcomes and compare models
  for(outcome in fearpol_new) {
    # Get linear model comparison for ordinal outcome
    ord_outcome <- paste0(outcome, "_ord")
    
    comp <- compare_fits(
      lin_mod = int_lin_mods[[outcome]],
      ord_mod = int_ord_mods[[ord_outcome]],
      outcome_name = outcome
    )
    
    fit_comparisons <- rbind(fit_comparisons, comp)
  }
  
  fit_comparisons$outcome <- fear_labels[fit_comparisons$outcome]
  
  # Save the comparisons
  saveRDS(fit_comparisons, "Models/fit_comparisons.rds")
}

9.4.3 TAB1

Show code
# View results in gt() table

fit_comparisons %>%
  mutate(diff_se = diff/se) %>%
  gt() %>%
  cols_label(
    outcome = "Outcome",
    elpd_lin = "Linear ELPD",
    elpd_ord = "Ordinal ELPD", 
    diff = "Difference",
    se = "SE",
    diff_se = "Diff/SE"
  ) %>%
  fmt_number(
    columns = c("elpd_lin", "elpd_ord", "diff", "se", "diff_se"),
    decimals = 2
  )  
Outcome Linear ELPD Ordinal ELPD Difference SE Diff/SE
Stop You −1,738.41 −1,623.50 114.90 11.42 10.06
Search You −1,748.40 −1,625.31 123.09 11.83 10.41
Yell at You −1,751.51 −1,630.70 120.81 11.65 10.37
Handcuff You −1,788.46 −1,640.93 147.53 13.01 11.34
Kick/Hit You −1,847.02 −1,629.02 218.00 16.57 13.15
Pin You Down −1,856.14 −1,626.34 229.81 17.09 13.45
Pepper Spray You −1,848.66 −1,630.11 218.55 16.55 13.20
Tase You −1,858.72 −1,627.75 230.97 17.10 13.50
Shoot at You −1,893.01 −1,603.83 289.18 19.50 14.83
Kill You −1,906.10 −1,589.35 316.74 20.42 15.51

As noted above, the ELPD provides a measure of predictive accuracy, with higher values indicating better predictive performance. The ‘diff’ column shows the difference in ELPD between the ordinal and linear models (ordinal minus linear), with positive values favoring the ordinal model. The standard error (se) quantifies our uncertainty in these differences.

These tables confirm what we observed in the posterior predictive checks (Figure 3), where the ordinal models better captured the discrete nature of the response categories compared to the continuous approximation used in the linear models. The ordinal models showed consistently better predictive performance across all items, with ELPD differences ranging from about 115 to 317 units. The improvements were particularly substantial for more severe items like “Shoot at You” (diff ≈ 289) and “Kill You” (diff ≈ 317), which exhibited more extreme bimodal distributions. Generally, approximating an ordinal variable with a continuous Gaussian model performs best when the item has several response categories and a roughly symmetric, unimodal distribution centered near the middle of the scale.

For statistical testing with LOO-CV comparisons, one might consider differences meaningful when they exceed twice their standard error (similar to a z-test). In this case, all differences are at least 10 standard errors from zero (roughly 10 to 16), providing very strong evidence for the superior fit of the ordinal models.
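For readers who prefer the loo package’s built-in comparison machinery, here is a minimal sketch of the equivalent check for a single item; loo_compare() reports elpd_diff and se_diff directly, with the better-fitting model in the first row.

# Sketch: built-in LOO comparison for one item (ordinal vs. linear)
loo_lin_stop <- loo(int_lin_mods[["stopyou"]])
loo_ord_stop <- loo(int_ord_mods[["stopyou_ord"]])
loo_compare(loo_lin_stop, loo_ord_stop)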

Now let’s add race as a predictor in our models.

10 Item-specific race differences in fear

Goal: Compare linear and cumulative probit regression results of item-specific analyses estimating race differences in fear of police on latent item scales (regression betas)

10.1 Linear & CP models

Show code
# List to store race linear models
race_lin_mods <- list()

# Run linear models for each outcome
for(outcome in fearpol_new) {
  # Create file path for cached model
  model_path <- paste0("Models/race_lin_mod_", outcome, ".rds")
  
  # Check if cached model exists
  if (file.exists(model_path)) {
    race_lin_mods[[outcome]] <- readRDS(model_path)
  } else {
    # Linear model with race
    race_lin_mods[[outcome]] <- stan_glm(
      formula = as.formula(paste0(outcome, " ~ race")),
      data = dat,
      family = gaussian,
      chains = 4,
      cores = nCoresphys,
      seed = 1138
    )
    # Save model
    saveRDS(race_lin_mods[[outcome]], model_path)
  }
}

# List to store ordinal race models
race_ord_mods <- list()

# Prior for ordinal race models
ord_prior <- prior(normal(-0.84, 1), class = "Intercept", coef = 1) +
             prior(normal(-0.25, 1), class = "Intercept", coef = 2) +
             prior(normal(0.25, 1), class = "Intercept", coef = 3) +
             prior(normal(0.84, 1), class = "Intercept", coef = 4) +
             prior(normal(0, 1), class = "b")  # Prior for race effect

# Run ordinal models for each outcome
for(outcome in fearpol_new) {
  ord_outcome <- paste0(outcome, "_ord")
  
  # Fit and store model with proper naming
  race_ord_mods[[ord_outcome]] <- brm(
    formula = bf(paste0(ord_outcome, " ~ race")),
    data = dat,
    family = cumulative("probit"),
    prior = ord_prior,
    iter = 2000, 
    warmup = 1000,
    chains = 4,
    cores = nCoresphys,
    seed = 1138,
    file = paste0("Models/race_ord_mod_", outcome),
    file_refit = "on_change"
  )
}

As before, let’s check detailed output from the linear and ordinal models of the first outcome item.

Show code
pp_check(race_lin_mods[["stopyou"]])

Show code
pp_check(race_lin_mods[["stopyou"]], 
         plotfun = "stat_grouped", stat="mean", group="race")

Show code
pp_check(race_lin_mods[["stopyou"]], 
         plotfun = "stat_grouped", stat="median", group="race")

Show code
summary(race_lin_mods[["stopyou"]], 
        probs = c(0.025, 0.975), 
        digits = 2) #summarize results

Model Info:
 function:     stan_glm
 family:       gaussian [identity]
 formula:      stopyou ~ race
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 1020
 predictors:   2

Estimates:
              mean   sd   2.5%   97.5%
(Intercept) 1.44   0.05 1.33   1.55   
raceBlack   1.09   0.08 0.94   1.24   
sigma       1.21   0.03 1.16   1.27   

Fit Diagnostics:
           mean   sd   2.5%   97.5%
mean_PPD 1.99   0.05 1.88   2.09   

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
              mcse Rhat n_eff
(Intercept)   0.00 1.00 3823 
raceBlack     0.00 1.00 3809 
sigma         0.00 1.00 4261 
mean_PPD      0.00 1.00 3956 
log-posterior 0.03 1.00 1717 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Show code
prior_summary(race_lin_mods[["stopyou"]]) #check priors  
Priors for model 'race_lin_mods[["stopyou"]]' 
------
Intercept (after predictors centered)
  Specified prior:
    ~ normal(location = 2, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 2, scale = 3.3)

Coefficients
  Specified prior:
    ~ normal(location = 0, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 0, scale = 6.6)

Auxiliary (sigma)
  Specified prior:
    ~ exponential(rate = 1)
  Adjusted prior:
    ~ exponential(rate = 0.75)
------
See help('prior_summary.stanreg') for more details
Show code
pp_check(race_ord_mods[["stopyou_ord"]])

Show code
pp_check(
  race_ord_mods[["stopyou_ord"]], 
  type = "bars", 
  ndraws = 1000, 
  linewidth = 1/2, 
  size = 1/4) +
  labs(subtitle = "Race model") 

Show code
pp_check(
  race_ord_mods[["stopyou_ord"]], 
  type = "bars_grouped",
  group = "race",
  ndraws = 1000, 
  linewidth = 1/2, 
  size = 1/4) +
  labs(subtitle = "Race model")

Show code
summary(race_ord_mods[["stopyou_ord"]], 
        probs = c(0.025, 0.975), 
        digits = 2) #summarize results
 Family: cumulative 
  Links: mu = probit; disc = identity 
Formula: stopyou_ord ~ race 
   Data: dat (Number of observations: 1020) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -0.53      0.05    -0.63    -0.42 1.00     4395     2762
Intercept[2]     0.04      0.05    -0.06     0.14 1.00     5212     2995
Intercept[3]     0.83      0.06     0.72     0.94 1.00     4676     3148
Intercept[4]     1.57      0.07     1.45     1.70 1.00     4217     2953
raceBlack        0.92      0.07     0.78     1.05 1.00     4663     2612

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Show code
prior_summary(race_ord_mods[["stopyou_ord"]]) #check priors  
                prior     class      coef group resp dpar nlpar lb ub
         normal(0, 1)         b                                      
         normal(0, 1)         b raceBlack                            
 student_t(3, 0, 2.5) Intercept                                      
     normal(-0.84, 1) Intercept         1                            
     normal(-0.25, 1) Intercept         2                            
      normal(0.25, 1) Intercept         3                            
      normal(0.84, 1) Intercept         4                            
       source
         user
 (vectorized)
      default
         user
         user
         user
         user

Again, the linear model recovers conditional group means, which is what it is designed to do. However, it is otherwise a poor fit to the data. For instance, the model-implied medians are very far from the observed medians for both White and Black participants.

Note that the cumulative probit model still recovers the data far better than the linear model, though its simplifying assumptions result in some mismatch between the predictions and the data at some thresholds in the grouped bar chart. If we had sufficient data and were more worried about this slippage in fit than about overfitting with a more complex model, we could estimate category-specific ordinal models instead; we illustrate how to do that for the supplement, and a brief sketch follows below. For now, let’s see if we can plot the effects of race (Black) on the latent y scale for each outcome by model.
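Here is a minimal sketch of one such category-specific specification, assuming an adjacent-category probit model with a category-specific cs() race effect (brms permits cs() terms for acat, cratio, and sratio families, but not for cumulative()); this is one option among several, not necessarily the specification used elsewhere in the supplement.

# Sketch: adjacent-category probit with a category-specific race effect,
# relaxing the single-shift assumption of the cumulative probit model
fit_cs_stop <- brm(
  stopyou_ord ~ cs(race),
  data = dat,
  family = acat("probit"),
  prior = prior(normal(0, 1), class = "b"),
  iter = 2000, warmup = 1000, chains = 4,
  cores = nCoresphys, seed = 1138
)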

10.1.1 FIG4

Plot latent y scale contrasts, all outcomes

Show code
# Check coefficient names in first models
# names(as_draws_df(race_lin_mods[["stopyou"]]))
# names(as_draws_df(race_ord_mods[["stopyou_ord"]]))

# Create ordered factor for outcomes
outcome_order <- fear_labels[fearpol_new]

# Extract posterior samples for race effects from both model types
all_effects <- map2_dfr(fearpol_new, fear_labels, ~{
  # Get linear model effects
  lin_draws <- as_draws_df(race_lin_mods[[.x]]) %>%
    transmute(effect = raceBlack) %>%  
    median_qi(.width = .95) %>%
    mutate(
      model = "Linear",
      outcome = .y
    )
  
  # Get ordinal model effects
  ord_draws <- as_draws_df(race_ord_mods[[paste0(.x, "_ord")]]) %>%
    transmute(effect = b_raceBlack) %>%  
    median_qi(.width = .95) %>%
    mutate(
      model = "Ordinal",
      outcome = .y
    )
  
  # Combine
  bind_rows(lin_draws, ord_draws)
})

# Create plot
ggplot(all_effects, aes(x = effect, y = outcome, color = model)) + 
  geom_linerange(aes(xmin = .lower, xmax = .upper), 
                linewidth = 1.5,
                position = position_dodge(width = 0.4)) +
  geom_point(shape = 124, 
             size = 5,
             position = position_dodge(width = 0.4)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  scale_color_manual(values = c("Linear" = "#FD9B6BFF", 
                               "Ordinal" = "#782281FF")) +
  scale_y_discrete(limits = rev(outcome_order)) +
  scale_x_continuous(lim = c(-.05, 2.05), breaks = seq(0, 2, .25)) +
  labs(x = "Effect of race (Black) on latent scale (beta est + 95% CrI)",
       y = "", 
       color = "") +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"),
    legend.position = "bottom",
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black"), 
    axis.text = element_text(size = 14),
    axis.title.x = element_text(size = 14),
    legend.text = element_text(size = 14)
  )

At first glance, this plot seems to suggest the linear model overestimates the effects of race on the latent y scale for all 10 items. But by how much? Or, more importantly, since the linear and cumulative probit models assume very different scales for each outcome, what do these differences really mean?

Recall that the linear model assumes an unbounded, metric, continuous outcome, so its smaller effects are relative to a much larger set of model-implied outcome values (i.e., with "plausible" predictions extending below y=0 and above y=4). Meanwhile, the cumulative probit model accurately assumes a discrete distribution with y-values bounded between 0 and 4, so its larger estimates on the latent y scale are interpreted relative to a more narrowly restricted range of y values. In short, though the CP beta coefficient is intended to be roughly comparable to a linear model's beta - a one-unit increase in X (here, the Black vs. White contrast) is associated with a beta-unit increase in y - it is hard to precisely interpret differences in results across these models. As a result, it is understandable that researchers might see estimates from different specifications like these that point in the same direction, imply similar statistical inferences (e.g., all 95% CrIs exclude zero), and appear roughly similar in magnitude, and then conclude that both modeling techniques generate "similar substantive conclusions." But do they? Perhaps, or perhaps not - it depends upon how precise we want or need our scientific conclusions to be. Let's see if we can do better.
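One way to see the first point is to check how much of the linear model's posterior predictive mass falls outside the possible 0-4 response range. A quick sketch, using the stopyou linear model fit above (a diagnostic one can run; we do not report the value here):

# Proportion of linear-model posterior predictive draws falling outside 0-4;
# these "impossible" values reflect the unbounded-outcome assumption
lin_ppd <- posterior_predict(race_lin_mods[["stopyou"]])
mean(lin_ppd < 0 | lin_ppd > 4)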

With ordinal models, we can calculate and plot predicted probability contrasts (Black - White) for specific response categories and for each outcome. For now, we will simply focus on the predicted probability of reporting being "afraid" or "very afraid" (y>=3); we will return to generating predictions for all thresholds later. This focus also seems to map onto the implied estimand of the exemplar study, which aimed to estimate group-based differences in "fear" of police.

Interpretations of linear regression results often stop at effects on latent scales - e.g., interpreting beta coefficients as differences in conditional means of latent (though often fundamentally non-metric) distributions. Yet, since our Bayesian linear models generate a full posterior distribution, we can also use knowledge of the model’s assumptions (e.g., about standard normal distributions) and our model-based posterior predictions to similarly estimate the conditional predicted probabilities of responding “Afraid” or “Very afraid” for each group (i.e., y>=3) and uncertainty around those estimates.

10.2 Extracting useful signals

Goal: Extract race-specific predicted probabilities, calculate Black-White contrasts, and illustrate visualizations for conveying these estimates.

Let’s start by extracting predicted probabilities and contrasts (black - white) of “afraid” or “very afraid” (y>=3) from the ordinal model for the first outcome, then iterate across ordinal models for all outcomes.

Show code
# Function to get probabilities and contrasts for one outcome
get_prob_contrasts <- function(model, outcome_name) {
  # Get predictions for both race groups
  newdata <- data.frame(race = factor(c("White", "Black")))
  
  # Get posterior predictions
  preds <- posterior_epred(model, newdata = newdata)
  
  # Calculate P(y >= 3) for each draw
  p_high <- 1 - (preds[,,1] + preds[,,2] + preds[,,3])
  
  # Get probabilities for each group
  probs <- data.frame(
    race = c("White", "Black"),
    prob = c(mean(p_high[,1]), mean(p_high[,2])),
    lower = c(quantile(p_high[,1], 0.025), quantile(p_high[,2], 0.025)),
    upper = c(quantile(p_high[,1], 0.975), quantile(p_high[,2], 0.975))
  )
  
  # Calculate contrasts (Black - White)
  contrasts <- p_high[,2] - p_high[,1]
  
  # Summarize contrasts
  contrast_summary <- data.frame(
    outcome = outcome_name,
    estimate = median(contrasts),
    lower = quantile(contrasts, 0.025),
    upper = quantile(contrasts, 0.975)
  )
  
  # Return both
  list(
    probabilities = probs,
    contrasts = contrast_summary
  )
}

# Try it with one model
results <- get_prob_contrasts(
  race_ord_mods[["stopyou_ord"]], 
  "Stop You"
)

print("Predicted Probabilities of Afraid/Very Afraid:")
[1] "Predicted Probabilities of Afraid/Very Afraid:"
Show code
print(results$probabilities)
   race prob lower upper
1 White 0.20  0.17  0.23
2 Black 0.53  0.49  0.57
Show code
print("\nBlack-White Contrast:")
[1] "\nBlack-White Contrast:"
Show code
print(results$contrasts)
      outcome estimate lower upper
2.5% Stop You     0.33  0.28  0.37

Looks good. Let’s iterate across all ten ordinal models.

Show code
# Calculate probabilities and contrasts for all outcomes
all_results <- map2_dfr(
 fearpol_new,  # outcomes
 fear_labels,  # labels
 ~get_prob_contrasts(
   race_ord_mods[[paste0(.x, "_ord")]], 
   .y
 )$contrasts  # Just get contrasts for now
)

# Create plot
ggplot(all_results, 
      aes(y = fct_rev(fct_reorder(outcome, estimate)), 
          x = estimate, 
          xmin = lower, 
          xmax = upper)) +
 geom_pointrange(color = "#782281FF",
                size = 0.5) +
 geom_vline(xintercept = 0, linetype = "dashed") +
 labs(x = "Black-White Contrast in P(Afraid or Very Afraid)",
      y = "") +
 theme_minimal() +
 theme(
   panel.grid.major.y = element_blank(),
   panel.grid.minor = element_blank(),
   axis.line = element_line(color = "black"),
   axis.ticks = element_line(color = "black")
 )

Show code
# Also save probabilities table
all_probs <- map2_dfr(
 fearpol_new,
 fear_labels,
 ~get_prob_contrasts(
   race_ord_mods[[paste0(.x, "_ord")]], 
   .y
 )$probabilities %>%
   mutate(outcome = .y)
)

all_probs %>% gt()
race prob lower upper outcome
White 0.20 0.17 0.23 Stop You
Black 0.53 0.49 0.57 Stop You
White 0.17 0.15 0.20 Search You
Black 0.46 0.42 0.50 Search You
White 0.22 0.19 0.25 Yell at You
Black 0.48 0.44 0.52 Yell at You
White 0.19 0.16 0.23 Handcuff You
Black 0.52 0.48 0.56 Handcuff You
White 0.18 0.15 0.21 Kick/Hit You
Black 0.54 0.50 0.58 Kick/Hit You
White 0.18 0.15 0.22 Pin You Down
Black 0.56 0.52 0.60 Pin You Down
White 0.19 0.16 0.22 Pepper Spray You
Black 0.55 0.51 0.59 Pepper Spray You
White 0.19 0.16 0.23 Tase You
Black 0.57 0.53 0.61 Tase You
White 0.18 0.15 0.21 Shoot at You
Black 0.58 0.54 0.62 Shoot at You
White 0.17 0.14 0.20 Kill You
Black 0.58 0.54 0.62 Kill You

Now let’s see if we can use our posterior predictions to estimate comparable quantities for the linear models. As before, we will start with one model and then iterate across all outcomes. First, though, we need a function to help us calculate the predicted probability of a y>=3 response given the linear model-implied predictions. Let’s write out the logic of what we need.

For a linear model with normal errors, to get P(y ≥ 3) for each group:

For Whites:

  • Mean = intercept
  • SD = sigma
  • I want P(y ≥ 3), which should translate to: 1 - Φ((3 - intercept)/sigma)

For Blacks:

  • Mean = intercept + beta(race)
  • SD = sigma
  • Want P(y ≥ 3): this is 1 - Φ((3 - [intercept + beta])/sigma)

Where Φ is the standard normal CDF.

So, for each posterior draw we want to:

  1. Calculate the mean for each group using the intercept and race effect
  2. Use the sigma from that draw
  3. Calculate P(y ≥ 3) using the normal CDF
  4. Then get contrasts between groups

Show code
# Function to get probabilities and contrasts for one linear model
get_prob_contrasts_lin <- function(model, outcome_name) {
  # Get posterior draws
  draws <- as_draws_df(model)
  
  # Calculate P(y >= 3) for each draw
  p_white <- 1 - pnorm(3, 
                       mean = draws$`(Intercept)`, 
                       sd = draws$sigma)
  p_black <- 1 - pnorm(3, 
                       mean = draws$`(Intercept)` + draws$raceBlack, 
                       sd = draws$sigma)
  
  # Summarize probabilities
  probs <- data.frame(
    race = c("White", "Black"),
    prob = c(mean(p_white), mean(p_black)),
    lower = c(quantile(p_white, 0.025), quantile(p_black, 0.025)),
    upper = c(quantile(p_white, 0.975), quantile(p_black, 0.975))
  )
  
  # Calculate contrasts from each draw
  contrasts <- p_black - p_white
  
  contrast_summary <- data.frame(
    outcome = outcome_name,
    estimate = mean(contrasts),
    lower = quantile(contrasts, 0.025),
    upper = quantile(contrasts, 0.975)
  )
  
  list(probabilities = probs, contrasts = contrast_summary)
}

Let’s try our function with one model.

Show code
# Try with stopyou
results_lin <- get_prob_contrasts_lin(
 race_lin_mods[["stopyou"]], 
 "Stop You"
)

print("Linear Model Results for 'Stop You':")
[1] "Linear Model Results for 'Stop You':"
Show code
print("\nPredicted Probabilities of Afraid/Very Afraid by Race:")
[1] "\nPredicted Probabilities of Afraid/Very Afraid by Race:"
Show code
print(results_lin$probabilities)
   race  prob lower upper
1 White 0.099 0.082  0.12
2 Black 0.349 0.317  0.38
Show code
print("\nBlack-White Contrast:")
[1] "\nBlack-White Contrast:"
Show code
print(results_lin$contrasts)
      outcome estimate lower upper
2.5% Stop You     0.25  0.21  0.29
Show code
# Let's also print mean estimates from model for reference
print("\nModel Coefficients:")
[1] "\nModel Coefficients:"
Show code
print(summary(race_lin_mods[["stopyou"]]))

Model Info:
 function:     stan_glm
 family:       gaussian [identity]
 formula:      stopyou ~ race
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 1020
 predictors:   2

Estimates:
              mean   sd   10%   50%   90%
(Intercept) 1.4    0.1  1.4   1.4   1.5  
raceBlack   1.1    0.1  1.0   1.1   1.2  
sigma       1.2    0.0  1.2   1.2   1.2  

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 2.0    0.1  1.9   2.0   2.1  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
              mcse Rhat n_eff
(Intercept)   0.0  1.0  3823 
raceBlack     0.0  1.0  3809 
sigma         0.0  1.0  4261 
mean_PPD      0.0  1.0  3956 
log-posterior 0.0  1.0  1717 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Our function appears to have worked. Here is a breakdown from the first model (stopyou).

For Whites:

  • Mean = 1.4, SD = 1.2
  • We want P(y ≥ 3)
  • To standardize: (3 - 1.4)/1.2 = 1.33 standard deviations
  • P(Z ≥ 1.33) = 1 - Φ(1.33) ≈ 0.092
  • This matches our calculated probability of about 0.10

For Blacks:

  • Mean = 2.5, SD = 1.2
  • We want P(y ≥ 3)
  • To standardize: (3 - 2.5)/1.2 = 0.417 standard deviations
  • P(Z ≥ 0.417) = 1 - Φ(0.417) ≈ 0.338
  • This matches our calculated probability of about 0.35

We can verify these in R using model estimates.

Show code
# For Whites
1 - pnorm((3-1.4)/1.2)
[1] 0.091
Show code
# For Blacks  
1 - pnorm((3-2.5)/1.2)
[1] 0.34

The contrast of 0.25 (95% CrI: 0.21, 0.29) represents the difference in these probabilities.

Now, let’s iterate across the remaining outcomes.

Show code
# Get all linear model results
all_probs_lin <- map2_dfr(
  fearpol_new,
  fear_labels,
  ~get_prob_contrasts_lin(
    race_lin_mods[[.x]], 
    .y
  )$probabilities %>%
    mutate(outcome = .y)
)

all_contrasts_lin <- map2_dfr(
  fearpol_new,
  fear_labels,
  ~get_prob_contrasts_lin(
    race_lin_mods[[.x]], 
    .y
  )$contrasts
)

# Get all ordinal model results
all_probs_ord <- map2_dfr(
  fearpol_new,
  fear_labels,
  ~get_prob_contrasts(
    race_ord_mods[[paste0(.x, "_ord")]], 
    .y
  )$probabilities %>%
    mutate(outcome = .y)
)

all_contrasts_ord <- map2_dfr(
  fearpol_new,
  fear_labels,
  ~get_prob_contrasts(
    race_ord_mods[[paste0(.x, "_ord")]], 
    .y
  )$contrasts
)

# Add model type to contrasts for plotting later
all_contrasts_lin$model <- "Linear"
all_contrasts_ord$model <- "Ordinal"
Show code
# Linear Model Probabilities
all_probs_lin %>% 
 gt() %>%
 fmt_number(columns = c(prob, lower, upper), decimals = 3) %>%
 tab_header(
   title = "TABLE 1. Linear Model Predicted Probabilities",
   subtitle = "Probability of Afraid or Very Afraid Response by Race"
 )
TABLE 1. Linear Model Predicted Probabilities
Probability of Afraid or Very Afraid Response by Race
race prob lower upper outcome
White 0.099 0.082 0.118 Stop You
Black 0.349 0.317 0.382 Stop You
White 0.091 0.075 0.110 Search You
Black 0.307 0.277 0.339 Search You
White 0.118 0.099 0.138 Yell at You
Black 0.313 0.283 0.345 Yell at You
White 0.096 0.079 0.114 Handcuff You
Black 0.344 0.312 0.377 Handcuff You
White 0.086 0.070 0.103 Kick/Hit You
Black 0.368 0.336 0.400 Kick/Hit You
White 0.091 0.075 0.109 Pin You Down
Black 0.385 0.351 0.420 Pin You Down
White 0.094 0.077 0.113 Pepper Spray You
Black 0.368 0.335 0.401 Pepper Spray You
White 0.094 0.077 0.113 Tase You
Black 0.383 0.350 0.416 Tase You
White 0.088 0.072 0.105 Shoot at You
Black 0.412 0.378 0.447 Shoot at You
White 0.088 0.072 0.105 Kill You
Black 0.419 0.388 0.453 Kill You
Show code
# Ordinal Model Probabilities
all_probs_ord %>% 
 gt() %>%
 fmt_number(columns = c(prob, lower, upper), decimals = 3) %>%
 tab_header(
   title = "TABLE 2. Ordinal Model Predicted Probabilities",
   subtitle = "Probability of Afraid or Very Afraid Response by Race"
 )
TABLE 2. Ordinal Model Predicted Probabilities
Probability of Afraid or Very Afraid Response by Race
race prob lower upper outcome
White 0.203 0.173 0.235 Stop You
Black 0.533 0.493 0.571 Stop You
White 0.174 0.147 0.204 Search You
Black 0.464 0.425 0.504 Search You
White 0.222 0.190 0.254 Yell at You
Black 0.482 0.441 0.522 Yell at You
White 0.193 0.165 0.225 Handcuff You
Black 0.522 0.483 0.562 Handcuff You
White 0.176 0.148 0.206 Kick/Hit You
Black 0.545 0.504 0.584 Kick/Hit You
White 0.185 0.155 0.216 Pin You Down
Black 0.559 0.520 0.600 Pin You Down
White 0.191 0.162 0.223 Pepper Spray You
Black 0.551 0.510 0.591 Pepper Spray You
White 0.192 0.163 0.225 Tase You
Black 0.565 0.525 0.605 Tase You
White 0.175 0.147 0.206 Shoot at You
Black 0.581 0.540 0.620 Shoot at You
White 0.172 0.145 0.202 Kill You
Black 0.583 0.544 0.623 Kill You
Show code
# Linear Model Contrasts
all_contrasts_lin %>% 
 gt() %>%
 fmt_number(columns = c(estimate, lower, upper), decimals = 3) %>%
 tab_header(
   title = "TABLE 3. Linear Model Black-White Contrasts",
   subtitle = "Difference in Probability of Afraid or Very Afraid Response"
 )
TABLE 3. Linear Model Black-White Contrasts
Difference in Probability of Afraid or Very Afraid Response
outcome estimate lower upper model
Stop You 0.250 0.214 0.286 Linear
Search You 0.215 0.183 0.250 Linear
Yell at You 0.195 0.161 0.230 Linear
Handcuff You 0.249 0.212 0.284 Linear
Kick/Hit You 0.283 0.249 0.317 Linear
Pin You Down 0.293 0.256 0.329 Linear
Pepper Spray You 0.274 0.238 0.310 Linear
Tase You 0.289 0.254 0.324 Linear
Shoot at You 0.324 0.287 0.362 Linear
Kill You 0.331 0.296 0.367 Linear
Show code
# Ordinal Model Contrasts
all_contrasts_ord %>% 
 gt() %>%
 fmt_number(columns = c(estimate, lower, upper), decimals = 3) %>%
 tab_header(
   title = "TABLE 4. Ordinal Model Black-White Contrasts",
   subtitle = "Difference in Probability of Afraid or Very Afraid Response"
 )
TABLE 4. Ordinal Model Black-White Contrasts
Difference in Probability of Afraid or Very Afraid Response
outcome estimate lower upper model
Stop You 0.331 0.284 0.375 Ordinal
Search You 0.290 0.244 0.335 Ordinal
Yell at You 0.260 0.214 0.306 Ordinal
Handcuff You 0.328 0.283 0.373 Ordinal
Kick/Hit You 0.370 0.323 0.414 Ordinal
Pin You Down 0.375 0.329 0.421 Ordinal
Pepper Spray You 0.359 0.314 0.404 Ordinal
Tase You 0.373 0.327 0.418 Ordinal
Shoot at You 0.406 0.360 0.451 Ordinal
Kill You 0.410 0.366 0.456 Ordinal

Now, let’s generate a plot like before, but this time we are plotting predicted probabilities of y>=3 from the linear and ordinal models instead of beta-based effects on the latent outcome scale.

10.2.1 Plot Black-White contrasts by outcome & model

Show code
# Create ordered factor for outcomes
outcome_order <- fear_labels[fearpol_new]

# Combine contrasts from both models
all_contrasts <- bind_rows(all_contrasts_lin, all_contrasts_ord) %>% 
  mutate(
    outcome = factor(outcome, levels = rev(outcome_order))  # Use rev() here
  )

# Create plot
race_contrast_plot <- ggplot(all_contrasts, 
       aes(y = outcome,    # Remove fct_reorder()
           x = estimate, 
           xmin = lower, 
           xmax = upper,
           color = model)) +
  geom_pointrange(position = position_dodge(width = 0.8),
                 size = 3/4, linewidth=1.5) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  scale_color_manual(values = c("Linear" = "#FD9B6BFF", 
                               "Ordinal" = "#782281FF")) +
    coord_cartesian(xlim = c(0, .5)) +
    scale_x_continuous(breaks = seq(0, .5, .1)) +
  labs(x = "Diff. in pred. prob. of \"afraid\" or \"very afraid\"\n(P(y>=3|Black) - P(y>=3|White)), by model",
       y = "", 
       color = "") +
  guides(
    color = guide_legend(
      override.aes = list(
        size = c(1, 1)
      ))) + 
  theme_minimal() +
  theme(
    text = element_text(family = "serif"),
    legend.position = "bottom",
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black"), 
    axis.text = element_text(size = 14),
    axis.title.x = element_text(size = 14),
    legend.text = element_text(size = 14)
  )

race_contrast_plot

10.2.2 Plot group-specific predicted probabilities by outcome & model

Note that the contrasts are larger in the ordinal model. This plot appears to show the exact opposite of the pattern seemingly implied by the plotted effects on the latent scale above. We will try to explain this later. First, let’s plot the group-specific predicted probabilities themselves rather than the contrasts so we can see how these contrasts are estimated.

Show code
# Define colors for legend order
model_colors <- c(
    "Linear.White" = "#FECD90FF",  # Lighter orange
    "Linear.Black" = "#FD9B6BFF", # Darker orange
    "Ordinal.White" = "#BA3878FF",  # Lighter purple
    "Ordinal.Black" = "#782281FF" # Darker purple
)

# Create ordered factor for outcomes
outcome_order <- fear_labels[fearpol_new]

# Combine probabilities from both models
all_probs <- bind_rows(
  all_probs_lin %>% mutate(model = "Linear"),
  all_probs_ord %>% mutate(model = "Ordinal")
) %>%
  mutate(
    outcome = factor(outcome, levels = rev(outcome_order)),
    race = factor(race, levels = c("White", "Black")),
    # Create ordered interaction for legend
    model_race = factor(interaction(model, race), 
                       levels = c("Linear.White", "Ordinal.White",
                                "Linear.Black", "Ordinal.Black"))
  )

race_pp_plot <- ggplot(all_probs, 
       aes(y = outcome, 
           x = prob, 
           xmin = lower, 
           xmax = upper,
           color = model_race,
           shape = race)) +
  geom_pointrange(aes(y = as.numeric(outcome) + 
                     if_else(model == "Ordinal", 0.3, 0)),
                 size = 3/4, linewidth = 1.5) +
  scale_color_manual(values = model_colors) +
  scale_shape_manual(
    values = c("White" = 3, "Black" = 16)) +
  scale_y_discrete(limits = rev(outcome_order)) +
  guides(
    shape = "none",
    color = guide_legend(
      override.aes = list(
        shape = c(3, 3, 16, 16)  
      ))
  ) +
  coord_cartesian(xlim = c(0, .7)) +
  scale_x_continuous(breaks = seq(0, .7, .1)) +
  labs(x = "Pred. prob. of \"afraid\" or \"very afraid\"\n(P(y>=3|race+e)), by model",
       y = "",
       color = "") +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"),
    legend.position = "bottom",
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black")
  )

race_pp_plot

Let’s pull these plots together.

Show code
race_pp_plot + race_contrast_plot + 
  plot_annotation(
    title = "FIGURE 3: Race-Specific Predicted Probabilities of \"Afraid\" or \"Very Afraid\" Responses and Black-White Probability Contrasts, by Linear or Ordinal Models",
    theme = theme(
      plot.title = element_text(size = 12)
    ))

Before we move on, let’s make a version that includes the observed group-specific proportions for y>=3 and contrasts. We will also use different point shapes and fills to make it easier to tell which groups are represented by each interval estimate and clean up other aesthetics.

10.2.3 FIG5

Show code
# First calculate observed proportions for each outcome by race
obs_props <- dat %>%
  group_by(race) %>%
  summarize(
    across(all_of(fearpol_new), 
           ~mean(. >= 3, na.rm = TRUE)  # P(y >= 3)
    )
  ) %>%
  pivot_longer(
    cols = all_of(fearpol_new),
    names_to = "var",
    values_to = "prop"
  ) %>%
  mutate(
    outcome = factor(fear_labels[var], levels = rev(outcome_order))
  )

# Calculate observed contrasts 
obs_contrasts <- obs_props %>%
  pivot_wider(
    names_from = race,
    values_from = prop
  ) %>%
  mutate(
    contrast = Black - White,
    model = "Observed"  # Add model label for legend matching
  )


# Define colors with desired legend order
model_colors <- c(
    "Linear.White" = "#FD9B6BFF",    
    "Linear.Black" = "#FD9B6BFF", 
    "Ordinal.White" = "#782281FF",
    "Ordinal.Black" = "#782281FF",
    "Observed.White" = "black",       
    "Observed.Black" = "black"
)

# Create ordered factor levels for guide
guide_order <- c("Linear.White", "Linear.Black", "Ordinal.White", 
                 "Ordinal.Black", "Observed.White", "Observed.Black")

# Combine probabilities from both models with new order
all_probs <- bind_rows(
  all_probs_lin %>% mutate(model = "Linear"),
  all_probs_ord %>% mutate(model = "Ordinal")
) %>%
  mutate(
    outcome = factor(outcome, levels = rev(outcome_order)),
    race = factor(race, levels = c("White", "Black")),
    # Create ordered interaction for legend with new order
    model_race = factor(interaction(model, race), 
                       levels = guide_order[guide_order != "Observed.White" & 
                                          guide_order != "Observed.Black"])
  )

# Create base plot with new order
race_pp_plot <- ggplot(all_probs, 
       aes(y = outcome, 
           x = prob, 
           xmin = lower, 
           xmax = upper,
           color = model_race,
           shape = race)) +  
  geom_pointrange(aes(y = as.numeric(outcome) + 
                     if_else(model == "Ordinal", 0.3, 0),
                     fill = interaction(model, race)), 
                 size = 3,
                 linewidth = 1.5, 
                 fatten = 1,
                 shape = 21) +
  scale_color_manual(values = model_colors) +
  scale_shape_manual(
    values = c("White" = 21, "Black" = 21)) +
  scale_fill_manual(
    values = c("Linear.White" = "white", 
               "Linear.Black" = "#FD9B6BFF",
               "Ordinal.White" = "white", 
               "Ordinal.Black" = "#782281FF")) +
  scale_y_discrete(limits = rev(outcome_order)) +
  guides(
    shape = "none",
    fill = "none",
    color = guide_legend(
      override.aes = list(
        shape = c(21, 21, 21, 21),
        fill = c("white", "#FD9B6BFF", "white", "#782281FF")
      ))
  ) +
  coord_cartesian(xlim = c(0, .7)) +
  scale_x_continuous(breaks = seq(0, .7, .1)) +
  labs(x = "Pred. prob. of \"afraid\" or \"very afraid\"\n(P(y>=3|race)), by model",
       y = "",
       color = "") +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"),
    legend.position = "bottom",
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black"),
    axis.text = element_text(size = 14),
    axis.title.x = element_text(size = 14),
    legend.text = element_text(size = 14)
  )

# Add observed points to prop plot
race_pp_plot2 <- race_pp_plot + 
  geom_point(
    data = obs_props,
    mapping = aes(
      x = prop, 
      y = outcome,
      color = factor(if_else(race == "White", 
                            "Observed.White", 
                            "Observed.Black"),
                    levels = guide_order)
      ),
    shape = 23,  
    size = 3,
    fill = if_else(obs_props$race == "White", "white", "black"),
    inherit.aes = FALSE
  ) +
  scale_color_manual(
    values = model_colors,
    breaks = guide_order
  ) +
  guides(
    shape = "none",
    color = guide_legend(
      override.aes = list(
        shape = c(21, 21, 21, 21, 23, 23),
        fill = c("white", "#FD9B6BFF", "white", "#782281FF", "white", "black"), 
        size = c(1, 1, 1, 1, 3, 3)
      ))
  )

# Add observed contrasts to contrast plot
race_contrast_plot2 <- race_contrast_plot + 
  geom_point(
    data = obs_contrasts,
    aes(x = contrast,
        y = outcome,
        color = model),  # Add to color aesthetic for legend
    shape = 23,  # Diamond shape
    fill = "black",
    size = 3,
    inherit.aes = FALSE
  ) +
  scale_color_manual(
    values = c("Linear" = "#FD9B6BFF", 
              "Ordinal" = "#782281FF",
              "Observed" = "black"),
    breaks = c("Linear", "Ordinal", "Observed")
  ) +
  guides(
    color = guide_legend(
      override.aes = list(
        shape = c(21, 21, 23),
        fill = c("#FD9B6BFF", "#782281FF", "black"),
        size = c(1, 1, 3)
      ))
  )

race_pp_plot2 + race_contrast_plot2 + 
  plot_annotation(
    title = "FIGURE 5: Race-Specific Predicted Probabilities of \"Afraid\" or \"Very Afraid\" Responses and Black-White Probability Contrasts, by Linear or Ordinal Models",
    theme = theme(
      plot.title = element_text(size = 12)
    ))

First, as expected, ordinal point-intervals accurately recover observed data. Second, though a naive interpretation of the beta estimates might suggest larger race estimates in the linear model, conversion to predicted probabilities shows this would be an inaccurate interpretation (more on this soon). All linear probability and contrast predictions substantially underestimate observed values. In fact, none of the linear model intervals successfully recovers the observed race-specific proportions or Black-White contrasts from any of the outcome items.

Let’s see if we can quantify and visualize the degree of underestimation in these linear models.

Show code
# calculate avg & range of linear underestimation of observed data

# print("Structure of obs_props:")
# str(obs_props)
# 
# print("\nStructure of all_probs_lin:")
# str(all_probs_lin)
# 
# print("\nStructure of obs_contrasts:")
# str(obs_contrasts)
# 
# print("\nStructure of all_contrasts_lin:")
# str(all_contrasts_lin)


# 1. Calculate probability underestimates
# Match on both outcome and race
prob_underestimates <- obs_props %>%
  mutate(outcome_str = as.character(outcome)) %>%
  left_join(
    all_probs_lin %>%
      mutate(outcome_str = gsub(" ", " ", outcome)), 
    by = c("outcome_str" = "outcome", "race")
  ) %>%
  mutate(
    difference = prop - prob
  )

# 2. Summary statistics for probability underestimates
prob_summary <- prob_underestimates %>%
  summarize(
    avg_underestimate = mean(difference, na.rm = TRUE),
    min_underestimate = min(difference, na.rm = TRUE),
    max_underestimate = max(difference, na.rm = TRUE)
  )

# 3. Summary by race
prob_by_race <- prob_underestimates %>%
  group_by(race) %>%
  summarize(
    avg_underestimate = mean(difference, na.rm = TRUE),
    min_underestimate = min(difference, na.rm = TRUE),
    max_underestimate = max(difference, na.rm = TRUE)
  )

# 4. Calculate contrast underestimates
contrast_underestimates <- obs_contrasts %>%
  mutate(outcome_str = as.character(outcome)) %>%
  left_join(
    all_contrasts_lin %>%
      mutate(outcome_str = gsub(" ", " ", outcome)), 
    by = c("outcome_str" = "outcome")
  ) %>%
  mutate(
    difference = contrast - estimate
  )

# 5. Summary statistics for contrast underestimates
contrast_summary <- contrast_underestimates %>%
  summarize(
    avg_underestimate = mean(difference, na.rm = TRUE),
    min_underestimate = min(difference, na.rm = TRUE),
    max_underestimate = max(difference, na.rm = TRUE)
  )

# 6. Print results
cat("Probability Underestimates (Observed - Linear):\n")
Probability Underestimates (Observed - Linear):
Show code
print(prob_underestimates %>% dplyr::select(outcome_str, race, observed = prop, predicted = prob, difference))
# A tibble: 20 × 5
   outcome_str      race  observed predicted difference
   <chr>            <chr>    <dbl>     <dbl>      <dbl>
 1 Stop You         White    0.192    0.0995     0.0930
 2 Search You       White    0.179    0.0915     0.0871
 3 Yell at You      White    0.234    0.118      0.116 
 4 Handcuff You     White    0.196    0.0959     0.101 
 5 Kick/Hit You     White    0.181    0.0857     0.0948
 6 Pin You Down     White    0.187    0.0915     0.0950
 7 Pepper Spray You White    0.198    0.0942     0.104 
 8 Tase You         White    0.196    0.0939     0.103 
 9 Shoot at You     White    0.173    0.0879     0.0847
10 Kill You         White    0.167    0.0880     0.0787
11 Stop You         Black    0.545    0.349      0.195 
12 Search You       Black    0.461    0.307      0.154 
13 Yell at You      Black    0.471    0.313      0.158 
14 Handcuff You     Black    0.521    0.344      0.177 
15 Kick/Hit You     Black    0.547    0.368      0.178 
16 Pin You Down     Black    0.564    0.385      0.179 
17 Pepper Spray You Black    0.548    0.368      0.180 
18 Tase You         Black    0.568    0.383      0.185 
19 Shoot at You     Black    0.591    0.412      0.179 
20 Kill You         Black    0.593    0.419      0.174 
Show code
cat("\nAverage Probability Underestimate:", round(prob_summary$avg_underestimate, 3), "\n")

Average Probability Underestimate: 0.14 
Show code
cat("Range of Probability Underestimates:", round(prob_summary$min_underestimate, 3), 
    "to", round(prob_summary$max_underestimate, 3), "\n")
Range of Probability Underestimates: 0.079 to 0.2 
Show code
cat("\nProbability Underestimates by Race:\n")

Probability Underestimates by Race:
Show code
print(prob_by_race)
# A tibble: 2 × 4
  race  avg_underestimate min_underestimate max_underestimate
  <chr>             <dbl>             <dbl>             <dbl>
1 Black            0.176             0.154              0.195
2 White            0.0957            0.0787             0.116
Show code
cat("\nContrast Underestimates (Observed - Linear):\n")

Contrast Underestimates (Observed - Linear):
Show code
print(contrast_underestimates %>% dplyr::select(outcome_str, observed = contrast, predicted = estimate, difference))
# A tibble: 10 × 4
   outcome_str      observed predicted difference
   <chr>               <dbl>     <dbl>      <dbl>
 1 Stop You            0.352     0.250     0.102 
 2 Search You          0.283     0.215     0.0673
 3 Yell at You         0.237     0.195     0.0415
 4 Handcuff You        0.325     0.249     0.0764
 5 Kick/Hit You        0.366     0.283     0.0832
 6 Pin You Down        0.377     0.293     0.0844
 7 Pepper Spray You    0.350     0.274     0.0762
 8 Tase You            0.371     0.289     0.0823
 9 Shoot at You        0.418     0.324     0.0943
10 Kill You            0.426     0.331     0.0949
Show code
cat("\nAverage Contrast Underestimate:", round(contrast_summary$avg_underestimate, 3), "\n")

Average Contrast Underestimate: 0.08 
Show code
cat("Range of Contrast Underestimates:", round(contrast_summary$min_underestimate, 3), 
    "to", round(contrast_summary$max_underestimate, 3), "\n")
Range of Contrast Underestimates: 0.042 to 0.1 
Show code
# 7. Visualize contrast underestimates

p1 <- ggplot(contrast_underestimates, aes(x = reorder(outcome_str, difference), y = difference)) +
  geom_col(fill = "#782281FF") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_text(aes(label = round(difference, 3)), hjust = -0.2) +
  labs(
    title = "Underestimates in Black-White Contrasts",
    subtitle = "Observed - Linear Model Predictions",
    x = "",
    y = "Difference (Observed - Predicted)"
  ) +
  coord_flip() +
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14)
  )

print(p1)

Show code
# 8. Visualize probability underestimates by race
p2 <- ggplot(prob_underestimates, aes(x = reorder(outcome_str, difference), y = difference, fill = race)) +
  geom_col(position = "dodge") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_fill_manual(values = c("White" = "#FECD90FF", "Black" = "#FD9B6BFF")) +
  labs(
    title = "Probability Underestimates by Outcome and Race",
    subtitle = "Observed - Linear Model Prediction",
    x = "",
    y = "Difference (Observed - Predicted)"
  ) +
  coord_flip() +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14)
  )

print(p2)

The linear model, on average, underestimated the probability of “afraid” or “very afraid” responses by 0.14, with underestimates ranging from 0.079 to 0.2. Similarly, the linear model underestimated the Black-White contrast by an average of 0.08, with underestimates ranging from 0.042 to 0.1. This pattern aligns with the ceiling and floor effects described by Kruschke and Liddell (2018), where linear models struggle to represent data clustered at extreme values due to their inherent assumption of normality. Overall, these plots clearly demonstrate the linear model’s limitations in representing the observed data.
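As a toy illustration of the ceiling effect (hypothetical numbers, not estimates from our models): a normal distribution centered near the top of a 0-4 scale necessarily places substantial probability mass above the maximum category, which is one reason a linear model compresses predicted probabilities near the extremes.

# Hypothetical normal with mean 3.5 and SD 1.2 on a 0-4 response scale
1 - pnorm(3, mean = 3.5, sd = 1.2)  # P(y >= 3) under the normal (~0.66)
1 - pnorm(4, mean = 3.5, sd = 1.2)  # mass above the maximum category (~0.34)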

10.3 Explaining differences between latent & predicted probability scales

First, let’s review why we might generally expect to see different patterns for latent and predicted probability scales across linear and ordinal models.

Differences between Linear and Ordinal Models:

  • Linear models and ordinal models handle categorical outcomes differently. Linear models (like linear regression) treat the ordinal outcome as a continuous variable, which can lead to problematic assumptions when the underlying scale isn’t truly continuous.

Latent Scale vs. Predicted Probability:

  • Latent Scale (Betas): This represents the underlying linear predictor. In ordinal models (like probit or logit ordinal regression), this scale is transformed through a cumulative link function.
  • Predicted Probabilities: This is the actual (estimated) probability of being in a specific category or above a certain threshold.

Apparent discrepancies can occur because of the non-linear transformation of the latent scale in ordinal models.

  • Linear Model: Assumes a linear relationship across the entire scale, which can overestimate effects by treating the ordinal categories as equidistant.
  • Ordinal Model: Uses a cumulative link function that can capture non-linear relationships more accurately.

So, different patterns on the latent and predicted probability scales can emerge: once estimates are transformed to predicted probabilities, differences in the ordinal model may appear more (or less) pronounced than in the linear model because the ordinal model handles category boundaries explicitly. This is a classic example of why ordinal models should be preferred for categorical ordered outcomes. As we saw in the intercept-only models above, the non-linear transformation in ordinal models allows for:

  • More accurate representation of category boundaries
  • Better handling of the inherent ordering in categorical responses
  • Potentially more realistic and directly interpretable modeling of the underlying psychological process

Now, let’s try to explain what is happening in this particular case.

  1. Normal Distribution Assumptions: In a linear model with normal distribution assumptions, the probability density is highest near the mean and symmetrically decreases as you move away from the center. This means:
  • Extreme values (very unafraid or very afraid) get “pulled” toward the central tendency
  • The model assumes roughly equal spacing between categories
  • Probabilities at the tails are compressed due to the normal distribution’s shape
  2. Bimodal and Skewed Distribution: With the observed item distribution (flat or bimodal with clustering at extremes), the linear model’s normal assumption becomes particularly problematic:
  • The symmetric normal distribution doesn’t match the actual data distribution
  • Extreme group-specific clusters (whites very unafraid, blacks very afraid) are not well-represented
  • The linear model tries to “average out” these extreme clusters
  3. Why Larger Latent Scale Effects:
  • The linear model compensates for the mismatch between its assumptions and data by increasing the coefficient magnitude on the latent scale
  • This allows the model to try to capture the group differences, but does so by inflating the linear predictor
  • However, due to normal distribution constraints, this doesn’t translate directly to extreme probability predictions
  4. Ordinal Model Advantages: The ordinal model (cumulative link model) is more flexible:
  • Uses cumulative probabilities across category thresholds
  • Can more accurately represent non-linear relationships
  • Doesn’t assume equal spacing or symmetric distribution
  • Directly models the probability of being at or above each threshold

Demonstration of the differences in effects by scaling (a quick numerical check follows below):

  • Linear model, effect on latent “stopyou”: coefficient of 1.09 for Black race
  • Ordinal model, effect on latent “stopyou”: coefficient of 0.92 for Black race
  • Linear model, contrast P(stopyou >= 3|Black) - P(stopyou >= 3|White): 0.250
  • Ordinal model, contrast P(stopyou >= 3|Black) - P(stopyou >= 3|White): 0.331
  • Despite the slightly smaller coefficient magnitude on the latent scale, the probability contrast for y>=3 is larger in the ordinal model due to its different distributional assumptions
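To make this concrete, here is a quick numerical check using the rounded point estimates reported above for the stopyou models (a sketch; values differ slightly from the tables, which summarize the full posteriors rather than plugging in rounded medians):

# Linear model: P(y >= 3) implied by a normal with the reported group means
# and residual SD (Intercept ~ 1.4, raceBlack ~ 1.09, sigma ~ 1.2)
p_white_lin <- 1 - pnorm(3, mean = 1.4, sd = 1.2)
p_black_lin <- 1 - pnorm(3, mean = 1.4 + 1.09, sd = 1.2)
p_black_lin - p_white_lin            # ~0.24 (cf. 0.250 in Table 3)

# Cumulative probit: P(y >= 3) = 1 - pnorm(threshold_3 - latent mean),
# using Intercept[3] ~ 0.83 and raceBlack ~ 0.92 from the stopyou model
p_white_ord <- 1 - pnorm(0.83 - 0)
p_black_ord <- 1 - pnorm(0.83 - 0.92)
p_black_ord - p_white_ord            # ~0.33 (cf. 0.331 in Table 4)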

11 Race differences in latent fear

Goal: Compare results from a linear model applied to PGC’s multi-item composite scale measuring latent fear of police to those from an ordinal multilevel model predicting latent fear of police.

11.1 Is a composite “latent” fear measure appropriate?

In PGC’s study, they did not estimate item-specific models. Rather, after reporting some common indicators of unidimensionality (e.g., high Cronbach’s alpha; significant factor loadings), they created a composite scale by averaging all ten outcome items (0 to 4, by 0.1) to essentially predict “latent” fear of police. Then, they normed or re-scaled the measure to range from 0 to 100 so they could interpret coefficients as effects on a “percentage” of the outcome scale.
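For reference, the composite described above can be constructed roughly as follows (a sketch assuming the dat and fearpol_new objects from earlier; fear_mean and fear_scale_100 are hypothetical column names, not PGC’s code):

# Average the ten 0-4 items, then rescale the 0-4 mean to a 0-100 "percentage"
dat_composite <- dat %>%
  mutate(
    fear_mean = rowMeans(across(all_of(fearpol_new)), na.rm = TRUE),
    fear_scale_100 = fear_mean / 4 * 100
  )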

Unfortunately, such transformations only provide the illusion of interpretability. We cannot take ordinal items, add them up, rescale them, and then accurately interpret results in metric scaling (e.g., as percentages). Averaging ordinal items assumes equal intervals between response categories, which is unlikely to be true (e.g., the “distance” between “very unafraid” and “unafraid” may not equal the distance between “unafraid” and “neither”). Also, creating a composite score by averaging loses information about item-specific response patterns and assumes all items equally represent the underlying construct.
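A tiny illustration of that information loss: two hypothetical respondents with very different response patterns can receive identical composite scores.

# Hypothetical respondents: one answers "neither" (2) to every item,
# the other is split between "very unafraid" (0) and "very afraid" (4)
resp_a <- rep(2, 10)
resp_b <- c(rep(0, 5), rep(4, 5))
mean(resp_a) / 4 * 100  # 50
mean(resp_b) / 4 * 100  # 50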

Meanwhile, rescaling to 0-100 and interpreting as “percentages” is problematic because it implies metric properties that do not exist in the original ordinal data. The “percentage” interpretation suggests a meaningful zero point and equal units, but the transformation does not address the fundamental ordinal nature of the data.

Fortunately, there are alternatives. We can use ordinal models that respect the ordinal nature of the items and allow for unequal distances between thresholds. Also, we can estimate item-specific effects (like we did) or use multilevel modeling to account for item clustering while estimating overall effects on “latent” fear of police.

Before we illustrate such a multilevel modeling approach, it is helpful to visualize whether predicted probabilities for all five response categories (from 0 to 4) actually cluster across all ten outcome items, and whether response-specific contrasts also cluster across outcomes. If so, this would bolster our confidence that modeling overall effects on fear of police as a single latent construct is appropriate.

11.2 Do all ten outcome items appear to assess “latent” fear?

Let’s start by generating race-specific predicted probabilities and black-white contrasts for all thresholds and for all ten outcomes from cumulative probit models.

Show code
# Get predictions for both races
nd <- tibble(
  race = c("White", "Black")
)

# Get predictions for all models
race_preds <- purrr::map_dfr(
  race_ord_mods, 
  ~add_epred_draws(.x, newdata = nd), 
  .id = ".model"
) %>%
  ungroup()

# Calculate predicted probabilities by race
race_probs <- race_preds %>%
  group_by(.model, race, .category) %>%
  summarise(
    p = mean(.epred),
    ll = quantile(.epred, 0.025),
    ul = quantile(.epred, 0.975)
  )

# Calculate Black-White contrasts
race_contrasts <- race_preds %>%
  dplyr::select(.draw, .model, race, .category, .epred) %>%
  pivot_wider(names_from = race, values_from = .epred) %>%
  mutate(diff = Black - White) %>%
  group_by(.model, .category) %>%
  summarise(
    con = mean(diff),
    ll = quantile(diff, 0.025),
    ul = quantile(diff, 0.975)
  )

Now, let’s assume responses to each outcome represent indicators of a single underlying latent fear of police construct. If so, when we “stack” the race-specific probabilities and black-white contrasts from all models together, they should cluster across outcomes. Let’s see what happens when we plot probabilities and contrasts for all response categories, overlapping interval estimates for all 10 items.

Show code
# Predicted probabilities plot
p1 <- ggplot(race_probs, 
       aes(x = .category, y = p, ymin = ll, ymax = ul, 
           color = race)) +
  geom_hline(yintercept = scales::breaks_pretty()(range(0, .6)), 
             color = "lightgrey") +
  geom_linerange(linewidth = 1, 
                 position = position_dodge(width = 1/3)) +
  geom_point(aes(shape = race), 
             position = position_dodge(width = 1/3), 
             size = 6) +
  scale_color_manual(NULL, 
                    values = c("Black" = "#100B2DFF", 
                              "White" = "#FEC488FF")) +
  scale_shape_manual(NULL, values = c(3, 3)) +
  scale_y_continuous(limits = c(0, .61), 
                    breaks = scales::breaks_pretty()) +
  labs(title = "Predicted Category Probabilities by Race",
       x = "Response Category",
       y = "Predicted Probability") +
  theme_minimal() +
  theme(
    legend.background = element_blank(),
    legend.position = "bottom",
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black")
  )

# Contrast plot
p2 <- ggplot(race_contrasts, aes(x = .category, y = con)) +
  geom_hline(yintercept = scales::breaks_pretty()(range(-0.4, 0.4)), 
             color = "lightgrey") +
  geom_hline(yintercept = 0, color = "black") +
  geom_pointrange(aes(ymin = ll, ymax = ul), 
                 linewidth = 1, 
                 size = 1.5, 
                 color = "#B2357BFF",
                 shape = 3) +
  scale_y_continuous(limits = c(-0.42, 0.42), 
                    breaks = scales::breaks_pretty()) +
  labs(title = "Black-White Contrasts in Predicted Probabilities",
       x = "Response Category",
       y = "Contrast") +
  theme_minimal() +
  theme(
    legend.background = element_blank(),
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black")
  )

# Combine plots
p1 + p2 + 
  plot_annotation(
    title = "Visualizing Clustering of Estimates by Response Category Across Ordinal Models of All Ten Fear Items",
    theme = theme(
      plot.title = element_text(size = 12)
    ))

Overall, there is substantial clustering, though certain response categories (e.g., 0 and 4) show more variation across items than others (1, 2, and 3).

One of our goals here is to provide an intuitive, visualized sense of what our multilevel model will be doing later - essentially, it will pool information across these threshold-specific and item-specific patterns (estimated separately by the single-level models above) to generate overall aggregated estimates. Let’s try it.

11.3 CP: Multilevel model

Show code
# Data prep - need to convert to long format
dat_long <- dat %>%
  pivot_longer(
    all_of(fearpol_new),  # Use our vector of 10 fear items
    names_to = "outcome",
    values_to = "fear"
  ) %>%
  mutate(
    fear = factor(fear, ordered = TRUE),  # Ensure ordered factor
    outcome = factor(outcome)  # Factor for random effects grouping
  )

# Formula for hierarchical ordinal model
bf_fear <- bf(fear | thres(4) ~ race + (race | outcome))

# Priors (same thresholds as before, add random effects)
priors_hier <- c(
  # Fixed effect priors (same as before)
  prior(normal(-0.84, 1), class = "Intercept", coef = "1"),
  prior(normal(-0.25, 1), class = "Intercept", coef = "2"),
  prior(normal(0.25, 1), class = "Intercept", coef = "3"),
  prior(normal(0.84, 1), class = "Intercept", coef = "4"),
  prior(normal(0, 1), class = "b"),
  # Random effects priors
  prior(exponential(1), class = "sd"),
  prior(lkj(2), class = "cor")  # Prior for correlation of random effects
)

# Fit hierarchical model
hier_mod <- brm(
  bf_fear,
  data = dat_long,
  family = cumulative("probit"),
  prior = priors_hier,
  cores = nCoresphys,
  chains = 4,
  iter = 2000,
  warmup = 1000,
  seed = 1138,
  control = list(adapt_delta = 0.95),
  file = "Models/hier_fear_fit",
  file_refit = "on_change"
)
Show code
# coefgrp <- c("raceBlack")
mcmc_plot(hier_mod, 
          # variable = coefgrp, 
          regex = TRUE, 
          prob = 0.80, prob_outer = 0.95) + 
  labs(title = "Coefficient plot", 
       subtitle = "Posterior intervals for all beta coefficients 
       \nwith medians, 80%, and 95% intervals")

Show code
summary(hier_mod, 
        probs = c(0.025, 0.975), 
        digits = 2)
 Family: cumulative 
  Links: mu = probit; disc = identity 
Formula: fear | thres(4) ~ race + (race | outcome) 
   Data: dat_long (Number of observations: 10200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~outcome (Number of levels: 10) 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept)                0.09      0.03     0.04     0.15 1.00     1720
sd(raceBlack)                0.17      0.05     0.10     0.29 1.00     1583
cor(Intercept,raceBlack)    -0.83      0.16    -0.99    -0.40 1.00     1457
                         Tail_ESS
sd(Intercept)                2303
sd(raceBlack)                2415
cor(Intercept,raceBlack)     2200

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -0.31      0.03    -0.38    -0.24 1.00     1898     2480
Intercept[2]     0.23      0.03     0.17     0.30 1.00     1881     2408
Intercept[3]     0.89      0.03     0.82     0.95 1.00     1867     2440
Intercept[4]     1.43      0.04     1.36     1.50 1.00     1983     2213
raceBlack        0.98      0.06     0.86     1.10 1.00     1557     2135

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Show code
prior_summary(hier_mod)
                prior     class      coef   group resp dpar nlpar lb ub
         normal(0, 1)         b                                        
         normal(0, 1)         b raceBlack                              
 student_t(3, 0, 2.5) Intercept                                        
     normal(-0.84, 1) Intercept         1                              
     normal(-0.25, 1) Intercept         2                              
      normal(0.25, 1) Intercept         3                              
      normal(0.84, 1) Intercept         4                              
 lkj_corr_cholesky(2)         L                                        
 lkj_corr_cholesky(2)         L           outcome                      
       exponential(1)        sd                                    0   
       exponential(1)        sd           outcome                  0   
       exponential(1)        sd Intercept outcome                  0   
       exponential(1)        sd raceBlack outcome                  0   
       source
         user
 (vectorized)
      default
         user
         user
         user
         user
         user
 (vectorized)
         user
 (vectorized)
 (vectorized)
 (vectorized)
Show code
# Posterior predictive checks
pp_check(hier_mod, ndraws = 100) +
  theme_minimal()

Show code
mcmc_plot(hier_mod, type = "dens_overlay") +
  theme_minimal()

Show code
pp_check(
  hier_mod, 
  type = "bars", 
  ndraws = 1000, 
  linewidth = 1/2, 
  size = 1/4) +
  labs(subtitle = "MLM latent fear") 

Show code
pp_check(
  hier_mod, 
  type = "bars_grouped",
  group = "race",
  ndraws = 1000, 
  linewidth = 1/2, 
  size = 1/4) +
  labs(subtitle = "MLM latent fear")

Show code
mcmc_plot(hier_mod, type = "trace") +
  theme_minimal()

Show code
# Rhat values
rhats <- rhat(hier_mod)
print("Rhat values > 1.01:")
[1] "Rhat values > 1.01:"
Show code
print(rhats[rhats > 1.01])
<NA> 
  NA 
Show code
# Effective sample sizes
neff <- neff_ratio(hier_mod)
print("ESS ratios < 0.1:")
[1] "ESS ratios < 0.1:"
Show code
print(neff[neff < 0.1])
<NA> 
  NA 
Show code
# Pairs plot for key parameters
pairs(hier_mod, pars = c("b_race", "sd_outcome__Intercept", "sd_outcome__race"))

Show code
# Check random effects
ranef(hier_mod)
$outcome
, , Intercept

          Estimate Est.Error     Q2.5   Q97.5
handcyou     0.038     0.046 -0.04777  0.1294
kickyou     -0.052     0.048 -0.15259  0.0385
killyou     -0.097     0.049 -0.20011 -0.0053
pinyou      -0.041     0.047 -0.13910  0.0510
searchyou    0.056     0.046 -0.03520  0.1497
shootyou    -0.085     0.049 -0.18762  0.0066
sprayyou    -0.030     0.047 -0.12450  0.0604
stopyou      0.089     0.048  0.00093  0.1902
taseyou     -0.038     0.048 -0.13418  0.0545
yellyou      0.126     0.050  0.03232  0.2327

, , raceBlack

          Estimate Est.Error   Q2.5  Q97.5
handcyou    -0.082     0.079 -0.239  0.076
kickyou      0.070     0.083 -0.090  0.237
killyou      0.238     0.085  0.081  0.416
pinyou       0.088     0.081 -0.069  0.252
searchyou   -0.170     0.081 -0.333 -0.013
shootyou     0.204     0.084  0.050  0.380
sprayyou     0.040     0.083 -0.128  0.199
stopyou     -0.133     0.080 -0.292  0.022
taseyou      0.079     0.082 -0.077  0.240
yellyou     -0.252     0.083 -0.417 -0.091
Show code
# Define paths for cached LOO results
hier_mod_loo_path <- "Models/hier_mod_loo.rds"

# Check if cached LOO exists
if (file.exists(hier_mod_loo_path)) {
  # Read cached LOO
  hier_mod_loo <- readRDS(hier_mod_loo_path)
} else {
  # Compute LOO if not cached
  hier_mod_loo <- loo(hier_mod)
  
  # Save LOO
  saveRDS(hier_mod_loo, hier_mod_loo_path)
}

print(hier_mod_loo$estimates)
         Estimate    SE
elpd_loo   -15263 46.98
p_loo          19  0.17
looic       30526 93.96

The multilevel model (MLM) appears to have fit without issue and recovers the underlying data reasonably well. Now we will extract population-level predictions using re_formula = NA, which sets aside the item-specific (random) effects and effectively gives us estimates for the latent “fear of police” construct. Then, we can calculate and plot race-specific posterior probabilities and black-white contrasts for each response category.

Show code
# Prediction grid 
nd <- tibble(
  race = factor(c("White", "Black"))
)

# Get predictions, marginalizing over random effects
mlm_preds <- add_epred_draws(hier_mod, 
                            newdata = nd, 
                            re_formula = NA) %>%
  ungroup()

# Get predicted probabilities for latent fear
mlm_probs <- mlm_preds %>%
  group_by(race, .category) %>%
  summarise(
    p = mean(.epred),
    ll = quantile(.epred, .025),
    ul = quantile(.epred, .975)
  )

# Calculate contrasts for latent fear
mlm_contrasts <- mlm_preds %>%
  dplyr::select(.draw, .category, race, .epred) %>%
  pivot_wider(names_from = race, values_from = .epred) %>%
  mutate(diff = Black - White) %>%
  group_by(.category) %>%
  summarise(
    con = mean(diff),
    ll = quantile(diff, .025),
    ul = quantile(diff, .975)
  )

Let’s plot MLM estimates.

Show code
# Predicted probabilities plot
p1 <- ggplot(mlm_probs, 
       aes(x = .category, y = p, ymin = ll, ymax = ul, 
           color = race)) +
  geom_pointrange(position = position_dodge(width = 1/3), 
                 size = 2.5,
                 linewidth = 2, 
                 fatten = 1) +  
  scale_color_manual(NULL, 
                    values = c("Black" = "#100B2DFF", 
                              "White" = "#FEC488FF")) +
  scale_shape_manual(NULL, values = c(16, 16)) +
  scale_y_continuous(limits = c(0, .61), 
                    breaks = scales::breaks_pretty()) +
  labs(title = "A. Predicted Category Probabilities by Race",
       x = "Response Category",
       y = "Predicted Probability") +
  theme_minimal() +
  theme(
    legend.background = element_blank(),
    legend.position = "bottom",
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black")
  )

# Contrast plot
p2 <- ggplot(mlm_contrasts, aes(x = .category, y = con)) +
  geom_hline(yintercept = 0, color = "black") +
  geom_pointrange(aes(ymin = ll, ymax = ul), 
                 size = 3/4,
                 linewidth = 1,
                 color = "#B2357BFF") +
  labs(title = "B. Black-White Contrasts in Predicted Probabilities",
       x = "Response Category",
       y = "Contrast") +
  theme_minimal() +
  theme(
    legend.background = element_blank(),
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black")
  )

# Combine plots
p1 + p2 + 
  plot_annotation(
    title = "MLM Estimates of Latent Fear",
    theme = theme(
      plot.title = element_text(size = 12)
    ))

Most of the 95% CrI error bars are relatively small, implying relatively precise estimates. Because we used re_formula = NA, these are population-level predictions, so the intervals reflect posterior uncertainty in the thresholds and the race coefficient rather than item-to-item variation. The narrow intervals likely also reflect a relatively large sample size (N=1,020), the pooling of information across all ten items, and the items showing relatively consistent effects (i.e., small variation in random effects; recall the clustering of estimates in Figure 4).
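If we instead wanted intervals that also fold in item-to-item variation, we could condition on the item-level random effects and then average over items within each posterior draw. A minimal sketch (not run above), assuming dat_long and hier_mod as defined earlier:

Show code
# Sketch: predictions that condition on the (race | outcome) random effects,
# for comparison with the population-level (re_formula = NA) estimates above
nd_items <- tidyr::expand_grid(
  race = factor(c("White", "Black")),
  outcome = unique(dat_long$outcome)
)

mlm_preds_items <- add_epred_draws(hier_mod,
                                   newdata = nd_items,
                                   re_formula = NULL) %>%  # include random effects
  ungroup()

# Average over items within each draw, then summarize across draws;
# wider intervals here than above would indicate how much item-level
# variation contributes to the overall uncertainty
mlm_preds_items %>%
  group_by(race, .category, .draw) %>%
  summarise(p = mean(.epred), .groups = "drop") %>%
  group_by(race, .category) %>%
  summarise(
    est = mean(p),
    ll  = quantile(p, .025),
    ul  = quantile(p, .975),
    .groups = "drop"
  )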

Let’s pull all this together.

11.3.1 FIG6

Show code
# For mlm predictions, get full posterior for violin plots
mlm_preds_violin <- mlm_preds %>%
  group_by(race, .category, .draw) %>%
  summarise(
    p = mean(.epred),
    .groups = "drop"
  )

# Combined predictions plot
p1 <- ggplot() +
  # Individual model estimates (semi-transparent, jittered)
  geom_pointrange(data = race_probs,
                 aes(x = .category, y = p, ymin = ll, ymax = ul, 
                     color = race),
                 position = position_jitterdodge(
                   jitter.width = 0.2,
                   jitter.height = 0,
                   dodge.width = 0.4,
                   seed = 1138
                   ), 
                 alpha = 0.8,
                 linewidth = 1,
                 shape = 3) +
  # MLM estimates as violin plots
  geom_violin(data = mlm_preds_violin,
             aes(x = .category, y = p, fill = race),
             position = position_dodge(width = 0.4),
             alpha = 0.2,
             scale = "width") +
  scale_color_manual(NULL, 
                    values = c("Black" = "#100B2DFF", 
                              "White" = "#FEC488FF")) +
  scale_fill_manual(NULL,
                   values = c("Black" = "#100B2DFF", 
                             "White" = "#FEC488FF")) +
  scale_y_continuous(limits = c(0, .61), 
                    breaks = scales::breaks_pretty()) +
  labs(title = "Predicted Category Probabilities by Race",
       subtitle = "Semi-transparent intervals show individual items; violin plots show MLM estimates",
       x = "Response Category",
       y = "Predicted Probability") +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"),
    legend.background = element_blank(),
    legend.position = "bottom",
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black"), 
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 14),
    legend.text = element_text(size = 14)
  )

# Similar approach for contrasts
mlm_contrasts_violin <- mlm_preds %>%
  dplyr::select(.draw, race, .category, .epred) %>%
  pivot_wider(names_from = race, values_from = .epred) %>%
  mutate(diff = Black - White) %>%
  group_by(.category, .draw) %>%
  summarise(
    con = mean(diff),
    .groups = "drop"
  )

p2 <- ggplot() +
  # Individual model contrasts
  geom_pointrange(data = race_contrasts,
                 aes(x = .category, y = con, ymin = ll, ymax = ul),
                 position = position_jitter(width = 0.2),
                 alpha = 0.8,
                 linewidth = 1,
                 color = "#B2357BFF") +
  # MLM contrasts as violin
  geom_violin(data = mlm_contrasts_violin,
             aes(x = .category, y = con),
             fill = "#B2357BFF",
             alpha = 0.2) +
  geom_hline(yintercept = 0, color = "black") +
  labs(title = "Black-White Contrasts in Predicted Probabilities",
       subtitle = "Semi-transparent intervals show individual items; violin plots show MLM estimates",
       x = "Response Category",
       y = "Contrast") +
  theme_minimal() +
  theme(
    text = element_text(family = "serif"),
    legend.background = element_blank(),
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black"),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 14),
    legend.text = element_text(size = 14)
  )

# Combine plots
p1 + p2 + 
  plot_annotation(
    title = "FIGURE 6: Comparing Individual Item and MLM Estimates",
    theme = theme(
      plot.title = element_text(size = 12)
    ))

Overall, the item-specific models generate probability and contrast estimates that cluster closely across the ordinal response distribution, which is what we would expect to see if the individual items indeed measured a latent “fear of police” attitudinal construct. Two notable exceptions emerge: there is substantial heterogeneity across items in the predicted probabilities of “very unafraid” for Whites and of “very afraid” for Blacks, which results in item-specific heterogeneity in the predicted contrasts for these ordinal response categories.

Likewise, the multilevel model estimates are very precise where underlying item-specific predictions are closely clustered (e.g., for 8 of 10 predicted probabilities and 3 of 5 contrasts), while uncertainty intervals are wider where there is more variation in item-specific predictions.

Examination of observed proportions in these extreme columns (Appendix A) provides a clue as to the sources of heterogeneity and uncertain predictions.

First, a larger proportion of White participants report being very unafraid (y=0) of physically intensive (and “objectively” scarier) police use of force actions, while a smaller proportion report being very unafraid of more common and less physically intrusive actions. This pattern suggests that the fear items may in part be measuring perceived probability of experiencing these actions, at least for White participants.

Second, and in direct contrast to these findings for Whites, a larger proportion of Black participants report being very afraid (y=4) of the physically intensive (and “objectively” scarier) police use of force actions, while a smaller proportion report being very afraid of more common and less physically intrusive actions.

Collectively, these patterns reveal important race-specific and outcome-specific complexities in racial differences in fear of police - patterns that we are unable to detect if we start by assuming the appropriateness of latent measurement and analysis.

11.4 Lin: Latent effects on normed mean scale

Show code
#DV: Personal Fear of Police - Normed Scale
dat <- dat %>% 
  mutate(
    fearpol = datawizard::row_means(pick(all_of(fearpol_new))),
    fearpol_norm = (fearpol - min(fearpol, na.rm = TRUE)) / 
      (max(fearpol, na.rm = TRUE) - min(fearpol, na.rm = TRUE)) * 100
  )
# summary(dat$fearpol)
# summary(dat$fearpol_norm)

# Model path for the cached model
model_path <- "Models/race_lin_mod_fearpol_norm.rds"

# Check if cached model exists
if (file.exists(model_path)) {
  race_lin_mod_fearpol_norm <- readRDS(model_path)
} else {
  # Linear model with race for normed fear of police scale
  race_lin_mod_fearpol_norm <- stan_glm(
    formula = fearpol_norm ~ race,
    data = dat,
    family = gaussian,
    chains = 4,
    cores = nCoresphys,
    seed = 1138
  )
  # Save model
  saveRDS(race_lin_mod_fearpol_norm, model_path)
}
Show code
pp_check(race_lin_mod_fearpol_norm)

Show code
pp_check(race_lin_mod_fearpol_norm, 
         plotfun = "stat_grouped", stat="mean", group="race")

Show code
pp_check(race_lin_mod_fearpol_norm, 
         plotfun = "stat_grouped", stat="median", group="race")

Show code
summary(race_lin_mod_fearpol_norm, 
        probs = c(0.025, 0.975), 
        digits = 2) 

Model Info:
 function:     stan_glm
 family:       gaussian [identity]
 formula:      fearpol_norm ~ race
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 1020
 predictors:   2

Estimates:
              mean   sd    2.5%   97.5%
(Intercept) 32.17   1.32 29.55  34.76  
raceBlack   31.65   1.83 28.19  35.28  
sigma       29.47   0.65 28.22  30.79  

Fit Diagnostics:
           mean   sd    2.5%   97.5%
mean_PPD 48.18   1.32 45.62  50.80  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
              mcse Rhat n_eff
(Intercept)   0.02 1.00 3698 
raceBlack     0.03 1.00 3720 
sigma         0.01 1.00 3980 
mean_PPD      0.02 1.00 3916 
log-posterior 0.03 1.00 1716 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Show code
prior_summary(race_lin_mod_fearpol_norm)
Priors for model 'race_lin_mod_fearpol_norm' 
------
Intercept (after predictors centered)
  Specified prior:
    ~ normal(location = 48, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 48, scale = 84)

Coefficients
  Specified prior:
    ~ normal(location = 0, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 0, scale = 167)

Auxiliary (sigma)
  Specified prior:
    ~ exponential(rate = 1)
  Adjusted prior:
    ~ exponential(rate = 0.03)
------
See help('prior_summary.stanreg') for more details

Before interpreting, let’s add an exact frequentist replication.

Show code
# Exact frequentist replication
library(sandwich)
library(lmtest)
ols_model <- lm(fearpol_norm ~ race, data = dat)
robust_results <- coeftest(ols_model, vcov = vcovHC(ols_model, type = "HC1"))

# This should yield very close to our Bayesian linear estimate & Pickett et al.'s 32.088 estimate
robust_results

t test of coefficients:

            Estimate Std. Error t value            Pr(>|t|)    
(Intercept)    32.19       1.33    24.2 <0.0000000000000002 ***
raceBlack      31.62       1.84    17.1 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

After averaging across the ten fear items and rescaling to 0-100, our Bayesian linear model shows that Black respondents reported substantially higher fear of police than White respondents. Specifically, the raceBlack coefficient of 31.65 (95% CI: 28.19, 35.28) represents the estimated mean difference between Black and White respondents on the normed fear scale (0-100), which means that Black respondents scored an average of about 32 points higher than White respondents, whose mean score was also about 32 points.

However, remember the following limitations in interpreting this coefficient:

  1. While the rescaling to 0-100 makes the coefficient seem more interpretable (e.g., “32 percentage points”), this interpretation assumes equal intervals between original response categories.

  2. The scale transformation doesn’t address the ordinal nature of the underlying items - the distance between “very unafraid” and “unafraid” may not equal the distance between “unafraid” and “neither.”

  3. The linear model assumes normally distributed errors around predicted means, which may not be appropriate for bounded scores derived from ordinal items.

What if we were to back-transform these predictions from the 0-100 scale to the original 0-4 scale?

For White respondents:

  • Normed prediction: 32.17 (fundamentally uninterpretable quantity)
  • Back-transformed: (32.17/100) * 4 = 1.29 on 0-4 scale

For Black respondents:

  • Normed prediction: 32.17 + 31.65 = 63.82 (again, uninterpretable quantity)
  • Back-transformed: (63.82/100) * 4 = 2.55 on 0-4 scale
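
These two back-transformations are simple arithmetic; a minimal sketch in R, plugging in the posterior means reported above:

Show code
# Sketch: back-transform the normed (0-100) posterior mean predictions
# to the original 0-4 ordinal scale
b0 <- 32.17                        # intercept (White mean, normed scale)
b1 <- 31.65                        # raceBlack coefficient (normed scale)

white_04 <- (b0 / 100) * 4         # approximately 1.29
black_04 <- ((b0 + b1) / 100) * 4  # approximately 2.55
c(White = white_04, Black = black_04)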

So on the original ordinal scale where:

0 = very unafraid; 1 = unafraid; 2 = neither afraid nor unafraid; 3 = afraid; 4 = very afraid

The model predicts that on average:

  • White respondents fall between “unafraid” and “neither” (closer to “unafraid”)
  • Black respondents fall between “neither” and “afraid” (closer to “afraid”)

This back-transformation helps illustrate what these predictions mean in terms of mean values on the original response categories, though we should still note the limitations of treating ordinal responses as continuous. After all, what do these mean values even mean? As noted earlier, precise response category predictions from the item-specific linear models differed substantially from the ordinal predictions, and the linear model estimates consistently failed to recover the underlying data where the ordinal estimates reasonably succeeded. Now, let’s compare precise model-based category predictions for “latent” fear of police, focusing on race-specific predicted probabilities of reportedly being “very afraid” of police overall.

11.4.1 Comparing MLM and linear latent model predictions for “very afraid”

What are the predicted probabilities and contrasts for “very afraid” implied by the linear model? This is not a question with a straightforward or obvious answer. Let’s consider two different answers using either the precise “very afraid” category cutoff (y>=4) or a more generous cutoff falling between “afraid” and “very afraid” (y>=3.5).

First, let’s calculate the model-implied probability of scoring at or above “very afraid” (>=4 on the original scale, which corresponds to 100 on the normed scale).

Show code
get_prob_contrasts_lin_fearpol_norm <- function(model) {
  # Get posterior draws
  draws <- as_draws_df(model)
  
  # Use 100 (=4) as the cutoff for "very afraid"
  p_white <- 1 - pnorm(100, 
                       mean = draws$`(Intercept)`, 
                       sd = draws$sigma)
  p_black <- 1 - pnorm(100, 
                       mean = draws$`(Intercept)` + draws$raceBlack, 
                       sd = draws$sigma)
  
  # Summarize probabilities
  probs <- data.frame(
    race = c("White", "Black"),
    prob = c(mean(p_white), mean(p_black)),
    lower = c(quantile(p_white, 0.025), quantile(p_black, 0.025)),
    upper = c(quantile(p_white, 0.975), quantile(p_black, 0.975))
  )
  
  # Calculate contrasts from each draw
  contrasts <- p_black - p_white
  
  contrast_summary <- data.frame(
    estimate = mean(contrasts),
    lower = quantile(contrasts, 0.025),
    upper = quantile(contrasts, 0.975)
  )
  
  list(probabilities = probs, contrasts = contrast_summary)
}

# Run function on cached model
fearpol_norm_results <- get_prob_contrasts_lin_fearpol_norm(race_lin_mod_fearpol_norm)

fearpol_norm_results$probabilities %>% gt()
race prob lower upper
White 0.011 0.0074 0.015
Black 0.110 0.0917 0.131
Show code
fearpol_norm_results$contrasts %>% gt()
estimate lower upper
0.099 0.082 0.12

Now, let’s try a more generous cutoff: a normed score equivalent to >=3.5 on the original scale, since a respondent must have reported being very afraid on at least half of the ten items to have a score that high.

Show code
# Mapping 3.5 to the normed scale
# Original scale: 0 to 4
# Normed scale: 0 to 100
# 3.5 / 4 * 100 = 87.5 (equivalent point on 0 to 100 scale)

get_prob_contrasts_lin_fearpol_norm <- function(model) {
  # Get posterior draws
  draws <- as_draws_df(model)
  
  # Use 87.5 as generous cutoff for "very afraid"
  p_white <- 1 - pnorm(87.5, 
                       mean = draws$`(Intercept)`, 
                       sd = draws$sigma)
  p_black <- 1 - pnorm(87.5, 
                       mean = draws$`(Intercept)` + draws$raceBlack, 
                       sd = draws$sigma)
  
  # Summarize probabilities
  probs <- data.frame(
    race = c("White", "Black"),
    prob = c(mean(p_white), mean(p_black)),
    lower = c(quantile(p_white, 0.025), quantile(p_black, 0.025)),
    upper = c(quantile(p_white, 0.975), quantile(p_black, 0.975))
  )
  
  # Calculate contrasts from each draw
  contrasts <- p_black - p_white
  
  contrast_summary <- data.frame(
    estimate = mean(contrasts),
    lower = quantile(contrasts, 0.025),
    upper = quantile(contrasts, 0.975)
  )
  
  list(probabilities = probs, contrasts = contrast_summary)
}

# Run the function on cached model
fearpol_norm_results2 <- get_prob_contrasts_lin_fearpol_norm(race_lin_mod_fearpol_norm)

fearpol_norm_results2$probabilities %>% gt()
race prob lower upper
White 0.03 0.023 0.04
Black 0.21 0.185 0.24
Show code
fearpol_norm_results2$contrasts %>% gt()
estimate lower upper
0.18 0.16 0.21
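
Since the two helper functions above differ only in the cutoff, the duplication could be avoided with a single parameterized helper. A minimal sketch of such a (hypothetical) refactor, equivalent to the code above:

Show code
# Sketch: one helper with the normed-scale cutoff as an argument
get_prob_contrasts_lin <- function(model, cutoff) {
  draws <- as_draws_df(model)
  
  p_white <- 1 - pnorm(cutoff, mean = draws$`(Intercept)`, sd = draws$sigma)
  p_black <- 1 - pnorm(cutoff, mean = draws$`(Intercept)` + draws$raceBlack,
                       sd = draws$sigma)
  diffs <- p_black - p_white
  
  list(
    probabilities = data.frame(
      race  = c("White", "Black"),
      prob  = c(mean(p_white), mean(p_black)),
      lower = c(quantile(p_white, 0.025), quantile(p_black, 0.025)),
      upper = c(quantile(p_white, 0.975), quantile(p_black, 0.975))
    ),
    contrasts = data.frame(
      estimate = mean(diffs),
      lower    = quantile(diffs, 0.025),
      upper    = quantile(diffs, 0.975)
    )
  )
}

# Equivalent to the "precise" and "generous" results above:
# get_prob_contrasts_lin(race_lin_mod_fearpol_norm, cutoff = 100)
# get_prob_contrasts_lin(race_lin_mod_fearpol_norm, cutoff = 87.5)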

How do these compare with estimates from the ordinal MLM model?

Show code
mlm_probs %>%
  dplyr::filter(.category == 4) %>%
  dplyr::select(race, p, ll, ul) %>% gt()
p ll ul
Black
0.328 0.300 0.356
White
0.076 0.067 0.087
Show code
mlm_contrasts %>%
  dplyr::filter(.category == 4) %>%
  dplyr::select(con, ll, ul) %>% gt()
con ll ul
0.25 0.22 0.28

Let’s pull all these together into one table.

Show code
# Prepare linear model results (precise = 100 threshold)
lin_mod_100 <- bind_rows(
  fearpol_norm_results$probabilities %>% 
    mutate(
      model = "Linear Model (100)", 
      estimate_group = case_when(
        race == "Black" ~ "P(\"Very Afraid\"|Black)",
        race == "White" ~ "P(\"Very Afraid\"|White)"
      )
    ) %>% 
    dplyr::select(model, estimate_group, race, prob, lower, upper),
  fearpol_norm_results$contrasts %>% 
    mutate(
      model = "Linear Model (100)", 
      estimate_group = "P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)",
      race = "Black - White",
      prob = estimate,
      lower = fearpol_norm_results$contrasts$lower,
      upper = fearpol_norm_results$contrasts$upper
    ) %>% 
    dplyr::select(model, estimate_group, race, prob, lower, upper)
)

# Prepare linear model results (generous = 87.5 threshold)
lin_mod_87.5 <- bind_rows(
  fearpol_norm_results2$probabilities %>% 
    mutate(
      model = "Linear Model (87.5)", 
      estimate_group = case_when(
        race == "Black" ~ "P(\"Very Afraid\"|Black)",
        race == "White" ~ "P(\"Very Afraid\"|White)"
      )
    ) %>% 
    dplyr::select(model, estimate_group, race, prob, lower, upper),
  fearpol_norm_results2$contrasts %>% 
    mutate(
      model = "Linear Model (87.5)", 
      estimate_group = "P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)",
      race = "Black - White",
      prob = estimate,
      lower = fearpol_norm_results2$contrasts$lower,
      upper = fearpol_norm_results2$contrasts$upper
    ) %>% 
    dplyr::select(model, estimate_group, race, prob, lower, upper)
)

# Prepare multilevel model results
mlm_probs_results <- mlm_probs %>%
  dplyr::filter(.category == 4) %>%
  mutate(
    model = "Multilevel Model", 
    estimate_group = case_when(
      race == "Black" ~ "P(\"Very Afraid\"|Black)",
      race == "White" ~ "P(\"Very Afraid\"|White)"
    )
  ) %>%
  dplyr::select(model, estimate_group, race, prob = p, lower = ll, upper = ul)

mlm_contrasts_result <- mlm_contrasts %>%
  dplyr::filter(.category == 4) %>%
  mutate(
    model = "Multilevel Model", 
    estimate_group = "P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)",
    race = "Black - White",
    prob = con,
    lower = ll,
    upper = ul
  ) %>% 
  dplyr::select(model, estimate_group, race, prob, lower, upper)

# Combine all results
combined_results <- bind_rows(
  lin_mod_100,
  lin_mod_87.5,
  mlm_probs_results,
  mlm_contrasts_result
)

# Create gt table
gt_table <- combined_results %>%
  gt(
    groupname_col = "model",
    rowname_col = "estimate_group"
  ) %>%
  fmt_number(
    columns = c(prob, lower, upper),
    decimals = 3
  ) %>%
  cols_label(
    race = "Race",
    prob = "Estimate",
    lower = "Lower",
    upper = "Upper"
  ) %>%
  tab_spanner(
    label = "95% CrI",
    columns = c(lower, upper)
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) %>%
  tab_header(
    title = "Comparative Fear of Police Model Results"
  )

gt_table
Comparative Fear of Police Model Results
Race Estimate
95% CrI
Lower Upper
Linear Model (100)
P("Very Afraid"|White) White 0.011 0.007 0.015
P("Very Afraid"|Black) Black 0.110 0.092 0.131
P("Very Afraid"|Black) - P("Very Afraid"|White) Black - White 0.099 0.082 0.119
Linear Model (87.5)
P("Very Afraid"|White) White 0.030 0.023 0.040
P("Very Afraid"|Black) Black 0.211 0.185 0.239
P("Very Afraid"|Black) - P("Very Afraid"|White) Black - White 0.181 0.156 0.208
Multilevel Model
P("Very Afraid"|Black) Black 0.328 0.300 0.356
P("Very Afraid"|White) White 0.076 0.067 0.087
P("Very Afraid"|Black) - P("Very Afraid"|White) Black - White 0.251 0.218 0.284

Let’s tweak the formatting.

While we are at it, let’s add observed proportions too. First, we will calculate the race-specific proportion of “very afraid” (y=4) responses for each item. Next, we will calculate the mean of the 10 proportions (one per item) for each race group. This gives us an average race-specific proportion of “very afraid” responses across all items. Finally, we will calculate the black-white contrast in these proportions.

11.4.2 TAB2

Show code
# Prepare linear model results (precise = 100 threshold)
lin_mod_100 <- bind_rows(
  fearpol_norm_results$probabilities %>% 
    mutate(
      model = "Linear: Precise (y>=4; normed >= 100)", 
      estimate_group = case_when(
        race == "Black" ~ "Pred. Probability: P(\"Very Afraid\"|Black)",
        race == "White" ~ "Pred. Probability: P(\"Very Afraid\"|White)"
      )
    ) %>% 
    dplyr::select(model, estimate_group, prob, lower, upper),
  fearpol_norm_results$contrasts %>% 
    mutate(
      model = "Linear: Precise (y>=4; normed >= 100)", 
      estimate_group = "Contrast: P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)",
      prob = estimate,
      lower = fearpol_norm_results$contrasts$lower,
      upper = fearpol_norm_results$contrasts$upper
    ) %>% 
    dplyr::select(model, estimate_group, prob, lower, upper)
)

# Prepare linear model results (generous = 87.5 threshold)
lin_mod_87.5 <- bind_rows(
  fearpol_norm_results2$probabilities %>% 
    mutate(
      model = "Linear: Generous (y>=3.5, normed >=87.5)", 
      estimate_group = case_when(
        race == "Black" ~ "Pred. Probability: P(\"Very Afraid\"|Black)",
        race == "White" ~ "Pred. Probability: P(\"Very Afraid\"|White)"
      )
    ) %>% 
    dplyr::select(model, estimate_group, prob, lower, upper),
  fearpol_norm_results2$contrasts %>% 
    mutate(
      model = "Linear: Generous (y>=3.5, normed >=87.5)", 
      estimate_group = "Contrast: P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)",
      prob = estimate,
      lower = fearpol_norm_results2$contrasts$lower,
      upper = fearpol_norm_results2$contrasts$upper
    ) %>% 
    dplyr::select(model, estimate_group, prob, lower, upper)
)

# Prepare multilevel model results
mlm_probs_results <- mlm_probs %>%
  ungroup() %>%
  dplyr::filter(.category == 4) %>%
  mutate(
    model = "Ordinal MLM (y=4)", 
    estimate_group = case_when(
      race == "Black" ~ "Pred. Probability: P(\"Very Afraid\"|Black)",
      race == "White" ~ "Pred. Probability: P(\"Very Afraid\"|White)"
    )
  ) %>%
  dplyr::select(model, estimate_group, prob = p, lower = ll, upper = ul)

mlm_contrasts_result <- mlm_contrasts %>%
  dplyr::filter(.category == 4) %>%
  mutate(
    model = "Ordinal MLM (y=4)", 
    estimate_group = "Contrast: P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)",
    prob = con,
    lower = ll,
    upper = ul
  ) %>% 
  dplyr::select(model, estimate_group, prob, lower, upper)

# Calculate observed proportions
obs_props <- dat %>%
  # For each outcome and race
  group_by(race) %>%
  summarize(
    across(all_of(fearpol_new), ~mean(. == 4)),
    .groups = "drop"
  ) %>%
  # Get mean across all items for each race
  mutate(
    prob = rowMeans(dplyr::select(., all_of(fearpol_new)))
  ) %>%
  # Format to match other results
  mutate(
    model = "Observed Proportions",
    estimate_group = case_when(
      race == "Black" ~ "Pred. Probability: P(\"Very Afraid\"|Black)",
      race == "White" ~ "Pred. Probability: P(\"Very Afraid\"|White)"
    ),
    lower = NA,
    upper = NA
  ) %>%
  dplyr::select(model, estimate_group, prob, lower, upper)

# Calculate observed contrast
black_prop <- obs_props %>% 
  dplyr::filter(estimate_group == "Pred. Probability: P(\"Very Afraid\"|Black)") %>% 
  pull(prob)
white_prop <- obs_props %>% 
  dplyr::filter(estimate_group == "Pred. Probability: P(\"Very Afraid\"|White)") %>% 
  pull(prob)

obs_contrast <- data.frame(
  model = "Observed Proportions",
  estimate_group = "Contrast: P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)",
  prob = black_prop - white_prop,
  lower = NA,
  upper = NA
)

# Combine all results
combined_results <- bind_rows(
  mlm_probs_results,
  lin_mod_100,
  lin_mod_87.5,
  mlm_contrasts_result
)

# Add observed results
combined_results_with_obs <- bind_rows(
  combined_results,
  obs_props,
  obs_contrast
) %>%
  # Include "observed" latent results
  mutate(
    model = factor(model, 
                  levels = c("Observed Proportions",
                           "Ordinal MLM (y=4)",
                           "Linear: Precise (y>=4; normed >= 100)",
                           "Linear: Generous (y>=3.5, normed >=87.5)")),
    estimate_group = factor(estimate_group,
                          levels = c("Pred. Probability: P(\"Very Afraid\"|Black)",
                                   "Pred. Probability: P(\"Very Afraid\"|White)",
                                   "Contrast: P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)"))
  )

# Create gt table
gt_table <- combined_results_with_obs %>%
  arrange(estimate_group, model) %>%
  gt(
    groupname_col = "estimate_group",
    rowname_col = "model"
  ) %>%
  fmt_number(
    columns = c(prob, lower, upper),
    decimals = 3
  ) %>%
  cols_label(
    prob = "Estimate",
    lower = "Lower",
    upper = "Upper"
  ) %>%
  tab_spanner(
    label = "95% CrI",
    columns = c(lower, upper)
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) %>%
  tab_style(
    style = cell_text(align = "left"),
    locations = cells_stub()
  ) %>%
  tab_header(
    title = "TABLE 2. Comparing Ordinal & Linear Model Estimates of 'Very Afraid' of Police"
  ) %>%
  sub_missing(
    columns = everything(),
    rows = everything(),
    missing_text = "--"
) %>% 
    tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_row_groups(groups = everything())
  )

gt_table
TABLE 2. Comparing Ordinal & Linear Model Estimates of 'Very Afraid' of Police
Estimate
95% CrI
Lower Upper
Pred. Probability: P("Very Afraid"|Black)
Observed Proportions 0.321
Ordinal MLM (y=4) 0.328 0.300 0.356
Linear: Precise (y>=4; normed >= 100) 0.110 0.092 0.131
Linear: Generous (y>=3.5, normed >=87.5) 0.211 0.185 0.239
Pred. Probability: P("Very Afraid"|White)
Observed Proportions 0.086
Ordinal MLM (y=4) 0.076 0.067 0.087
Linear: Precise (y>=4; normed >= 100) 0.011 0.007 0.015
Linear: Generous (y>=3.5, normed >=87.5) 0.030 0.023 0.040
Contrast: P("Very Afraid"|Black) - P("Very Afraid"|White)
Observed Proportions 0.235
Ordinal MLM (y=4) 0.251 0.218 0.284
Linear: Precise (y>=4; normed >= 100) 0.099 0.082 0.119
Linear: Generous (y>=3.5, normed >=87.5) 0.181 0.156 0.208

Note that none of the 95% credible intervals from the “precise” or “generous” linear model estimates, whether for the race-specific “very afraid” probabilities or for the black-white probability contrast, contain the observed proportions. In contrast, all three of the 95% CrIs from the ordinal model estimates contain the corresponding observed value.

12 Supplemental Analyses

12.1 Unequal group variances

The cumulative probit models estimated above assume equal group variances (i.e., equal variances for the White and Black response distributions). Sometimes this assumption is not appropriate, and it can result in poor recovery of the underlying ordinal data. Below, we modify the specification of our cumulative probit multilevel model to include a variance structure intended to allow group variances to differ by race. In brms, we do this by adding a race-level variance component with the ~ (1 | race) syntax in the random-effects specification.

Show code
# Supplemental analysis allowing unequal variances by race
bf_fear_unequal_var <- bf(fear | thres(4) ~ race + (1 | race) + (race | outcome))

# Modify priors to include the new variance structure
priors_hier_unequal_var <- c(
  # Fixed effect priors (same as before)
  prior(normal(-0.84, 1), class = "Intercept", coef = "1"),
  prior(normal(-0.25, 1), class = "Intercept", coef = "2"),
  prior(normal(0.25, 1), class = "Intercept", coef = "3"),
  prior(normal(0.84, 1), class = "Intercept", coef = "4"),
  prior(normal(0, 1), class = "b"),
  # Random effects priors
  prior(exponential(1), class = "sd"),
  prior(lkj(2), class = "cor")  # Prior for correlation of random effects
)

# Fit model with unequal variances
hier_mod_unequal_var <- brm(
  bf_fear_unequal_var,
  data = dat_long,
  family = cumulative("probit"),
  prior = priors_hier_unequal_var,
  cores = nCoresphys,
  chains = 4,
  iter = 2000,
  warmup = 1000,
  seed = 1138,
  control = list(adapt_delta = 0.95),
  file = "Models/hier_fear_fit_unequal_var",
  file_refit = "on_change"
)

This model produced 49 divergent transitions. We should increase warmup and the number of post-warmup draws, and perhaps raise adapt_delta to 0.99.

Show code
# coefgrp <- c("raceBlack")
mcmc_plot(hier_mod_unequal_var, 
          # variable = coefgrp, 
          regex = TRUE, 
          prob = 0.80, prob_outer = 0.95) + 
  labs(title = "Coefficient plot", 
       subtitle = "Posterior intervals for all beta coefficients 
       \nwith medians, 80%, and 95% intervals")

Show code
summary(hier_mod_unequal_var, 
        probs = c(0.025, 0.975), 
        digits = 2) 
 Family: cumulative 
  Links: mu = probit; disc = identity 
Formula: fear | thres(4) ~ race + (1 | race) + (race | outcome) 
   Data: dat_long (Number of observations: 10200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~outcome (Number of levels: 10) 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept)                0.08      0.03     0.04     0.15 1.00     1530
sd(raceBlack)                0.17      0.05     0.10     0.29 1.00     1354
cor(Intercept,raceBlack)    -0.82      0.16    -0.99    -0.39 1.00     1424
                         Tail_ESS
sd(Intercept)                2237
sd(raceBlack)                2162
cor(Intercept,raceBlack)     2203

~race (Number of levels: 2) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.51      0.48     0.01     1.84 1.00     1529     2013

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -0.48      0.41    -1.54     0.21 1.00     2064     1757
Intercept[2]     0.07      0.41    -0.99     0.75 1.00     2080     1782
Intercept[3]     0.72      0.41    -0.34     1.41 1.00     2083     1710
Intercept[4]     1.27      0.41     0.21     1.95 1.00     2093     1688
raceBlack        0.71      0.59    -0.74     1.72 1.00     2079     2133

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Show code
prior_summary(hier_mod_unequal_var)
                prior     class      coef   group resp dpar nlpar lb ub
         normal(0, 1)         b                                        
         normal(0, 1)         b raceBlack                              
 student_t(3, 0, 2.5) Intercept                                        
     normal(-0.84, 1) Intercept         1                              
     normal(-0.25, 1) Intercept         2                              
      normal(0.25, 1) Intercept         3                              
      normal(0.84, 1) Intercept         4                              
 lkj_corr_cholesky(2)         L                                        
 lkj_corr_cholesky(2)         L           outcome                      
       exponential(1)        sd                                    0   
       exponential(1)        sd           outcome                  0   
       exponential(1)        sd Intercept outcome                  0   
       exponential(1)        sd raceBlack outcome                  0   
       exponential(1)        sd              race                  0   
       exponential(1)        sd Intercept    race                  0   
       source
         user
 (vectorized)
      default
         user
         user
         user
         user
         user
 (vectorized)
         user
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
Show code
# Posterior predictive checks
pp_check(hier_mod_unequal_var, ndraws = 100) +
  theme_minimal()

Show code
mcmc_plot(hier_mod_unequal_var, type = "dens_overlay") +
  theme_minimal()

Show code
pp_check(
  hier_mod_unequal_var, 
  type = "bars", 
  ndraws = 1000, 
  linewidth = 1/2, 
  size = 1/4) +
  labs(subtitle = "MLM latent fear") 

Show code
pp_check(
  hier_mod_unequal_var, 
  type = "bars_grouped",
  group = "race",
  ndraws = 1000, 
  linewidth = 1/2, 
  size = 1/4) +
  labs(subtitle = "MLM latent fear")

Show code
mcmc_plot(hier_mod_unequal_var, type = "trace") +
  theme_minimal()

Show code
# Rhat values
rhats <- rhat(hier_mod_unequal_var)
print("Rhat values > 1.01:")
[1] "Rhat values > 1.01:"
Show code
print(rhats[rhats > 1.01])
<NA> 
  NA 
Show code
# Effective sample sizes
neff <- neff_ratio(hier_mod_unequal_var)
print("ESS ratios < 0.1:")
[1] "ESS ratios < 0.1:"
Show code
print(neff[neff < 0.1])
<NA> 
  NA 
Show code
# Check random effects
ranef(hier_mod_unequal_var)
$outcome
, , Intercept

          Estimate Est.Error    Q2.5   Q97.5
handcyou     0.041     0.047 -0.0498  0.1348
kickyou     -0.049     0.046 -0.1440  0.0391
killyou     -0.093     0.049 -0.1939 -0.0014
pinyou      -0.037     0.047 -0.1306  0.0544
searchyou    0.059     0.047 -0.0323  0.1530
shootyou    -0.083     0.048 -0.1798  0.0077
sprayyou    -0.027     0.046 -0.1232  0.0596
stopyou      0.092     0.049  0.0015  0.1962
taseyou     -0.036     0.047 -0.1293  0.0551
yellyou      0.128     0.049  0.0389  0.2310

, , raceBlack

          Estimate Est.Error   Q2.5  Q97.5
handcyou    -0.088     0.081 -0.250  0.067
kickyou      0.061     0.081 -0.098  0.219
killyou      0.230     0.084  0.075  0.405
pinyou       0.080     0.080 -0.074  0.241
searchyou   -0.178     0.081 -0.339 -0.024
shootyou     0.197     0.083  0.035  0.356
sprayyou     0.032     0.080 -0.128  0.192
stopyou     -0.141     0.081 -0.304  0.015
taseyou      0.072     0.081 -0.089  0.234
yellyou     -0.259     0.082 -0.435 -0.112


$race
, , Intercept

      Estimate Est.Error  Q2.5 Q97.5
White    -0.17      0.41 -1.22  0.52
Black     0.11      0.40 -0.68  1.08
Show code
# Define paths for cached LOO results
hier_mod_unequal_var_loo_path <- "Models/hier_mod_unequal_var_loo.rds"

# Check if cached LOO exists
if (file.exists(hier_mod_unequal_var_loo_path)) {
  # Read cached LOO
  hier_mod_unequal_var_loo <- readRDS(hier_mod_unequal_var_loo_path)
} else {
  # Compute LOO if not cached
  hier_mod_unequal_var_loo <- loo(hier_mod_unequal_var)
  
  # Save LOO
  saveRDS(hier_mod_unequal_var_loo, hier_mod_unequal_var_loo_path)
}

print(hier_mod_unequal_var_loo$estimates) 
         Estimate    SE
elpd_loo   -15263 46.99
p_loo          19  0.17
looic       30526 93.97

Let’s compare this model with the original multilevel model specifying equal variances using LOO (leave-one-out) cross-validation to see if allowing for unequal variances improves model fit.

Show code
# Compare models
loo_compare(hier_mod_loo, hier_mod_unequal_var_loo)
                     elpd_diff se_diff
hier_mod_unequal_var  0.0       0.0   
hier_mod             -0.2       0.1   

The model with unequal variances (hier_mod_unequal_var) and the original hierarchical model (hier_mod) perform almost identically in terms of predictive accuracy. The ELPD difference is only about twice the magnitude of its standard error, which suggests that allowing for unequal variances might capture slightly more variation in the data; however, an ELPD difference of -0.2 indicates the improvement over the simpler model is negligible. So, in this case, the added model complexity and computational cost are not worth it given the comparable predictive performance of the two models.
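
For completeness, another way to allow the latent scale to differ by race in a brms cumulative probit model, which we do not estimate here, is to model the disc parameter directly (disc is estimated on the log scale when predicted, and fixing the reference group at disc = 1 keeps the model identified). A minimal sketch of that alternative specification, reusing the data and settings above:

Show code
# Sketch only (not estimated here): let the latent standard deviation differ
# by race via the disc parameter; cmc = FALSE with no intercept fixes the
# first race category's disc at 1 for identification
bf_fear_disc <- bf(fear | thres(4) ~ race + (race | outcome)) +
  lf(disc ~ 0 + race, cmc = FALSE)

# hier_mod_disc <- brm(
#   bf_fear_disc,
#   data = dat_long,
#   family = cumulative("probit"),
#   cores = nCoresphys, chains = 4, iter = 2000, warmup = 1000,
#   seed = 1138, control = list(adapt_delta = 0.95)
# )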

12.2 Category-specific (multinomial) models (supplement)

Goal: Briefly introduce the application of category-specific (multinomial logit) regression techniques for even more accurate recovery of the underlying response distributions (for online supplement only).

Show code
# Prepare data (similar to before, but now as a categorical outcome)
dat_long <- dat %>%
  pivot_longer(
    all_of(fearpol_new),  # Use vector of 10 fear items
    names_to = "outcome",
    values_to = "fear"
  ) %>%
  mutate(
    fear = factor(fear, ordered = FALSE),  # Key change: NOT ordered
    outcome = factor(outcome)  # Factor for random effects grouping
  )

# Formula for category-specific probit model
bf_fear_categorical <- bf(fear ~ race + (1 | outcome))


# Fit categorical model
cat_mod <- brm(
  bf_fear_categorical,
  data = dat_long,
  family = categorical("logit"),  
  cores = nCoresphys,
  chains = 4,
  iter = 4000,
  warmup = 1000,
  seed = 1138,
  control = list(adapt_delta = 0.95),
  file = "Models/cat_fear_fit",
  file_refit = "on_change"
)
Show code
# coefgrp <- c("raceBlack")
mcmc_plot(cat_mod, 
          # variable = coefgrp, 
          regex = TRUE, 
          prob = 0.80, prob_outer = 0.95) + 
  labs(title = "Coefficient plot", 
       subtitle = "Posterior intervals for all beta coefficients 
       \nwith medians, 80%, and 95% intervals")

Show code
summary(cat_mod, 
        probs = c(0.025, 0.975), 
        digits = 2) 
 Family: categorical 
  Links: mu1 = logit; mu2 = logit; mu3 = logit; mu4 = logit 
Formula: fear ~ race + (1 | outcome) 
   Data: dat_long (Number of observations: 10200) 
  Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup draws = 12000

Multilevel Hyperparameters:
~outcome (Number of levels: 10) 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(mu1_Intercept)     0.25      0.09     0.13     0.47 1.00     3661     5842
sd(mu2_Intercept)     0.32      0.10     0.18     0.58 1.00     3275     5241
sd(mu3_Intercept)     0.35      0.11     0.20     0.62 1.00     3489     5931
sd(mu4_Intercept)     0.07      0.05     0.00     0.20 1.00     3349     4228

Regression Coefficients:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mu1_Intercept    -0.48      0.10    -0.67    -0.29 1.00     3976     5313
mu2_Intercept    -0.65      0.11    -0.88    -0.41 1.00     4029     5635
mu3_Intercept    -1.30      0.13    -1.56    -1.05 1.00     3985     5727
mu4_Intercept    -1.48      0.06    -1.60    -1.37 1.00    11553     9025
mu1_raceBlack     0.51      0.07     0.37     0.66 1.00     8524     9004
mu2_raceBlack     1.55      0.07     1.42     1.68 1.00     8117     8383
mu3_raceBlack     2.06      0.07     1.92     2.20 1.00     8512     9047
mu4_raceBlack     2.63      0.07     2.48     2.77 1.00     8821     9113

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Show code
prior_summary(cat_mod) 
                prior     class      coef   group resp dpar nlpar lb ub
               (flat)         b                         mu1            
               (flat)         b raceBlack               mu1            
               (flat)         b                         mu2            
               (flat)         b raceBlack               mu2            
               (flat)         b                         mu3            
               (flat)         b raceBlack               mu3            
               (flat)         b                         mu4            
               (flat)         b raceBlack               mu4            
 student_t(3, 0, 2.5) Intercept                         mu1            
 student_t(3, 0, 2.5) Intercept                         mu2            
 student_t(3, 0, 2.5) Intercept                         mu3            
 student_t(3, 0, 2.5) Intercept                         mu4            
 student_t(3, 0, 2.5)        sd                         mu1        0   
 student_t(3, 0, 2.5)        sd                         mu2        0   
 student_t(3, 0, 2.5)        sd                         mu3        0   
 student_t(3, 0, 2.5)        sd                         mu4        0   
 student_t(3, 0, 2.5)        sd           outcome       mu1        0   
 student_t(3, 0, 2.5)        sd Intercept outcome       mu1        0   
 student_t(3, 0, 2.5)        sd           outcome       mu2        0   
 student_t(3, 0, 2.5)        sd Intercept outcome       mu2        0   
 student_t(3, 0, 2.5)        sd           outcome       mu3        0   
 student_t(3, 0, 2.5)        sd Intercept outcome       mu3        0   
 student_t(3, 0, 2.5)        sd           outcome       mu4        0   
 student_t(3, 0, 2.5)        sd Intercept outcome       mu4        0   
       source
      default
 (vectorized)
      default
 (vectorized)
      default
 (vectorized)
      default
 (vectorized)
      default
      default
      default
      default
      default
      default
      default
      default
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
Show code
# Posterior predictive checks
pp_check(cat_mod, ndraws = 100) +
  theme_minimal()

Show code
mcmc_plot(cat_mod, type = "dens_overlay") +
  theme_minimal()

Show code
pp_check(
  cat_mod, 
  type = "bars", 
  ndraws = 1000, 
  linewidth = 1/2, 
  size = 1/4) +
  labs(subtitle = "MLM latent fear") 

Show code
pp_check(
  cat_mod, 
  type = "bars_grouped",
  group = "race",
  ndraws = 1000, 
  linewidth = 1/2, 
  size = 1/4) +
  labs(subtitle = "MLM latent fear")

Show code
mcmc_plot(cat_mod, type = "trace") +
  theme_minimal()

Show code
# Rhat values
rhats <- rhat(cat_mod)
print("Rhat values > 1.01:")
[1] "Rhat values > 1.01:"
Show code
print(rhats[rhats > 1.01])
named numeric(0)
Show code
# Effective sample sizes
neff <- neff_ratio(cat_mod)
print("ESS ratios < 0.1:")
[1] "ESS ratios < 0.1:"
Show code
print(neff[neff < 0.1])
named numeric(0)
Show code
# Check random effects
ranef(cat_mod)
$outcome
, , mu1_Intercept

          Estimate Est.Error    Q2.5   Q97.5
handcyou     0.219      0.12 -0.0079  0.4628
kickyou     -0.063      0.12 -0.2966  0.1690
killyou     -0.244      0.12 -0.4909 -0.0072
pinyou      -0.117      0.12 -0.3589  0.1172
searchyou    0.291      0.12  0.0646  0.5479
shootyou    -0.223      0.12 -0.4704  0.0107
sprayyou    -0.094      0.12 -0.3336  0.1360
stopyou      0.151      0.12 -0.0810  0.3965
taseyou     -0.128      0.12 -0.3618  0.1019
yellyou      0.220      0.12 -0.0049  0.4708

, , mu2_Intercept

          Estimate Est.Error   Q2.5   Q97.5
handcyou      0.11      0.13 -0.148  0.3751
kickyou      -0.13      0.13 -0.401  0.1277
killyou      -0.30      0.14 -0.586 -0.0402
pinyou       -0.13      0.13 -0.399  0.1371
searchyou     0.32      0.13  0.059  0.5874
shootyou     -0.27      0.14 -0.541 -0.0022
sprayyou     -0.14      0.13 -0.407  0.1249
stopyou       0.40      0.13  0.148  0.6747
taseyou      -0.17      0.13 -0.440  0.0868
yellyou       0.32      0.13  0.067  0.5974

, , mu3_Intercept

          Estimate Est.Error   Q2.5  Q97.5
handcyou     0.187      0.15 -0.100  0.484
kickyou     -0.078      0.15 -0.369  0.210
killyou     -0.498      0.16 -0.818 -0.199
pinyou      -0.097      0.15 -0.384  0.190
searchyou    0.151      0.15 -0.137  0.455
shootyou    -0.345      0.15 -0.655 -0.052
sprayyou    -0.024      0.15 -0.310  0.265
stopyou      0.461      0.15  0.179  0.760
taseyou     -0.050      0.15 -0.342  0.235
yellyou      0.319      0.15  0.043  0.614

, , mu4_Intercept

          Estimate Est.Error   Q2.5 Q97.5
handcyou   -0.0128     0.062 -0.155 0.108
kickyou    -0.0218     0.062 -0.165 0.091
killyou     0.0679     0.077 -0.036 0.253
pinyou      0.0135     0.062 -0.109 0.153
searchyou  -0.0559     0.075 -0.237 0.052
shootyou    0.0520     0.070 -0.053 0.221
sprayyou   -0.0171     0.060 -0.155 0.099
stopyou    -0.0111     0.064 -0.157 0.116
taseyou     0.0058     0.060 -0.114 0.139
yellyou    -0.0214     0.065 -0.176 0.100
Show code
# Define paths for cached LOO results
cat_mod_loo_path <- "Models/cat_mod_loo.rds"

# Check if cached LOO exists
if (file.exists(cat_mod_loo_path)) {
  # Read cached LOO
  cat_mod_loo <- readRDS(cat_mod_loo_path)
} else {
  # Compute LOO if not cached
  cat_mod_loo <- loo(cat_mod)
  
  # Save LOO
  saveRDS(cat_mod_loo, cat_mod_loo_path)
}

print(cat_mod_loo$estimates) 
         Estimate    SE
elpd_loo   -15178 47.93
p_loo          36  0.16
looic       30356 95.86

Again, let’s compare this categorical model with the original cumulative probit multilevel model using LOO to see if category-specific predictions substantially improve model fit.

Show code
# Compare models
loo_compare(hier_mod_loo, cat_mod_loo)
         elpd_diff se_diff
cat_mod    0.0       0.0  
hier_mod -84.8      14.7  

This LOO (leave-one-out) cross-validation comparison shows a substantial difference between the categorical model and the hierarchical cumulative probit model. The ELPD difference of -84.8 is more than five times the standard error and well beyond the conventional threshold of twice the standard error (2 * 14.7 ≈ 29.4).

This suggests a statistically meaningful difference in model predictive performance. Specifically, the categorical model (cat_mod) appears to have substantially better predictive accuracy compared to the hierarchical cumulative probit model.

In this case, we would be wise to consider using a categorical model for the analysis, as it demonstrates a clear improvement in predictive performance. Despite offering a dramatic improvement over the linear model, the cumulative ordered structure may not fit the observed fear responses in these data as well as initially assumed - or at least not as well as a much more complex and computationally intensive model where every category for both groups and all outcomes is specifically predicted.
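
For readers who want to check the rough “twice the standard error” heuristic directly, the relevant quantities can be pulled from the loo_compare object; a minimal sketch using the comparison above:

Show code
# Sketch: extract the ELPD difference and its SE for the lower-ranked model
# (second row) and compute the ratio used in the heuristic discussed above
comp <- loo_compare(hier_mod_loo, cat_mod_loo)
comp[, c("elpd_diff", "se_diff")]
abs(comp[2, "elpd_diff"]) / comp[2, "se_diff"]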

13 Linear MLM

We cannot use LOO cross-validation to directly compare ELPD estimates of model fit from the linear model predicting a composite normed mean scale (which collapses the 10 items into a single score per respondent) and the cumulative probit multilevel model of latent fear of police (which is fit to all 10 item-level responses per respondent), because the two models are fit to different outcome data. However, we can use it to compare a linear multilevel model with the cumulative probit multilevel model, since both are fit to the same 10,200 item-level responses.

Show code
# Data prep - similar to before, but now using fear values directly
dat_long <- dat %>%
  pivot_longer(
    all_of(fearpol_new),  # Use vector of 10 fear items
    names_to = "outcome",
    values_to = "fear"
  ) %>%
  mutate(
    outcome = factor(outcome)  # Factor for random effects grouping
  )

# Formula for hierarchical linear model
bf_fear_linear <- bf(fear ~ race + (1 | outcome))

# Priors for hierarchical linear model
priors_hier_linear <- c(
  prior(normal(0, 1), class = "Intercept"),  
  prior(normal(0, 1), class = "b"),  
  prior(exponential(1), class = "sd")  
)

# Fit hierarchical linear model
hier_mod_linear <- brm(
  bf_fear_linear,
  data = dat_long,
  family = gaussian(),  
  prior = priors_hier_linear,
  cores = nCoresphys,
  chains = 4,
  iter = 2000,
  warmup = 1000,
  seed = 1138,
  control = list(adapt_delta = 0.95),
  file = "Models/hier_fear_linear_fit",
  file_refit = "on_change"
)
Show code
# coefgrp <- c("raceBlack")
mcmc_plot(hier_mod_linear, 
          # variable = coefgrp, 
          regex = TRUE, 
          prob = 0.80, prob_outer = 0.95) + 
  labs(title = "Coefficient plot", 
       subtitle = "Posterior intervals for all beta coefficients 
       \nwith medians, 80%, and 95% intervals")

Show code
summary(hier_mod_linear, 
        probs = c(0.025, 0.975), 
        digits = 2) 
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: fear ~ race + (1 | outcome) 
   Data: dat_long (Number of observations: 10200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~outcome (Number of levels: 10) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.02      0.02     0.00     0.06 1.00     1886     1864

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     1.29      0.02     1.25     1.33 1.00     4582     2357
raceBlack     1.26      0.03     1.21     1.31 1.00     5172     3003

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.30      0.01     1.29     1.32 1.00     5705     2922

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Show code
prior_summary(hier_mod_linear)   
                prior     class      coef   group resp dpar nlpar lb ub
         normal(0, 1)         b                                        
         normal(0, 1)         b raceBlack                              
         normal(0, 1) Intercept                                        
       exponential(1)        sd                                    0   
       exponential(1)        sd           outcome                  0   
       exponential(1)        sd Intercept outcome                  0   
 student_t(3, 0, 2.5)     sigma                                    0   
       source
         user
 (vectorized)
         user
         user
 (vectorized)
 (vectorized)
      default
Show code
# Posterior predictive checks
pp_check(hier_mod_linear, ndraws = 100) +
  theme_minimal()

Show code
mcmc_plot(hier_mod_linear, type = "dens_overlay") +
  theme_minimal()

Show code
pp_check(race_lin_mods[["stopyou"]], 
         plotfun = "stat_grouped", stat="mean", group="race")

Show code
pp_check(race_lin_mods[["stopyou"]], 
         plotfun = "stat_grouped", stat="median", group="race")

Show code
mcmc_plot(hier_mod_linear, type = "trace") +
  theme_minimal()

Show code
# Rhat values
rhats <- rhat(hier_mod_linear)
print("Rhat values > 1.01:")
[1] "Rhat values > 1.01:"
Show code
print(rhats[rhats > 1.01])
named numeric(0)
Show code
# Effective sample sizes
neff <- neff_ratio(hier_mod_linear)
print("ESS ratios < 0.1:")
[1] "ESS ratios < 0.1:"
Show code
print(neff[neff < 0.1])
named numeric(0)
Show code
# Check random effects
ranef(hier_mod_linear)
$outcome
, , Intercept

          Estimate Est.Error   Q2.5 Q97.5
handcyou  -0.00109     0.019 -0.045 0.040
kickyou   -0.00813     0.021 -0.061 0.028
killyou    0.00520     0.020 -0.032 0.051
pinyou     0.00115     0.020 -0.039 0.045
searchyou -0.01239     0.023 -0.070 0.022
shootyou   0.00421     0.020 -0.033 0.053
sprayyou  -0.00465     0.020 -0.052 0.032
stopyou    0.01348     0.023 -0.019 0.074
taseyou    0.00067     0.019 -0.040 0.043
yellyou    0.00334     0.020 -0.034 0.051
Show code
# Define paths for cached LOO results
hier_mod_linear_loo_path <- "Models/hier_mod_linear_loo.rds"

# Check if cached LOO exists
if (file.exists(hier_mod_linear_loo_path)) {
  # Read cached LOO
  hier_mod_linear_loo <- readRDS(hier_mod_linear_loo_path)
} else {
  # Compute LOO if not cached
  hier_mod_linear_loo <- loo(hier_mod_linear)
  
  # Save LOO
  saveRDS(hier_mod_linear_loo, hier_mod_linear_loo_path)
}

print(hier_mod_linear_loo$estimates) 
         Estimate      SE
elpd_loo -17182.0  56.608
p_loo         4.7   0.056
looic     34364.0 113.215
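
With both LOO objects cached, the comparison described at the start of this section is a single call; a minimal sketch (output omitted here):

Show code
# Sketch: compare the linear MLM with the cumulative probit MLM via LOO;
# both models are fit to the same 10,200 item-level responses
loo_compare(hier_mod_loo, hier_mod_linear_loo)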

14 Linear MLM predictions

Does this model do a better job predicting “very afraid” responses? Let’s extract “precise” and “generous” predictions again and then add them to our Table 2.

As before, let’s calculate the model-implied probability of scoring at or above “very afraid” (>=4 on the original scale), along with the more generous cutoff of >=3.5, which implies a respondent reported being very afraid on at least half of the ten items. As with the ordinal MLM, we will generate population-level estimates with intervals that reflect uncertainty in the fixed effects, the random-effects variance, and the residual variance.

Show code
# Get prediction grid
nd <- tibble(
  race = factor(c("White", "Black"))
)

# First threshold (precise = 100 or ordinal scale = 4.0)
lin_mlm_preds_100 <- add_epred_draws(hier_mod_linear, 
                                newdata = nd, 
                                re_formula = NA) %>%
  ungroup() %>%
  group_by(.draw) %>%
  mutate(
    sigma = as_draws_df(hier_mod_linear)$sigma[.draw],
    sigma_re = as_draws_df(hier_mod_linear)$sd_outcome__Intercept[.draw],
    pred_sd = sqrt(sigma^2 + sigma_re^2),
    prob_high_fear = 1 - pnorm(4, mean = .epred, sd = pred_sd)
  )

# Get predicted probabilities (4.0)
lin_mlm_probs_100 <- lin_mlm_preds_100 %>%
  group_by(race) %>%
  summarise(
    p = mean(prob_high_fear),
    ll = quantile(prob_high_fear, .025),
    ul = quantile(prob_high_fear, .975)
  )

# Calculate contrasts (4.0)
lin_mlm_contrasts_100 <- lin_mlm_preds_100 %>%
  dplyr::select(.draw, race, prob_high_fear) %>%
  pivot_wider(names_from = race, values_from = prob_high_fear) %>%
  mutate(diff = Black - White) %>%
  ungroup() %>%
  summarise(
    con = mean(diff),
    ll = quantile(diff, .025),
    ul = quantile(diff, .975)
  )

# Second threshold (generous = 87.5 or ordinal scale = 3.5)
lin_mlm_preds_87.5 <- add_epred_draws(hier_mod_linear, 
                                newdata = nd, 
                                re_formula = NA) %>%
  ungroup() %>%
  group_by(.draw) %>%
  mutate(
    sigma = as_draws_df(hier_mod_linear)$sigma[.draw],
    sigma_re = as_draws_df(hier_mod_linear)$sd_outcome__Intercept[.draw],
    pred_sd = sqrt(sigma^2 + sigma_re^2),
    prob_high_fear = 1 - pnorm(3.5, mean = .epred, sd = pred_sd)
  )

# Get predicted probabilities (3.5)
lin_mlm_probs_87.5 <- lin_mlm_preds_87.5 %>%
  group_by(race) %>%
  summarise(
    p = mean(prob_high_fear),
    ll = quantile(prob_high_fear, .025),
    ul = quantile(prob_high_fear, .975)
  )

# Calculate contrasts (3.5)
lin_mlm_contrasts_87.5 <- lin_mlm_preds_87.5 %>%
  dplyr::select(.draw, race, prob_high_fear) %>%
  pivot_wider(names_from = race, values_from = prob_high_fear) %>%
  mutate(diff = Black - White) %>%
  ungroup() %>%
  summarise(
    con = mean(diff),
    ll = quantile(diff, .025),
    ul = quantile(diff, .975)
  )

# Format results for both thresholds
lin_mlm_results_100 <- bind_rows(
  lin_mlm_probs_100 %>% 
    mutate(
      model = "Linear MLM: Precise (y>=4)", 
      estimate_group = case_when(
        race == "Black" ~ "Pred. Probability: P(\"Very Afraid\"|Black)",
        race == "White" ~ "Pred. Probability: P(\"Very Afraid\"|White)"
      )
    ) %>% 
    dplyr::select(model, estimate_group, prob = p, lower = ll, upper = ul),
  lin_mlm_contrasts_100 %>% 
    mutate(
      model = "Linear MLM: Precise (y>=4)", 
      estimate_group = "Contrast: P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)",
      prob = con
    ) %>% 
    dplyr::select(model, estimate_group, prob, lower = ll, upper = ul)
)

lin_mlm_results_87.5 <- bind_rows(
  lin_mlm_probs_87.5 %>% 
    mutate(
      model = "Linear MLM: Generous (y>=3.5)", 
      estimate_group = case_when(
        race == "Black" ~ "Pred. Probability: P(\"Very Afraid\"|Black)",
        race == "White" ~ "Pred. Probability: P(\"Very Afraid\"|White)"
      )
    ) %>% 
    dplyr::select(model, estimate_group, prob = p, lower = ll, upper = ul),
  lin_mlm_contrasts_87.5 %>% 
    mutate(
      model = "Linear MLM: Generous (y>=3.5)", 
      estimate_group = "Contrast: P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)",
      prob = con
    ) %>% 
    dplyr::select(model, estimate_group, prob, lower = ll, upper = ul)
)

# Calculate observed proportions 
obs_props <- dat %>%
  # For each outcome and race
  group_by(race) %>%
  summarize(
    across(all_of(fearpol_new), ~mean(. == 4)),
    .groups = "drop"
  ) %>%
  # Get mean across all items for each race
  mutate(
    prob = rowMeans(dplyr::select(., all_of(fearpol_new)))
  ) %>%
  # Format to match other results
  mutate(
    model = "Observed Proportions",
    estimate_group = case_when(
      race == "Black" ~ "Pred. Probability: P(\"Very Afraid\"|Black)",
      race == "White" ~ "Pred. Probability: P(\"Very Afraid\"|White)"
    ),
    lower = NA,
    upper = NA
  ) %>%
  dplyr::select(model, estimate_group, prob, lower, upper)

# Calculate observed contrast
black_prop <- obs_props %>% 
  dplyr::filter(estimate_group == "Pred. Probability: P(\"Very Afraid\"|Black)") %>% 
  pull(prob)
white_prop <- obs_props %>% 
  dplyr::filter(estimate_group == "Pred. Probability: P(\"Very Afraid\"|White)") %>% 
  pull(prob)

obs_contrast <- data.frame(
  model = "Observed Proportions",
  estimate_group = "Contrast: P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)",
  prob = black_prop - white_prop,
  lower = NA,
  upper = NA
)

# Combine everything
combined_results_with_obs <- bind_rows(
  obs_props,
  obs_contrast,
  mlm_probs_results,
  mlm_contrasts_result,
  lin_mlm_results_100,
  lin_mlm_results_87.5,
  lin_mod_100,  # Add back normed results
  lin_mod_87.5  # Add back normed results
) %>%
  # Update factor levels to maintain order
  mutate(
    model = factor(model, 
                  levels = c("Observed Proportions",
                           "Ordinal MLM (y=4)",
                           "Linear MLM: Precise (y>=4)",
                           "Linear MLM: Generous (y>=3.5)",
                           "Linear: Precise (y>=4; normed >= 100)",
                           "Linear: Generous (y>=3.5, normed >=87.5)")),
    estimate_group = factor(estimate_group,
                          levels = c("Pred. Probability: P(\"Very Afraid\"|Black)",
                                   "Pred. Probability: P(\"Very Afraid\"|White)",
                                   "Contrast: P(\"Very Afraid\"|Black) - P(\"Very Afraid\"|White)"))
  )

# Create gt table
gt_table <- combined_results_with_obs %>%
  arrange(estimate_group, model) %>%
  gt(
    groupname_col = "estimate_group",
    rowname_col = "model"
  ) %>%
  fmt_number(
    columns = c(prob, lower, upper),
    decimals = 3
  ) %>%
  cols_label(
    prob = "Estimate",
    lower = "Lower",
    upper = "Upper"
  ) %>%
  tab_spanner(
    label = "95% CrI",
    columns = c(lower, upper)
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) %>%
  tab_style(
    style = cell_text(align = "left"),
    locations = cells_stub()
  ) %>%
  tab_header(
    title = "TABLE 2. Comparing Ordinal & Linear Model Estimates of 'Very Afraid' of Police"
  ) %>%
  sub_missing(
    columns = everything(),
    rows = everything(),
    missing_text = "--"
  ) %>% 
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_row_groups(groups = everything())
  )

gt_table
TABLE 2. Comparing Ordinal & Linear Model Estimates of 'Very Afraid' of Police

                                              Estimate   95% CrI Lower   95% CrI Upper
Pred. Probability: P("Very Afraid"|Black)
  Observed Proportions                           0.321              --              --
  Ordinal MLM (y=4)                              0.328           0.300           0.356
  Linear MLM: Precise (y>=4)                     0.133           0.126           0.140
  Linear MLM: Generous (y>=3.5)                  0.233           0.224           0.243
  Linear: Precise (y>=4; normed >= 100)          0.110           0.092           0.131
  Linear: Generous (y>=3.5, normed >=87.5)       0.211           0.185           0.239
Pred. Probability: P("Very Afraid"|White)
  Observed Proportions                           0.086              --              --
  Ordinal MLM (y=4)                              0.076           0.067           0.087
  Linear MLM: Precise (y>=4)                     0.019           0.017           0.021
  Linear MLM: Generous (y>=3.5)                  0.045           0.042           0.049
  Linear: Precise (y>=4; normed >= 100)          0.011           0.007           0.015
  Linear: Generous (y>=3.5, normed >=87.5)       0.030           0.023           0.040
Contrast: P("Very Afraid"|Black) - P("Very Afraid"|White)
  Observed Proportions                           0.235              --              --
  Ordinal MLM (y=4)                              0.251           0.218           0.284
  Linear MLM: Precise (y>=4)                     0.114           0.108           0.121
  Linear MLM: Generous (y>=3.5)                  0.188           0.179           0.198
  Linear: Precise (y>=4; normed >= 100)          0.099           0.082           0.119
  Linear: Generous (y>=3.5, normed >=87.5)       0.181           0.156           0.208

Now, let’s compare this linear multilevel model with the original cumulative probit multilevel model using LOO cross-validation to see whether their out-of-sample predictive performance differs substantially.

Show code
# Compare models
loo_compare(hier_mod_loo, hier_mod_linear_loo)
                elpd_diff se_diff
hier_mod            0.0       0.0
hier_mod_linear -1919.0      48.6

This LOO comparison reveals a dramatic difference between the two models: the ELPD difference of -1919.0 is more than 39 times larger than its standard error of 48.6.
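For transparency, the arithmetic behind that statement is just the ratio of the difference to its standard error, using the values printed above:

Show code
# How many standard errors wide is the ELPD gap between the two models?
abs(-1919.0) / 48.6   # roughly 39.5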

As before when comparing linear and cumulative probit models, we received a warning that the models’ ‘yhash’ attributes do not match. This happens because the response is stored differently in the two models: as numeric values for the linear model and as ordered factors for the cumulative probit model. Aside from this representation and the different distributional families (Gaussian vs. cumulative probit), the underlying data are the same.
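If we want to confirm that the warning reflects only a difference in storage rather than a difference in the data, we can check that the two encodings agree observation by observation. Here is a minimal sketch, assuming dat_long (the long-format data used to fit both multilevel models) still holds the numeric fear_numeric column used by the linear MLM and the ordered-factor fear column used by the cumulative probit MLM:

Show code
# The ordered factor's internal integer codes run 1-5 for levels 0-4,
# so subtracting 1 recovers the original 0-4 numeric scale
with(dat_long, all(as.integer(fear) - 1L == fear_numeric))
# Should return TRUE: the two codings differ only in type, not in content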

So, the LOO comparison should still provide meaningful insights. Key considerations are:

  • Are the underlying response variables fundamentally the same?
  • Do the models capture the same latent construct?
  • Are the differences in model specification minor?

In our case:

  • Both models use the same 10 fear items
  • The only difference is factor representation (numeric vs. ordered)
  • Posterior predictive checks suggest poor fit for the linear model (illustrated in the sketch below)

Given these points, the LOO comparison is likely informative. Under this assumption, the dramatically worse fit of the linear model (ELPD difference of -1919.0) suggests:

  • The ordinal nature of the data is crucial
  • A linear model fails to capture the underlying response structure
  • The cumulative probit model better represents the inherent ordered categorical nature of the data
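A quick way to see this failure visually is to draw posterior predictive checks for the two multilevel models. Here is a minimal sketch, assuming the fitted brms objects hier_mod and hier_mod_linear from the earlier sections are still in memory:

Show code
# Posterior predictive checks: the linear MLM's continuous Gaussian draws cannot
# reproduce the five discrete response categories, whereas the cumulative probit MLM can
pp_check(hier_mod_linear, ndraws = 50)           # default density overlay for the linear model
pp_check(hier_mod, type = "bars", ndraws = 50)   # category-frequency bars for the ordinal model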

15 LOO ELPD differences

Let’s pull these various ELPD estimates and differences into a single supplementary table comparing relative model fit.

15.1 APP-C: LOO comparison table

Show code
# Combine LOO results
loo_results <- list(
  hier_mod_loo,
  hier_mod_unequal_var_loo,
  cat_mod_loo,
  hier_mod_linear_loo
)

# Model names in desired order
model_names <- c(
  "Cumulative Probit",
  "Cumulative Probit w/Unequal Variances",
  "Categorical Logit",
  "Linear"
)

# Map display names to the model names used as row names in the loo_compare output
model_mapping <- c(
  "Cumulative Probit" = "hier_mod",
  "Cumulative Probit w/Unequal Variances" = "hier_mod_unequal_var",
  "Categorical Logit" = "cat_mod",
  "Linear" = "hier_mod_linear"
)

# Extract LOO estimates
loo_df <- do.call(rbind, lapply(loo_results, function(x) x$estimates["elpd_loo", ]))
colnames(loo_df) <- c("Estimate", "SE")
loo_df <- as.data.frame(loo_df)

# Compute LOO differences (loo_compare treats the best-fitting model, here the categorical logit, as the reference)
loo_diffs <- loo_compare(cat_mod_loo,
                        hier_mod_loo,
                        hier_mod_unequal_var_loo,
                        hier_mod_linear_loo)

# Create data frame with differences in the correct order
loo_diffs_df <- data.frame(
  Model = model_names,
  elpd_diff = numeric(length(model_names)),
  se_diff = numeric(length(model_names))
)

# Fill in differences based on model mapping
for(i in seq_along(model_names)) {
  if(model_names[i] == "Categorical Logit") {
    loo_diffs_df$elpd_diff[i] <- 0
    loo_diffs_df$se_diff[i] <- 0
  } else {
    model_key <- model_mapping[model_names[i]]
    loo_diffs_df$elpd_diff[i] <- loo_diffs[model_key, "elpd_diff"]
    loo_diffs_df$se_diff[i] <- loo_diffs[model_key, "se_diff"]
  }
}

# Create final table
elpd_table <- loo_df %>%
  mutate(
    Model = model_names,
    `ELPD Difference` = loo_diffs_df$elpd_diff,
    `SE of Difference` = loo_diffs_df$se_diff
  ) %>%
  dplyr::select(Model, everything()) %>%
  gt() %>%
  fmt_number(
    columns = c(Estimate, SE, `ELPD Difference`, `SE of Difference`),
    decimals = 1
  ) %>%
  cols_label(
    Estimate = "ELPD LOO",
    SE = "SE of ELPD",
    `ELPD Difference` = "ELPD Difference",
    `SE of Difference` = "SE of Difference"
  ) %>%
  tab_header(
    title = "APPENDIX C. Multilevel Model Comparisons Using Leave-One-Out Cross-Validation"
  ) %>%
  tab_footnote(
    footnote = "ELPD LOO: Expected Log Pointwise Predictive Density",
    locations = cells_column_labels(columns = Estimate)
  ) %>%
  tab_footnote(
    footnote = "Difference compared to category-specific logit model",
    locations = cells_column_labels(columns = `ELPD Difference`)
  )

elpd_table
APPENDIX C. Multilevel Model Comparisons Using Leave-One-Out Cross-Validation

Model                                     ELPD LOO¹   SE of ELPD   ELPD Difference²   SE of Difference
Cumulative Probit                         −15,263.0         47.0              −84.8               14.7
Cumulative Probit w/Unequal Variances     −15,262.8         47.0              −84.7               14.7
Categorical Logit                         −15,178.2         47.9                0.0                0.0
Linear                                    −17,182.0         56.6           −2,003.8               50.4

¹ ELPD LOO: Expected Log Pointwise Predictive Density
² Difference compared to category-specific logit model

16 APP-A: Replication data

This code chunk saves pre-processed replication datasets (a wide-format and a long-format version) for use with the step-by-step tutorial code provided in the separate Appendix A Quarto file.

Show code
# Setup for data loading and processing
library(here)
library(haven)
library(tidyverse)
library(datawizard) # For row_means

# --- Data Loading and Preprocessing from Original Quarto File ---
# Load original data
dat <- read_dta(file = here("Data", "feardata(12).dta"))

# Select relevant columns
dat <- dat %>%
  dplyr::select(caseid,
                race,
                Q2_1, Q2_2, Q2_3, Q2_4, Q2_5,
                Q2_6, Q2_7, Q2_8, Q2_9, Q2_10)

# Zap Stata formats and labels
dat <- zap_formats(dat)
dat <- zap_labels(dat)
dat <- zap_label(dat)

# Filter for Black and White participants and re-factor 'race'
dat <- dat %>%
  dplyr::filter(race %in% c(1,2)) %>%
  mutate(
    caseid = as_factor(caseid),
    race = as_factor(if_else(race == 2, "Black", "White"))
  )

# Recode fear items (0=very unafraid to 4=very afraid)
fearpol_old <- c(paste0("Q2_", 1:10))
fearpol_new <- c("stopyou", "searchyou", "yellyou", "handcyou",
             "kickyou", "pinyou", "sprayyou", "taseyou",
             "shootyou", "killyou")
dat <- dat %>%
  mutate(
    across(all_of(fearpol_old), ~ 5 - .x, .names = "{.col}_new") %>%
      setNames(fearpol_new)
  )

# Drop old fear columns and remove NA
dat <- dat %>%
  dplyr::select(-all_of(fearpol_old)) %>%
  drop_na()

# Create ordered factor versions for ordinal models
dat <- dat %>%
  mutate(across(all_of(fearpol_new),
                ~factor(., levels = 0:4, ordered = TRUE),
                .names = "{.col}_ord"))

# Create long format data for multilevel models
dat_long <- dat %>%
  pivot_longer(
    all_of(fearpol_new),
    names_to = "outcome",
    values_to = "fear_numeric" # Keep as numeric for linear MLM
  ) %>%
  mutate(
    fear = factor(fear_numeric, levels = 0:4, ordered = TRUE), # Ordered factor for CP MLM
    fear_categorical = factor(fear_numeric, levels = 0:4, ordered = FALSE), # Unordered factor for categorical logit
    outcome = factor(outcome) # Factor for random effects grouping
  )

# Create the normed fearpol scale for linear replication
dat <- dat %>%
  mutate(
    fearpol = datawizard::row_means(pick(all_of(fearpol_new))),
    fearpol_norm = (fearpol - min(fearpol, na.rm = TRUE)) /
      (max(fearpol, na.rm = TRUE) - min(fearpol, na.rm = TRUE)) * 100
  )

# --- Save Replication Data for Appendix A ---
# Ensure the 'Data' directory exists within 'here()'
if (!dir.exists(here("Data"))) {
  dir.create(here("Data"))
}

# Save the main 'dat' object (with fearpol_norm and ordered factors)
save(dat, file = here("Data", "replication_data.RData"))

# Save the 'dat_long' object (for multilevel models)
save(dat_long, file = here("Data", "replication_data_long.RData"))

message("Replication datasets 'replication_data.RData' and 'replication_data_long.RData' saved successfully in the 'Data' folder.")