Primary Supplement for ‘When the Mean is Meaningless’
Authors
Jonathan R. Brauer
Jacob C. Day
Published
July 30, 2025
1 Introduction
This file is the online supplement for the paper “When the Mean is Misleading: A Guide to Ordered Regression for Meaningfully Modeling Ordinal Outcomes” (Jonathan R. Brauer & Jacob C. Day). Our primary aim with this paper is to show how and why to use ordered regression techniques like cumulative probit regression when analyzing ordinal outcomes (and perhaps count and continuous outcome too!). We illustrate via simulations and then by applying these methods in an analysis of data from Pickett, Graham, and Cullen’s 2022 study reported in Criminology in “The American racial divide in fear of the police” and shared as part of their supplemental materials on OSF.
This primary supplement contains all R code and output reported in the manuscript as well as other supplementary materials. In addition to this primary supplement, we have also provided a standalone tutorial in a separate Quarto file (Appendix A) that provides a step-by-step guide to our progression from frequentist and Bayesian versions of linear, to ordinal, to multilevel ordinal models.
1.1 Why this paper?
We picked Pickett, Graham, & Cullen’s paper for several reasons. First, we wanted to illustrate these methods with data on an important topic. That way, if the tutorial’s analysis improves our understanding at all, even if slightly, then our methods paper might also help us better understand a topic about that a lot of people care about.
Second, we wanted to analyze publicly available data. We searched a LOT of criminology papers that we thought would be good candidates and usually came up empty at this juncture. So, THANK YOU to Pickett, Graham, and Cullen for practicing open science and making this analysis possible!
Third, we wanted to pick a paper that we thought was already well done - that is, a paper that did a good job of carefully and thoroughly applying traditional tools to extract rich, meaningful signals from data. In doing so, it would make our job easier as we would already have a good sense of signals in the data and we could avoid taking too many methodological detours. This approach might also help readers better assess what, if anything, might be gained by spending valuable time learning new methods compared to traditional practices published in our top journals.
Finally, our main goal was to find data for a study in which researchers analyze Likert-type ordinal survey items as outcomes and combine them into composite scales so we could illustrate how ordinal methods might help us better understand signals in our data compared to traditional approaches in this common scenario. The methods can be - and we think often should be - applied to other types of data, including rare count outcomes (e.g., self-reported crime) and even continuous outcomes with thousands of levels! (See Frank Harrell, e.g. here and here).
2 Load libraries & settings
I will start by loading various libraries that I often use for such projects.
Show code
# Packages I needlibrary(here)library(haven)library(tidyverse)library(gt)library(janitor)library(vtable)library(modelsummary)library(easystats)library(marginaleffects)library(ggplot2)library(GGally)library(ggeffects)library(ggdist)library(patchwork)library(parallelly)library(bayesplot)library(ggdist)library(correlation)library(sjmisc)library(forcats)library(ggtext)library(ggmosaic)library(gtExtras)library(tidybayes)library(scales)library(purrr)library(viridis)library(loo)library(broom) #need MASS for frequentist polr but it masks common dplyr functions #use explicit dplyr::select() and dplyr::filter calls # or don't load MASS & call MASS::polr() instead library(MASS) library(extrafont)library(stringr)library(ragg)# to use brms, do following:# install rtools 4.4 # install rstan# https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started# test ability to install packages from sources using Rtools: # install.packages("Rcpp", type="source")# it works# remove any existing rstan installs# remove.packages("rstan")# if (file.exists(".RData")) file.remove(".RData")# run the next line if you already have rstan installed# remove.packages(c("StanHeaders", "rstan"))# install rstan# install.packages("rstan", repos = c('https://stan-dev.r-universe.dev', getOption("repos")))# test rstan install# example(stan_model, package = "rstan", run.dontrun = TRUE)# it works library(rstan)# install cmdstanr (alternative brms backend for interfacing w/stan)# install.packages("cmdstanr", repos = c('https://stan-dev.r-universe.dev', getOption("repos")))# newest developmental version# install.packages("remotes")# remotes::install_github("stan-dev/cmdstanr", force=TRUE)# Sys.getenv("RTOOLS44_HOME")# Sys.getenv("PATH")# readLines("~/.Renviron")# had to delete old .Reviron fie pointing to Rtools40 from Documents folder # if prefer cmdstanr backend to brms, then: # cmdstanr::install_cmdstan()# cmdstanr::check_cmdstan_toolchain(fix = TRUE)# library(cmdstanr)library(rstanarm)library(brms)print("T/F: Root 'here()' folder contains subfolder 'Models'")
#check for Models folder to save figures, create if one does not existifelse(dir.exists(here("Models")), TRUE, dir.create(here("Models")))
[1] TRUE
I will also set some default settings.
Show code
#Fonts# First time setup (run this once in console, not in your document):# Import and register fonts (only needed once per system)# Uncomment if you haven't imported fonts before# extrafont::font_import()# Load the fonts into the R sessionextrafont::loadfonts(device ="win", quiet =TRUE) # For Windows# extrafont::loadfonts(device = "pdf", quiet = TRUE) # For PDF output# extrafont::loadfonts(device = "postscript", quiet = TRUE) # For PostScript output# If you only imported Times New Roman, load this instead: # Load the specific font you need for this session# if(!("Times New Roman" %in% windowsFonts())) {# windowsFonts(`Times New Roman` = windowsFont("Times New Roman"))# }#To improve font rendering in viewer:# Configure knitr to use the ragg device which handles fonts betterknitr::opts_chunk$set(dev ="ragg_png",dpi =300)# Set the default theme for all ggplot2 plotstheme_set(theme_minimal(base_family ="Times New Roman"))#set default knit & scientific notation optionsknitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE, fig.align="center")options(scipen =999, digits =2) #set global option to two decimals # identify & set # of available cores# replaced parallel::detectCores()# https://www.jottr.org/2022/12/05/avoid-detectcores/nCoresphys <- parallelly::availableCores(logical =FALSE) options(mc.cores = parallelly::availableCores(logical=FALSE))#options(mc.cores = 4, logical=FALSE)#alt: can specify exact number of cores desired here &/or in models/ppchecks color_scheme_set("viridisA")# https://mc-stan.org/bayesplot/reference/bayesplot-colors.html
3 Simulation: Figure 1
Before getting into real data, let’s illustrate the underlying issue using a simulation.
I will simulate five different ordinal distributions with distinct shapes. Think of them as five groups, with N=250 observations per distributional group (N=1,250 total).
Show code
# Set seed for reproducibilityset.seed(1138)# Number of observations per groupn <-250# Function to simulate ordinal data based on probabilitiessimulate_ordinal <-function(probs, n) {sample(1:length(probs), size = n, replace =TRUE, prob = probs)}# Distribution 0: Symmetric Normal (peak at 3)probs_symmetric <-c(0.10, 0.20, 0.40, 0.20, 0.10) # Symmetric around 3data_symmetric <-simulate_ordinal(probs_symmetric, n)# Original distributionsprobs_uniform <-rep(0.2, 5)data_uniform <-simulate_ordinal(probs_uniform, n)probs_bimodal <-c(0.3, 0.1333, 0.1333, 0.1333, 0.3)data_bimodal <-simulate_ordinal(probs_bimodal, n)probs_skew_left <-c(0.10, 0.20, 0.15, 0.25, 0.30)data_skew_left <-simulate_ordinal(probs_skew_left, n)probs_skew_right <-c(0.30, 0.25, 0.15, 0.20, 0.10)data_skew_right <-simulate_ordinal(probs_skew_right, n)# Combine data into a data framedf <-data.frame(group =factor(rep(c("1. Symmetric Normal", "2. Uniform", "3. Bimodal", "4. Skewed Left", "5. Skewed Right"), each = n),levels =c("1. Symmetric Normal", "2. Uniform", "3. Bimodal", "4. Skewed Left", "5. Skewed Right")),response =c(data_symmetric, data_uniform, data_bimodal, data_skew_left, data_skew_right))# Function to get normal curve points for each group separatelyget_normal_curve <-function(data, group_name, x_range =seq(0.5, 5.5, length.out =100)) { mean_val <-mean(data) sd_val <-sd(data) y_vals <-dnorm(x_range, mean = mean_val, sd = sd_val) scaling_factor <- (max(table(data))/length(data)) /max(y_vals)data.frame(x = x_range, y = y_vals * scaling_factor *100,group = group_name )}# Function to get ordered probit predicted probabilities with CIsget_probit_probs <-function(data, group_name, n_boot =1000) { model <-polr(ordered(response) ~1, data =data.frame(response = data), method ="probit") probs <-predict(model, newdata =data.frame(response =1), type ="probs") boot_probs <-matrix(nrow = n_boot, ncol =5)for(i in1:n_boot) { boot_data <-sample(data, replace =TRUE) boot_model <-try(polr(ordered(boot_data) ~1, method ="probit"), silent =TRUE)if(!inherits(boot_model, "try-error")) { boot_probs[i,] <-predict(boot_model, newdata =data.frame(response =1), type ="probs") } } ci_lower <-apply(boot_probs, 2, quantile, 0.025, na.rm =TRUE) ci_upper <-apply(boot_probs, 2, quantile, 0.975, na.rm =TRUE)data.frame(x =1:5,y =as.numeric(probs) *100,lower = ci_lower *100,upper = ci_upper *100,group = group_name )}# Get curves for each groupcurves_df <-bind_rows(get_normal_curve(data_symmetric, "1. Symmetric Normal"),get_normal_curve(data_uniform, "2. Uniform"),get_normal_curve(data_bimodal, "3. Bimodal"),get_normal_curve(data_skew_left, "4. Skewed Left"),get_normal_curve(data_skew_right, "5. Skewed Right"))# Get probit probabilities with CIs for each groupprobit_df <-bind_rows(get_probit_probs(data_symmetric, "1. Symmetric Normal"),get_probit_probs(data_uniform, "2. Uniform"),get_probit_probs(data_bimodal, "3. Bimodal"),get_probit_probs(data_skew_left, "4. Skewed Left"),get_probit_probs(data_skew_right, "5. Skewed Right"))# Update the stats_df creation to prepend A. through E.stats_df <- df %>%group_by(group) %>%summarise(mean =round(mean(response), 2),sd =round(sd(response), 2) ) %>%mutate(# Create new labels with A. or B. etc. prependednew_label =case_when( group =="1. Symmetric Normal"~paste0("A. Normal-like\n(mean=", mean, ", sd=", sd, ")"),TRUE~paste0( letters[match(group, unique(group)) +0] %>%toupper(), ". ", str_remove(group, "^\\d+\\.\\s+"), "\n(mean=", mean, ", sd=", sd, ")" ) ) )# Create two separate plots# Plot 1: Normal-approxp1 <-ggplot(df %>% dplyr::filter(group =="1. Symmetric Normal"), aes(x = response)) +geom_bar(aes(y =after_stat(prop)), fill ="#FEC488FF", color ="#FEC488FF") +geom_line(data = curves_df %>% dplyr::filter(group =="1. Symmetric Normal") %>%mutate(y = y/100), aes(x = x, y = y), color ="#341069FF", linewidth =1.5) +geom_pointrange(data = probit_df %>% dplyr::filter(group =="1. Symmetric Normal") %>%mutate(y = y/100, lower = lower/100, upper = upper/100), aes(x = x, y = y, ymin = lower, ymax = upper),color ="#D3436EFF", size =1.5, linewidth =1, fatten =2) +facet_wrap(~group, ncol =1, labeller =as_labeller(setNames(stats_df$new_label, stats_df$group))) +scale_x_continuous(breaks =1:5) +scale_y_continuous(limits =c(0, 0.45), expand =expansion(mult =c(0.02, 0.08))) +labs(x ="Response Category", y ="Proportion") +theme_minimal() +theme(text =element_text(family ="serif"),axis.title.x =element_text(size=12), axis.title.y =element_text(size=12), axis.text.x =element_text(size=12), axis.text.y =element_text(size=12), strip.text =element_text(hjust =0, size=14), panel.grid =element_blank() )# Plot 2: Other distributions in 2x2 gridp2 <-ggplot(df %>% dplyr::filter(group !="1. Symmetric Normal"), aes(x = response)) +geom_bar(aes(y =after_stat(prop)), fill ="#FEC488FF", color ="#FEC488FF") +geom_line(data = curves_df %>% dplyr::filter(group !="1. Symmetric Normal") %>%mutate(y = y/100), aes(x = x, y = y), color ="#341069FF", linewidth =1.5) +geom_pointrange(data = probit_df %>% dplyr::filter(group !="1. Symmetric Normal") %>%mutate(y = y/100, lower = lower/100, upper = upper/100), aes(x = x, y = y, ymin = lower, ymax = upper),color ="#D3436EFF", size =1.5, linewidth =1, fatten =2) +facet_wrap(~group, ncol =2, labeller =as_labeller(setNames(stats_df$new_label, stats_df$group))) +scale_x_continuous(breaks =1:5) +scale_y_continuous(limits =c(0, 0.45), expand =expansion(mult =c(0.02, 0.08))) +labs(x ="Response Category", y ="") +theme_minimal() +theme(text =element_text(family ="serif"),axis.title.x =element_text(size=12), axis.title.y =element_text(size=12), axis.text.x =element_text(size=12), axis.text.y =element_text(size=12), strip.text =element_text(hjust =0, size=14), panel.grid =element_blank())# Combine plots using patchwork# combined_plot <- p1 + p2 + # plot_layout(widths = c(1, 2)) +# plot_annotation(# title = "FIGURE1. Comparing Metric vs. Ordinal View of Data",# subtitle = "Data: Light bars show observed (simulated) proportions\nMetric view: Dark line shows linear model-implied normal curve\nOrdinal view: Point-intervals show ordered probit model-implied predicted probabilities & 95% CI",# theme = theme(# plot.title = element_text(hjust = 0.5),# plot.subtitle = element_text(hjust = 0)# )# )# Create spacers for above/below normal-liketop_spacer <-plot_spacer()bottom_spacer <-plot_spacer()# Combine plots w/custom layoutcombined_plot <- top_spacer + p2 + p1 + bottom_spacer +plot_layout(design =" AB CB DB ",heights =c(1, 2, 1), widths =c(1, 2) ) +plot_annotation(title ="FIGURE1. Comparing Metric vs. Ordinal View of Data",subtitle ="Data: Light bars show observed (simulated) proportions\nMetric view: Dark line shows linear model-implied normal curve\nOrdinal view: Point-intervals show ordered probit model-implied predicted probabilities & 95% CI",theme =theme(plot.title =element_text(hjust =0.5),plot.subtitle =element_text(hjust =0) ) )
3.1 FIG1
Show code
combined_plot
Clearly, these distributions are very different. However, a linear model would predict no mean differences between three of these groups (A, B, C). Meanwhile, although a linear model might detect differences between the two skewed distributions (D vs. E) with a large enough sample, the model-implied differences on the mean scale are hard to interpret, and predicted differences on response probability scale may depart drastically from observed differences in the data. For example, the linear model “view” only sees about ~1 response category difference between D & E, with means implying the typical observation in D is a 3 or 4 and the typical observation in E is a 2 or 3. Yet, the majority of observations in D were in categories 4 & 5 (not 3 & 4), whereas the majority of observations in E were in 1 & 2 (not 2 & 3)!
Before moving on, it might help to visualize the latent response thresholds.
Show code
# Define the threshold annotation functionadd_probit_thresholds <-function(plot) { plot +# Add vertical lines at each response category boundarygeom_vline(xintercept =1:4+0.5, linetype ="dashed", color ="gray60", alpha =0.5) +annotate("text", x =2.5, y =0.4, label ="Cumulative\nThresholds", color ="gray50", hjust =0.5)}# Modify both p1 and p2 to add threshold labelsp1 <-ggplot(df %>% dplyr::filter(group =="1. Symmetric Normal"), aes(x = response)) +geom_bar(aes(y =after_stat(prop)), fill ="#FEC488FF", color ="#FEC488FF") +geom_line(data = curves_df %>% dplyr::filter(group =="1. Symmetric Normal") %>%mutate(y = y/100), aes(x = x, y = y), color ="#341069FF", linewidth =1.5) +geom_pointrange(data = probit_df %>% dplyr::filter(group =="1. Symmetric Normal") %>%mutate(y = y/100, lower = lower/100, upper = upper/100), aes(x = x, y = y, ymin = lower, ymax = upper),color ="#D3436EFF", size =1.5, linewidth =1, fatten =2) +# Add vertical threshold lines with tau labelsgeom_vline(xintercept =1:4+0.5, linetype ="dashed", color ="gray60", alpha =0.5) +annotate("text", x=1.65, y=0.42, label ="tau[1]", color ="gray50", parse =TRUE, size=6) +annotate("text", x=2.65, y=0.42, label ="tau[2]", color ="gray50", parse =TRUE, size=6) +annotate("text", x=3.65, y=0.42, label ="tau[3]", color ="gray50", parse =TRUE, , size=6) +annotate("text", x=4.65, y=0.42, label ="tau[4]", color ="gray50", parse =TRUE, , size=6) +facet_wrap(~group, ncol =1, labeller =as_labeller(setNames(stats_df$new_label, stats_df$group))) +scale_x_continuous(breaks =1:5) +scale_y_continuous(limits =c(0, 0.45), expand =expansion(mult =c(0.02, 0.08))) +labs(x ="Response Category", y ="Proportion") +theme_minimal() +theme(text =element_text(family ="serif"),strip.text =element_text(hjust =0, size=14), panel.grid =element_blank(), axis.text =element_text(size=14),axis.title =element_text(size=14) )p2 <-ggplot(df %>% dplyr::filter(group !="1. Symmetric Normal"), aes(x = response)) +geom_bar(aes(y =after_stat(prop)), fill ="#FEC488FF", color ="#FEC488FF") +geom_line(data = curves_df %>% dplyr::filter(group !="1. Symmetric Normal") %>%mutate(y = y/100), aes(x = x, y = y), color ="#341069FF", linewidth =1.5) +geom_pointrange(data = probit_df %>% dplyr::filter(group !="1. Symmetric Normal") %>%mutate(y = y/100, lower = lower/100, upper = upper/100), aes(x = x, y = y, ymin = lower, ymax = upper),color ="#D3436EFF", size =1.5, linewidth =1, fatten =2) +# Add vertical threshold lines with tau labelsgeom_vline(xintercept =1:4+0.5, linetype ="dashed", color ="gray60", alpha =0.5) +annotate("text", x=1.65, y=0.42, label ="tau[1]", color ="gray50", parse =TRUE, size=6) +annotate("text", x=2.65, y=0.42, label ="tau[2]", color ="gray50", parse =TRUE, size=6) +annotate("text", x=3.65, y=0.42, label ="tau[3]", color ="gray50", parse =TRUE, , size=6) +annotate("text", x=4.65, y=0.42, label ="tau[4]", color ="gray50", parse =TRUE, , size=6) +facet_wrap(~group, ncol =2, labeller =as_labeller(setNames(stats_df$new_label, stats_df$group))) +scale_x_continuous(breaks =1:5) +scale_y_continuous(limits =c(0, 0.45), expand =expansion(mult =c(0.02, 0.08))) +labs(x ="Response Category", y ="") +theme_minimal() +theme(text =element_text(family ="serif"),strip.text =element_text(hjust =0.5, size=14), panel.grid =element_blank(), axis.text =element_text(size=14),axis.title =element_text(size=14))combined_plot <- p1 + p2 +plot_layout(widths =c(1, 2)) +plot_annotation(title ="Comparing Metric vs. Ordinal View of Data",subtitle ="Data: Light bars show observed (simulated) proportions\nMetric view: Dark line shows linear model-implied normal curve\nOrdinal view: Point-intervals show ordered probit model predicted probabilities & 95% CI\n Dashed lines represent category thresholds in latent space",theme =theme(plot.title =element_text(hjust =0.5),plot.subtitle =element_text(hjust =0)) )combined_plot
Following Kurz’s brms adaptation of Kruschke, we can also visualize the latent thresholds by fitting Bayesian threshold-only models and then plotting draws from the posterior distribution.
Show code
# Function to fit and save brms model for each distributionfit_brms_model <-function(data, group_name) {# Prepare the data my_data <-data.frame(response = data)# Fit a Bayesian cumulative probit model fit <-brm(data = my_data,family =cumulative(probit), response ~1,prior =prior(normal(0, 4), class = Intercept),iter =3000, warmup =1000, chains =4, cores =4,seed =123,file =paste0("Models/sim_fit_", group_name),file_refit ="on_change" )return(fit)}# Fit models for each distributionfit_symmetric <-fit_brms_model(data_symmetric, "symmetric")fit_uniform <-fit_brms_model(data_uniform, "uniform")fit_bimodal <-fit_brms_model(data_bimodal, "bimodal")fit_skew_left <-fit_brms_model(data_skew_left, "skew_left")fit_skew_right <-fit_brms_model(data_skew_right, "skew_right")# Create a list of fits for easy referencefits_list <-list(symmetric = fit_symmetric,uniform = fit_uniform,bimodal = fit_bimodal,skew_left = fit_skew_left,skew_right = fit_skew_right)# Function to create threshold plotcreate_threshold_plot <-function(fit, group_name) {# Extract draws draws <-as_draws_df(fit)# Compute the posterior means for each threshold means <- draws %>%summarise(across(starts_with("b_Intercept"), mean)) %>%pivot_longer(everything(), names_to ="name", values_to ="mean")# Prepare the data with explicit threshold calculation plot_data <- draws %>%pivot_longer(starts_with("b_Intercept"), names_to ="name", values_to ="threshold") %>%group_by(.draw) %>%# Calculate theta_bar as the mean of all thresholds for this drawmutate(theta_bar =mean(threshold))# Create the plotggplot(plot_data, aes(x = threshold, y = theta_bar, color = name)) +# Vertical lines at mean thresholdsgeom_vline(data = means,aes(xintercept = mean, color = name),linetype =2) +# Points with low alpha to show densitygeom_point(alpha =1/10) +scale_color_viridis_d(option ="magma", begin =0.2, end =0.8) +# Dynamic x-axis limits based on actual threshold valuesscale_x_continuous(breaks = means$mean,labels =c("τ1", "τ2", "τ3", "τ4") ) +labs(title = group_name,y ="Mean of Thresholds per Draw",x ="Latent Thresholds") +theme_minimal() +theme(legend.position ="none", axis.text.y =element_blank(), axis.ticks.y =element_blank() )}# Create threshold plots for each modelthreshold_plots <-list(create_threshold_plot(fit_symmetric, "A. Normal-like"),create_threshold_plot(fit_uniform, "B. Uniform"),create_threshold_plot(fit_bimodal, "C. Bimodal"),create_threshold_plot(fit_skew_left, "D. Skewed Left"),create_threshold_plot(fit_skew_right, "E. Skewed Right"))# Combine the threshold plotscombined_threshold_plot <-wrap_plots(threshold_plots, ncol =3)combined_threshold_plot
This simulation demonstrates a scenario where linear and ordinal models yield different substantive conclusions despite zero overall average treatment effects. We model a situation where a highly publicized police incident causes polarized reactions that vary by political ideology, illustrating how distributional changes can be obscured when focusing solely on average effects.
The key insight: Even when ordinal data appears “continuous-like” with symmetric distributions across many categories, the underlying categorical social processes can create meaningful distributional shifts that the cumulative ordinal model captures more accurately compared to the linear model - even with explicit proportional odds assumption violation.
4.1 Simulation Design
Show code
set.seed(1138)n_per_group <-1000# Create symmetric baseline distributions that look "normal enough" for OLS# This mimics what researchers see when they think linear regression should be finegenerate_baseline <-function(n, ideology) {# Both groups start with nearly identical, symmetric 7-point distributionsif(ideology =="Conservative") { probs <-c(0.06, 0.11, 0.21, 0.28, 0.19, 0.10, 0.05) } else { probs <-c(0.05, 0.10, 0.19, 0.28, 0.21, 0.11, 0.06) } responses <-sample(1:7, size = n, replace =TRUE, prob = probs)data.frame(id =1:n,ideology = ideology,time =0,response = responses )}baseline_data <-rbind(generate_baseline(n_per_group, "Conservative"),generate_baseline(n_per_group, "Liberal"))
# Check if models are actually different by comparing log-likelihoodsll_linear <-log_lik(linear_model)ll_ordinal <-log_lik(ordinal_model)cat("\nMean log-likelihood (linear):", round(mean(rowSums(ll_linear)), 2), "\n")
4.7 The Critical Difference: Predicted Probabilities
While both models show similar coefficient patterns, they make fundamentally different predictions about response probabilities. This reveals why focusing solely on coefficient interpretation can be misleading.
Show code
# Generate predictions from both models for post-incident periodnewdata_post <-expand_grid(ideology =c("Conservative", "Liberal"),time =1# Post-incident only)# Get predictions from linear model (need to convert to probabilities)linear_pred_draws <-posterior_epred(linear_model, newdata = newdata_post, re_formula =NA)# Convert linear predictions to "probabilities" by assuming normal distribution# and calculating probability of falling in each category rangeconvert_linear_to_probs <-function(linear_preds, categories =1:7) { n_draws <-nrow(linear_preds) n_conditions <-ncol(linear_preds)# Initialize probability array prob_array <-array(dim =c(n_draws, n_conditions, length(categories)))for(draw in1:n_draws) {for(cond in1:n_conditions) { mean_pred <- linear_preds[draw, cond]# Get residual SD from model using modern syntax sigma <-as_draws_df(linear_model)$sigma[draw]# Calculate probability of each categoryfor(cat in1:length(categories)) {if(cat ==1) { prob_array[draw, cond, cat] <-pnorm(1.5, mean_pred, sigma) } elseif(cat ==length(categories)) { prob_array[draw, cond, cat] <-1-pnorm(cat -0.5, mean_pred, sigma) } else { prob_array[draw, cond, cat] <-pnorm(cat +0.5, mean_pred, sigma) -pnorm(cat -0.5, mean_pred, sigma) } } } }return(prob_array)}# Get ordinal model predictionsordinal_pred_draws <-posterior_epred(ordinal_model, newdata = newdata_post, re_formula =NA)# Convert linear predictions to probabilitieslinear_probs <-convert_linear_to_probs(linear_pred_draws)# Calculate observed proportions for post-incident periodobserved_post <- sim_data %>% dplyr::filter(time ==1) %>%group_by(ideology, response) %>%summarise(count =n(), .groups ="drop") %>%group_by(ideology) %>%mutate(proportion = count /sum(count))# Create comparison datasetprob_comparison <-expand_grid(ideology =c("Conservative", "Liberal"),category =1:7) %>%mutate(condition_idx =case_when( ideology =="Conservative"~1, ideology =="Liberal"~2 ),# Linear model predictionslinear_prob =map2_dbl(condition_idx, category, ~mean(linear_probs[, .x, .y])),linear_lower =map2_dbl(condition_idx, category, ~quantile(linear_probs[, .x, .y], 0.025)),linear_upper =map2_dbl(condition_idx, category, ~quantile(linear_probs[, .x, .y], 0.975)),# Ordinal model predictionsordinal_prob =map2_dbl(condition_idx, category, ~mean(ordinal_pred_draws[, .x, .y])),ordinal_lower =map2_dbl(condition_idx, category, ~quantile(ordinal_pred_draws[, .x, .y], 0.025)),ordinal_upper =map2_dbl(condition_idx, category, ~quantile(ordinal_pred_draws[, .x, .y], 0.975)),category_label =factor(category, levels =1:7,labels =c("Very\nUnafraid", "Unafraid", "Somewhat\nUnafraid", "Neutral", "Somewhat\nAfraid", "Afraid", "Very\nAfraid")) ) %>%# Add observed proportionsleft_join( observed_post %>% dplyr::select(ideology, response, proportion),by =c("ideology", "category"="response") ) %>%# Reshape for plottingpivot_longer(cols =c(linear_prob, ordinal_prob, proportion),names_to ="type", values_to ="probability" ) %>%mutate(lower =case_when( type =="linear_prob"~ linear_lower, type =="ordinal_prob"~ ordinal_lower, type =="proportion"~NA_real_ ),upper =case_when( type =="linear_prob"~ linear_upper, type =="ordinal_prob"~ ordinal_upper, type =="proportion"~NA_real_ ),type_label =factor(type, levels =c("proportion", "linear_prob", "ordinal_prob"),labels =c("Observed", "Linear Model", "Ordinal Model")) )# Plot model comparison with colorblind-friendly palettemodel_comparison_plot <- prob_comparison %>%ggplot(aes(x = category_label, y = probability, color = type_label, shape = type_label)) +geom_point(size =2.5, position =position_dodge(width =0.4)) +geom_errorbar(aes(ymin = lower, ymax = upper), width =0.15, position =position_dodge(width =0.4),na.rm =TRUE) +geom_line(aes(group = type_label), position =position_dodge(width =0.4), alpha =0.7) +facet_wrap(~ideology, labeller = label_both) +scale_color_manual(values =c("Observed"="#000000", "Linear Model"="#D55E00", "Ordinal Model"="#0072B2"),name ="") +scale_shape_manual(values =c("Observed"=17, "Linear Model"=15, "Ordinal Model"=16),name ="") +scale_y_continuous(labels = scales::percent_format(accuracy =1)) +labs(title ="Model Accuracy: Observed vs. Predicted Probabilities",subtitle ="Ordinal model captures bimodal reality; linear model predicts impossible intermediate peaks",x ="Fear Response Category",y ="Probability" ) +theme_minimal(base_family ="serif") +theme(plot.title =element_text(hjust =0.5, size =13, face ="bold"),plot.subtitle =element_text(hjust =0.5, size =11),axis.text =element_text(size =10),axis.title =element_text(size =11),legend.position ="bottom",panel.grid.minor =element_blank(),panel.grid.major.x =element_blank(),strip.text =element_text(size =11, face ="bold") )model_comparison_plot