This file contains all of the R code and output (e.g., data wrangling, analysis, visualizations) that I produced for the working paper titled “Victimogenic Beauty: Attractiveness and Sexual Victimization Risks among University Students” by AUTHORS.
If you are interested in learning about how I became involved in this paper and some preliminary exchanges that shaped the analytic decisions below, click the “Introductory Comments” link below to expand and view these contents. Otherwise, skip over it to loading R libraries and settings.
Introductory Comments
REDACTED PARAGRAPH (Anonymized for peer review)
After reading the initial draft of the front end, I asked whether the data would be publicly available upon publication, whether I would have access to the data to conduct analyses, and whether I could review a copy of the preregistration to compare the preregistered plan with the reported measurements and analyses. AUTHOR affirmed all of the above.
I also recommended adding a correlation matrix and discussed the inclusion of the “number of lifetime sexual partners” (NLP), a proxy for sexual activity, as a covariate in regression models. The addition of this variable to the dataset and model represents an important difference between our original study and the current study. Yet, I questioned the appropriateness of conceptualizing NLP as a confounder that should be adjusted for via covariate stratification and, instead, suggested it might be more appropriately conceptualized and modeled as a mediator. Here was my exact comment:
“The SRPA effect estimate is very small; sure, it is statistically significant, but that is due to the very large sample size. What is the bivariate correlation between SRPA and victimization, and between SRPA and NLP? You referred to NLP as a confounder but, as described, is it not a mediator instead (like time in public or other routine activities)? That is, do you assume that NLP causes SRPA, or SRPA causes NLP (or some alternative causal assumption)? If SRPA (as a proxy for attractiveness) causes more lifetime partners (proxy for sexual activity), then a mediation model is appropriate (like we used).”
My reasoning was based on our original causal model found in Figure 1 (2020, p.101652) and on AUTHOR’s first draft of the manuscript’s front end. Specifically, AUTHOR had argued:
“The interpretation of the link between youth physical attractiveness and sexual victimization among students as a consequence of perpetrator’s selection of more gratifiable targets is challenged by the fact that physically attractive individuals may have different routine activities when compared to less attractive individuals. Possibly because they have a lower level of anxiety and neuroticism and a higher self-esteem and social status (Ciosta & Maestripieri, 2023; Feingold, 1992), physically attractive students have a larger number of friends (Gordon et al., 2013), spend more time in public places (Savolainen et al., 2020) and have on average more sexual interactions (Felson et al., 2021; Rhodes, Simmons, & Petrs, 2005). The risk of sexual victimization being related with individual’s prior sexual history and activity (Abbey et al., 1996; Himelein et al., 1994; Koss & Dinero, 1989; Schapanski et al., 2021; Synovitz, & Byrne, 2010), the attractivity-victimization link could therefore be the consequence of a mere differential exposure to situations involving sexual encounters by attractive individuals.”
The manuscript continued by stating:
“Spending more time at more risky timing is a known risk factor for sexual victimization (Quigg et al. 2020). In a study based on a representative sample of Finnish youths, Savolainen, Brauer & Ellonen (2020) confirmed that attractive individuals had a riskier lifestyle (going more out for parties and at more risky timing), and this feature partially mediated the attractivity-sexual victimization relationship. However, victim’s sexual activity was not controlled in the study and may have contributed to the attractivity-victimization link.”
So, is sexual activity as measured by NLP a confounder or a mediator? For now, this is a theoretical question that requires us to make a causal assumption one way or the other. Sure, it is also an empirical question, but it is not one that can be answered with these data. Barring an explicit research design for identifying and estimating potential causal effects of the exposure (attractiveness) on the mediator (NLP), confounders and mediators are statistically indistinguishable. That is, inclusion of the NLP variable into a regression model would have the same consequence irrespective of whether it is in reality a mediator or a confounder of the SRPA-victimization relationship.
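To see why, here is a minimal simulation sketch (hypothetical data and coefficients, not the study data): a covariate generated as a confounder and one generated as a mediator yield the same kind of adjusted regression for the exposure, so the regression output alone cannot arbitrate between the two DAGs.

# Minimal sketch (simulated, hypothetical data; not the study data):
# World 1 treats M as a confounder (M -> A, M -> Y); World 2 treats M as a mediator (A -> M -> Y).
set.seed(8675309)
n <- 10000

# World 1: M is a confounder of A and Y
M1 <- rnorm(n)
A1 <- 0.5 * M1 + rnorm(n)
Y1 <- 0.2 * A1 + 0.5 * M1 + rnorm(n)

# World 2: M is a mediator of A's effect on Y
A2 <- rnorm(n)
M2 <- 0.5 * A2 + rnorm(n)
Y2 <- 0.2 * A2 + 0.5 * M2 + rnorm(n)

# Both adjusted regressions return essentially the same coefficient for the exposure (~0.2);
# the regression itself cannot tell us whether adjusting for M removed confounding (World 1)
# or blocked part of a genuine causal pathway (World 2).
coef(lm(Y1 ~ A1 + M1))
coef(lm(Y2 ~ A2 + M2))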
Hence, in my email reply to AUTHOR, I also offered the following suggestion:
“In general, I would state the causal assumptions formally (e.g., in a directed acyclic graph), estimate appropriate model (e.g., indirect effects if appropriate; or, document confounding bias), and then discuss alternative interpretations from different plausible causal assumptions. It is fine if those additional models are not explicitly identified in your preregistration – just be transparent about what is confirmatory and what is exploratory (and, in this case, justified theoretically and follows precedent from replicated study).”
In reply, AUTHOR provided me with a copy of the preregistration and the data. The following represents my attempt to follow my suggestion above. I try to strike a balance between following the preregistration design while also presenting competing “confounding” and “mediating” causal assumptions, and demonstrating how these different assumptions affect interpretation of the findings.
2 Load libraries & set options
library(here)
library(haven)
library(ggplot2)
library(tidyverse)
library(naniar)
library(dagitty)
library(ggdag)
library(ggraph)
library(ggdist)
library(gt)
library(gtExtras)
library(ggthemes)
library(gtools)
# library(summarytools)
library(modelsummary)
library(ggmosaic)
library(GGally)
library(viridis)
library(ggalluvial)
library(patchwork)
library(mediation)
library(CMAverse)
library(rstanarm)
library(brms)
library(bayesplot)
library(tidybayes)
library(marginaleffects)
library(parallelly)
library(ordPens)

# https://cran.r-project.org/bin/windows/Rtools/rtools40.html
# install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
# library(cmdstanr)
# cmdstanr::check_cmdstan_toolchain(fix = TRUE)
# check_cmdstan_toolchain()

# set default knit & scientific notation options
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, fig.align = "center")
options(scipen = 999, digits = 2) # set global option to two decimals

# identify & set # of available cores
# replaced parallel::detectCores()
# https://www.jottr.org/2022/12/05/avoid-detectcores/
nCoresphys <- parallelly::availableCores(logical = FALSE)

theme_set(bayesplot::theme_default())
# options(mc.cores = parallelly::availableCores(logical = FALSE))
# options(mc.cores = 4, logical = FALSE)
# alt: can specify exact number of cores desired here &/or in models/ppchecks

print("T/F: Root 'here()' folder contains subfolder 'Models'")
# check for Models folder to save figures, create if one does not exist
ifelse(dir.exists(here("Models")), TRUE, dir.create(here("Models")))
[1] TRUE
3 FIG1: DAGs
The preregistration mentions stratifying on age as a potential confounder. It also mentions adjusting for general positive self-perceptions by stratifying on participant’s self-evaluations of intelligence (SRintel) and popularity (SRpopular).
The preregistration identifies gender as a possible moderator and calls for separate analyses for males and females. Thus, models using the full sample will stratify on gender, and subsequent models will generate gender-specific estimates.
The preregistration also identifies sexual activity, measured as the respondent’s reported number of lifetime sexual partners (nlp), as a possible confounder. However, it is unclear whether sexual activity would confound or mediate the potential effect of (self-reported) physical attractiveness (SRPA) on sexual victimization. For instance, sexual activity might confound this effect by causing heightened self-assessments and, thus, higher responses on SRPA items. However, higher physical attractiveness might also increase desirability and, likewise, might increase opportunities for and participation in sexual activity which, in turn, increases sexual victimization risks. These two alternative possibilities are demonstrated in DAGs below.
# function to shorten arrows (edges)
# https://stackoverflow.com/questions/65420136/how-do-you-adjust-the-arrows-in-a-ggplot-of-a-ggdag
shorten_dag_arrows <- function(tidy_dag, proportion){
  # Update underlying ggdag object
  tidy_dag$data <- dplyr::mutate(tidy_dag$data,
                                 xend = (1 - proportion/2) * (xend - x) + x,
                                 yend = (1 - proportion/2) * (yend - y) + y,
                                 xstart = (1 - proportion/2) * (x - xend) + xend,
                                 ystart = (1 - proportion/2) * (y - yend) + yend)
  return(tidy_dag)
}

################
# DAG 1 (NLP as confounder)
DAG1 <- dagify(
  Y ~ A + M + C,
  A ~ M + C,
  exposure = "A",
  outcome = "Y",
  coords = list(x = c(A = 1, M = 1.5, Y = 2, C = 1.5),
                y = c(A = 1, M = 2, Y = 1, C = 3))) %>%
  tidy_dagitty() %>%
  dplyr::mutate(covar = if_else(name == "C", "grey", "darkslategrey"),
                covar = if_else(name == "M", "maroon", covar))

# function to shorten arrows - set percentage to shorten
DAG1p <- shorten_dag_arrows(DAG1, 0.25)

# create factor variable to isolate edge of interest, permits specifying edge color
testdat <- DAG1p %>%
  dplyr::mutate(myedge1 = if_else(DAG1p$data$name == "A" & DAG1p$data$to == "Y", "focal", "other"),
                modlinetype = ifelse(myedge1 == "focal", "dashed", "solid"))

DAG1p2 <- testdat %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_edges(aes(x = xstart, y = ystart, edge_color = myedge1, edge_linetype = modlinetype),
                 show.legend = FALSE) +
  geom_dag_text(label = c("Physically\nAttractive\n(SRPA)", "Measured\nConfounders\n(E.g., Gender)",
                          "# of Lifetime\nSexual Partners\n(NLP)", "Victimiz."),
                aes(color = covar), size = 4) +
  theme_dag() +
  guides(fill = 'none', color = 'none', edge_color = 'none') +
  scale_color_manual(values = c("darkslategrey", "grey", "maroon")) +
  scale_edge_colour_manual(values = c("darkslategrey", "grey")) +
  scale_x_continuous(expand = expansion(mult = c(0.3, 0.3))) +
  scale_y_continuous(expand = expansion(mult = c(0.3, 0.3))) +
  annotate("text", x = -Inf, y = Inf, label = paste("DAG 1"), vjust = 4, hjust = -.2) +
  # annotate("text", x = -Inf, y = Inf, label = paste("DAG 1"), vjust = 7, hjust = -2)
  # ggtitle("DAG 1") +
  theme(plot.title = element_text(size = 14))

################
# DAG 2 (NLP as mediator)
DAG2 <- dagify(
  Y ~ A + M + C,
  M ~ A + C,
  A ~ C,
  exposure = "A",
  outcome = "Y",
  coords = list(x = c(A = 1, M = 1.5, Y = 2, C = 1.5),
                y = c(A = 1, M = 2, Y = 1, C = 3))) %>%
  tidy_dagitty() %>%
  dplyr::mutate(covar = if_else(name == "C", "grey", "darkslategrey"),
                covar = if_else(name == "M", "maroon", covar))

# function to shorten arrows - set percentage to shorten
DAG2p <- shorten_dag_arrows(DAG2, 0.25)

# create factor variable to isolate edge of interest, permits specifying edge color
testdat <- DAG2p %>%
  dplyr::mutate(myedge1 = if_else(DAG2p$data$name == "A" & DAG2p$data$to == "M" |
                                    DAG2p$data$name == "M" & DAG2p$data$to == "Y" |
                                    DAG2p$data$name == "A" & DAG2p$data$to == "Y",
                                  "focal", "other"),
                modlinetype = ifelse(myedge1 == "focal", "dashed", "solid"))

DAG2p2 <- testdat %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_edges(aes(x = xstart, y = ystart, edge_color = myedge1, edge_linetype = modlinetype),
                 show.legend = FALSE) +
  geom_dag_text(label = c("Physically\nAttractive\n(SRPA)", "Measured\nConfounders\n(E.g., Gender)",
                          "# of Lifetime\nSexual Partners\n(NLP)", "Victimiz."),
                aes(color = covar), size = 4) +
  theme_dag() +
  guides(fill = 'none', color = 'none', edge_color = 'none') +
  scale_color_manual(values = c("darkslategrey", "grey", "maroon")) +
  scale_edge_colour_manual(values = c("darkslategrey", "grey")) +
  scale_x_continuous(expand = expansion(mult = c(0.3, 0.3))) +
  scale_y_continuous(expand = expansion(mult = c(0.3, 0.3))) +
  annotate("text", x = -Inf, y = Inf, label = paste("DAG 2"), vjust = 4, hjust = -.2) +
  # annotate("text", x = -Inf, y = Inf, label = paste("DAG 2"), vjust = 5, hjust = -.5)
  # ggtitle("DAG 2") +
  theme(plot.title = element_text(size = 14))

# ggsave("DAG1.jpeg", DAG1p2)
# ggsave("DAG2.jpeg", DAG2p2)
Figure 1. DAGs illustrating assumed causal relationship between self-reported physical attractiveness and victimization, with sexual activity (number of lifetime sexual partners) as a confounder (DAG 1) or mediator (DAG 2)
As a result of this ambiguity, initial models will estimate SRPA effects with and without stratifying on nlp to assess the potential total and net effects after adjusting for confounding and/or mediating effects. Exploratory models will also present estimates of indirect effects; these should be interpreted with caution given NLP's unclear causal role and the lack of a design permitting precise causal identification.
4 Data
4.1 Description of variables
I received an SPSS datafile containing the relevant items needed to conduct the analysis. The file I received appears to be a raw subset of a much larger database containing survey responses of university students in France. In his initial drafts, AUTHOR had retained as eligible participants only male and female university students (dropping non-students and those who did not answer the student-status question). He had also removed those who failed two attention checks: participants were asked, at the beginning of the questionnaire and again at the very end, whether they had given accurate answers, and unreliable responses and non-responses were dropped.
Q3. According to civil status at birth, what is your sex?
Q5. What is your age?
Q10. Are you currently a student?
Q17. Do you have paid work to support your studies (excluding internships)?
Q721. If you started the questionnaire "just to see" and answered very roughly or inaccurately, can you indicate this? So far, I have answered precisely and accurately.
Q115. The following questions concern your emotional and sexual life. Have you ever had sexual intercourse?
Q117. With how many partners in total have you had sexual intercourse at least once?
Q331. Since you have been a student in higher education, has anyone tried, against your will, to do any of the following to you, without succeeding: kiss you, touch your chest, breasts, crotch, groin […]
Q368. Since you have been a student in higher education, has anyone actually done any of the following to you against your will: kiss you, touch your chest, breasts, crotch, groin, buttocks […]
Q148. Since you have been a student in higher education, has anyone tried, against your will, to do any of the following to you, without succeeding: sexual penetration (when a person inserts their penis, finger […]
Q405. Since you have been a student in higher education, has anyone actually done any of the following to you against your will: sexual penetration (when a person inserts their penis, finger, or an object in […]
Q81. Intelligent
Q82. Popular, my company is sought after
Q83. Physically attractive/seductive
Q85. Attractive, noticed for their physical beauty
Q765. Can you confirm that you answered this study as sincerely as possible (if you answer "no", you can still take part in the prize draw and receive the results)?
Q84. Physically strong, muscular, powerful

Example responses (six raw rows; columns keyed to the item numbers above):

Q3   Q5   Q10  Q17  Q721  Q115  Q117  Q331  Q368  Q148  Q405  Q81  Q82  Q83  Q85  Q765  Q84
1    23   Oui  Non  Oui   Oui   NA    1     1     1     1     7    7    7    7    1     7
NA   NA   NA   NA   NA    NA    NA    NA    NA    NA    NA    NA   NA   NA   NA   NA    NA
2    21   Oui  Non  Oui   Non   NA    0     0     0     0     6    6    4    4    1     4
2    18   Oui  Oui  Oui   Oui   3     0     0     0     0     5    3    5    4    NA    4
2    18   Oui  Non  Oui   Oui   NA    NA    NA    NA    NA    NA   NA   NA   NA   NA    NA
NA   NA   Oui  NA   Oui   Oui   NA    NA    NA    NA    NA    NA   NA   NA   NA   NA    NA
# ChecK1: Approximate translation: "Q10. Are you currently a student?" Retain affirmative responses ("Oui")
# CHECK2: Approx. translation: "Q721. If you started the questionnaire 'just to see' and answered very roughly or inaccurately, can you clarify this? So far I have responded precisely and accurately." Retain participants who responded affirmatively ("Oui") that they have responded precisely and accurately
# Q765_final_check: Approx. translation: "Q765 Can you confirm that you responded as truthfully as possible to this study (if you answer 'no', you will still be able to participate in the draw and receive the results)?" Drop disconfirming ("3") responses.

# initial N=86,426 rows
datv3 <- datv3 %>%
  zap_labels %>%
  zap_label %>%
  dplyr::filter(ChecK1 == "Oui") %>%   # N=77,597 (dropped n=8,829 non-students)
  dplyr::filter(CHECK2 == "Oui") %>%   # N=73,093 (dropped n=4,504 initial attn check)
  dplyr::filter(Q765_final_check %in% c(1,2)) %>%  # N=55,786 (dropped n=17,307 reportedly "unreliable" [=3, n=89] or missing a response [NA, n=17,218] on final attn check)
  dplyr::filter(Gender_HF %in% c("1","2")) %>%     # dropped N=240 not reporting M/F sex on birth records (neither: n=44; don't know: n=22; don't wish to answer: n=163; NA: n=11)
  rename(sexvic1 = Q331_Attempt_sexual_touch_Vict1,
         sexvic2 = Q368_Sexual_touch_Vict2,
         sexvic3 = Q148_Attempt_rape_Vict3,
         sexvic4 = Q405_Rape_Vict4,
         SRPA1 = Q83_physically_seducing_SRPA1,
         SRPA2 = Q85_Attractive_SRPA2,
         nlp = Lifetime_N_sex_partners,
         age = Age) %>%
  mutate(nlp = if_else(Ever_sexual_intercourse == "Non", 0, nlp),  # skip pattern to replace NA with "0" sex partners for valid observations
         eversex = if_else(Ever_sexual_intercourse == "Oui", "1", Ever_sexual_intercourse),
         eversex = if_else(Ever_sexual_intercourse == "Non", "0", eversex)) %>%
  mutate(sexvicsum = sexvic1 + sexvic2 + sexvic3 + sexvic4,
         SRPAavg = (SRPA1 + SRPA2)/2,
         SRPA95 = as_factor(if_else(SRPAavg >= quantile(SRPAavg, probs = 0.95, na.rm = TRUE), 1, 0)),
         gender = as_factor(Gender_HF),
         worked = as_factor(if_else(Q17_work_during_studies %in% "Je ne réponds pas", NA, Q17_work_during_studies)),
         worked = if_else(worked == "Oui", "1", worked),
         worked = if_else(worked == "Non", "0", worked),
         agez = as.numeric(scale(age)),
         SRintelz = as.numeric(scale(Q81_intelligent)),
         SRpopularz = as.numeric(scale(Q82_popular)),
         SRmuscularz = as.numeric(scale(Q84_Strong_muscular)),
         SRPAavgz = as.numeric(scale(SRPAavg)),
         nlpz = as.numeric(scale(nlp)),
         sexvicsumf = factor(sexvicsum, ordered = TRUE, levels = c(0, 1, 2, 3, 4)),
         nlpf = factor(nlp, ordered = TRUE, levels = c(0,1,2,3,4,5,6,7,8,9)))

# Note: stdzing cols using scale() generates a matrix col w/in df
# This causes problems (e.g., w/avg_comparisons())
# One solution is to wrap in as.numeric() to convert back to numeric vector
# class(datv3$age)
# https://stackoverflow.com/questions/70377294/why-does-mutateacross-with-scale-add-1-to-the-column-header

head(datv3) %>% gt()
After filtering on student status, gender (M/F), and data quality checks for reliable survey completers, this data file contains survey responses from N=55,546 eligible male and female students who passed two attention checks (out of 86,426 initial rows from Qualtrics). Let’s take a look at remaining missing data patterns among responses from eligible participants on variables we will use in analyses.
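The missingness summaries can be generated along the following lines with naniar (loaded above); this is a sketch, and the exact variable selection is illustrative.

# Sketch: per-variable missingness summaries with naniar (variable selection illustrative)
datv3 %>%
  dplyr::select(sexvic1, sexvic2, sexvic3, sexvic4, SRPA1, SRPA2, nlp, age, gender, worked) %>%
  naniar::miss_var_summary()    # n and % missing per variable

datv3 %>%
  dplyr::select(sexvic1, sexvic2, sexvic3, sexvic4, SRPA1, SRPA2, nlp, age, gender, worked) %>%
  naniar::gg_miss_var(show_pct = TRUE)   # plot of % missing per variable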
Among eligible survey completers, missingness rates were negligible or low for most variables, and missingness on all but NLP did not appear to vary systematically across levels of our key DV or IV. Full information or casewise deletion approaches should be appropriate for most of these variables.
However, missingness on NLP (~9%, n=5,133) appears potentially problematic: participants who did not respond to the number-of-sexual-partners item were more likely to have high SRPA scores and to report sexual victimization. In fact, 17.9% (1 - 0.821 = 0.179) of eligible participants with non-missing NLP values report at least one victimization experience, compared to 41% of the 4,771 participants who were missing NLP but did respond to the sexual victimization items.
Yet, most of the 5,133 participants who did not report the number of lifetime partners did respond to the preceding skip-pattern item indicating any past sexual activity; that variable had only n=91 (0.16%) missing observations.
So, I will remove rows with missing data on any study variable except NLP, and then conduct complete-case analysis when including NLP (dropping its NA values). Given the potentially biasing missingness patterns on this variable, I will also conduct sensitivity analyses using the more crudely measured yet more completely reported dichotomous “sexual activity” variable instead.
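In code, that step looks roughly like the following sketch; the object name datv3cc and the exact variable list are illustrative.

# Sketch of the listwise-deletion step (hypothetical object name; variable list illustrative):
# drop rows missing on any analysis variable except nlp, whose NAs are handled model-by-model
datv3cc <- datv3 %>%
  tidyr::drop_na(sexvicsumf, SRPAavg, SRintelz, SRpopularz, agez, worked, gender, eversex)

# models including nlp then use complete cases on nlp only, e.g.:
# dplyr::filter(datv3cc, !is.na(nlp))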
The preregistration stated the expectation that the predicted correlations would be stronger for girls than for boys. So, here are initial bivariate correlations grouped by gender.
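They can be computed along the following lines; in this sketch, Spearman correlations and pairwise-complete observations are assumptions rather than requirements.

# Sketch: bivariate correlations among the key variables, by gender
# (Spearman and pairwise-complete observations are assumptions of this sketch)
datv3 %>%
  group_by(gender) %>%
  summarise(
    r_srpa_vic = cor(SRPAavg, sexvicsum, method = "spearman", use = "pairwise.complete.obs"),
    r_srpa_nlp = cor(SRPAavg, nlp, method = "spearman", use = "pairwise.complete.obs"),
    r_nlp_vic  = cor(nlp, sexvicsum, method = "spearman", use = "pairwise.complete.obs"))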
The correlation between SRPA and victimization appears to be somewhat stronger for girls; however, girls also report more victimization experiences than boys, and there may be systematic gender differences in levels of confounding variables. So, we will need to examine contrasts and relative risk by gender in multivariable models later.
5.3 Mosaic Plots
Scatterplots are notoriously bad at generating useful visual descriptions of bivariate relationships between ordinal or categorical variables. So, I will use mosaic plots instead. As Will Ball describes, a mosaic plot is “similar to a heatmap, except that the frequency of an observation… is proportional to the area of a tile.”
# Function to quickly generate custom mosaic plots with labels
# see https://stackoverflow.com/questions/58034811/ggplot2-not-recognizing-a-column-in-a-function-call
# for help with passing column names to ggplot in functions
mosyplot <- function(df, yvar, xvar, xlabel, ylabel, mytitle, mysubtitle){
  ggplot() +
    geom_mosaic(data = df, aes(x = product(!!as.name(yvar), !!as.name(xvar)),
                               fill = !!as.name(yvar)), na.rm = TRUE) +
    labs(x = xlabel, y = ylabel, title = mytitle, subtitle = mysubtitle) +
    scale_fill_viridis_d(begin = .1, end = .9, direction = -1, option = "magma") +
    theme_minimal() +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          axis.text.y = element_blank())
}

# SRPAavg
mosyx <- mosyplot(df = datv3, "sexvicsum", "SRPAavg",
                  "SR Physical Attractiveness (Mean scale)", "Proportion victimized",
                  "Victimization by SRPAavg",
                  "Left to Right = (1) Lowest to (7) Highest SR Attractiveness")
mosyx
# SRPA95
mosyx95 <- mosyplot(df = datv3, "sexvicsum", "SRPA95",
                    "SR Physical Attractiveness (95pctl)", "Proportion victimized",
                    "Victimization by SRPA95",
                    "Left to Right = (0) 0 to 94pctl / (1) >= 95pctl SR Attractiveness")
mosyx95
mosym <- mosyplot(df = datv3, "sexvicsum", "nlp",
                  "# of Lifetime Partners", "Proportion victimized",
                  "Victimization by # of Lifetime Partners",
                  "Left to Right = (0) None to (9) Nine or more partners")
mosym
mosmx <- mosyplot(df = datv3, "nlp", "SRPAavg",
                  "SR Physical Attractiveness (Mean scale)", "Proportion w/# of Lifetime Partners",
                  "# of Lifetime Partners by SRPAavg",
                  "Left to Right = (1) Lowest to (7) Highest SR Attractiveness")
mosmx
mosmx95 <- mosyplot(df = datv3, "nlp", "SRPA95",
                    "SR Physical Attractiveness (95 pctl)", "Proportion w/# of Lifetime Partners",
                    "# of Lifetime Partners by SRPA95",
                    "Left to Right = (0) 0 to 94pctl / (1) >= 95pctl SR Attractiveness")
mosmx95
5.3.1 FIG2: Combined mosaic plots (full & by gender)
Let’s split the sample by gender and then combine these plots into one.
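A sketch of that step, reusing the mosyplot() helper above and combining panels with patchwork; the gender coding of 1 = male and 2 = female is an assumption here, and the exact panel layout of FIG2 may differ.

# Sketch: gender-specific versions of the mosaic plots above, combined with patchwork
# (assumes Gender_HF coding 1 = male, 2 = female; FIG2's exact panel layout may differ)
mosy_m <- mosyplot(df = dplyr::filter(datv3, gender == "1"), "sexvicsum", "SRPAavg",
                   "SR Physical Attractiveness (Mean scale)", "Proportion victimized",
                   "Victimization by SRPAavg", "Males")
mosy_f <- mosyplot(df = dplyr::filter(datv3, gender == "2"), "sexvicsum", "SRPAavg",
                   "SR Physical Attractiveness (Mean scale)", "Proportion victimized",
                   "Victimization by SRPAavg", "Females")

(mosyx / (mosy_m | mosy_f)) +
  plot_annotation(title = "Victimization by SRPA: full sample and by gender")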
As expected, victimization risks increase with SR physical attractiveness and sexual activity, and sexual activity is higher for those self-rated as more physically attractive.
6 Modeling
Before estimating a model, it is a good idea to view distributions of our key variables to ensure we specify appropriate link functions.
SRPAavg has 13 categories and is roughly symmetric, with frequencies tapering off on either side of the mean. It might be appropriate to treat it as a metric continuous predictor, though there is little downside to treating it as an ordinal variable (given sufficient data and ample processing power, which we have).
In contrast, nlp and sexvicsum exhibit highly skewed ordinal distributions. Ordinal modeling techniques are likely to be more appropriate for these variables.
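A quick sketch of how these distributions can be inspected (simple bar plots; illustrative only):

# Sketch: quick bar plots of the key ordinal variables (illustrative only)
p_srpa <- ggplot(datv3, aes(x = SRPAavg)) + geom_bar() + labs(title = "SRPAavg")
p_nlp  <- ggplot(datv3, aes(x = nlp)) + geom_bar() + labs(title = "nlp")
p_vic  <- ggplot(datv3, aes(x = sexvicsum)) + geom_bar() + labs(title = "sexvicsum")
p_srpa | p_nlp | p_vic    # combine with patchwork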
6.1 Ordinal: So What?
Building a regression model that treats ordinal variables as if they are metric continuous variables can generate model predictions that exhibit poor fit to the data and, ultimately, can result in substantial inference errors (e.g., see Liddell & Kruschke 2018).
In the first couple of models below, I present linear and ordinal Bayesian “intercept only” models to illustrate the poor fit of linear modeling to ordinal data and the substantial improvement in predictive fit to the underlying data distribution that we gain by using ordinal (e.g., cumulative probit) modeling instead.
6.2 Intercept: Lin~1 model (poor fit)
I start by estimating a simple linear intercept model of sexvicsum with no predictors to illustrate the poor fit of a linear model to the ordinal outcome. I use Bayesian estimation with diffuse priors; Bayesian models will be beneficial for model building and visualization later.
Model Info:
function: stan_glm
family: gaussian [identity]
formula: sexvicsum ~ 1
algorithm: sampling
sample: 4000 (posterior sample size)
priors: see help('prior_summary')
observations: 52191
predictors: 1
Estimates:
mean sd 2.5% 97.5%
(Intercept) 0.35 0.00 0.34 0.36
sigma 0.79 0.00 0.78 0.79
Fit Diagnostics:
mean sd 2.5% 97.5%
mean_PPD 0.35 0.00 0.34 0.36
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.00 1.00 2892
sigma 0.00 1.00 2877
mean_PPD 0.00 1.00 3347
log-posterior 0.02 1.00 1741
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
prior_summary(mod1) #check priors
Priors for model 'mod1'
------
Intercept (after predictors centered)
~ normal(location = 0.3, scale = 2.5)
Auxiliary (sigma)
Specified prior:
~ exponential(rate = 1)
Adjusted prior:
~ exponential(rate = 1.3)
------
See help('prior_summary.stanreg') for more details
The linear (metric normal) model above recovers the mean of sexvicsum (≈0.35), which is what it is designed to do. However, the posterior predictive check plots (e.g., density; histogram; box) show that the underlying normal distribution assumption results in model predictions that fail to recover the observed ordinal item values.
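Those checks come from pp_check(); a sketch of the sorts of calls involved follows (the specific plot types and numbers of replicates are illustrative).

# Sketch: posterior predictive checks for mod1 via rstanarm/bayesplot (plot types illustrative)
pp_check(mod1, plotfun = "dens_overlay", nreps = 50)   # density overlay
pp_check(mod1, plotfun = "hist", nreps = 5)            # histograms
pp_check(mod1, plotfun = "boxplot", nreps = 10)        # boxplots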
Now, let’s compare how a basic cumulative probit model without predictors fits the data.
6.3 Priors on ordinal response distribution
I will specify what is essentially a flat prior distribution on the response thresholds in the cumulative probit model by specifying values that equate to equal probabilities (=1/5 or 0.20) for each of the five response categories. Transforming these to cumulative normal distribution thresholds results in values of c(-.84, -.25, .25, .84).
I could specify somewhat more informative priors. For instance, I know that sexual victimization is a relatively rare event, so I could specify the probability of a “0” response as something like 0.6, then specify lower [=0.10 instead of 0.20] and equal probabilities for the remaining categories. One advantage (or drawback, depending on your perspective and aims) of specifying flat priors is that results will typically converge with those generated by traditional frequentist methods.
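For reference, the flat-prior threshold values are just the standard-normal quantiles of the cumulative category proportions; this mirrors the commented helper code in the chunk below.

# Thresholds implied by flat (equal, 0.20) category probabilities:
# cumulative proportions .2, .4, .6, .8 map onto qnorm() values -0.84, -0.25, 0.25, 0.84
tibble(rating = 0:4) %>%
  mutate(proportion = 1/5,
         cumulative_proportion = cumsum(proportion),
         right_hand_threshold = qnorm(cumulative_proportion))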
6.4 Intercept: Ord~1 (better fit)
# Ordinal cumulative probit model of sexual victimization index
# estimation is slow - cache model fits
# sexvicsum is a 5-category ordinal variable (0 to 4)
# table(datv3$sexvicsumf)

# develop formula
bf.form = brms::bf(sexvicsumf ~ 1)

# brms default prior is a wide student's t dist
get_prior(bf.form, data = datv3, family = cumulative('probit'))
# use this code to identify values that set priors to equal likelihood for all thresholds
# tibble(rating = 0:4) %>%
#   mutate(proportion = 1/5) %>%
#   mutate(cumulative_proportion = cumsum(proportion)) %>%
#   mutate(right_hand_threshold = qnorm(cumulative_proportion))

# if I wanted to specify more informative priors, I could use this code to identify values that
# set priors to 0.6 for "no victimization" and equal 0.1 probability for all nonzero thresholds
# tibble(rating = 0:4) %>%
#   mutate(proportion = if_else(rating == 0, .6, .1)) %>%
#   mutate(cumulative_proportion = cumsum(proportion)) %>%
#   mutate(right_hand_threshold = qnorm(cumulative_proportion))
# altpriors = c(
#   prior(normal(0.25, 1), class = Intercept, coef = 1),
#   prior(normal(0.52, 1), class = Intercept, coef = 2),
#   prior(normal(0.84, 1), class = Intercept, coef = 3),
#   prior(normal(1.28, 1), class = Intercept, coef = 4)
# )

priors = c(
  prior(normal(-0.84, 1), class = Intercept, coef = 1),
  prior(normal(-0.25, 1), class = Intercept, coef = 2),
  prior(normal(0.25, 1), class = Intercept, coef = 3),
  prior(normal(0.84, 1), class = Intercept, coef = 4))

# sexvicsum - Thresholds only, Cumulative probit model:
mod2 <- brm(data = datv3,
            family = cumulative("probit"),
            sexvicsumf ~ 1,
            # prior(normal(0, 4), class = Intercept), # see kruschke/kurtz re: priors for this model
            prior = priors,
            iter = 2000, warmup = 1000, chains = 4, cores = nCoresphys,
            backend = "cmdstanr",
            # threads = threading(2), # consider within-chain threading if machine has at least 8 physical cores
            seed = 8675309,
            file = "Models/mod2_fit",
            file_refit = "on_change")
Family: cumulative
Links: mu = probit; disc = identity
Formula: sexvicsumf ~ 1
Data: datv3 (Number of observations: 52191)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] 0.84 0.01 0.83 0.85 1.00 4089 3227
Intercept[2] 1.23 0.01 1.22 1.25 1.00 4921 3451
Intercept[3] 1.89 0.01 1.87 1.92 1.00 5022 3277
Intercept[4] 2.32 0.02 2.29 2.35 1.00 4696 3062
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
prior_summary(mod2) #check priors
prior class coef group resp dpar nlpar lb ub source
student_t(3, 0, 2.5) Intercept default
normal(-0.84, 1) Intercept 1 user
normal(-0.25, 1) Intercept 2 user
normal(0.25, 1) Intercept 3 user
normal(0.84, 1) Intercept 4 user
As expected, posterior predictions generated by the cumulative probit model are a much better fit to the ordinal response data.
Now, let’s add SRPA as a metric (linear) predictor. Standardizing SRPA first provides interpretive advantages: the beta coefficient in this cumulative probit model represents the (unadjusted, unconditional) predicted average marginal effect on latent victimization of moving from an average SRPA score (mean = 0) to one standard deviation above average (+1 SD = 1).
Ultimately, I will illustrate what I think are preferable ways of interpreting and visualizing results of ordinal models using “maximum” (monotonic) marginal contrasts. Also, since the key predictors are ordinal items (nlp) or a composite of ordinal items (SRPA), I will follow this model with more complex versions that estimate monotonic effects using cumulative probit links for the key predictors (SRPA & nlp) as well as for the ordinal response variable. For now, let’s examine results of a bivariate cumulative probit model of victimization with a standardized metric SRPA (SRPAavgz) predictor.
6.5 Bivariate: Ord~Lin (sexvicsumf by SRPAavgz)
# Ordinal cumulative probit model of sexual victimization index
# set priors:
# equal likelihood for all outcome thresholds
# weakly informative normal(0,1) for default beta
priors2 = c(
  prior(normal(-0.84, 1), class = Intercept, coef = 1),
  prior(normal(-0.25, 1), class = Intercept, coef = 2),
  prior(normal(0.25, 1), class = Intercept, coef = 3),
  prior(normal(0.84, 1), class = Intercept, coef = 4),
  prior(normal(0, 1), class = b))

# sexvicsum - bivariate sexvicsum by SRPAavg:
mod3 <- brm(data = datv3,
            family = cumulative("probit"),
            sexvicsumf ~ 1 + SRPAavgz,
            # prior(normal(0, 4), class = Intercept), # see kruschke/kurtz re: priors for this model
            prior = priors2,
            iter = 2000, warmup = 1000, chains = 4, cores = nCoresphys,
            backend = "cmdstanr",
            # threads = threading(2), # consider within-chain threading if machine has at least 8 physical cores
            seed = 8675309,
            file = "Models/mod3_fit",
            file_refit = "on_change")
Family: cumulative
Links: mu = probit; disc = identity
Formula: sexvicsumf ~ 1 + SRPAavgz
Data: datv3 (Number of observations: 52191)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] 0.85 0.01 0.84 0.86 1.00 5065 3316
Intercept[2] 1.25 0.01 1.23 1.26 1.00 4541 3160
Intercept[3] 1.92 0.01 1.90 1.94 1.00 5513 3212
Intercept[4] 2.35 0.02 2.32 2.38 1.00 5549 3067
SRPAavgz 0.17 0.01 0.16 0.18 1.00 4874 2751
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
prior_summary(mod3) #check priors
                prior     class     coef group resp dpar nlpar lb ub       source
         normal(0, 1)         b                                              user
         normal(0, 1)         b SRPAavgz                              (vectorized)
 student_t(3, 0, 2.5) Intercept                                            default
     normal(-0.84, 1) Intercept        1                                      user
     normal(-0.25, 1) Intercept        2                                      user
      normal(0.25, 1) Intercept        3                                      user
      normal(0.84, 1) Intercept        4                                      user
6.5.1 Visualize probs & contrasts by response category (mod3)
If the ordinal sexual victimization index (sexvicsum) can plausibly be assumed to represent a coarse measure of an underlying continuous latent sexual victimization variable (e.g., Bürkner & Vuorre, 2019; see Kurz), then the beta for SRPAavgz from a cumulative probit response model represents the estimated average marginal effect of a 1-SD increase (from the mean to +1 SD) in SRPAavg on the latent continuous victimization scale.
Moreover, with the latent response expressed on a standard normal scale in a cumulative probit model, the beta can be interpreted as a latent Cohen’s D. Hence, the beta coefficients themselves in these models are useful for visualizing effect estimates. (Of course, causal interpretations require additional assumptions as well…)
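For example, the posterior for that coefficient can be pulled and plotted directly from the draws; this is a sketch using ggdist conventions, and the parameter name b_SRPAavgz is assumed from the brms naming of the model formula.

# Sketch: summarize and plot the posterior for the SRPAavgz coefficient from mod3
# (brms names this parameter b_SRPAavgz; interpretable as a latent, Cohen's d-like effect)
as_draws_df(mod3) %>%
  ggplot(aes(x = b_SRPAavgz)) +
  stat_halfeye(.width = c(.80, .95)) +
  labs(x = "Latent SD change per +1 SD in SRPA", y = NULL)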
Yet, with ordinal data, it can be helpful to visualize predicted probabilities and contrasts of these predicted probabilities separately for each ordinal response category. Let’s do that next.
# avg contrast on latent scale
# ANOVA
# latent scale - below generates same contrast estimate as model beta
# see https://solomonkurz.netlify.app/blog/2023-05-21-causal-inference-with-ordinal-regression/

# update the data grid
# nd_id <- datv3 %>%
#   mutate(id = row_number()) %>%
#   dplyr::select(id, nlpz, SRintelz, SRpopularz, agez, worked, gender) %>%
#   expand_grid(SRPAavgz = 0:1) %>%
#   mutate(row = 1:n())

# compute the posterior predictions
# fitted(mod3,
#        newdata = nd_id,
#        scale = "linear",
#        summary = F) %>%
#   # convert the results to a data frame
#   data.frame() %>%
#   # rename the columns
#   set_names(pull(nd_id, row)) %>%
#   # add a numeric index for the MCMC draws
#   mutate(draw = 1:n()) %>%
#   # convert to the long format
#   pivot_longer(-draw) %>%
#   # convert the row column from the character to the numeric format
#   mutate(row = as.double(name)) %>%
#   # join the nd_id predictor grid to the output
#   left_join(nd_id, by = "row") %>%
#   # drop two of the unnecessary columns
#   dplyr::select(-name, -row) %>%
#   # convert to a wider format so we can compute the contrast
#   pivot_wider(names_from = SRPAavgz, values_from = value) %>%
#   # compute the AME contrast
#   mutate(tau = `1` - `0`) %>%
#   # compute the average AME value within each MCMC draw
#   group_by(draw) %>%
#   summarise(ame = mean(tau)) %>%
#   # remove the draw index column
#   dplyr::select(ame) %>%
#   # now summarize the AME across the MCMC draws
#   summarise(m = mean(ame),
#             s = sd(ame),
#             ll = quantile(ame, probs = .025),
#             ul = quantile(ame, probs = .975))

# contrasts by response category
# define the predictor grid
nd <- tibble(SRPAavgz = 0:1)

# compute and save
fit3_conditional_probabilities_by_group <- nd %>%
  add_epred_draws(mod3) %>%
  ungroup()

# show the first 6 rows
# fit3_conditional_probabilities_by_group %>%
#   head()

# plot
# conditional probabilities (alt method - data type not working)
# mod3pred <- predictions(mod3, newdata = nd) %>%
#   data.frame() %>%
#   mutate(SRPAavgz = ifelse(SRPAavgz == 0, "italic(p[j])^0", "italic(p[j])^1"))

# use fitted + epred workflow
mod3pred <- fit3_conditional_probabilities_by_group %>%
  # now summarize the category-level probabilities, by SRPAavg
  group_by(SRPAavgz, .category) %>%
  summarise(p = mean(.epred),
            sd = sd(.epred),
            ll = quantile(.epred, probs = .025),
            ul = quantile(.epred, probs = .975)) %>%
  data.frame() %>%
  mutate(SRPAavgz = ifelse(SRPAavgz == 0, "italic(p[j])^0", "italic(p[j])^1"))

# function to find & drop leading zeroes (used for x-axis label)
dropLeadingZero <- function(l){ str_replace(l, '0(?=.)', '') }

# full range of y responses (0:4)
mod3p1 <- mod3pred %>%
  ggplot(aes(x = .category, y = p, ymin = ll, ymax = ul, color = SRPAavgz)) +
  geom_hline(yintercept = 0:9/10, color = "lightgrey") +
  geom_hline(yintercept = 0, color = "black") +
  geom_linerange(linewidth = 3/4, position = position_dodge(width = 1/3)) +
  geom_point(aes(shape = SRPAavgz), position = position_dodge(width = 1/3), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .3, end = .6, labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(20, 18), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, .9), breaks = seq(0, .9, by = .2), labels = dropLeadingZero) +
  labs(subtitle = "Pred. response probabilities\n(all)") +
  theme(legend.background = element_blank(),
        legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'),
        legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 10),
        axis.title.y = element_blank(),
        axis.title.x = element_blank())

# nonzero victimization values only (1:4)
mod3p2 <- mod3pred %>%
  # dplyr::filter(.category != 0) %>%
  ggplot(aes(x = .category, y = p, ymin = ll, ymax = ul, color = SRPAavgz)) +
  geom_hline(yintercept = 0:5/50, color = "lightgrey") +
  geom_hline(yintercept = 0, color = "black") +
  geom_linerange(linewidth = 3/4, position = position_dodge(width = 1/3)) +
  geom_point(aes(shape = SRPAavgz), position = position_dodge(width = 1/3), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .3, end = .6, labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(20, 18), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, .11), breaks = seq(0, .1, by = .02), labels = dropLeadingZero) +
  labs(subtitle = "(y > 0)",
       x = expression(Victimiz.~response~option~(italic(j)))) +
  theme(legend.background = element_blank(),
        legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'),
        legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 10),
        axis.title.y = element_blank())

# contrasts (alt method not working - data type error)
# p2 <- comparisons(mod3, newdata = nd, by = "group")

# contrasts
mod3con <- fit3_conditional_probabilities_by_group %>%
  dplyr::select(SRPAavgz, .draw, .category, .epred) %>%
  pivot_wider(names_from = SRPAavgz, values_from = .epred) %>%
  mutate(`1 - 0` = `1` - `0`) %>%
  # now summarize the category-level probabilities, by SRPAavg
  group_by(.category) %>%
  summarise(p = mean(`1 - 0`),
            sd = sd(`1 - 0`),
            ll = quantile(`1 - 0`, probs = .025),
            ul = quantile(`1 - 0`, probs = .975)) %>%
  data.frame()

mod3p3 <- mod3con %>%
  ggplot(aes(x = .category, y = p, ymin = ll, ymax = ul)) +
  geom_hline(yintercept = -2:2/40, color = "lightgrey") +
  geom_hline(yintercept = 0, color = "black") +
  geom_pointrange(linewidth = 3/4, size = 1/3) +
  scale_y_continuous(expression(italic(p[j])^1 - italic(p[j])^0),
                     limits = c(-0.06, 0.06), breaks = seq(-.05, .05, by = .025),
                     labels = dropLeadingZero) +
  labs(subtitle = expression(Delta[italic(j)]^{AME}~(contrasts)),
       x = expression(Victimiz.~response~option~(italic(j))))

# combine and entitle
((mod3p3 | mod3p1 / mod3p2)) +
  plot_annotation(title = "Cumulative probit y ~ SRPAavgz")
Given that sexual victimization is a rare event in these data (p|sexvicsum > 0 ~ .18), a difference of 0.04 in the overall probability of victimization (i.e., j=0 or “no victimization” response contrast) between those with mean SRPAavg scores (SRPAavgz=0) and those with +1SD scores (SRPAavgz=1) might represent a relatively substantial effect.
Yet, a +1SD-unit increase in SRPA and its average marginal effect on ordinal victimization responses remain a bit vague and difficult to interpret. So, I will modify this modeling approach by: (1) estimating and interpreting “maximum” predictive contrast effects instead, and (2) generating maximum contrasts with a cumulative probit link for the key predictor(s) to estimate the monotonic marginal effects of moving “up” each of the 13 ordinal SRPA categories on the (conditional) probabilities of each response category.
Why not just treat SRPA (or nlp) as metric variables? Well, doing so assumes both monotonic and equal effects on the outcome across all category increases on the predictor variable. Yet, what if higher attractiveness has larger effects across some categories than others? Treating ordinal outcomes and predictors as metric can result in substantial inference errors, such as magnitude and, in some cases, even directional sign errors (see McElreath’s Statistical Rethinking [2020, 2nd ed.]; Liddell & Kruschke, 2018).
One preferable solution is to retain the monotonicity assumption but relax the “equal” effects across categories assumption by specifying a cumulative probit link for the ordinal predictor. The brms package offers this option by using its mo() (meaning monotonic) function for ordinal predictors.
Let’s re-estimate the bivariate model above with SRPA specified as an ordinal predictor (using an ordered factor version - SRPAf). Then, rather than estimating average marginal contrasts of potential outcomes for those with “mean” and “+1SD” SRPA scores, I will estimate “maximum” marginal contrasts (see McElreath 2020; Kurz) as differences in predicted probabilities for each victimization response category across “minimum” (SRPAavg=1) and “maximum” (SRPAavg=7) SRPA scores.
6.6 Priors on parameters for ordinal predictors
Since I am adding predictors, I need to add prior distributions on their parameters. A weakly informative prior distribution (Normal[0,1]) for covariate beta coefficients will be highly diffuse in this context, allowing a wide range of likelihood-driven estimates. The priors for SRPA’s (and later NLP’s) monotonic beta coefficient are equivalent to this covariate prior distribution; the hyperparameter (SD = 1) was simply divided by the number of estimated ordinal thresholds or cutpoints (12 for SRPA: 1/12 ≈ 0.083; 9 for NLP: 1/9 ≈ 0.11).
A Dirichlet prior distribution is specified for SRPA’s (and later NLP’s) cumulative monotonic simplex parameter (delta). The Dirichlet distribution is a multivariate extension of the beta distribution and, likewise, is a distribution over probabilities. It is parameterized using a vector of alpha values that regulate the range of potential probability values for each cutpoint. McElreath notes that alpha = 2 is a very weak prior that allows each cutpoint probability to vary substantially from the others; specifying alpha = 2 for all 12 cutpoints results in a uniform prior distribution across the cutpoints.
As usual, with help from McElreath (2020, p.393) and Kurz’s bookdown translation & code, we can visualize the specified Dirichlet prior on mo(SRPA) simplex parameter in plots below.
set.seed(8675309)
delta <- rdirichlet(20, alpha = rep(2, 12))
# str(delta)

# n=20 example draws from Dirichlet distribution with alpha=rep(2,12)
# one draw bolded to illustrate variation in a single vector (see McElreath p.393)
delta %>%
  data.frame() %>%
  set_names(1:12) %>%
  mutate(row = 1:n()) %>%
  pivot_longer(-row, names_to = "index") %>%
  mutate(index = as.numeric(index)) %>%
  ggplot(aes(x = index, y = value, group = row,
             alpha = row == 3, color = row == 3)) +
  geom_line() +
  geom_point() +
  scale_alpha_manual(values = c(1/3, 1)) +
  scale_color_manual(values = canva_pal("Green fields")(4)[1:2]) +
  scale_x_continuous(breaks = 1:12) +
  ylab("probability") +
  theme(legend.position = "none")
Overall, applied to our large sample size, the likelihood should dominate these diffuse priors, resulting in data-driven estimates that are consistent with those that would be generated in a comparable frequentist analysis. Now, let’s move onto the model.
6.7 M1 (Bivariate): Ord~Ord (sexvicsumf by moSRPAf)
# Ordinal cumulative probit model of sexual victimization index
# with ordinal (monotonic cumulative probit) SRPA predictor
datv3tmp <- datv3 %>%
  mutate(SRPAf = SRPA1 + SRPA2,
         SRPAf = factor(SRPAf, ordered = T))
# levels(datv3tmp$SRPAf)

# set priors:
# equal likelihood for all outcome thresholds
# weakly informative normal(0,1) for default beta
# 13 SRPA response categories = 12 thresholds
# so, beta prior (0,1) equiv to (0,[1/12]) or (0,.083)
priors3 = c(
  prior(normal(-0.84, 1), class = Intercept, coef = 1),
  prior(normal(-0.25, 1), class = Intercept, coef = 2),
  prior(normal(0.25, 1), class = Intercept, coef = 3),
  prior(normal(0.84, 1), class = Intercept, coef = 4),
  prior(normal(0, 1), class = b),
  prior(normal(0, 0.083), class = b, coef = moSRPAf),
  prior(dirichlet(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), class = simo, coef = moSRPAf1))

# sexvicsum - bivariate sexvicsum by mo(SRPAf):
mod3mo <- brm(data = datv3tmp,
              family = cumulative("probit"),
              sexvicsumf ~ 1 + mo(SRPAf),
              # prior(normal(0, 4), class = Intercept), # see kruschke/kurtz re: priors for this model
              prior = priors3,
              iter = 2000, warmup = 1000, chains = 4, cores = nCoresphys,
              backend = "cmdstanr",
              # threads = threading(2), # consider within-chain threading if machine has at least 8 physical cores
              seed = 8675309,
              file = "Models/mod3mo_fit",
              file_refit = "on_change")
Family: cumulative
Links: mu = probit; disc = identity
Formula: sexvicsumf ~ 1 + mo(SRPAf)
Data: datv3tmp (Number of observations: 52191)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] 1.08 0.02 1.05 1.13 1.00 4225 3048
Intercept[2] 1.48 0.02 1.44 1.52 1.00 4368 2616
Intercept[3] 2.16 0.02 2.11 2.20 1.00 4788 2799
Intercept[4] 2.59 0.03 2.54 2.64 1.00 4920 2739
moSRPAf 0.05 0.00 0.05 0.06 1.00 4799 3218
Monotonic Simplex Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
moSRPAf1[1] 0.05 0.03 0.01 0.11 1.00 4229 2501
moSRPAf1[2] 0.03 0.02 0.00 0.08 1.00 5951 2804
moSRPAf1[3] 0.05 0.03 0.01 0.11 1.00 4592 2125
moSRPAf1[4] 0.04 0.02 0.01 0.09 1.00 4847 2401
moSRPAf1[5] 0.07 0.03 0.02 0.13 1.00 4716 2316
moSRPAf1[6] 0.03 0.02 0.00 0.07 1.00 4744 2302
moSRPAf1[7] 0.29 0.04 0.22 0.37 1.00 6008 2884
moSRPAf1[8] 0.17 0.04 0.10 0.24 1.00 5966 2607
moSRPAf1[9] 0.09 0.04 0.03 0.17 1.00 4312 2286
moSRPAf1[10] 0.09 0.04 0.02 0.17 1.00 4326 2394
moSRPAf1[11] 0.04 0.03 0.01 0.11 1.00 5244 2476
moSRPAf1[12] 0.04 0.02 0.01 0.09 1.00 5297 2584
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
prior_summary(mod3mo) #check priors
                                         prior     class     coef group resp dpar nlpar lb ub  source
                                  normal(0, 1)         b                                          user
                              normal(0, 0.083)         b  moSRPAf                                 user
                          student_t(3, 0, 2.5) Intercept                                       default
                              normal(-0.84, 1) Intercept        1                                 user
                              normal(-0.25, 1) Intercept        2                                 user
                               normal(0.25, 1) Intercept        3                                 user
                               normal(0.84, 1) Intercept        4                                 user
 dirichlet(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)      simo moSRPAf1                                 user
Again, posterior predictive checks confirm that a cumulative probit response model fits the ordinal outcome data quite well.
A primary reason to specify a cumulative probit link to model monotonic effects of the ordinal independent variable (SRPA) is that the equidistant-category assumption underlying traditional metric modeling approaches renders estimates that are insensitive to (i.e., average over) potentially important differences in effects across crude or discrete ordinal categories. One practical consequence may be downwardly biased coefficients or misspecification of nonlinear effects. In this case, compare the “conditional effects” plots (see tabs above) from this model with those generated by the previous (“Ord~Lin”) model. The conditional effects plot from this fully ordinal bivariate model reveals that increased victimization risks emerge primarily at the upper range of SRPA scores (e.g., >8 on the ordinal scale ranging from 2 to 14). This is lower than the 95th-percentile threshold suggested by Savolainen and colleagues (i.e., the 95th percentile is a score of 12 on the 2-to-14 scale), though the pattern is consistent with their expectation that increased victimization risks should be observed at the highest ranges of physical attractiveness scales.
The beta coefficient in this model represents the average monotonic effect of a one-category increase in a summed SRPA scale (ranging from “2” to “14”) on latent victimization. We could multiply the beta by the number of SRPA thresholds (x12) to estimate the “maximum” effect on the latent victimization scale.
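For reference, that back-of-the-envelope calculation is just the monotonic coefficient multiplied by the number of thresholds; a one-line sketch, using the coefficient reported in the summary above:

# Sketch: approximate "maximum" latent effect = average monotonic coefficient x 12 SRPA thresholds
fixef(mod3mo)["moSRPAf", "Estimate"] * 12   # roughly 0.05 * 12 = 0.6 latent SD units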
Instead, to improve interpretation and better match our estimates to the ordinal outcome variable, I will estimate predicted probabilities and contrasts between the minimum and maximum SRPA categories for each of the victimization response categories.
6.7.1 FIG3: Visualize max moSRPA effect by response category (mod3mo)
# contrasts by response category
# define the predictor grid
ndmo <- tibble(SRPAf = c("2","14"))

# compute and save
fit3mo_conditional_probabilities_by_group <- ndmo %>%
  add_epred_draws(mod3mo) %>%
  ungroup() %>%
  mutate(SRPAf = if_else(SRPAf == "2", "min", "max"))

# show the first 6 rows
# fit3_conditional_probabilities_by_group %>%
#   head()

# use fitted + epred workflow
mod3mopred <- fit3mo_conditional_probabilities_by_group %>%
  group_by(SRPAf, .category) %>%
  summarise(p = mean(.epred),
            sd = sd(.epred),
            ll = quantile(.epred, probs = .025),
            ul = quantile(.epred, probs = .975)) %>%
  data.frame() %>%
  mutate(SRPAf = ifelse(SRPAf == "min", "italic(p[j])^min", "italic(p[j])^max"))

# function to find & drop leading zeroes (used for x-axis label)
dropLeadingZero <- function(l){ str_replace(l, '0(?=.)', '') }

# full range of y responses (0:4)
mod3mop1 <- mod3mopred %>%
  ggplot(aes(x = .category, y = p, ymin = ll, ymax = ul, color = SRPAf)) +
  geom_hline(yintercept = 0:9/10, color = "lightgrey") +
  geom_linerange(linewidth = 3/4, position = position_dodge(width = 1/3)) +
  geom_point(aes(shape = SRPAf), position = position_dodge(width = 1/3), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .3, end = .6, labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(20, 18), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, .9), breaks = seq(0, .9, by = .2), labels = dropLeadingZero) +
  labs(subtitle = "Pred. response probabilities\n(all)") +
  theme(legend.background = element_blank(),
        legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'),
        legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 14),
        legend.text = element_text(size = 14),
        axis.title.y = element_blank(),
        axis.title.x = element_blank())

# nonzero victimization values only (1:4)
mod3mop2 <- mod3mopred %>%
  # dplyr::filter(.category != 0) %>%
  ggplot(aes(x = .category, y = p, ymin = ll, ymax = ul, color = SRPAf)) +
  geom_hline(yintercept = 0:7/50, color = "lightgrey") +
  geom_linerange(linewidth = 3/4, position = position_dodge(width = 1/3)) +
  geom_point(aes(shape = SRPAf), position = position_dodge(width = 1/3), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .3, end = .6, labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(20, 18), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, .15), breaks = seq(0, .14, by = .02), labels = dropLeadingZero) +
  labs(subtitle = "(y > 0)",
       x = expression(Victimiz.~response~option~(italic(j)))) +
  theme(legend.background = element_blank(),
        legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'),
        legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 14),
        legend.text = element_text(size = 14),
        axis.title.y = element_blank())

# contrasts (alt method not working - data type error)
# p2 <- comparisons(mod3, newdata = nd, by = "group")

# contrasts
mod3mocon <- fit3mo_conditional_probabilities_by_group %>%
  dplyr::select(SRPAf, .draw, .category, .epred) %>%
  pivot_wider(names_from = SRPAf, values_from = .epred) %>%
  mutate(`max - min` = `max` - `min`) %>%
  # now summarize the category-level probabilities, by SRPA
  group_by(.category) %>%
  summarise(p = mean(`max - min`),
            sd = sd(`max - min`),
            ll = quantile(`max - min`, probs = .025),
            ul = quantile(`max - min`, probs = .975)) %>%
  data.frame()

mod3mop3 <- mod3mocon %>%
  ggplot(aes(x = .category, y = p, ymin = ll, ymax = ul)) +
  geom_hline(yintercept = -4:4/20, color = "lightgrey") +
  geom_hline(yintercept = 0, color = "black") +
  geom_pointrange(linewidth = 3/4, size = 1/3) +
  scale_y_continuous(expression(italic(p[j])^max - italic(p[j])^min),
                     limits = c(-0.21, 0.2), breaks = seq(-.2, .2, by = .05),
                     labels = dropLeadingZero) +
  labs(subtitle = expression(Delta[italic(j)]^{AME}~(max~contrasts)),
       x = expression(Victimiz.~response~option~(italic(j)))) +
  theme(axis.title.y = element_text(size = 14))

# combine and entitle
mod3plot <- ((mod3mop3 | mod3mop1 / mod3mop2)) +
  plot_annotation(title = "Cumulative probit y ~ mo(SRPAf)",
                  subtitle = "Maximum monotonic AME of SRPA on Victimization")
mod3plot
These plots show estimates of the maximum monotonic AME of SRPA on sexual victimization, expressed as predicted probability potential outcomes and probability contrasts for each ordinal victimization response category across minimum and maximum SRPA scores. Examining the “zero” response category contrast reveals that the probability of any victimization is predicted to be approximately 18 percentage points higher (diff = -0.178, 95% CrI [-0.198, -0.160]) among those with the highest compared to the lowest SRPA scores.
This is a descriptive bivariate estimate unadjusted for potential measured confounders. Using the same modeling and estimation approach, let’s next add covariates (as metric standardized variables or unordered factors) to adjust for some potential sources of confounding.
6.8 M2 (Controls): Ord~Ord (sexvicsumf by moSRPAf)
# Ordinal cumulative probit model of sexual victimization index
# with ordinal (monotonic cumulative probit) SRPA predictor & linear controls
# sexvicsum - sexvicsum by mo(SRPAf) + controls:
mod4mo <- brm(data = datv3tmp,
              family = cumulative("probit"),
              sexvicsumf ~ 1 + mo(SRPAf) + SRintelz + SRpopularz + agez + worked + gender,
              prior = priors3,
              iter = 2000, warmup = 1000, chains = 4, cores = nCoresphys,
              backend = "cmdstanr",
              # threads = threading(2), # consider within-chain threading if machine has at least 8 physical cores
              seed = 8675309,
              file = "Models/mod4mo_fit",
              file_refit = "on_change")
                                         prior     class       coef group resp dpar nlpar lb ub       source
                                  normal(0, 1)         b                                                 user
                                  normal(0, 1)         b       agez                              (vectorized)
                                  normal(0, 1)         b    gender2                              (vectorized)
                              normal(0, 0.083)         b    moSRPAf                                      user
                                  normal(0, 1)         b   SRintelz                              (vectorized)
                                  normal(0, 1)         b SRpopularz                              (vectorized)
                                  normal(0, 1)         b    worked1                              (vectorized)
                          student_t(3, 0, 2.5) Intercept                                              default
                              normal(-0.84, 1) Intercept          1                                      user
                              normal(-0.25, 1) Intercept          2                                      user
                               normal(0.25, 1) Intercept          3                                      user
                               normal(0.84, 1) Intercept          4                                      user
 dirichlet(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)      simo   moSRPAf1                                      user
6.8.1 Visualize max moSRPA effect by response category (mod4mo)
# contrasts by response category
# update the data grid
# ndmo_id <- datv3tmp %>%
#   mutate(id = 1:n()) %>%
#   dplyr::select(id, SRintelz, SRpopularz, agez, worked, gender) %>%
#   expand_grid(SRPAf = c("2","14")) %>%
#   mutate(row = 1:n())

# conditional probabilities (marginaleffects workflow - fixed matrix issue)
# mod4mopred <- avg_predictions(mod4mo, newdata = ndmo_id,
#                               by = c("SRPAf", "group")) %>%
#   data.frame() %>%
#   mutate(SRPAf = ifelse(SRPAf == "2", "E[italic(j)](hat(italic(p))[italic(ij)]^min)", "E[italic(j)](hat(italic(p))[italic(ij)]^max)"))

# method above for aap are too computationally intensive
# see here: https://marginaleffects.com/vignettes/performance.html
# One option is to limit variables averaged over. E.g.,
# predictions(
#   mod4mo,
#   type = "response",
#   by = "SRPAf",
#   datagrid(SRPAf = c("2","14")),
#   variables = c("SRintelz", "SRpopularz", "agez"))

# For now, I am opting to estimate adjusted predictions at mean (apm) instead
mod4mopred <- predictions(mod4mo,
                          newdata = datagrid(SRPAf = c("2","14")),
                          by = c("SRPAf","group")) %>%
  data.frame() %>%
  mutate(SRPAf = ifelse(SRPAf == "2",
                        "E[italic(j)](hat(italic(p))[italic(ij)]^min)",
                        "E[italic(j)](hat(italic(p))[italic(ij)]^max)"))

# function to find & drop leading zeroes (used for x-axis label)
dropLeadingZero <- function(l){ str_replace(l, '0(?=.)', '') }

# full range of y responses (0:4)
mod4mop1 <- mod4mopred %>%
  ggplot(aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high, color = SRPAf)) +
  geom_hline(yintercept = 0:9/10, color = "lightgrey") +
  geom_linerange(linewidth = 3/4, position = position_dodge(width = 1/3)) +
  geom_point(aes(shape = SRPAf), position = position_dodge(width = 1/3), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .3, end = .6, labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(20, 18), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, .9), breaks = seq(0, .9, by = .2), labels = dropLeadingZero) +
  labs(subtitle = "Pred. response probabilities\n(all)") +
  theme(legend.background = element_blank(),
        legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'),
        legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 10),
        axis.title.y = element_blank(),
        axis.title.x = element_blank())

# nonzero victimization values only (1:4)
mod4mop2 <- mod4mopred %>%
  # dplyr::filter(group != 0) %>%
  ggplot(aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high, color = SRPAf)) +
  geom_hline(yintercept = 0:7/50, color = "lightgrey") +
  geom_linerange(linewidth = 3/4, position = position_dodge(width = 1/3)) +
  geom_point(aes(shape = SRPAf), position = position_dodge(width = 1/3), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .3, end = .6, labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(20, 18), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, .15), breaks = seq(0, .14, by = .02), labels = dropLeadingZero) +
  labs(subtitle = "(y > 0)",
       x = expression(Victimiz.~response~option~(italic(j)))) +
  theme(legend.background = element_blank(),
        legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'),
        legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 10),
        axis.title.y = element_blank())

# contrasts (marginaleffects workflow - fixed matrix issue)
# also too computationally intensive
# mod4mocon <- avg_comparisons(mod4mo, variables = "SRPAf") %>%
#   data.frame()

# estimate contrasts at means
mod4mocon <- comparisons(mod4mo,
                         variables = list(SRPAf = c("2","14")),
                         newdata = "mean",
                         by = "group") %>%
  data.frame()

mod4mop3 <- mod4mocon %>%
  ggplot(aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high)) +
  geom_hline(yintercept = -4:4/20, color = "lightgrey") +
  geom_hline(yintercept = 0, color = "black") +
  geom_pointrange(linewidth = 3/4, size = 1/3) +
  scale_y_continuous(expression(italic(p[j])^max - italic(p[j])^min),
                     limits = c(-0.2, 0.2), breaks = seq(-.2, .2, by = .05),
                     labels = dropLeadingZero) +
  labs(subtitle = expression(Delta[italic(j)]^{MEM}~(max~contrasts)),
       x = expression(Victimiz.~response~option~(italic(j))))

# combine and entitle
((mod4mop3 | mod4mop1 / mod4mop2)) +
  plot_annotation(title = "Cumulative probit y ~ mo(SRPAf) + Controls",
                  subtitle = "Maximum monotonic MEM of SRPA on Victimization")
After adjusting for these covariates, the SRPAf coefficient shrinks slightly (from 0.05 to 0.04).
Also, as expected, the beta coefficient for gender shows that girls report substantially more sexual victimization experiences than boys. Given this, it will be important to interact SRPAf with gender later to explore possible heterogeneity in predicted probabilities and contrasts.
Unfortunately, estimating average predictions and contrasts marginalized across all covariates from these ordinal models with a large (~50k) dataset is too computationally intensive; doing so bogged down my computer and resulted in multiple crashes. For now, I’ve settled on the much more computationally efficient yet somewhat less interpretable marginal effects at means (MEM) approach instead to generate predictions and contrasts by response category in models with covariates. Like the beta coefficient, the estimated maximum contrast is slightly smaller (-0.154 versus -0.178 from the bivariate model). However, as the predicted probabilities are estimated at the means and not averaged over all covariate values, these estimates are not directly comparable.
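If fully marginalized average effects are eventually needed, one workaround (discussed in the marginaleffects performance vignette linked in the code above) is to average over a random subsample of observed rows rather than the full ~50k grid. Here is a minimal sketch of that compromise, assuming the mod4mo fit and datv3tmp data from above; the subsample size and seed are arbitrary illustrative choices:

# approximate average contrasts by averaging over a random subsample of rows
# (trades some precision for computational feasibility)
set.seed(8675309)
sub2k <- datv3tmp %>% dplyr::slice_sample(n = 2000)

mod4mocon_sub <- avg_comparisons(mod4mo,
                                 variables = list(SRPAf = c("2", "14")),
                                 newdata = sub2k,
                                 by = "group") %>%
  data.frame()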
For now, let’s adjust for “nlp” as an ordinal variable with a cumulative monotonic effect on victimization and compare the MEM contrast.
Before we do, now might be a good time to formalize the “full” (Controls + NLP) Model 3 in equation form. For more information about these models and specifications, see the previous links to Kurz’s blog and bookdowns, Bürkner’s papers, and McElreath’s Statistical Rethinking.
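As a rough sketch of that specification (my notation, following the brms/Bürkner monotonic-effects parameterization):

$$\Pr(y_i \le k) = \Phi(\tau_k - \eta_i), \qquad k = 1, \dots, 4$$

$$\eta_i = b_{S}\,\mathrm{mo}(\mathrm{SRPA}_i, \zeta^{S}) + b_{N}\,\mathrm{mo}(\mathrm{NLP}_i, \zeta^{N}) + b_{1}\,\mathrm{SRintelz}_i + b_{2}\,\mathrm{SRpopularz}_i + b_{3}\,\mathrm{agez}_i + b_{4}\,\mathrm{worked}_i + b_{5}\,\mathrm{gender}_i$$

$$\mathrm{mo}(x, \zeta) = D\sum_{j=1}^{x}\zeta_j, \qquad \zeta^{S} \sim \mathrm{Dirichlet}(2, \dots, 2), \quad \zeta^{N} \sim \mathrm{Dirichlet}(2, \dots, 2)$$

where y_i is the ordinal victimization index, the tau_k are the four threshold intercepts with the normal priors listed above, the zeta vectors are simplex parameters governing the shape of each monotonic effect, and D is the number of ordinal categories minus one (12 for SRPA, 9 for NLP). Because the monotonic terms are scaled by D, b_S and b_N are interpretable as average adjacent-category effects, which is why their normal priors are rescaled to (0, 1/12) and (0, 1/9) in the code below.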
6.9 M3 (NLP): Ord~Ord (sexvicsumf by moSRPAf)
Show code
# Ordinal cumulative probit model of sexual victimization index
# with ordinal (monotonic cumulative probit) SRPA predictor, linear controls, & mo(nlp)
# 10 nlp response categories = 9 thresholds
# so, beta prior (0,1) equiv to (0,[1/9]) or (0,.11)
priors4 = c(prior(normal(-0.84, 1), class = Intercept, coef = 1),
            prior(normal(-0.25, 1), class = Intercept, coef = 2),
            prior(normal(0.25, 1), class = Intercept, coef = 3),
            prior(normal(0.84, 1), class = Intercept, coef = 4),
            prior(normal(0, 1), class = b),
            prior(normal(0, 0.083), class = b, coef = moSRPAf),
            prior(normal(0, 0.11), class = b, coef = monlpf),
            prior(dirichlet(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), class = simo, coef = moSRPAf1),
            prior(dirichlet(2, 2, 2, 2, 2, 2, 2, 2, 2), class = simo, coef = monlpf1))

# sexvicsum - bivariate sexvicsum by mo(SRPAf):
mod5mo <- brm(
  data = datv3tmp,
  family = cumulative("probit"),
  sexvicsumf ~ 1 + mo(SRPAf) + mo(nlpf) + SRintelz + SRpopularz + agez + worked + gender,
  prior = priors4,
  iter = 2000, warmup = 1000, chains = 4, cores = nCoresphys,
  backend = "cmdstanr",
  # threads = threading(2), # consider within-chain threading if machine has at least 8 physical cores
  seed = 8675309,
  file = "Models/mod5mo_fit",
  file_refit = "on_change"
)
prior                                          class      coef        group resp dpar nlpar lb ub source
normal(0, 1)                                   b                                                   user
normal(0, 1)                                   b          agez                                     (vectorized)
normal(0, 1)                                   b          gender2                                  (vectorized)
normal(0, 0.11)                                b          monlpf                                   user
normal(0, 0.083)                               b          moSRPAf                                  user
normal(0, 1)                                   b          SRintelz                                 (vectorized)
normal(0, 1)                                   b          SRpopularz                               (vectorized)
normal(0, 1)                                   b          worked1                                  (vectorized)
student_t(3, 0, 2.5)                           Intercept                                           default
normal(-0.84, 1)                               Intercept  1                                        user
normal(-0.25, 1)                               Intercept  2                                        user
normal(0.25, 1)                                Intercept  3                                        user
normal(0.84, 1)                                Intercept  4                                        user
dirichlet(2, 2, 2, 2, 2, 2, 2, 2, 2)           simo       monlpf1                                  user
dirichlet(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)  simo       moSRPAf1                                 user
6.9.1 Visualize max moSRPA effect by response category (mod5mo)
Show code
# contrasts by response category

# update the data grid
# ndmo_id <- datv3tmp %>%
#   mutate(id = 1:n()) %>%
#   dplyr::select(id, SRintelz, SRpopularz, agez, worked, gender) %>%
#   expand_grid(SRPAf = c("2","14")) %>%
#   mutate(row = 1:n())

# conditional probabilities (marginaleffects workflow - fixed matrix issue)
# mod5mopred <- avg_predictions(mod5mo, newdata = ndmo_id,
#                               by = c("SRPAf", "group")) %>%
#   data.frame() %>%
#   mutate(SRPAf = ifelse(SRPAf == "2", "E[italic(j)](hat(italic(p))[italic(ij)]^min)", "E[italic(j)](hat(italic(p))[italic(ij)]^max)"))

# method above for aap are too computationally intensive
# see here: https://marginaleffects.com/vignettes/performance.html
# One option is to limit variables averaged over. E.g.,
# predictions(
#   mod5mo,
#   type = "response",
#   by = "SRPAf",
#   datagrid(SRPAf = c("2","14")),
#   variables = c("SRintelz", "SRpopularz", "agez"))

# For now, I am opting to estimate adjusted predictions at mean (apm) instead
mod5mopred <- predictions(mod5mo,
                          newdata = datagrid(SRPAf = c("2","14")),
                          by = c("SRPAf","group")) %>%
  data.frame() %>%
  mutate(SRPAf = ifelse(SRPAf == "2",
                        "E[italic(j)](hat(italic(p))[italic(ij)]^min)",
                        "E[italic(j)](hat(italic(p))[italic(ij)]^max)"))

# function to find & drop leading zeroes (used for x-axis label)
dropLeadingZero <- function(l){ str_replace(l, '0(?=.)', '') }

# full range of y responses (0:4)
mod5mop1 <- mod5mopred %>%
  ggplot(aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high, color = SRPAf)) +
  geom_hline(yintercept = 0:9/10, color = "lightgrey") +
  geom_linerange(linewidth = 3/4, position = position_dodge(width = 1/3)) +
  geom_point(aes(shape = SRPAf), position = position_dodge(width = 1/3), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .3, end = .6,
                        labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(20, 18), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, .9), breaks = seq(0, .9, by = .2), labels = dropLeadingZero) +
  labs(subtitle = "Pred. response probabilities\n(all)") +
  theme(legend.background = element_blank(), legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'), legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10), legend.text = element_text(size = 10),
        axis.title.y = element_blank(), axis.title.x = element_blank())

# nonzero victimization values only (1:4)
mod5mop2 <- mod5mopred %>%
  # dplyr::filter(group != 0) %>%
  ggplot(aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high, color = SRPAf)) +
  geom_hline(yintercept = 0:7/50, color = "lightgrey") +
  geom_linerange(linewidth = 3/4, position = position_dodge(width = 1/3)) +
  geom_point(aes(shape = SRPAf), position = position_dodge(width = 1/3), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .3, end = .6,
                        labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(20, 18), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, .15), breaks = seq(0, .14, by = .02), labels = dropLeadingZero) +
  labs(subtitle = "(y > 0)",
       x = expression(Victimiz.~response~option~(italic(j)))) +
  theme(legend.background = element_blank(), legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'), legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10), legend.text = element_text(size = 10),
        axis.title.y = element_blank())

# contrasts (marginaleffects workflow - fixed matrix issue)
# also too computationally intensive
# mod5mocon <- avg_comparisons(mod5mo, variables = "SRPAf") %>%
#   data.frame()

# estimate contrasts at means
mod5mocon <- comparisons(mod5mo,
                         variables = list(SRPAf = c("2","14")),
                         newdata = "mean", by = "group") %>%
  data.frame()

mod5mop3 <- mod5mocon %>%
  ggplot(aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high)) +
  geom_hline(yintercept = -4:4/20, color = "lightgrey") +
  geom_hline(yintercept = 0, color = "black") +
  geom_pointrange(linewidth = 3/4, size = 1/3) +
  scale_y_continuous(expression(italic(p[j])^max - italic(p[j])^min),
                     limits = c(-0.2, 0.2), breaks = seq(-.2, .2, by = .05),
                     labels = dropLeadingZero) +
  labs(subtitle = expression(Delta[italic(j)]^{MEM}~(max~contrasts)),
       x = expression(Victimiz.~response~option~(italic(j))))

# combine and entitle
((mod5mop3 | mod5mop1 / mod5mop2)) +
  plot_annotation(title = "Cumulative probit y ~ mo(SRPAf) + Controls + NLP",
                  subtitle = "Maximum monotonic MEM of SRPA on Victimization")
After adjusting for nlp, the SRPAf coefficient shrinks substantially (from 0.04 in the Controls model to 0.02).
Once again, estimating average predictions and contrasts marginalized across all covariates from these ordinal models with a large (~50k) dataset is too computationally intensive (attempting it crashed my computer multiple times). For now, I’ve continued to rely on the much more computationally efficient marginal effects at means (MEM) estimation approach instead to generate predictions and contrasts by response category. The estimated maximum MEM contrast is substantially smaller (-0.046 versus -0.136 from the Controls model).
Again, the predicted probabilities are not averaged over all covariate values, so the estimates are not directly comparable (i.e., they are conditional average marginal effects, not average marginal effects). Nonetheless, they do illustrate that stratification on nlp accounts for a substantial portion of the observed association between SRPA and sexual victimization. Recall, the degree to which this overlap represents plausible confounding or mediation processes is unknown.
Later, I estimate indirect effects of SRPA on victimization via nlp to illustrate the inferences we might make under the assumption that a mediation process generated these data.
For now, let’s combine some of these estimates into plots that show how they change across models.
7 Combining estimates across models
7.1 Comparing beta estimates of SRPA effect on latent victimization
Let’s start by plotting the beta estimates from models without covariates, with covariates, and with nlp (a confounder and/or mediator). Recall, these betas represent effect estimates on a latent continuous response scale.
#estimated maximum monotonic effect of min-to-max increase in SRPA on latent victimization scale
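The code generating this combined coefficient plot is not reproduced here. A minimal sketch of one way to do it, assuming the bivariate, Controls, and NLP fits are stored as mod3mo, mod4mo, and mod5mo (the name of the bivariate fit is an assumption) and that brms stores the monotonic SRPA coefficient as bsp_moSRPAf; multiplying by 12 converts the average adjacent-category coefficient to the min-to-max latent effect:

# collect posterior draws of the monotonic SRPA coefficient from each model and
# rescale to the maximum (min-to-max) effect on the latent victimization scale
library(tidybayes)

betadraws <- bind_rows(
  as_draws_df(mod3mo) %>% transmute(model = "M1: Bivariate",      b_max = bsp_moSRPAf * 12),
  as_draws_df(mod4mo) %>% transmute(model = "M2: Controls",       b_max = bsp_moSRPAf * 12),
  as_draws_df(mod5mo) %>% transmute(model = "M3: Controls + NLP", b_max = bsp_moSRPAf * 12)
)

betadraws %>%
  ggplot(aes(x = b_max, y = model)) +
  stat_pointinterval(.width = .95) +
  labs(x = "Estimated max monotonic effect of SRPA on latent victimization (probit scale)",
       y = NULL)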
Before plotting contrasts, I will make sure all the maximum and minimum predicted probabilities, maximum contrast (pmax - pmin), and maximum adjusted risk ratios (pmax / pmin) estimates from Models 1 to 3 are easily accessible for reporting in the paper.
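A minimal sketch of how those quantities can be pulled from the at-means prediction frames created above (shown for the Controls model; the same pattern applies to the bivariate and NLP models):

# max/min predicted probabilities of the "zero" (no victimization) category at
# SRPAf = 14 vs 2, plus the derived maximum contrast and ratio
m2_zero <- mod4mopred %>%
  dplyr::filter(group == "0") %>%
  dplyr::summarise(
    p_min = estimate[grepl("min", SRPAf)],
    p_max = estimate[grepl("max", SRPAf)],
    diff  = p_max - p_min,   # maximum contrast (pmax - pmin)
    ratio = p_max / p_min    # maximum adjusted ratio (pmax / pmin)
  )
m2_zero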
This combined plot makes it easier to see that covariate adjustment in the “Controls” model has little impact on the maximum effect contrast. However, as expected given the strong ordinal bivariate association between SRPA and NLP (see mosaic plots above), adjusting for NLP results in substantial reductions in SRPA’s maximum effect contrasts. Of course, the question is whether this reduction is due to proper stratification on a strong confounder or improper adjustment on a mediator. Again, we cannot ascertain which is the case in these data. However, it is worth noting that one reason for conceptualizing sexual activity as a confounder was that it might cause heightened self-assessments and, thus, higher responses on SRPA items. Yet, results from the “Controls” model reveal little change in SRPA’s estimated maximum effect after adjusting for other measures of heightened self-assessments (e.g., self-reported intelligence and popularity). Given this, it remains plausible that NLP’s impact on SRPA’s effect estimates at least partly represents mediation processes. Thus, operating on the strong assumption that the data generating process reflects underlying causal mediation processes, exploratory mediation models below further assess “direct” and potential “indirect” effects of SRPA on victimization.
Let’s wrap up by compiling some key coefficients into a table.
In the mediation models below, I specified yval=0 to estimate differences in the relative risks of “not” experiencing victimization (versus reporting 1, 2, 3, or 4 of the measured victimization experiences). The victimization (y) model is specified as ordinal (estimated with MASS::polr). However, the models are simplified relative to those estimated above, as sexual activity (nlp) and physical attractiveness (SRPAavgz) are treated as metric standardized variables. I specified astar at the maximum SRPA score and a at the minimum SRPA score, which will essentially generate counterfactual contrasts that represent differences in the relative risks of any victimization. Values greater than one will indicate greater risks of victimization among respondents with maximum SRPA scores compared to minimum scores - or, more specifically, greater risks of reporting “no victimization” [y=0] among those with minimum SRPA scores relative to those with maximum scores.
An important point here: Since non-victimization (y=0) is much more prevalent than “any” victimization (y>0), CMAverse contrast estimates on the relative risk scale for non-victimization will be much smaller than those describing victimization risks. For instance, consider predicted probabilities of non-victimization of 0.8 for Group 1 versus 0.7 for Group 2. The “risks” of not being victimized are greater in Group 1, with a prevalence (risk) ratio of 1.14 (0.8/0.7). In comparison, the risks of being victimized instead are 0.2 for Group 1 and 0.3 for Group 2; the risks are greater for Group 2, with a prevalence (risk) ratio of 1.5 (0.3/0.2). Note that both estimates describe the same absolute contrast, but the relative risk estimates vary due to differences in baseline prevalence rates of non-victimization versus victimization.
Given potential confusion caused by this property of relative risk estimates, I will follow these ordinal y models with a logistic model predicting “any” victimization (collapsing y>0 sexual victimization responses into “1” response category) to generate victimization risk estimates and report these in the supplement section of the paper. I will also replicate the final logistic model with the “eversex” dichotomous mediator variable to assess whether results are sensitive to substantial nonrandom missingness (MNAR) on the mediator.
Finally, I will focus on results from the model specifying an EM interaction (EMint), though I present estimates from ordinal y models without and with a specified exposure-mediator (SRPA:NLP) interaction for supplementary comparison. In cases where the EMint is negligible, results will converge across both model types. However, where a nonnegligible EMint exists, mediation models should account for it.
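For reference, here is a hedged sketch of the kind of CMAverse call implied by this description for the ordinal-y EMint model (the actual calls that produced est1 to est4 are not shown; argument values such as nboot are placeholders, and some arguments may differ from the originals):

library(CMAverse)

# EMint model: ordinal y (fit via MASS::polr), metric standardized exposure & mediator,
# contrasting maximum (astar = 2.3) vs minimum (a = -2) standardized SRPA,
# with effects expressed for the y = 0 ("no victimization") response
est2 <- cmest(
  data       = datv3tmp,
  model      = "rb",                  # regression-based approach
  outcome    = "sexvicsumf",
  exposure   = "SRPAavgz",
  mediator   = "nlpz",
  basec      = c("SRintelz", "SRpopularz", "agez", "worked", "gender"),
  EMint      = TRUE,                  # exposure-mediator interaction
  mreg       = list("linear"),
  yreg       = "ordinal",
  astar      = 2.3,                   # ~max standardized SRPA
  a          = -2,                    # ~min standardized SRPA
  mval       = list(0),
  yval       = 0,                     # outcome level treated as "no victimization"
  estimation = "imputation",
  inference  = "bootstrap",
  nboot      = 1000                   # placeholder
)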
8.1 CMAverse (ordinal Y, metric X & M) w/out exposure-mediator interaction
[Table fragment (extraction residue): excess rel. risk due to pure natural indirect effect (ERpnie) - NA; 0.08]
Show code
# save estimates and intervals data
est2dat <- as.data.frame(est2[9:12]) %>%
  rownames_to_column("CMA_est")

# Transformation function used to display two decimal places on x axis
scaleFUN <- function(dec) sprintf("%.2f", dec)

# drop proportion estimates
est2dat %>%
  filter(!CMA_est %in% c("pm", "pe", "ERpnie(prop)", "ERintref(prop)", "ERintmed(prop)", "ERcde(prop)",
                         "int", "ERpnie", "ERintmed", "ERintref", "ERcde")) %>%
  mutate(CMA_est = factor(CMA_est, levels = c("Rte", "Rtnie", "Rpnie", "Rtnde", "Rpnde", "Rcde"))) %>%
  # reorder the coefficients so that the largest is at the top of the plot
  # mutate(term = fct_reorder(CMA_est, effect.pe)) %>%
  ggplot(aes(effect.pe, CMA_est)) +
  geom_point() +
  geom_errorbarh(aes(xmin = effect.ci.low, xmax = effect.ci.high), height = .1) +
  # add in a dotted line at zero
  geom_vline(xintercept = 1, lty = 2) +
  scale_x_continuous(labels = scaleFUN) +
  labs(x = "Estimated effects on risk ratio scale",
       y = NULL,
       title = "Effect estimates from causal mediation decomposition (risk ratios)") +
  theme_minimal()
Show code
# save estimates and intervals data
est2dat <- as.data.frame(est2[9:12]) %>%
  rownames_to_column("CMA_est")

# drop proportion estimates
est2dat %>%
  filter(!CMA_est %in% c("pm", "pe", "ERpnie(prop)", "ERintref(prop)", "ERintmed(prop)", "ERcde(prop)",
                         "int", "Rte", "Rtnie", "Rpnie", "Rtnde", "Rpnde", "Rcde")) %>%
  mutate(CMA_est = factor(CMA_est, levels = c("ERpnie", "ERintmed", "ERintref", "ERcde"))) %>%
  # reorder the coefficients so that the largest is at the top of the plot
  # mutate(term = fct_reorder(CMA_est, effect.pe)) %>%
  ggplot(aes(effect.pe, CMA_est)) +
  geom_point() +
  geom_errorbarh(aes(xmin = effect.ci.low, xmax = effect.ci.high), height = .1) +
  # add in a dotted line at zero
  geom_vline(xintercept = 0, lty = 2) +
  labs(x = "Estimated effects on excess risks scale",
       y = NULL,
       title = "Effect estimates from causal mediation decomposition (excess risks)") +
  theme_minimal()
Show code
# save estimates and intervals data
est1dat <- as.data.frame(est1[9:12]) %>%
  rownames_to_column("CMA_est")

# Transformation function used to display two decimal places on x axis
scaleFUN <- function(dec) sprintf("%.2f", dec)

# drop proportion estimates
est1dat %>%
  filter(!CMA_est %in% c("pm")) %>%
  mutate(CMA_est = factor(CMA_est, levels = c("Rte", "Rtnie", "Rpnie", "Rtnde", "Rpnde", "Rcde"))) %>%
  # reorder the coefficients so that the largest is at the top of the plot
  # mutate(term = fct_reorder(CMA_est, effect.pe)) %>%
  ggplot(aes(effect.pe, CMA_est)) +
  geom_point() +
  geom_errorbarh(aes(xmin = effect.ci.low, xmax = effect.ci.high), height = .1) +
  # add in a dotted line at zero
  geom_vline(xintercept = 1, lty = 2) +
  scale_x_continuous(labels = scaleFUN) +
  labs(x = "Estimated effects on risk ratio scale",
       y = NULL,
       title = "Effect estimates from causal mediation decomposition (risk ratios)") +
  theme_minimal()
Show code
summary(est2)
Causal Mediation Analysis
# Outcome regression:
Call:
MASS::polr(formula = sexvicsumf ~ SRPAavgz + nlpz + SRPAavgz *
nlpz + SRintelz + SRpopularz + agez + worked + gender, data = getCall(x$reg.output$yreg)$data,
weights = getCall(x$reg.output$yreg)$weights)
Coefficients:
Value Std. Error t value
SRPAavgz 0.1016 0.0151 6.743
nlpz 0.5150 0.0121 42.726
SRintelz 0.0235 0.0136 1.736
SRpopularz 0.0659 0.0146 4.503
agez -0.0071 0.0142 -0.501
worked1 0.2423 0.0270 8.987
gender2 1.3328 0.0337 39.508
SRPAavgz:nlpz -0.0504 0.0115 -4.385
Intercepts:
Value Std. Error t value
0|1 2.668 0.032 82.565
1|2 3.484 0.035 100.791
2|3 5.118 0.045 113.788
3|4 6.228 0.063 98.800
Residual Deviance: 58058.96
AIC: 58082.96
# Mediator regressions:
Call:
glm(formula = nlpz ~ SRPAavgz + SRintelz + SRpopularz + agez +
worked + gender, family = gaussian(), data = getCall(x$reg.output$mreg[[1L]])$data,
weights = getCall(x$reg.output$mreg[[1L]])$weights)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.06602 0.00786 -8.40 < 0.0000000000000002 ***
SRPAavgz 0.23160 0.00489 47.32 < 0.0000000000000002 ***
SRintelz -0.04326 0.00443 -9.77 < 0.0000000000000002 ***
SRpopularz 0.05807 0.00496 11.72 < 0.0000000000000002 ***
agez 0.27763 0.00497 55.82 < 0.0000000000000002 ***
worked1 0.20981 0.00963 21.78 < 0.0000000000000002 ***
gender2 0.06040 0.00906 6.67 0.000000000027 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.84)
Null deviance: 47116 on 47524 degrees of freedom
Residual deviance: 40096 on 47518 degrees of freedom
AIC: 126808
Number of Fisher Scoring iterations: 2
# Effect decomposition on the risk ratio scale via the regression-based approach
Direct counterfactual imputation estimation with
bootstrap standard errors, percentile confidence intervals and p-values
Estimate Std.error 95% CIL 95% CIU P.val
Rcde 1.075733 0.017457 1.044443 1.106 <0.0000000000000002 ***
Rpnde 1.057036 0.020473 1.022165 1.100 <0.0000000000000002 ***
Rtnde 1.070284 0.015466 1.042387 1.097 <0.0000000000000002 ***
Rpnie 1.082160 0.005176 1.072984 1.090 <0.0000000000000002 ***
Rtnie 1.095723 0.006380 1.083309 1.106 <0.0000000000000002 ***
Rte 1.158218 0.019117 1.124665 1.192 <0.0000000000000002 ***
ERcde 0.079926 0.018498 0.046880 0.113 <0.0000000000000002 ***
ERintref -0.022890 0.008081 -0.034897 -0.010 <0.0000000000000002 ***
ERintmed 0.019023 0.009399 0.002053 0.033 <0.0000000000000002 ***
ERpnie 0.082160 0.005176 0.072984 0.090 <0.0000000000000002 ***
ERcde(prop) 0.505162 0.061713 0.374300 0.586 <0.0000000000000002 ***
ERintref(prop) -0.144673 0.056123 -0.236395 -0.055 <0.0000000000000002 ***
ERintmed(prop) 0.120231 0.064746 0.010838 0.228 <0.0000000000000002 ***
ERpnie(prop) 0.519280 0.055900 0.446870 0.634 <0.0000000000000002 ***
pm 0.639511 0.089812 0.479590 0.832 <0.0000000000000002 ***
int -0.024442 0.010118 -0.043886 -0.006 <0.0000000000000002 ***
pe 0.494838 0.061713 0.413955 0.626 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Rcde: controlled direct effect risk ratio; Rpnde: pure natural direct effect risk ratio; Rtnde: total natural direct effect risk ratio; Rpnie: pure natural indirect effect risk ratio; Rtnie: total natural indirect effect risk ratio; Rte: total effect risk ratio; ERcde: excess relative risk due to controlled direct effect; ERintref: excess relative risk due to reference interaction; ERintmed: excess relative risk due to mediated interaction; ERpnie: excess relative risk due to pure natural indirect effect; ERcde(prop): proportion ERcde; ERintref(prop): proportion ERintref; ERintmed(prop): proportion ERintmed; ERpnie(prop): proportion ERpnie; pm: overall proportion mediated; int: overall proportion attributable to interaction; pe: overall proportion eliminated)
Relevant variable values:
$a
[1] -2
$astar
[1] 2.3
$yval
$yval[[1]]
[1] 0
$mval
$mval[[1]]
[1] 0
Show code
summary(est1)
Causal Mediation Analysis
# Outcome regression:
Call:
MASS::polr(formula = sexvicsumf ~ SRPAavgz + nlpz + SRintelz +
SRpopularz + agez + worked + gender, data = getCall(x$reg.output$yreg)$data,
weights = getCall(x$reg.output$yreg)$weights)
Coefficients:
Value Std. Error t value
SRPAavgz 0.09081 0.0149 6.114
nlpz 0.50206 0.0117 42.798
SRintelz 0.02244 0.0135 1.657
SRpopularz 0.06689 0.0147 4.565
agez -0.00316 0.0141 -0.224
worked1 0.24352 0.0270 9.033
gender2 1.33319 0.0337 39.517
Intercepts:
Value Std. Error t value
0|1 2.678 0.032 83.040
1|2 3.494 0.035 101.262
2|3 5.129 0.045 114.154
3|4 6.240 0.063 99.037
Residual Deviance: 58078.21
AIC: 58100.21
# Mediator regressions:
Call:
glm(formula = nlpz ~ SRPAavgz + SRintelz + SRpopularz + agez +
worked + gender, family = gaussian(), data = getCall(x$reg.output$mreg[[1L]])$data,
weights = getCall(x$reg.output$mreg[[1L]])$weights)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.06602 0.00786 -8.40 < 0.0000000000000002 ***
SRPAavgz 0.23160 0.00489 47.32 < 0.0000000000000002 ***
SRintelz -0.04326 0.00443 -9.77 < 0.0000000000000002 ***
SRpopularz 0.05807 0.00496 11.72 < 0.0000000000000002 ***
agez 0.27763 0.00497 55.82 < 0.0000000000000002 ***
worked1 0.20981 0.00963 21.78 < 0.0000000000000002 ***
gender2 0.06040 0.00906 6.67 0.000000000027 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.84)
Null deviance: 47116 on 47524 degrees of freedom
Residual deviance: 40096 on 47518 degrees of freedom
AIC: 126808
Number of Fisher Scoring iterations: 2
# Effect decomposition on the risk ratio scale via the regression-based approach
Direct counterfactual imputation estimation with
bootstrap standard errors, percentile confidence intervals and p-values
Estimate Std.error 95% CIL 95% CIU P.val
Rcde 1.066767 0.009451 1.047852 1.082 <0.0000000000000002 ***
Rpnde 1.082590 0.011641 1.059508 1.101 <0.0000000000000002 ***
Rtnde 1.057304 0.008105 1.041055 1.071 <0.0000000000000002 ***
Rpnie 1.103033 0.003812 1.094798 1.109 <0.0000000000000002 ***
Rtnie 1.077270 0.003547 1.072617 1.085 <0.0000000000000002 ***
Rte 1.166241 0.010863 1.143345 1.181 <0.0000000000000002 ***
pm 0.503194 0.041604 0.441280 0.597 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Rcde: controlled direct effect risk ratio; Rpnde: pure natural direct effect risk ratio; Rtnde: total natural direct effect risk ratio; Rpnie: pure natural indirect effect risk ratio; Rtnie: total natural indirect effect risk ratio; Rte: total effect risk ratio; pm: overall proportion mediated)
Relevant variable values:
$a
[1] -2
$astar
[1] 2.3
$yval
$yval[[1]]
[1] 0
$mval
$mval[[1]]
[1] 0
The “No EMint” model estimates above do not account for a possible interaction between SRPA and NLP; like a “classic” mediation model (and like our models above), it assumes these variables do not interact in their effects (apart from a pure mediation process). With this assumption, the model predicts sizable independent direct and indirect effects, implying that higher SRPA scores directly increase victimization risks as well as indirectly increase risks by increasing NLP. (Or, if NLP is a confounder, the indirect portion might be interpreted as the portion of the SRPA/victimization relationship that is due to confounding.)
However, when estimating indirect (or confounding) effects, it is important to account for the possibility of exposure-mediator (EMint) interactions (or, in this case, possibly an exposure-confounder interaction). So, I will focus on results from the EMint model, though supplemental results above also report estimates from a model without an EMint specified.
The estimates above predict relative risk contrasts for “no” versus “any” victimization (yval=0). Notice that these estimates are smaller than might be expected compared to similar contrasts from Bayesian models above. They are also a bit more confusing because they represent contrasts in relative risks of non-victimization rather than risks of victimization. Below, I modify the EMint model to predict a binary victimization outcome item using logistic regression and flip the astar and a values; this will provide more interpretable estimates of the relative risks of “any” victimization for high versus low SRPA scores. Then, I will replicate this new model after substituting NLP with an alternative dichotomous “eversex” mediator that has substantially less missing data.
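A minimal sketch of the recode implied here, assuming the numeric 0-4 victimization index is named sexvicsum and that eversex already exists as a dichotomous lifetime sexual activity indicator:

# collapse the 0-4 victimization index into a binary "any victimization" outcome
datv3tmp <- datv3tmp %>%
  mutate(anyvic = if_else(sexvicsum > 0, 1L, 0L))

In the corresponding CMAverse calls, yreg switches to “logistic”, yval to 1, and the a and astar values are swapped (a = 2.3, astar = -2), as reflected in the est3 and est4 output below.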
[Table fragment (extraction residue): excess rel. risk due to pure natural indirect effect (ERpnie) - 0.75; 0.37. pm (proportion mediated) - 0.54; 0.55. pe (proportion eliminated) - 0.61; 0.83.]
Show code
# save estimates and intervals data
est3dat <- as.data.frame(est3[9:12]) %>%
  rownames_to_column("CMA_est")

# Transformation function used to display two decimal places on x axis
scaleFUN <- function(dec) sprintf("%.2f", dec)

# drop proportion estimates
est3dat %>%
  filter(!CMA_est %in% c("pm", "pe", "ERpnie(prop)", "ERintref(prop)", "ERintmed(prop)", "ERcde(prop)",
                         "int", "ERpnie", "ERintmed", "ERintref", "ERcde")) %>%
  mutate(CMA_est = factor(CMA_est, levels = c("Rte", "Rtnie", "Rpnie", "Rtnde", "Rpnde", "Rcde"))) %>%
  # reorder the coefficients so that the largest is at the top of the plot
  # mutate(term = fct_reorder(CMA_est, effect.pe)) %>%
  ggplot(aes(effect.pe, CMA_est)) +
  geom_point() +
  geom_errorbarh(aes(xmin = effect.ci.low, xmax = effect.ci.high), height = .1) +
  # add in a dotted line at zero
  geom_vline(xintercept = 1, lty = 2) +
  scale_x_continuous(labels = scaleFUN) +
  labs(x = "Estimated effects on risk ratio scale",
       y = NULL,
       title = "Effect estimates from causal mediation decomposition (risk ratios)") +
  theme_minimal()
Show code
# save estimates and intervals data
est3dat <- as.data.frame(est3[9:12]) %>%
  rownames_to_column("CMA_est")

# drop proportion estimates
est3dat %>%
  filter(!CMA_est %in% c("pm", "pe", "ERpnie(prop)", "ERintref(prop)", "ERintmed(prop)", "ERcde(prop)",
                         "int", "Rte", "Rtnie", "Rpnie", "Rtnde", "Rpnde", "Rcde")) %>%
  mutate(CMA_est = factor(CMA_est, levels = c("ERpnie", "ERintmed", "ERintref", "ERcde"))) %>%
  # reorder the coefficients so that the largest is at the top of the plot
  # mutate(term = fct_reorder(CMA_est, effect.pe)) %>%
  ggplot(aes(effect.pe, CMA_est)) +
  geom_point() +
  geom_errorbarh(aes(xmin = effect.ci.low, xmax = effect.ci.high), height = .1) +
  # add in a dotted line at zero
  geom_vline(xintercept = 0, lty = 2) +
  labs(x = "Estimated effects on excess risks scale",
       y = NULL,
       title = "Effect estimates from causal mediation decomposition (excess risks)") +
  theme_minimal()
Show code
# save estimates and intervals data
est4dat <- as.data.frame(est4[9:12]) %>%
  rownames_to_column("CMA_est")

# Transformation function used to display two decimal places on x axis
scaleFUN <- function(dec) sprintf("%.2f", dec)

# drop proportion estimates
est4dat %>%
  filter(!CMA_est %in% c("pm", "pe", "ERpnie(prop)", "ERintref(prop)", "ERintmed(prop)", "ERcde(prop)",
                         "int", "ERpnie", "ERintmed", "ERintref", "ERcde")) %>%
  mutate(CMA_est = factor(CMA_est, levels = c("Rte", "Rtnie", "Rpnie", "Rtnde", "Rpnde", "Rcde"))) %>%
  # reorder the coefficients so that the largest is at the top of the plot
  # mutate(term = fct_reorder(CMA_est, effect.pe)) %>%
  ggplot(aes(effect.pe, CMA_est)) +
  geom_point() +
  geom_errorbarh(aes(xmin = effect.ci.low, xmax = effect.ci.high), height = .1) +
  # add a dotted reference line at a risk ratio of one
  geom_vline(xintercept = 1, lty = 2) +
  scale_x_continuous(labels = scaleFUN) +
  labs(x = "Estimated effects on risk ratio scale",
       y = NULL,
       title = "Effect estimates from causal mediation decomposition (risk ratios)") +
  theme_minimal()
Show code
# save estimates and intervals data
est4dat <- as.data.frame(est4[9:12]) %>%
  rownames_to_column("CMA_est")

# drop proportion estimates
est4dat %>%
  filter(!CMA_est %in% c("pm", "pe", "ERpnie(prop)", "ERintref(prop)", "ERintmed(prop)", "ERcde(prop)",
                         "int", "Rte", "Rtnie", "Rpnie", "Rtnde", "Rpnde", "Rcde")) %>%
  mutate(CMA_est = factor(CMA_est, levels = c("ERpnie", "ERintmed", "ERintref", "ERcde"))) %>%
  # reorder the coefficients so that the largest is at the top of the plot
  # mutate(term = fct_reorder(CMA_est, effect.pe)) %>%
  ggplot(aes(effect.pe, CMA_est)) +
  geom_point() +
  geom_errorbarh(aes(xmin = effect.ci.low, xmax = effect.ci.high), height = .1) +
  # add in a dotted line at zero
  geom_vline(xintercept = 0, lty = 2) +
  labs(x = "Estimated effects on excess risks scale",
       y = NULL,
       title = "Effect estimates from causal mediation decomposition (excess risks)") +
  theme_minimal()
Show code
summary(est3)
Causal Mediation Analysis
# Outcome regression:
Call:
glm(formula = anyvic ~ SRPAavgz + nlpz + SRPAavgz * nlpz + SRintelz +
SRpopularz + agez + worked + gender, family = binomial(),
data = getCall(x$reg.output$yreg)$data, weights = getCall(x$reg.output$yreg)$weights)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.65136 0.03238 -81.89 < 0.0000000000000002 ***
SRPAavgz 0.10067 0.01521 6.62 0.000000000036 ***
nlpz 0.50369 0.01240 40.62 < 0.0000000000000002 ***
SRintelz 0.02627 0.01374 1.91 0.056 .
SRpopularz 0.06389 0.01482 4.31 0.000016217068 ***
agez -0.00521 0.01434 -0.36 0.716
worked1 0.23785 0.02740 8.68 < 0.0000000000000002 ***
gender2 1.31125 0.03384 38.74 < 0.0000000000000002 ***
SRPAavgz:nlpz -0.05682 0.01188 -4.78 0.000001724855 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 44777 on 47524 degrees of freedom
Residual deviance: 40438 on 47516 degrees of freedom
AIC: 40456
Number of Fisher Scoring iterations: 5
# Mediator regressions:
Call:
glm(formula = nlpz ~ SRPAavgz + SRintelz + SRpopularz + agez +
worked + gender, family = gaussian(), data = getCall(x$reg.output$mreg[[1L]])$data,
weights = getCall(x$reg.output$mreg[[1L]])$weights)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.06602 0.00786 -8.40 < 0.0000000000000002 ***
SRPAavgz 0.23160 0.00489 47.32 < 0.0000000000000002 ***
SRintelz -0.04326 0.00443 -9.77 < 0.0000000000000002 ***
SRpopularz 0.05807 0.00496 11.72 < 0.0000000000000002 ***
agez 0.27763 0.00497 55.82 < 0.0000000000000002 ***
worked1 0.20981 0.00963 21.78 < 0.0000000000000002 ***
gender2 0.06040 0.00906 6.67 0.000000000027 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.84)
Null deviance: 47116 on 47524 degrees of freedom
Residual deviance: 40096 on 47518 degrees of freedom
AIC: 126808
Number of Fisher Scoring iterations: 2
# Effect decomposition on the odds ratio scale via the regression-based approach
Direct counterfactual imputation estimation with
bootstrap standard errors, percentile confidence intervals and p-values
Estimate Std.error 95% CIL 95% CIU P.val
Rcde 1.51872 0.13059 1.38354 1.786 <0.0000000000000002 ***
Rpnde 1.55553 0.13105 1.42428 1.838 <0.0000000000000002 ***
Rtnde 1.25219 0.10891 1.13751 1.510 <0.0000000000000002 ***
Rpnie 1.75472 0.03917 1.68811 1.815 <0.0000000000000002 ***
Rtnie 1.41254 0.03720 1.37396 1.490 <0.0000000000000002 ***
Rte 2.19725 0.18045 2.00909 2.595 <0.0000000000000002 ***
ERcde 0.46844 0.11195 0.35009 0.698 <0.0000000000000002 ***
ERintref 0.08709 0.02242 0.06286 0.140 <0.0000000000000002 ***
ERintmed -0.11300 0.09252 -0.20965 0.104 0.3
ERpnie 0.75472 0.03917 0.68812 0.815 <0.0000000000000002 ***
ERcde(prop) 0.39127 0.03529 0.34601 0.452 <0.0000000000000002 ***
ERintref(prop) 0.07274 0.01233 0.05435 0.091 <0.0000000000000002 ***
ERintmed(prop) -0.09438 0.07661 -0.19926 0.065 0.3
ERpnie(prop) 0.63038 0.09022 0.44518 0.761 <0.0000000000000002 ***
pm 0.53600 0.04418 0.45742 0.594 <0.0000000000000002 ***
int -0.02164 0.07041 -0.12034 0.127 0.8
pe 0.60873 0.03529 0.54805 0.654 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Rcde: controlled direct effect odds ratio; Rpnde: pure natural direct effect odds ratio; Rtnde: total natural direct effect odds ratio; Rpnie: pure natural indirect effect odds ratio; Rtnie: total natural indirect effect odds ratio; Rte: total effect odds ratio; ERcde: excess relative risk due to controlled direct effect; ERintref: excess relative risk due to reference interaction; ERintmed: excess relative risk due to mediated interaction; ERpnie: excess relative risk due to pure natural indirect effect; ERcde(prop): proportion ERcde; ERintref(prop): proportion ERintref; ERintmed(prop): proportion ERintmed; ERpnie(prop): proportion ERpnie; pm: overall proportion mediated; int: overall proportion attributable to interaction; pe: overall proportion eliminated)
Relevant variable values:
$a
[1] 2.3
$astar
[1] -2
$yval
$yval[[1]]
[1] 1
$mval
$mval[[1]]
[1] 0
Show code
summary(est4)
Causal Mediation Analysis
# Outcome regression:
Call:
glm(formula = anyvic ~ SRPAavgz + eversex + SRPAavgz * eversex +
SRintelz + SRpopularz + agez + worked + gender, family = binomial(),
data = getCall(x$reg.output$yreg)$data, weights = getCall(x$reg.output$yreg)$weights)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.1826 0.0387 -82.16 < 0.0000000000000002 ***
SRPAavgz 0.0930 0.0288 3.23 0.0012 **
eversex 1.0267 0.0325 31.61 < 0.0000000000000002 ***
SRintelz 0.0159 0.0125 1.27 0.2024
SRpopularz 0.0986 0.0134 7.35 0.0000000000002 ***
agez 0.0487 0.0109 4.45 0.0000084175657 ***
worked1 0.2889 0.0245 11.81 < 0.0000000000000002 ***
gender2 1.1971 0.0293 40.87 < 0.0000000000000002 ***
SRPAavgz:eversex 0.0654 0.0310 2.11 0.0349 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 52298 on 52126 degrees of freedom
Residual deviance: 47978 on 52118 degrees of freedom
AIC: 47996
Number of Fisher Scoring iterations: 5
# Mediator regressions:
Call:
glm(formula = eversex ~ SRPAavgz + SRintelz + SRpopularz + agez +
worked + gender, family = binomial(), data = getCall(x$reg.output$mreg[[1L]])$data,
weights = getCall(x$reg.output$mreg[[1L]])$weights)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8150 0.0193 42.12 < 0.0000000000000002 ***
SRPAavgz 0.5743 0.0122 46.92 < 0.0000000000000002 ***
SRintelz -0.1015 0.0107 -9.45 < 0.0000000000000002 ***
SRpopularz 0.0863 0.0121 7.12 0.0000000000011 ***
agez 0.8307 0.0185 44.85 < 0.0000000000000002 ***
worked1 0.5483 0.0252 21.77 < 0.0000000000000002 ***
gender2 0.1522 0.0220 6.91 0.0000000000049 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 63556 on 52126 degrees of freedom
Residual deviance: 55833 on 52120 degrees of freedom
AIC: 55847
Number of Fisher Scoring iterations: 5
# Effect decomposition on the odds ratio scale via the regression-based approach
Direct counterfactual imputation estimation with
bootstrap standard errors, percentile confidence intervals and p-values
Estimate Std.error 95% CIL 95% CIU P.val
Rcde 1.47991 0.21325 1.12966 1.856 <0.0000000000000002 ***
Rpnde 1.71192 0.09894 1.53998 1.873 <0.0000000000000002 ***
Rtnde 1.87909 0.09659 1.71700 2.030 <0.0000000000000002 ***
Rpnie 1.36991 0.03289 1.30560 1.423 <0.0000000000000002 ***
Rtnie 1.50369 0.04893 1.44058 1.593 <0.0000000000000002 ***
Rte 2.57419 0.12750 2.33918 2.796 <0.0000000000000002 ***
ERcde 0.27061 0.10812 0.08141 0.444 <0.0000000000000002 ***
ERintref 0.44130 0.06020 0.32637 0.541 <0.0000000000000002 ***
ERintmed 0.49237 0.08219 0.33757 0.634 <0.0000000000000002 ***
ERpnie 0.36991 0.03289 0.30560 0.423 <0.0000000000000002 ***
ERcde(prop) 0.17191 0.06435 0.05274 0.263 <0.0000000000000002 ***
ERintref(prop) 0.28034 0.03194 0.22886 0.337 <0.0000000000000002 ***
ERintmed(prop) 0.31277 0.04678 0.23669 0.397 <0.0000000000000002 ***
ERpnie(prop) 0.23499 0.02643 0.19697 0.284 <0.0000000000000002 ***
pm 0.54776 0.03821 0.50133 0.624 <0.0000000000000002 ***
int 0.59311 0.07767 0.46555 0.729 <0.0000000000000002 ***
pe 0.82809 0.06435 0.73673 0.947 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Rcde: controlled direct effect odds ratio; Rpnde: pure natural direct effect odds ratio; Rtnde: total natural direct effect odds ratio; Rpnie: pure natural indirect effect odds ratio; Rtnie: total natural indirect effect odds ratio; Rte: total effect odds ratio; ERcde: excess relative risk due to controlled direct effect; ERintref: excess relative risk due to reference interaction; ERintmed: excess relative risk due to mediated interaction; ERpnie: excess relative risk due to pure natural indirect effect; ERcde(prop): proportion ERcde; ERintref(prop): proportion ERintref; ERintmed(prop): proportion ERintmed; ERpnie(prop): proportion ERpnie; pm: overall proportion mediated; int: overall proportion attributable to interaction; pe: overall proportion eliminated)
Relevant variable values:
$a
[1] 2.3
$astar
[1] -2
$yval
$yval[[1]]
[1] 1
$mval
$mval[[1]]
[1] 0
Assuming sexual activity (measured as number of lifetime sexual partners, or NLP) mediates a causal effect of physical attractiveness (SRPA) on sexual victimization, these results provide estimates of the relative magnitudes of direct and indirect effects on the victimization risk ratio scale. As with earlier models, these estimates were generated as “maximum” risk ratio contrasts comparing relative risks of victimization for those with highest and lowest SRPA scores. Even if the mediation assumption is wholly invalid and NLP is fully a confounder of the SRPA/victimization relationship, these results provide useful insights into the nature of the joint relationships.
The pure natural direct effect (PNDE) might be considered a lower bound estimate that represents a minimum effect under an assumption that NLP is a confounder rather than a mediator, as the so-called “direct” effect represents associations generated by unmeasured mechanisms (e.g., target gratifiability) after adjusting out any effects that are related to the mediator (sexual activity). The PNDE estimate predicts that those with the highest SRPA scores directly experience approximately 1.56 times the risk of victimization, or a 56% increase in risk, compared to those with the lowest SRPA scores after adjusting for NLP and controls.
In comparison, if NLP indeed is (at least partly) a mediator, the model also predicts sizeable indirect effects of SRPA on victimization risks through its effects on NLP. The total “indirect” effect, or that attributed to covariation and interaction with NLP, is 1.41. The proportion eliminated estimate (0.61) suggests that roughly three-fifths of the total effect of SRPA on the relative risks of sexual victimization appears to be due to more physically attractive participants’ increased exposure risks from sexual activity. The total effect estimate is 2.20 on the risk ratio scale; that is, the combined effects of increased exposure risks from sexual activity and other unmeasured (“direct” effect) mechanisms equate to roughly 2.2 times the risk of victimization.
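As a quick arithmetic check, the summary proportions reported for est3 can be recovered from the four excess relative risk components above (these identities follow the four-way decomposition that CMAverse implements):

# components copied from the est3 output above
ERcde <- 0.46844; ERintref <- 0.08709; ERintmed <- -0.11300; ERpnie <- 0.75472
Rte   <- 2.19725

# the four components sum to the total excess relative risk
ERcde + ERintref + ERintmed + ERpnie        # ~1.197 = Rte - 1

# proportion mediated, proportion attributable to interaction, proportion eliminated
(ERpnie + ERintmed) / (Rte - 1)             # ~0.536 (pm)
(ERintref + ERintmed) / (Rte - 1)           # ~-0.022 (int)
(ERintref + ERintmed + ERpnie) / (Rte - 1)  # ~0.609 (pe)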
For more information about CMAverse models and effect estimates, see my blog posts on the topic.
9 Supplemental Interaction Models
Now, with primary models completed, let’s turn to supplementary models assessing effect heterogeneity across gender and nlp.
First, to assess whether estimates vary across gender (noted as a possibility in the preregistration based on prior research), I add an interaction between SRPA and gender to the “Controls” model (i.e., with covariates but without NLP, due to the ambiguity of NLP’s status as a confounding or mediating mechanism), then plot predicted probabilities and contrasts for each ordinal response category for males and females.
9.1 Gender: Ord~Ord (sexvicsumf by moSRPAf:gender)
Show code
# Ordinal cumulative probit model of sexual victimization index
# with ordinal (monotonic cumulative probit) SRPA predictor,
# linear controls, & mo(SRPA):gender
# 10 nlp response categories = 9 thresholds
# so, beta prior (0,1) equiv to (0,[1/9]) or (0,.11)
# sexvicsum - bivariate sexvicsum by mo(SRPAf):
mod6mo <- brm(
  data = datv3tmp,
  family = cumulative("probit"),
  sexvicsumf ~ 1 + mo(SRPAf) + SRintelz + SRpopularz + agez + worked + gender + mo(SRPAf):gender,
  prior = priors3,
  iter = 2000, warmup = 1000, chains = 4, cores = nCoresphys,
  backend = "cmdstanr",
  # threads = threading(2), # consider within-chain threading if machine has at least 8 physical cores
  seed = 8675309,
  file = "Models/mod6mo_fit",
  file_refit = "on_change"
)
prior                                          class      coef              group resp dpar nlpar lb ub source
normal(0, 1)                                   b                                                         user
normal(0, 1)                                   b          agez                                           (vectorized)
normal(0, 1)                                   b          gender2                                        (vectorized)
normal(0, 0.083)                               b          moSRPAf                                        user
normal(0, 1)                                   b          moSRPAf:gender2                                (vectorized)
normal(0, 1)                                   b          SRintelz                                       (vectorized)
normal(0, 1)                                   b          SRpopularz                                     (vectorized)
normal(0, 1)                                   b          worked1                                        (vectorized)
student_t(3, 0, 2.5)                           Intercept                                                 default
normal(-0.84, 1)                               Intercept  1                                              user
normal(-0.25, 1)                               Intercept  2                                              user
normal(0.25, 1)                                Intercept  3                                              user
normal(0.84, 1)                                Intercept  4                                              user
dirichlet(1)                                   simo       moSRPAf:gender21                               default
dirichlet(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)  simo       moSRPAf1                                       user
9.1.1 Visualize max moSRPA effect by response category & gender (mod6mo)
Show code
# manual predictions at user-specified means or representative values

# function to generate var mean
# meanvar <- function(x){
#   mean(x, na.rm=TRUE)
# }

# define predictor grid for conditional avg at covariate means or rep values
# ndmo_gen <- tibble(
#   gender = c("1","2"),
#   nlpf = "2", # set at 2 lifetime partners (mean=1.9)
#   SRintelz = 0,
#   SRpopularz = 0,
#   agez = 0,
#   worked = "1") %>%
#   expand_grid(SRPAf = c("2","14"))

# generate and save predictions at user-specified covariate values
# mod6mopred <- predictions(mod6mo, newdata = ndmo_gen) %>%
#   data.frame() %>%
#   mutate(
#     colgrp = if_else(gender == "1" & SRPAf == "2", "M: E[italic(j)](hat(italic(p))[italic(ij)]^min)", "M: E[italic(j)](hat(italic(p))[italic(ij)]^max)"),
#     colgrp = if_else(gender == "2" & SRPAf == "2", "F: E[italic(j)](hat(italic(p))[italic(ij)]^min)", colgrp),
#     colgrp = if_else(gender == "2" & SRPAf == "14", "F: E[italic(j)](hat(italic(p))[italic(ij)]^max)", colgrp),
#     SRPAf = ifelse(SRPAf == "2", "E[italic(j)](hat(italic(p))[italic(ij)]^min)", "E[italic(j)](hat(italic(p))[italic(ij)]^max)"))

# Let's go with automated predictions by response category at means
mod6mopred <- predictions(mod6mo,
                          variables = list(SRPAf = c("2","14")),
                          newdata = "mean",
                          by = c("SRPAf", "group", "gender")) %>%
  data.frame() %>%
  mutate(
    colgrp = if_else(gender == "1" & SRPAf == "2", "M: E[italic(j)](hat(italic(p))[italic(ij)]^min)", "M: E[italic(j)](hat(italic(p))[italic(ij)]^max)"),
    colgrp = if_else(gender == "2" & SRPAf == "2", "F: E[italic(j)](hat(italic(p))[italic(ij)]^min)", colgrp),
    colgrp = if_else(gender == "2" & SRPAf == "14", "F: E[italic(j)](hat(italic(p))[italic(ij)]^max)", colgrp),
    SRPAf = ifelse(SRPAf == "2", "E[italic(j)](hat(italic(p))[italic(ij)]^min)", "E[italic(j)](hat(italic(p))[italic(ij)]^max)"))

# function to find & drop leading zeroes (used for x-axis label)
dropLeadingZero <- function(l){ str_replace(l, '0(?=.)', '') }

mod6mopred_m <- mod6mopred %>% dplyr::filter(gender == "1")
mod6mopred_f <- mod6mopred %>% dplyr::filter(gender == "2")

# full range of y responses (0:4)
mod6mop1 <- ggplot(mod6mopred,
                   aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high, color = colgrp)) +
  geom_hline(yintercept = 0:10/10, color = "lightgrey") +
  geom_linerange(data = mod6mopred_m, aes(color = colgrp), linewidth = 3/4,
                 position = position_dodge(width = 1/3)) +
  geom_point(data = mod6mopred_m, aes(shape = colgrp),
             position = position_dodge(width = 1/3), size = 2) +
  geom_linerange(data = mod6mopred_f, aes(color = colgrp), linewidth = 3/4,
                 position = position_dodge(width = 1/6)) +
  geom_point(data = mod6mopred_f, aes(shape = colgrp),
             position = position_dodge(width = 1/6), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .2, end = .8,
                        labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(19, 19, 17, 17), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, by = .2), labels = dropLeadingZero) +
  labs(subtitle = "Pred. response probabilities\n(all)") +
  theme(legend.background = element_blank(), legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'), legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10), legend.text = element_text(size = 10),
        axis.title.y = element_blank(), axis.title.x = element_blank())

# nonzero victimization values only (1:4)
mod6mop2 <- ggplot(mod6mopred,
                   aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high, color = colgrp)) +
  geom_hline(yintercept = 0:7/50, color = "lightgrey") +
  geom_linerange(data = mod6mopred_m, aes(color = colgrp), linewidth = 3/4,
                 position = position_dodge(width = 1/3)) +
  geom_point(data = mod6mopred_m, aes(shape = colgrp),
             position = position_dodge(width = 1/3), size = 2) +
  geom_linerange(data = mod6mopred_f, aes(color = colgrp), linewidth = 3/4,
                 position = position_dodge(width = 1/6)) +
  geom_point(data = mod6mopred_f, aes(shape = colgrp),
             position = position_dodge(width = 1/6), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .2, end = .8,
                        labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(19, 19, 17, 17), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, .14), breaks = seq(0, .14, by = .02), labels = dropLeadingZero) +
  labs(subtitle = "(y > 0)",
       x = expression(Victimiz.~response~option~(italic(j)))) +
  theme(legend.background = element_blank(), legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'), legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10), legend.text = element_text(size = 10),
        axis.title.y = element_blank())

# estimate contrasts at means or rep values
mod6mocon <- comparisons(mod6mo,
                         variables = list(SRPAf = c("2","14")),
                         newdata = "mean",
                         by = c("group","gender")) %>%
  data.frame() %>%
  mutate(gender = if_else(gender == "1", "Males", "Females"))

mod6mop3 <- mod6mocon %>%
  ggplot(aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high, color = gender)) +
  geom_hline(yintercept = -4:4/20, color = "lightgrey") +
  geom_hline(yintercept = 0, color = "black") +
  geom_pointrange(aes(color = gender), linewidth = 3/4, size = 1/3,
                  position = position_dodge(width = 1/3)) +
  scale_color_viridis_d(NULL, option = "magma", begin = .3, end = .7,
                        labels = scales::parse_format()) +
  scale_y_continuous(expression(italic(p[j])^max - italic(p[j])^min),
                     limits = c(-0.2, 0.2), breaks = seq(-.2, .2, by = .05),
                     labels = dropLeadingZero) +
  labs(subtitle = expression(Delta[italic(j)]^{MEM}~(max~contrasts)),
       x = expression(Victimiz.~response~option~(italic(j)))) +
  theme(legend.background = element_blank(), legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'), legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10), legend.text = element_text(size = 10))

# combine and entitle
((mod6mop3 | mod6mop1 / mod6mop2)) +
  plot_annotation(title = "Cumulative probit y ~ mo(SRPAf):Gender + Controls",
                  subtitle = "Maximum monotonic MEM of SRPA on Victimization by Gender")
9.1.2 “Interaction Model 2 (Controls)” Contrasts
Show code
mod6mocon %>% gt()
rowid term  group contrast           gender   estimate conf.low conf.high predicted_lo predicted_hi predicted tmp_idx
1     SRPAf 0     mean(14) - mean(2) Males     -0.0898  -0.1100   -0.0725      0.94524       0.8555    0.9289       1
2     SRPAf 0     mean(14) - mean(2) Females   -0.1442  -0.1716   -0.1130      0.81470       0.6706    0.7896       2
1     SRPAf 1     mean(14) - mean(2) Males      0.0419   0.0341    0.0506      0.03297       0.0749    0.0414       1
2     SRPAf 1     mean(14) - mean(2) Females    0.0437   0.0348    0.0512      0.09074       0.1344    0.0997       2
1     SRPAf 2     mean(14) - mean(2) Males      0.0366   0.0294    0.0452      0.01854       0.0552    0.0249       1
2     SRPAf 2     mean(14) - mean(2) Females    0.0635   0.0497    0.0757      0.07271       0.1362    0.0838       2
1     SRPAf 3     mean(14) - mean(2) Males      0.0078   0.0060    0.0100      0.00246       0.0102    0.0036       1
2     SRPAf 3     mean(14) - mean(2) Females    0.0217   0.0166    0.0264      0.01488       0.0366    0.0181       2
1     SRPAf 4     mean(14) - mean(2) Males      0.0035   0.0027    0.0047      0.00077       0.0043    0.0012       1
2     SRPAf 4     mean(14) - mean(2) Females    0.0154   0.0114    0.0192      0.00689       0.0222    0.0088       2
Estimates from this model show higher predicted probabilities of reported sexual victimization for females compared to males across both the lowest and highest SRPA scores. Likewise, the “conditional effects” plot displaying predicted probabilities by gender across all SRPA categories (see tab above) also illustrates these gender differences. Interestingly, examination of the conditional effects plot also reveals that the pattern of increased victimization risks emerging primarily at the upper range of SRPA scores (e.g., >8 on the ordinal scale ranging from 2 to 14) is apparent for both females and males.
Also, the maximum effect contrasts indicate that differences in predicted probabilities between those with the lowest and highest SRPA scores are larger for females than for males. Recall, these estimates adjust for measured potential confounders but not nlp.
Let’s rerun the model to see if patterns hold after adjusting for NLP. This time, though, we will increase the number of sampling iterations. This is because two mo(x) variables plus an interaction with one of the mo(x) variables is asking a lot from our data, and our previous specifications resulted in low ESS (<1000) & high Rhat (>1.01) values (see here for more information). With four chains each generating 10k iterations (minus 1k warmup), we will end with 36k post-warmup draws from the posterior (compared to 4k in simpler models above). This may be going a bit overboard here, but the model already takes a long time to run so I want to be sure we have enough samples to generate valid conclusions about the posterior.
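A minimal sketch of the convergence checks referenced here, assuming the refit mod7mo object and using the posterior package:

# summarize Rhat and bulk/tail effective sample sizes for all parameters,
# sorting so the worst-mixing parameters appear first
library(posterior)

as_draws_df(mod7mo) %>%
  summarise_draws(rhat, ess_bulk, ess_tail) %>%
  dplyr::arrange(dplyr::desc(rhat))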
9.2 Gender: Ord~Ord (sexvicsumf by moSRPAf:gender, add NLP)
Show code
# Ordinal cumulative probit model of sexual victimization index
# with ordinal (monotonic cumulative probit) SRPA predictor,
# linear controls, mo(SRPA):gender, & mo(nlp)
# sexvicsum - bivariate sexvicsum by mo(SRPAf):
mod7mo <- brm(
  data = datv3tmp,
  family = cumulative("probit"),
  sexvicsumf ~ 1 + mo(SRPAf) + mo(nlpf) + SRintelz + SRpopularz + agez + worked + gender + mo(SRPAf):gender,
  prior = priors4,
  iter = 10000, warmup = 1000, chains = 4, cores = nCoresphys,
  backend = "cmdstanr",
  # threads = threading(2), # consider within-chain threading if machine has at least 8 physical cores
  seed = 8675309,
  file = "Models/mod7mo_fit",
  file_refit = "on_change"
)
prior                                          class      coef              group resp dpar nlpar lb ub source
normal(0, 1)                                   b                                                         user
normal(0, 1)                                   b          agez                                           (vectorized)
normal(0, 1)                                   b          gender2                                        (vectorized)
normal(0, 0.083)                               b          moSRPAf                                        user
normal(0, 1)                                   b          moSRPAf:gender2                                (vectorized)
normal(0, 1)                                   b          SRintelz                                       (vectorized)
normal(0, 1)                                   b          SRpopularz                                     (vectorized)
normal(0, 1)                                   b          worked1                                        (vectorized)
student_t(3, 0, 2.5)                           Intercept                                                 default
normal(-0.84, 1)                               Intercept  1                                              user
normal(-0.25, 1)                               Intercept  2                                              user
normal(0.25, 1)                                Intercept  3                                              user
normal(0.84, 1)                                Intercept  4                                              user
dirichlet(1)                                   simo       moSRPAf:gender21                               default
dirichlet(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)  simo       moSRPAf1                                       user
9.2.1 Visualize max moSRPA effect by response category & gender (mod7mo)
Show code
# manual predictions at user-specified means or representative values

# function to generate var mean
# meanvar <- function(x){
#   mean(x, na.rm=TRUE)
# }

# define predictor grid for conditional avg at covariate means or rep values
# ndmo_gen <- tibble(
#   gender = c("1","2"),
#   nlpf = "2", # set at 2 lifetime partners (mean=1.9)
#   SRintelz = 0,
#   SRpopularz = 0,
#   agez = 0,
#   worked = "1") %>%
#   expand_grid(SRPAf = c("2","14"))

# generate and save predictions at user-specified covariate values
# mod7mopred <- predictions(mod7mo, newdata = ndmo_gen) %>%
#   data.frame() %>%
#   mutate(
#     colgrp = if_else(gender == "1" & SRPAf == "2", "M: E[italic(j)](hat(italic(p))[italic(ij)]^min)", "M: E[italic(j)](hat(italic(p))[italic(ij)]^max)"),
#     colgrp = if_else(gender == "2" & SRPAf == "2", "F: E[italic(j)](hat(italic(p))[italic(ij)]^min)", colgrp),
#     colgrp = if_else(gender == "2" & SRPAf == "14", "F: E[italic(j)](hat(italic(p))[italic(ij)]^max)", colgrp),
#     SRPAf = ifelse(SRPAf == "2", "E[italic(j)](hat(italic(p))[italic(ij)]^min)", "E[italic(j)](hat(italic(p))[italic(ij)]^max)"))

# Let's go with automated predictions by response category at means
mod7mopred <- predictions(mod7mo,
                          variables = list(SRPAf = c("2","14")),
                          newdata = "mean",
                          by = c("SRPAf", "group", "gender")) %>%
  data.frame() %>%
  mutate(
    colgrp = if_else(gender == "1" & SRPAf == "2", "M: E[italic(j)](hat(italic(p))[italic(ij)]^min)", "M: E[italic(j)](hat(italic(p))[italic(ij)]^max)"),
    colgrp = if_else(gender == "2" & SRPAf == "2", "F: E[italic(j)](hat(italic(p))[italic(ij)]^min)", colgrp),
    colgrp = if_else(gender == "2" & SRPAf == "14", "F: E[italic(j)](hat(italic(p))[italic(ij)]^max)", colgrp),
    SRPAf = ifelse(SRPAf == "2", "E[italic(j)](hat(italic(p))[italic(ij)]^min)", "E[italic(j)](hat(italic(p))[italic(ij)]^max)"))

# function to find & drop leading zeroes (used for x-axis label)
dropLeadingZero <- function(l){ str_replace(l, '0(?=.)', '') }

mod7mopred_m <- mod7mopred %>% dplyr::filter(gender == "1")
mod7mopred_f <- mod7mopred %>% dplyr::filter(gender == "2")

# full range of y responses (0:4)
mod7mop1 <- ggplot(mod7mopred,
                   aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high, color = colgrp)) +
  geom_hline(yintercept = 0:10/10, color = "lightgrey") +
  geom_linerange(data = mod7mopred_m, aes(color = colgrp), linewidth = 3/4,
                 position = position_dodge(width = 1/3)) +
  geom_point(data = mod7mopred_m, aes(shape = colgrp),
             position = position_dodge(width = 1/3), size = 2) +
  geom_linerange(data = mod7mopred_f, aes(color = colgrp), linewidth = 3/4,
                 position = position_dodge(width = 1/6)) +
  geom_point(data = mod7mopred_f, aes(shape = colgrp),
             position = position_dodge(width = 1/6), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .2, end = .8,
                        labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(19, 19, 17, 17), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, by = .2), labels = dropLeadingZero) +
  labs(subtitle = "Pred. response probabilities\n(all)") +
  theme(legend.background = element_blank(), legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'), legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10), legend.text = element_text(size = 10),
        axis.title.y = element_blank(), axis.title.x = element_blank())

# nonzero victimization values only (1:4)
mod7mop2 <- ggplot(mod7mopred,
                   aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high, color = colgrp)) +
  geom_hline(yintercept = 0:7/50, color = "lightgrey") +
  geom_linerange(data = mod7mopred_m, aes(color = colgrp), linewidth = 3/4,
                 position = position_dodge(width = 1/3)) +
  geom_point(data = mod7mopred_m, aes(shape = colgrp),
             position = position_dodge(width = 1/3), size = 2) +
  geom_linerange(data = mod7mopred_f, aes(color = colgrp), linewidth = 3/4,
                 position = position_dodge(width = 1/6)) +
  geom_point(data = mod7mopred_f, aes(shape = colgrp),
             position = position_dodge(width = 1/6), size = 2) +
  scale_color_viridis_d(NULL, option = "magma", begin = .2, end = .8,
                        labels = scales::parse_format()) +
  scale_shape_manual(NULL, values = c(19, 19, 17, 17), labels = scales::parse_format()) +
  scale_y_continuous(limits = c(0, .14), breaks = seq(0, .14, by = .02), labels = dropLeadingZero) +
  labs(subtitle = "(y > 0)",
       x = expression(Victimiz.~response~option~(italic(j)))) +
  theme(legend.background = element_blank(), legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'), legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10), legend.text = element_text(size = 10),
        axis.title.y = element_blank())

# estimate contrasts at means or rep values
mod7mocon <- comparisons(mod7mo,
                         variables = list(SRPAf = c("2","14")),
                         newdata = "mean",
                         by = c("group","gender")) %>%
  data.frame() %>%
  mutate(gender = if_else(gender == "1", "Males", "Females"))

mod7mop3 <- mod7mocon %>%
  ggplot(aes(x = group, y = estimate, ymin = conf.low, ymax = conf.high, color = gender)) +
  geom_hline(yintercept = -4:4/20, color = "lightgrey") +
  geom_hline(yintercept = 0, color = "black") +
  geom_pointrange(aes(color = gender), linewidth = 3/4, size = 1/3,
                  position = position_dodge(width = 1/3)) +
  scale_color_viridis_d(NULL, option = "magma", begin = .3, end = .7,
                        labels = scales::parse_format()) +
  scale_y_continuous(expression(italic(p[j])^max - italic(p[j])^min),
                     limits = c(-0.2, 0.2), breaks = seq(-.2, .2, by = .05),
                     labels = dropLeadingZero) +
  labs(subtitle = expression(Delta[italic(j)]^{MEM}~(max~contrasts)),
       x = expression(Victimiz.~response~option~(italic(j)))) +
  theme(legend.background = element_blank(), legend.position = c(0.9, 0.85),
        legend.key.height = unit(.5, 'cm'), legend.key.width = unit(.5, 'cm'),
        legend.title = element_text(size = 10), legend.text = element_text(size = 10))

# combine and entitle
((mod7mop3 | mod7mop1 / mod7mop2)) +
  plot_annotation(title = "Cumulative probit y ~ mo(SRPAf):Gender + Controls + NLP",
                  subtitle = "Maximum monotonic MEM of SRPA on Victimization by Gender")
9.2.2 “Interaction Model 3 (Controls + NLP)” Contrasts
Show code
mod7mocon %>% gt()
rowid term  group contrast           gender   estimate conf.low conf.high predicted_lo predicted_hi predicted tmp_idx
1     SRPAf 0     mean(14) - mean(2) Females  -0.03191 -0.05434  -0.00729      0.87898      0.84699   0.88303       1
2     SRPAf 0     mean(14) - mean(2) Males    -0.02123 -0.03154   0.01197      0.97431      0.95349   0.97030       2
1     SRPAf 1     mean(14) - mean(2) Females   0.01423  0.00327   0.02406      0.06817      0.08241   0.06628       1
2     SRPAf 1     mean(14) - mean(2) Males     0.01268 -0.00726   0.01859      0.01741      0.02981   0.01988       2
1     SRPAf 2     mean(14) - mean(2) Females   0.01391  0.00317   0.02378      0.04496      0.05890   0.04324       1
2     SRPAf 2     mean(14) - mean(2) Males     0.00751 -0.00417   0.01133      0.00758      0.01496   0.00895       2
1     SRPAf 3     mean(14) - mean(2) Females   0.00262  0.00059   0.00458      0.00588      0.00850   0.00557       1
2     SRPAf 3     mean(14) - mean(2) Males     0.00081 -0.00043   0.00129      0.00058      0.00137   0.00071       2
1     SRPAf 4     mean(14) - mean(2) Females   0.00114  0.00026   0.00205      0.00200      0.00314   0.00187       1
2     SRPAf 4     mean(14) - mean(2) Males     0.00023 -0.00012   0.00038      0.00013      0.00035   0.00016       2
We knew from earlier models that stratifying on NLP would result in much smaller predicted probability contrasts (i.e., SRPA’s maximum effect estimates). Comparing this gender interaction model to the previous interaction model without NLP reveals that the gender differences in contrasts have nearly disappeared after adjusting for NLP. This might indicate that observed gender differences in the effects of SRPA on sexual victimization are accounted for largely by increased exposure risks of physically attractive and sexually active women compared to men. The small remaining estimated effect may be accounted for by other unmeasured mechanisms like gratifiability.