--- title: "Comparing Intersectional Inequality Across Groups" author: "Hamid Bulut" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Comparing Intersectional Inequality Across Groups} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.5 ) ``` ## Introduction MAIHDA is often used to estimate how much outcome variation lies between intersectional strata in a single population. The group comparison workflow asks a related contextual question: is intersectional inequality larger in some higher-level groups than in others? Examples include comparing gender-by-class inequality across countries, race-by-gender inequality across regions, or intersectional inequality across survey waves. In the MAIHDA package, this is handled by `maihda()` with a `group` argument or, when you want only the group comparison table, by `compare_maihda_groups()`. ## Example data: countries, gender, and socioeconomic status The package includes `maihda_country_data`, a teaching dataset based on a balanced subset of OECD PISA 2018 data. Each row is a student, the outcome is a mathematics score, and the intersectional strata are formed by `gender` and `ses`. The higher-level grouping variable is `country`, which lets us ask whether the VPC/ICC for gender-by-SES inequality differs across countries. ```{r data} library(MAIHDA) data("maihda_country_data") country_counts <- as.data.frame(table(maihda_country_data$country)) names(country_counts) <- c("country", "n") country_counts table(maihda_country_data$gender, maihda_country_data$ses) ``` ## One-call workflow Use `maihda()` when you want the full decomposition and the group comparison in one object. The formula below defines six intersectional strata (`gender:ses`) and asks for a separate decomposition within each country. Because `maihda()` is a two-model workflow, it fits both the null and the adjusted model overall *and* within each country, so the group table also carries a per-country **PCV** (the additive share of that country's intersectional inequality). ```{r one-call} # gender + ses are written as additive fixed effects (the adjusted model); maihda() # derives the null by dropping them, both overall and within each country. analysis <- maihda( math ~ gender + ses + (1 | gender:ses), data = maihda_country_data, group = "country" ) analysis ``` The printed report includes the overall VPC/ICC and PCV first, then the country-level comparison. The group comparison is also stored in `analysis$groups`, so it can be inspected or saved as a regular data frame. ```{r group-table} group_results <- as.data.frame(analysis$groups) group_results[order(group_results$vpc, decreasing = TRUE), c("group", "n", "n_strata", "vpc", "var_between", "var_residual", "pcv", "status")] ``` In this example all countries have 600 students and all six gender-by-SES strata are populated. Higher VPC/ICC values indicate that a larger share of mathematics score variation lies between the intersectional strata within that country; the `pcv` column shows how much of each country's between-stratum variance is the additive sum of the gender and SES main effects (so a lower PCV flags inequality that is more intersection-specific). ## Visualizing the comparison The group VPC plot orders countries by the estimated intersectional VPC/ICC. If the comparison was run with `bootstrap = TRUE`, the same plot will include per-country confidence intervals. ```{r group-vpc-plot} plot(analysis, type = "group_vpc") ``` The variance-components view shows the same comparison as shares of the total model variance. The orange segment is the between-stratum component, which is the numerator of the VPC/ICC. ```{r group-components-plot} plot(analysis, type = "group_components") ``` ## Share versus magnitude The VPC/ICC is a *ratio*: the share of the unexplained variation that lies between strata. A larger VPC therefore does not necessarily mean a larger *amount* of between-stratum (intersectional) variation -- it can also reflect a smaller residual variance. Two countries with the same absolute between-stratum variance can have very different VPCs, and vice versa. The `"group_between_variance"` view plots the absolute between-stratum variance (`var_between`) directly, so it can be read alongside the VPC to separate the *share* of inequality from its *magnitude*. ```{r group-between-variance-plot} plot(analysis, type = "group_between_variance") ``` For non-Gaussian models this variance is on the model (link) scale, and unlike the VPC it is not normalised by the residual variance. ## Additive share (PCV) by group The `"group_pcv"` view bars each country's PCV -- the proportional drop in between-stratum variance once the gender and SES main effects are added. A high bar means that country's intersectional inequality is largely the additive sum of its parts; a low (or negative) bar flags inequality that is more intersection-specific. (With strata defined by a single dimension there is no intersection to decompose, so this view and the `pcv` column are unavailable.) ```{r group-pcv-plot} plot(analysis, type = "group_pcv") ``` ## Direct group comparison If you only need the group comparison table and plots, call `compare_maihda_groups()` directly. This fits the same per-country null and adjusted models used by `maihda(group = "country")` (the `pcv` column appears when the strata have at least two dimensions). ```{r direct-workflow, eval = FALSE} group_cmp <- compare_maihda_groups( math ~ 1 + (1 | gender:ses), data = maihda_country_data, group = "country" ) group_cmp plot(group_cmp, type = "vpc") plot(group_cmp, type = "components") plot(group_cmp, type = "pcv") ``` By default, `shared_strata = TRUE`. That means the `gender:ses` strata are defined once on the pooled data, then reused in every country. A "female:Low" stratum therefore refers to the same type of stratum in each country, which makes the stratum *definitions* comparable across countries. Shared definitions are necessary but not by themselves sufficient for the VPCs to be *directly* comparable. A given country may not contain every stratum, so two countries' VPCs can be estimated over different sets of *populated* strata. When that happens, `compare_maihda_groups()` issues a warning, and the VPCs should be read with that caveat in mind (alongside the `n_strata` column) rather than as strictly like-for-like. Set `shared_strata = FALSE` only when you intentionally want each group to rebuild its own strata. In that case, stratum identities are no longer shared across groups, so interpretation should focus on within-group structure rather than direct stratum-to-stratum comparison. ## Adding bootstrap intervals For publication-oriented summaries, you can request per-group parametric bootstrap confidence intervals with the `lme4` engine. This can take noticeably longer because each group requires repeated model refits, so the example is not run when the vignette is built. ```{r bootstrap-example, eval = FALSE} group_cmp_boot <- compare_maihda_groups( math ~ 1 + (1 | gender:ses), data = maihda_country_data, group = "country", bootstrap = TRUE, n_boot = 500 ) plot(group_cmp_boot, type = "vpc") ```