---
title: "Crossed random effects in MAIHDA: dimensions and contexts"
author: "Hamid Bulut"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Crossed random effects in MAIHDA: dimensions and contexts}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
```

## Two different things called "cross-classified"

This vignette covers the two MAIHDA designs that cross the intersectional stratum
random effect with other random intercepts, and disentangles two senses of the
term "cross-classified" that are easy to confuse:

1. The crossed-dimensions decomposition
   (`maihda(decomposition = "crossed-dimensions")`): the stratum dimensions' own
   main effects enter as random intercepts alongside the intersection, in order to
   estimate a model-based split of the between-strata variance into additive
   and interaction parts within a single model. This mode was called
   `"cross-classified"` in earlier versions of this package; that value still
   works as a deprecated alias.

2. The contextual cross-classified MAIHDA (`context =`): individuals are
   cross-classified by their intersectional stratum and a higher-level
   place or institution, patients within strata and hospitals, students within
   strata and schools. This is what the MAIHDA literature usually means by
   "cross-classified MAIHDA" (e.g. hospital differences in patient survival, or
   schools crossed with sociodemographic strata in student achievement). It
   partitions the unexplained variance into between-stratum vs. between-context
   vs. residual.

The two answer different questions, "how much of the intersectional inequality
is more than additive under this model?" vs. "how much of the unexplained
clustering is between strata rather than between shared contexts?", and they
compose: you can fit both structures in one model.

## Part 1, The crossed-dimensions decomposition

Intersectional MAIHDA asks how much of the unexplained variation in an outcome
lies between intersectional strata, and how much of that between-stratum
variation can be represented by additive main effects of the constituent
dimensions versus intersectional interaction remaining over and above those
additive parts.

The package offers two estimators for that split, selected with the `decomposition`
argument of `maihda()`:

- `"two-model"` (default), the classic approach. Fit a null model and an
  adjusted model that adds the dimensions' additive main effects as fixed
  effects, and use the PCV (the proportional change in between-stratum
  variance) as the additive-share summary. See
  `vignette("introduction", package = "MAIHDA")`.
- `"crossed-dimensions"`, a single model that enters each dimension's additive
  main effect as a random intercept:

  $$y_i = \beta_0 + \mathbf{x}_i\boldsymbol\beta
    + u^{(1)}_{d_1[i]} + \dots + u^{(K)}_{d_K[i]} + u^{(\text{stratum})}_{s[i]} + e_i.$$

  Under this random-effects parameterization, each dimension's random-effect
  variance is treated as that dimension's additive contribution, and the full
  intersection (`stratum`) random-effect variance is the residual interaction
  beyond additive. The additive and interaction shares are therefore
  model-implied shares of the between-strata variance from this one fit.

It is fit with ordinary multilevel software, `lme4` for a frequentist fit or `brms`
for a Bayesian one, so it does not require any special toolchain.

### Running a crossed-dimensions analysis

```{r fit}
library(MAIHDA)
data("maihda_health_data")

cc <- maihda(
  BMI ~ Age + (1 | Gender:Race:Education),
  data = maihda_health_data,
  decomposition = "crossed-dimensions"
)
cc
```

`maihda()` builds the crossed-dimensions model for you from the intersectional shorthand:
it reads the dimensions (`Gender`, `Race`, `Education`) from the grouping term, adds one
additive random intercept per dimension plus the intersection random intercept, and
fits the single model:

```{r formula}
cc$formula
```

The partition is on `cc$decomposition` (and printed above):

```{r decomposition}
cc$decomposition$additive_var        # sum of the dimension random-effect variances
cc$decomposition$interaction_var     # the intersection random-effect variance
cc$decomposition$additive_share      # additive share of the between-strata variance
cc$decomposition$interaction_share   # the complement: the interaction share
cc$decomposition$per_dim             # additive variance per dimension
```

`summary()` shows the full variance-components table (one row per dimension, the
interaction, and the residual) alongside the decomposition:

```{r summary}
summary(cc$model)
```

### Figures

The variance-partition and decomposition figures are aware of the crossed-dimensions
structure: the VPC plot shows one additive slice per dimension plus the interaction
and residual, and the deviation decomposition splits each stratum's deviation into its
additive (dimension random effects) and interaction (intersection random effect) parts.

```{r plots}
plot(cc, type = "vpc")            # per-dimension additive + interaction + residual
plot(cc, type = "effect_decomp")  # additive vs. interaction, per stratum
```

The ternary diagnostic needs the optional `ggtern` package, so it is only drawn
when that package is installed.

```{r plots-ternary, eval = requireNamespace("ggtern", quietly = TRUE), warning = FALSE, message = FALSE}
plot(cc, type = "ternary")        # additive / interaction / uncertainty per stratum
```

### Comparing across a higher-level group

Pass a `group` to decompose within each level of a higher-level variable, here, how
the additive-vs-interaction split differs across countries in the PISA data:

```{r groups, eval = FALSE}
data("maihda_country_data")
cc_grp <- maihda(
  math ~ 1 + (1 | gender:ses),
  data = maihda_country_data,
  group = "country",
  decomposition = "crossed-dimensions"
)
plot(cc_grp, type = "group_additive_share")  # additive share by country
plot(cc_grp, type = "group_components")       # additive / interaction / residual
```

### A Bayesian fit

Set `engine = "brms"` for a Bayesian crossed-dimensions fit; the additive and interaction
shares then carry posterior credible intervals (no bootstrap needed). This is the
recommended engine when dimensions have few levels (see the caveats below).

```{r brms, eval = FALSE}
cc_b <- maihda(
  BMI ~ Age + (1 | Gender:Race:Education),
  data = maihda_health_data,
  engine = "brms",
  decomposition = "crossed-dimensions"
)
cc_b$decomposition$additive_share_ci
```

### Two important caveats

1. The additive share is not the PCV. The two-model PCV and the crossed-dimensions
   additive share target the same broad question, how much of the between-strata
   variance is additive, but with different estimators (fixed main effects
   across two models vs. a single model's random main-effect variances, which are
   partially pooled). They may be directionally similar, but they need not be
   numerically close; do not treat one as a validation check for the other.

2. Few-level dimensions are poorly identified. A dimension's additive variance is
   estimated from its handful of levels. A binary dimension (e.g. a two-level sex
   variable) contributes a variance estimated from just two groups, so `lme4` may
   estimate one or more variance components at or near zero (a singular fit).
   This makes the additive and interaction shares unstable. Watch for the
   singular-fit note in the output, and consider `engine = "brms"` with explicit
   prior sensitivity checks when dimensions are few-levelled.

## Part 2, Contextual cross-classified MAIHDA (`context =`)

People do not only belong to intersectional strata; they are also clustered in
places and institutions, hospitals, schools, neighbourhoods, countries. When the
context matters for the outcome, a strata-only MAIHDA can conflate stratum and
context variation, especially when strata are unevenly distributed across
contexts. The contextual cross-classified MAIHDA fits both levels jointly,
crossed:

$$y_i = \beta_0 + \mathbf{x}_i\boldsymbol\beta
  + u^{(\text{stratum})}_{s[i]} + u^{(\text{context})}_{c[i]} + e_i,$$

and partitions the unexplained variance into

- between-stratum, the intersectional clustering net of the shared context
  (the headline VPC/ICC),
- between-context, the general contextual effect of the place/institution, and
- residual, within-stratum, within-context individual heterogeneity.

Pass the context column(s) via `context =`; everything else is unchanged:

```{r context-fit}
data("maihda_country_data")

ctx <- fit_maihda(
  math ~ 1 + (1 | gender:ses),
  data = maihda_country_data,
  context = "country"
)
summary(ctx)
```

The headline VPC/ICC is now the between-stratum share conditional on the country
random intercept: in these data it drops noticeably relative to the strata-only
fit, consistent with some country-level clustering or composition being separated
from the stratum component. The country share is the general contextual effect.

```{r context-compare}
# Strata-only fit for comparison: its VPC may partly reflect country clustering.
m0 <- fit_maihda(math ~ 1 + (1 | gender:ses), data = maihda_country_data)
summary(m0)$vpc$estimate          # strata-only VPC
s <- summary(ctx)
s$vpc$estimate                    # between-stratum share conditional on country
s$context$vpc_context_total      # the country (general contextual) share
```

### With the full `maihda()` workflow

`maihda(context = )` carries the context random intercept through both the null
and the adjusted model, so the PCV decomposition is computed with the context
partialled out:

```{r context-maihda}
a <- maihda(
  math ~ 1 + (1 | gender:ses),
  data = maihda_country_data,
  context = "country"
)
a
```

```{r context-plot}
plot(a, type = "vpc")          # stacked shares, with the context broken out
plot(a, type = "context_vpc")  # stratum vs. context variances side by side
```

`context` also composes with the crossed-dimensions decomposition
(`maihda(..., decomposition = "crossed-dimensions", context = "country")`): the
single fit then carries the dimension, intersection, and context random
intercepts, and the context variance enters the VPC denominator.

### `context =` vs. `group =`

Both bring in a higher-level variable, but they are different designs:

| | `group = "country"` | `context = "country"` |
|---|---|---|
| Models fitted | One independent model per country | One joint model, strata crossed with country |
| Question | Does intersectional inequality differ across countries? | How much unexplained variance is between strata vs. between countries? |
| Country effect | Handled by stratification; not estimated as a variance component | Estimated as a variance component |
| Strata | Same definitions, separate estimates per country | One set of stratum effects, pooled across countries |

Because they answer different questions, `maihda()` errors if you supply both.

### Caveats

- Few context levels identify the context variance weakly. The
  `maihda_country_data` example has only 6 countries, fine for illustration, but a
  6-level context variance is imprecise and `lme4` may fit it singular. The
  published contextual MAIHDA studies use dozens to hundreds of contexts (hospitals,
  schools). Prefer many-level contexts, or `engine = "brms"`, whose priors
  regularise the variance.
- The partition is descriptive, not causal. A large context share says outcomes
  cluster by place; it does not say place causes the outcome, nor does the
  stratum share identify discrimination. The usual MAIHDA interpretation caveats
  apply at both levels.
- A manually written `+ (1 | school)` in the formula fits the same model, but is
  summarised generically as "Other random effects". Only `context =` activates the
  labelled stratum-vs-context partition.