---
title: "Model-based continuous summary tables in R"
description: >
  Build model-based summary tables for continuous outcomes in R with
  table_continuous_lm(), including estimated means, robust standard
  errors, case weights, and APA-style output formats.
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Model-based continuous summary tables in R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

build_rich_tables <- identical(Sys.getenv("IN_PKGDOWN"), "true")

pkgdown_dark_gt <- function(tab) {
  tab |>
    gt::opt_css(
      css = paste(
        ".gt_table, .gt_heading, .gt_col_headings, .gt_col_heading,",
        ".gt_column_spanner_outer, .gt_column_spanner, .gt_title,",
        ".gt_subtitle, .gt_sourcenotes, .gt_sourcenote {",
        "  background-color: transparent !important;",
        "  color: currentColor !important;",
        "}",
        sep = "\n"
      )
    )
}
```

```{r setup}
library(spicy)
```

`table_continuous_lm()` is the model-based companion to
`table_continuous()`. It fits one linear model per selected continuous
outcome using `lm(outcome ~ by, ...)`, then returns a compact reporting
table. This makes it the better choice when you want to stay in a
linear-model workflow, add heteroskedasticity-consistent standard
errors, or apply case weights.

## Basic usage

Use `select` for one or more continuous outcomes and `by` for the
single predictor:

```{r basic}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi, life_sat_health),
  by = sex
)
```

For categorical predictors, the table reports estimated means by level.
When the predictor is dichotomous, it can also show a single mean
difference and confidence interval.

## Robust standard errors

Use `vcov = "HC*"` when you want heteroskedasticity-consistent standard
errors and tests:

```{r robust}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  vcov = "HC3",
  statistic = TRUE
)
```

The `HC*` family is computed via [sandwich::vcovHC()] and includes
`"HC0"`, `"HC1"`, `"HC2"`, `"HC3"` (default for small / moderate
samples), `"HC4"`, `"HC4m"`, and `"HC5"`.

## Cluster-robust standard errors

When observations are not independent (repeated measurements per
individual, students nested in classes, panel data), classical and
`HC*` standard errors are biased downward. Use the `CR*` family
together with `cluster = id_var` to get cluster-robust inference,
dispatched to `clubSandwich`:

```{r cluster, eval = requireNamespace("clubSandwich", quietly = TRUE)}
table_continuous_lm(
  sleep,
  select = extra,
  by = group,
  vcov = "CR2",
  cluster = ID,
  statistic = TRUE
)
```

`"CR2"` is the recommended default (Bell & McCaffrey 2002;
Pustejovsky & Tipton 2018). It produces fractional Satterthwaite
degrees of freedom, rendered in the displayed test header as e.g.
`t(8.7)` or `F(2, 12.4)`. `"CR1"` matches Stata's
`vce(cluster id)` default.

## Bootstrap and jackknife

For situations where the residual distribution is non-standard or
the sample is small, `vcov = "bootstrap"` and `vcov = "jackknife"`
provide resampling-based variance estimators in pure base R (no
dependency added):

```{r bootstrap, eval = FALSE}
table_continuous_lm(
  sochealth,
  select = wellbeing_score,
  by = sex,
  vcov = "bootstrap",
  boot_n = 1000  # default
)
```

When `cluster` is supplied, bootstrap switches to a cluster
bootstrap (Cameron, Gelbach & Miller 2008) and jackknife to
leave-one-cluster-out (Quenouille 1956). Both estimators use
asymptotic inference: `z` for single-coefficient contrasts and
`chi^2(q)` for the global Wald test on `k > 2` categorical
predictors, rendered in the displayed test header.

## Case weights

Use `weights` when you want weighted estimated means or slopes in the
same model-based table:

```{r weights}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = education,
  weights = weight,
  show_weighted_n = TRUE
)
```

This is often the most natural summary-table function when your
reporting workflow already relies on weighted linear models.

## Numeric predictors

If `by` is numeric, `table_continuous_lm()` reports the slope rather
than group means:

```{r numeric-by}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = age,
  vcov = "HC3"
)
```

When you need the underlying returned data for further processing, use
`output = "data.frame"` for the wide raw table or `output = "long"` for
the analytic long table.

## Effect sizes

`table_continuous_lm()` supports four effect sizes via the
`effect_size` argument. All are derived from the **same fitted model**
as the displayed coefficients and `R²`, so the table stays internally
consistent — and all are **invariant to `vcov`** (the choice of
classical or `HC*` standard errors changes the contrast SE, CI, and
test statistic but not the standardized effect size itself).

Cohen's *d* and Hedges' *g* are the conventions for two-group
comparisons (Cohen 1988; Hedges and Olkin 1985; APA 2020) and require
`by` to have exactly two non-empty levels:

```{r es-d}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = smoking,
  effect_size = "d"
)
```

Hedges' *g* applies the small-sample correction
`J = 1 - 3/(4·df_resid - 1)`. It is generally preferred over raw *d*
in published reports (Goulet-Pelletier and Cousineau 2018):

```{r es-g}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = smoking,
  effect_size = "g"
)
```

For categorical predictors with three or more levels (or numeric
predictors), use Hays' `omega²` for a less biased estimate of the
population variance explained (Hays 1963; Olejnik and Algina 2003;
Lakens 2013):

```{r es-omega2}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = education,
  effect_size = "omega2"
)
```

Cohen's `f²` (= `R² / (1 - R²)`) is the effect size familiar from
power-analysis frameworks (e.g. G*Power) and is defined for any
predictor type:

```{r es-f2}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = age,
  effect_size = "f2"
)
```

## Confidence intervals for effect sizes

Setting `effect_size_ci = TRUE` adds a confidence interval for the
effect size. The implementation uses the modern noncentral-distribution
inversion approach — the consensus standard in commercial statistical
software (Stata `esize` / `estat esize`, SAS `PROC TTEST` and
`PROC GLM EFFECTSIZE`) and in mainstream R packages (`effectsize`,
`MOTE`, `TOSTER`, `effsize`):

- Noncentral *t* inversion for `"d"` and `"g"` (Steiger and Fouladi
  1997; Goulet-Pelletier and Cousineau 2018; Cousineau and
  Goulet-Pelletier 2021), which has nominal coverage across sample
  sizes — unlike the older Hedges-Olkin normal approximation that is
  biased for small samples.
- Noncentral *F* inversion for `"omega2"` and `"f2"`
  (Steiger 2004; Smithson 2003).

In the printed and rendered outputs, the effect size is displayed with
the bracketed CI (article-ready):

```{r es-ci}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = smoking,
  effect_size = "g",
  effect_size_ci = TRUE
)
```

The CI level follows `ci_level` (default `0.95`). For programmatic
access, the wide raw output exposes separate numeric columns, and the
long output always includes them:

```{r es-ci-raw}
table_continuous_lm(
  sochealth,
  select = wellbeing_score,
  by = smoking,
  effect_size = "g",
  effect_size_ci = TRUE,
  output = "data.frame"
)
```

```{r es-ci-long}
out <- table_continuous_lm(
  sochealth,
  select = wellbeing_score,
  by = smoking,
  effect_size = "g",
  effect_size_ci = TRUE,
  output = "long"
)
out[, c("variable", "es_type", "es_value", "es_ci_lower", "es_ci_upper")]
```

When `weights` is supplied, all four effect sizes (and their CIs) are
computed from the weighted least-squares fit, keeping them consistent
with the weighted contrast and its CI (DuMouchel and Duncan 1983).
This is the right convention for case-weighted reporting; for
propensity-score balance assessment (Austin and Stuart 2015) or
complex-survey designs, dedicated packages (`cobalt::bal.tab()` and
`survey`) are more appropriate.

## Article-style polish

Pretty outcome labels and a comma decimal separator (useful for
European reporting):

```{r polish}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  labels = c(
    wellbeing_score = "WHO-5 wellbeing (0-100)",
    bmi = "Body-mass index (kg/m²)"
  ),
  effect_size = "g",
  effect_size_ci = TRUE,
  decimal_mark = ","
)
```

## Publication-ready output

The function supports the same output formats as the other summary-table
helpers, including `tinytable`, `gt`, `flextable`, `excel`, `word`, and
`clipboard`.

```{r gt-output, eval = build_rich_tables}
pkgdown_dark_gt(
  table_continuous_lm(
    sochealth,
    select = c(wellbeing_score, bmi, life_sat_health),
    by = sex,
    vcov = "HC3",
    statistic = TRUE,
    output = "gt"
  )
)
```

## Tidying for downstream pipelines

`table_continuous_lm()` returns an object that can be coerced to a
plain `data.frame` / `tbl_df` (stripping the spicy formatting
attributes) or piped into `broom::tidy()` / `broom::glance()` for
use with `gtsummary`, `modelsummary`, `parameters`, or any other
tidyverse-stats workflow:

```{r tidy-glance}
out <- table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  effect_size = "g",
  effect_size_ci = TRUE
)

# One row per estimated parameter: emmean per level, contrast for
# binary predictors, slope for numeric predictors.
broom::tidy(out)

# One row per outcome with model-level statistics: r.squared,
# adj.r.squared, F / t, df, p.value, nobs, weighted_n, plus the
# effect-size summary.
broom::glance(out)
```

## See also

- See `vignette("table-continuous", package = "spicy")` for descriptive
  continuous summary tables with classical group-comparison tests.
- See `vignette("summary-tables-reporting", package = "spicy")` for a
  cross-function reporting workflow using the summary-table helpers.
