This vignette describes the joint FPCA option that
is available across the refundBayes regression
functions:
- sofr_bayes() – Bayesian Scalar-on-Function Regression
- fcox_bayes() – Bayesian Functional Cox Regression
- fofr_bayes() – Bayesian Function-on-Function Regression

In every case, the option is exposed via the same
joint_FPCA argument: a logical (TRUE /
FALSE) vector of length equal to the number of functional
predictors. When the \(i\)-th entry is
TRUE, the observed functional predictor \(W_i(s)\) is not used directly as a
covariate; instead, the regression model is built on top of a
functional principal component analysis (FPCA)
sub-model for \(W_i(s)\), and
the FPC scores are sampled jointly with the regression coefficients.
The joint-FPCA option replaces each functional predictor \(W_i(s)\) with a low-rank FPCA
representation \(\sum_{j=1}^{J}\xi_{ij}\,\phi_j(s)\), where the
eigenfunctions \(\phi_j\) are fixed at their
refund::fpca.sc() estimates and the subject-specific scores \(\xi_{ij}\) are centered on their
frequentist FPCA estimates. The scores are then sampled jointly with the
regression coefficient \(\beta(\cdot)\). This propagates predictor
measurement-error uncertainty into the posterior of \(\beta\) and corrects the
errors-in-variables attenuation that arises when the noisy \(W_i\) is plugged into the regression as if
it were observed exactly.
The methodology mirrors Section 4 of Jiang, Crainiceanu, and
Cui (2025), Tutorial on Bayesian Functional Regression Using
Stan, Statistics in Medicine 44(20–22), e70265.
The default is joint_FPCA = NULL, which does not
adopt the joint model for any functional predictor.
In standard scalar-on-function or functional Cox regression, the integral \(\int W_i(s)\beta(s)\,ds\) is plugged in as if \(W_i(s)\) were observed without error. In practice, the curves are sampled at finitely many points and are typically corrupted by measurement noise. Treating \(W_i(s)\) as exact ignores this uncertainty and biases the regression coefficient \(\beta(\cdot)\) toward zero (the classical errors-in-variables attenuation), and underestimates the posterior credible-interval width.
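The attenuation effect can be seen in a scalar toy example (base R only; a sketch for intuition, not part of refundBayes): regressing on a noisy version of a covariate shrinks the estimated slope toward zero by the classical factor \(\mathrm{Var}(D)/\{\mathrm{Var}(D)+\mathrm{Var}(\epsilon)\}\).

```r
## Toy errors-in-variables attenuation (scalar analogue, base R only).
set.seed(1)
n <- 5000
d <- rnorm(n)                  # latent "true" covariate D
w <- d + rnorm(n, sd = 1)      # observed noisy version W
y <- 2 * d + rnorm(n, sd = 0.5)
slope_true  <- unname(coef(lm(y ~ d))[2])  # close to the true slope 2
slope_noisy <- unname(coef(lm(y ~ w))[2])  # attenuated toward 2 * 1/(1 + 1) = 1
```

The noisy-covariate slope is roughly half the true slope here, which is exactly the bias the joint FPCA model is designed to remove.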
The joint FPCA approach replaces \(W_i(s)\) by a low-rank FPCA representation
\[W_i(s) \;=\; \mu(s) \;+\; \sum_{j=1}^{J} \xi_{ij}\,\phi_j(s) \;+\; \epsilon_i(s),\]
and treats the subject-specific scores \(\xi_{ij}\) as parameters
that are sampled together with the regression coefficients. The FPC
eigenfunctions \(\phi_j(s)\) are
estimated once with refund::fpca.sc() and held fixed, and
the initial frequentist FPCA scores \(\hat\xi_{ij}\) are used as the
prior mean for \(\xi_{ij}\). This propagates the FPCA
uncertainty into the posterior of \(\beta(\cdot)\), so the regression credible
bands are correctly inflated.
Throughout this section we assume one functional predictor for clarity; the multi-predictor case is identical with one set of FPCA quantities per functional predictor. The model has two layers that share the FPC scores \(\xi_{ij}\):
For each subject \(i = 1, \ldots, n\) and each grid point \(s_m\), \(m = 1, \ldots, M\),
\[W_i(s_m) \;=\; \sum_{j=1}^{J} \xi_{ij}\,\phi_j(s_m) \;+\; \epsilon_i(s_m), \qquad \epsilon_i(s_m) \stackrel{\text{iid}}{\sim} N\!\left(0, \sigma_e^2\right),\]
so that
\[\log p(\mathbf{W} \mid \boldsymbol\xi, \boldsymbol\Phi, \sigma_e) \;=\; -\, n\,M\,\log\sigma_e \;-\; \frac{1}{2\sigma_e^2}\sum_{i=1}^{n}\sum_{m=1}^{M}\!\left\{W_i(s_m) - \sum_{j=1}^{J}\xi_{ij}\,\phi_j(s_m)\right\}^2.\]
In Stan code, this corresponds to
target += - N_num * M_num_i * log(sigma_e_i)
          - sum((xi_i * Phi_mat_i - M_mat_i)^2) / (2 * sigma_e_i^2);

Here M_mat_i is the observed \(n \times M\) matrix of \(W_i(s_m)\) values, Phi_mat_i
is the \(J \times M\) matrix of fixed
eigenfunctions, and xi_i is the \(n \times J\) matrix of FPC score
parameters.
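As a sanity check, the matrix expression agrees with the double sum in the log-likelihood above. A plain-R sketch with toy dimensions (note that Stan's * between matrices is matrix multiplication, which corresponds to %*% in R):

```r
## Check (toy dimensions) that the matrix form of the FPCA kernel
## matches the double sum over subjects i and grid points m.
set.seed(2)
n <- 5; M <- 8; J <- 2
Phi <- matrix(rnorm(J * M), J, M)   # J x M fixed eigenfunctions
xi  <- matrix(rnorm(n * J), n, J)   # n x J FPC scores
W   <- xi %*% Phi + matrix(rnorm(n * M, sd = 0.1), n, M)
sigma_e <- 0.1
kernel_mat <- -n * M * log(sigma_e) - sum((xi %*% Phi - W)^2) / (2 * sigma_e^2)
kernel_sum <- -n * M * log(sigma_e) - sum(sapply(1:n, function(i)
  sum((W[i, ] - colSums(xi[i, ] * Phi))^2))) / (2 * sigma_e^2)
```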
Each FPC score is given an independent Gaussian prior centered on the initial frequentist FPCA score with eigenvalue-determined scale,
\[\xi_{ij} \;\sim\; N\!\left(\hat\xi_{ij},\,\lambda_j^{2}\right), \qquad i = 1,\ldots,n,\;\; j = 1,\ldots,J.\]
In Stan code, the kernel is implemented as
for (nj in 1:J_num_i) {
target += - N_num * log(lambda_i[nj])
- sum((xi_i[, nj] - xi_hat_i[, nj])^2) / (2 * lambda_i[nj]^2);
target += inv_gamma_lpdf(lambda_i[nj]^2 | 0.001, 0.001);
}
target += inv_gamma_lpdf(sigma_e_i^2 | 0.001, 0.001);

The eigenvalue scales \(\lambda_j^2\) and the residual scale \(\sigma_e^2\) both receive weakly
informative inverse-Gamma priors. The prior mean \(\hat\xi_{ij}\) is computed once via
refund::fpca.sc() on the observed functional data.
Because \(W_i(s) = \sum_j
\xi_{ij}\,\phi_j(s)\) in the FPCA representation, the functional
integral that drives every regression model in refundBayes
becomes
\[\int_{\mathcal{S}} W_i(s)\,\beta(s)\,ds \;=\; \sum_{j=1}^{J}\xi_{ij}\,\underbrace{\int_{\mathcal{S}} \phi_j(s)\,\beta(s)\,ds}_{\,\equiv\,\alpha_j}.\]
Expanding \(\beta(s)\) in the spline basis \(\beta(s) = \sum_{k=1}^{K} b_k\,\psi_k(s)\), we have
\[\alpha_j \;=\; \sum_{k=1}^{K} b_k \int_{\mathcal{S}} \phi_j(s)\,\psi_k(s)\,ds \;\equiv\; \sum_{k=1}^{K} \mathbf{X}_{jk}\,b_k,\]
where the \(J \times K\) FPCA-spline cross-product matrix \(\mathbf{X}\) is approximated by the Riemann sum
\[\mathbf{X}_{jk} \;\approx\; \sum_{m=1}^{M} L_m\,\phi_j(s_m)\,\psi_k(s_m)\]
using the same integration weights \(L_m\) as in the standard regression formulation. After the spectral reparameterization of \(\mathbf{X}\) via \(\mathbf{S} = \int \boldsymbol\psi''(s)\boldsymbol\psi''(s)^t\,ds\) (the penalty matrix), the design matrix \(\mathbf{X}\) is split into a penalized random-effect block \(\tilde{\mathbf{X}}_r\) (size \(K_r \times J\)) and an unpenalized fixed-effect block \(\tilde{\mathbf{X}}_f\) (size \(K_f \times J\)). The linear predictor for SoFR / FCox is then
\[\eta_i \;=\; \eta_0 \;+\; \mathbf{Z}_i^{t}\boldsymbol\gamma \;+\; \boldsymbol\xi_i^{t}\!\left(\tilde{\mathbf{X}}_r^{t}\,\tilde{\mathbf{b}}_r \;+\; \tilde{\mathbf{X}}_f^{t}\,\tilde{\mathbf{b}}_f\right),\]
with \(\tilde{\mathbf{b}}_r \sim N(\mathbf{0},\sigma_b^2\mathbf{I})\) and \(\tilde{\mathbf{b}}_f\) assigned non-informative priors, exactly the same prior structure used for the regression coefficients in the no-joint-FPCA case.
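The cross-product matrix \(\mathbf{X}\) can be formed in one matrix product. A small base-R sketch with toy bases (illustrative shapes, not the package's internals):

```r
## Riemann-sum cross-product X_{jk} = sum_m L_m phi_j(s_m) psi_k(s_m).
M <- 50; J <- 3; K <- 4
s   <- seq(0, 1, length.out = M)
Lm  <- rep(1 / M, M)                                  # integration weights L_m
Phi <- t(sapply(1:J, function(j) sin(j * pi * s)))    # J x M (toy eigenfunctions)
Psi <- t(sapply(1:K, function(k) s^(k - 1)))          # K x M (toy spline basis)
X   <- Phi %*% (Lm * t(Psi))                          # J x K cross-product matrix
```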
For the function-on-function regression model the integration over \(s\) is combined with a response-domain expansion in the basis \(\boldsymbol\psi(t)\) of size \(K_{\text{resp}}\), so that the bivariate coefficient \(\beta(s,t) = \sum_{k=1}^{K_{\text{resp}}}\beta_{k}(s)\,\psi_k(t)\) leads to
\[\int W_i(s)\,\beta(s,t)\,ds \;=\; \boldsymbol\xi_i^{t}\!\left(\tilde{\mathbf{X}}_r^{t}\,\mathbf{B}_r \;+\; \tilde{\mathbf{X}}_f^{t}\,\mathbf{B}_f\right)\boldsymbol\psi(t),\]
where now \(\mathbf{B}_r\) is a \(K_r \times K_{\text{resp}}\) matrix and \(\mathbf{B}_f\) is a \(K_f \times K_{\text{resp}}\) matrix of bivariate coefficients.
This is the only structural difference relative to the joint-FPCA SoFR / FCox model: the spline coefficients become matrices indexed by both the predictor-domain and the response-domain bases, and the same row-wise response-domain penalty as in the standard FoFR is applied.
The complete joint model (one functional predictor) reads
\[\begin{cases} \textbf{Outcome regression}\colon\quad \eta_i = \eta_0 + \mathbf{Z}_i^{t}\boldsymbol\gamma + \boldsymbol\xi_i^{t}\!\left(\tilde{\mathbf{X}}_r^{t}\,\tilde{\mathbf{b}}_r + \tilde{\mathbf{X}}_f^{t}\,\tilde{\mathbf{b}}_f\right) \\[6pt] \textbf{FPCA likelihood}\colon\quad W_i(s_m) \mid \boldsymbol\xi_i, \boldsymbol\Phi, \sigma_e \sim N\!\left(\sum_j \xi_{ij}\phi_j(s_m),\,\sigma_e^2\right) \\[4pt] \textbf{FPC score prior}\colon\quad \xi_{ij} \sim N(\hat\xi_{ij},\,\lambda_j^{2}) \\[4pt] \textbf{Hyperpriors}\colon\quad \tilde{\mathbf{b}}_r \sim N(\mathbf{0},\sigma_b^2\mathbf{I}),\;\; \sigma_b^2,\,\sigma_e^2,\,\lambda_j^{2} \sim \text{Inv-Gamma}(0.001, 0.001) \\[4pt] \textbf{Outcome distribution}\colon\quad Y_i \sim p(Y_i \mid \eta_i)\quad\text{(Gaussian / binomial / Cox / functional)} \end{cases}\]
The four building blocks (outcome regression, FPCA likelihood, score
prior, hyperpriors) are independent of the family; only the
outcome distribution in the last line changes. This is precisely why the
joint FPCA option is available across SoFR, FCox, and FoFR with the same
joint_FPCA argument.
The prior is centered on the initial frequentist FPCA score \(\hat\xi_{ij}\) returned by
refund::fpca.sc(), not on \(0\). This corresponds to using the standard
FPCA fit as a working informative prior. As \(\lambda_j^{2}\) becomes large the prior
reverts to a diffuse Gaussian and the joint model reduces (in spirit) to
plugging in the sample FPC scores; for moderate \(\lambda_j^{2}\) the posterior trades off
the FPCA fit and the regression likelihood. The observed-data matrix
\(W_i(s_m)\) is passed to Stan in its
original (uncentered) form, exactly as in Section 4 of the Tutorial
supplement.
The joint FPCA design matrix \(\mathbf{X}_{jk}\) is built with the same Riemann-sum integration weights as the no-joint-FPCA design matrix. Specifically, in a term of the form s(tindex, by = lmat * wmat, ...):

- the rightmost variable in the by product (here wmat) is treated as the observed functional data \(W_i(\cdot)\);
- the left variable (here lmat) is treated as the Riemann-sum integration weight matrix.

This matches the Tutorial supplementary code and means that you do
not need to change your formulas when turning joint FPCA on or off; only
the joint_FPCA argument changes.
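For a possibly non-uniform grid, a trapezoid-rule lmat can be built as follows (a sketch; the examples below use a constant-width grid, for which every weight is simply the grid spacing):

```r
## Trapezoid-rule integration weight matrix lmat (n rows, M columns).
n <- 4
sgrid <- c(0, 0.1, 0.4, 0.7, 1)                      # M = 5 grid points
M <- length(sgrid)
h <- diff(sgrid)                                     # M - 1 interval widths
w <- c(h[1] / 2, (h[-1] + h[-(M - 1)]) / 2, h[M - 1] / 2)
lmat <- matrix(w, nrow = n, ncol = M, byrow = TRUE)  # same weights every row
```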
Every regression function in refundBayes accepts the
same joint_FPCA argument.
The default is joint_FPCA = NULL, which is treated as
rep(FALSE, n_func) and gives exactly the no-joint-FPCA
behavior of the respective regression vignettes. Below we walk through
one example for each of sofr_bayes(),
fcox_bayes(), and fofr_bayes().
sofr_bayes()

We reuse the simulation setup from the SoFR vignette and switch on joint FPCA for the (single) functional predictor.
set.seed(123)
n <- 100
M <- 50
tgrid <- seq(0, 1, length.out = M)
dt <- tgrid[2] - tgrid[1]
tmat <- matrix(rep(tgrid, each = n), nrow = n)
lmat <- matrix(dt, nrow = n, ncol = M)
# Smooth latent trajectory + measurement noise on the functional predictor
D_true <- t(apply(matrix(rnorm(n * M), n, M), 1, cumsum)) / sqrt(M)
wmat <- D_true + matrix(rnorm(n * M, sd = 0.3), nrow = n) ## noisy
beta_true <- sin(2 * pi * tgrid)
X1 <- rnorm(n)
eta <- 0.5 * X1 + D_true %*% (beta_true * dt) ## regression on D, not W
prob <- plogis(eta)
y <- rbinom(n, 1, prob)
data.SoFR <- data.frame(y = y, X1 = X1)
data.SoFR$tmat <- tmat
data.SoFR$lmat <- lmat
data.SoFR$wmat <- wmat

Notice that the outcome is generated from the latent trajectory \(D_i(s)\) but only the noisy \(W_i(s) = D_i(s) + \epsilon_i(s)\) is observed. This is exactly the setting that motivates joint FPCA.
library(refundBayes)
fit_sofr_joint <- sofr_bayes(
formula = y ~ X1 + s(tmat, by = lmat * wmat, bs = "cc", k = 10),
data = data.SoFR,
family = binomial(),
joint_FPCA = c(TRUE), ## << turn joint FPCA on for the (only) functional term
niter = 1500,
nwarmup = 500,
nchain = 3,
ncores = 3
)

For comparison, the no-joint-FPCA fit uses the same
call without the joint_FPCA argument:
fit_sofr_plain <- sofr_bayes(
formula = y ~ X1 + s(tmat, by = lmat * wmat, bs = "cc", k = 10),
data = data.SoFR,
family = binomial(),
niter = 1500, nwarmup = 500, nchain = 3, ncores = 3
)

The two posterior estimates of \(\beta(s)\) can then be plotted together.
Joint-FPCA credible bands are typically wider in the regions where the measurement noise on the functional predictor is informative.
When joint_FPCA = TRUE for the \(i\)-th functional predictor, the Stan fit
additionally exposes:
| Stan parameter | Meaning |
|---|---|
| xi_i | \(n \times J\) matrix of joint FPC scores |
| lambda_i | \(J\)-vector of FPC eigenvalue SDs |
| sigma_e_i | scalar, FPC residual SD |
These can be inspected with rstan::extract() for further
analysis (for example, plotting the posterior of the FPC scores or the
eigenvalue SDs).
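For example, posterior summaries of the joint scores can be computed from the draws array. A sketch assuming the (iterations × n × J) shape that rstan::extract() returns for such a parameter (the array is faked here so the sketch is self-contained):

```r
## Posterior mean / SD of the joint FPC scores from a draws array.
set.seed(3)
draws <- array(rnorm(200 * 10 * 3), dim = c(200, 10, 3))  # fake (iter x n x J)
xi_post_mean <- apply(draws, c(2, 3), mean)               # n x J posterior means
xi_post_sd   <- apply(draws, c(2, 3), sd)                 # n x J posterior SDs
```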
fcox_bayes()

The functional Cox setup is identical to the SoFR setup, with one
extra piece (the censoring vector). The joint FPCA model is wired in by
setting joint_FPCA = c(TRUE) on a model that has one
functional predictor:
library(refundBayes)
## Use the example dataset shipped with the FCox vignette
Func_Cox_Data <- readRDS("data/example_data_Cox.rds")
Func_Cox_Data$wmat <- Func_Cox_Data$MIMS
Func_Cox_Data$cens <- 1 - Func_Cox_Data$event
fit_cox_joint <- fcox_bayes(
formula = survtime ~ X1 + s(tmat, by = lmat * wmat, bs = "cc", k = 10),
data = Func_Cox_Data,
cens = Func_Cox_Data$cens,
joint_FPCA = c(TRUE),
niter = 5000,
nwarmup = 2000,
nchain = 1,
ncores = 1
)

The Stan code generated under the hood matches Section 4 of the Tutorial supplement: the linear predictor
\[\eta_i \;=\; \mathbf{Z}_i^{t}\boldsymbol\gamma + \boldsymbol\xi_i^{t}\!\left(\tilde{\mathbf{X}}_r^{t}\tilde{\mathbf{b}}_r + \tilde{\mathbf{X}}_f^{t}\tilde{\mathbf{b}}_f\right)\]
is plugged into the Cox log-likelihood
\[\ell(\mathbf{Y},\boldsymbol\delta) \;=\; \sum_{i=1}^{n}\left[(1-\delta_i)\!\left\{\log h_0(Y_i) + \eta_i - H_0(Y_i)e^{\eta_i}\right\} + \delta_i\!\left\{-H_0(Y_i)e^{\eta_i}\right\}\right],\]
with the same M-spline / I-spline baseline-hazard construction as in the no-joint case. The FPCA likelihood and the FPC-score prior are appended to the joint Stan target function exactly as described in The Joint Bayesian Model We Adopted above.
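The log-likelihood above translates directly into a small R function (a check sketch with toy baseline-hazard values; in refundBayes the baseline hazard comes from the M-spline / I-spline construction). Recall that here \(\delta_i = 1\) indicates censoring, matching the cens vector:

```r
## Cox log-likelihood with delta = 1 indicating censoring (as in `cens`).
cox_loglik <- function(h0, H0, eta, delta) {
  sum((1 - delta) * (log(h0) + eta - H0 * exp(eta)) +
      delta * (-H0 * exp(eta)))
}
```

For a single subject with h0 = H0 = 1 and eta = 0, both the uncensored and censored contributions equal -1.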
fcox_code <- fcox_bayes(
formula = survtime ~ X1 + s(tmat, by = lmat * wmat, bs = "cc", k = 10),
data = Func_Cox_Data,
cens = Func_Cox_Data$cens,
joint_FPCA = c(TRUE),
runStan = FALSE
)
cat(fcox_code$stancode)

You will see the FPCA-related declarations in the data
block (Phi_mat_1, xi_hat_1,
M_mat_1, J_num_1, M_num_1,
X_mat_r_1, X_mat_f_1), the FPCA-related
parameters in the parameters block (xi_1,
lambda_1, sigma_e_1), and the FPCA
log-likelihood + score prior contributions in the model
block.
fofr_bayes()

For function-on-function regression, joint FPCA on the functional predictor is constructed in exactly the same way — only the contribution to \(\boldsymbol\mu_i(t)\) now carries the response-domain basis \(\boldsymbol\Psi\):
\[\int W_i(s)\,\beta(s,t)\,ds \;=\; \boldsymbol\xi_i^{t}\!\left(\tilde{\mathbf{X}}_r^{t}\,\mathbf{B}_r + \tilde{\mathbf{X}}_f^{t}\,\mathbf{B}_f\right)\boldsymbol\psi(t),\]
with \(\mathbf{B}_r,\mathbf{B}_f\) as matrices of size \(K_r \times K_{\text{resp}}\) and \(K_f \times K_{\text{resp}}\) respectively. Both directions of smoothness (\(s\) via the random-effect reparameterization and \(t\) via the response-domain penalty) are imposed exactly as in the no-joint FoFR model.
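The dimensions of this term can be sanity-checked with a toy sketch (the sizes below are illustrative, not the package defaults):

```r
## Shape check for xi_i' (Xr' Br + Xf' Bf) psi(t) evaluated on a t-grid.
set.seed(4)
J <- 3; Kr <- 8; Kf <- 2; Kresp <- 6; Mt <- 20
xi_i  <- rnorm(J)                                # J-vector of FPC scores
Xr    <- matrix(rnorm(Kr * J), Kr, J)            # K_r x J penalized block
Xf    <- matrix(rnorm(Kf * J), Kf, J)            # K_f x J unpenalized block
Br    <- matrix(rnorm(Kr * Kresp), Kr, Kresp)    # K_r x K_resp coefficients
Bf    <- matrix(rnorm(Kf * Kresp), Kf, Kresp)    # K_f x K_resp coefficients
Psi_t <- matrix(rnorm(Kresp * Mt), Kresp, Mt)    # basis psi(t) on a t-grid
contrib <- drop(t(xi_i) %*% (t(Xr) %*% Br + t(Xf) %*% Bf) %*% Psi_t)
```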
library(refundBayes)
set.seed(42)
n <- 200
L <- 30
M <- 30
sindex <- seq(0, 1, length.out = L)
tindex <- seq(0, 1, length.out = M)
# Smooth latent functional predictor + measurement noise
D_true <- matrix(0, nrow = n, ncol = L)
for (i in 1:n) {
D_true[i, ] <- rnorm(1) * sin(2 * pi * sindex) +
rnorm(1) * cos(2 * pi * sindex) +
rnorm(1) * sin(4 * pi * sindex)
}
X_func <- D_true + matrix(rnorm(n * L, sd = 0.3), nrow = n) ## noisy
age <- rnorm(n)
# True coefficient functions
beta_true <- outer(sin(2 * pi * sindex), cos(2 * pi * tindex))
alpha_true <- 0.5 * sin(pi * tindex)
# Generate response from the latent D_true (not from X_func!)
signal_scalar <- outer(age, alpha_true)
signal_func <- (D_true %*% beta_true) / L
epsilon <- matrix(rnorm(n * M, sd = 0.3), nrow = n)
Y_mat <- signal_scalar + signal_func + epsilon
dat <- data.frame(age = age)
dat$Y_mat <- Y_mat
dat$X_func <- X_func
dat$sindex <- matrix(rep(sindex, n), nrow = n, byrow = TRUE)

fit_fofr_joint <- fofr_bayes(
formula = Y_mat ~ age + s(sindex, by = X_func, bs = "cr", k = 10),
data = dat,
joint_FPCA = c(TRUE),
spline_type = "bs",
spline_df = 10,
niter = 2000,
nwarmup = 1000,
nchain = 3,
ncores = 3
)The joint FPCA contributes the additional Stan parameters
xi_1 (\(n \times J\)),
lambda_1 (\(J\)-vector),
and sigma_e_1 (scalar), and adds the FPCA log-likelihood
and prior to the model block; the bivariate coefficient \(\hat\beta(s,t)\) is reconstructed from
\(\mathbf{B}_r, \mathbf{B}_f\) exactly
as in the no-joint FoFR case.
beta_est <- apply(fit_fofr_joint$bivar_func_coef[[1]], c(2, 3), mean)
par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
image(sindex, tindex, beta_true,
xlab = "s (predictor domain)", ylab = "t (response domain)",
main = expression("True " * beta(s,t)),
col = hcl.colors(64, "Blue-Red 3"))
image(sindex, tindex, beta_est,
xlab = "s (predictor domain)", ylab = "t (response domain)",
main = expression("Joint-FPCA " * hat(beta)(s,t)),
col = hcl.colors(64, "Blue-Red 3"))

When to use joint FPCA. Joint FPCA is most useful when the observed functional predictor \(W_i(s)\) contains substantial measurement noise relative to the signal in the regression, or when the curves are sparse / irregularly observed. For dense, low-noise functional data the difference is small.
Number of FPC components \(J\). Controlled by the default
variance-explained criterion in refund::fpca.sc(). A value
such that cumulative variance explained exceeds 95 % is a reasonable
default. A larger \(J\) increases the
number of joint FPC score parameters (\(n
\times J\)) and may slow down sampling.
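The variance-explained rule can be sketched with toy eigenvalues (refund::fpca.sc() applies this criterion via its pve argument):

```r
## Smallest J whose cumulative proportion of variance explained is >= 95%.
evals <- c(4, 2, 1, 0.5, 0.25, 0.1)     # toy FPCA eigenvalues
pve <- cumsum(evals) / sum(evals)       # cumulative proportion explained
J <- which(pve >= 0.95)[1]              # here the first four components suffice
```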
Convergence. Joint FPCA introduces \(n \times J\) extra latent parameters.
Multiple chains and the standard rstan diagnostics
(traceplots, \(\hat R\), bulk / tail
ESS) are recommended.
Multiple functional predictors. The argument is
a vector with one entry per s(...) functional term. Setting
some entries to TRUE and others to FALSE is
supported: each functional predictor is treated independently.
Prior strength. The default
Inv-Gamma(0.001, 0.001) priors on \(\lambda_j^{2}\) and \(\sigma_e^{2}\) are weakly informative. If
the posterior of \(\xi_i\) is shrunk
too hard toward \(\hat\xi_i\), consider
loosening the prior by editing the Stan code (set
runStan = FALSE, modify the
inv_gamma_lpdf(...) lines, and pass the resulting program
to rstan::stan() directly).
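A minimal sketch of that edit, operating on a toy one-line program (the real stancode string returned with runStan = FALSE is much longer; the string-substitution pattern is what matters):

```r
## Replace the Inv-Gamma(0.001, 0.001) hyperparameters with Inv-Gamma(1, 1)
## in a (toy) generated Stan line before compiling the program yourself.
stan_line <- "target += inv_gamma_lpdf(sigma_e_1^2 | 0.001, 0.001);"
loosened  <- gsub("0.001, 0.001", "1, 1", stan_line, fixed = TRUE)
```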