---
title: "KF-PLS: Streaming PLS with Kalman-style updates"
shorttitle: "KF-PLS: Streaming PLS with Kalman-style updates"
author:
- name: "Frédéric Bertrand"
  affiliation:
  - Cedric, Cnam, Paris
  email: frederic.bertrand@lecnam.net
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{KF-PLS: Streaming PLS with Kalman-style updates}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup_ops, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/kf-pls-",
  fig.width = 7,
  fig.height = 5,
  dpi = 150,
  message = FALSE,
  warning = FALSE
)

LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE")
set.seed(2025)
```


```r
library(bigPLSR)
set.seed(1)
```

## Idea

We maintain exponentially weighted cross-products
\[
\mathbf{C}_{xx} \leftarrow \lambda\,\mathbf{C}_{xx} + \mathbf{X}_b^\top\mathbf{X}_b + q\,\mathbf{I},\qquad
\mathbf{C}_{xy} \leftarrow \lambda\,\mathbf{C}_{xy} + \mathbf{X}_b^\top\mathbf{Y}_b,
\]
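over mini-batches \(b\) of rows, where \(\mathbf{X}_b\) and \(\mathbf{Y}_b\) are the predictor and response rows of batch \(b\), \(0<\lambda\le 1\) is a forgetting factor, and \(q\ge 0\) is a small process-noise ridge.
At any time, latent components can be extracted by running **SIMPLS** on \((\mathbf{C}_{xx},\mathbf{C}_{xy})\).
Each update costs one pair of batch cross-products, the state has a fixed size (\(p\times p\) and \(p\times m\) for \(p\) predictors and \(m\) responses) no matter how many rows have been seen, and the ridge keeps \(\mathbf{C}_{xx}\) positive definite whenever \(q>0\). The forgetting factor gives a Kalman-style tracking of slowly varying covariance structure.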
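
The update rule and the extraction step can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code that bigPLSR runs: `kf_update()` and `simpls_crossprod()` are hypothetical helper names, the simulated data are arbitrary, and the SIMPLS loop is a textbook variant that works directly on the cross-products.

```r
# Illustrative base-R sketch; NOT the bigPLSR implementation.
# `kf_update()` and `simpls_crossprod()` are hypothetical helpers.

# One exponentially weighted update from a mini-batch (Xb, Yb).
kf_update <- function(state, Xb, Yb, lambda = 0.995, q = 1e-6) {
  p <- ncol(Xb)
  list(
    Cxx = lambda * state$Cxx + crossprod(Xb) + q * diag(p),
    Cxy = lambda * state$Cxy + crossprod(Xb, Yb)
  )
}

# SIMPLS driven only by the cross-products Cxx (p x p) and Cxy (p x m).
simpls_crossprod <- function(Cxx, Cxy, ncomp) {
  p <- nrow(Cxx); m <- ncol(Cxy)
  R <- matrix(0, p, ncomp)  # weights
  P <- matrix(0, p, ncomp)  # X loadings
  Q <- matrix(0, m, ncomp)  # Y loadings
  V <- matrix(0, p, ncomp)  # orthonormal basis spanning the X loadings
  S <- Cxy
  for (a in seq_len(ncomp)) {
    r  <- svd(S, nu = 1, nv = 0)$u[, 1]            # dominant left singular vector
    r  <- r / sqrt(drop(crossprod(r, Cxx %*% r)))  # scale so that r' Cxx r = 1
    pa <- drop(Cxx %*% r)
    qa <- drop(crossprod(Cxy, r))
    v  <- pa
    if (a > 1) {                                   # Gram-Schmidt against earlier v's
      Vp <- V[, seq_len(a - 1), drop = FALSE]
      v  <- v - drop(Vp %*% crossprod(Vp, v))
    }
    v <- v / sqrt(sum(v^2))
    S <- S - outer(v, drop(crossprod(S, v)))       # deflate the cross-product
    R[, a] <- r; P[, a] <- pa; Q[, a] <- qa; V[, a] <- v
  }
  list(R = R, P = P, Q = Q, B = R %*% t(Q))
}

set.seed(123)
n <- 400; p <- 4
X <- matrix(rnorm(n * p), n, p)
Y <- X %*% c(1, -2, 0.5, 0) + matrix(rnorm(n, sd = 0.1), n, 1)

# Stream the rows in mini-batches of 50, with no forgetting and no ridge.
state   <- list(Cxx = matrix(0, p, p), Cxy = matrix(0, p, 1))
batches <- split(seq_len(n), ceiling(seq_len(n) / 50))
for (idx in batches) {
  state <- kf_update(state, X[idx, , drop = FALSE], Y[idx, , drop = FALSE],
                     lambda = 1, q = 0)
}

# With lambda = 1 and q = 0 the stream reproduces the batch cross-products.
all.equal(state$Cxx, crossprod(X))
all.equal(state$Cxy, crossprod(X, Y))

fit <- simpls_crossprod(state$Cxx, state$Cxy, ncomp = 2)
# Implied scores T = X R are orthonormal, i.e. R' Cxx R is the identity.
round(crossprod(fit$R, state$Cxx %*% fit$R), 8)
```

Because the extraction step touches only \(\mathbf{C}_{xx}\) and \(\mathbf{C}_{xy}\), the raw rows can be discarded after each update. Only the scores require a second pass over \(\mathbf{X}\).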

## API

```r
# Tuning is done through options set before the call:
# options(bigPLSR.kf.lambda = 0.995,   # forgetting factor lambda
#         bigPLSR.kf.q_proc = 1e-6)    # process-noise ridge q
fit <- pls_fit(X, Y, ncomp = 3,
               backend   = "arma",     # or "bigmem"
               algorithm = "kf_pls",
               scores    = "r",
               tol       = 1e-8)
```
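
The call above leaves `X` and `Y` undefined. A minimal end-to-end sketch on simulated data might look as follows. The data-generating model is made up for illustration, the response is passed as a one-column matrix as a precaution rather than a documented requirement, and the fit is only inspected with `str()` because the structure of the returned object is not described here. The `requireNamespace()` guard simply skips the fit when bigPLSR is not installed.

```r
# Simulated regression data (arbitrary illustration, not a package data set).
set.seed(42)
n <- 500; p <- 20
X    <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1, 0.5, rep(0, p - 3))
Y    <- X %*% beta + matrix(rnorm(n, sd = 0.1), n, 1)

# Set the two tuning options named above and remember the previous values.
old <- options(bigPLSR.kf.lambda = 0.995,
               bigPLSR.kf.q_proc = 1e-6)

if (requireNamespace("bigPLSR", quietly = TRUE)) {
  fit <- bigPLSR::pls_fit(X, Y, ncomp = 3,
                          backend   = "arma",
                          algorithm = "kf_pls",
                          scores    = "r",
                          tol       = 1e-8)
  str(fit, max.level = 1)  # inspect whatever components the fit returns
}

options(old)  # restore the previous option values
```

Restoring the options afterwards keeps the tuning from leaking into later fits in the same session.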

With `backend = "bigmem"`, cross-products are streamed in row chunks, and the scores \( \mathbf{T} \) are computed by the package's chunked score kernel.

## Notes

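- With \(\lambda = 1\) and \(q = 0\), the updates reduce to plain sums of cross-products, so the fit recovers batch SIMPLS.
- Smaller \(\lambda\) emphasizes recent batches, which helps under concept drift: a batch seen \(k\) updates ago carries weight \(\lambda^k\), for an effective memory of roughly \(1/(1-\lambda)\) batches.
- \(q>0\) regularizes an ill-conditioned \( \mathbf{C}_{xx} \), which matters most on very high-dimensional data.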
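
To get a feel for the two tuning parameters, the quantities implied by the update rule can be computed directly. The sketch below uses the example values \(\lambda = 0.995\) and \(q = 10^{-6}\) from the API section and assumes the ridge \(q\,\mathbf{I}\) is added once per batch, exactly as in the update equation. The variable names are illustrative.

```r
lambda <- 0.995
q      <- 1e-6

# A batch seen k updates ago is weighted by lambda^k.
k <- 0:5000
w <- lambda^k

eff_batches <- 1 / (1 - lambda)          # limit of the geometric sum of weights
half_life   <- log(0.5) / log(lambda)    # updates until a batch's weight halves
ridge_ss    <- q / (1 - lambda)          # ridge accumulated on the diagonal in steady state

# Check the steady-state ridge by iterating the scalar recursion c <- lambda * c + q.
ridge_iter <- 0
for (i in seq_len(5000)) ridge_iter <- lambda * ridge_iter + q

c(effective_batches  = eff_batches,
  sum_of_weights     = sum(w),
  half_life          = half_life,
  steady_state_ridge = ridge_ss,
  iterated_ridge     = ridge_iter)
```

Under these assumptions the diagonal ridge seen by SIMPLS in steady state is \(q/(1-\lambda)\) rather than \(q\), which is worth keeping in mind when choosing the two values jointly.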