---
title: "Streaming Kernel PLS in bigPLSR: XX^T and Column-Chunked Variants"
shorttitle: "Streaming Kernel PLS in bigPLSR: XX^T and Column-Chunked Variants"
author:
- name: "Frédéric Bertrand"
  affiliation:
  - Cedric, Cnam, Paris
  email: frederic.bertrand@lecnam.net
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Streaming Kernel PLS in bigPLSR: XX^T and Column-Chunked Variants}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup_ops, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/kpls-streaming-",
  fig.width = 7,
  fig.height = 5,
  dpi = 150,
  message = FALSE,
  warning = FALSE
)

LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE")
set.seed(2025)
```


## Overview

This vignette documents bigPLSR's **kernel PLS** streaming backends for
`bigmemory::big.matrix` inputs. We provide two complementary streaming strategies:

- **Column-chunked Gram** (existing): accumulates the required products from
  per-column blocks, forming K = X X^T only implicitly.
- **Row-chunked XX^T** (new): computes a = X^T u by scanning rows in blocks, then
  emits t = X a. This yields efficient access patterns when n >> p or when the
  storage layout favors row-contiguous slices (e.g., file-backed subsets).

Both strategies produce the same model up to floating-point round-off. Selection is
automatic (see `?pls_fit`) and can be forced by setting the option
`bigPLSR.kpls_gram` to `"rows"`, `"cols"`, or `"auto"`.
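The option takes a single string. For example, to force the row-chunked variant for
subsequent fits and then restore the previous setting:

```r
# Force the row-chunked variant; options() returns the old value for restoring.
old <- options(bigPLSR.kpls_gram = "rows")
getOption("bigPLSR.kpls_gram")
#> [1] "rows"
options(old)  # restore the previous setting
```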

## Math sketch

Let X in R^{n x p}, Y in R^{n x m} be centered.

At component h, kernel PLS uses the following NIPALS-like fixed-point update:

1. Start with u in R^n (e.g., a column of Y).
2. Compute a = X^T u.  
3. Normalize w = a / ||a||_2.
4. Scores: t = X w.
5. Loadings:
   - p = (X^T t)/(t^T t),
   - q = (Y^T t)/(t^T t).
6. Deflate: X <- X - t p^T, Y <- Y - t q^T,
   and set u <- Y q.

Coefficients after H components are

beta = W (P^T W)^{-1} Q^T,

and the prediction for a new (uncentered) row x is

yhat = mu_Y + (x - mu_X) beta.
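The update above can be checked numerically in base R on a small dense matrix
(illustration only; bigPLSR streams these products rather than materializing X in
memory). A convenient sanity check: with H = p components on full-rank data, the
PLS coefficients coincide with ordinary least squares.

```r
set.seed(2025)
n <- 60; p <- 5
X0 <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)  # centered predictors
y0 <- scale(matrix(rnorm(n), n, 1),  scale = FALSE)     # centered response
H <- p
W <- P <- matrix(0, p, H); q <- numeric(H)
Xd <- X0; yd <- y0
for (h in seq_len(H)) {
  u  <- yd[, 1]                        # step 1: u from (deflated) Y
  a  <- crossprod(Xd, u)               # step 2: a = X^T u
  w  <- a / sqrt(sum(a^2))             # step 3: normalize
  t_ <- Xd %*% w                       # step 4: scores t = X w
  tt <- sum(t_^2)
  P[, h] <- crossprod(Xd, t_) / tt     # step 5: loadings p and q
  q[h]   <- sum(yd * t_) / tt
  Xd <- Xd - t_ %*% t(P[, h])          # step 6: deflation
  yd <- yd - t_ * q[h]
  W[, h] <- w
}
beta <- W %*% solve(crossprod(P, W), q)  # beta = W (P^T W)^{-1} Q^T
max(abs(beta - qr.solve(X0, y0)))        # ~ 0 at H = p
```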

The **row-chunked** implementation keeps X on disk and performs steps (2) and (4)
with two passes over row blocks:

- **Pass A (accumulate a)**: for each block B of rows, update a += B^T u_B.
- **Pass B (emit t)**: for each block B, write t_B = B * a.

Loadings p are accumulated exactly as in Pass A, with t in place of u.
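The two passes can be sketched in base R on an in-memory matrix (the streaming
backend applies the same loop to row blocks read from the file backing):

```r
set.seed(42)
n <- 1000; p <- 8; block <- 128
X <- matrix(rnorm(n * p), n, p)
u <- rnorm(n)

# Pass A: accumulate a += B^T u_B over row blocks B
a <- numeric(p)
for (start in seq(1, n, by = block)) {
  idx <- start:min(start + block - 1, n)
  a <- a + crossprod(X[idx, , drop = FALSE], u[idx])[, 1]
}

# Pass B: emit t_B = B a over the same row blocks
t_vec <- numeric(n)
for (start in seq(1, n, by = block)) {
  idx <- start:min(start + block - 1, n)
  t_vec[idx] <- X[idx, , drop = FALSE] %*% a
}

# Both passes agree with the direct (non-streamed) products:
max(abs(a - crossprod(X, u)))   # a = X^T u
max(abs(t_vec - X %*% a))       # t = X a
```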

## APIs

- C++ entry points (Rcpp):
  - `cpp_kpls_stream_xxt(X_ptr, Y_ptr, ncomp, chunk_rows, chunk_cols, center, return_big)`
  - `cpp_kpls_stream_cols(X_ptr, Y_ptr, ncomp, chunk_cols, center, return_big)`
- R wrapper:
  - `pls_fit(..., backend = "bigmem", algorithm = "kernelpls", chunk_size, chunk_cols, ...)`

`pls_fit()` chooses the variant via `options(bigPLSR.kpls_gram)` or heuristics when
`"auto"` is set (the default).
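A minimal call sketch, guarded so it runs only when bigPLSR and bigmemory are
installed. The positional `X`, `Y`, and `ncomp` arguments are assumptions about the
part of the signature elided by `...` above; the dimensions are illustrative.

```r
if (requireNamespace("bigPLSR", quietly = TRUE) &&
    requireNamespace("bigmemory", quietly = TRUE)) {
  X <- bigmemory::as.big.matrix(matrix(rnorm(500 * 20), 500, 20))
  Y <- bigmemory::as.big.matrix(matrix(rnorm(500), 500, 1))
  # chunk_size bounds the row-block length ("rows" variant);
  # chunk_cols bounds the column-block width ("cols" variant).
  fit <- bigPLSR::pls_fit(X, Y, ncomp = 3,
                          backend = "bigmem", algorithm = "kernelpls",
                          chunk_size = 256, chunk_cols = 10)
}
```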

## When to prefer each variant

- **Column-chunked ("cols")**: good default; excellent when p is large and
  access by columns is cheap (typical bigmemory column-major backing).
- **Row-chunked XX^T ("rows")**: prefer when n >> p, when row access is
  contiguous (e.g., file-backed partitions), or when you want to minimize repeated
  column accesses across iterations.
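For intuition, an `"auto"` rule might compare the aspect ratio of X. The toy
`choose_variant()` below is purely illustrative and is **not** bigPLSR's actual
heuristic:

```r
# Hypothetical dispatch rule: favor row-chunking for very tall matrices.
choose_variant <- function(n, p, tall_ratio = 10) {
  if (n > tall_ratio * p) "rows" else "cols"
}
choose_variant(1e6, 50)    # tall-and-skinny X
#> [1] "rows"
choose_variant(2000, 5000) # wide X
#> [1] "cols"
```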

## References

- Dayal, B., & MacGregor, J.F. (1997). Improved PLS algorithms. *Journal of
  Chemometrics*, 11(1), 73–85.
- Rosipal, R., & Trejo, L.J. (2001). Kernel Partial Least Squares Regression
  in Reproducing Kernel Hilbert Space. *JMLR*, 2, 97–123.
- (and other kernel/logistic/sparse KPLS references in the `kpls_review` vignette)
