---
title: "Getting Started with NNS: Normalization and Rescaling"
author: "Fred Viole"
output: html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with NNS: 04. Normalization and Rescaling}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5)
suppressPackageStartupMessages(library(NNS))
data.table::setDTthreads(1L)
options(mc.cores = 1)
RcppParallel::setThreadOptions(numThreads = 1)
Sys.setenv("OMP_THREAD_LIMIT" = 1)
```

```{r install,message=FALSE,warning = FALSE}
library(NNS)
library(data.table)
require(knitr)
require(rgl)
```

## Overview

This vignette covers two related tools:

- `NNS.norm()` for cross‑variable normalization when comparing multiple series.
- `NNS.rescale()` for single‑vector rescaling with either min‑max or risk‑neutral targets.

Both functions perform deterministic affine transformations that preserve rank structure while modifying scale.

---

# `NNS.norm()`: Normalize Multiple Variables

`NNS.norm()` rescales variables to a common magnitude while preserving distributional structure. The method can be **linear** (all variables forced to have the same mean) or **nonlinear** (using dependence weights to produce a more nuanced scaling). In the nonlinear case, the degree of association between variables influences the final normalized values.



## Mathematical Structure

Let \(X\) be an \(n \times p\) matrix of variables.

### Step 1: Compute Mean Vector

\[
m_j = \text{mean}(X_{\cdot j})
\]

If any \(m_j = 0\), it is replaced with \(10^{-10}\) to prevent division by zero.

---

### Step 2: Construct Mean Ratio Matrix

\[
RG_{ij} = \frac{m_i}{m_j}
\]

In R this corresponds to:

```r
RG <- outer(m, 1 / m)
```

---

### Step 3: Dependence Weight Matrix

If `linear = FALSE`:

- If number of variables \(p < 10\):
  \[
  W = |\mathrm{cor}(X)|
  \]
- Otherwise:
  \[
  W = |D| \quad \text{where } D = \text{NNS.dep}(X)\$Dependence
  \]
  `NNS.dep()` returns a symmetric matrix of nonlinear dependence measures.

If `linear = TRUE`, the weighting effectively becomes:

\[
W_{ij} = 1
\]

---

### Step 4: Scaling Factors

\[
s_j = \frac{1}{p} \sum_{i=1}^{p} RG_{ij} W_{ij}
\]

Each column is scaled:

\[
X_{\cdot j}^{*} = s_j X_{\cdot j}
\]

---

## Linear Case Proof

If \(W_{ij} = 1\):

\[
s_j = \frac{1}{p} \sum_{i=1}^{p} \frac{m_i}{m_j}
= \frac{\bar{m}}{m_j}
\]

Then:

\[
\text{mean}(X_{\cdot j}^{*}) = s_j m_j = \bar{m}
\]

All variables share the same mean.

---

## Nonlinear Case Interpretation

\[
\text{mean}(X_{\cdot j}^{*})
=
\frac{1}{p}
\sum_{i=1}^{p}
m_i W_{ij}
\]

Thus, the normalized mean becomes a dependence‑weighted average of original means. Variables more strongly dependent with higher‑mean variables scale upward more.

---

## Examples

### Basic Multivariate Example

This holds for any distribution type and can be applied to vectors of different lengths.

```{r basic-example, eval=FALSE}
set.seed(123)

A <- rnorm(100, mean = 0, sd = 1)
B <- rnorm(100, mean = 0, sd = 5)
C <- rnorm(100, mean = 10, sd = 1)
D <- rnorm(100, mean = 10, sd = 10)

X <- data.frame(A, B, C, D)

# Linear scaling
lin_norm <- NNS.norm(X, linear = TRUE, chart.type = NULL)
head(lin_norm)
     A Normalized B Normalized C Normalized D Normalized
[1,]   -29.929719    31.889828     5.819152    1.4264014
[2,]   -12.291609   -11.531393     5.396317    1.2388239
[3,]    83.235911    11.073887     4.643781    0.3078703
[4,]     3.765188    15.601030     5.029380   -0.2630481
[5,]     6.904039    42.717726     4.572611    2.8193657
[6,]    91.585447     2.021274     4.543080    6.6681079

# Verify means are equal
apply(lin_norm, 2, function(x) c(mean = mean(x), sd = sd(x)))

     A Normalized B Normalized C Normalized D Normalized
mean     4.827727     4.827727    4.8277270     4.827727
sd      48.744888    43.407590    0.4531172     5.203436
```


Now compare with **nonlinear scaling**:

```{r nonlinear-example, eval=FALSE}
nonlin_norm <- NNS.norm(X, linear = FALSE, chart.type = NULL)
head(nonlin_norm)
     A Normalized B Normalized C Normalized D Normalized
[1,]   -2.7834653   0.32807768     3.178568    0.7439872
[2,]   -1.1431202  -0.11863321     2.947605    0.6461499
[3,]    7.7409438   0.11392645     2.536550    0.1605800
[4,]    0.3501627   0.16050101     2.747174   -0.1372015
[5,]    0.6420759   0.43947344     2.497676    1.4705341
[6,]    8.5174510   0.02079456     2.481545    3.4779738

apply(nonlin_norm, 2, function(x) c(mean = mean(x), sd = sd(x)))

     A Normalized B Normalized C Normalized D Normalized
mean    0.4489788   0.04966692     2.637026     2.518062
sd      4.5332769   0.44657066     0.247504     2.714025
```

Note that the means differ and the standard deviations are smaller than in the linear case, reflecting the dependence structure.


#### Normalize list of unequal vector lengths
```{r unequal, eval = FALSE}
set.seed(123)
vec1 <- rnorm(n = 10, mean = 0, sd = 1)
vec2 <- rnorm(n = 5, mean = 5, sd = 5)
vec3 <- rnorm(n = 8, mean = 10, sd = 10)

vec_list <- list(vec1, vec2, vec3)

NNS.norm(vec_list)

$`x_1 Normalized`
 [1]  13.074058  -3.004912 -11.745878  25.406891  -4.647966  -5.481229   6.225165   5.920719   6.113733   9.640242

$`x_2 Normalized`
[1]  2.875960212  0.008876158  1.230826150  5.855582361 10.779166523

$`x_3 Normalized`
[1]  4.0749062  2.2395840  0.4067264  0.7457562 15.6445780  5.1941416  2.3326665  2.5622994
```


---

### Quantile Normalization Comparison

Quantile normalization forces distributions to be identical. This is literally the opposite intended effect of `NNS.norm`, which preserves individual distribution shapes while aligning ranges.  The quantile normalized series become identical in distribution, while the `NNS` methods retain the original patterns.

---


## Practical Applications

Normalization eliminates the need for multiple y‑axis charts and prevents their misuse. By placing variables on the same axes with shared ranges, we enable more relevant conditional probability analyses. This technique, combined with time normalization, is used in `NNS.caus()` to identify causal relationships between variables.

---

# `NNS.rescale()`: Distribution Rescaling

`NNS.rescale()` performs one‑dimensional affine transformations.

Function signature:

```
NNS.rescale(x, a, b, method = "minmax", T = NULL, type = "Terminal")
```

---

## 1) Min-Max Scaling

If `method = "minmax"`:

\[
x^{*}
=
a
+
(b - a)
\frac{x - \min(x)}
{\max(x) - \min(x)}
\]

Properties:

- Preserves order
- Maps support to \([a,b]\)
- Linear transformation

---

### Example

```{r rescale-minmax}
raw_vals <- c(-2.5, 0.2, 1.1, 3.7, 5.0)

scaled_minmax <- NNS.rescale(
  x = raw_vals,
  a = 5,
  b = 10,
  method = "minmax",
  T = NULL,
  type = "Terminal"
)

cbind(raw_vals, scaled_minmax)
range(scaled_minmax)
```

---

## 2) Risk-Neutral Scaling

If `method = "riskneutral"`:

Let:

- \( S_0 = a \)
- \( r = b \)
- \( T \) = time horizon

### Terminal Type

Target:

\[
\mathbb{E}[S_T] = S_0 e^{rT}
\]

Transformation form:

\[
x^{*}
=
x
\cdot
\frac{S_0 e^{rT}}
{\text{mean}(x)}
\]

This enforces the required expectation.

---

### Discounted Type

Target:

\[
\mathbb{E}[e^{-rT} S_T] = S_0
\]

Equivalent to:

\[
\mathbb{E}[S_T] = S_0 e^{rT}
\]

but the returned series is scaled so that its discounted mean equals \(S_0\). In practice, the function applies the same multiplicative factor as above, because:

\[
\text{mean}(e^{-rT} x^{*}) = e^{-rT} \cdot \text{mean}(x^{*}) = e^{-rT} \cdot S_0 e^{rT} = S_0.
\]

---

## Risk-Neutral Example

```{r rescale-riskneutral, eval=FALSE}
set.seed(123)
S0 <- 100
r <- 0.05
T <- 1

# Simulate a price path
prices <- S0 * exp(cumsum(rnorm(250, 0.0005, 0.02)))

rn_terminal <- NNS.rescale(
  x = prices,
  a = S0,
  b = r,
  method = "riskneutral",
  T = T,
  type = "Terminal"
)

c(
  mean_original = mean(prices),
  mean_rescaled = mean(rn_terminal),
  target = S0 * exp(r * T)
)

mean_original mean_rescaled        target 
     109.7019      105.1271      105.1271 
```

---

## Discounted Example

```{r rescale-discounted, eval=FALSE}
rn_discounted <- NNS.rescale(
  x = prices,
  a = S0,
  b = r,
  method = "riskneutral",
  T = T,
  type = "Discounted"
)

c(
  mean_rescaled = mean(rn_discounted),
  target_discounted_mean = S0
)

         mean_rescaled target_discounted_mean 
                   100                    100 
```

---

# Conceptual Summary

### `NNS.norm()`

- Multivariate
- Dependence‑aware scaling
- Equalizes means only in linear mode
- Preserves shape and order

### `NNS.rescale()`

- Univariate
- Affine transformation
- Either range‑targeted or expectation‑targeted
- Preserves rank structure

Both functions maintain monotonicity and are therefore compatible with NNS copula and dependence modeling frameworks.


```{r image}
set.seed(123)

x <- rnorm(1000, 5, 2)
y <- rgamma(1000, 3, 1)

# Combine variables
X <- cbind(x, y)

# NNS normalization
X_norm_lin <- NNS.norm(X, linear = TRUE)
X_norm_nonlin <- NNS.norm(X, linear = FALSE)

# Standard min-max normalization
minmax <- function(v) (v - min(v)) / (max(v) - min(v))
X_minmax <- apply(X, 2, minmax)
```

```{r plotting, echo=FALSE}
par(mfrow = c(2,2))

steelblue_alpha <- rgb(1,0,0,0.4)
red_alpha <- rgb(0,0,1,0.4)

# Breaks for original data
br_orig <- pretty(range(c(x, y)), n = 15)

# Original variables
hist(x,
     col = steelblue_alpha,
     breaks = br_orig,
     main = "Original Variables",
     xlab = "")

hist(y,
     col = red_alpha,
     breaks = br_orig,
     add = TRUE)


# Breaks for NNS normalized variables
br_norm <- pretty(range(c(X_norm_lin[,1], X_norm_lin[,2])), n = 15)

# NNS normalized
hist(X_norm_lin[,1],
     col = steelblue_alpha,
     breaks = br_norm,
     main = "NNS.norm(..., Linear=TRUE)",
     xlab = "")

hist(X_norm_lin[,2],
     col = red_alpha,
     breaks = br_norm,
     add = TRUE)

# Breaks for NNS normalized variables
br_norm <- pretty(range(c(X_norm_nonlin[,1], X_norm_nonlin[,2])), n = 15)

# NNS normalized
hist(X_norm_nonlin[,1],
     col = steelblue_alpha,
     breaks = br_norm,
     main = "NNS.norm(..., Linear=FALSE)",
     xlab = "")

hist(X_norm_nonlin[,2],
     col = red_alpha,
     breaks = br_norm,
     add = TRUE)

# Breaks for min-max normalized variables
br_minmax <- pretty(range(c(X_minmax[,1], X_minmax[,2])), n = 15)

# Standard min-max normalization
hist(X_minmax[,1],
     col = steelblue_alpha,
     breaks = br_minmax,
     main = "Standard Min-Max",
     xlab = "")

hist(X_minmax[,2],
     col = red_alpha,
     breaks = br_minmax,
     add = TRUE)
```

---

# References

If the user is so motivated, detailed arguments further examples are provided within the following:

-   [Nonlinear Nonparametric Statistics: Using Partial Moments](https://ovvo-financial.github.io/NNS/book/)

-   [Nonlinear Scaling Normalization with NNS](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/Normalization.pdf)

-   [Distributional Equivalence in GBM: Outcome Transformation for Efficient Risk-Neutral Pricing](https://doi.org/10.2139/ssrn.5742907)
