---
title: 'ClusteredNeuroVec: Parcel-based 4D Analysis'
output: rmarkdown::html_vignette
vignette: |
  %\VignetteIndexEntry{ClusteredNeuroVec: Parcel-based 4D Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
params:
  family: red
  preset: homage
css: albers.css
resource_files:
- albers.css
- albers.js
includes:
  in_header: |-
    <script src="albers.js"></script>
    <script>document.addEventListener('DOMContentLoaded',function(){document.body.classList.add('palette-red');});</script>

---

```{r setup, include=FALSE}
if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("albersdown", quietly = TRUE)) ggplot2::theme_set(albersdown::theme_albers(family = params$family, preset = params$preset))
if (requireNamespace("ggplot2", quietly = TRUE)) ggplot2::theme_set(neuroim2::theme_neuro(base_family = params$family))
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(neuroim2)
set.seed(123)
```

## Overview

`ClusteredNeuroVec` provides an efficient representation for parcellated 4D neuroimaging data where voxels are grouped into clusters or parcels. Instead of storing a time-series for every voxel, it stores one time-series per cluster, making it ideal for analyses using brain atlases like Schaefer-Yeo parcellations.

## Why ClusteredNeuroVec?

Traditional neuroimaging analyses often involve:
- Reducing voxel-level data to parcel/ROI averages
- Working with brain atlases that group voxels into regions
- Performing searchlight analyses at the parcel level rather than voxel level

`ClusteredNeuroVec` makes these workflows more efficient while maintaining compatibility with standard `NeuroVec` operations.

## Creating a ClusteredNeuroVec

### From scratch with synthetic data

```{r basic-creation}
# Create a simple 3D space with mask
space <- NeuroSpace(c(10, 10, 10), spacing = c(2, 2, 2))
mask_data <- array(TRUE, c(10, 10, 10))
mask_data[1:3, 1:3, 1:3] <- FALSE  # exclude corner
mask <- LogicalNeuroVol(mask_data, space)

# Create cluster assignments (e.g., 5 random clusters)
n_masked <- sum(mask_data)
cluster_ids <- sample(1:5, n_masked, replace = TRUE)
cvol <- ClusteredNeuroVol(mask, cluster_ids)

# Create synthetic 4D data
vec_space <- NeuroSpace(c(10, 10, 10, 20), spacing = c(2, 2, 2))
vec_data <- array(rnorm(10 * 10 * 10 * 20), dim = c(10, 10, 10, 20))
vec <- NeuroVec(vec_data, vec_space)

# Create ClusteredNeuroVec
cv <- ClusteredNeuroVec(vec, cvol)
print(cv)
```

### Key properties

```{r properties}
# Dimensions: still 4D (x, y, z, time)
dim(cv)

# Number of clusters
num_clusters(cv)

# Access cluster time-series matrix (T x K)
ts_matrix <- as.matrix(cv, by = "cluster")
dim(ts_matrix)  # 20 time points x 5 clusters
```

## Array-like access

`ClusteredNeuroVec` behaves like a regular 4D array:

```{r array-access}
# Extract 3D volume at time point 1
vol_t1 <- cv[,,,1]
dim(vol_t1)  # 10 x 10 x 10

# All voxels in the same cluster have the same value
# (they share the cluster's mean time-series)

# Get time-series at a specific voxel
ts <- series(cv, 5, 5, 5)
length(ts)  # 20 time points
```

## Cluster searchlight analysis

Perform searchlight analysis at the cluster level using centroid distances:

```{r searchlight}
# K-nearest neighbor searchlight (3 nearest clusters)
windows_knn <- cluster_searchlight_series(cv, k = 3)
length(windows_knn)  # One window per cluster

# Look at first window
win1 <- windows_knn[[1]]
dim(values(win1))  # 20 time points x 3 neighbors

# Radius-based searchlight (e.g., 15mm radius)
windows_radius <- cluster_searchlight_series(cv, radius = 15)
```

## Real-world example: Schaefer parcellation

```{r real-world, eval=FALSE}
# Load fMRI data
fmri_data <- read_vec("subject01_task.nii.gz")

# Load Schaefer atlas (example with 400 parcels)
atlas <- read_vol("Schaefer2018_400Parcels_7Networks.nii.gz")
mask <- atlas > 0

# Create ClusteredNeuroVol from atlas
cvol <- ClusteredNeuroVol(mask, as.integer(atlas[mask]))

# Create parcellated representation
cv <- ClusteredNeuroVec(fmri_data, cvol)

# Now you have 400 time-series (one per parcel) instead of ~200,000 voxels
parcels <- as.matrix(cv, by = "cluster")
dim(parcels)  # T x 400

# Perform connectivity analysis at parcel level
cor_matrix <- cor(parcels)
dim(cor_matrix)  # 400 x 400
```

## Integration with existing workflows

`ClusteredNeuroVec` integrates seamlessly with existing neuroim2 functions:

```{r integration}
# Use with split_reduce for custom aggregation
# (ClusteredNeuroVec already uses this internally)

# Scale time-series within each cluster
# (if scale_series is implemented for ClusteredNeuroVec)
# cv_scaled <- scale_series(cv, center = TRUE, scale = TRUE)

# Get cluster centroids for visualization
centers <- centroids(cv)
head(centers)  # x, y, z coordinates
```

## Performance benefits

By storing only K time-series instead of N voxels:
- Memory usage: O(K × T) instead of O(N × T)
- Searchlight operations: O(K²) instead of O(N²)
- Typical reduction: 100-1000x fewer time-series

For a typical fMRI dataset:
- Voxel-level: ~200,000 voxels × 500 timepoints = 100M values
- Parcel-level: 400 parcels × 500 timepoints = 200K values

## Summary

`ClusteredNeuroVec` provides:
- Efficient storage for parcellated 4D data
- Full array-like access semantics
- Cluster-aware searchlight operations
- Seamless integration with existing neuroim2 workflows

It's ideal for:
- Atlas-based analyses
- Connectivity studies
- Parcellated machine learning
- Any workflow that aggregates voxels to regions
