% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/adjust_batch.R
\name{adjust_batch}
\alias{adjust_batch}
\title{Adjust for batch effects}
\usage{
adjust_batch(
  data,
  markers,
  batch,
  method = c("simple", "standardize", "ipw", "quantreg", "quantnorm"),
  confounders = NULL,
  suffix = "_adjX",
  ipw_truncate = c(0.025, 0.975),
  quantreg_tau = c(0.25, 0.75),
  quantreg_method = "fn"
)
}
\arguments{
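\item{data}{Data set, e.g., a data frame, that contains the variables
named in \code{markers}, \code{batch}, and \code{confounders}.}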

\item{markers}{Variable name(s) to batch-adjust. Select
multiple variables with tidy evaluation, e.g.,
\code{markers = starts_with("biomarker")}.}

\item{batch}{Categorical variable indicating batch.}

\item{method}{Method for batch effect correction:
\itemize{
\item \code{simple}  Simple means per batch will be subtracted.
No adjustment for confounders.
\item \code{standardize}  Means per batch after standardization
for confounders in linear models will be subtracted.
If no \code{confounders} are supplied, \code{method = simple}
is equivalent and will be used.
\item \code{ipw}  Means per batch after inverse-probability
weighting for assignment to a specific batch in multinomial
models, conditional on confounders, will be subtracted.
Stabilized weights are used, truncated at quantiles
defined by the \code{ipw_truncate} parameters. If no
\code{confounders} are supplied, \code{method = simple}
is equivalent and will be used.
\item \code{quantreg}  Lower quantiles (default: 25th percentile)
and ranges between a lower and an upper quantile (default: 75th
percentile) will be unified between batches, allowing for
differences in both parameters due to confounders. Set the two
quantiles using the \code{quantreg_tau} parameters.
\item \code{quantnorm}  Quantile normalization between batches. No
adjustment for confounders.
}}

\item{confounders}{Optional: Confounders, i.e. determinants of
biomarker levels that differ between batches. Only used if
\code{method = standardize}, \code{method = ipw}, or
\code{method = quantreg}, i.e. methods that attempt to retain
some of these "true" between-batch differences. Select multiple
confounders with tidy evaluation, e.g.,
\code{confounders = c(age, age_squared, sex)}.}

\item{suffix}{Optional: What string to append to variable names
after batch adjustment. Defaults to \code{"_adjX"}, with
\code{X} depending on \code{method}:
\itemize{
\item \code{_adj2} from \code{method = simple}
\item \code{_adj3} from \code{method = standardize}
\item \code{_adj4} from \code{method = ipw}
\item \code{_adj5} from \code{method = quantreg}
\item \code{_adj6} from \code{method = quantnorm}
}}

\item{ipw_truncate}{Optional and used for \code{method = ipw} only:
Lower and upper quantiles for truncation of stabilized
weights. Defaults to \code{c(0.025, 0.975)}.}

\item{quantreg_tau}{Optional and used for \code{method = quantreg} only:
Lower and upper quantiles to unify between batches.
Defaults to \code{c(0.25, 0.75)}.}

\item{quantreg_method}{Optional and used for \code{method = quantreg} only:
Algorithmic method to fit quantile regression. Defaults to
\code{"fn"}. See parameter \code{method} of \code{\link[quantreg]{rq}}.}
}
\value{
The \code{data} dataset with the batch effect-adjusted
variable(s) appended as new columns, named using \code{suffix}.
Model diagnostics are stored in the attribute \code{.batchtma}
of this dataset and are available via the
\code{\link[batchtma]{diagnose_models}} function.
}
\description{
\code{adjust_batch} generates biomarker levels for the variable(s)
\code{markers} in the dataset \code{data} that are corrected
(adjusted) for batch effects, i.e. differential measurement
error between levels of \code{batch}.
}
\details{
If no true differences between batches are expected, because
samples have been randomized to batches, then a \code{method}
that returns adjusted values with equal means
(\code{method = simple}) or with equal values at each rank
(\code{method = quantnorm}) for all batches is appropriate.
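
As a minimal base R sketch of what "equal means" and "equal values at
each rank" can look like (an illustration under simplifying
assumptions, not the package's implementation; the column names
\code{biomarker_centered} and \code{biomarker_qn} are hypothetical,
and the rank-based step assumes equally sized batches without ties):

\preformatted{
set.seed(1)
df <- data.frame(
  tma = rep(1:2, times = 10),
  biomarker = rep(1:2, times = 10) + runif(n = 20, max = 5)
)

# Equal means: remove the deviation of each batch mean
# from the overall mean
batch_mean <- ave(df$biomarker, df$tma)
df$biomarker_centered <- df$biomarker - (batch_mean - mean(df$biomarker))

# Equal values at each rank: replace each value by the across-batch
# mean of the values that share its within-batch rank
rank_in_batch <- ave(df$biomarker, df$tma,
                     FUN = function(v) rank(v, ties.method = "first"))
reference <- rowMeans(sapply(split(df$biomarker, df$tma), sort))
df$biomarker_qn <- reference[rank_in_batch]
}

After either step, \code{tapply()} of the new column by \code{tma}
can be used to check that the batches agree in means or in sorted values.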

If the distribution of determinants of biomarker values
(\code{confounders}) differs between batches, then a
\code{method} that retains these "true" differences
between batches while adjusting for batch effects
may be appropriate: \code{method = standardize} and
\code{method = ipw} address means; \code{method = quantreg}
addresses lower values and dynamic range separately.
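
A rough sketch of the idea behind mean standardization, assuming a
linear model without interactions (base R only; the package's
internals may differ, and \code{std_mean}, \code{effect}, and
\code{biomarker_std} are hypothetical names):

\preformatted{
set.seed(1)
df <- data.frame(
  tma = factor(rep(1:2, times = 10)),
  confounder = rep(0:1, times = 10) + runif(n = 20, max = 10)
)
df$biomarker <- rep(1:2, times = 10) + 0.3 * df$confounder +
  runif(n = 20, max = 5)

# Model the confounder-biomarker association alongside batch
fit <- lm(biomarker ~ tma + confounder, data = df)

# Standardized mean per batch: predict for all observations as if
# they had been measured in that batch, keeping their confounders
std_mean <- sapply(levels(df$tma), function(b) {
  newdata <- transform(df, tma = factor(b, levels = levels(tma)))
  mean(predict(fit, newdata = newdata))
})

# Subtract the centered batch effect from each observation
effect <- std_mean - mean(std_mean)
df$biomarker_std <- df$biomarker - unname(effect[as.character(df$tma)])
}

In this simplified setting, the difference between the two
standardized means equals the batch coefficient of the linear model.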

Which \code{method} to choose depends on the properties of
batch effects (affecting means or also variance?) and
the presence and strength of confounding. For the two
mean-only confounder-adjusted methods, the choice may depend
on whether the confounder--batch association (\code{method = ipw})
or the confounder--biomarker association
(\code{method = standardize}) can be modeled better.
Generally, if batch effects are present, any adjustment
method tends to perform better than no adjustment in
reducing bias and increasing between-study reproducibility.
See references.
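
For intuition on \code{method = ipw}, the following sketch computes
stabilized, truncated weights and weighted batch means in base R.
It is not the package's code: with only two simulated batches, a
logistic model stands in for the multinomial model, and all object
names are hypothetical.

\preformatted{
set.seed(1)
n <- 200
confounder <- runif(n, max = 10)
# Batch assignment depends on the confounder
tma <- rbinom(n, size = 1, prob = plogis(-1 + 0.2 * confounder)) + 1
biomarker <- tma + 0.3 * confounder + rnorm(n)

# Model the confounder-batch association
ps_fit <- glm(I(tma == 2) ~ confounder, family = binomial)
p_batch2 <- fitted(ps_fit)
p_own <- ifelse(tma == 2, p_batch2, 1 - p_batch2)
p_marginal <- ifelse(tma == 2, mean(tma == 2), mean(tma == 1))

# Stabilized weights, truncated at the 2.5th and 97.5th percentiles
sw <- p_marginal / p_own
limits <- quantile(sw, probs = c(0.025, 0.975))
sw_truncated <- pmin(pmax(sw, limits[1]), limits[2])

# Weighted batch means, centered, then subtracted per observation
w_mean <- sapply(1:2, function(b) {
  weighted.mean(biomarker[tma == b], sw_truncated[tma == b])
})
effect <- w_mean - mean(w_mean)
biomarker_ipw <- biomarker - effect[tma]
}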

All adjustment approaches except \code{method = quantnorm}
are based on linear models. It is recommended that variables
for \code{markers} and \code{confounders} first be transformed
as necessary (e.g., \code{\link[base]{log}} transformations or
\code{\link{splines}}). Neither scaling nor mean centering is necessary,
and adjusted values are returned on the original scale.
Parameters \code{markers}, \code{batch}, and \code{confounders}
support tidy evaluation.

Observations with missing values for the \code{markers} or
\code{confounders} are ignored when estimating adjustment
parameters, as are empty batches. For observations with a
non-missing marker value but missing confounders, batch
effect-adjusted values are still returned; they are based on
adjustment parameters derived from the other observations in
the same batch with non-missing confounders.
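
The following base R sketch illustrates this handling of missing
values under simplifying assumptions (a linear model with a
hypothetical toy data set; it is meant as an illustration and not
as the package's implementation):

\preformatted{
df <- data.frame(
  tma = factor(rep(1:2, each = 5)),
  biomarker = c(1, 2, 3, 4, NA, 3, 4, 5, 6, 7),
  confounder = c(0, 1, NA, 1, 0, 1, 0, 1, 1, 0)
)

# lm() drops rows with a missing marker or confounder, so the batch
# effect is estimated from complete observations only
fit <- lm(biomarker ~ tma + confounder, data = df)
effect <- c("1" = 0, "2" = unname(coef(fit)["tma2"]))
effect <- effect - mean(effect)

# The estimated batch effect is applied to every observation with a
# marker value, including row 3, which lacks the confounder;
# row 5 has no marker value and stays missing
df$biomarker_adj <- df$biomarker - unname(effect[as.character(df$tma)])
}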
}
\examples{
# Data frame with two batches
# Batch 2 has higher values of biomarker and confounder on average
set.seed(123)  # make the simulated values reproducible
df <- data.frame(
  tma = rep(1:2, times = 10),
  biomarker = rep(1:2, times = 10) +
    runif(max = 5, n = 20),
  confounder = rep(0:1, times = 10) +
    runif(max = 10, n = 20)
)

# Adjust for batch effects
# Using simple means, ignoring the confounder:
adjust_batch(
  data = df,
  markers = biomarker,
  batch = tma,
  method = simple
)
# Returns data set with new variable "biomarker_adj2"

# Use quantile regression, include the confounder,
# change suffix of returned variable:
adjust_batch(
  data = df,
  markers = biomarker,
  batch = tma,
  method = quantreg,
  confounders = confounder,
  suffix = "_batchadjusted"
)
# Returns data set with new variable "biomarker_batchadjusted"
}
\references{
Stopsack KH, Tyekucheva S, Wang M, Gerke TA, Vaselkiv JB, Penney KL,
Kantoff PW, Finn SP, Fiorentino M, Loda M, Lotan TL, Parmigiani G+,
Mucci LA+ (+ equal contribution). Extent, impact, and mitigation of
batch effects in tumor biomarker studies using tissue microarrays.
bioRxiv 2021.06.29.450369; doi: https://doi.org/10.1101/2021.06.29.450369
(This R package, all methods descriptions, and further recommendations.)

Rosner B, Cook N, Portman R, Daniels S, Falkner B.
Determination of blood pressure percentiles in
normal-weight children: some methodological issues.
Am J Epidemiol 2008;167(6):653-66. (Basis for
\code{method = standardize})

Bolstad BM, Irizarry RA, Åstrand M, Speed TP.
A comparison of normalization methods for high density
oligonucleotide array data based on variance and bias.
Bioinformatics 2003;19:185–193. (\code{method = quantnorm})
}
\seealso{
\url{https://stopsack.github.io/batchtma/}
}
\author{
Konrad H. Stopsack
}
