This vignette is meant to be the smallest useful package-side
introduction to np. The emphasis is on one clean workflow
that users can run after installation: choose a bandwidth, fit a model,
inspect the result, and plot it.
Broader worked examples, package comparisons, and method-specific articles are better suited to the gallery site:
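In np, the bandwidth object is often the central object of the analysis: it stores the selected bandwidths along with the kernel and variable types, and it can be passed straight to the estimator.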
library(np)
data(cps71, package = "np")
bw <- npregbw(logwage ~ age, data = cps71)
summary(bw)
#>
#> Regression Data (205 observations, 1 variable(s)):
#>
#> Regression Type: Local-Constant
#> Bandwidth Selection Method: Least Squares Cross-Validation
#> Formula: logwage ~ age
#> Bandwidth Type: Fixed
#> Objective Function Value: 0.316055 (achieved on multistart 1)
#> Number of Function Evaluations: 47 (fast = 0)
#>
#> Exp. Var. Name: age Bandwidth: 1.892158 Scale Factor: 0.4487743
#>
#> Continuous Kernel Type: Second-Order Gaussian
#> No. Continuous Explanatory Vars.: 1
#> Estimation Time: 0.075 seconds
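The bandwidth object can be inspected and reused. A minimal sketch: read the cross-validated bandwidth from bw$bw, then build a second bandwidth object with twice that value, passing bandwidth.compute = FALSE so that no search is run. The doubling factor is arbitrary and only illustrates a smoother fit, and the rms() helper is a hypothetical in-sample summary, not the exact quantity reported by summary().

```r
library(np)
data(cps71, package = "np")

# Cross-validated bandwidth, as above
bw <- npregbw(logwage ~ age, data = cps71)
bw$bw

# Supply a bandwidth by hand (an arbitrary doubling) and skip the search
bw_fixed <- npregbw(logwage ~ age, data = cps71,
                    bws = 2 * bw$bw, bandwidth.compute = FALSE)
fit_cv <- npreg(bws = bw)
fit_smooth <- npreg(bws = bw_fixed)

# Hypothetical helper: in-sample root-mean-square residual of a fit
rms <- function(fit) sqrt(mean((cps71$logwage - fitted(fit))^2))
c(cv = rms(fit_cv), doubled = rms(fit_smooth))
```

Fixing bandwidths this way is useful for sensitivity checks: the estimator is unchanged and only the amount of smoothing differs.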
fit <- npreg(bws = bw)
summary(fit)
#>
#> Regression Data: 205 training points, in 1 variable(s)
#> age
#> Bandwidth(s): 1.892158
#>
#> Kernel Regression Estimator: Local-Constant
#> Bandwidth Type: Fixed
#> Residual standard error: 0.5307943
#> R-squared: 0.3108675
#>
#> Continuous Kernel Type: Second-Order Gaussian
#> No. Continuous Explanatory Vars.: 1
#> Estimation Time: 0.076 seconds (optim 0.075s, fit 0.001s)
plot(cps71$age, cps71$logwage, cex = 0.25, col = "grey")
lines(cps71$age, fitted(fit), col = 2, lwd = 2)
One important feature of np is that it handles mixed
data directly. Variable class matters: unordered categorical variables
should be factors, and ordered categorical variables should be ordered
factors when appropriate.
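A minimal sketch of preparing raw columns before calling npregbw. The data frame raw, its columns educ and region, and their level labels are hypothetical. The level order given to ordered() encodes how the categories are ranked, while factor() leaves them unordered.

```r
set.seed(1)
# Hypothetical raw data with categorical variables stored as character strings
raw <- data.frame(
  y = rnorm(50),
  educ = sample(c("hs", "college", "grad"), 50, replace = TRUE),
  region = sample(c("north", "south", "west"), 50, replace = TRUE),
  stringsAsFactors = FALSE
)

# Ordered categorical: state the level order explicitly (lowest to highest)
raw$educ <- ordered(raw$educ, levels = c("hs", "college", "grad"))
# Unordered categorical: a plain factor
raw$region <- factor(raw$region)

# Check the class each column now carries
sapply(raw, function(v) class(v)[1])
```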
set.seed(42)
mydat <- data.frame(
y = rnorm(200),
x_cont = runif(200),
x_unordered = factor(sample(c("a", "b", "c"), 200, replace = TRUE)),
x_ordered = ordered(sample(1:4, 200, replace = TRUE))
)
bw_mixed <- npregbw(y ~ x_cont + x_unordered + x_ordered, data = mydat)
fit_mixed <- npreg(bws = bw_mixed)
summary(fit_mixed)
#>
#> Regression Data: 200 training points, in 3 variable(s)
#> x_cont x_unordered x_ordered
#> Bandwidth(s): 1808084 0.6610069 0.9981613
#>
#> Kernel Regression Estimator: Local-Constant
#> Bandwidth Type: Fixed
#> Residual standard error: 0.9721457
#> R-squared: 0
#>
#> Continuous Kernel Type: Second-Order Gaussian
#> No. Continuous Explanatory Vars.: 1
#>
#> Unordered Categorical Kernel Type: Aitchison and Aitken
#> No. Unordered Categorical Explanatory Vars.: 1
#>
#> Ordered Categorical Kernel Type: Li and Racine
#> No. Ordered Categorical Explanatory Vars.: 1
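#> Estimation Time: 0.163 seconds (optim 0.163s, fit 0s)
Because y is simulated independently of every regressor here, cross-validation smooths all three out: the continuous bandwidth is very large, the two categorical bandwidths sit near their upper bounds, and the R-squared is 0.
For local-polynomial-capable methods, np now supports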
joint selection of polynomial order and bandwidth. The modern route is
to use search.engine = "nomad+powell" when you want the
search to choose both together.
If you want the recommended route without spelling out all of the LP
tuning arguments, use nomad = TRUE. This is a documented
convenience preset, not a generic optimizer alias: it fills only missing
values among the LP degree-search controls and leaves compatible
explicit overrides in place. This route uses the optional NOMAD backend
provided by the suggested package crs, so install
crs first if you want to use nomad = TRUE or
search.engine = "nomad"/"nomad+powell".
if (requireNamespace("crs", quietly = TRUE) &&
utils::packageVersion("crs") >= package_version("0.15-41")) {
set.seed(7)
n <- 120
x <- runif(n, -1, 1)
y <- x + 0.4 * x^2 + rnorm(n, sd = 0.18)
fit_nomad <- npreg(y ~ x, nomad = TRUE, degree.max = 1L, nmulti = 1L)
# Inside if (...) { }, only the last value auto-prints, so print explicitly.
print(fit_nomad$bws$nomad.shortcut)
# Tune one component explicitly while leaving the rest of the preset in place.
fit_nomad_direct <- npreg(
y ~ x,
nomad = TRUE,
search.engine = "nomad",
degree.max = 1L,
nmulti = 1L
)
}
The same convenience entry point is available for the other
LP-capable families: npcdens, npcdist,
npplreg, npscoef, and npindex,
together with their corresponding *bw constructors.
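If crs is not installed, np's long-standing local-linear estimator is still available by setting regtype = "ll" in npregbw. This fixes the polynomial order at one rather than searching over it, so it is a fallback rather than a substitute for the joint search described above. The evaluation ages below are arbitrary.

```r
library(np)
data(cps71, package = "np")

# Local-linear regression: order fixed at 1, bandwidth chosen by the default search
bw_ll <- npregbw(logwage ~ age, data = cps71, regtype = "ll")
fit_ll <- npreg(bws = bw_ll)
summary(fit_ll)

# Predictions at a few arbitrary ages
pred_ll <- predict(fit_ll, newdata = data.frame(age = c(25, 40, 55)))
pred_ll
```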
Keep the first run modest and runnable. Fuller worked examples belong on the gallery rather than in this package vignette.
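In np, the formula interface tells the function which variables are the
response and which are the regressors. It does not impose an ordinary
linear-additive model: y ~ x1 + x2 asks for a nonparametric estimate of
the regression function of x1 and x2 jointly, interactions included.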
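To see that y ~ x1 + x2 is not read as an additive model, the sketch below simulates a response that depends only on the product x1 * x2. A linear model with the same formula finds almost nothing, while npreg recovers much of the structure. The data-generating process and the r2() helper are hypothetical illustrations, not part of np.

```r
library(np)
set.seed(123)
n <- 200
d <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
# Pure interaction: no additive main effects
d$y <- d$x1 * d$x2 + rnorm(n, sd = 0.1)

# Hypothetical helper: in-sample R-squared from fitted values
r2 <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)

fit_lm <- lm(y ~ x1 + x2, data = d)
bw_int <- npregbw(y ~ x1 + x2, data = d)
fit_np <- npreg(bws = bw_int)

c(lm = r2(d$y, fitted(fit_lm)), npreg = r2(d$y, fitted(fit_np)))
```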
It is also important not to pass blocks of 0/1 dummies as if this
were a standard linear-model workflow. If the underlying variable is
categorical, it is usually better to keep it as a single factor
(or ordered factor).
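When data arrive as a block of 0/1 indicator columns, they can be collapsed back into a single factor before calling npregbw. The column names region_north, region_south, and region_west are hypothetical, and the sketch assumes each row has exactly one indicator equal to 1.

```r
# Hypothetical block of mutually exclusive 0/1 dummies
dummies <- data.frame(
  region_north = c(1, 0, 0, 1, 0, 0),
  region_south = c(0, 1, 0, 0, 1, 0),
  region_west  = c(0, 0, 1, 0, 0, 1)
)

# Sanity check: exactly one active indicator per row
stopifnot(all(rowSums(dummies) == 1))

# Position of the active column in each row, mapped to a level label
labels <- sub("^region_", "", names(dummies))
region <- factor(labels[apply(as.matrix(dummies), 1, which.max)], levels = labels)
region
```

The single factor region can then replace the three dummy columns in the np formula.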
This vignette keeps the package-side introduction intentionally narrow. Other common first routes are:
- ?npudens and ?npudist for unconditional density and distribution work,
- ?npcdens, ?npcdist, and ?npqreg for conditional density, distribution, and quantiles,
- ?npconmode for classification and conditional mode estimation,
- ?npplreg, ?npindex, and ?npscoef for semiparametric models.

Those broader branches are better carried by help pages and website articles than by a single shipped vignette.
- vignette("np_entropy_tests", package = "np") for a compact package-side testing overview
- ?npreg, ?npregbw, ?npudens, and ?npcdens for core help pages
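As one example of those other routes, an unconditional density estimate follows the same bandwidth-then-fit pattern as the regression above. This is a minimal sketch using the formula interface of npudensbw with default settings; the help pages listed above cover the available options.

```r
library(np)
data(cps71, package = "np")

# Bandwidth selection for the unconditional density of logwage (default settings)
bw_dens <- npudensbw(~ logwage, data = cps71)
summary(bw_dens)

# Density estimates at the sample points, then a plot from the bandwidth object
dens <- npudens(bws = bw_dens)
summary(dens)
plot(bw_dens)
```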