| Type: | Package |
| Title: | Cross-Validated Post-Lasso |
| Version: | 0.1.3 |
| Description: | Provides tools for cross-validated Lasso and Post-Lasso estimation. Built on top of the 'glmnet' package by Friedman, Hastie and Tibshirani (2010) <doi:10.18637/jss.v033.i01>, the main function plasso() extends the standard 'glmnet' output with coefficient paths for Post-Lasso models, while cv.plasso() performs cross-validation for both Lasso and Post-Lasso models and different ways to select the penalty parameter lambda as discussed in Knaus (2021) <doi:10.1111/rssa.12623>. |
| License: | GPL-3 |
| VignetteBuilder: | knitr |
| Encoding: | UTF-8 |
| URL: | https://github.com/MCKnaus/plasso |
| BugReports: | https://github.com/MCKnaus/plasso/issues |
| LazyData: | true |
| Imports: | glmnet, Matrix, methods, parallel, doParallel, foreach, iterators |
| RoxygenNote: | 7.3.3 |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown, xfun |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2025-10-28 09:26:22 UTC; Michael |
| Author: | Michael C. Knaus |
| Maintainer: | Michael C. Knaus <[email protected]> |
| Repository: | CRAN |
| Date/Publication: | 2025-10-31 18:20:02 UTC |
Core part for (Post-) Lasso cross-validation
Description
CV_core contains the core parts of the cross-validation for Lasso and Post-Lasso.
Usage
CV_core(x, y, w, cvgroup, list, i, lambda, ...)
Arguments
x |
Covariate matrix to be used in cross-validation |
y |
Vector of outcomes |
w |
Vector of weights |
cvgroup |
Categorical with k groups to identify folds |
list |
List 1:k |
i |
Index of the fold that is used for prediction |
lambda |
Series of lambdas used |
... |
Pass glmnet options |
Value
MSE_lasso / MSE_plasso: mean squared errors for each lambda.
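The fold-wise logic described above can be illustrated with a minimal base R sketch. It is not the package's implementation: nested OLS models stand in for the lambda grid, and the simulated data and the helper name fold_mse are assumptions made for this example only.

```r
set.seed(1)
n <- 60
x <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, paste0("x", 1:3)))
y <- x[, 1] - 0.5 * x[, 2] + rnorm(n)
w <- rep(1, n)                                # sample weights (all equal here)
cvgroup <- sample(rep(1:3, length.out = n))   # fold label for every observation

# Hypothetical helper: hold out fold i, fit on the rest, return one MSE per model
fold_mse <- function(i) {
  train <- cvgroup != i
  test <- !train
  sapply(1:3, function(p) {
    # Candidate model p uses the first p covariates (stand-in for one lambda)
    beta <- lm.wfit(cbind(1, x[train, 1:p, drop = FALSE]), y[train], w[train])$coefficients
    fit <- cbind(1, x[test, 1:p, drop = FALSE]) %*% beta
    weighted.mean((y[test] - fit)^2, w[test])  # weighted out-of-sample MSE
  })
}

cv_mse <- sapply(1:3, fold_mse)   # rows: candidate models, columns: folds
rowMeans(cv_mse)                  # averaged cross-validated MSE per model
```

Each call handles exactly one hold-out fold, which is what makes the folds easy to distribute over parallel workers.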
Adds an intercept to a matrix
Description
add_intercept adds an intercept to a matrix.
Usage
add_intercept(mat)
Arguments
mat |
Any matrix (with column names). |
Value
Matrix with intercept.
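As a rough illustration, prepending an intercept column in base R could look like the sketch below. The function name add_intercept_sketch and the column label "(Intercept)" are assumptions for this example, not taken from the package source.

```r
# Hypothetical stand-in for add_intercept(); requires a matrix with column names
add_intercept_sketch <- function(mat) {
  mat <- as.matrix(mat)
  out <- cbind(1, mat)                                  # column of ones goes first
  colnames(out) <- c("(Intercept)", colnames(mat))      # assumed label for the new column
  out
}

m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("x1", "x2")))
m_int <- add_intercept_sketch(m)
colnames(m_int)
```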
Extract coefficients from a cv.plasso object
Description
Extract coefficients for both Lasso and Post-Lasso from a cv.plasso object.
Usage
## S3 method for class 'cv.plasso'
coef(object, ..., s = c("optimal", "all"), se_rule = 0)
Arguments
object |
cv.plasso object |
... |
Pass generic coef options |
s |
Determines whether coefficients are extracted for all values of lambda ("all") or only for the optimal lambda ("optimal") according to the specified standard error-rule. |
se_rule |
If equal to 0, the cross-validated MSE minimum is used (default). Negative values go in the direction of smaller models, positive values go in the direction of larger models (e.g. se_rule=-1 gives the 1-standard-error rule). |
Value
List object containing coefficients for both the Lasso and Post-Lasso models respectively.
lasso |
Sparse matrix with Lasso coefficients |
plasso |
Sparse matrix with Post-Lasso coefficients |
Examples
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# get estimated coefficients along whole lambda sequence
coefs = coef(p.cv, s="all")
head(coefs$plasso)
# get estimated coefficients for optimal lambda value according to 1-standard-error rule
coef(p.cv, s="optimal", se_rule=-1)
Extract coefficients from a plasso object
Description
Extract coefficients for both Lasso and Post-Lasso from a plasso object.
Usage
## S3 method for class 'plasso'
coef(object, ..., s = NULL)
Arguments
object |
plasso object |
... |
Pass generic coef options |
s |
If NULL, coefficients are returned for all lambda values. If a value is provided, the closest lambda value of the plasso object is used. |
Value
List object containing coefficients that are associated with either all values along the lambda input sequence or for one specifically given lambda value for both the Lasso and Post-Lasso models respectively.
lasso |
Sparse matrix with Lasso coefficients |
plasso |
Sparse matrix with Post-Lasso coefficients |
Examples
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# get estimated coefficients along whole lambda sequence
coefs = coef(p)
head(coefs$plasso)
# get estimated coefficients for specific lambda approximation
coef(p, s=0.05)
Cross-Validated Lasso and Post-Lasso
Description
cv.plasso uses the glmnet package to estimate the coefficient paths and cross-validates both the least squares Lasso and Post-Lasso.
Usage
cv.plasso(x, y, w = NULL, kf = 10, parallel = FALSE, ...)
Arguments
x |
Matrix of covariates (number of observations times number of covariates matrix) |
y |
Vector of outcomes |
w |
Vector of weights |
kf |
Number of folds in k-fold cross-validation |
parallel |
Set as TRUE for parallelized cross-validation. Default is FALSE. |
... |
Pass glmnet options |
Value
cv.plasso object (using a list structure) including the base glmnet object and cross-validation results (incl. optimal lambda values) for both Lasso and Post-Lasso models.
call |
the call that produced this |
lasso_full |
base glmnet object |
kf |
number of folds in k-fold cross-validation |
cv_MSE_lasso |
cross-validated MSEs of Lasso model (for every iteration of k-fold cross-validation) |
cv_MSE_plasso |
cross-validated MSEs of Post-Lasso model (for every iteration of k-fold cross-validation) |
mean_MSE_lasso |
averaged cross-validated MSEs of Lasso model |
mean_MSE_plasso |
averaged cross-validated MSEs of Post-Lasso model |
ind_min_l |
index of MSE optimal lambda value for Lasso model |
ind_min_pl |
index of MSE optimal lambda value for Post-Lasso model |
lambda_min_l |
MSE optimal lambda value for Lasso model |
lambda_min_pl |
MSE optimal lambda value for Post-Lasso model |
names_l |
Names of active variables for MSE optimal Lasso model |
names_pl |
Names of active variables for MSE optimal Post-Lasso model |
coef_min_l |
Coefficients for MSE optimal Lasso model |
coef_min_pl |
Coefficients for MSE optimal Post-Lasso model |
x |
Input matrix of covariates |
y |
Matrix of outcomes |
w |
Matrix of weights |
Examples
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# get basic summary statistics
print(summary(p.cv, default=FALSE))
# plot cross-validated MSE curves and number of active coefficients
plot(p.cv, legend_pos="bottomleft")
# get coefficients at MSE optimal lambda value for both Lasso and Post-Lasso model
coef(p.cv)
# get coefficients at MSE optimal lambda value according to 1-standard-error rule
coef(p.cv, se_rule=-1)
# predict fitted values along whole lambda sequence
pred = predict(p.cv, s="all")
head(pred$plasso)
Helper function to find the position for prespecified SE rules
Description
find_Xse_ind is a helper function that finds the position for prespecified SE rules.
Usage
find_Xse_ind(CV, ind_min, oneSE, factor)
Arguments
CV |
Vector of cross-validated criterion |
ind_min |
Index of cross-validated minimum |
oneSE |
Vector that contains the standard errors of the cross-validated criterion for the whole grid |
factor |
Factor in which direction to go: Negative values favor smaller models, positive values favor larger models |
Value
Index on the lambda grid.
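One way such an SE rule could be implemented is sketched below. The function name se_rule_index is hypothetical, and the sketch assumes the lambda grid is sorted from largest to smallest penalty (as glmnet does), so that lower indices correspond to smaller models. It walks away from the minimum while the criterion stays within |factor| standard errors of the minimum.

```r
# Hypothetical sketch of an SE-rule search; not the package's own code
se_rule_index <- function(cv, ind_min, se, factor) {
  if (factor == 0) return(ind_min)                    # plain MSE minimum
  threshold <- cv[ind_min] + abs(factor) * se[ind_min]
  # Walk towards smaller models (factor < 0) or larger models (factor > 0)
  path <- if (factor < 0) ind_min:1 else ind_min:length(cv)
  ind <- ind_min
  for (i in path) {
    if (cv[i] > threshold) break                      # left the tolerated band
    ind <- i
  }
  ind
}

cv <- c(5.0, 3.0, 2.2, 2.0, 2.1, 2.6)   # cross-validated criterion, minimum at index 4
se <- rep(0.3, 6)                       # standard errors along the grid
se_rule_index(cv, ind_min = 4, se = se, factor = -1)  # returns 3
se_rule_index(cv, ind_min = 4, se = se, factor = 1)   # returns 5
```

With factor = -1 this reproduces the familiar 1-standard-error rule: the sparsest model whose criterion is still within one standard error of the minimum.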
plasso fitting
Description
fit_betas estimates an OLS model using only the active variables, i.e. those with non-zero Lasso coefficients.
Usage
fit_betas(x, y, w, nm_act, coef_lasso)
Arguments
x |
Matrix of covariates (number of observations times number of covariates matrix) |
y |
Vector of outcomes |
w |
Vector of weights |
nm_act |
Vector of active variables |
coef_lasso |
Vector of lasso coefficients |
Value
Beta estimates.
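The Post-Lasso step can be sketched in base R as a weighted least squares fit restricted to the active set, with all other coefficients left at zero. The function name post_lasso_ols and the use of lm.wfit are assumptions for illustration; the package may compute the estimates differently.

```r
# Hypothetical sketch: OLS on the active variables only
post_lasso_ols <- function(x, y, w, nm_act) {
  # Start from a zero vector so that inactive variables keep a coefficient of 0
  beta <- setNames(numeric(ncol(x) + 1), c("(Intercept)", colnames(x)))
  xa <- cbind("(Intercept)" = 1, x[, nm_act, drop = FALSE])
  fit <- lm.wfit(xa, y, w)                 # weighted least squares from package stats
  beta[colnames(xa)] <- fit$coefficients
  beta
}

set.seed(1)
x <- matrix(rnorm(200), 50, 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- 2 + 3 * x[, "x1"] - x[, "x3"]         # noiseless outcome for a transparent check
b <- post_lasso_ols(x, y, w = rep(1, 50), nm_act = c("x1", "x3"))
round(b, 6)
```

Because the outcome here is an exact linear function of x1 and x3, the refit recovers the intercept 2 and the slopes 3 and -1, while x2 and x4 stay at zero.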
Fitted values for a subset of active variables
Description
fitted_values_cv extracts the active set from X^TX and
X^Ty to get out-of-sample predictions
for a matrix already containing only the active variables.
The function is only relevant for cases where at least one variable is selected.
Usage
fitted_values_cv(XtX_all, Xty_all, x_pred, nm_act)
Arguments
XtX_all |
Cross product of all covariates |
Xty_all |
Cross product of covariates and outcome |
x_pred |
Covariates matrix of the prediction sample |
nm_act |
Names of active variables |
Value
Fitted values in the prediction sample.
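A minimal sketch of this idea follows, assuming the cross products carry the covariate names as dimnames. The function name fitted_from_crossprod is hypothetical. A plausible motivation is that the cross products of all covariates are computed once per training fold and then only subsetted for each active set, rather than recomputed.

```r
# Hypothetical sketch: out-of-sample OLS predictions from precomputed cross products
fitted_from_crossprod <- function(XtX_all, Xty_all, x_pred, nm_act) {
  # Solve the normal equations restricted to the active variables
  beta <- solve(XtX_all[nm_act, nm_act, drop = FALSE],
                Xty_all[nm_act, , drop = FALSE])
  x_pred[, nm_act, drop = FALSE] %*% beta
}

set.seed(2)
X <- cbind("(Intercept)" = 1,
           matrix(rnorm(120), 40, 3, dimnames = list(NULL, c("a", "b", "c"))))
y <- drop(X %*% c(1, 2, 0, -1)) + rnorm(40)
train <- 1:30
test <- 31:40

XtX <- crossprod(X[train, ])             # X'X on the training fold
Xty <- crossprod(X[train, ], y[train])   # X'y on the training fold
act <- c("(Intercept)", "a", "c")        # assumed active set
pred <- fitted_from_crossprod(XtX, Xty, X[test, ], act)

# Reference: refit by least squares on the active columns directly
ref <- X[test, act] %*% lm.fit(X[train, act], y[train])$coefficients
all.equal(drop(pred), drop(ref))
```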
Sanitizes potential sample weights
Description
handle_weights cleans potential sample weights or codes them as ones if they are not specified.
Usage
handle_weights(w, n)
Arguments
w |
Vector or n x 1 matrix of weights or null if no weights provided |
n |
Number of observations |
Value
Vector of weights.
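A sketch of what such sanitizing might involve is given below. The function name clean_weights and the specific checks (length, missing values, negativity) are assumptions; the package's actual checks are not documented here.

```r
# Hypothetical stand-in for handle_weights(); the validation rules are assumed
clean_weights <- function(w, n) {
  if (is.null(w)) {
    return(rep(1, n))              # no weights supplied: every observation counts once
  }
  w <- as.numeric(w)               # flattens an n x 1 matrix to a plain vector
  if (length(w) != n) stop("w must have one entry per observation")
  if (anyNA(w) || any(w < 0)) stop("weights must be non-negative and non-missing")
  w
}

clean_weights(NULL, 3)
clean_weights(matrix(c(2, 1, 1), ncol = 1), 3)
```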
Normalization of potential sample weights
Description
norm_w_to_n normalizes weights either to N or to N in treated and controls separately.
Usage
norm_w_to_n(w, d = NULL)
Arguments
w |
Vector or n x 1 matrix of weights that should be normalized |
d |
Vector of treatment indicators |
Value
Normalized weights.
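The two normalizations described above can be sketched as follows. The function name normalize_weights is hypothetical; the sketch rescales the weights so that they sum to the number of observations, either overall or separately within each treatment group.

```r
# Hypothetical sketch of weight normalization; not the package's own code
normalize_weights <- function(w, d = NULL) {
  w <- as.numeric(w)
  if (is.null(d)) {
    return(w / sum(w) * length(w))          # weights sum to N overall
  }
  d <- as.numeric(d)
  out <- w
  for (g in unique(d)) {
    idx <- d == g
    out[idx] <- w[idx] / sum(w[idx]) * sum(idx)   # weights sum to N within group g
  }
  out
}

w <- c(1, 2, 3, 4)
sum(normalize_weights(w))                                  # 4
tapply(normalize_weights(w, d = c(1, 1, 0, 0)), c(1, 1, 0, 0), sum)  # 2 in each group
```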
Lasso and Post-Lasso
Description
plasso implicitly estimates a Lasso model using the glmnet package
and additionally estimates coefficient paths for a subsequent Post-Lasso model.
Usage
plasso(x, y, w = NULL, ...)
Arguments
x |
Matrix of covariates (number of observations times number of covariates matrix) |
y |
Vector of outcomes |
w |
Vector of weights |
... |
Pass glmnet options |
Value
List including base glmnet (i.e. Lasso) object and Post-Lasso coefficients.
call |
the call that produced this |
lasso_full |
base glmnet object |
beta_plasso |
matrix of coefficients for Post-Lasso model stored in sparse column format |
x |
Input matrix of covariates |
y |
Matrix of outcomes |
w |
Matrix of weights |
Examples
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# plot coefficient paths for Post-Lasso model
plot(p, lasso=FALSE, xvar="lambda")
# plot coefficient paths for Lasso model
plot(p, lasso=TRUE, xvar="lambda")
# get coefficients for specific lambda approximation
coef(p, s=0.05)
# predict fitted values along whole lambda sequence
pred = predict(p)
head(pred$plasso)
Plot of cross-validation curves
Description
Plot of cross-validation curves.
Usage
## S3 method for class 'cv.plasso'
plot(
x,
...,
legend_pos = c("bottomright", "bottom", "bottomleft", "left", "topleft", "top",
"topright", "right", "center"),
legend_size = 0.5,
lasso = FALSE
)
Arguments
x |
cv.plasso object |
... |
Pass generic plot options |
legend_pos |
Legend position. Only considered for joint plot (lasso=FALSE). |
legend_size |
Font size of legend |
lasso |
If set as TRUE, only the cross-validation curve for the Lasso model is plotted. Default is FALSE. |
Value
Plots the cross-validation curves for both Lasso and Post-Lasso models (incl. upper and lower standard deviation curves)
for a fitted cv.plasso object.
Examples
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# plot cross-validated MSE curves and number of active coefficients
plot(p.cv, legend_pos="bottomleft")
Plot coefficient paths
Description
Plot coefficient paths of (Post-) Lasso model.
Usage
## S3 method for class 'plasso'
plot(x, ..., lasso = FALSE, xvar = c("norm", "lambda", "dev"), label = FALSE)
Arguments
x |
plasso object |
... |
Pass generic plot options |
lasso |
If set as TRUE, coefficient paths for the Lasso instead of the Post-Lasso model are plotted. Default is FALSE. |
xvar |
X-axis variable: one of "norm", "lambda", or "dev". |
label |
If TRUE, label the curves with variable sequence numbers |
Value
Produces a coefficient profile plot of the coefficient paths for a fitted plasso object.
Examples
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# plot coefficient paths for Post-Lasso model
plot(p, lasso=FALSE, xvar="lambda")
# plot coefficient paths for Lasso model
plot(p, lasso=TRUE, xvar="lambda")
Predict after cross-validated (Post-) Lasso
Description
Prediction for cross-validated (Post-) Lasso.
Usage
## S3 method for class 'cv.plasso'
predict(
object,
...,
newx = NULL,
type = c("response", "coefficients"),
s = c("optimal", "all"),
se_rule = 0
)
Arguments
object |
Fitted cv.plasso model object |
... |
Pass generic predict options |
newx |
Matrix of new values for x at which predictions are to be made. If no value is supplied, x from fitting procedure is used. This argument is not used for type="coefficients". |
type |
Type of prediction required. "response" returns fitted values, "coefficients" returns beta estimates. |
s |
Determines whether prediction is done for all values of lambda ("all") or only for the optimal lambda ("optimal") according to the specified standard error-rule. |
se_rule |
If equal to 0, predictions from cross-validated MSE minimum (default). Negative values go in the direction of smaller models, positive values go in the direction of larger models (e.g. se_rule=-1 gives the 1-standard-error rule). |
Value
List object containing either fitted values or coefficients for both the Lasso and Post-Lasso models respectively.
lasso |
Matrix with Lasso predictions or coefficients |
plasso |
Matrix with Post-Lasso predictions or coefficients |
Examples
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# predict fitted values along whole lambda sequence
pred = predict(p.cv, s="all")
head(pred$plasso)
# predict fitted values for optimal lambda value (according to cross-validation)
pred_optimal = predict(p.cv, s="optimal")
head(pred_optimal$plasso)
# predict fitted values for new feature set X
X_new = head(X, 10)
pred_new = predict(p.cv, newx=X_new, s="optimal")
pred_new$plasso
# get estimated coefficients along whole lambda sequence
coefs = predict(p.cv, type="coefficients", s="all")
head(coefs$plasso)
# get estimated coefficients for optimal lambda value according to 1-standard-error rule
predict(p.cv, type="coefficients", s="optimal", se_rule=-1)
Predict for (Post-) Lasso models
Description
Prediction for (Post-) Lasso models.
Usage
## S3 method for class 'plasso'
predict(
object,
...,
newx = NULL,
type = c("response", "coefficients"),
s = NULL
)
Arguments
object |
Fitted plasso model object |
... |
Pass generic predict options |
newx |
Matrix of new values for x at which predictions are to be made. If no value is supplied, x from fitting procedure is used. This argument is not used for type="coefficients". |
type |
Type of prediction required. "response" returns fitted values, "coefficients" returns beta estimates. |
s |
If NULL, prediction is done for all lambda values. If a value is provided, the closest lambda value of the plasso object is used. |
Value
List object containing either fitted values or coefficients for both the Lasso and Post-Lasso models associated with all values along the lambda input sequence or for one specifically given lambda value.
lasso |
Matrix with Lasso predictions or coefficients |
plasso |
Matrix with Post-Lasso predictions or coefficients |
Examples
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# predict fitted values along whole lambda sequence
pred = predict(p)
head(pred$plasso)
# get estimated coefficients for specific lambda approximation
predict(p, type="coefficients", s=0.05)
Print cross-validated (Post-) Lasso model
Description
Printing main insights from cross-validated (Post-) Lasso model.
Usage
## S3 method for class 'cv.plasso'
print(x, ..., digits = max(3, getOption("digits") - 3))
Arguments
x |
cv.plasso object |
... |
Pass generic print options |
digits |
Integer, used for number formatting |
Value
Prints basic statistics for different lambda values of a fitted cv.plasso object,
i.e. cross-validated MSEs for both Lasso and Post-Lasso models as well as the number of active variables.
Print (Post-) Lasso model
Description
Printing main insights from (Post-) Lasso model.
Usage
## S3 method for class 'plasso'
print(x, ..., digits = max(3, getOption("digits") - 3))
Arguments
x |
plasso object |
... |
Pass generic print options |
digits |
Integer, used for number formatting |
Value
Prints glmnet-like output.
Print summary of (Post-) Lasso model
Description
Prints summary information of a cv.plasso object.
Usage
## S3 method for class 'summary.cv.plasso'
print(x, ..., digits = max(3L, getOption("digits") - 3L))
Arguments
x |
Summary of plasso object (either of class summary.cv.plasso or summaryDefault) |
... |
Pass generic R print options |
digits |
Integer, used for number formatting |
Value
Prints information from summary.cv.plasso object into console.
Summary of cross-validated (Post-) Lasso model
Description
Summary of cross-validated (Post-) Lasso model.
Usage
## S3 method for class 'cv.plasso'
summary(object, ..., default = FALSE)
Arguments
object |
cv.plasso object |
... |
Pass generic summary options |
default |
TRUE for the standard default summary, FALSE for more specific summary information |
Value
For specific summary information: summary.cv.plasso object (using list structure) containing optimal
lambda values and associated MSEs for both cross-validated Lasso and Post-Lasso model.
For default: summaryDefault object.
Examples
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# get informative summary statistics
print(summary(p.cv, default=FALSE))
# set default=TRUE for standard summary statistics
print(summary(p.cv, default=TRUE))
Summary of (Post-) Lasso model
Description
Summary of (Post-) Lasso model.
Usage
## S3 method for class 'plasso'
summary(object, ...)
Arguments
object |
plasso object |
... |
Pass generic summary options |
Value
Default summary object.
Simulated 'Toeplitz' Data
Description
Simulated data from a DGP with an underlying causal relationship between
covariates X and the target y.
The covariates matrix X consists of 10 variables whose effect size on target
y is defined by the vector
c(1, -0.83, 0.67, -0.5, 0.33, -0.17, 0, ..., 0)
with the first six effect sizes decreasing in absolute terms continuously
from 1 to 0 and alternating in their sign.
The true causal effect of all other covariates is 0.
The variables in X follow a normal distribution with mean zero while the
covariance matrix follows a Toeplitz matrix.
The target y is then a linear transformation of X plus a vector of standard
normal random variables (i.e. error term).
(See vignette for more details.)
Usage
data(toeplitz)
Format
An object of class standardGeneric of length 1.
Examples
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
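For intuition, a data-generating process of the kind described above can be simulated in base R as sketched below. The sample size, the seed, and the correlation decay of 0.7 are assumptions made for this illustration and are not the values used to create the shipped data set (see the vignette for those).

```r
set.seed(42)
n <- 100                                   # assumed sample size
p <- 10                                    # number of covariates, as described above
beta <- c(1, -0.83, 0.67, -0.5, 0.33, -0.17, rep(0, p - 6))

# Toeplitz covariance: entry (j, k) depends only on |j - k|; decay rate 0.7 is assumed.
# The namespaced call avoids any ambiguity with the data set of the same name.
Sigma <- stats::toeplitz(0.7^(0:(p - 1)))

# Draw mean-zero normal covariates with covariance Sigma via the Cholesky factor
X_sim <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
colnames(X_sim) <- paste0("X", 1:p)

# Outcome: linear in X plus standard normal noise
y_sim <- drop(X_sim %*% beta) + rnorm(n)
```

Only the first six covariates carry a non-zero effect, so a well-tuned Lasso or Post-Lasso fit would be expected to select mainly from those.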