Help for package plasso

Type:

Package

Title:

Cross-Validated Post-Lasso

Version:

0.1.3

Description:

Provides tools for cross-validated Lasso and Post-Lasso estimation. Built on top of the 'glmnet' package by Friedman, Hastie and Tibshirani (2010) <doi:10.18637/jss.v033.i01>, the main function plasso() extends the standard 'glmnet' output with coefficient paths for Post-Lasso models, while cv.plasso() performs cross-validation for both Lasso and Post-Lasso models and offers different ways to select the penalty parameter lambda, as discussed in Knaus (2021) <doi:10.1111/rssa.12623>.

License:

GPL-3

VignetteBuilder:

knitr

Encoding:

UTF-8

URL:

https://github.com/MCKnaus/plasso

BugReports:

https://github.com/MCKnaus/plasso/issues

LazyData:

true

Imports:

glmnet, Matrix, methods, parallel, doParallel, foreach, iterators

RoxygenNote:

7.3.3

Suggests:

testthat (≥ 3.0.0), knitr, rmarkdown, xfun

Config/testthat/edition:

NeedsCompilation:

Packaged:

2025-10-28 09:26:22 UTC; Michael

Author:

Michael C. Knaus [aut, cre], Stefan Glaisner [aut]

Maintainer:

Michael C. Knaus <[email protected]>

Repository:

CRAN

Date/Publication:

2025-10-31 18:20:02 UTC

Core part for (Post-) Lasso cross-validation

Description

CV_core contains the core parts of the cross-validation for Lasso and Post-Lasso.

Usage

CV_core(x, y, w, cvgroup, list, i, lambda, ...)

Arguments

x

Covariate matrix to be used in cross-validation

y

Vector of outcomes

w

Vector of weights

cvgroup

Categorical with k groups to identify folds

list

List 1:k

i

Index of the fold that is used for prediction

lambda

Series of lambdas used

...

Pass glmnet options

Value

MSE_lasso / MSE_plasso: mean squared errors for each lambda.
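
The role of cvgroup and i can be illustrated with a minimal base R sketch of one cross-validation step. This is not the package's code: plain weighted least squares stands in for the Lasso and Post-Lasso fits, all object names are hypothetical, and whether CV_core weights the squared errors in this way is an assumption.

```r
set.seed(1)
n <- 60
k <- 3
x <- cbind(1, rnorm(n))                      # intercept plus one covariate
y <- 1 + 2 * x[, 2] + rnorm(n)
w <- rep(1, n)                               # unit weights
cvgroup <- sample(rep(1:k, length.out = n))  # fold labels 1..k
i <- 2                                       # fold held out for prediction
train <- cvgroup != i
# Stand-in for the Lasso / Post-Lasso fit: weighted least squares.
fit <- lm.wfit(x[train, ], y[train], w[train])
pred <- drop(x[!train, ] %*% fit$coefficients)
# Weighted out-of-sample mean squared error for fold i.
mse_i <- weighted.mean((y[!train] - pred)^2, w[!train])
mse_i
```

In the package this quantity would be computed for every lambda on the grid rather than for a single model.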

Adds an intercept to a matrix

Description

add_intercept adds an intercept to a matrix.

Usage

add_intercept(mat)

Arguments

mat

Any matrix (with column names).

Value

Matrix with intercept.
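
A helper of this kind might look roughly like the sketch below. It is an illustration only, not the package source; the function name add_intercept_sketch and the column label "(Intercept)" are assumptions.

```r
# Hypothetical re-implementation for illustration; not the plasso source.
add_intercept_sketch <- function(mat) {
  # Prepend a column of ones and label it like a model-matrix intercept.
  out <- cbind(1, mat)
  colnames(out) <- c("(Intercept)", colnames(mat))
  out
}

m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("x1", "x2")))
m_int <- add_intercept_sketch(m)
dim(m_int)
colnames(m_int)
```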

Extract coefficients from a `cv.plasso` object

Description

Extract coefficients for both Lasso and Post-Lasso from a cv.plasso object.

Usage

## S3 method for class 'cv.plasso'
coef(object, ..., s = c("optimal", "all"), se_rule = 0)

Arguments

object

cv.plasso object

...

Pass generic coef options

s

Determines whether coefficients are extracted for all values of lambda ("all") or only for the optimal lambda ("optimal") according to the specified standard error-rule.

se_rule

If equal to 0, coefficients at the cross-validated MSE minimum are returned (default). Negative values go in the direction of smaller models, positive values go in the direction of larger models (e.g. se_rule=-1 creates the standard 1SE rule). This argument is not used for s="all".

Value

List object containing coefficients for both the Lasso and Post-Lasso models respectively.

lasso

Sparse dgCMatrix with Lasso coefficients

plasso

Sparse dgCMatrix with Post-Lasso coefficients

Examples


# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# get estimated coefficients along whole lambda sequence
coefs = coef(p.cv, s="all")
head(coefs$plasso)
# get estimated coefficients for optimal lambda value according to 1-standard-error rule
coef(p.cv, s="optimal", se_rule=-1)

Extract coefficients from a `plasso` object

Description

Extract coefficients for both Lasso and Post-Lasso from a plasso object.

Usage

## S3 method for class 'plasso'
coef(object, ..., s = NULL)

Arguments

object

plasso object

...

Pass generic coef options

s

If NULL, coefficients are returned for all lambda values. If a value is provided, the closest lambda value of the plasso object is used.

Value

List object containing coefficients that are associated with either all values along the lambda input sequence or for one specifically given lambda value for both the Lasso and Post-Lasso models respectively.

lasso

Sparse dgCMatrix-class object with Lasso coefficients

plasso

Sparse dgCMatrix-class object with Post-Lasso coefficients

Examples


# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# get estimated coefficients along whole lambda sequence 
coefs = coef(p)
head(coefs$plasso)
# get estimated coefficients for specific lambda approximation
coef(p, s=0.05)
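
The matching of s to the closest lambda on the grid can be pictured with the following base R sketch. The grid values are made up, and measuring closeness by absolute difference is an assumption about how the package resolves s.

```r
# Hypothetical illustration of nearest-lambda matching.
lambda_grid <- c(1, 0.5, 0.1, 0.04, 0.01)  # decreasing, as glmnet produces
s <- 0.05
# Index of the grid value with the smallest absolute distance to s.
ind <- which.min(abs(lambda_grid - s))
ind               # 4
lambda_grid[ind]  # 0.04
```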

Cross-Validated Lasso and Post-Lasso

Description

cv.plasso uses the glmnet package to estimate the coefficient paths and cross-validates both least squares Lasso and Post-Lasso.

Usage

cv.plasso(x, y, w = NULL, kf = 10, parallel = FALSE, ...)

Arguments

x

Matrix of covariates (number of observations times number of covariates matrix)

y

Vector of outcomes

w

Vector of weights

kf

Number of folds in k-fold cross-validation

parallel

Set as TRUE for parallelized cross-validation. Default is FALSE.

...

Pass glmnet options

Value

cv.plasso object (using a list structure) including the base glmnet object and cross-validation results (incl. optimal lambda values) for both the Lasso and Post-Lasso models.

call

the call that produced this

lasso_full

base glmnet object

kf

number of folds in k-fold cross-validation

cv_MSE_lasso

cross-validated MSEs of Lasso model (for every iteration of k-fold cross-validation)

cv_MSE_plasso

cross-validated MSEs of Post-Lasso model (for every iteration of k-fold cross-validation)

mean_MSE_lasso

averaged cross-validated MSEs of Lasso model

mean_MSE_plasso

averaged cross-validated MSEs of Post-Lasso model

ind_min_l

index of MSE optimal lambda value for Lasso model

ind_min_pl

index of MSE optimal lambda value for Post-Lasso model

lambda_min_l

MSE optimal lambda value for Lasso model

lambda_min_pl

MSE optimal lambda value for Post-Lasso model

names_l

Names of active variables for MSE optimal Lasso model

names_pl

Names of active variables for MSE optimal Post-Lasso model

coef_min_l

Coefficients for MSE optimal Lasso model

coef_min_pl

Coefficients for MSE optimal Post-Lasso model

x

Input matrix of covariates

y

Matrix of outcomes

w

Matrix of weights

Examples

# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# get basic summary statistics
print(summary(p.cv, default=FALSE))
# plot cross-validated MSE curves and number of active coefficients
plot(p.cv, legend_pos="bottomleft")
# get coefficients at MSE optimal lambda value for both Lasso and Post-Lasso model
coef(p.cv)
# get coefficients at MSE optimal lambda value according to 1-standard-error rule
coef(p.cv, se_rule=-1)
# predict fitted values along whole lambda sequence 
pred = predict(p.cv, s="all")
head(pred$plasso)

Helper function to find the position for prespecified SE rules

Description

find_Xse_ind is a helper function that finds the position for prespecified SE rules.

Usage

find_Xse_ind(CV, ind_min, oneSE, factor)

Arguments

CV

Vector of cross-validated criterion

ind_min

Index of cross-validated minimum

oneSE

Vector that contains the standard errors of the cross-validated criterion for the whole grid

factor

Factor in which direction to go: Negative values favor smaller models, positive values favor larger models

Value

Index on the Lambda grid.
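
One way such a search could work is sketched below. This is not the package's implementation: the function name is hypothetical, and the sketch assumes that the grid is ordered from the largest to the smallest lambda (as in glmnet), so that lower indices correspond to smaller models.

```r
# Hypothetical sketch of an SE-rule index search; not the plasso source.
find_se_index_sketch <- function(CV, ind_min, oneSE, factor) {
  if (factor == 0) return(ind_min)
  # Tolerated criterion level: minimum plus |factor| standard errors.
  threshold <- CV[ind_min] + abs(factor) * oneSE[ind_min]
  # Negative factor: walk toward index 1 (larger lambda, smaller model);
  # positive factor: walk toward the end of the grid (larger model).
  path <- if (factor < 0) ind_min:1 else ind_min:length(CV)
  ind <- ind_min
  for (i in path) {
    if (CV[i] > threshold) break
    ind <- i
  }
  ind
}

cv <- c(5.0, 3.0, 2.2, 2.0, 2.1, 2.6)
se <- rep(0.3, 6)
find_se_index_sketch(cv, ind_min = 4, oneSE = se, factor = 0)   # 4
find_se_index_sketch(cv, ind_min = 4, oneSE = se, factor = -1)  # 3
find_se_index_sketch(cv, ind_min = 4, oneSE = se, factor = 1)   # 5
```

With factor = -1 the search returns the most parsimonious model whose criterion stays within one standard error of the minimum, which is the usual reading of the 1SE rule.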

plasso fitting

Description

fit_betas estimates an OLS model using only the active variables (those selected by the Lasso).

Usage

fit_betas(x, y, w, nm_act, coef_lasso)

Arguments

x

Matrix of covariates (number of observations times number of covariates matrix)

y

Vector of outcomes

w

Vector of weights

nm_act

Vector of active variables

coef_lasso

Vector of lasso coefficients

Value

Beta estimates.
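
The Post-Lasso idea behind this step is a (weighted) least squares refit on the selected variables. The sketch below illustrates that idea in base R; it is not the package source, the function name is hypothetical, and adding the intercept inside the function is an assumption.

```r
# Hypothetical sketch of the Post-Lasso refit; not the plasso source.
post_lasso_refit_sketch <- function(x, y, w, nm_act) {
  # Keep only the columns selected by the Lasso and add an intercept.
  x_act <- cbind("(Intercept)" = 1, x[, nm_act, drop = FALSE])
  # Weighted least squares via the normal equations X'WX b = X'Wy.
  XtWX <- crossprod(x_act * w, x_act)
  XtWy <- crossprod(x_act * w, y)
  drop(solve(XtWX, XtWy))
}

set.seed(1)
x <- matrix(rnorm(150), 50, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- 1 + 2 * x[, "x1"] - x[, "x3"] + rnorm(50)
w <- runif(50, 0.5, 1.5)
b <- post_lasso_refit_sketch(x, y, w, nm_act = c("x1", "x3"))
b
```

The result should agree with lm(y ~ x[, c("x1", "x3")], weights = w) up to numerical precision.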

Fitted values for a subset of active variables

Description

fitted_values_cv extracts the active set from X^TX and X^Ty to get out-of-sample predictions for a matrix already containing only the active variables. The function is only relevant for cases where at least one variable is selected.

Usage

fitted_values_cv(XtX_all, Xty_all, x_pred, nm_act)

Arguments

XtX_all

Cross product of all covariates

Xty_all

Cross product of covariates and outcome

x_pred

Covariates matrix of the prediction sample

nm_act

Names of active variables

Value

Fitted values in the prediction sample.
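
Working from precomputed cross products means the coefficients for any active set can be obtained by subsetting X^TX and X^Ty, without touching the training data again. A minimal base R sketch of that idea follows; the function name is hypothetical, and it assumes that the cross products carry the variable names as dimnames and that x_pred has columns named as in nm_act.

```r
# Hypothetical sketch; not the plasso source.
fitted_values_sketch <- function(XtX_all, Xty_all, x_pred, nm_act) {
  # Solve the normal equations restricted to the active set ...
  b <- solve(XtX_all[nm_act, nm_act, drop = FALSE],
             Xty_all[nm_act, , drop = FALSE])
  # ... and apply the coefficients to the prediction sample.
  drop(x_pred[, nm_act, drop = FALSE] %*% b)
}

set.seed(1)
x <- cbind("(Intercept)" = 1,
           matrix(rnorm(200), 50, 4, dimnames = list(NULL, paste0("x", 1:4))))
y <- drop(x %*% c(1, 2, 0, -1, 0)) + rnorm(50)
train <- 1:40
test <- 41:50
XtX <- crossprod(x[train, ])            # cross product of all covariates
Xty <- crossprod(x[train, ], y[train])  # cross product with the outcome
nm_act <- c("(Intercept)", "x1", "x3")
pred <- fitted_values_sketch(XtX, Xty, x[test, nm_act], nm_act)
head(pred)
```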

Sanitizes potential sample weights

Description

handle_weights cleans potential sample weights or codes them as ones if they are not specified.

Usage

handle_weights(w, n)

Arguments

w

Vector or n x 1 matrix of weights, or NULL if no weights are provided

n

Number of observations

Value

Vector of weights.
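
The described behaviour could be sketched as follows. This is an illustration under assumptions, not the package source: the function name is hypothetical, and any further cleaning the real helper performs (for example normalization) is not shown.

```r
# Hypothetical sketch; the actual plasso implementation may differ.
handle_weights_sketch <- function(w, n) {
  if (is.null(w)) {
    # No weights supplied: every observation gets weight one.
    return(rep(1, n))
  }
  # Flatten an n x 1 matrix to a plain numeric vector and check its length.
  w <- as.numeric(w)
  stopifnot(length(w) == n)
  w
}

w_default <- handle_weights_sketch(NULL, 3)
w_matrix <- handle_weights_sketch(matrix(c(2, 1, 3), ncol = 1), 3)
w_default  # 1 1 1
w_matrix   # 2 1 3
```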

Normalization of sample weights for potential sample weights

Description

norm_w_to_n normalizes weights either to N or to N in treated and controls separately.

Usage

norm_w_to_n(w, d = NULL)

Arguments

w

Vector or n x 1 matrix of weights that should be normalized

d

Vector of treatment indicators

Value

Normalized weights.
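
A minimal sketch of such a normalization is given below. It is not the package source; the function name is hypothetical, and it assumes that "normalizing to N" means rescaling the weights so that they sum to the number of observations, either overall or within each treatment group.

```r
# Hypothetical sketch; not the plasso source.
norm_w_to_n_sketch <- function(w, d = NULL) {
  w <- as.numeric(w)
  if (is.null(d)) {
    # Rescale so that the weights sum to the number of observations N.
    return(w / sum(w) * length(w))
  }
  # Rescale separately within each treatment group, so that each
  # group's weights sum to that group's size.
  for (g in unique(d)) {
    idx <- d == g
    w[idx] <- w[idx] / sum(w[idx]) * sum(idx)
  }
  w
}

w <- c(1, 2, 3, 4)
d <- c(1, 1, 0, 0)
w_all <- norm_w_to_n_sketch(w)
w_grp <- norm_w_to_n_sketch(w, d)
sum(w_all)            # sums to N = 4
tapply(w_grp, d, sum) # each group sums to its own size, 2
```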

Lasso and Post-Lasso

Description

plasso implicitly estimates a Lasso model using the glmnet package and additionally estimates coefficient paths for a subsequent Post-Lasso model.

Usage

plasso(x, y, w = NULL, ...)

Arguments

x

Matrix of covariates (number of observations times number of covariates matrix)

y

Vector of outcomes

w

Vector of weights

...

Pass glmnet options

Value

List including base glmnet (i.e. Lasso) object and Post-Lasso coefficients.

call

the call that produced this

lasso_full

base glmnet object

beta_plasso

matrix of coefficients for Post-Lasso model stored in sparse column format

x

Input matrix of covariates

y

Matrix of outcomes

w

Matrix of weights

Examples

# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# plot coefficient paths for Post-Lasso model
plot(p, lasso=FALSE, xvar="lambda")
# plot coefficient paths for Lasso model
plot(p, lasso=TRUE, xvar="lambda")
# get coefficients for specific lambda approximation
coef(p, s=0.05)
# predict fitted values along whole lambda sequence 
pred = predict(p)
head(pred$plasso)

Plot of cross-validation curves

Description

Plot of cross-validation curves.

Usage

## S3 method for class 'cv.plasso'
plot(
  x,
  ...,
  legend_pos = c("bottomright", "bottom", "bottomleft", "left", "topleft", "top",
    "topright", "right", "center"),
  legend_size = 0.5,
  lasso = FALSE
)

Arguments

x

cv.plasso object

...

Pass generic plot options

legend_pos

Legend position. Only considered for joint plot (lasso=FALSE).

legend_size

Font size of legend

lasso

If set as TRUE, only the cross-validation curve for the Lasso model is plotted. Default is FALSE.

Value

Plots the cross-validation curves for both Lasso and Post-Lasso models (incl. upper and lower standard deviation curves) for a fitted cv.plasso object.

Examples

# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# plot cross-validated MSE curves and number of active coefficients
plot(p.cv, legend_pos="bottomleft")

Plot coefficient paths

Description

Plot coefficient paths of (Post-) Lasso model.

Usage

## S3 method for class 'plasso'
plot(x, ..., lasso = FALSE, xvar = c("norm", "lambda", "dev"), label = FALSE)

Arguments

x

plasso object

...

Pass generic plot options

lasso

If set as TRUE, coefficient paths for the Lasso instead of the Post-Lasso model are plotted. Default is FALSE.

xvar

X-axis variable: norm plots against the L1-norm of the coefficients, lambda against the log-lambda sequence, and dev against the percent deviance explained.

label

If TRUE, label the curves with variable sequence numbers

Value

Produces a coefficient profile plot of the coefficient paths for a fitted plasso object.

Examples

# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# plot coefficient paths for Post-Lasso model
plot(p, lasso=FALSE, xvar="lambda")
# plot coefficient paths for Lasso model
plot(p, lasso=TRUE, xvar="lambda")

Predict after cross-validated (Post-) Lasso

Description

Prediction for cross-validated (Post-) Lasso.

Usage

## S3 method for class 'cv.plasso'
predict(
  object,
  ...,
  newx = NULL,
  type = c("response", "coefficients"),
  s = c("optimal", "all"),
  se_rule = 0
)

Arguments

object

Fitted cv.plasso model object

...

Pass generic predict options

newx

Matrix of new values for x at which predictions are to be made. If no value is supplied, x from fitting procedure is used. This argument is not used for type="coefficients".

type

Type of prediction required. "response" returns fitted values, "coefficients" returns beta estimates.

s

Determines whether prediction is done for all values of lambda ("all") or only for the optimal lambda ("optimal") according to the standard error-rule.

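se_rule

If equal to 0, predictions are made at the cross-validated MSE minimum (default). Negative values go in the direction of smaller models, positive values go in the direction of larger models (e.g. se_rule=-1 creates the standard 1SE rule). This argument is not used for s="all".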

Value

List object containing either fitted values or coefficients for both the Lasso and Post-Lasso models respectively.

lasso

Matrix with Lasso predictions or coefficients

plasso

Matrix with Post-Lasso predictions or coefficients

Examples


# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# predict fitted values along whole lambda sequence 
pred = predict(p.cv, s="all")
head(pred$plasso)
# predict fitted values for optimal lambda value (according to cross-validation) 
pred_optimal = predict(p.cv, s="optimal")
head(pred_optimal$plasso)
# predict fitted values for new feature set X
X_new = head(X, 10)
pred_new = predict(p.cv, newx=X_new, s="optimal")
pred_new$plasso
# get estimated coefficients along whole lambda sequence
coefs = predict(p.cv, type="coefficients", s="all")
head(coefs$plasso)
# get estimated coefficients for optimal lambda value according to 1-standard-error rule
predict(p.cv, type="coefficients", s="optimal", se_rule=-1)

Predict for (Post-) Lasso models

Description

Prediction for (Post-) Lasso models.

Usage

## S3 method for class 'plasso'
predict(
  object,
  ...,
  newx = NULL,
  type = c("response", "coefficients"),
  s = NULL
)

Arguments

object

Fitted plasso model object

...

Pass generic predict options

newx

Matrix of new values for x at which predictions are to be made. If no value is supplied, x from fitting procedure is used. This argument is not used for type="coefficients".

type

Type of prediction required. "response" returns fitted values, "coefficients" returns beta estimates.

s

If NULL, prediction is done for all lambda values. If a value is provided, the closest lambda value of the plasso object is used.

Value

List object containing either fitted values or coefficients for both the Lasso and Post-Lasso models associated with all values along the lambda input sequence or for one specifically given lambda value.

lasso

Matrix with Lasso predictions or coefficients

plasso

Matrix with Post-Lasso predictions or coefficients

Examples


# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# predict fitted values along whole lambda sequence 
pred = predict(p)
head(pred$plasso)
# get estimated coefficients for specific lambda approximation
predict(p, type="coefficients", s=0.05)

Print cross-validated (Post-) Lasso model

Description

Printing main insights from cross-validated (Post-) Lasso model.

Usage

## S3 method for class 'cv.plasso'
print(x, ..., digits = max(3, getOption("digits") - 3))

Arguments

x

cv.plasso object

...

Pass generic print options

digits

Integer, used for number formatting

Value

Prints basic statistics for different lambda values of a fitted cv.plasso object, i.e. cross-validated MSEs for both the Lasso and Post-Lasso models as well as the number of active variables.

Print (Post-) Lasso model

Description

Printing main insights from (Post-) Lasso model.

Usage

## S3 method for class 'plasso'
print(x, ..., digits = max(3, getOption("digits") - 3))

Arguments

x

plasso object

...

Pass generic print options

digits

Integer, used for number formatting

Value

Prints glmnet-like output.

Print summary of (Post-) Lasso model

Description

Prints summary information of cv.plasso object

Usage

## S3 method for class 'summary.cv.plasso'
print(x, ..., digits = max(3L, getOption("digits") - 3L))

Arguments

x

Summary of plasso object (either of class summary.cv.plasso or summary)

...

Pass generic R print options

digits

Integer, used for number formatting

Value

Prints information from summary.cv.plasso object into console.

Summary of cross-validated (Post-) Lasso model

Description

Summary of cross-validated (Post-) Lasso model.

Usage

## S3 method for class 'cv.plasso'
summary(object, ..., default = FALSE)

Arguments

object

cv.plasso object

...

Pass generic summary options

default

TRUE for glmnet-like summary output, FALSE for more specific summary information

Value

For specific summary information: summary.cv.plasso object (using list structure) containing optimal lambda values and associated MSEs for both cross-validated Lasso and Post-Lasso model. For default: summaryDefault object.

Examples

# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# get informative summary statistics
print(summary(p.cv, default=FALSE))
# set default=TRUE for standard summary statistics
print(summary(p.cv, default=TRUE))

Summary of (Post-) Lasso model

Description

Summary of (Post-) Lasso model.

Usage

## S3 method for class 'plasso'
summary(object, ...)

Arguments

object

plasso object

...

Pass generic summary options

Value

Default summary object

Simulated 'Toeplitz' Data

Description

Simulated data from a DGP with an underlying causal relationship between covariates X and the target y. The covariates matrix X consists of 10 variables whose effect size on target y is defined by the vector c(1, -0.83, 0.67, -0.5, 0.33, -0.17, 0, ..., 0) with the first six effect sizes decreasing in absolute terms continuously from 1 to 0 and alternating in their sign. The true causal effect of all other covariates is 0. The variables in X follow a normal distribution with mean zero while the covariance matrix follows a Toeplitz matrix. The target y is then a linear transformation of X plus a vector of standard normal random variables (i.e. error term). (See vignette for more details.)
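
A data-generating process of this kind could be simulated along the following lines. This is a sketch, not the code used to create the shipped data set: the sample size n, the correlation parameter rho and the covariance entries rho^|i-j| are assumptions, and only the effect sizes and the number of covariates are taken from the description.

```r
set.seed(42)
n <- 100  # sample size (assumption)
p <- 10   # number of covariates, as in the description
# Effect sizes as given in the description: six non-zero, the rest zero.
beta <- c(1, -0.83, 0.67, -0.5, 0.33, -0.17, rep(0, p - 6))
# Toeplitz covariance with entries rho^|i-j|; rho = 0.7 is an assumption.
rho <- 0.7
Sigma <- stats::toeplitz(rho^(0:(p - 1)))
# Draw X ~ N(0, Sigma): chol() returns upper-triangular R with Sigma = R'R.
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
colnames(X) <- paste0("X", 1:p)
# Outcome: linear in X plus standard normal noise.
y <- drop(X %*% beta) + rnorm(n)
```

stats::toeplitz is called with its namespace because loading the data set creates an object that is also named toeplitz.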

Usage

data(toeplitz)

Format

An object of class standardGeneric of length 1.

Examples

# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
