One of the simplest models for high-dimensional data in the regression setting is the ubiquitous high-dimensional linear model,
$$Y = X\beta + \sigma\varepsilon.$$
Here $\beta \in \mathbb{R}^p$ is sparse and $\varepsilon \sim \mathcal{N}_n(0, I)$. Whilst methods are readily available for estimating and performing inference concerning the unknown vector of regression coefficients, the problem of checking whether the high-dimensional linear model actually holds has received little attention.
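In the low-dimensional setting, checks for the goodness of fit of a linear model are typically based on various plots involving the residuals. Writing $P$ for the orthogonal projection on to the column space of the design matrix $X$, we have that the scaled residuals $(I - P)Y / \|(I - P)Y\|_2$ do not depend on any unknown parameters: under the linear model they equal $(I - P)\varepsilon / \|(I - P)\varepsilon\|_2$, since $(I - P)X\beta = 0$ and the noise level $\sigma$ cancels. This fact allows for easy interpretation of these plots. The property can however also be exploited algorithmically.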
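The invariance of the scaled residuals can be checked numerically. The following is a minimal sketch using NumPy, assuming a low-dimensional Gaussian design; the helper name `scaled_residuals` and the particular coefficient vectors and noise levels are illustrative choices, not taken from the paper.

```python
import numpy as np

def scaled_residuals(X, y):
    """Return (I - P)y / ||(I - P)y||_2, where P projects on to col(X)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid / np.linalg.norm(resid)

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
eps = rng.standard_normal(n)

# The same noise draw combined with two very different (beta, sigma) pairs.
y1 = X @ np.ones(p) + 1.0 * eps
y2 = X @ np.array([10.0, -3.0, 0.0, 7.0, 2.0]) + 25.0 * eps

# The scaled residuals agree up to floating-point error.
same = np.allclose(scaled_residuals(X, y1), scaled_residuals(X, y2))
print(same)
```

Because the scaled residuals are a function of the noise direction alone, their null distribution can be simulated without knowing the coefficients or the noise level.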
If $\mathbb{E}(Y)$ is not a linear combination of the columns of $X$, then the scaled residuals will contain some signal. The residual sum of squares (RSS) from a nonlinear regression (e.g. random forest) of the scaled residuals on to $X$ should be smaller, on average, than if we were fitting to pure noise. Taking this RSS as our test statistic, we can easily simulate from its null distribution and thereby obtain a (finite sample exact) $p$-value. This is the basic idea of Residual Prediction (RP) tests introduced in our paper: scaled residuals from a linear regression are predicted using a further regression procedure (an RP method), and some proxy for its prediction error (e.g. RSS) is computed to give the final test statistic. By converting goodness of fit to a prediction problem, we can leverage the predictive power of the variety of machine learning methods available to detect the presence of nonlinearities.
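The low-dimensional procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: in place of a random forest, the RP method here is least squares on squares and pairwise products of the predictors (a simple stand-in that needs only NumPy), and the function names `rp_statistic` and `rp_test` are hypothetical.

```python
import numpy as np

def scaled_residuals(X, y):
    """Return (I - P)y / ||(I - P)y||_2, where P projects on to col(X)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid / np.linalg.norm(resid)

def rp_statistic(X, r):
    """RSS from regressing r on to an intercept plus all squares and
    pairwise products of the columns of X (stand-in RP method)."""
    n, p = X.shape
    feats = [X[:, j] * X[:, k] for j in range(p) for k in range(j, p)]
    Z = np.column_stack([np.ones(n)] + feats)
    coef, *_ = np.linalg.lstsq(Z, r, rcond=None)
    return float(np.sum((r - Z @ coef) ** 2))

def rp_test(X, y, n_sim=199, seed=0):
    """Monte Carlo p-value: under the null the scaled residuals have the
    same distribution as those computed from pure Gaussian noise."""
    rng = np.random.default_rng(seed)
    observed = rp_statistic(X, scaled_residuals(X, y))
    null = np.array([
        rp_statistic(X, scaled_residuals(X, rng.standard_normal(len(y))))
        for _ in range(n_sim)
    ])
    # A small RSS is evidence against the linear model.
    return (1 + np.sum(null <= observed)) / (1 + n_sim)

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.5])
noise = rng.standard_normal(n)

p_null = rp_test(X, X @ beta + noise)                           # linear model holds
p_alt = rp_test(X, X @ beta + 2.0 * X[:, 0] * X[:, 1] + noise)  # interaction present
print(p_null, p_alt)
```

Any regression procedure can be substituted for `rp_statistic`, provided the same procedure is applied to the observed and the simulated scaled residuals.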
Tests for a variety of different departures from the linear model can be constructed in this framework, including tests that assess the significance of groups of predictors. For these, we take $X$ to be a subset of all available predictors $X_{\mathrm{all}}$, and the scaled residuals are regressed on to $X_{\mathrm{all}}$ rather than just $X$. For example, when $X_{\mathrm{all}}$ is moderate or high-dimensional, one can use the Lasso as the RP method. This is particularly powerful against alternatives where the signal includes a sparse linear combination of variables not present in $X$, but in fact tends to outperform the usual $F$-test in a wider variety of settings, including fully dense alternatives. Interestingly, using OLS as the RP method is exactly equivalent to performing the $F$-test. With this “two-stage regression” interpretation of the $F$-test, we can view the RP framework as a generalisation of the $F$-test that allows for more general regression procedures in the second stage.
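The equivalence between OLS as the RP method and the $F$-test can be verified in a few lines. The derivation below uses notation that is assumed here rather than taken from the text: $R$ denotes the scaled residuals, $P_{\mathrm{all}}$ the orthogonal projection on to the column space of $X_{\mathrm{all}}$, and $p$ and $p_{\mathrm{all}}$ the ranks of $X$ and $X_{\mathrm{all}}$, with $n > p_{\mathrm{all}} > p$.

```latex
% Assumed notation: R = (I - P)Y / \|(I - P)Y\|_2 and col(X) \subseteq col(X_all),
% so that P_all P = P and hence (I - P_all)(I - P) = I - P_all.
\begin{align*}
\mathrm{RSS}_{\mathrm{RP}}
  &= \|(I - P_{\mathrm{all}}) R\|_2^2
   = \frac{\|(I - P_{\mathrm{all}})(I - P) Y\|_2^2}{\|(I - P) Y\|_2^2}
   = \frac{\|(I - P_{\mathrm{all}}) Y\|_2^2}{\|(I - P) Y\|_2^2}, \\[4pt]
F &= \frac{\bigl(\|(I - P) Y\|_2^2 - \|(I - P_{\mathrm{all}}) Y\|_2^2\bigr) / (p_{\mathrm{all}} - p)}
          {\|(I - P_{\mathrm{all}}) Y\|_2^2 / (n - p_{\mathrm{all}})}
   = \frac{n - p_{\mathrm{all}}}{p_{\mathrm{all}} - p}
     \left( \frac{1}{\mathrm{RSS}_{\mathrm{RP}}} - 1 \right).
\end{align*}
```

Since $F$ is a strictly decreasing function of $\mathrm{RSS}_{\mathrm{RP}}$, rejecting for small values of the RSS is the same as rejecting for large values of $F$, and the two tests return the same $p$-value.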
To extend the idea to the high-dimensional setting we use the square-root Lasso as the initial regression procedure. Unfortunately, scaled square-root Lasso residuals do depend on the unknown parameter $\beta$, so simple Monte Carlo cannot be used to obtain a $p$-value. It turns out however that this dependence is largely through the signs of the true coefficients rather than through their magnitudes or the noise level (this can be formalised). This motivates a particular bootstrap scheme for calibration of the tests, which yields asymptotic type I error control regardless of the form of the RP method, under minimum signal strength and restricted eigenvalue conditions under the null. Whilst the scheme that achieves this is somewhat cumbersome, we show empirically that it is essentially equivalent to a simple parametric bootstrap approach which also retains type I error control.
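To make the calibration idea concrete, here is a rough sketch of one plausible parametric bootstrap; the paper's exact scheme may differ. Everything named below is an assumption for illustration: the square-root Lasso is approximated by alternating a coordinate-descent Lasso with a noise-level update (the scaled Lasso formulation), the tuning parameter is set to $\sqrt{2\log(p)/n}$, the RP method is OLS on a hypothetical group of extra predictors `Z`, and bootstrap responses are drawn from the fitted model.

```python
import numpy as np

def lasso_cd(X, y, lam, beta, n_sweeps=20):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1, warm-started at beta."""
    n, p = X.shape
    col_sq = np.sum(X ** 2, axis=0) / n
    beta = beta.copy()
    resid = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]               # partial residual without j
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta

def sqrt_lasso(X, y, lam, n_outer=5):
    """Approximate square-root Lasso by alternating a Lasso fit with
    penalty lam*sigma and the update sigma = ||y - Xb||_2 / sqrt(n)."""
    n, p = X.shape
    beta = np.zeros(p)
    sigma = np.linalg.norm(y) / np.sqrt(n)
    for _ in range(n_outer):
        beta = lasso_cd(X, y, lam * sigma, beta)
        sigma = np.linalg.norm(y - X @ beta) / np.sqrt(n)
    return beta, sigma

def rp_stat(Z, r):
    """RSS from an OLS regression of the scaled residuals r on to Z."""
    coef, *_ = np.linalg.lstsq(Z, r, rcond=None)
    return float(np.sum((r - Z @ coef) ** 2))

def bootstrap_rp_test(X, y, Z, lam, n_boot=29, seed=0):
    """Parametric bootstrap p-value: simulate responses from the fitted
    model and recompute the RP statistic on each simulated data set."""
    rng = np.random.default_rng(seed)
    beta_hat, sigma_hat = sqrt_lasso(X, y, lam)
    resid = y - X @ beta_hat
    observed = rp_stat(Z, resid / np.linalg.norm(resid))
    null = np.empty(n_boot)
    for b in range(n_boot):
        y_star = X @ beta_hat + sigma_hat * rng.standard_normal(len(y))
        beta_star, _ = sqrt_lasso(X, y_star, lam)
        r_star = y_star - X @ beta_star
        null[b] = rp_stat(Z, r_star / np.linalg.norm(r_star))
    return (1 + np.sum(null <= observed)) / (1 + n_boot)

rng = np.random.default_rng(1)
n, p = 40, 60
X = rng.standard_normal((n, p))
Z = rng.standard_normal((n, 2))                      # hypothetical extra predictors
beta = np.zeros(p)
beta[:2] = [2.0, -2.0]
y = X @ beta + 3.0 * Z[:, 0] + rng.standard_normal(n)  # Z carries signal
lam = np.sqrt(2 * np.log(p) / n)
p_value = bootstrap_rp_test(X, y, Z, lam)
print(p_value)
```

The small number of bootstrap draws and coordinate-descent sweeps keeps the sketch fast; a serious implementation would use a dedicated square-root Lasso solver and many more draws.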
We give examples of RP tests for the significance of groups and individual predictors, where the procedure is competitive with debiased Lasso approaches, and also develop tests for nonlinearity and heteroscedasticity. The R package RPtests implements these and also allows users to design their own RP tests to target their particular alternatives of interest.