Abstract
This paper considers covariate selection for the additive hazards
model. This model is particularly simple to study theoretically and its
practical implementation has several major advantages over the similar
methodology for the proportional hazards model. One complication
compared with the proportional model is, however, that there is no
simple likelihood to work with. We here study a least squares criterion
with desirable properties and show how this criterion can be
interpreted as a prediction error. Given this criterion, we define
ridge and Lasso estimators as well as an adaptive Lasso and study their
large sample properties for the situation where the number of
covariates p is smaller than the number of observations. We also show
that the adaptive Lasso has the oracle property. In many practical
situations, it is more relevant to tackle the case where p is large
relative to the number of observations. We do this by studying the
properties of the so-called Dantzig selector in the setting of the
additive risk model. Specifically, we establish a bound on how close
the solution is to a true sparse signal in the case where the number of
covariates is large. In a simulation study, we also compare the Dantzig
selector and the adaptive Lasso for a moderate to small number of covariates. The
methods are applied to a breast cancer data set with gene expression
recordings and to the primary biliary cirrhosis clinical data.
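
The abstract does not spell out the least squares criterion, so the following is only a minimal sketch under stated assumptions. It assumes the criterion takes the usual Lin-Ying-type quadratic form built from a matrix D and a vector d, with time-fixed covariates, no tied observation times, and the objective scaled as 0.5 * b'Db - b'd + lam * ||b||_1. The function names (`lin_ying_stats`, `lasso_additive`), the scaling, the coordinate descent solver, and the simulated data are all illustrative choices, not the authors' implementation.

```python
import numpy as np


def lin_ying_stats(time, event, Z):
    """Return (D, d) for a quadratic least squares criterion (assumed form).

    Assumes time-fixed covariates, no tied times, and the at-risk
    indicator Y_i(t) = 1{time_i >= t}.
    """
    order = np.argsort(time)
    t, delta, Zs = time[order], event[order], Z[order]
    n, p = Zs.shape
    D = np.zeros((p, p))
    d = np.zeros(p)
    prev = 0.0
    for k in range(n):
        # Subjects k..n-1 (in sorted order) are at risk on (prev, t[k]].
        Zc = Zs[k:] - Zs[k:].mean(axis=0)
        D += (t[k] - prev) * (Zc.T @ Zc)
        if delta[k]:
            # Centered covariate of the subject who has an event at t[k].
            d += Zc[0]
        prev = t[k]
    return D / n, d / n


def soft_threshold(x, a):
    """Shrink x towards zero by a, returning zero if |x| <= a."""
    return np.sign(x) * max(abs(x) - a, 0.0)


def lasso_additive(D, d, lam, n_iter=500, tol=1e-10):
    """Coordinate descent for 0.5 * b'Db - b'd + lam * ||b||_1."""
    p = len(d)
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(p):
            if D[j, j] <= 0:
                continue
            # Contribution of the other coordinates to the j-th gradient.
            r = D[j] @ beta - D[j, j] * beta[j]
            beta[j] = soft_threshold(d[j] - r, lam) / D[j, j]
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta


# Simulated data (hypothetical): hazard 1 + Z @ beta_true, random censoring.
rng = np.random.default_rng(0)
n, p = 400, 5
Z = rng.uniform(size=(n, p))
beta_true = np.array([1.0, 0.5, 0.0, 0.0, 0.0])
T = rng.exponential(1.0 / (1.0 + Z @ beta_true))
C = rng.exponential(2.0, size=n)
time = np.minimum(T, C)
event = T <= C

D, d = lin_ying_stats(time, event, Z)
beta_hat = lasso_additive(D, d, lam=0.01)
```

With `lam=0` the routine reduces to solving D b = d, and any `lam` at least as large as the largest absolute entry of d returns the zero vector, which gives a natural upper end for a penalty grid.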
Original language | English |
---|---|
Journal | Scandinavian Journal of Statistics |
Volume | 36 |
Issue number | 4 |
Pages (from-to) | 602-619 |
Number of pages | 18 |
ISSN | 0303-6898 |
Status | Published - 2009 |
Published externally | Yes |