TY - JOUR
T1 - Independent screening for single-index hazard rate models with ultrahigh dimensional features
AU - Gorst-Rasmussen, Anders
AU - Scheike, Thomas
PY - 2013/3/1
Y1 - 2013/3/1
AB - In data sets with many more features than observations, independent screening based on all univariate regression models leads to a computationally convenient variable selection method. Recent efforts have shown that, in the case of generalized linear models, independent screening may suffice to capture all relevant features with high probability, even in ultrahigh dimension. It is unclear whether this formal sure screening property is attainable when the response is a right-censored survival time. We propose a computationally very efficient independent screening method for survival data which can be viewed as the natural survival equivalent of correlation screening. We state conditions under which the method admits the sure screening property within a class of single-index hazard rate models with ultrahigh dimensional features and describe the generally detrimental effect of censoring on performance. An iterative variant of the method is also described which combines screening with penalized regression to handle more complex feature covariance structures. The methodology is evaluated through simulation studies and through application to a real gene expression data set.
KW - Additive hazards model
KW - Independent screening
KW - Survival data
KW - Ultrahigh dimension
KW - Variable selection
U2 - 10.1111/j.1467-9868.2012.01039.x
DO - 10.1111/j.1467-9868.2012.01039.x
M3 - Journal article
AN - SCOPUS:84873282464
SN - 1369-7412
VL - 75
SP - 217
EP - 245
JO - Journal of the Royal Statistical Society, Series B (Statistical Methodology)
JF - Journal of the Royal Statistical Society, Series B (Statistical Methodology)
IS - 2
ER -