Abstract
Random forest is a supervised learning method that combines many classification or regression trees for prediction. Here we describe an extension of the random forest method for building event risk prediction models in survival analysis with competing risks. With right-censored data, the event status at the prediction horizon is unknown for some subjects. We propose to replace the censored event status with a jackknife pseudo-value and then to apply an implementation of random forests for uncensored data. Because the pseudo-responses take values on a continuous scale, the node variance is chosen as the split criterion for growing regression trees. In a simulation study, the pseudo split criterion is compared with the Gini split criterion when the latter is applied to the uncensored event status. To investigate the resulting pseudo random forest method for building risk prediction models, we compare its predictive performance with that of Cox regression and random survival forest in a simulation study. The method is further illustrated with two real data sets.
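
The pseudo-value step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the marginal cumulative incidence at the prediction horizon is estimated with the Aalen-Johansen estimator (the abstract does not name the estimator), and the function names `cumulative_incidence` and `jackknife_pseudo_values` are hypothetical. Status `0` is taken to mean censored, and positive integers denote competing causes.

```python
def cumulative_incidence(times, status, horizon, cause=1):
    """Aalen-Johansen estimate of P(T <= horizon, event type = cause).

    Assumption: status 0 marks a censored observation.
    """
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    surv = 1.0  # probability of being free of any event just before time t
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_cause = d_any = n_here = 0
        # Collect all subjects tied at time t.
        while i < len(data) and data[i][0] == t:
            s = data[i][1]
            n_here += 1
            if s != 0:
                d_any += 1
            if s == cause:
                d_cause += 1
            i += 1
        cif += surv * d_cause / n_at_risk
        surv *= 1.0 - d_any / n_at_risk
        n_at_risk -= n_here
    return cif


def jackknife_pseudo_values(times, status, horizon, cause=1):
    """Leave-one-out pseudo-values: n * F(t) - (n - 1) * F^(-i)(t)."""
    times, status = list(times), list(status)
    n = len(times)
    full = cumulative_incidence(times, status, horizon, cause)
    pseudo = []
    for i in range(n):
        loo = cumulative_incidence(
            times[:i] + times[i + 1:], status[:i] + status[i + 1:], horizon, cause
        )
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo
```

Without censoring, each pseudo-value reduces to the observed event indicator at the horizon. With censoring, the values are continuous and may fall outside the interval from 0 to 1, which is why a regression forest with a variance-based split criterion is the natural next step.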
Original language | Undefined/Unknown |
---|---|
Journal | Statistics in Medicine |
Volume | 32 |
Issue number | 18 |
Pages (from-to) | 3102-3114 |
Number of pages | 13 |
ISSN | 0277-6715 |
Status | Published - 15 Aug 2013 |