Report on De Bin et al.'s (2016) Paper (Subsampling versus Bootstrapping)
1 Introduction
Variable selection is an important part of data analysis. Popular variable selection procedures such as backward elimination and stepwise regression have been shown to be unstable (Copas and Long, 1991): the selected variables can change considerably after small changes in the data set (Gifi, 1990). The bootstrap (Efron, 1979), a resampling technique, has been used to study how these procedures perform. Its basic idea is to create new samples from the original sample by random sampling with replacement; the variable selection method is then applied to each of these pseudo-samples to study its behavior.
Janitza et al. (2015) showed that variable selection on bootstrap samples tends to select too many variables. Another resampling technique, subsampling (Hartigan, 1969), has been used for variable selection by Meinshausen and Buhlmann (2006, 2010). However, subsampling had not been studied in detail in this context, nor compared with the bootstrap. De Bin et al. (2016) therefore compared the performance of these two techniques in model selection for multivariable regression, using variable inclusion frequencies (Gong, 1982; Sauerbrei and Schumacher, 1992) as the performance criterion.
The rest of this report follows the structure of the original paper. It first briefly introduces the variable selection procedures, the resampling techniques and the inclusion frequencies used as the performance criterion, then presents the results on the data sets used (two real-life data sets and one simulated data set), and closes with a brief discussion and evaluation.
2 Study process overview
2.1 Variable selection
The paper mainly studies the ontological sparsity of the regression model, that is, avoiding the inclusion of variables unrelated to the response. The variable selection methods themselves are not the focus of the study, and results are mainly reported for backward elimination with possible reinclusion.
The selection criterion has a great influence on the number of variables in the model and on whether the final model is stable (Royston and Sauerbrei, 2008, Chapter 2). In this paper, only the results for backward elimination and forward selection based on a likelihood ratio test on the regression coefficients at significance level \(\alpha = 0.05\) are reported.
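As a rough illustration of backward elimination driven by a likelihood ratio test, here is a minimal Python sketch. It assumes a Gaussian linear model with an intercept, NumPy arrays `X` (one column per candidate variable) and `y`, and hypothetical helper names (`rss`, `lr_pvalue`, `backward_elimination`). It omits the reinclusion step and is not the authors' implementation; `alpha` is a parameter, and 0.05 in the usage lines is only an illustrative value.

```python
import math
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns `cols` of X."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def lr_pvalue(X, y, cols, drop):
    """p-value of the likelihood ratio test for removing column `drop` from `cols`.

    For a Gaussian linear model the statistic is n * log(RSS_reduced / RSS_full),
    compared with a chi-square distribution with 1 degree of freedom.
    """
    n = len(y)
    reduced = [j for j in cols if j != drop]
    stat = n * math.log(rss(X, y, reduced) / rss(X, y, cols))
    # Survival function of chi-square(1): P(Z^2 > s) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def backward_elimination(X, y, alpha):
    """Sketch only: plain backward elimination, no reinclusion step.

    Repeatedly removes the least significant variable until every remaining
    variable has a likelihood ratio p-value below alpha.
    """
    cols = list(range(X.shape[1]))
    while cols:
        pvals = {j: lr_pvalue(X, y, cols, j) for j in cols}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] < alpha:
            break
        cols.remove(worst)
    return sorted(cols)

# Hypothetical data: only the first two of five candidate variables matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
selected = backward_elimination(X, y, alpha=0.05)
print(selected)
```

Applying `backward_elimination` to each pseudo-sample, rather than once to the original data, is what produces the collection of selected models studied in the paper.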
2.2 Resampling
Inclusion frequencies and selected models
Because resampling introduces small changes to the data, a model selection procedure applied to different pseudo-samples will generally select different final models. This variability can be summarized, for each variable, by the proportion of pseudo-samples in which the variable is included in the selected model; this proportion is called the "inclusion frequency". For instance, a variable included in the final model for every pseudo-sample has an inclusion frequency of \(1\), while a variable never included has an inclusion frequency of \(0\). The selected models themselves are also of interest in the study.
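The definition above amounts to a simple count. A minimal Python sketch, assuming each selected model is represented as a set of variable names (the names `x1` to `x4` and the function name are hypothetical):

```python
def inclusion_frequencies(selected_models, variables):
    """Proportion of pseudo-samples whose selected model contains each variable."""
    B = len(selected_models)  # number of pseudo-samples
    return {v: sum(v in model for model in selected_models) / B for v in variables}

# Hypothetical models selected on four pseudo-samples.
models = [{"x1", "x2"}, {"x1"}, {"x1", "x2"}, {"x1", "x3"}]
print(inclusion_frequencies(models, ["x1", "x2", "x3", "x4"]))
# {'x1': 1.0, 'x2': 0.5, 'x3': 0.25, 'x4': 0.0}
```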
Resampling strategies
Mammen (1992) and Bickel et al. (1997) gave examples in which the bootstrap fails to be consistent. Wu (1986) introduced a subsampling technique, also known as the delete-d jackknife, which is consistent even in some cases where the classical bootstrap fails (Davison et al., 2003; Chernick, 2011).
In this study, three resampling strategies were used. For a data set of \(n\) observations, the three strategies are:
- classical bootstrap: drawing \(n\) observations with replacement from the original data, denoted by bootstrap(n). Note that the bootstrap samples are the same size as the original data;
- m out of n bootstrap: drawing \(m = [0.632n]\) (the nearest integer to \(0.632n\)) observations with replacement from the original data, denoted by bootstrap(m);
- subsampling: drawing \(m = [0.632n]\) observations without replacement from the original data, denoted by subsample(m).
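The three strategies differ only in how row indices are drawn. A minimal sketch using Python's standard `random` module; the function name and the scheme labels `bootstrap_n`, `bootstrap_m` and `subsample_m` are hypothetical:

```python
import random

def resample_indices(n, scheme, rng):
    """Row indices of one pseudo-sample drawn from a data set with n observations."""
    m = int(0.632 * n + 0.5)  # nearest integer to 0.632 n
    if scheme == "bootstrap_n":  # n draws with replacement
        return [rng.randrange(n) for _ in range(n)]
    if scheme == "bootstrap_m":  # m draws with replacement
        return [rng.randrange(n) for _ in range(m)]
    if scheme == "subsample_m":  # m draws without replacement
        return rng.sample(range(n), m)
    raise ValueError(f"unknown scheme: {scheme}")

rng = random.Random(2016)
for scheme in ("bootstrap_n", "bootstrap_m", "subsample_m"):
    idx = resample_indices(100, scheme, rng)
    # Sample size and number of distinct observations in the pseudo-sample.
    print(scheme, len(idx), len(set(idx)))
```

Only the subsample contains no duplicated observations; both bootstrap variants typically do.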
The sample size of the last two strategies, \(m = [0.632n]\), i.e. \([(1 - e^{-1})n]\), is chosen as the expected number of unique observations in a classical bootstrap sample (for large \(n\)). Bootstrap(m) uses the same sample size as the subsampling strategy, so that the two can be compared directly.
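The constant 0.632 follows from a short calculation. In each of the \(n\) draws of a classical bootstrap sample, a given observation \(i\) is missed with probability \(1 - 1/n\), so

\[
P(\text{observation } i \text{ appears at least once}) = 1 - \left(1 - \frac{1}{n}\right)^{n} \longrightarrow 1 - e^{-1} \approx 0.632 \quad (n \to \infty).
\]

By linearity of expectation, this is also the expected proportion of unique observations in the sample; for \(n = 100\) it is already about \(1 - 0.99^{100} \approx 0.634\).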
2.3 Criteria to compare results
The results are mainly compared using inclusion frequencies. For the real-life data, the number of unique models, the model selection frequencies and the model sparsity (average number of variables in the selected model) are also used to describe the effects of the different resampling approaches. For the simulated data, where the true effect of each variable is known, the inclusion frequencies are compared directly with what the true model implies: variables with strong effects should have inclusion frequencies close to \(1\), noise variables should have inclusion frequencies close to the significance level \(\alpha\), and variables with weak effects should have intermediate values between \(0\) and \(1\).
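The model-level summaries used for the real-life data can be computed from the same list of selected models. A minimal sketch, assuming models are sets of variable names; the function name and the example models are hypothetical:

```python
from collections import Counter

def model_summaries(selected_models):
    """Number of distinct models, their selection frequencies, and average model size."""
    B = len(selected_models)
    counts = Counter(frozenset(model) for model in selected_models)
    # Selection frequency of each distinct model, most frequent first.
    freqs = {model: c / B for model, c in counts.most_common()}
    avg_size = sum(len(model) for model in selected_models) / B
    return len(counts), freqs, avg_size

models = [{"x1", "x2"}, {"x1"}, {"x1", "x2"}, {"x1", "x3"}]
n_unique, freqs, avg_size = model_summaries(models)
print(n_unique, avg_size)
# 3 1.75
```

Using `frozenset` makes each model hashable and ignores the order in which variables were selected.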
The paper also measures how well the inclusion frequencies discriminate between the relevant and the noise variables in the simulated data. The area under the curve (AUC) is estimated as the mean, over all relevant variables, of the proportion of noise variables whose inclusion frequency is lower than that of the given relevant variable. An AUC of \(1\) means perfect discrimination, while \(0.5\) means no discrimination.
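The AUC estimate described above can be sketched as follows. This follows the description literally (noise variables must have strictly lower inclusion frequency); the usual AUC convention would count ties as one half, which this sketch does not do. The function name and the example frequencies are hypothetical.

```python
def inclusion_auc(freqs, relevant, noise):
    """Mean over relevant variables of the share of noise variables with lower inclusion frequency."""
    per_relevant = [
        sum(freqs[z] < freqs[r] for z in noise) / len(noise)  # strict comparison, ties count 0
        for r in relevant
    ]
    return sum(per_relevant) / len(per_relevant)

freqs = {"x1": 1.0, "x2": 0.7, "x3": 0.3, "z1": 0.4, "z2": 0.1}
print(inclusion_auc(freqs, ["x1", "x2", "x3"], ["z1", "z2"]))
```

Here `x1` and `x2` beat both noise variables while `x3` beats only `z2`, so the estimate is (1 + 1 + 0.5) / 3, about 0.83.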
2.4 Results
Real data example
For the two real-life data sets, variables with high inclusion frequencies are referred to as core variables. The inclusion frequencies of these core variables are large for subsample(m), whereas the values for bootstrap(m) are smaller, suggesting worse performance. Subsample(m) also gives smaller inclusion frequencies for the least often included variables. Notably, some variables have subsample(m) inclusion frequencies below the type I error level; this is attributed to their high correlation with other variables, and it would not be noticed if backward elimination were only applied to the original data. Overall, the results suggest that subsample(m) separates variables with a strong effect from those with a weak or no effect better than bootstrap(n) and bootstrap(m). The results are similar across model selection methods and selection criteria.
Regarding the selected models, subsample(m) tends to select fewer variables than the two bootstrap methods, across model selection methods and selection criteria, and the results of bootstrap(m) are more similar to those of bootstrap(n) than to those of subsample(m). The paper also notes that the models selected with subsample(m), which contain fewer variables, predict slightly better on average than those selected with the two bootstrap methods.
Simulation Study
For the simulated data, it is known which variables have strong, weak or no effects. The results show that the inclusion frequencies of the variables with no effect are clearly higher than the theoretical value \(\alpha\) (i.e. the type I error). These values are much smaller for subsample(m) than for the two bootstrap methods, and they are slightly larger with forward selection than with backward elimination. For the relevant variables, subsample(m) gives higher inclusion frequencies than bootstrap(m), except for the binary and strongly unbalanced variables. The ranking of the variables is the same for all three resampling methods.
In terms of AUC, the inclusion frequencies obtained with subsample(m) discriminate well between relevant and noise variables, whereas the two bootstrap methods perform much worse, with bootstrap(m) slightly better than bootstrap(n). This is mainly because the bootstrap methods are more likely to include unrelated variables. In this respect, subsample(m) seems to be a better choice than the bootstrap.
Regarding the selected models, the results are similar to those of the real data examples. The bootstrap methods tend to include several variables with no effect more often than subsample(m), and the two bootstrap methods select the true model with similar frequencies.
3 Discussion
The paper compared the performance of subsampling and bootstrap methods in variable selection for multivariable regression. The results of the simulation study show that the bootstrap methods include noise variables at a high rate. The subsampling method, which achieved higher AUC values in the study, was found to discriminate better between variables with and without an effect.
Further work is needed to explain why the bootstrap yields high inclusion frequencies for noise variables. One possible reason is that tests performed on bootstrap samples have an actual significance level (type I error rate) larger than the nominal one (Janitza et al., 2015).
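The effect of testing on bootstrap samples can be illustrated with a small simulation. The setup below is an assumption of this report, not the one used by Janitza et al. (2015) or De Bin et al. (2016): two independent standard normal variables with \(n = 100\), a correlation test using the Fisher z approximation, and hypothetical function names. Because the true correlation is zero, a test with correct size rejects about 5% of the time at \(\alpha = 0.05\).

```python
import math
import numpy as np

def corr_test_pvalue(x, y):
    """Two-sided p-value for H0: rho = 0, via the Fisher z approximation."""
    n = len(x)  # duplicated bootstrap rows are (wrongly) counted as independent
    r = np.corrcoef(x, y)[0, 1]
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def rejection_rate(scheme, n=100, n_datasets=200, n_resamples=50, alpha=0.05, seed=1):
    """Fraction of pseudo-samples on which the test rejects when x and y are independent."""
    rng = np.random.default_rng(seed)
    m = int(0.632 * n + 0.5)
    rejections = 0
    for _ in range(n_datasets):
        x, y = rng.normal(size=n), rng.normal(size=n)  # null hypothesis is true
        for _ in range(n_resamples):
            if scheme == "bootstrap_n":
                idx = rng.integers(0, n, size=n)  # with replacement
            else:  # "subsample_m"
                idx = rng.choice(n, size=m, replace=False)  # without replacement
            rejections += corr_test_pvalue(x[idx], y[idx]) < alpha
    return rejections / (n_datasets * n_resamples)

boot_rate = rejection_rate("bootstrap_n")
sub_rate = rejection_rate("subsample_m")
print(f"bootstrap(n): {boot_rate:.3f}, subsample(m): {sub_rate:.3f}")
```

In this setup the subsample rejection rate should stay near the nominal 0.05, while the bootstrap rate is expected to be clearly larger, mirroring the inflated inclusion of noise variables.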
One possible weakness of subsample(m) is that it selects variables with weak effects too rarely, which may lead to underfitting. This can be explained by the lower power of the significance test at the smaller sample size, and by the correlation between variables. A larger subsample size may perform better, but if the subsamples are too large they differ too little from the original data to reveal the instability of the selection. A too-small sample size may cause underfitting for both bootstrap(m) and subsample(m). Subsample sizes closer to \(n\) may therefore be worth exploring, keeping in mind that drawing \(m = n\) observations without replacement simply reproduces the original data.
References
Bickel, P. J., Gotze, F., and van Zwet, W. R. (1997). Resampling fewer than n observations: Gains, losses, and remedies for losses. Statistica Sinica 7, 1–31.
Chernick, M. R. (2011). Bootstrap Methods: A Guide for Practitioners and Researchers. Wiley.
Copas, J. B. and Long, T. (1991). Estimating the residual variance in orthogonal regression with variable selection. The Statistician 40, 51–59.
Davison, A. C., Hinkley, D. V., and Young, G. A. (2003). Recent developments in bootstrap methodology. Statistical Science 18, 141–157.
De Bin, R., Janitza, S., Sauerbrei, W., and Boulesteix, A.-L. (2016). Subsampling versus bootstrapping in resampling-based model selection for multivariable regression. Biometrics 72, 272–280.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7, 1–26.
Gifi, A. (1990). Nonlinear Multivariate Analysis. Chichester: Wiley.
Hartigan, J. A. (1969). Using subsample values as typical values. Journal of the American Statistical Association 64, 1303–1317.
Janitza, S., Binder, H., and Boulesteix, A.-L. (2015). Pitfalls of hypothesis tests and model selection on bootstrap samples: Causes and consequences in biometrical applications. Biometrical Journal, to appear.
Mammen, E. (1992). When Does Bootstrap Work? New York: Springer.
Meinshausen, N. and Buhlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34, 1436–1462.
Meinshausen, N. and Buhlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72, 417–473.
Royston, P. and Sauerbrei, W. (2008). Multivariable Model-building: A pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables. Chichester: Wiley.
Wu, C.-F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. The Annals of Statistics 14, 1261–1295.
Source: https://www.cnblogs.com/zerozhao/p/subsampling_vs_bootstrapping.html