The bias vs. variance trade-off in model selection and building is one of the cornerstones of machine learning theory (see for example this discussion here, or section 3.2 of Seni and Elder’s wonderful book). In short, bias measures how far a model’s average prediction is from the true value, and variance measures how much its predictions fluctuate across different training sets. The key fact is that the expected Mean Squared Error decomposes into a sum of squared bias, variance, and an irreducible error component. For example, if different training sets give rise to very different classifiers, then the variance of the model is relatively high; such flexible models typically have low bias, and the trade-off lies in balancing the two.
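This decomposition can be estimated empirically by refitting the same model on many simulated training sets and measuring, at a fixed test point, how far the average prediction is from the truth (bias) and how much the predictions scatter (variance). A minimal base-R sketch, with an assumed true function and noise level of my own choosing:

```r
# Estimate bias^2 and variance of a model at one test point by
# simulation. The true function and noise sd here are illustrative.
set.seed(1)
true_f <- function(x) sin(2 * pi * x)
x_test <- 0.5                          # point at which we evaluate the model
n_sims <- 200
preds <- replicate(n_sims, {
  x <- runif(50)
  y <- true_f(x) + rnorm(50, sd = 0.3) # a fresh training set each time
  fit <- lm(y ~ poly(x, 3))            # a moderately flexible model
  predict(fit, data.frame(x = x_test))
})
bias2    <- (mean(preds) - true_f(x_test))^2  # squared bias at x_test
variance <- var(preds)                        # variance at x_test
# Expected MSE at x_test is approximately bias2 + variance + 0.3^2,
# the last term being the irreducible error.
```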

One possible way to approach this balancing act is via “ensemble models.” Again, I refer the reader to Seni and Elder’s book as a great introduction to the subject. An ensemble model starts with a collection of models, takes the estimates they produce, and combines them into one prediction (usually via some sort of weighted sum). These models can be completely different in flavor, the same model trained on different sets, or a combination of the two. For this example, I’ll focus on an SVM model trained on 50 different subsamples of the training data. The simple code below, which Matt and I wrote a little while back when building a predictive model, takes a training set called “train,” generates 50 SVM models (using ksvm from the kernlab package), and runs a linear regression on their outputs.

library(kernlab)  # provides ksvm()

N <- nrow(train)
# build the ksvm models, each on a random 10% subsample of the training set
num_svm <- 50
svm_models <- list()
svm_models_p <- matrix(nrow = N, ncol = num_svm)
for (i in 1:num_svm) {
  rows <- sample(1:N, size = round(0.10 * N), replace = FALSE)
  svm_models[[i]] <- ksvm(y ~ ., data = train[rows, ], kernel = 'rbfdot')
  svm_models_p[, i] <- predict(svm_models[[i]], train)  # each model scores the full training set
}
df <- data.frame(svm_models_p, train_y = train$y)
# run a regression on their outputs
lin_model <- lm(log(train_y + 1) ~ ., df)
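To score new data, each base model predicts first and the regression then combines those predictions, after which the log(y + 1) transform must be undone. Here is a self-contained sketch of that same stacking pattern, with lm() base learners standing in for ksvm() so it runs without extra packages; the names (`base_models`, `new_data`) and the toy data are illustrative assumptions, not from the original code:

```r
# Self-contained sketch of the stack-and-predict pattern above,
# using lm() base models in place of ksvm(). Toy data is simulated.
set.seed(42)
n <- 200
train <- data.frame(x = runif(n))
train$y <- exp(train$x) - 1 + rnorm(n, sd = 0.05)

num_base <- 10
base_models <- list()
base_p <- matrix(nrow = n, ncol = num_base)
for (i in 1:num_base) {
  rows <- sample(1:n, size = round(0.10 * n))      # 10% subsample per model
  base_models[[i]] <- lm(y ~ x, data = train[rows, ])
  base_p[, i] <- predict(base_models[[i]], train)  # score the full training set
}
# combine the base predictions with a regression on the log-transformed target
stack <- lm(log(train$y + 1) ~ ., data = data.frame(base_p))

# Predict on new data: run each base model, feed their outputs through
# the stacking regression, then invert the log(y + 1) transform.
new_data <- data.frame(x = runif(5))
new_p <- sapply(base_models, function(m) predict(m, new_data))
pred <- exp(predict(stack, data.frame(new_p))) - 1
```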

Ensemble methods have proven to be a powerful tool: the two top entries in Milestone 1 of the Heritage Health Prize both used ensembles to generate their predictions.
