Ensemble Models

The bias-variance trade-off in model selection and building is one of the cornerstones of machine learning theory (see, for example, section 3.2 of Seni and Elder’s wonderful book). In short, bias is how far the model's prediction, averaged over possible training sets, falls from the true value, and variance is how much that prediction changes from one training set to another. The key fact is that the expected mean squared error is the sum of bias squared, variance, and an irreducible error component. For example, if different training sets give rise to very different classifiers, then the variance of the model is relatively high. Models flexible enough to behave this way typically have relatively low bias, which is exactly the trade-off.
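To make the decomposition concrete, here is one standard way to write it. The notation is my own assumption rather than anything taken from the book: the response is y = f(x) + ε with noise of mean zero and variance σ², the fitted model is \hat{f}, and the expectation runs over both training sets and noise.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```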

One possible way to approach this balancing act is via “ensemble models.” Again, I refer the reader to Seni and Elder’s book as a great introduction to the subject. An ensemble model starts with a collection of models, takes the estimates they produce and combines these into one prediction (usually via some sort of weighted sum). These models can be completely different in flavor, the same model trained on different sets, or a combination of the two. For this example, I’ll focus on an SVM model trained on 50 different random subsets of the training data. The simple code below, which Matt and I wrote a little while back when building a predictive model, takes a training set called “train,” fits 50 SVM models, each on a random 10% sample of the rows, and runs a linear regression of the log-transformed response on their outputs.

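library(kernlab)  # provides ksvm()

N <- nrow(train)

# train_y is assumed to be the response vector for train (e.g. train$y)

# build the ksvm models
num_svm <- 50
svm_models <- vector("list", num_svm)
svm_models_p <- matrix(nrow = N, ncol = num_svm)

for (i in 1:num_svm) {
  # fit each SVM on a random 10% subsample, drawn without replacement
  rows <- sample(1:N, size = floor(0.10 * N), replace = FALSE)
  svm_models[[i]] <- ksvm(y ~ ., data = train[rows, ], kernel = 'rbfdot')
  # column i holds model i's predictions on the full training set
  svm_models_p[, i] <- predict(svm_models[[i]], train)
}

df <- data.frame(svm_models_p, train_y)

# run a regression on their outputs
lin_model <- lm(log(train_y + 1) ~ ., df)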
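Scoring new data with an ensemble like this means running every base model first, feeding those predictions to the regression, and undoing the log(y + 1) transform. The block below is only a sketch of that step, not the code we actually used. The names predict_ensemble, models, combiner and toy are hypothetical, the data is simulated, and base R's loess stands in for ksvm so that the example runs without extra packages.

```r
# Hypothetical helper: score new data with a stacked ensemble.
# `models` is a list of fitted base models; `combiner` is an lm fitted on
# their training-set predictions with log(y + 1) as the response.
predict_ensemble <- function(models, combiner, newdata) {
  # one column of base-model predictions per model
  p <- sapply(models, function(m) as.numeric(predict(m, newdata)))
  p <- matrix(p, nrow = nrow(newdata))  # keep matrix shape for a single row
  cols <- data.frame(p)                 # columns X1..Xk, as in the training frame
  # invert the log(y + 1) transform used when fitting the combiner
  exp(predict(combiner, cols)) - 1
}

# Simulated toy data (an assumption, purely for illustration)
set.seed(1)
n <- 500
toy <- data.frame(x = runif(n, 0, 3))
toy$y <- exp(toy$x + rnorm(n, sd = 0.2))

# Base learners: loess as a stand-in for ksvm, each fit on a 10% subsample.
# surface = "direct" lets loess predict outside the subsample's x range.
k <- 5
models <- lapply(seq_len(k), function(i) {
  rows <- sample(n, size = floor(0.10 * n))
  loess(y ~ x, data = toy[rows, ],
        control = loess.control(surface = "direct"))
})

# Stack: regress log(y + 1) on the base models' training-set predictions
p_train <- sapply(models, function(m) as.numeric(predict(m, toy)))
stack_df <- data.frame(p_train, train_y = toy$y)
combiner <- lm(log(train_y + 1) ~ ., data = stack_df)

# Score three new observations
new_obs <- data.frame(x = c(0.5, 1.5, 2.5))
preds <- predict_ensemble(models, combiner, new_obs)
print(round(preds, 2))
```

The helper relies on data.frame() naming the columns of an unnamed matrix X1, X2, and so on, which is what makes the new-data columns line up with the ones the regression was fitted on. With the SVM ensemble above, the equivalent call would pass svm_models and lin_model.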

Ensemble methods have proven to be a powerful tool. The two top entries in health prize milestone 1 both used ensembles to generate their predictions.

