
Mental capacity

“You know what the trouble with me is: I was raised above my mental capacity.”

A good ways into “Eyeless in Gaza,” likely Aldous Huxley’s best and certainly most autobiographical book, the main character Anthony Beavis lays this stark realization onto Helen, a female acquaintance he at once courts, reveres and is intimidated by. When I first read that a few years ago, I felt a dim flicker somewhere upstairs as if finally I had stumbled onto the missing link. Sort of like the moment when you visit a sports physician and he informs you that

P: … I am sorry to say but it seems to me that there is really very little chance that you will ever be able to run that 100m in under 10s.

Me: Really, why do you say so?

P: Well, you only have one leg.

Me: Blasted! – you are right, I do only have one leg!

So there you have it; I put the book down and decided to mix an afternoon cocktail to congratulate Mr. Huxley for finally informing me of the problem at hand. At that point there was nothing left except to spend the rest of the day dozing in a hammock overlooking the rolling hills of the French countryside, waking up every so often with the words “ah yes, indeed, that’s it” and so on…

You see, I come from a family of doctors, scientists, teachers and the like, where the home walls were always lined with books and “high subjects” always respected – a family of “intellectuals,” as they were called in eastern Europe. There are many such families, each with its own particular leanings: some seek the higher ideal in more artistic, or humanist, mediums; some in more scientific ones. Mine was of the scientific bent, so it’s no surprise that I ended up a mathematician, and even less surprising that such a choice was always wholeheartedly supported.

Of course, it’s fortunate to come from a family like mine, because they are pretty amazing (well, except for the black-sheep part of the family…) and all these intellectual pursuits make for a dashed dandy way to grow up. But when I look around at the few people I have met who seemed truly destined to do what they are doing – the ones you knew were simply going to play music, or paint, or be great at math – I start to wonder: how did I end up here, and why was it so difficult all the time?

Well, the obvious answer is that these types of pursuits were the ones encouraged, and the ones in which I was made to see an inherent interest from an early age. In short, perhaps I was also raised above my mental capacity. There were many other factors that contributed to the choices I made and how much I accomplished, but the bottom line was always with Anthony, or really Huxley. And yes, I know it is not easy for anyone to do worthwhile things, but some are more apt than others at certain pursuits. If you are not designed to run 100m in less than 10s, no matter how hard you try it won’t happen – you might work very hard, and that will take you far, but it will be shy of great. Sports is one arena where such evaluations are doled out early and, for the most part, unequivocally.

Perhaps a reaction against “intellectual” pursuits is one reason I have done so much sport in the last ten years, mostly of the kind that causes long-lasting discomfort. At least there is no illusion of inherent aptitude in athletic pursuits, so I am free to compete only with myself (not necessarily an easy task).

The afternoon came to an end, and I was out of the hammock just in time to get that four hour bike ride in before the sun finally set. Of course, at some point I had to wake up – the mill was waiting.

Ensemble Models

The bias vs. variance trade-off in model selection and building is one of the cornerstones of machine learning theory (see, for example, the discussion here or section 3.2 of Seni and Elder’s wonderful book). In short, bias is how far the model’s average prediction lies from the truth, and variance is how much the prediction fluctuates from one training set to another. The key fact is that the expected Mean Squared Error is the sum of the bias squared, the variance, and an irreducible error component. For example, if different training sets give rise to very different classifiers, then the variance of the model is relatively high; flexible models of this sort typically pay for their low bias with high variance.
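To make the decomposition concrete: at a fixed input x0, the expected squared error satisfies E[(yhat(x0) − y)² ] = Bias² + Variance + σ². The short simulation below (written in Python with numpy purely so it is self-contained; the function names and settings are my own, not from any particular library) fits polynomials of two different degrees on many independently drawn training sets and estimates both terms at a single point. The rigid model should show high bias and low variance, and the flexible one the opposite.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)

def bias_variance_at(degree, x0=0.25, n_train=20, n_reps=500, noise=0.3):
    """Fit polynomials of the given degree on many independent training
    sets and estimate bias^2 and variance of the prediction at x0."""
    preds = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, noise, n_train)
        preds[r] = np.polyval(np.polyfit(x, y, degree), x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2
    variance = preds.var()
    return bias_sq, variance

b_rigid, v_rigid = bias_variance_at(degree=1)  # underfits: biased but stable
b_flex, v_flex = bias_variance_at(degree=9)    # wiggly: near-unbiased but unstable
```

The same experiment with the training-set size or noise level changed shows how the sweet spot in model complexity moves – which is exactly the balance game the next paragraph is about.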

One possible way to approach this balance game is via “ensemble models.” Again, I refer the reader to Seni and Elder’s book as a great introduction to the subject. An ensemble model starts with a collection of models, takes the estimates they produce and combines them into one prediction (usually via some sort of weighted sum). These models can be completely different in flavor, the same model trained on different sets, or a combination of the two. For this example, I’ll focus on 50 SVM models, each trained on a different random subsample of the data. The simple code below, which Matt and I wrote a little while back when building a predictive model, takes a training set called “train” (with response column “y”), generates 50 ksvm models from the kernlab package and runs a linear regression on their outputs.

library(kernlab)

N <- nrow(train)

# build the ksvm models
num_svm <- 50
svm_models <- list()
svm_models_p <- matrix(nrow = N, ncol = num_svm)

for (i in 1:num_svm) {
  # train each model on a random 10% subsample
  rows <- sample(1:N, size = 0.10 * N, replace = FALSE)
  svm_models[[i]] <- ksvm(y ~ ., data = train[rows, ], kernel = 'rbfdot')
  svm_models_p[, i] <- predict(svm_models[[i]], train)
}

df <- data.frame(svm_models_p, train_y = train$y)

# run a regression on their outputs
lin_model <- lm(log(train_y + 1) ~ ., df)

Ensemble methods have proven to be a powerful tool. The two top entries in milestone 1 of the Heritage Health Prize both used ensembles to generate their predictions.
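To show the same stack-and-regress pattern end to end in a self-contained way, here is a numpy sketch (in Python rather than R, purely for illustration; the toy data and the polynomial base learners are my own stand-ins for the post’s SVM models): 50 base models are each fit on a random 10% subsample, their predictions are stacked into a matrix, and a least-squares regression learns the combining weights. By construction, the combined fit can do no worse on the training data than any single base model.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy stand-in for the post's "train" set: y is a noisy cubic of x
n = 200
x = rng.uniform(-1, 1, n)
y = x**3 - x + rng.normal(0, 0.1, n)

num_models = 50
preds = np.empty((n, num_models))

# each base model is fit on a random 10% subsample, mirroring the R code
for i in range(num_models):
    rows = rng.choice(n, size=n // 10, replace=False)
    coefs = np.polyfit(x[rows], y[rows], deg=3)
    preds[:, i] = np.polyval(coefs, x)

# combine the base predictions with a linear regression (stacking):
# least-squares weights on an intercept plus the 50 prediction columns
A = np.column_stack([np.ones(n), preds])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
ensemble = A @ w

base_mses = ((preds - y[:, None]) ** 2).mean(axis=0)
ensemble_mse = ((ensemble - y) ** 2).mean()
```

One caveat worth keeping in mind: fitting the combining regression on the same data the base models saw tends to overfit, so in practice the weights are usually learned on out-of-fold or held-out predictions.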

Elastic R

Cathy told me about Elastic-R a few weeks ago and I’ve been playing around with it a bit. The goal of the project is to create a collaborative, web-browser-based environment for data analysis and computing in the cloud. So far, I have to say it looks absolutely great.

Here is a brief overview, with step-by-step instructions on how to get started. Basically, you sign up for an Amazon Elastic Compute Cloud (EC2) account and use it to initiate a session on the Elastic-R site. This takes about 5-10 minutes to set up.

Then you are provided with a UI that contains an IDE, a file manager, a graphics manager, a collaboration console and a few others. All are easy to use and pretty much self-explanatory. Then you just code in R as you would before, upload files to the server, and explore. Moreover, it allows for real-time collaboration, so you can share code and data and run/test with others on deck. You can also grant different access privileges to your collaborators, manage projects, etc. Another great feature is that you can send “instances” with ready-made widgets to others, so they can explore your data.

I’ll be looking into this more in the near future and will try to post what I learn.