Monthly Archives: October 2011

Data Without Borders and NYCLU

Data Without Borders hosted their first datadive this weekend. Participating organizations included the NYCLU, UN, and MIX Market. Matt and I spent a chunk of Saturday afternoon looking at NY police department stop and frisk data. You can see our brief summary here.

Understanding math

Three mathematicians and three physicists are taking the train to go to a conference. The three physicists are first in the ticket line and buy three tickets; the mathematicians buy just one. The physicists ask: “What are you doing? The conductor is going to see that you only have one ticket and will surely throw two of you off the train.” The mathematicians reply, “don’t worry, we have a proven method.”

The train departs and the physicists take their seats, but do so strategically so as to observe the mathematicians’ method. They watch the mathematicians crowd into a single bathroom stall. When the conductor comes by and knocks, the door is opened partway, a hand reaches out with a ticket, the conductor takes the ticket and all safely reach their destination.

On the way back from the conference, the physicists are again first in the ticket line, but this time they buy one ticket. The mathematicians buy none. The physicists ask: “What are you doing? Surely this way the conductor is going to throw all of you off the train.” The mathematicians reply, “don’t worry, we have a proven method.”

When the train departs, the physicists and mathematicians crowd into separate bathroom stalls. Shortly before the conductor comes by, one of the mathematicians runs over to the physicists’ stall and knocks. The door is opened partway, a hand with a ticket reaches out, the mathematician takes the ticket and returns to his stall.

Moral: if you are going to use mathematics, understand the method.

Mental capacity

“You know what the trouble with me is: I was raised above my mental capacity.”

A good ways into “Eyeless in Gaza,” likely Aldous Huxley’s best and certainly most autobiographical book, the main character Anthony Beavis lays this stark realization onto Helen, a female acquaintance he at once courts, reveres and is intimidated by. When I first read that a few years ago, I felt a dim flicker somewhere upstairs as if finally I had stumbled onto the missing link. Sort of like the moment when you visit a sports physician and he informs you that

P: … I am sorry to say but it seems to me that there is really very little chance that you will ever be able to run that 100m in under 10s.

Me: Really, why do you say so?

P: Well, you only have one leg.

Me: Blasted! – you are right, I do only have one leg!

So there you have it; I put the book down and decided to mix an afternoon cocktail to congratulate Mr. Huxley for finally informing me of the problem at hand. At that point there was nothing left except to spend the rest of the day dozing in a hammock overlooking the rolling hills of the French countryside, waking up every so often with the words “ah yes, indeed, that’s it” and so on…

You see, I come from a family of doctors, scientists, teachers and the like, where the home walls were always lined with books and “high-subjects” always respected – a family of “intellectuals” as they were called in eastern Europe. There are many such families, each with its own particular leanings. Some seek the higher ideal in more artistic, or humanist, mediums; some seek it in more scientific ones. Mine was of the scientific bent, so it’s not a surprise that I ended up a mathematician, and even less surprising that such a choice was always wholeheartedly supported.

Of course, it’s good fortune to come from a family like mine, because they are pretty amazing (well, except for the black sheep part of the family…) and all these intellectual pursuits make for a dashed dandy way to grow up. But when I look around and see the few people I have met who seemed truly destined to do what they are doing, the guys you knew were just going to play music, or paint, or be great at math, I start to wonder: how did I end up here and why was it so difficult all the time?

Well, the obvious answer is that these types of pursuits were the ones encouraged and ones in which I was also made to realize my inherent interest from an early age. In short, perhaps I was also raised above my mental capacity. There were many other factors that contributed to the choices I made and how much I accomplished, but the bottom line was always with Anthony, or really Huxley. And yes, I know it is not easy for anyone to do worthwhile things, but some are more apt than others at certain pursuits. If you are not designed to run 100m in less than 10s, no matter how hard you try it won’t happen – you might work very hard and that will take you far, but it will be shy of great. Sports is one medium where such evaluations are doled out early and for the most part unequivocally.

Perhaps a reaction to “intellectual” pursuits is a main reason that I have done so many sports in the last ten years, mostly of the kind that cause long-lasting discomfort. At least there is no illusion of inherent aptitude in athletic pursuits, so I am free to compete with only myself (not necessarily an easy task).

The afternoon came to an end, and I was out of the hammock just in time to get that four-hour bike ride in before the sun finally set. Of course, at some point I had to wake up – the mill was waiting.

Prediction Assessment

Quite recently I (Matt) learned from my colleague Brett a little bit about information theory and how people have developed precise methods to tell us how much one random variable can “teach us” about another random variable. As he explained to me, this theory can be applied to assign a numerical value assessing how good a set of predictions is. (All of this is originally due to Claude Shannon.) For example, every day you wake up and the weather man on channel 2 says it’s going to rain some days, and the weather man on channel 4 says it’s going to rain on other days. After a year of listening to both of these predictions, you want to decide who is better so you only have to listen to one channel. How are you supposed to make your decision?

One issue is that everyone has some prior information about the weather where they live. As an extreme case, suppose you live in the desert where it’s sunny 99% of the days. The weather man on channel 2 is lazy – he just gets up every morning and says it’s going to be sunny, every single day, and goes home. He’s right 99% of the time, but he hasn’t helped us any! Suppose, though, that the weather man on channel 4 predicts rain 1% of the time, but when he does he’s right only 50% of the time. What is the probability that it’s going to rain when he says it’s going to rain? Well then obviously it’s 50% – and he has given us some real information. On the other hand, when he predicts that it’s going to be sunny, he’s right almost 99.5% of the time, not bad.
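The numbers in this desert example can be checked with a short Python sketch. It is only an illustration under stated assumptions: the joint probability tables are reconstructed from the percentages above (rain on 1% of days, channel 4 saying "rain" on 1% of days and being right half the time), the function names are made up for this example, and mutual information is used as one standard Shannon-style measure of how much a forecast tells us about the weather, which may or may not be the exact quantity used in the rest of the post.

```python
from math import log2


def mutual_information(joint):
    """Mutual information in bits for a joint distribution given as
    {(forecast, weather): probability}."""
    p_forecast = {}
    p_weather = {}
    # Build the two marginal distributions from the joint table.
    for (f, w), p in joint.items():
        p_forecast[f] = p_forecast.get(f, 0.0) + p
        p_weather[w] = p_weather.get(w, 0.0) + p
    # Sum p(f, w) * log2(p(f, w) / (p(f) * p(w))) over nonzero cells.
    return sum(
        p * log2(p / (p_forecast[f] * p_weather[w]))
        for (f, w), p in joint.items()
        if p > 0
    )


def accuracy_given_forecast(joint, forecast):
    """Probability that the weather matches the forecast, given that forecast."""
    total = sum(p for (f, _), p in joint.items() if f == forecast)
    return joint[(forecast, forecast)] / total


# Channel 2 (assumed table): always says "sun"; it rains on 1% of days.
channel2 = {("sun", "sun"): 0.99, ("sun", "rain"): 0.01}

# Channel 4 (assumed table): says "rain" on 1% of days, right half the time.
channel4 = {
    ("rain", "rain"): 0.005,
    ("rain", "sun"): 0.005,
    ("sun", "rain"): 0.005,
    ("sun", "sun"): 0.985,
}

print(accuracy_given_forecast(channel4, "rain"))  # right half the time
print(accuracy_given_forecast(channel4, "sun"))   # just under 99.5%
print(mutual_information(channel2))               # essentially zero bits
print(mutual_information(channel4))               # small but positive
```

Under these assumed tables both weather men are right on 99% of days overall, yet only channel 4's forecast carries any information about the weather, which is exactly the distinction that raw accuracy misses.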

In what follows, I’m going to explain how to measure how much information the weather man (or any random variable) tells us about the weather (or any other random variable). By necessity, it’s a bit technical.
