Parallel Computing with R

One of the reasons that R can be quite slow is that by default it uses only one core, regardless of how many your machine actually has. There are a number of ways to get better computing time out of R, and with almost no code overhead you can increase performance by roughly a factor of the number of cores locally available. Most of the packages are designed for running on network clusters, but they work equally well, albeit likely not as quickly, with just one machine. Luckily many of them have very nice high-level wrappers that essentially hide all of the low-level maintenance. In addition, the examples to follow provide a good introduction to parallel computing in case you decide to take it to the next level, linking multiple machines together, etc.

I will give a brief survey of how a few of these packages work on just one machine (extending this to a ‘real’ cluster basically only requires making sure that all packages and dependencies are installed on all machines and that passwordless ssh login is enabled).

So suppose that you have a dual-core machine, like my Mac OS X 10.6 here, and you want to run a process in parallel. You can start with the package snowfall, which is a wrapper for snow, which in turn depends on Rmpi, sockets and so on… you can see the dependency tree here. However, none of that is important at the moment if you just want to try running some computation faster. Start by installing snowfall, say with install.packages("snowfall") in the R console, and proceed as follows:

1) Initialize a two-CPU cluster (or however many CPUs you have on board)

1.5) If necessary, push data to all the CPUs

2) Run your computation

3) Stop the cluster

Snowfall has its own versions of all the functions in the apply() family, as well as some other cluster management tools.
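To give you a flavor, here is a quick sketch of a few of those functions (all standard snowfall calls, to be run after sfInit(); the file name myFunctions.R is just a made-up placeholder):

### parallel versions of sapply() and apply()
sfSapply(1:10, function(i) i^2)
sfApply(matrix(1:20, nrow=4), 1, sum)
### load a library or source a script on every worker
sfLibrary(rpart)
sfSource("myFunctions.R")   # hypothetical file with your own helper functions
### set up a proper parallel random number generator across the workers
sfClusterSetupRNG()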

Let's try an example in both parallel and sequential mode:

library(snowfall)
sfInit(parallel=TRUE, cpus=2, type="SOCK", socketHosts=rep('localhost',2) )

### if you have any data that will be used in the computation, such as a matrix or data frame, you need to make sure it is pushed (exported) to all the workers

sfExportAll()

### or if you want to export just one object

sfExport("data")

system.time(sfLapply(1:100000, function(i){exp(i)}))

#user system elapsed

#0.072 0.009 0.172

system.time(lapply(1:100000, function(i){exp(i)}))

#user system elapsed

#0.140 0.021 0.163

sfStop()

So what happened? The regular lapply() was actually faster than the parallel sfLapply(). This is due to a combination of the time it takes to copy data to the workers and the communication latency; the moral of the story is that parallel computing is not always better, and it only makes sense when the computation time is significantly longer than the copying/latency overhead.
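To see the crossover, make each task do a non-trivial amount of work. Here is a self-contained toy sketch (timings will vary by machine, so I won't quote numbers); the task itself is arbitrary, just something slow enough to amortize the overhead:

library(snowfall)
sfInit(parallel=TRUE, cpus=2, type="SOCK", socketHosts=rep('localhost',2))
### each call now does real work, so the copy/latency overhead is amortized
slow.task <- function(i) mean(replicate(100, sd(rnorm(1000))))
system.time(sfLapply(1:500, slow.task))
system.time(lapply(1:500, slow.task))
sfStop()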

But let us take a look at an example that actually justifies the post. Here is a code snippet from a tree ensemble model; the bit we'll look at has R predicting from a fitted rpart tree, here called ‘tlist[[1]],’ i.e. the first tree in my ensemble, on a data set of 22 variables and 65,000 rows, called ‘datatrain.’

library(snowfall)
sfInit(parallel=TRUE, cpus=2, type="SOCK", socketHosts=rep('localhost',2) )

sfExport('tlist')

sfExport('datatrain')

system.time(result <- sfLapply(1:nrow(datatrain), function(r) {predict(tlist[[1]],datatrain[r,])}))

#user system elapsed

#0.357 0.497 158.179

system.time(result <- lapply(1:nrow(datatrain), function(r) {predict(tlist[[1]],datatrain[r,])}))

#user system elapsed

#281.359 8.689 290.506

sfStop()

 

Generally I've seen about a 2x improvement in such calculations, which makes sense on two cores. Now let's try the same with another package called "multicore," which is designed for exactly this purpose: computing on a single machine with multiple cores.


library("multicore")

system.time(result <- mclapply(1:nrow(datatrain), function(r) {predict(tlist[[1]],datatrain[r,])}))

#user system elapsed

#235.624 9.320 132.308

Even better… Multicore also has a number of functions for loop parallelization, as well as some other tools.
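For instance, here is a rough sketch of a few of those multicore calls; the expressions being parallelized are arbitrary examples of mine, not anything special:

library(multicore)
### cap mclapply at two cores explicitly
result <- mclapply(1:100, function(i) exp(i), mc.cores=2)
### fire off two expressions asynchronously and collect the results
job1 <- parallel(sum(rnorm(1e6)))
job2 <- parallel(sum(runif(1e6)))
collect(list(job1, job2))
### pvec() splits a vector into chunks and applies a vectorized function to each chunk
pvec(1:1e6, sqrt)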

So why would you ever use snowfall over multicore? Well, one reason is that multicore only offers a parallel version of lapply() (mclapply()), whereas snowfall covers the whole apply() family, but that might not be a sticking point for you. More importantly, managing multiple machines with snowfall really doesn't take more work than managing a dual-core one, so if you are planning to write code that potentially needs to be scalable, this is a good way to go.
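For example, pointing the tree-prediction code above at a second box is, in principle, just a matter of the host list; ‘node2’ below is a made-up hostname, and this assumes the passwordless ssh setup mentioned earlier:

library(snowfall)
### two local cpus plus two on a (hypothetical) remote machine called node2
sfInit(parallel=TRUE, cpus=4, type="SOCK", socketHosts=c('localhost','localhost','node2','node2'))
sfExport('tlist', 'datatrain')
result <- sfLapply(1:nrow(datatrain), function(r) predict(tlist[[1]], datatrain[r,]))
sfStop()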

Either way, both of these packages give a very quick and easy way to see an immediate performance improvement in your computations. Of course, this is only a small part of the story, which I will continue in subsequent posts.
