Loops in R

When you use R enough for data processing and mining, you’ll soon realize that explicit loops in R can be very slow. If your loop takes a long time to finish, or you know it will require a substantial number of memory accesses, it is usually best to replace the whole thing with one of the many optimized functions in the R armory. It seems like every data manipulation, tabulation, cross-reference, etc. I have ever had to do has a function written specifically for that procedure, so it’s worth looking around.

One very useful tool is the sapply() function; there are actually many variants of it, which you can read about here. Skipping some optional details, the basic call is


sapply(X,FUN)

where “X” is a vector or list and “FUN” is a function to be applied to every element of “X.”
For example, let’s generate a vector of 20 random numbers drawn from a normal distribution with mean 0 and standard deviation 5, and round each of them.

> numbers <- rnorm(20, mean=0, sd=5)
> numbers
 [1]  1.4150660 -3.3882659 -1.9812246  6.4669487  8.6084292 -6.8745447  6.9130365 -0.4088883 10.3087670  3.7913419 -0.9671000  5.6355830 -2.7645819 -2.9150188  2.3123911 -3.5230559
[17]  4.5095781  4.2253857 -2.8721957 -1.0569906
> sapply(numbers, round)
[1]  1 -3 -2  6  9 -7  7  0 10  4 -1  6 -3 -3  2 -4  5  4 -3 -1

So sapply() returns a vector in which the function “round” has been applied to every element of “numbers.” Of course, I could also just do

round(numbers)

to get the same result, but sometimes this is not possible because the function you write requires manipulation of data outside the vector “X,” or some such thing.
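As a small illustration of that last point, here is a sketch in which the applied function needs data that lives outside “X.” The vector `offsets` and the helper `shift_and_round` are made up for this example and are not part of the original data. The trick is to iterate over indices rather than over the values themselves.

```r
set.seed(1)                            # fixed seed so the run is repeatable
numbers <- rnorm(20, mean=0, sd=5)

# for a vectorized function, sapply() and the direct call agree
same <- identical(sapply(numbers, round), round(numbers))

# hypothetical outside data: one offset per element of 'numbers'
offsets <- seq(0, 1.9, by=0.1)

# iterate over indices so the function can reach into both vectors
shift_and_round <- function(i) round(numbers[i] + offsets[i])
shifted <- sapply(seq_along(numbers), shift_and_round)
```

This particular case could still be written as round(numbers + offsets); the index pattern is more useful when the function body has no vectorized equivalent.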

Let’s look at another example, where we have a data frame of two columns: “time,” a numeric time-stamp value, and “glu,” some glucose levels. The first 10 rows look like this:

> df[1:10,]
           time glu
1  -61821514560 120
2  -61821514260 116
3  -61821513960 114
4  -61821513660 114
5  -61821513360 114
6  -61821513060 116
7  -61821512760 116
8  -61821512460 112
9  -61821512160 112
10 -61821511860 114

and we want to create an autocorrelation plot of this time series. Here is the version with a loop:

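# make a correlation coefficient vector, one entry per row of df
l <- nrow(df)
corcoef <- vector(mode="numeric", length=l)

for (i in 1:l){
  # correlate the series with a copy of itself shifted by i-1 steps
  corcoef[i] <- cor(df$glu[i:l], df$glu[1:(l-i+1)])
}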

Here is the version with sapply().

corcoef <- sapply((1:l), function(i) cor(df$glu[i:l], df$glu[1:(l-i+1)]))

For every index i from 1 to l, sapply() produces the appropriate correlation coefficient
cor(df$glu[i:l], df$glu[1:(l-i+1)]), and the resulting vector is assigned to corcoef.
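To check that the loop and the sapply() call really produce the same numbers, here is a self-contained sketch. The data frame below is a synthetic stand-in (a random walk around 115), not the glucose series from the post, and the helper name `lag_cor` is mine. It also shows vapply(), a stricter variant that declares the return type up front.

```r
set.seed(42)
# synthetic stand-in for the glucose data (an assumption, not the real series)
df <- data.frame(time = seq(0, by=300, length.out=200),
                 glu  = 115 + cumsum(rnorm(200)))
l <- nrow(df)

# correlation of the series with itself shifted by i-1 steps
lag_cor <- function(i) cor(df$glu[i:l], df$glu[1:(l-i+1)])

# stop a few steps short of l so every correlation uses at least 3 points
idx <- 1:(l-2)

# loop version, with the result vector preallocated
corcoef_loop <- numeric(length(idx))
for (i in idx) corcoef_loop[i] <- lag_cor(i)

# sapply() version
corcoef_sapply <- sapply(idx, lag_cor)

# vapply() version: same idea, but each result must be a single numeric
corcoef_vapply <- vapply(idx, lag_cor, numeric(1))
```

All three vectors should match element for element, so the choice between them comes down to speed and readability rather than results.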

Let’s plot a bit of it:

plot(corcoef[1:1000], type="l", col="blue")

See more about this data and autocorrelation in this post on mathbabe.org.

Of course, this is just a simple example and you can do much more. In my experience, using sapply() has decreased computation time significantly, by 50-fold or more in many cases.
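Speed-ups like this depend heavily on the machine, the R version, and what the loop body actually does, so it is worth measuring your own case. Below is a minimal timing sketch with system.time(); the test vector `x` and the choice of sqrt() are arbitrary, picked only to have something cheap to apply.

```r
n <- 1e5
x <- runif(n)

# loop that grows its result one element at a time
t_loop <- system.time({
  out_loop <- c()
  for (i in seq_len(n)) out_loop[i] <- sqrt(x[i])
})

# sapply() over the same elements
t_sapply <- system.time(out_sapply <- sapply(x, sqrt))

# fully vectorized call, when the function allows it
t_vec <- system.time(out_vec <- sqrt(x))

# elapsed seconds for each approach
print(c(loop       = t_loop[["elapsed"]],
        sapply     = t_sapply[["elapsed"]],
        vectorized = t_vec[["elapsed"]]))
```

All three approaches return the same values, so any difference you see is purely in running time.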


2 thoughts on “Loops in R”

  1. Matt says:

    Do you know what’s actually going on in the (*)apply functions? Upon first search I couldn’t find the code that was actually being executed.

    Here’s a simple example to show that you really don’t always get an improvement: lapply (1.507 s) was much faster than the for loop (4.074 s), but sapply (5.409 s) was actually slower.

    n <- 1e6
    square <- rep(0,n)
    system.time(for(i in 1:n) square[i] <- i^2)
    system.time(square <- lapply(1:n,function(i) i^2))
    system.time(square <- sapply(1:n,function(i) i^2))

  2. notjustmath says:

    All I can say is that sapply() is a wrapper for lapply() and the actual loop is called in C.
    More details here: https://svn.r-project.org/R/trunk/src/main/apply.c

    Also, there is a package called plyr, http://cran.r-project.org/web/packages/plyr/index.html, that has many useful optimized functions and is worth looking into.
