R versus Numpy benchmark

In an experiment I’m doing, I need to do correlations between very large matrices. I was surprised to see that a single correlation calculation could take many hours. I decided it was worthwhile to benchmark my two main statistics options, R and Numpy, to see which would do it faster.

Here’s the code I used to do the Python:

import scipy.stats
import numpy as np
import time
import random
import sys

n = int(sys.argv[1])
reps = 1
elapsed = 0.0
for i in range(reps):
    start = time.time()
    x, y = [np.random.random((n, n)) for i in range(2)]
    scipy.stats.mstats.kendalltau(x, y)
    end = time.time()
    elapsed += end - start
print("Elapsed time (s): " + str(elapsed / reps))

Here’s the code I used to do the R (it’s not exactly the same, because R doesn’t give the option to run a correlation on a matrix directly, so I’m just using a linear array).

cmd_args = commandArgs();
n = as.real(cmd_args[4]) # use like this: R --vanilla --silent < tau.r 10 [10 is then arg 4]
x <- runif(n*n, 0.0, 1.0)
y <- runif(n*n, 0.0, 1.0)
system.time(cor.test(x, y, method="kendall"))

And here’s the enclosing shell script which runs each of the above for different sizes (it only runs once, because the long ones take so long, and because very long runs don’t suffer as badly from some sources of timing errors):

for i in 10 50 250 1250; do
	echo $i; R --vanilla --silent < tau.r $i; ./tau.py $i;
done

Results

sizeRNumpy
10*100.004s0.076s
50*500.138s2.023
250*25077s98s
1250*125050239s40414s

Unscientific conclusion: R wins easily for small sizes (possible explanation is that it doesn’t need to pre-process the 2d array to a 1d array), but at the large size I need, Numpy is the way to go.

Update many months later: the reason it’s abominably slow in both is that I’m using Kendall’s tau, a rank-based measure of correlation, which has O(n^2) time complexity! The real conclusion is therefore: consider Spearman’s rho instead, which is also rank-based, but faster. However, there is a can of worms here. They don’t give the same results, and it’s not obvious how to judge which is “better”.