While speeding up some code for a project with a colleague the other day, I ended up trying Rcpp for the first time. I re-implemented the cosine distance function with RcppArmadillo fairly easily, using bits and pieces of code I found scattered around the web. But compared with pure R, the speed increase was not as large as I expected.
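The pure-R function is not reproduced in this text. As a rough sketch only, reconstructed from the expression quoted in the comments and assuming `y` is the matrix of row dot products `x %*% t(x)`, it could look something like this (the toy matrix `mat` is made up for illustration):

```r
# Sketch of a pure-R cosine distance between the rows of a numeric matrix.
# Assumption: y is the matrix of row-by-row dot products, x %*% t(x).
cosine <- function(x) {
  y <- x %*% t(x)                 # dot products between all pairs of rows
  norms <- sqrt(diag(y))          # Euclidean norm of each row
  1 - y / (norms %*% t(norms))    # 1 - cosine similarity
}

# Toy input: rows 1 and 3 point in the same direction, row 2 is orthogonal.
mat <- matrix(c(1, 0,
                0, 1,
                2, 0), nrow = 3, byrow = TRUE)
d <- cosine(mat)
print(d)
```

Rows that point in the same direction get a distance of 0 and orthogonal rows get a distance of 1, which is a quick sanity check for any re-implementation.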
And here is the speed comparison...
I don't really know whether my implementation can be improved. For example, there is a step at the beginning where the R matrix is converted to an Rcpp::NumericMatrix and then to an arma::mat. I could not run the code without this step. I don't think it contributes much to the run time anyway, as it should all be in-memory operations, but I would be curious to know if there is another way.
Not sure if it will be faster but you can create X using `arma::mat X = Rcpp::as< arma::mat >(Xs);`
If you use the byte compiler for the cosine (native R) function, you may see improvements in the speed there as well.
?compile
or
http://stat.ethz.ch/R-manual/R-devel/library/compiler/html/compile.html
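For anyone who wants to try the byte compiler suggestion, here is a minimal sketch using `cmpfun` from the `compiler` package that ships with R. The body of `cosine` and the random test matrix are assumptions made for illustration, not the code from the post:

```r
library(compiler)  # part of the standard R distribution

# Assumed pure-R cosine distance (illustrative, not the post's exact code).
cosine <- function(x) {
  y <- x %*% t(x)
  norms <- sqrt(diag(y))
  1 - y / (norms %*% t(norms))
}

# Byte-compile the closure; cosC is called exactly like cosine.
cosC <- cmpfun(cosine)

set.seed(1)
mat <- matrix(rnorm(200 * 50), nrow = 200)  # made-up test data

# Both versions should return the same distances.
same <- isTRUE(all.equal(cosine(mat), cosC(mat)))
print(same)

# Rough timings; the numbers depend on the machine and the BLAS in use.
print(system.time(for (i in 1:20) cosine(mat)))
print(system.time(for (i in 1:20) cosC(mat)))
```

Since nearly all the work in a function like this happens inside `%*%`, which is already compiled code, the byte compiler probably has little left to speed up. Also, from R 3.4.0 on, closures are byte-compiled just in time by default, so an explicit `cmpfun` call may make no measurable difference on a recent R.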
Thanks for the suggestions. I did try the compiled version of cosine but saw no difference with native R and ended up not posting it. Here is a more complete benchmark:
cosineRcpp2 follows @Jeffrey's suggestion, and cosC = cmpfun(cosine).
res
              test replications elapsed relative user.self sys.self
2  cosineRcpp(mat)          100  15.749 1.000000    13.925    0.771
3        cosC(mat)          100  16.460 1.045146    14.587    1.075
4 cosineRcpp2(mat)          100  16.595 1.053718    14.090    0.785
1      cosine(mat)          100  17.032 1.081465    14.767    1.075
For the native R code, try getting rid of the brackets and writing the steps sequentially. R brackets slow down the function quite a bit....
for instance:
"res <- 1 - y / (sqrt(diag(y)) %*% t(sqrt(diag(y))))"
can be:
"res.a <- diag(y)
res.a <- sqrt(res.a)
res.b <- t(res.a)
res <- res.a %*% res.b
res <- y/res
res <- 1 - res"
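To check this suggestion on your own machine, the two forms can be wrapped in functions and compared. This is only a sketch: the function names `one_liner` and `stepwise`, the random input, and the assumption that `y` is `x %*% t(x)` are mine, not from the post.

```r
# One-line version with nested brackets, as quoted above.
one_liner <- function(y) {
  1 - y / (sqrt(diag(y)) %*% t(sqrt(diag(y))))
}

# Sequential version, one operation per line.
stepwise <- function(y) {
  res.a <- diag(y)
  res.a <- sqrt(res.a)
  res.b <- t(res.a)
  res <- res.a %*% res.b
  res <- y / res
  1 - res
}

set.seed(42)
x <- matrix(rnorm(300 * 40), nrow = 300)  # made-up test data
y <- x %*% t(x)                           # assumed definition of y

# The two forms should agree numerically.
identical_results <- isTRUE(all.equal(one_liner(y), stepwise(y)))
print(identical_results)

# Elapsed seconds for 50 calls of each; results vary by machine.
t_one  <- system.time(for (i in 1:50) one_liner(y))
t_step <- system.time(for (i in 1:50) stepwise(y))
print(c(one_liner = t_one[["elapsed"]], stepwise = t_step[["elapsed"]]))
```

Whether the sequential form is actually faster is something to measure rather than assume, since the matrix product and the element-wise division are likely to dominate the run time in both versions.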