require(inline)
require(RcppArmadillo)
## cosine distance (1 - cosine similarity) between columns
cosine <- function(x) {
  y <- t(x) %*% x
  res <- 1 - y / (sqrt(diag(y)) %*% t(sqrt(diag(y))))
  return(res)
}

cosineRcpp <- cxxfunction(
  signature(Xs = "matrix"),
  plugin = c("RcppArmadillo"),
  body='
    Rcpp::NumericMatrix Xr(Xs);  // creates Rcpp matrix from SEXP
    int n = Xr.nrow(), k = Xr.ncol();
    arma::mat X(Xr.begin(), n, k, false);  // reuses memory and avoids extra copy
    arma::mat Y = arma::trans(X) * X;  // matrix product
    arma::mat res = (1 - Y / (arma::sqrt(arma::diagvec(Y)) * arma::trans(arma::sqrt(arma::diagvec(Y)))));
    return Rcpp::wrap(res);
')

mat <- matrix(rnorm(100000), ncol=1000)
x <- cosine(mat)
y <- cosineRcpp(mat)
identical(x, y)
[1] TRUE
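As a quick sanity check on what `cosine` actually returns, here is a minimal sketch on two tiny matrices whose answers can be worked out by hand. The inputs `orth_cols` and `parallel_cols` are made up for illustration and are not part of the original post. The check uses `all.equal` rather than `identical`, since the square roots can introduce floating-point rounding.

```r
## same function as above
cosine <- function(x) {
  y <- t(x) %*% x
  1 - y / (sqrt(diag(y)) %*% t(sqrt(diag(y))))
}

## hypothetical test inputs (not from the post)
orth_cols     <- cbind(c(1, 0), c(0, 1))  # two orthogonal columns
parallel_cols <- cbind(c(1, 2), c(2, 4))  # second column is 2 * first

d_orth <- cosine(orth_cols)      # expect 0 on the diagonal, 1 off the diagonal
d_par  <- cosine(parallel_cols)  # expect (numerically) 0 everywhere

ok_orth <- isTRUE(all.equal(d_orth, matrix(c(0, 1, 1, 0), nrow = 2)))
ok_par  <- isTRUE(all.equal(d_par, matrix(0, nrow = 2, ncol = 2)))
print(c(orth = ok_orth, parallel = ok_par))
```

Orthogonal columns get a distance of 1 and parallel columns a distance of 0, which is what you would expect from 1 minus the cosine similarity.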
library(rbenchmark)
res <- benchmark(cosine(mat),
                 cosineRcpp(mat),
                 columns=c("test", "replications", "elapsed",
                           "relative", "user.self", "sys.self"),
                 order="relative",
                 replications=1000)
res
             test replications elapsed relative user.self sys.self
2 cosineRcpp(mat)         1000 149.974 1.000000   139.389    7.356
1     cosine(mat)         1000 161.268 1.075306   148.613   10.069
Not sure if it will be faster but you can create X using `arma::mat X = Rcpp::as< arma::mat >(Xs);`
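For readers who want to try this, here is a rough sketch of how the suggestion might look. The function name `cosineAs` and the temporary `d` are my own choices for illustration, and this is not necessarily the exact code anyone in this thread ran. It assumes `inline`, `RcppArmadillo` and a working C++ toolchain are installed. Note that `Rcpp::as<arma::mat>` copies the data, whereas the original constructor call reuses R's memory.

```r
library(inline)
library(RcppArmadillo)

## hypothetical variant of cosineRcpp using Rcpp::as<arma::mat>
cosineAs <- cxxfunction(
  signature(Xs = "matrix"),
  plugin = "RcppArmadillo",
  body = '
    arma::mat X = Rcpp::as<arma::mat>(Xs);        // copies the R matrix into an arma::mat
    arma::mat Y = arma::trans(X) * X;             // matrix product
    arma::vec d = arma::sqrt(arma::diagvec(Y));   // column norms
    arma::mat res = 1 - Y / (d * arma::trans(d)); // element-wise division
    return Rcpp::wrap(res);
  ')

## plain R reference for comparison
cosineRef <- function(x) {
  y <- t(x) %*% x
  1 - y / (sqrt(diag(y)) %*% t(sqrt(diag(y))))
}

set.seed(1)
small <- matrix(rnorm(200), ncol = 10)  # small test matrix, made up for this check
same <- isTRUE(all.equal(cosineAs(small), cosineRef(small)))
print(same)
```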
If you use the byte compiler for the cosine (native R) function, you may see improvements in the speed there as well.
?compile
or
http://stat.ethz.ch/R-manual/R-devel/library/compiler/html/compile.html
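A minimal sketch of the byte-compiler suggestion, using `cmpfun` from the `compiler` package that ships with R. The name `cosC` and the small test matrix are only for illustration. One caveat I am fairly sure of: since R 3.4.0 the JIT byte compiler is enabled by default, so on a recent R an explicit `cmpfun` call may change little.

```r
library(compiler)  # part of the standard R distribution

## native R version from the post
cosine <- function(x) {
  y <- t(x) %*% x
  res <- 1 - y / (sqrt(diag(y)) %*% t(sqrt(diag(y))))
  return(res)
}

cosC <- cmpfun(cosine)  # byte-compiled copy of the same function

set.seed(1)
mat_small <- matrix(rnorm(1000), ncol = 10)  # small matrix, made up for this check

## both versions should give the same result
same_result <- isTRUE(all.equal(cosine(mat_small), cosC(mat_small)))
print(same_result)
```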
Thanks for the suggestions. I did try the compiled version of cosine but saw no difference with native R and ended up not posting it. Here is a more complete benchmark:
cosineRcpp2 follows @Jeffrey's suggestion, and cosC = cmpfun(cosine).
res
              test replications elapsed relative user.self sys.self
2  cosineRcpp(mat)          100  15.749 1.000000    13.925    0.771
3        cosC(mat)          100  16.460 1.045146    14.587    1.075
4 cosineRcpp2(mat)          100  16.595 1.053718    14.090    0.785
1      cosine(mat)          100  17.032 1.081465    14.767    1.075
For the native R code, try getting rid of the brackets and writing the steps sequentially. R brackets slow down the function quite a bit.
For instance:
"res <- 1 - y / (sqrt(diag(y)) %*% t(sqrt(diag(y))))"
can be:
"res.a <- diag(y)
res.a <- sqrt(res.a)
res.b <- t(res.a)
res <- res.a %*% res.b
res <- y/res
res <- 1 - res"
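To make the suggestion easy to try, here is a sketch that wraps the sequential steps in a function and checks that it returns the same matrix as the one-line version. The name `cosineSeq` and the small test matrix are assumptions for illustration. Whether it is actually faster is something to measure rather than assume.

```r
## original one-line version
cosine <- function(x) {
  y <- t(x) %*% x
  res <- 1 - y / (sqrt(diag(y)) %*% t(sqrt(diag(y))))
  return(res)
}

## hypothetical sequential version following the steps above
cosineSeq <- function(x) {
  y <- t(x) %*% x
  res.a <- diag(y)         # squared column norms
  res.a <- sqrt(res.a)     # column norms
  res.b <- t(res.a)        # 1 x k row matrix
  res <- res.a %*% res.b   # k x k outer product of the norms
  res <- y / res           # cosine similarity
  res <- 1 - res           # cosine distance
  return(res)
}

set.seed(1)
mat_small <- matrix(rnorm(1000), ncol = 10)  # small matrix, made up for this check
same_seq <- isTRUE(all.equal(cosine(mat_small), cosineSeq(mat_small)))
print(same_seq)
```

Timing both with `benchmark()` from rbenchmark on the full-size `mat` would show how much the brackets really cost.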