Tuesday, March 19, 2013

How not to reveal your MySQL DB login/password when sharing code on GitHub or BitBucket?

Solution: use your ~/.my.cnf
Inside your ~/.my.cnf file define the connection parameters to your databases. For example, here I define two groups called local and toto:
[local]
user = root
password = ultra_secret
host = localhost

[toto]
user = capitaine_flam
password = galaxy
host = milky.way.net
This allows me to connect to the databases defined in each group from R using the following command:
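As a minimal sketch, assuming the RMySQL package (the post does not name the driver it uses) and the toto group defined above, the connection might look like this:

```r
library(RMySQL)

# Read user, password and host from the [toto] group in ~/.my.cnf,
# so no credentials appear in the script itself.
con <- dbConnect(MySQL(), groups = "toto")

# ... run queries here, e.g. dbGetQuery(con, "SELECT 1") ...

# Close the connection when done.
dbDisconnect(con)
```

Because the script only names the group, it can be pushed to a public repository without exposing the login or password.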
This trick also applies when you write reports with knitr, in Sweave or markdown. Your password is stored in clear text in your .my.cnf file, so as long as your file system is not compromised you are fairly safe.

Sunday, February 24, 2013

Large correlation in parallel

Here is a small improvement to the bigcor function proposed on Rmazing for computing huge correlation matrices in R: I made the function work in parallel, using all the CPU cores available on the machine. The code is here.
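To give an idea of the approach, here is a rough sketch and not the actual code linked above. It assumes the matrix fits in memory (no on-disk storage) and uses mclapply from the parallel package, which only forks on Unix-like systems. The function name block_cor_parallel and its arguments are hypothetical.

```r
library(parallel)

# Hypothetical helper: correlate blocks of columns of x in parallel
# and assemble the full correlation matrix.
block_cor_parallel <- function(x, block_size = 100, cores = detectCores()) {
  p <- ncol(x)
  # Split the column indices into consecutive blocks.
  blocks <- split(seq_len(p), ceiling(seq_len(p) / block_size))
  # Every unordered pair of blocks, including each block with itself.
  pairs <- expand.grid(i = seq_along(blocks), j = seq_along(blocks))
  pairs <- pairs[pairs$i <= pairs$j, ]
  # Compute the correlation of each pair of blocks on the available workers.
  pieces <- mclapply(seq_len(nrow(pairs)), function(k) {
    cor(x[, blocks[[pairs$i[k]]], drop = FALSE],
        x[, blocks[[pairs$j[k]]], drop = FALSE])
  }, mc.cores = cores)
  # Fill both the upper and lower parts of the symmetric result.
  result <- matrix(NA_real_, p, p)
  for (k in seq_len(nrow(pairs))) {
    bi <- blocks[[pairs$i[k]]]
    bj <- blocks[[pairs$j[k]]]
    result[bi, bj] <- pieces[[k]]
    result[bj, bi] <- t(pieces[[k]])
  }
  result
}
```

On a small matrix the result can be compared with cor(x) using all.equal. On Windows, set cores = 1, since mclapply cannot fork there.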

Here is a benchmark of the 2 functions on my machine with 8 cores:

Monday, January 14, 2013

Air quality analysis from Beijing twitter feed.

As air pollution in Beijing reaches a new high [NYT article], I re-ran the analysis I put online a few months ago.
"Crazy bad" is a good description when it reaches those levels. But I am sure there are other places, like Mexico City or LA, that look as dramatic as those numbers for Beijing. The fact that the machine is tweeting makes the analysis so easy. I hope it keeps tweeting and that other places in the world do the same.

From the NYT article:
The existence of the embassy’s machine and the @BeijingAir Twitter feed have been a diplomatic sore point for Chinese officials. In July 2009, a Chinese Foreign Ministry official, Wang Shu’ai, told American diplomats to halt the Twitter feed, saying that the data “is not only confusing but also insulting,” according to a State Department cable obtained by WikiLeaks. Mr. Wang said the embassy’s data could lead to “social consequences.”