R Chronicle<br />
<b>How not to reveal your MySQL DB login/password when sharing code on GitHub or BitBucket?</b> (2013-03-19)<br />
Solution: use your ~/.my.cnf file.<br>
In your ~/.my.cnf file, define the connection parameters for your databases. For example, here I define two groups called <b>local</b> and <b>toto</b>:<br />
<pre>
[local]
user = root
password = ultra_secret
host = localhost
[toto]
user = capitaine_flam
password = galaxy
host = milky.way.net
</pre>
This allows me to connect from R to the database defined in a given group with the following command:
<script src="https://gist.github.com/bobthecat/5195075.js"></script>
<br />
This trick also applies when you write reports with knitr (Sweave-style LaTeX or Markdown): the password never appears in the report source. Note that the password is still stored in clear text in your .my.cnf file, so you are only fairly safe as long as your file system is not compromised.
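Since the password sits in clear text, one extra precaution is to make the option file readable by your user only. A minimal sketch, assuming a Unix-like system: it writes a throwaway copy of the file to a temporary path (so your real ~/.my.cnf is untouched), restricts it like `chmod 600`, and defines a hypothetical `connect_group()` helper. The `groups` and `default.file` argument names are those of recent RMySQL releases; check the documentation of your installed version.

```r
# Work on a throwaway copy so the real ~/.my.cnf is left untouched.
cnf <- tempfile(fileext = ".cnf")
writeLines(c("[local]",
             "user = root",
             "password = ultra_secret",
             "host = localhost"), cnf)

# Equivalent of `chmod 600`: read/write for the owner only.
Sys.chmod(cnf, mode = "600", use_umask = FALSE)
format(file.info(cnf)$mode)   # "600" on Unix-like systems

# Hypothetical helper: connect through a named group so that no credentials
# appear in the script (argument names as in recent RMySQL releases).
connect_group <- function(group, file = "~/.my.cnf") {
  DBI::dbConnect(RMySQL::MySQL(), groups = group,
                 default.file = path.expand(file))
}
# con <- connect_group("local")
```

The connection line is commented out so the snippet runs without a MySQL server.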
<div>
<br /></div>
<b>Large correlation in parallel</b> (2013-02-24)<br />
A small improvement to the bigcor function proposed on <a href="http://rmazing.wordpress.com/2013/02/22/bigcor-large-correlation-matrices-in-r/" target="_blank">Rmazing</a> for computing huge correlation matrices in R: I made the function run in parallel, using all the CPU cores available on the machine. The code is <a href="https://gist.github.com/bobthecat/5024079">here</a>.<br />
<br />
Here is a benchmark of the two functions on my 8-core machine:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGQwYqWg5QWbB07Bv6-9Btfs9QHloVOJ8FvpYgXE7Ev_DYk0YSaG0D3bS7uqxgVZFXhK0psjEszIEkMheeIl84gi2CP807bz3PEYspIzGs1GTOs8U7GyDU1jEEYaAbeqvQcqMDlD_4aXfy/s1600/bigcor_benchmark.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGQwYqWg5QWbB07Bv6-9Btfs9QHloVOJ8FvpYgXE7Ev_DYk0YSaG0D3bS7uqxgVZFXhK0psjEszIEkMheeIl84gi2CP807bz3PEYspIzGs1GTOs8U7GyDU1jEEYaAbeqvQcqMDlD_4aXfy/s400/bigcor_benchmark.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
<script src="https://gist.github.com/bobthecat/5024316.js"></script>
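The gist above is the reference code. To illustrate the idea only, here is a minimal sketch (not the bigcor code itself): split the columns into blocks, compute the correlation of each pair of blocks in a separate worker with `parallel::mclapply()`, then reassemble the symmetric matrix. `par_block_cor()` is a hypothetical name, the sketch keeps the whole result in an ordinary in-memory matrix, and `mclapply()` relies on forking, so on Windows it only works with `mc.cores = 1`.

```r
library(parallel)

par_block_cor <- function(x, nblocks = 4, mc.cores = 2) {
  p <- ncol(x)
  # Split the column indices into nblocks contiguous groups.
  blocks <- split(seq_len(p), cut(seq_len(p), nblocks, labels = FALSE))
  # Unordered pairs of blocks (i <= j), including each block with itself.
  bp <- expand.grid(i = seq_along(blocks), j = seq_along(blocks))
  bp <- bp[bp$i <= bp$j, ]
  # Each block pair is an independent job, so run them on separate cores.
  res <- mclapply(seq_len(nrow(bp)), function(k) {
    cor(x[, blocks[[bp$i[k]]], drop = FALSE],
        x[, blocks[[bp$j[k]]], drop = FALSE])
  }, mc.cores = mc.cores)
  # Assemble the full symmetric p x p matrix from the block results.
  out <- matrix(NA_real_, p, p, dimnames = list(colnames(x), colnames(x)))
  for (k in seq_len(nrow(bp))) {
    bi <- blocks[[bp$i[k]]]
    bj <- blocks[[bp$j[k]]]
    out[bi, bj] <- res[[k]]
    out[bj, bi] <- t(res[[k]])
  }
  out
}

set.seed(1)
m <- matrix(rnorm(100 * 40), nrow = 100)
r <- par_block_cor(m, nblocks = 4, mc.cores = 2)
all.equal(r, cor(m), check.attributes = FALSE)   # TRUE
```

Only the upper triangle of block pairs is computed; the lower triangle is filled by transposition, which roughly halves the work.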
<b>Air quality analysis from Beijing twitter feed</b> (2013-01-14)<br />
As air pollution in Beijing reaches new highs [<a href="http://www.nytimes.com/2013/01/13/science/earth/beijing-air-pollution-off-the-charts.html?_r=0">NYT article</a>], I re-ran the analysis I put <a href="http://brainchronicle.blogspot.co.uk/2012/07/twitter-analysis-of-air-pollution-in.html">online</a> a few months ago.
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbYGI3QhuLmaT3A4c147tQnymOazsF_Ayl-o59c2GC1JZKmxZJgLTJ-oEwWvQQv5bI3P05f_SqwyBTNn4ZgQr50sJLIEn6TUFEPC5PiLHMvypoRZ_9062MqvYVD7xD2FgnBsyA7sAWpBSC/s1600/twitter_pol.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbYGI3QhuLmaT3A4c147tQnymOazsF_Ayl-o59c2GC1JZKmxZJgLTJ-oEwWvQQv5bI3P05f_SqwyBTNn4ZgQr50sJLIEn6TUFEPC5PiLHMvypoRZ_9062MqvYVD7xD2FgnBsyA7sAWpBSC/s400/twitter_pol.png" width="400" /></a></div>
"Crazy bad" is a fair description when readings reach those levels. But I am sure other places, such as Mexico City or LA, also have readings that look as dramatic as these numbers for Beijing. The fact that the machine tweets its readings makes the analysis very easy. I hope it keeps tweeting and that other places in the world do the same.<br />
<br />
From the NYT article:
<br />
<blockquote>
The existence of the embassy’s machine and the @BeijingAir Twitter feed have been a diplomatic sore point for Chinese officials. In July 2009, a Chinese Foreign Ministry official, Wang Shu’ai, told American diplomats to halt the Twitter feed, saying that the data “is not only confusing but also insulting,” according to a State Department cable obtained by WikiLeaks. Mr. Wang said the embassy’s data could lead to “social consequences.”</blockquote>
<b>Computing an empirical pFDR in R</b> (2012-12-21)<br>
The positive false discovery rate (pFDR) has become a classical procedure for controlling false positives. It is one of my favourites because it can be estimated empirically with a re-sampling approach.<br>
<br>
I based my implementation on <a href="http://www.genomine.org/" target="_blank">John Storey</a>'s PNAS paper and on the technical report he published with <a href="http://www-stat.stanford.edu/~tibs/" target="_blank">Rob Tibshirani</a> while at Stanford [1-2] (I find the technical report much more didactic than the PNAS paper).
<br>
<br>
I will not describe here why you need to correct for multiple hypothesis testing when performing many tests simultaneously (see <a href="http://en.wikipedia.org/wiki/Multiple_comparisons">Wikipedia</a> for that). However, I will restate the definition of the pFDR in the terms of Storey et al. [1]:<br>
The pFDR is the expected proportion of false positives among the significant features, conditional on there being at least one significant result. Writing F for the number of false positives and S for the number of significant features:<br>
<code>
pFDR = E[F/S|S>0]
</code><br>
When there are many features (genes), having at least one significant result is almost certain, i.e. Pr(S>0) is close to 1, so the equation above can be approximated as<br>
<code>pFDR ~= FDR ~= E[F]/E[S]</code>.<br>
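The pFDR evaluated at the p-value of a given feature (more precisely, its minimum over the thresholds at which the feature is called significant, which Storey calls the q-value) measures how likely it is that a feature called significant at that level is a false positive. For example, if gene A has a p-value of 0.001 and a q-value of 0.01, then calling gene A and every gene with a smaller p-value significant means we expect about 1% of those calls to be false positives. Storey puts it this way: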
<br>
<blockquote>
"[...] the pFDR can be written as <code>Pr(feature i is truly null | feature i is significant)</code>[...]"</blockquote>
In short, it is a p-value for your p-value.<br>
Now let's compute the pFDR on a concrete example: a genome-wide gene expression study comparing two groups of patients (genes as rows, patients as columns). The same scenario applies to any type of data (SNPs, proteins, clinical values...) as long as you have class labels. Admittedly, re-implementing an FDR procedure for gene expression data is somewhat redundant, since several R packages already provide this functionality, but the aim here is to understand how it works.<br>
<br>
<b>Practical thoughts to keep in mind regarding the FDR:</b><br>
<b>1.</b> Because it relies on random sampling, results will differ slightly between runs; increasing the number of random shuffles makes them more and more similar.<br>
<b>2.</b> It may be obvious, but if your columns (samples) have no groups or ordering, there are no labels to shuffle, and you cannot compute the FDR this way.<br>
<br>
<b>The work</b>:<br>
We start from a normalized gene expression matrix generated like this:<br>
<script src="https://gist.github.com/4347441.js"></script>
From this small expression matrix of 2000 genes by 18 samples in two groups, we compute a p-value for each gene using a two-sided t-test.<br>
<script src="https://gist.github.com/4347506.js"></script>
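A minimal, self-contained stand-in for these two steps (not the code from the gists). The 9 ER+ / 9 ER- split, the 100 shifted genes and the shift of 2 are assumptions made for illustration only.

```r
# Simulated expression matrix: genes as rows, samples as columns.
set.seed(42)
n_genes <- 2000
groups <- factor(rep(c("ER+", "ER-"), each = 9))
expr <- matrix(rnorm(n_genes * length(groups)), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Hypothetical signal: the first 100 genes are shifted up in ER+ samples.
de <- 1:100
expr[de, groups == "ER+"] <- expr[de, groups == "ER+"] + 2

# One two-sided (Welch) t-test per gene; apply() works row by row.
p_obs <- apply(expr, 1, function(v) {
  t.test(v[groups == "ER+"], v[groups == "ER-"],
         alternative = "two.sided")$p.value
})
summary(p_obs)
```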
This gives a vector of p-values from which we will derive the q-values. To do so, we need to estimate the <b>number of false positives</b> (<span style="font-family: monospace;">pFDR ~= E[F]/E[S]</span>).<br>
We do this by re-computing the p-values with shuffled class labels, to see how often random chance alone yields p-values as low as those obtained with the original labels. Shuffling re-labels some ER+ samples as ER- and vice versa.<br>
In a way, we are estimating how robust the p-values we obtained are.<br>
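The steps above can be sketched as follows. This is a simplified permutation estimator written for illustration, not the post's own code or the exact algorithm of [1-2]: E[F](t) is the average number of permuted p-values at or below t, S(t) is the number of observed p-values at or below t, pi0 (the proportion of truly null genes) is estimated with a single tuning value lambda = 0.5, and each q-value is the smallest pFDR at which the gene would be called significant. The function names and the values of B and lambda are hypothetical choices.

```r
# Two-sided Welch t-test for every gene (row) of x.
ttest_p <- function(x, g) {
  lv <- levels(g)
  apply(x, 1, function(v) t.test(v[g == lv[1]], v[g == lv[2]])$p.value)
}

empirical_qvalues <- function(x, g, B = 20, lambda = 0.5) {
  p_obs <- ttest_p(x, g)
  m <- length(p_obs)
  # B null p-value vectors, each computed after shuffling the class labels
  # (a genes x B matrix).
  p_null <- replicate(B, ttest_p(x, sample(g)))
  # pi0: observed p-values above lambda, divided by the average number of
  # null p-values above lambda (capped at 1).
  pi0 <- min(1, sum(p_obs > lambda) / (sum(p_null > lambda) / B))
  # E[F](t): average number of permuted p-values <= t, for t = each p_obs.
  EF <- findInterval(p_obs, sort(as.vector(p_null))) / B
  # S(t): number of observed p-values <= t.
  S <- rank(p_obs, ties.method = "max")
  pfdr <- pmin(1, pi0 * EF / S)
  # q-value: running minimum of the pFDR from the largest p-value down.
  o <- order(p_obs, decreasing = TRUE)
  q <- numeric(m)
  q[o] <- cummin(pfdr[o])
  names(q) <- names(p_obs)
  q
}

# Usage on simulated data (assumption: 500 genes, 9 vs 9 samples,
# the first 50 genes shifted by 2 in the ER+ group).
set.seed(1)
groups <- factor(rep(c("ER+", "ER-"), each = 9))
expr <- matrix(rnorm(500 * 18), nrow = 500,
               dimnames = list(paste0("gene", 1:500), NULL))
expr[1:50, groups == "ER+"] <- expr[1:50, groups == "ER+"] + 2
q <- empirical_qvalues(expr, groups, B = 10)
sum(q < 0.05)   # genes called significant at a q-value of 0.05
```

Increasing B makes E[F](t) less noisy, which is point 1 above in practice.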
<a href="http://brainchronicle.blogspot.com/2012/12/computing-empirical-pfdr-in-r.html#more">Read more »</a>
<b>Religious restrictions index: how do countries compare?</b> (2012-09-21)<br />
Yesterday, the Guardian DataBlog published an interesting article exploring <a href="http://www.guardian.co.uk/news/datablog/2012/sep/20/religious-restrictions-index-intolerance-rise">religious intolerance across the world</a> graphically. The data come from a report published by the Pew Research Center's Forum on Religion and Public Life. I like the DataBlog philosophy a lot: it provides the raw data for everyone to look at.
<br />
However, I felt that the visualization could be improved: the data are longitudinal, yet no temporal representation is provided. So I downloaded the Google Spreadsheet and worked on it in R with googleVis, the R interface to the Google Visualisation (Charts) API.
<br />
The data consist of two indices:
<br />
<br />
<ul>
<li>The Government Restrictions Index (GRI) [measures government laws, policies and actions that restrict religious beliefs or practices]</li>
<li>The Social Hostilities Index (SHI) [measures acts of religious hostility by private individuals, organizations and social groups]</li>
</ul>
<br />
The R code is the following:
<script src="https://gist.github.com/3763136.js?file=religion_index.r"></script>
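A minimal sketch of the reshaping idea (not the code from the gist): a motion chart wants the data in long format, one row per country and year. The few values below are copied from the chart data embedded in this post; the `googleVis::gvisMotionChart()` call is guarded so the snippet still runs without the package.

```r
# Long format: one row per country and year, as a motion chart expects.
ri <- data.frame(
  country = rep(c("China", "Egypt", "United Kingdom"), times = 2),
  year    = rep(c(2007, 2010), each = 3),
  GRI     = c(7.8, 7.2, 1.6, 7.5, 8.7, 4.3),
  SHI     = c(0.9, 6.1, 1.6, 2.0, 7.6, 6.2)
)

# Change between the first and last year, per country.
wide <- reshape(ri, idvar = "country", timevar = "year", direction = "wide")
wide$dGRI <- wide$GRI.2010 - wide$GRI.2007
wide$dSHI <- wide$SHI.2010 - wide$SHI.2007
print(wide[, c("country", "dGRI", "dSHI")])

# Motion chart, only if googleVis is installed.
if (requireNamespace("googleVis", quietly = TRUE)) {
  mc <- googleVis::gvisMotionChart(ri, idvar = "country", timevar = "year",
                                   xvar = "GRI", yvar = "SHI")
  # plot(mc)  # opens the chart in a browser
}
```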
<br />
<!-- MotionChart generated in R 2.15.1 by googleVis 0.2.17 package -->
<!-- Fri Sep 21 11:26:32 2012 -->
<!-- jsHeader -->
<script src="http://www.google.com/jsapi" type="text/javascript">
</script>
<script type="text/javascript">
// jsData
function gvisDataMotionChartID2d1067bcd01 ()
{
var data = new google.visualization.DataTable();
var datajson =
[
[
"Antigua and Barbuda ",
2007,
1.1,
0.3
],
[
"Argentina ",
2007,
1.7,
0.6
],
[
"Bahamas ",
2007,
1.4,
0.5
],
[
"Barbados ",
2007,
0.8,
0.3
],
[
"Belize ",
2007,
1.3,
0
],
[
"Bolivia ",
2007,
1,
0
],
[
"Brazil ",
2007,
0.4,
0.8
],
[
"Canada ",
2007,
1,
1.2
],
[
"Chile ",
2007,
1.2,
0.4
],
[
"Colombia ",
2007,
1.8,
3.3
],
[
"Costa Rica ",
2007,
1,
0
],
[
"Cuba ",
2007,
4.5,
0
],
[
"Dominica ",
2007,
0.8,
0.3
],
[
"Dominican Republic ",
2007,
0.6,
0
],
[
"Ecuador ",
2007,
1.1,
0.6
],
[
"El Salvador ",
2007,
0.6,
0.4
],
[
"Grenada ",
2007,
0.5,
0
],
[
"Guatemala ",
2007,
1.2,
1
],
[
"Guyana ",
2007,
0.7,
0
],
[
"Haiti ",
2007,
1.8,
0.6
],
[
"Honduras ",
2007,
1.3,
0.3
],
[
"Jamaica ",
2007,
1,
0
],
[
"Mexico ",
2007,
4.7,
5.5
],
[
"Nicaragua ",
2007,
2,
0.5
],
[
"Panama ",
2007,
0.7,
0
],
[
"Paraguay ",
2007,
0.6,
0.7
],
[
"Peru ",
2007,
1.8,
0
],
[
"St. Kitts and Nevis ",
2007,
0.6,
0.3
],
[
"St. Lucia ",
2007,
0.6,
0.3
],
[
"St. Vincent and the Grenadines ",
2007,
0.6,
0.3
],
[
"Suriname ",
2007,
0,
0
],
[
"Trinidad and Tobago ",
2007,
0.3,
0.6
],
[
"United States ",
2007,
1.6,
1.9
],
[
"Uruguay ",
2007,
0.3,
0.6
],
[
"Venezuela ",
2007,
3.6,
0.8
],
[
"Afghanistan ",
2007,
5.3,
8.5
],
[
"Armenia ",
2007,
3.4,
2.7
],
[
"Australia ",
2007,
1.3,
1.8
],
[
"Azerbaijan ",
2007,
5,
2.9
],
[
"Bangladesh ",
2007,
4,
8.3
],
[
"Bhutan ",
2007,
4.4,
1.9
],
[
"Brunei ",
2007,
7.2,
4.2
],
[
"Burma (Myanmar) ",
2007,
7.9,
4.9
],
[
"Cambodia ",
2007,
2.9,
0.8
],
[
"China ",
2007,
7.8,
0.9
],
[
"Cyprus ",
2007,
1.2,
0.9
],
[
"Federated States of Micronesia ",
2007,
0.2,
0
],
[
"Fiji ",
2007,
0.9,
2.6
],
[
"Hong Kong ",
2007,
1,
0.8
],
[
"India ",
2007,
4.8,
8.8
],
[
"Indonesia ",
2007,
6.2,
8.3
],
[
"Iran ",
2007,
7.9,
6
],
[
"Japan ",
2007,
0.2,
0.4
],
[
"Kazakhstan ",
2007,
5.6,
3.1
],
[
"Kiribati ",
2007,
0.3,
0.8
],
[
"Kyrgyzstan ",
2007,
3.9,
5.5
],
[
"Laos ",
2007,
6.3,
1
],
[
"Macau ",
2007,
1.3,
0.3
],
[
"Malaysia ",
2007,
6.4,
1
],
[
"Maldives ",
2007,
6.5,
2.6
],
[
"Marshall Islands ",
2007,
0.2,
0
],
[
"Mongolia ",
2007,
1.9,
0.6
],
[
"Nauru ",
2007,
2,
0.3
],
[
"Nepal ",
2007,
3.4,
4.2
],
[
"New Zealand ",
2007,
0.3,
0.4
],
[
"Pakistan ",
2007,
5.8,
8.9
],
[
"Palau ",
2007,
0.6,
0.3
],
[
"Papua New Guinea ",
2007,
0.8,
0
],
[
"Philippines ",
2007,
1.6,
3.7
],
[
"Samoa ",
2007,
0.8,
0.4
],
[
"Singapore ",
2007,
4.6,
0.2
],
[
"Solomon Islands ",
2007,
0.6,
0.4
],
[
"South Korea ",
2007,
1.6,
0
],
[
"Sri Lanka ",
2007,
4,
7.8
],
[
"Taiwan ",
2007,
0.5,
0
],
[
"Tajikistan ",
2007,
4.5,
2.2
],
[
"Thailand ",
2007,
2.6,
2.6
],
[
"Timor-Leste ",
2007,
0.9,
4.2
],
[
"Tonga ",
2007,
2,
0
],
[
"Turkey ",
2007,
6.6,
4.7
],
[
"Turkmenistan ",
2007,
5.6,
1.5
],
[
"Tuvalu ",
2007,
1.8,
2.1
],
[
"Uzbekistan ",
2007,
7.7,
3.3
],
[
"Vanuatu ",
2007,
1,
1
],
[
"Vietnam ",
2007,
6.6,
1.2
],
[
"Albania ",
2007,
0.8,
0.2
],
[
"Andorra ",
2007,
0.9,
0
],
[
"Austria ",
2007,
2.6,
1.1
],
[
"Belarus ",
2007,
5.9,
1.4
],
[
"Belgium ",
2007,
4,
0.9
],
[
"Bosnia-Herzegovina ",
2007,
1.5,
2.4
],
[
"Bulgaria ",
2007,
4,
2.2
],
[
"Croatia ",
2007,
0.7,
2
],
[
"Czech Republic ",
2007,
1,
1.2
],
[
"Denmark ",
2007,
2.5,
1.2
],
[
"Estonia ",
2007,
1.1,
0.8
],
[
"Finland ",
2007,
0.6,
0.8
],
[
"France ",
2007,
3.3,
3.4
],
[
"Georgia ",
2007,
2.2,
4.7
],
[
"Germany ",
2007,
3.1,
2.1
],
[
"Greece ",
2007,
5.2,
4.4
],
[
"Hungary ",
2007,
0.3,
1
],
[
"Iceland ",
2007,
2.6,
0.4
],
[
"Ireland ",
2007,
0.6,
0.4
],
[
"Italy ",
2007,
2,
1.9
],
[
"Kosovo ",
2007,
1.9,
2.4
],
[
"Latvia ",
2007,
2.3,
1.4
],
[
"Liechtenstein ",
2007,
1.3,
0.1
],
[
"Lithuania ",
2007,
1.6,
0.8
],
[
"Luxembourg ",
2007,
0.8,
0
],
[
"Malta ",
2007,
1.2,
0.4
],
[
"Moldova ",
2007,
4.2,
3.8
],
[
"Monaco ",
2007,
2.5,
0
],
[
"Montenegro ",
2007,
0.9,
2.4
],
[
"Netherlands ",
2007,
0.4,
1
],
[
"Norway ",
2007,
1.5,
1
],
[
"Poland ",
2007,
1,
0.9
],
[
"Portugal ",
2007,
0.3,
0
],
[
"Republic of Macedonia ",
2007,
2.2,
1.5
],
[
"Romania ",
2007,
4.8,
5.5
],
[
"Russia ",
2007,
5.8,
3.7
],
[
"San Marino ",
2007,
0.1,
0
],
[
"Serbia ",
2007,
3.1,
1.5
],
[
"Slovakia ",
2007,
2.8,
1.9
],
[
"Slovenia ",
2007,
0.6,
1
],
[
"Spain ",
2007,
2,
1.6
],
[
"Sweden ",
2007,
1.2,
0.7
],
[
"Switzerland ",
2007,
1.2,
1.7
],
[
"Ukraine ",
2007,
2.6,
1.9
],
[
"United Kingdom ",
2007,
1.6,
1.6
],
[
"Algeria ",
2007,
5.6,
3.6
],
[
"Bahrain ",
2007,
4.3,
3
],
[
"Egypt ",
2007,
7.2,
6.1
],
[
"Iraq ",
2007,
5.1,
10
],
[
"Israel ",
2007,
3.9,
7.8
],
[
"Jordan ",
2007,
4.6,
3.5
],
[
"Kuwait ",
2007,
4.8,
1.9
],
[
"Lebanon ",
2007,
1.4,
5.1
],
[
"Libya ",
2007,
5.1,
1.4
],
[
"Morocco ",
2007,
4.9,
3.7
],
[
"Oman ",
2007,
3.9,
0.3
],
[
"Palestinian territories ",
2007,
3.3,
6.4
],
[
"Qatar ",
2007,
3.3,
0.3
],
[
"Saudi Arabia ",
2007,
8,
7.2
],
[
"Sudan ",
2007,
5.7,
6.5
],
[
"Syria ",
2007,
4.5,
5.3
],
[
"Tunisia ",
2007,
4.8,
3.8
],
[
"United Arab Emirates ",
2007,
3.9,
0.1
],
[
"Western Sahara ",
2007,
4.8,
3.3
],
[
"Yemen ",
2007,
4.3,
6.2
],
[
"Angola ",
2007,
3.3,
3.7
],
[
"Benin ",
2007,
0.3,
0
],
[
"Botswana ",
2007,
0.9,
0.1
],
[
"Burkina Faso ",
2007,
0.3,
1.5
],
[
"Burundi ",
2007,
0.4,
0.9
],
[
"Cameroon ",
2007,
1.1,
1.4
],
[
"Cape Verde ",
2007,
0.3,
0.1
],
[
"Central African Republic ",
2007,
3.7,
3.3
],
[
"Chad ",
2007,
4.2,
3.3
],
[
"Comoros ",
2007,
5.4,
6.2
],
[
"Democratic Republic of the Congo ",
2007,
1.3,
2.6
],
[
"Djibouti ",
2007,
2.4,
1.8
],
[
"Equatorial Guinea ",
2007,
2.6,
0
],
[
"Eritrea ",
2007,
7,
0.4
],
[
"Ethiopia ",
2007,
2.6,
5.3
],
[
"Gabon ",
2007,
1.7,
0.1
],
[
"Gambia ",
2007,
0.5,
0.8
],
[
"Ghana ",
2007,
1.2,
4.9
],
[
"Guinea ",
2007,
1.5,
1.7
],
[
"Guinea Bissau ",
2007,
1.5,
0
],
[
"Ivory Coast ",
2007,
1.9,
3.1
],
[
"Kenya ",
2007,
2.9,
2.4
],
[
"Lesotho ",
2007,
0.4,
0
],
[
"Liberia ",
2007,
1.7,
3.8
],
[
"Madagascar ",
2007,
1.8,
0
],
[
"Malawi ",
2007,
0.4,
0.3
],
[
"Mali ",
2007,
0.9,
0.3
],
[
"Mauritania ",
2007,
6.5,
0.9
],
[
"Mauritius ",
2007,
1.4,
0.3
],
[
"Mozambique ",
2007,
1.1,
0.3
],
[
"Namibia ",
2007,
0.3,
0
],
[
"Niger ",
2007,
1.7,
1.5
],
[
"Nigeria ",
2007,
3.7,
4.4
],
[
"Republic of the Congo ",
2007,
0.7,
0.4
],
[
"Rwanda ",
2007,
2,
0
],
[
"Sao Tome and Principe ",
2007,
0.2,
0
],
[
"Senegal ",
2007,
0.5,
0
],
[
"Seychelles ",
2007,
1.3,
0
],
[
"Sierra Leone ",
2007,
0,
0
],
[
"Somalia ",
2007,
4.4,
7.4
],
[
"South Africa ",
2007,
0.6,
2.2
],
[
"Swaziland ",
2007,
1.5,
0
],
[
"Tanzania ",
2007,
2.1,
3.5
],
[
"Togo ",
2007,
2.8,
0
],
[
"Uganda ",
2007,
2.4,
0.4
],
[
"Zambia ",
2007,
2,
0
],
[
"Zimbabwe ",
2007,
2.8,
1.2
],
[
"Antigua and Barbuda ",
2009,
1.4,
0.6
],
[
"Argentina ",
2009,
2.5,
1
],
[
"Bahamas ",
2009,
2.3,
0
],
[
"Barbados ",
2009,
0.8,
0.4
],
[
"Belize ",
2009,
0.7,
0
],
[
"Bolivia ",
2009,
0.7,
0.4
],
[
"Brazil ",
2009,
1.1,
1.7
],
[
"Canada ",
2009,
1.3,
1.5
],
[
"Chile ",
2009,
1.3,
0.8
],
[
"Colombia ",
2009,
1.6,
3.3
],
[
"Costa Rica ",
2009,
2.1,
0.5
],
[
"Cuba ",
2009,
4,
1.3
],
[
"Dominica ",
2009,
1,
0.4
],
[
"Dominican Republic ",
2009,
1,
0
],
[
"Ecuador ",
2009,
0.7,
0
],
[
"El Salvador ",
2009,
1.4,
0
],
[
"Grenada ",
2009,
0.5,
0
],
[
"Guatemala ",
2009,
0.9,
1.2
],
[
"Guyana ",
2009,
0.5,
0
],
[
"Haiti ",
2009,
0.8,
0.4
],
[
"Honduras ",
2009,
2.7,
0.3
],
[
"Jamaica ",
2009,
1.3,
0
],
[
"Mexico ",
2009,
4.2,
5.1
],
[
"Nicaragua ",
2009,
1.1,
0.1
],
[
"Panama ",
2009,
1.1,
0
],
[
"Paraguay ",
2009,
1.2,
0
],
[
"Peru ",
2009,
1.5,
0
],
[
"St. Kitts and Nevis ",
2009,
0.5,
0.6
],
[
"St. Lucia ",
2009,
0.9,
0.8
],
[
"St. Vincent and the Grenadines ",
2009,
0.5,
0.4
],
[
"Suriname ",
2009,
0.7,
0
],
[
"Trinidad and Tobago ",
2009,
0.6,
0.9
],
[
"United States ",
2009,
1.6,
2
],
[
"Uruguay ",
2009,
0.3,
0
],
[
"Venezuela ",
2009,
2.5,
1.2
],
[
"Afghanistan ",
2009,
6.5,
8.6
],
[
"Armenia ",
2009,
4.2,
3.2
],
[
"Australia ",
2009,
0.7,
1.8
],
[
"Azerbaijan ",
2009,
5.1,
2.8
],
[
"Bangladesh ",
2009,
5.1,
8.3
],
[
"Bhutan ",
2009,
5,
0.5
],
[
"Brunei ",
2009,
5.4,
1.8
],
[
"Burma (Myanmar) ",
2009,
7.9,
4.9
],
[
"Cambodia ",
2009,
2.4,
0
],
[
"China ",
2009,
8.2,
3.3
],
[
"Cyprus ",
2009,
1.9,
1.1
],
[
"Federated States of Micronesia ",
2009,
0.2,
0
],
[
"Fiji ",
2009,
0.9,
1.2
],
[
"Hong Kong ",
2009,
2.4,
0
],
[
"India ",
2009,
5,
8.8
],
[
"Indonesia ",
2009,
7,
8.1
],
[
"Iran ",
2009,
8,
6.7
],
[
"Japan ",
2009,
0.2,
0.4
],
[
"Kazakhstan ",
2009,
5,
2
],
[
"Kiribati ",
2009,
0.8,
1
],
[
"Kyrgyzstan ",
2009,
5.6,
2.5
],
[
"Laos ",
2009,
7.1,
0.6
],
[
"Macau ",
2009,
1.9,
0
],
[
"Malaysia ",
2009,
8.1,
1.3
],
[
"Maldives ",
2009,
7.3,
1.9
],
[
"Marshall Islands ",
2009,
0.2,
0
],
[
"Mongolia ",
2009,
2.7,
2.9
],
[
"Nauru ",
2009,
0.7,
0
],
[
"Nepal ",
2009,
3.5,
5.3
],
[
"New Zealand ",
2009,
0.4,
0.3
],
[
"Pakistan ",
2009,
7,
9.8
],
[
"Palau ",
2009,
0.6,
0.4
],
[
"Papua New Guinea ",
2009,
1.4,
1.5
],
[
"Philippines ",
2009,
0.8,
3
],
[
"Samoa ",
2009,
0.9,
0.3
],
[
"Singapore ",
2009,
4,
0.2
],
[
"Solomon Islands ",
2009,
0.6,
0.3
],
[
"South Korea ",
2009,
1.2,
0
],
[
"Sri Lanka ",
2009,
3.7,
6.2
],
[
"Taiwan ",
2009,
0.8,
0
],
[
"Tajikistan ",
2009,
7,
2.2
],
[
"Thailand ",
2009,
3.4,
4.6
],
[
"Timor-Leste ",
2009,
0,
4.2
],
[
"Tonga ",
2009,
1.6,
0
],
[
"Turkey ",
2009,
6.4,
4.4
],
[
"Turkmenistan ",
2009,
6.7,
1.2
],
[
"Tuvalu ",
2009,
2.9,
2.8
],
[
"Uzbekistan ",
2009,
8.2,
1.7
],
[
"Vanuatu ",
2009,
1,
0
],
[
"Vietnam ",
2009,
6.3,
5
],
[
"Albania ",
2009,
0.7,
0
],
[
"Andorra ",
2009,
0.7,
0
],
[
"Austria ",
2009,
3.3,
1.7
],
[
"Belarus ",
2009,
6.4,
3.1
],
[
"Belgium ",
2009,
3.5,
1.7
],
[
"Bosnia-Herzegovina ",
2009,
1.6,
3.1
],
[
"Bulgaria ",
2009,
4,
4
],
[
"Croatia ",
2009,
1.8,
1.3
],
[
"Czech Republic ",
2009,
2.2,
1.3
],
[
"Denmark ",
2009,
3,
4.6
],
[
"Estonia ",
2009,
0.9,
0.3
],
[
"Finland ",
2009,
1.3,
0.4
],
[
"France ",
2009,
5.3,
3
],
[
"Georgia ",
2009,
2.4,
2.6
],
[
"Germany ",
2009,
3,
3.7
],
[
"Greece ",
2009,
4,
3.1
],
[
"Hungary ",
2009,
0.6,
1.2
],
[
"Iceland ",
2009,
1.8,
1.3
],
[
"Ireland ",
2009,
0.6,
0.6
],
[
"Italy ",
2009,
2.1,
2.4
],
[
"Kosovo ",
2009,
1.5,
2.8
],
[
"Latvia ",
2009,
2,
0.6
],
[
"Liechtenstein ",
2009,
1.3,
1.2
],
[
"Lithuania ",
2009,
2.5,
0.8
],
[
"Luxembourg ",
2009,
0.6,
0.4
],
[
"Malta ",
2009,
1.3,
0
],
[
"Moldova ",
2009,
4.7,
4.2
],
[
"Monaco ",
2009,
2.5,
0
],
[
"Montenegro ",
2009,
0.9,
1.5
],
[
"Netherlands ",
2009,
1,
1.2
],
[
"Norway ",
2009,
2,
1.5
],
[
"Poland ",
2009,
1.2,
2.3
],
[
"Portugal ",
2009,
0.1,
0.8
],
[
"Republic of Macedonia ",
2009,
1,
1.5
],
[
"Romania ",
2009,
3.9,
4.1
],
[
"Russia ",
2009,
6.7,
5.5
],
[
"San Marino ",
2009,
0.2,
0
],
[
"Serbia ",
2009,
4.7,
3.1
],
[
"Slovakia ",
2009,
3,
3.3
],
[
"Slovenia ",
2009,
0.9,
0.9
],
[
"Spain ",
2009,
1.1,
1.9
],
[
"Sweden ",
2009,
1.7,
3
],
[
"Switzerland ",
2009,
1.5,
2.5
],
[
"Ukraine ",
2009,
2.1,
2.1
],
[
"United Kingdom ",
2009,
2.8,
3.8
],
[
"Algeria ",
2009,
7.3,
5.3
],
[
"Bahrain ",
2009,
4.7,
3.3
],
[
"Egypt ",
2009,
8.6,
7.4
],
[
"Iraq ",
2009,
5.4,
9.2
],
[
"Israel ",
2009,
4.3,
7.8
],
[
"Jordan ",
2009,
4.6,
5
],
[
"Kuwait ",
2009,
5.8,
0.4
],
[
"Lebanon ",
2009,
2.8,
2.6
],
[
"Libya ",
2009,
6.9,
1.5
],
[
"Morocco ",
2009,
5.2,
2.6
],
[
"Oman ",
2009,
4.7,
0
],
[
"Palestinian territories ",
2009,
3.5,
6.9
],
[
"Qatar ",
2009,
5,
0.4
],
[
"Saudi Arabia ",
2009,
7.9,
6.2
],
[
"Sudan ",
2009,
5.3,
5.6
],
[
"Syria ",
2009,
7.1,
5.1
],
[
"Tunisia ",
2009,
5.2,
1.5
],
[
"United Arab Emirates ",
2009,
4.2,
0.8
],
[
"Western Sahara ",
2009,
5.2,
2.6
],
[
"Yemen ",
2009,
6.4,
7.3
],
[
"Angola ",
2009,
2.4,
1.2
],
[
"Benin ",
2009,
0.3,
0
],
[
"Botswana ",
2009,
0.6,
0
],
[
"Burkina Faso ",
2009,
0.6,
1.3
],
[
"Burundi ",
2009,
0.2,
0.8
],
[
"Cameroon ",
2009,
1.3,
0.4
],
[
"Cape Verde ",
2009,
0.3,
0
],
[
"Central African Republic ",
2009,
4.6,
2.3
],
[
"Chad ",
2009,
3.8,
0.3
],
[
"Comoros ",
2009,
4.3,
4.9
],
[
"Democratic Republic of the Congo ",
2009,
0.7,
3.1
],
[
"Djibouti ",
2009,
1.6,
1.3
],
[
"Equatorial Guinea ",
2009,
1.7,
0
],
[
"Eritrea ",
2009,
7.9,
0.6
],
[
"Ethiopia ",
2009,
3.5,
5.1
],
[
"Gabon ",
2009,
1.8,
0.3
],
[
"Gambia ",
2009,
1.1,
0.3
],
[
"Ghana ",
2009,
0.8,
3.5
],
[
"Guinea ",
2009,
1.6,
0
],
[
"Guinea Bissau ",
2009,
0.3,
0
],
[
"Ivory Coast ",
2009,
2,
0.8
],
[
"Kenya ",
2009,
3.8,
3
],
[
"Lesotho ",
2009,
0.5,
0
],
[
"Liberia ",
2009,
0.5,
0.5
],
[
"Madagascar ",
2009,
2.4,
0.1
],
[
"Malawi ",
2009,
1.2,
0
],
[
"Mali ",
2009,
0.6,
0.4
],
[
"Mauritania ",
2009,
6.2,
1
],
[
"Mauritius ",
2009,
1.4,
1.3
],
[
"Mozambique ",
2009,
1.2,
1
],
[
"Namibia ",
2009,
0.3,
0
],
[
"Niger ",
2009,
2.2,
0.6
],
[
"Nigeria ",
2009,
4.5,
8
],
[
"Republic of the Congo ",
2009,
0.7,
0.1
],
[
"Rwanda ",
2009,
2.8,
0
],
[
"Sao Tome and Principe ",
2009,
0.2,
0
],
[
"Senegal ",
2009,
1.3,
0.3
],
[
"Seychelles ",
2009,
1,
0
],
[
"Sierra Leone ",
2009,
0,
0.9
],
[
"Somalia ",
2009,
6.2,
8.5
],
[
"South Africa ",
2009,
1,
0.7
],
[
"Swaziland ",
2009,
1.6,
1.8
],
[
"Tanzania ",
2009,
2.4,
0.9
],
[
"Togo ",
2009,
1.2,
0
],
[
"Uganda ",
2009,
4.1,
0.3
],
[
"Zambia ",
2009,
1.7,
1.8
],
[
"Zimbabwe ",
2009,
3.7,
1.5
],
[
"Antigua and Barbuda ",
2010,
1.2,
0.4
],
[
"Argentina ",
2010,
1.6,
0.5
],
[
"Bahamas ",
2010,
3,
0
],
[
"Barbados ",
2010,
0.8,
0.4
],
[
"Belize ",
2010,
0.8,
0
],
[
"Bolivia ",
2010,
1,
0.3
],
[
"Brazil ",
2010,
1,
3.3
],
[
"Canada ",
2010,
1.6,
1.1
],
[
"Chile ",
2010,
1.3,
1.8
],
[
"Colombia ",
2010,
2.8,
3.1
],
[
"Costa Rica ",
2010,
2.6,
0.8
],
[
"Cuba ",
2010,
4.8,
0.9
],
[
"Dominica ",
2010,
0.9,
0.4
],
[
"Dominican Republic ",
2010,
0.6,
0
],
[
"Ecuador ",
2010,
0.8,
0
],
[
"El Salvador ",
2010,
1.4,
0
],
[
"Grenada ",
2010,
0.8,
0.8
],
[
"Guatemala ",
2010,
1,
0.3
],
[
"Guyana ",
2010,
2.1,
0
],
[
"Haiti ",
2010,
0.8,
2.2
],
[
"Honduras ",
2010,
1.6,
0.8
],
[
"Jamaica ",
2010,
1.4,
0.4
],
[
"Mexico ",
2010,
3.5,
3.6
],
[
"Nicaragua ",
2010,
3.1,
0.6
],
[
"Panama ",
2010,
1.1,
0
],
[
"Paraguay ",
2010,
1,
0.8
],
[
"Peru ",
2010,
2.6,
0
],
[
"St. Kitts and Nevis ",
2010,
0.7,
0.4
],
[
"St. Lucia ",
2010,
1.3,
0.4
],
[
"St. Vincent and the Grenadines ",
2010,
0.7,
0.4
],
[
"Suriname ",
2010,
0.9,
0.4
],
[
"Trinidad and Tobago ",
2010,
0.9,
1.2
],
[
"United States ",
2010,
2.7,
3.4
],
[
"Uruguay ",
2010,
0.3,
0
],
[
"Venezuela ",
2010,
3.5,
0.8
],
[
"Afghanistan ",
2010,
8,
7.7
],
[
"Armenia ",
2010,
4.7,
4.3
],
[
"Australia ",
2010,
1.7,
2.1
],
[
"Azerbaijan ",
2010,
6.9,
2.2
],
[
"Bangladesh ",
2010,
5.6,
8.2
],
[
"Bhutan ",
2010,
3.6,
0
],
[
"Brunei ",
2010,
6.5,
3.1
],
[
"Burma (Myanmar) ",
2010,
7.3,
5.8
],
[
"Cambodia ",
2010,
2.7,
2.2
],
[
"China ",
2010,
7.5,
2
],
[
"Cyprus ",
2010,
2.8,
3.8
],
[
"Federated States of Micronesia ",
2010,
0.2,
0
],
[
"Fiji ",
2010,
3,
0.9
],
[
"Hong Kong ",
2010,
1.2,
0
],
[
"India ",
2010,
5.3,
9
],
[
"Indonesia ",
2010,
8.6,
7.2
],
[
"Iran ",
2010,
7.9,
5
],
[
"Japan ",
2010,
2,
3.1
],
[
"Kazakhstan ",
2010,
5.7,
1.2
],
[
"Kiribati ",
2010,
0.5,
0.8
],
[
"Kyrgyzstan ",
2010,
6.5,
5.1
],
[
"Laos ",
2010,
5.7,
3.1
],
[
"Macau ",
2010,
0.6,
0
],
[
"Malaysia ",
2010,
6.4,
2.2
],
[
"Maldives ",
2010,
8.6,
2.7
],
[
"Marshall Islands ",
2010,
0.2,
0.8
],
[
"Mongolia ",
2010,
3.4,
1.2
],
[
"Nauru ",
2010,
1.1,
0
],
[
"Nepal ",
2010,
3.3,
5.6
],
[
"New Zealand ",
2010,
0.6,
2.7
],
[
"Pakistan ",
2010,
6.3,
9
],
[
"Palau ",
2010,
0.2,
0
],
[
"Papua New Guinea ",
2010,
1.1,
3.5
],
[
"Philippines ",
2010,
1.2,
3.9
],
[
"Samoa ",
2010,
1.4,
0
],
[
"Singapore ",
2010,
5,
0.2
],
[
"Solomon Islands ",
2010,
0.9,
0
],
[
"South Korea ",
2010,
1.9,
0
],
[
"Sri Lanka ",
2010,
6,
8.3
],
[
"Taiwan ",
2010,
1,
0
],
[
"Tajikistan ",
2010,
6.5,
2.3
],
[
"Thailand ",
2010,
3.5,
5.5
],
[
"Timor-Leste ",
2010,
0.5,
2.9
],
[
"Tonga ",
2010,
1.8,
0
],
[
"Turkey ",
2010,
5.8,
4.9
],
[
"Turkmenistan ",
2010,
5.6,
1.2
],
[
"Tuvalu ",
2010,
2.9,
1.7
],
[
"Uzbekistan ",
2010,
7.9,
2.2
],
[
"Vanuatu ",
2010,
1,
0.4
],
[
"Vietnam ",
2010,
7,
4
],
[
"Albania ",
2010,
1.3,
0.2
],
[
"Andorra ",
2010,
0.7,
0
],
[
"Austria ",
2010,
3.2,
1.9
],
[
"Belarus ",
2010,
6.8,
1.4
],
[
"Belgium ",
2010,
3.7,
2.2
],
[
"Bosnia-Herzegovina ",
2010,
2,
2.6
],
[
"Bulgaria ",
2010,
3.3,
2.2
],
[
"Croatia ",
2010,
2.7,
2.3
],
[
"Czech Republic ",
2010,
1.1,
1.3
],
[
"Denmark ",
2010,
3.4,
2.1
],
[
"Estonia ",
2010,
1.4,
0.8
],
[
"Finland ",
2010,
1.1,
0
],
[
"France ",
2010,
4.1,
5.1
],
[
"Georgia ",
2010,
2.9,
4.1
],
[
"Germany ",
2010,
4,
5.3
],
[
"Greece ",
2010,
5.5,
4.1
],
[
"Hungary ",
2010,
0.8,
2.8
],
[
"Iceland ",
2010,
2.8,
1.2
],
[
"Ireland ",
2010,
0.6,
0
],
[
"Italy ",
2010,
2.6,
3.3
],
[
"Kosovo ",
2010,
1.7,
3.7
],
[
"Latvia ",
2010,
2,
0.8
],
[
"Liechtenstein ",
2010,
1.4,
1.2
],
[
"Lithuania ",
2010,
1.6,
1.5
],
[
"Luxembourg ",
2010,
1.2,
0
],
[
"Malta ",
2010,
1.7,
0
],
[
"Moldova ",
2010,
4.8,
2.9
],
[
"Monaco ",
2010,
1.8,
0
],
[
"Montenegro ",
2010,
0.4,
2.4
],
[
"Netherlands ",
2010,
1.3,
1.6
],
[
"Norway ",
2010,
2.4,
1.7
],
[
"Poland ",
2010,
1.7,
1.7
],
[
"Portugal ",
2010,
0.8,
0.3
],
[
"Republic of Macedonia ",
2010,
2.8,
3
],
[
"Romania ",
2010,
4.4,
4
],
[
"Russia ",
2010,
7.2,
7.3
],
[
"San Marino ",
2010,
0.1,
0
],
[
"Serbia ",
2010,
3.5,
4.7
],
[
"Slovakia ",
2010,
2.6,
1
],
[
"Slovenia ",
2010,
1,
0.4
],
[
"Spain ",
2010,
2.7,
2.1
],
[
"Sweden ",
2010,
1.5,
2.4
],
[
"Switzerland ",
2010,
2.3,
2.9
],
[
"Ukraine ",
2010,
4,
4
],
[
"United Kingdom ",
2010,
4.3,
6.2
],
[
"Algeria ",
2010,
6.9,
5.4
],
[
"Bahrain ",
2010,
4.2,
3.7
],
[
"Egypt ",
2010,
8.7,
7.6
],
[
"Iraq ",
2010,
4.6,
8.3
],
[
"Israel ",
2010,
6.1,
7.9
],
[
"Jordan ",
2010,
6.5,
5.1
],
[
"Kuwait ",
2010,
4.7,
1.7
],
[
"Lebanon ",
2010,
3.7,
4.9
],
[
"Libya ",
2010,
5.8,
0.2
],
[
"Morocco ",
2010,
6.2,
1.2
],
[
"Oman ",
2010,
5.3,
0.6
],
[
"Palestinian territories ",
2010,
3.5,
7.7
],
[
"Qatar ",
2010,
5.6,
0.4
],
[
"Saudi Arabia ",
2010,
8.6,
7.2
],
[
"Sudan ",
2010,
5.4,
5
],
[
"Syria ",
2010,
7.3,
3.3
],
[
"Tunisia ",
2010,
7.7,
1
],
[
"United Arab Emirates ",
2010,
4.3,
0.8
],
[
"Western Sahara ",
2010,
5.9,
0
],
[
"Yemen ",
2010,
7,
7.8
],
[
"Angola ",
2010,
3.7,
2.3
],
[
"Benin ",
2010,
0.5,
0.4
],
[
"Botswana ",
2010,
1.2,
0
],
[
"Burkina Faso ",
2010,
1.3,
1.5
],
[
"Burundi ",
2010,
0.7,
2.2
],
[
"Cameroon ",
2010,
0.6,
0.4
],
[
"Cape Verde ",
2010,
0.3,
0
],
[
"Central African Republic ",
2010,
4.4,
4.5
],
[
"Chad ",
2010,
6,
2.6
],
[
"Comoros ",
2010,
3.6,
1
],
[
"Democratic Republic of the Congo ",
2010,
2.8,
2.9
],
[
"Djibouti ",
2010,
2.1,
0.5
],
[
"Equatorial Guinea ",
2010,
2.4,
0
],
[
"Eritrea ",
2010,
7.7,
0.6
],
[
"Ethiopia ",
2010,
4.3,
4.9
],
[
"Gabon ",
2010,
1.9,
0.8
],
[
"Gambia ",
2010,
1.8,
0.5
],
[
"Ghana ",
2010,
1,
2.6
],
[
"Guinea ",
2010,
2.9,
1.9
],
[
"Guinea Bissau ",
2010,
0.3,
1.2
],
[
"Ivory Coast ",
2010,
2,
3.7
],
[
"Kenya ",
2010,
4.7,
6.7
],
[
"Lesotho ",
2010,
0.6,
0.4
],
[
"Liberia ",
2010,
1.7,
1.8
],
[
"Madagascar ",
2010,
3.3,
0.4
],
[
"Malawi ",
2010,
1,
0.4
],
[
"Mali ",
2010,
0.9,
1.3
],
[
"Mauritania ",
2010,
6.2,
1.5
],
[
"Mauritius ",
2010,
1.2,
2.8
],
[
"Mozambique ",
2010,
1.1,
0.3
],
[
"Namibia ",
2010,
0.7,
0
],
[
"Niger ",
2010,
3,
1
],
[
"Nigeria ",
2010,
5.8,
7.8
],
[
"Republic of the Congo ",
2010,
1,
0
],
[
"Rwanda ",
2010,
3.8,
0.8
],
[
"Sao Tome and Principe ",
2010,
0.2,
0
],
[
"Senegal ",
2010,
1.6,
0.2
],
[
"Seychelles ",
2010,
1.3,
0
],
[
"Sierra Leone ",
2010,
0,
0.5
],
[
"Somalia ",
2010,
5.2,
8.1
],
[
"South Africa ",
2010,
0.7,
4.4
],
[
"Swaziland ",
2010,
2.4,
0.5
],
[
"Tanzania ",
2010,
3.9,
5.1
],
[
"Togo ",
2010,
1.4,
0.3
],
[
"Uganda ",
2010,
3.4,
5.8
],
[
"Zambia ",
2010,
2.5,
2.3
],
[
"Zimbabwe ",
2010,
4.4,
2.2
]
];
data.addColumn('string','country');
data.addColumn('number','year');
data.addColumn('number','GRI');
data.addColumn('number','SHI');
data.addRows(datajson);
return(data);
}
// jsDrawChart
function drawChartMotionChartID2d1067bcd01() {
var data = gvisDataMotionChartID2d1067bcd01();
var options = {};
options["width"] = 600;
options["height"] = 500;
var chart = new google.visualization.MotionChart(
document.getElementById('MotionChartID2d1067bcd01')
);
chart.draw(data,options);
}
// jsDisplayChart
function displayChartMotionChartID2d1067bcd01()
{
google.load("visualization", "1", { packages:["motionchart"] });
google.setOnLoadCallback(drawChartMotionChartID2d1067bcd01);
}
// jsChart
displayChartMotionChartID2d1067bcd01()
// jsFooter
</script>
<!-- divChart -->
<br />
<div id="MotionChartID2d1067bcd01" style="height: 500px; width: 600px;">
</div>
I find this a better way to explore these data: select a country of interest and follow it over time.
<b>Twitter analysis of air pollution in Beijing</b> (2012-07-31)<br>
One of the air pollution monitors in Beijing (<a href="http://beijing.usembassy-china.org.cn/070109air.html">at the American Embassy</a>) is connected to Twitter and tweets about the air quality in real time. By default, the machine outputs a 24-hour summary of PM2.5 air pollution. PM2.5 is explained <a href="http://airnow.gov/index.cfm?action=aqibasics.particle">here</a>.
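The airnow.gov pages linked above explain how a PM2.5 concentration is turned into the Air Quality Index (AQI). A minimal sketch of that piecewise-linear conversion, assuming the US EPA 24-hour PM2.5 breakpoints in force before the EPA's December 2012 revision (they have been revised since, so check airnow.gov for current values). `pm25_to_aqi()` is a hypothetical helper, not part of the post's code.

```r
# Breakpoints: concentration range (ug/m3) -> AQI range (pre-Dec-2012 table).
pm25_bp <- data.frame(
  c_lo = c(0.0, 15.5, 40.5, 65.5, 150.5, 250.5, 350.5),
  c_hi = c(15.4, 40.4, 65.4, 150.4, 250.4, 350.4, 500.4),
  i_lo = c(0, 51, 101, 151, 201, 301, 401),
  i_hi = c(50, 100, 150, 200, 300, 400, 500)
)

pm25_to_aqi <- function(conc, bp = pm25_bp) {
  # Truncate to one decimal; the tiny epsilon guards against floating-point error.
  conc <- floor(conc * 10 + 1e-8) / 10
  vapply(conc, function(cc) {
    k <- which(cc >= bp$c_lo & cc <= bp$c_hi)
    if (length(k) == 0) return(NA_real_)   # outside the table ("Beyond Index")
    # Linear interpolation within the matching breakpoint interval.
    round((bp$i_hi[k] - bp$i_lo[k]) / (bp$c_hi[k] - bp$c_lo[k]) *
            (cc - bp$c_lo[k]) + bp$i_lo[k])
  }, numeric(1))
}

pm25_to_aqi(c(10, 62, 300, 600))   # 32 143 350 NA
```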
<br>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDI7Rm1YGc-4IawFJc-3OhZZTFix7a9dqqecUnqxxNGZLthyphenhyphenCB4Z5bZr7TTyqTKMvn-K6zkKodcuOnDmMpai59AAkXEZWLJYNdHwvBO9KUKLhWenaTCGTnbFSQOjccXl_4Q1m8vxPeQ0OE/s1600/twitter_pol.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDI7Rm1YGc-4IawFJc-3OhZZTFix7a9dqqecUnqxxNGZLthyphenhyphenCB4Z5bZr7TTyqTKMvn-K6zkKodcuOnDmMpai59AAkXEZWLJYNdHwvBO9KUKLhWenaTCGTnbFSQOjccXl_4Q1m8vxPeQ0OE/s400/twitter_pol.png" width="400"></a></div>
The next step will be to compare pollution levels between cities such as LA and Beijing. But it turns out that air quality data for California are not so easy to obtain programmatically.<br>
<br>
Here is the code I used to produce this analysis:<br>
<a href="http://brainchronicle.blogspot.com/2012/07/twitter-analysis-of-air-pollution-in.html#more">Read more »</a>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-1861070732189366413.post-86646663887485414972012-06-09T18:03:00.000-07:002012-06-09T18:05:18.574-07:00Rcpp vs. R implementation of cosine similarityWhile speeding up some code for a project with a colleague the other day, I ended up trying Rcpp for the first time. Using bits and pieces of code scattered around the web, I re-implemented the cosine distance function with RcppArmadillo relatively easily. But compared with pure R, the speed-up was smaller than I expected.<br>
<script src="https://gist.github.com/2903031.js?file=Rcpp_cosine.r">
</script>
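For comparison, a vectorised pure-R cosine similarity might look like the sketch below. This is our own version, not the code from the gist, and cosine_sim() is a hypothetical name. It computes all column-pair similarities at once with crossprod(), which hands the matrix product to compiled BLAS code. That may be one reason why an Rcpp rewrite does not always gain much over well-vectorised R. The cosine distance is then simply 1 - cosine_sim(x).

```r
# Sketch (not the author's gist): cosine similarity between all pairs of
# columns of a numeric matrix, vectorised in base R.
cosine_sim <- function(x) {
  x <- as.matrix(x)
  cp <- crossprod(x)           # t(x) %*% x: dot products between columns
  norms <- sqrt(diag(cp))      # Euclidean norm of each column
  cp / outer(norms, norms)     # divide each dot product by ||a|| * ||b||
}

# Small example matrix: three column vectors a, b and c
m <- cbind(a = c(1, 0, 0), b = c(1, 1, 0), c = c(0, 0, 1))
s <- cosine_sim(m)
round(s, 3)
```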
And here is the speed comparison...
<br>
<a href="http://brainchronicle.blogspot.com/2012/06/rcpp-vs-r-implementation-of-cosine.html#more">Read more »</a>Unknownnoreply@blogger.com4tag:blogger.com,1999:blog-1861070732189366413.post-81124851627544033112012-06-08T16:20:00.002-07:002012-06-08T16:20:54.828-07:00A new approach to discover pain related genesOur latest paper in PLoS Computational Biology is out.<br />
<span class="Apple-style-span" style="font-family: Helvetica;">The project spanned more than 2 years, from the end of my first year of postdoctoral training until now. It has been a truly collaborative endeavor, not only across institutions but also across sub-disciplines: text mining, leveraging public genomic data across diseases, and genotyping a human twin cohort subjected to experimental pain. A big thank you to all my collaborators.</span><br />
<span class="Apple-style-span" style="font-family: Helvetica;"><span class="Apple-style-span" style="font-family: Arial;"><a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002538" target="_blank">http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002538</a></span></span><br />
<span class="Apple-style-span" style="font-family: Helvetica;"><br /></span><br />
<span class="Apple-style-span" style="font-family: Helvetica;"><span class="Apple-style-span" style="font-family: Arial;">Briefly, we demonstrated that ranking diseases by pain level with a literature co-citation approach, and then extracting the genes whose expression changes are associated with this ranking, leads to interesting new pain gene candidates.</span></span><br />
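The ranking step can be illustrated with a toy R sketch. This is not the analysis from the paper. The helper rank_genes_by_pain() is hypothetical, the Spearman correlation is our own choice of association measure, and the numbers are invented. Each gene is scored by how closely its expression changes across diseases follow a per-disease pain ranking, such as one derived from co-citation. The input is a genes x diseases matrix, e.g. of log fold changes.

```r
# Toy sketch, not the method of the paper.
# fc: genes x diseases matrix of expression changes (e.g. log fold changes)
# pain_rank: one pain score or rank per disease, same order as the columns of fc
rank_genes_by_pain <- function(fc, pain_rank) {
  stopifnot(ncol(fc) == length(pain_rank))
  # Spearman correlation of each gene's changes with the disease pain ranking
  rho <- apply(fc, 1, function(g) cor(g, pain_rank, method = "spearman"))
  sort(rho, decreasing = TRUE)   # most positively associated genes first
}

# Invented numbers: 3 genes in 4 diseases ranked 1 (least) to 4 (most painful)
fc <- rbind(g1 = c(0.1, 0.5, 1.2, 2.0),
            g2 = c(1.0, 0.4, 0.2, -0.3),
            g3 = c(0.3, -0.2, 0.4, 0.1))
pain_rank <- 1:4
r <- rank_genes_by_pain(fc, pain_rank)
r
```

In a real analysis you would also want a significance measure and a multiple-testing correction across genes. The correlation alone only orders the candidates.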
The beauty of the approach is that it can be applied to concepts other than pain. For example, we show in the paper that we can significantly prioritize genes involved in inflammation in a similar fashion.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-1861070732189366413.post-54594227444477983702012-06-03T22:11:00.000-07:002012-06-12T12:45:22.944-07:00Obtaining a protein-protein interaction network for a gene list in RBuilding a network of interactions among a set of genes can help a great deal in understanding the relationships between the seemingly disparate elements of your list. Building such a network can seem challenging at first, but it is less complicated than it looks. Here is the approach I use.<br />
<br />
Resources for interaction information are numerous. Logically, we would turn to a central repository if one existed. Unfortunately, for protein-protein interactions (PPI) there are several (<a href="http://www.ebi.ac.uk/intact/" target="_blank">IntAct</a>, <a href="http://thebiogrid.org/" target="_blank">BioGRID</a>, <a href="http://www.hprd.org/" target="_blank">HPRD</a>, <a href="http://string-db.org/" target="_blank">STRING</a>...).<br />
Using the API of each of these repositories would take time, which we usually don't have. Fortunately, the NCBI Entrez Gene web page of each gene compiles interactions from BioGRID and HPRD, which seems like a reasonable and robust compromise. On the R side, we can use the XML package to parse that web page.
<br />
First, we need a gene list. Here I refer you to an <a href="http://brainchronicle.blogspot.com/2012/05/another-look-at-over-representation.html" target="_blank">earlier post</a> where we extracted a list of 274 significantly differentially regulated genes.<br />
Using the following little function, you can scrape the interaction table from the NCBI web page.<br />
<span class="Apple-style-span" style="color: red;"><b>[update: corrected bug where some genes returned an error]</b></span><br />
<script src="https://gist.github.com/2866212.js?file=get.ppiNCBI.r">
</script>
Here is a quick example with the first 20 genes from my list. You obtain your edge list in the form of a data.frame.
<script src="https://gist.github.com/2866218.js?file=ppi.r">
</script>
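Before exporting, it can help to collapse pairs that appear in both directions (A-B and B-A). This is a minimal sketch that assumes the first two columns of the edge list hold the interacting genes. The toy data.frame, the helper unique_undirected() and the output path are all hypothetical stand-ins, not the output of the gist. The result is written as a tab-delimited table, a plain format that Cytoscape can import as a network.

```r
# 'ppi' is a stand-in for the data.frame returned by the gist above; we assume
# (hypothetically) that its first two columns hold the interacting gene symbols.
ppi <- data.frame(geneA = c("TP53", "MDM2", "TP53", "EGFR"),
                  geneB = c("MDM2", "TP53", "EP300", "GRB2"),
                  stringsAsFactors = FALSE)

# Keep one row per unordered pair, so that A-B and B-A count once
unique_undirected <- function(edges) {
  a <- as.character(edges[[1]])
  b <- as.character(edges[[2]])
  key <- paste(pmin(a, b), pmax(a, b), sep = "|")
  edges[!duplicated(key), , drop = FALSE]
}

edges <- unique_undirected(ppi)

# Tab-delimited, no quotes or row names: a plain table Cytoscape can import
out <- file.path(tempdir(), "ppi_edges.txt")   # hypothetical output path
write.table(edges, out, sep = "\t", quote = FALSE, row.names = FALSE)
```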
The <a href="http://cran.r-project.org/web/packages/NCBI2R/index.html" target="_blank">NCBI2R package</a> provides a similar function but there is a bug in GetInteractions().<br />
<br />
You can write this data.frame to a text file and import it directly into Cytoscape, but you can also display and work with your network directly in R using the igraph package.
<script src="https://gist.github.com/2866386.js?file=ppiNetwork.r">
</script><br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwN5KAFqRXIfcnarqucmchUbPdcQmPFl1Um2JPAodn8XRUBnub5476ILtiIBT_Wslz0_izXBldLYYnPjwOk5MAsKJJnmaj_WDK4XXxYDeqcSuT-C8tPTPmXryxzuyMdNWJoEzVr424jUE6/s1600/network.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwN5KAFqRXIfcnarqucmchUbPdcQmPFl1Um2JPAodn8XRUBnub5476ILtiIBT_Wslz0_izXBldLYYnPjwOk5MAsKJJnmaj_WDK4XXxYDeqcSuT-C8tPTPmXryxzuyMdNWJoEzVr424jUE6/s400/network.png" width="400" /></a><br />
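A quick way to quantify how fragmented such a network is: count its connected components with igraph. This sketch uses a small invented edge list rather than the real output. It also uses the current igraph function names (graph_from_data_frame(), components()); igraph releases from the time of this post used graph.data.frame() and clusters() instead.

```r
library(igraph)

# Small invented edge list (gene pairs), standing in for the real output
edges <- data.frame(from = c("TP53", "TP53", "EGFR", "MDM2"),
                    to   = c("MDM2", "EP300", "GRB2", "TP53"))

g <- graph_from_data_frame(edges, directed = FALSE)
g <- simplify(g)        # merge duplicate edges (here TP53-MDM2) and drop self-loops
comp <- components(g)   # connected components
comp$no                 # how many separate pieces the network has
comp$csize              # size of each piece
plot(g, vertex.label.cex = 0.8)
```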
The network is simple and not fully connected, but keep in mind that we obtained interactions for only 5 of the 20 genes here.Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-1861070732189366413.post-56983155743929369782012-05-20T23:41:00.000-07:002012-05-20T23:41:11.972-07:00Another look at over-representation analysis interpretationInterpreting a list of differentially regulated genes can take many forms. One of the most widely used methods is to look for enrichment of functional groups of genes compared to a random sampling of genes from the same universe, namely an over-representation analysis (ORA).<br>
<br>
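Under the hood, a classical ORA tests each category against a hypergeometric distribution. Given a universe of N genes, K of which are annotated to the category, how surprising is it to find k or more of them in a list of n genes? Below is a minimal base R sketch. The helper ora_pvalue() is hypothetical, and the category counts and the 20000-gene universe are invented; only the 274-gene list size comes from this post. GOstats's hyperGTest() runs this kind of test for you, with an optional conditional variant that accounts for the GO hierarchy.

```r
# Hypergeometric test behind a GO over-representation analysis (ORA).
# Arguments can be vectors, one entry per GO category.
ora_pvalue <- function(k, K, n, N) {
  # k: genes of the list annotated to the category
  # K: genes of the universe annotated to the category
  # n: size of the gene list (restricted to the universe)
  # N: size of the gene universe
  # P(X >= k) for X ~ Hypergeometric(K annotated, N - K not annotated, n drawn)
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Invented counts for three categories; n = 274 as in this post,
# N = 20000 is a hypothetical universe size
k <- c(20, 2, 14)
K <- c(400, 150, 1000)
p <- ora_pvalue(k, K, n = 274, N = 20000)
expected <- 274 * K / 20000           # overlap expected by chance
padj <- p.adjust(p, method = "BH")    # Benjamini-Hochberg correction
cbind(k, expected, p, padj)
```

Here the first category is expected to overlap the list by about 274 * 400 / 20000 = 5.5 genes but contains 20, so its p-value is very small. The other two categories sit close to their expected overlaps.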
The point I want to explore today is <b>what is the best way to interpret the results of an ORA?</b><br>
The list of GO categories one obtains often tells a complex story and leaves us with the uneasy feeling that we are cherry-picking the categories that best fit our hypothesis.<br>
<br>
Let's have a look at an example. First, I extract a gene list from a publicly available experiment in Gene Expression Omnibus. I use GEOquery for that and obtain a list of 274 up- and down-regulated genes (code at the end).<br>
<br>
From this gene list we can perform a GO ORA fairly easily using the GOstats package. I combined all the necessary steps in two functions (GO_over.r and write.GOhyper.r) that you can find on <a href="https://github.com/bobthecat/codebox" target="_blank">my GitHub repo</a>. I usually download the functions directly from my R session using this function:<br>
<blockquote class="tr_bq">
<a href="https://github.com/bobthecat/codebox/blob/master/source_https.r" target="_blank">https://github.com/bobthecat/codebox/blob/master/source_https.r</a> (copy and paste it into your R session or save it to a file called source_https.r)</blockquote>
<script src="https://gist.github.com/2708387.js?file=GO_over.r">
</script>
Here we are presented with a table of <b>59 GO categories that are all significant</b> after multiple hypothesis testing correction, such as cell adhesion, generation of neurons, cellular response to interferon-beta, and so on.<br>
<br>
<b>How to interpret this list?</b><br>
One way to do that is to display the Directed Acyclic Graph (DAG) of the over-represented GO categories in the list. But in my opinion, it is difficult to get the big picture from such a representation. We know that GO categories (and, to a lesser extent, pathways) share genes. My hypothesis is that visualizing the relationships between GO categories based on the number of genes they share will help interpret the results. So, in addition, I visualize the number of genes shared between GO categories by plotting the ORA results as a heatmap (code below the plot).
<br>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNjKe17R32fQ3bc5DBSNjJOH9O4vA_VVmVuVTv0nA5dSsGZdpCTESkyjvu3mmvOx-W8Kk4gdI0ewIHO9TgEGvuGwCCHPRQwPZRN6OqMq-bAfpfzsQ63-gQWu4ZJKlIFnfRvxh2hp1QUaBX/s1600/GO_heatmap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNjKe17R32fQ3bc5DBSNjJOH9O4vA_VVmVuVTv0nA5dSsGZdpCTESkyjvu3mmvOx-W8Kk4gdI0ewIHO9TgEGvuGwCCHPRQwPZRN6OqMq-bAfpfzsQ63-gQWu4ZJKlIFnfRvxh2hp1QUaBX/s400/GO_heatmap.png" width="400"></a></div>
Rows and columns are GO categories. The color of each square represents the percentage of genes shared between two categories. Here we see that our gene list (274 genes) seems to preferentially contain genes from three clusters of GO categories, shown in yellow along the diagonal. Based on this observation, the main events going on in these cells seem to be linked to regulation of metabolism, cytoskeleton reorganization and neuron development, which makes sense considering that we compared iPS cells to neurosphere cells.<br>
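Here is a minimal sketch of how such an overlap matrix could be computed and drawn in base R. It is not the code behind the figure. The input (a named list of gene vectors, one per significant GO category) is hypothetical toy data. We also assume the percentage is taken relative to the smaller of the two categories; a Jaccard index (shared / union) would be a reasonable alternative. By default, heatmap() reorders rows and columns by hierarchical clustering, which is what groups related categories into blocks along the diagonal.

```r
# Hypothetical input: one character vector of genes per significant GO category
go_genes <- list(
  cat1 = c("A", "B", "C", "D"),
  cat2 = c("C", "D", "E"),
  cat3 = c("F", "G")
)

# Percentage of genes shared between every pair of categories
# (assumption: relative to the smaller category of the pair)
shared_pct <- function(sets) {
  n <- length(sets)
  m <- matrix(0, n, n, dimnames = list(names(sets), names(sets)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      shared <- length(intersect(sets[[i]], sets[[j]]))
      m[i, j] <- 100 * shared / min(length(sets[[i]]), length(sets[[j]]))
    }
  }
  m
}

pct <- shared_pct(go_genes)
# heatmap() clusters rows and columns, so overlapping categories end up adjacent
heatmap(pct, scale = "none", symm = TRUE, margins = c(8, 8))
```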
<br>
I welcome comments about this approach (in fact, that is the purpose of this post). I would like to argue that such a representation of a GO ORA is complementary to displaying a flat text table and plotting the DAG. Has anybody already used this approach to interpret a GO ORA, or does anyone have a better solution?<br>
I acknowledge that it is not a perfect solution. For example, a category that does not share many genes with others is not necessarily unworthy of investigation. It might even be the key to understanding the biological experiment, but there are many such categories... which one to pick? Besides, I think a GO ORA does not aim at fine-grained analysis but at a global overview of the events.<br>
<br>
Here is the code to produce the heatmap:
<br>
<a href="http://brainchronicle.blogspot.com/2012/05/another-look-at-over-representation.html#more">Read more »</a>Unknownnoreply@blogger.com4tag:blogger.com,1999:blog-1861070732189366413.post-7936775632151663682012-05-17T15:46:00.000-07:002012-05-17T15:46:03.386-07:00Installing EUtils perl moduleIn my recent post on <a href="http://brainchronicle.blogspot.com/2012/05/using-r-to-graph-subject-trend-in.html">Using R to graph a subject trend in PubMed</a> I used the EUtils Perl module. Detailed general instructions for installing Perl modules on all major operating systems are available <a href="http://www.rcbowen.com/imho/perl/modules.html">here</a>. On my Mac, I did the following.<br />
I downloaded the archive from <a href="http://bioinformatics.tgen.org/brunit/downloads/tgen-eutils/">http://bioinformatics.tgen.org/brunit/downloads/tgen-eutils/</a> and ran these commands:
<blockquote>
tar -xzf TGen-EUtils-0.13.tar.gz<br />
cd TGen-EUtils-0.13<br />
## instructions are in the INSTALL file<br />
perl Makefile.PL<br />
make<br />
make test<br />
sudo make install<br />
</blockquote>
And that's it, folks.Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-1861070732189366413.post-90878829798878146752012-05-15T16:13:00.001-07:002012-05-21T13:04:32.662-07:00Using R to graph a subject trend in PubMedThe traditional way to convince an audience that your topic is worth studying is to show the state of the field based on a literature review. This is especially true if your subject is obscure to all but a handful of scientists in the world.<br>
I have faced this problem more than once, and the last time I decided to plot the state of the field using a few scripts.<br>
I wrote three scripts for that: pubmed_trend.r takes your PubMed query and sends it to the NCBI using the EUtils tools (a Perl script), and then the results are plotted. The details of the scripts are below, but here is how to create your trend.
In this example, we plot the number of publications per year for papers annotated with the MeSH terms "sex characteristics" and "pain", and compare it to the number of publications per year for "sex characteristics" and "Analgesics". We run this search from 1970 to 2011.
<script src="https://gist.github.com/2705715.js?file=pubmed_pain.r">
</script>
And here is the plot.
<br>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5bBw6b5KahS7YZKgG9UYtZD9kJ3Qqrd0djmGfAuwKq3hYxTRDabYbFcobzL3dES8mAPbucutiecbtiUkK39O_Bc2FPPi55wwnsNhNc9TpsO7wHDyCcwwxI10__XorVPzGvNaez4fmESq8/s1600/sex_pain.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5bBw6b5KahS7YZKgG9UYtZD9kJ3Qqrd0djmGfAuwKq3hYxTRDabYbFcobzL3dES8mAPbucutiecbtiUkK39O_Bc2FPPi55wwnsNhNc9TpsO7wHDyCcwwxI10__XorVPzGvNaez4fmESq8/s400/sex_pain.jpg" width="400"></a></div>
What we see here is that the number of publications per year on sex differences and pain or analgesics is growing, but it is still small, and more research is needed.<br>
...and you are good to go, your talk is <a href="http://media.tumblr.com/tumblr_lt77phjQ9f1qdlkgg.gif" target="_blank">launched</a><br>
<br>
Here are the details of the scripts and functions. pubmed_trend.r takes a PubMed query string as you would type it into the search box of the web interface (spaces have to be replaced by '+').<br>
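If you would rather skip the Perl dependency, the same yearly counts can be requested from NCBI E-utilities (esearch) directly from R. This is a sketch, not the pubmed_trend.r script, and the helper names are hypothetical. Instead of replacing spaces by '+', URLencode(reserved = TRUE) percent-encodes spaces, quotes and brackets, which the server decodes in the same way. The network call is wrapped in a function and not run here.

```r
# Sketch only: queries NCBI E-utilities (esearch) directly from R instead of
# going through the Perl EUtils module used in this post.
pubmed_count_url <- function(query, year) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  term <- URLencode(query, reserved = TRUE)   # encodes spaces, quotes, brackets
  paste0(base, "?db=pubmed&rettype=count&datetype=pdat",
         "&mindate=", year, "&maxdate=", year, "&term=", term)
}

# Extract the first <Count> value from an esearch XML reply
parse_count <- function(xml) {
  hit <- regmatches(xml, regexpr("<Count>[0-9]+</Count>", xml))
  as.integer(gsub("[^0-9]", "", hit))
}

# One request per year; needs network access, so it is defined but not run here
pubmed_trend_counts <- function(query, years) {
  vapply(years, function(y) {
    Sys.sleep(0.4)  # stay under NCBI's ~3 requests/second limit without an API key
    xml <- paste(readLines(pubmed_count_url(query, y), warn = FALSE), collapse = "")
    parse_count(xml)
  }, integer(1))
}

u <- pubmed_count_url('"sex characteristics"[MeSH] AND pain[MeSH]', 1990)
u
```

Calling pubmed_trend_counts(query, 1970:2011) would then return one count per year, ready to plot.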
<a href="http://brainchronicle.blogspot.com/2012/05/using-r-to-graph-subject-trend-in.html#more">Read more »</a>Unknownnoreply@blogger.com13tag:blogger.com,1999:blog-1861070732189366413.post-13146004647629993092012-05-08T10:57:00.000-07:002012-05-16T00:04:39.430-07:00Hello worldThe Brain Chronicle blog is an attempt to share with the R and scientific community at large some methods, recipes and other thoughts that emerge from my day-to-day work as a computational biology researcher.Unknownnoreply@blogger.com0