Tuesday, July 31, 2012

Twitter analysis of air pollution in Beijing

One of the air pollution monitors in Beijing (at the American Embassy) is connected to Twitter and tweets about the air quality in real time. By default, the monitor in Beijing outputs a 24hr summary of PM2.5 air pollution. PM2.5 refers to fine particles with a diameter of 2.5 micrometers or less.
The next step will be to compare pollution levels between different cities such as LA and Beijing. But it turns out that air quality data for California is not so easy to get programmatically.
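
To give an idea of what the script below has to pull apart, here is a minimal sketch of the parsing step on its own, in base R only. The sample tweets are made up for illustration; they only assume the "start to end; PM2.5 24hr avg; value; field; field" layout that the regular expressions in the script imply, and the real feed may well differ.

```r
# Hypothetical tweet texts, invented for this example (not real readings).
tweets <- c(
  "07-30-2012 00:00 to 07-30-2012 23:59; PM2.5 24hr avg; 65.0; 156; Unhealthy",
  "07-29-2012 00:00 to 07-29-2012 23:59; PM2.5 24hr avg; No Data",
  "07-31-2012 14:00; PM2.5; 40.0; 112; Unhealthy for Sensitive Groups"
)

# keep only the 24hr summaries, then drop the ones without a value
daily <- grep("PM2.5 24hr avg;", tweets, value = TRUE, fixed = TRUE)
daily <- daily[!grepl("No Data", daily, fixed = TRUE)]

# everything before " to " is the start of the averaging window
start <- sub("^(.*) to .*", "\\1", daily)
start <- as.POSIXct(strptime(start, format = "%m-%d-%Y %H:%M", tz = "UTC"))

# the first field after the "24hr avg;" label is the value
pm <- as.numeric(sub("^.* 24hr avg; (.*); .*; .*", "\\1", daily))

parsed <- data.frame(time = start, PM = pm)
print(parsed)
```

Only the first sample survives the two filters, so parsed ends up with a single row holding the start of the averaging window and the 24hr value.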

Here is the code I used to produce this analysis:
library(twitteR)
library(ggplot2)
library(grid)
# download all that you can
pol <- userTimeline('BeijingAir', n=3200)
length(pol)
# 3200
# keep only the tweets that carry the 24hr average
myGrep <- function(x) {
  grep("PM2.5 24hr avg;", x$getText(), value=TRUE)
}
POL <- unlist(lapply(pol, myGrep))
# cleaning no data tweets
# (grepl rather than -grep: a negative index empties POL when nothing matches)
POL <- POL[!grepl("No Data", POL)]
# uncomment the following to combine with previous extract
# allPM <- unique(c(allPM, POL))
allPM <- POL
time <- sub("^(.*) to .*", "\\1", allPM)
# to posix time
time <- strptime(time, format="%m-%d-%Y %R")
# first field after the "24hr avg;" label, as a number
PM <- as.numeric(sub("^.* 24hr avg; (.*); .*; .*", "\\1", allPM, perl=TRUE))
data <- data.frame(PM=PM, time=time)
data <- data[order(data$time),]
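# y positions of the band labels, one per band between the threshold lines
yrange <- c(25, 75, 125, 175, 250, 400)
tsize <- 4
# x position of the band labels: the date of the earliest reading
textPos <- as.POSIXct(strsplit(as.character(min(data$time)), " ")[[1]][1])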
p <- qplot(time, PM, data=data, geom=c("blank"), group=1)
p +
  labs(x = "Time", y = "Fine particles (PM2.5) 24hr avg") +
  # threshold lines separating the bands
  geom_hline(aes(yintercept=50), colour="green", alpha=I(1/5), size=2) +
  geom_hline(aes(yintercept=100), colour="yellow", alpha=I(1/5), size=2) +
  geom_hline(aes(yintercept=150), colour="orange", alpha=I(1/5), size=2) +
  geom_hline(aes(yintercept=200), colour="red", alpha=I(1/5), size=2) +
  geom_hline(aes(yintercept=300), colour="darkred", alpha=I(1/5), size=2) +
  # the 24hr averages over time
  geom_path(aes(time, PM), data=data, group=1) +
  # band labels, placed at the left edge of the plot
  annotate("text", x=textPos, y=yrange[1], label="good", size=tsize, colour="grey70") +
  annotate("text", x=textPos, y=yrange[2], label="moderate", size=tsize, colour="grey70") +
  annotate("text", x=textPos, y=yrange[3], label="unhealthy", size=tsize, colour="grey70") +
  annotate("text", x=textPos, y=yrange[4], label="unhealthy +", size=tsize, colour="grey70") +
  annotate("text", x=textPos, y=yrange[5], label="unhealthy ++", size=tsize, colour="grey70") +
  annotate("text", x=textPos, y=yrange[6], label="hazardous", size=tsize, colour="grey70") +
  opts(title="Air pollution in Beijing\nTwitter @BeijingAir",
       panel.background=theme_rect(colour="white"))
ggsave(filename="twitter_pol.png")
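
The coloured lines in the plot split the y axis into bands. As a rough sketch, the same thresholds (50, 100, 150, 200, 300) and the same labels can be used to put each reading into a band with base R's cut(). The function name classify and the sample values are made up for illustration, and treating a reading that sits exactly on a threshold as part of the lower band is my own assumption.

```r
# thresholds and labels taken from the plot above
breaks <- c(-Inf, 50, 100, 150, 200, 300, Inf)
labels <- c("good", "moderate", "unhealthy", "unhealthy +",
            "unhealthy ++", "hazardous")

# hypothetical helper: map readings to their band
# right = TRUE means a value equal to a threshold falls in the lower band
classify <- function(pm) {
  cut(pm, breaks = breaks, labels = labels, right = TRUE)
}

# made-up readings, one or more per band
bands <- classify(c(12, 50, 65, 180, 250, 450))
print(table(bands))
```

Applied to the real data, something like table(classify(data$PM)) would count how many 24hr averages fall into each band.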