I have run into this problem more than once, and the last time I decided to plot the state of the field using a few scripts.
I wrote three scripts for that: pubmed_trend.r takes your PubMed query and sends it to NCBI through the EUtils tools (via the Perl script pubmed_trend.pl), and plot_bar.r plots the results. The details of the scripts are below, but first here is how you create your trend.
In this example, we plot the number of publications per year for papers annotated with the MeSH terms "sex characteristics" and "pain", and compare it with the number of publications per year for "sex characteristics" and "Analgesics". We run the search for the years 1970 to 2011.
source('pubmed_trend.r')
sex.pub <- pubmed_trend(search.str = 'Sex+Characteristics[mh] AND Pain[mh]', year.span=1970:2011)
analgesic.pub <- pubmed_trend(search.str = 'Sex+Characteristics[mh] AND Analgesics[mh]', year.span=1970:2011)
source('plot_bar.r')
library("RColorBrewer")
pdf(file='sex_pain.pdf', height=8, width=8)
par(las=1)
colorfunction = colorRampPalette(brewer.pal(9, "Reds"))
mycolors = colorfunction(length(sex.pub))
plot_bar(x=sex.pub, linecol="#525252", cols=mycolors, addArg=FALSE)
colorfunction = colorRampPalette(brewer.pal(9, "Blues"))
mycolors = colorfunction(length(analgesic.pub))
plot_bar(x=analgesic.pub, linecol='black', cols=mycolors, addArg=TRUE)
title('Number of publication per year')
legend('topleft',
       legend=c('Sex and Pain', 'Sex and Analgesics'),
       fill=c("red", "blue"),
       bty="n",
       cex=1.1
)
dev.off()
What we see here is that the number of publications per year on sex differences and pain or analgesics is growing, but it is still small and more research is needed.
...and you are good to go: your talk is launched.
Here are the details of the scripts and functions. pubmed_trend.r takes a PubMed query string as you would type it in the search box of the web interface (spaces have to be replaced by '+').
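Typing the '+' signs by hand is easy to get wrong, so here is a minimal sketch of a helper that converts a query as typed in the PubMed search box. The function name to_pubmed_query is hypothetical (it is not part of the original scripts), and it assumes that the boolean operators AND, OR and NOT stay separated by spaces, as in the examples of this post.

```r
# Hypothetical helper, not part of the original scripts.
# Replaces spaces by '+', except around the boolean operators.
to_pubmed_query <- function(query) {
  # Collapse runs of whitespace and trim both ends
  query <- trimws(gsub("[[:space:]]+", " ", query))
  tokens <- strsplit(query, " ", fixed = TRUE)[[1]]
  n <- length(tokens)
  is_op <- tokens %in% c("AND", "OR", "NOT")
  # Separator between neighbouring tokens: a space next to an operator, '+' otherwise
  seps <- ifelse(is_op[-n] | is_op[-1], " ", "+")
  # Interleave tokens and separators (the last token gets an empty separator)
  paste0(c(rbind(tokens, c(seps, ""))), collapse = "")
}

to_pubmed_query("Sex Characteristics[mh] AND Pain[mh]")
# "Sex+Characteristics[mh] AND Pain[mh]"
```

The result can be passed directly as the search.str argument of pubmed_trend().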
pubmed_trend <- function(search.str = 'Sex+Characteristics[mh] AND Pain[mh]', year.span=1970:2011) {
  require(XML)
  require(RCurl)
  results <- NULL
  tmpf <- "./tempfile.xml"
  ## clean before (unlink() stays silent if the file does not exist)
  unlink(tmpf)
  for(i in year.span){
    ## restrict the query to one publication year with the [dp] tag
    queryString <- paste(search.str, ' AND ', i, '[dp]', sep="")
    print(paste('queryString:', queryString))
    ## the Perl script writes the ESearch result to tempfile.xml
    sysString <- paste('./pubmed_trend.pl "', queryString,'"', sep="")
    system(sysString)
    xml <- xmlTreeParse(tmpf, useInternalNodes=TRUE)
    countNode <- getNodeSet(xml, "//Count")
    ## NA if the query failed and the file contains no <Count> node
    pubTerm <- if (length(countNode) > 0) as.numeric(xmlValue(countNode[[1]])) else NA
    print(paste("#______num pub for",i,":",pubTerm))
    rm(xml)
    results <- append(results, pubTerm)
    ## avoid being kicked out!
    Sys.sleep(1)
  }
  names(results) <- year.span
  ## clean after
  unlink(tmpf)
  return(results)
}
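Because pubmed_trend() shells out to ./pubmed_trend.pl in the working directory, a missing or non-executable script only shows up as a shell error in the middle of the loop. The sketch below checks the setup before running the search. The function name check_pubmed_setup is hypothetical, and the module check assumes that `perl -MBio::TGen::EUtils -e 1` exits with status 0 only when the module can be loaded.

```r
# Hypothetical pre-flight check, not part of the original scripts.
check_pubmed_setup <- function(script = "./pubmed_trend.pl") {
  perl_path <- unname(Sys.which("perl"))
  has_perl <- nzchar(perl_path)
  has_script <- file.exists(script)
  c(
    perl_found = has_perl,
    script_exists = has_script,
    # file.access() returns 0 when the requested access (mode 1 = execute) is granted
    script_executable = has_script && unname(file.access(script, mode = 1)) == 0,
    # Try to load the Perl module; only attempted when perl itself was found
    module_found = has_perl &&
      system2(perl_path, c("-MBio::TGen::EUtils", "-e", "1"),
              stdout = FALSE, stderr = FALSE) == 0
  )
}

print(check_pubmed_setup())
```

Any FALSE entry points to the step that still has to be fixed (install Perl, save the script next to the R session, make it executable, or install the module).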
[Update] This relies on the TGen EUtils Perl module; instructions on how to install it can be found here: http://brainchronicle.blogspot.com/2012/05/installing-eutils-perl-module.html
#! /usr/bin/perl -w
#
# pubmed_trend.pl
#
# Created by David Ruau on 2011-02-17.
# Department of Pediatrics/Div. System Medicine Stanford University.
#
##################### USAGE #########################
#
# Query PubMed with Eutils tools
#
#####################################################
use Bio::TGen::EUtils;
use strict;
my $queryString = $ARGV[0];
## query info
my $eu = Bio::TGen::EUtils->new( 'tool' => 'pubmed_trend.pl',
                                 'email' => 'REPLACE_ME@gmail.com' );
## ESearch
my $query = $eu->esearch( db => 'pubmed',
                          term => $queryString,
                          usehistory => 'n' );
$query->write_raw( file => 'tempfile.xml' );
if (-z 'tempfile.xml') {
    # empty result file: try one more time
    my $query = $eu->esearch( db => 'pubmed',
                              term => $queryString,
                              usehistory => 'n' );
    $query->write_raw( file => 'tempfile.xml' );
    if (-z 'tempfile.xml') {
        # still empty: write a placeholder so the file is at least valid XML
        open (FILE, '>', 'tempfile.xml') or die "Could not open file, $!";
        print FILE "<begin>hello world</begin>";
        close (FILE);
    }
}
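The Perl script only wraps a single ESearch request, so for readers who cannot install the Perl module, the same count can probably be obtained from R alone. The sketch below is an alternative, not the method used in this post: it calls the public NCBI ESearch URL with rettype=count and extracts the Count element with a regular expression instead of the XML package. The function names esearch_url, parse_count and pubmed_count are hypothetical.

```r
# Alternative sketch, not the method used in this post: call ESearch from R.
esearch_url <- function(search.str, year) {
  # '+' stands for a space in the post's query convention; undo it before encoding
  term <- paste0(gsub("+", " ", search.str, fixed = TRUE), " AND ", year, "[dp]")
  paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
         "?db=pubmed&rettype=count&term=",
         utils::URLencode(term, reserved = TRUE))
}

# Extract the number inside the first <Count> element; NA if there is none
parse_count <- function(xml_text) {
  xml_text <- paste(xml_text, collapse = "\n")
  m <- regmatches(xml_text, regexpr("<Count>[0-9]+</Count>", xml_text))
  if (length(m) == 0) return(NA_real_)
  as.numeric(gsub("[^0-9]", "", m))
}

# Needs network access; defined here but not called
pubmed_count <- function(search.str, year) {
  parse_count(readLines(esearch_url(search.str, year), warn = FALSE))
}

esearch_url("Sex+Characteristics[mh] AND Pain[mh]", 1970)
```

When looping over years with pubmed_count(), keep the one-second pause between requests as in pubmed_trend().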
plot_bar <- function(x=sex.pub, linecol="royalblue", cols, addArg=TRUE) {
  ## barplot() returns the midpoints of the bars
  bp <- barplot(x, col=cols, add=addArg)
  ## smooth the counts and draw the trend line through the bar midpoints
  fit <- stats::lowess(x, f=1/3)
  lines(x=bp, fit$y, col=linecol, lwd=3)
}
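The trend line in plot_bar() depends on the span f given to lowess(): the function uses f=1/3, while the lowess() default is 2/3. The sketch below separates the smoothing step from the plotting so that both spans can be compared side by side. The helper smooth_counts is hypothetical, and the counts are made up for illustration; they are not PubMed results.

```r
# Hypothetical helper, not part of the original scripts:
# the smoothing step of plot_bar() without the plotting.
smooth_counts <- function(x, f = 1/3) {
  # Given a single vector, lowess() uses 1, 2, ..., length(x) as x values
  fit <- stats::lowess(x, f = f)
  stats::setNames(fit$y, names(x))
}

# Made-up counts for illustration only (not PubMed results)
counts <- c(2, 1, 4, 3, 6, 5, 9, 8, 12, 15, 14, 20)
names(counts) <- 2000:2011

narrow <- smooth_counts(counts, f = 1/3)  # span used in plot_bar()
wide   <- smooth_counts(counts, f = 2/3)  # lowess() default span

# One row per year: raw count next to both smoothed versions
print(round(cbind(raw = counts, narrow = narrow, wide = wide), 1))
```

A smaller span fits each point from fewer neighbours, so the line tends to follow year-to-year changes more closely; a larger span gives a smoother overall trend.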
Thanks! A very useful and widely applicable routine. I'll have to use this for my next talk!
Wow, amazing post, thanks!
Nice. A while ago I implemented something similar as a web app, though not in R; it's Perl with a Lucene index rather than an NCBI web query so it's quite fast.
http://www.ogic.ca/mltrends/
That's pretty cool. I did not know your website. I see that you use a GET protocol so I queried MLTrends from R directly like that:
> x <- read.table("http://www.ogic.ca/mltrends/?search_type=titles;norm_type=publications;graph_scale=linear;query=pain;Graph%21=Graph%21&DOWNLOAD=1", sep="\t", header=T)
> dim(x)
[1] 62 2
> plot(x$year, x$pain, type='l')
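Building a little function with the 3 options you propose (search in, normalization, and scale) is straightforward. A sketch, with defaults taken from the URL above (other accepted values are not shown here):
mltrends <- function(searchTerm="pain", searchIn="titles", norm="publications", scale="linear"){
  # assemble the GET query from the parameter names used in the example URL
  URLquery <- paste0("http://www.ogic.ca/mltrends/?search_type=", searchIn,
                     ";norm_type=", norm, ";graph_scale=", scale,
                     ";query=", URLencode(searchTerm, reserved=TRUE),
                     ";Graph%21=Graph%21&DOWNLOAD=1")
  x <- read.table(URLquery, sep="\t", header=TRUE)
  return(x)
}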
Awesome!
Can we use MeSH terms through your interface?
Currently we're only indexing title and abstract by date (also authors but those aren't accessible via the web interface).
I couldn't get this to work at all. Would you mind describing a bit more where I'm supposed to put which files? I'm not an R neophyte, but I don't know much about shell commands, Perl or the XML packages, so it's difficult to troubleshoot what's happening here.
Sean, save the R scripts as pubmed_trend.r and plot_bar.r as in the example. Save the Perl script as pubmed_trend.pl and make it executable. The example above assumes that you put your scripts in the folder where you run R. You need to have Perl installed and the required R packages.
> sex.pub <- pubmed_trend(search.str = 'Sex+Characteristics[mh] AND Pain[mh]', year.span=1970:2011)
rm: ./tempfile.xml: No such file or directory
[1] "queryString: Sex+Characteristics[mh] AND Pain[mh] AND 1970[dp]"
/bin/sh: ./pubmed_trend.pl: Permission denied
If I go to the shell and try to execute the perl file I get:
$ perl ~/pubmed_trend.pl
Can't locate Bio/TGen/EUtils.pm in @INC (@INC contains: /Library/Perl/5.12/darwin-thread-multi-2level /Library/Perl/5.12 /Network/Library/Perl/5.12/darwin-thread-multi-2level /Network/Library/Perl/5.12 /Library/Perl/Updates/5.12.3 /System/Library/Perl/5.12/darwin-thread-multi-2level /System/Library/Perl/5.12 /System/Library/Perl/Extras/5.12/darwin-thread-multi-2level /System/Library/Perl/Extras/5.12 .) at /Users/swilts/pubmed_trend.pl line 13.
BEGIN failed--compilation aborted at /Users/swilts/pubmed_trend.pl line 13.
So, I guess this also depends on downloading and installing Bio/TGen/EUtils.pm ?
I updated the post and added instructions how to install the EUtils module here: http://brainchronicle.blogspot.com/2012/05/installing-eutils-perl-module.html
OK, and a quick Google search shows the other half of my problem was not knowing that I needed to do this in the terminal:
sudo chmod 755 pubmed_trend.pl
Looks like it's working now, cheers!
Sean, as concerns e-utilities specifically, an easy intro is to use NCBI's cool ebot: http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi. It'll generate the code necessary to do simple tasks involving e-utilities. I use ebot just to save me having to code/remember how to use e-utilities. I can't imagine how to simplify it further...
# In Ubuntu, install XML and Curl support.
# Otherwise, installation of R-Packages will fail
sudo apt-get install libxml2-dev curl libcurl4-openssl-dev
# the perl script needs the TGEN-EUtils
# download package TGen-EUtils-0.xxx.tar.gz from http://bioinformatics.tgen.org/brunit/downloads/tgen-eutils/
# extract it to some place you like, e.g. ~/Downloads/
# in terminal go to the extracted folder
# type in:
perl Makefile.PL
make
make test
sudo make install
# Start R
# install packages 'XML', 'RCurl', 'RColorBrewer'
install.packages(c('XML', 'RCurl', 'RColorBrewer'),dependencies=T)
# now the script should work
# works at least on Ubuntu 12.04 LTS precise 64bit
# GREAT WORK, BTW!!!
# have fun