Sunday, June 3, 2012

Obtaining a protein-protein interaction network for a gene list in R

Building a network of interactions between a set of genes can help a great deal in understanding the relationships between the seemingly disparate elements of your list. Building such a network can seem challenging at first, but it's less complicated than it looks. Here is the approach I use.

Resources for interaction data are numerous. The logical first choice would be a central repository, if one existed. Unfortunately, for protein-protein interactions (PPI) there are several (IntAct, BioGRID, HPRD, STRING...).
Using the APIs developed for each of these repositories would take time that we usually don't have. Fortunately, the NCBI Entrez Gene web page for each gene compiles interactions from BioGRID and HPRD, which seems like a reasonable and robust compromise. On top of that, we can use the XML package to parse the web page.

First, we need a gene list. Here I refer you to an earlier post where we extracted a list of 274 significantly differentially regulated genes.
Using the following little function, you can scrape the interaction table from the NCBI web page.
[update: corrected bug where some genes returned an error]
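
For readers who want to see the general shape of such a scraper, here is a minimal sketch. It is not the original function from this post. The table id ("interactions"), the column order, the gene symbols and the toy HTML string are all made up for illustration, and the real NCBI page may be laid out quite differently. The sketch parses the HTML with the XML package, pulls the first cell of each data row with an XPath query, and returns an empty data.frame when no table is found, so a gene without listed interactions does not interrupt a loop over many genes.

```r
# Sketch only: parse_interaction_table(), the table id and the toy page below
# are hypothetical stand-ins, not the real NCBI page layout.
parse_interaction_table <- function(gene, html_text, table_id = "interactions") {
  doc <- XML::htmlParse(html_text, asText = TRUE)
  # Keep only data rows (rows containing <td> cells) of the chosen table
  xpath <- sprintf("//table[@id='%s']//tr[td]", table_id)
  rows <- XML::getNodeSet(doc, xpath)
  # Assumption: the first cell of each row holds the interacting gene
  interactor <- vapply(
    rows,
    function(r) trimws(XML::xpathSApply(r, "./td[1]", XML::xmlValue)),
    character(1)
  )
  # Zero rows when the table is missing, instead of an error
  data.frame(
    gene = rep(gene, length(interactor)),
    interactor = interactor,
    stringsAsFactors = FALSE
  )
}

# Toy page standing in for a downloaded gene page (placeholder content)
toy_page <- "<html><body>
<table id='interactions'>
<tr><th>Interactant</th><th>Source</th></tr>
<tr><td> GENE_B </td><td>BioGRID</td></tr>
<tr><td>GENE_C</td><td>HPRD</td></tr>
</table></body></html>"

if (requireNamespace("XML", quietly = TRUE)) {
  edges <- rbind(
    parse_interaction_table("GENE_A", toy_page),
    parse_interaction_table("GENE_Z", "<html><body><p>no table</p></body></html>")
  )
  print(edges)
}
```

In real use you would replace the toy string with the downloaded page for each gene and adapt the XPath expression to whatever the page actually contains.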
Here is a quick example with the first 20 genes from my list. The result is an edge list in the form of a data.frame. The NCBI2R package provides a similar function, GetInteractions(), but it has a bug.

You can write this data.frame to a text file and import it into Cytoscape directly, but you can also display and work with your network directly in R using the igraph package.
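
Both routes can be sketched in a few lines. The edge list below is a hypothetical placeholder (the gene symbols are not results from this post), the file name is arbitrary, and the igraph part assumes a recent igraph version that provides graph_from_data_frame(). The plot is written to a PDF in the temporary directory so the snippet also runs in a non-interactive session.

```r
# Hypothetical edge list: gene symbols are placeholders, not real results
edges <- data.frame(
  gene       = c("GENE_A", "GENE_A", "GENE_B", "GENE_D"),
  interactor = c("GENE_B", "GENE_C", "GENE_C", "GENE_E"),
  stringsAsFactors = FALSE
)

# Route 1: tab-delimited text file that can be imported into Cytoscape
out_file <- file.path(tempdir(), "ppi_edges.txt")
write.table(edges, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Route 2: build and draw the network in R (only if igraph is installed)
if (requireNamespace("igraph", quietly = TRUE)) {
  # The first two columns are read as the two endpoints of each edge
  g <- igraph::graph_from_data_frame(edges, directed = FALSE)
  pdf(file.path(tempdir(), "ppi_network.pdf"))
  plot(g, vertex.size = 20, vertex.label.cex = 0.8)
  dev.off()
}
```

Treating the graph as undirected is a choice made here because a protein-protein interaction has no obvious direction; set directed = TRUE if your edge list encodes one.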

The network is simple and not fully connected, but keep in mind that we obtained interactions for only 5 of the 20 genes here.
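
To put numbers on this for your own list, one possible check is sketched below. The query genes and edges are hypothetical placeholders rather than the data from this post. The snippet counts how many query genes returned at least one interaction and, if igraph is installed, how many connected components the network has.

```r
# Hypothetical inputs: six query genes, edges found for only some of them
query_genes <- c("GENE_A", "GENE_B", "GENE_D", "GENE_F", "GENE_G", "GENE_H")
edges <- data.frame(
  gene       = c("GENE_A", "GENE_A", "GENE_B", "GENE_D"),
  interactor = c("GENE_B", "GENE_C", "GENE_C", "GENE_E"),
  stringsAsFactors = FALSE
)

# Query genes that returned at least one interaction
has_edges <- query_genes %in% edges$gene
n_with_edges <- sum(has_edges)
cat(n_with_edges, "of", length(query_genes), "query genes have interactions\n")

# Number of connected components (more than one means "not fully connected")
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = FALSE)
  n_components <- igraph::components(g)$no
  cat("connected components:", n_components, "\n")
}
```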