Monday, April 6, 2015

Diseases and Protein-Protein Interactions

To understand the molecular basis of genetic diseases, it is important to discover their causal genes. Typically, a disease is associated with a linkage interval on the chromosome if single nucleotide polymorphisms (SNPs) in the interval are correlated with an increased susceptibility to the disease. These linkage intervals define a set of candidate disease-causing genes. Genes related to the same disease are also known to have protein products that physically interact. A class of computational approaches has recently been proposed that exploits these two sources of information, physical interaction networks and linkage intervals, to predict associations between genes and diseases.

Evaluations of these methods typically begin with an artificial disease subinterval and test how well a method can identify a known causal gene from among a fixed number of nearby genes in the query subinterval. A more stringent evaluation instead ranks all genes in all intervals related to a query disease, not only the genes in the subinterval. This approach is advantageous because it allows us to find disease-causing genes that lie in existing disease intervals but were not previously associated with the disease. Consequently, we can gauge a gene's relatedness to any query disease.

Network-based algorithms to predict gene–disease associations:

A widely used network-based approach (‘Neighborhood’) predicts for a protein p the annotations that are associated with more than θ percent of p's network neighbors. Applied to diseases, the method associates a gene with a disease if the gene lies within a linkage interval associated with the disease and interacts with at least one gene annotated with the disease.
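The neighborhood rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input layout (an edge list plus two dictionaries keyed by disease) and the use of θ as a fraction between 0 and 1 are hypothetical choices, not taken from any published implementation.

```python
from collections import defaultdict


def neighborhood_predict(ppi_edges, known, candidates, theta=0.0):
    """Sketch of the 'Neighborhood' rule (hypothetical helper).

    ppi_edges  -- iterable of (gene_a, gene_b) physical interactions
    known      -- dict: disease -> set of genes already annotated with it
    candidates -- dict: disease -> set of genes in its linkage intervals
    theta      -- fraction (0..1) of neighbors that must carry the annotation;
                  with theta=0 the rule reduces to ">= 1 annotated neighbor"
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = defaultdict(set)
    for a, b in ppi_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    predictions = set()
    for disease, genes in candidates.items():
        annotated = known.get(disease, set())
        for gene in genes:
            if gene in annotated:
                continue  # already known, nothing to predict
            nbrs = neighbors.get(gene, set())
            if not nbrs:
                continue  # gene absent from the network
            hits = len(nbrs & annotated)
            if hits >= 1 and hits / len(nbrs) > theta:
                predictions.add((gene, disease))
    return predictions


# Toy network: gene A is known for disease d1; B, C and E are candidates.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(sorted(neighborhood_predict(edges, {"d1": {"A"}}, {"d1": {"B", "C", "E"}})))
```

In the toy network, B and C each interact with the annotated gene A and are predicted, while E only touches D and is not.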

Random walks have also been used to transfer annotations within networks. We define a random walk (‘RW’) starting from genes known to be associated with a query disease d. At each time step, the walk has a probability r of returning to the initial nodes. Once the process has converged (L2-distance between the probability vectors of consecutive time steps below 10^-6), a prediction is made for every gene in a relevant interval whose visitation probability is greater than θ.
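The random walk with restart described above can be sketched as a simple power iteration. The code below is a minimal illustration, not the implementation behind the reported results: the function names, the uniform start over the seed genes, the default restart probability r=0.3 and the iteration cap are all assumptions made for the example. Only the L2 stopping threshold of 10^-6 and the θ cut-off come from the description.

```python
import math
from collections import defaultdict


def random_walk_with_restart(edges, seeds, r=0.3, tol=1e-6, max_iter=10000):
    """Return visitation probabilities for a walk restarting at `seeds`.

    r is the restart probability (0.3 is an arbitrary example value);
    tol is the L2 convergence threshold between consecutive steps.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    seeds = [s for s in seeds if s in neighbors]
    if not seeds:
        raise ValueError("no seed gene is present in the network")

    # Assumption: the walk starts uniformly over the known disease genes.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    p = dict(p0)
    for _ in range(max_iter):
        # With probability r jump back to the seeds ...
        nxt = {n: r * p0[n] for n in nodes}
        # ... otherwise move to a uniformly chosen neighbor.
        for n in nodes:
            share = (1.0 - r) * p[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        dist = math.sqrt(sum((nxt[n] - p[n]) ** 2 for n in nodes))
        p = nxt
        if dist < tol:
            break
    return p


def predict_from_walk(visit_prob, interval_genes, seeds, theta):
    """Keep candidate genes whose visitation probability exceeds theta."""
    return {g for g in interval_genes
            if g not in seeds and visit_prob.get(g, 0.0) > theta}


# Toy path network A - B - C with A as the only known disease gene.
probs = random_walk_with_restart([("A", "B"), ("B", "C")], {"A"})
print(predict_from_walk(probs, {"B", "C"}, {"A"}, theta=0.2))
```

On the toy path the walk spends more time on B, which is adjacent to the seed, than on C, so only B clears the example threshold.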

Graph partitioning is a promising technique for predicting gene–disease associations because it can uncover functional modules in PPI networks, and phenotypically similar diseases are often caused by proteins that take part in similar biological processes. Three graph partitioning algorithms were recently shown to find the most biologically relevant modules: GS, MCL and VI-Cut. GS1 losslessly compresses the input network, producing a smaller summary network and a list of corrections to over-generalizations in the summary. The nodes in this summary correspond to modules in the input network. The summary graph can be further compressed by discarding the list of corrections and applying GS again, resulting in larger modules (‘GS2’). This process can be repeated i times, yielding a ‘GSi’ method. The ‘GS-All’ method takes the union of the predictions made by GS1, GS2 and GS3. VI-Cut is a semi-supervised clustering method that uses annotations in the training set when creating modules.

Quality of network-based predictions:

In a typical implementation the following results were obtained:

The random walk methods and Prop show a clear dominance over the clustering and neighborhood methods. The clustering methods (VI-Cut, GS1 and its variants GS2, GS3 and GS-All), which had not previously been appraised for the task of predicting gene–disease associations, performed slightly worse than the random walk methods but better than the neighborhood approaches. They achieved between 18.4% and 68.6% precision and between 1.1% and 17.9% recall.
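For readers unfamiliar with the two figures, precision and recall over a set of predicted gene–disease pairs can be computed as in the sketch below. It uses the standard definitions only; the helper name and the toy pairs are made up for illustration, and the exact evaluation protocol behind the numbers above (for example, how known associations were held out) is not reproduced here.

```python
def precision_recall(predicted, true_pairs):
    """Standard precision and recall over sets of (gene, disease) pairs."""
    predicted = set(predicted)
    true_pairs = set(true_pairs)
    correct = len(predicted & true_pairs)
    # Precision: share of predictions that are real associations.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: share of real associations that were predicted.
    recall = correct / len(true_pairs) if true_pairs else 0.0
    return precision, recall


# Hypothetical toy data: three predictions, four held-out true associations.
predicted = {("g1", "d1"), ("g2", "d1"), ("g3", "d2")}
held_out = {("g1", "d1"), ("g3", "d2"), ("g4", "d2"), ("g5", "d3")}
precision, recall = precision_recall(predicted, held_out)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A method that predicts very few pairs can reach high precision with low recall, which is the pattern the ranges above suggest for some of the clustering methods.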


The classes of network-based methods considered here approach the task of predicting gene–disease associations with very different philosophies, and random walk approaches are superior to clustering and neighborhood approaches. In general, there are certain diseases for which high-throughput PPI networks are an especially useful source of high-quality predictions. Diseases that have little correlation with the interaction network call for higher-quality networks or an integrative approach that considers sequence, functional annotations, expression data or other additional information.


