
I used different algorithms for splitting my large gene network into sub-networks using Cytoscape. I compared the modularity scores and decided that the algorithm with the best modularity score would be used for clustering the network.

My colleague argued that modularity scores can only be compared across different numbers of clusters produced by the same algorithm.

So, my question is whether modularity scores can be used to compare different algorithms, or only different numbers of clusters produced by the same algorithm.

Since the goal of modularity estimation algorithms is to find the community structure that maximizes modularity, it is valid to compare different algorithms based on their modularity estimate. However, i) it is not the only consideration for choosing a community detection algorithm, and ii) the obtained modularity estimate will depend on the particular structure of each network, e.g. its number of nodes. For a principled comparison between several different algorithms, see Yang et al. (2016). Based on multiple criteria, they identify the multilevel (or *Louvain*) algorithm (Blondel et al. 2008) as the overall best choice.

_References:_

- Yang, Zhao, René Algesheimer, and Claudio J. Tessone. “A Comparative Analysis of Community Detection Algorithms on Artificial Networks.” Scientific Reports 6, no. 1 (August 2016). https://doi.org/10.1038/srep30750.
- Blondel, Vincent D., Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. “Fast Unfolding of Communities in Large Networks.” Journal of Statistical Mechanics: Theory and Experiment 2008, no. 10 (October 2008): P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008.
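The comparison at issue can be made concrete with a minimal, self-contained sketch. The toy graph, partitions, and values below are invented for illustration; the point is only that Newman-Girvan modularity is a property of a (network, partition) pair, so partitions from any two algorithms on the *same* network can be scored with it.

```python
# Illustrative sketch: score two hypothetical clustering results on the SAME
# network with Newman-Girvan modularity. Graph and partitions are toy data.

def modularity(edges, partition):
    """Newman-Girvan modularity Q = sum_c (e_c/m - (d_c/(2m))^2).

    edges: list of undirected (u, v) pairs; partition: dict node -> community.
    """
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    for c in set(partition.values()):
        nodes = {n for n, cc in partition.items() if cc == c}
        e_c = sum(1 for u, v in edges if u in nodes and v in nodes)  # intra-community edges
        d_c = sum(deg[n] for n in nodes)                             # total degree of community
        q += e_c / m - (d_c / (2 * m)) ** 2
    return q

# Toy network: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# "Algorithm 1" finds the natural split; "Algorithm 2" finds a worse one.
part_good = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
part_bad  = {0: "A", 1: "A", 2: "B", 3: "B", 4: "A", 5: "B"}

print(modularity(edges, part_good))  # higher Q -> better partition
print(modularity(edges, part_bad))
```

As the answer above notes, the score still depends on the network's own structure, so such comparisons are only meaningful between partitions of the same network.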

## Author summary

In our labs, we aimed to use network algorithms to contextualize hits from functional genomics screens and gene expression studies. In order to understand how to apply these algorithms to our data, we characterized seventeen previously published algorithms based on characteristics of their output and their performance in three tasks: cross validation, prediction of drug targets, and behavior with random input.

**Citation:** Hill A, Gleim S, Kiefer F, Sigoillot F, Loureiro J, Jenkins J, et al. (2019) Benchmarking network algorithms for contextualizing genes of interest. PLoS Comput Biol 15(12): e1007403. https://doi.org/10.1371/journal.pcbi.1007403

**Editor:** Luonan Chen, Chinese Academy of Sciences, CHINA

**Received:** January 16, 2019 **Accepted:** September 11, 2019 **Published:** December 20, 2019

**Copyright:** © 2019 Hill et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** The majority of the data used for benchmarking are publicly available, and their locations are described within the manuscript. A small subset of the datasets used were results from internal Novartis CRISPR screens that are proprietary to Novartis. Overall conclusions from the proprietary data were similar to those from the publicly available datasets. All algorithms have been previously published and are cited within the manuscript. For this specific work, we used a re-implementation of the algorithms in the CBDD software package. This software is proprietary to Clarivate. Those interested in accessing the CBDD software should visit www.clarivate.com for company contact information. Networks used in this work are a combination of a resource proprietary to Clarivate (see www.clarivate.com for company contact information) and a publicly available network (STRING). Generality of the results to other networks was confirmed with a publicly available network, HumanNet, as described in the manuscript.

**Funding:** This research was funded by Novartis Institutes for BioMedical Research. Novartis provided support in the form of salaries for all authors. Army Research Office Institute for Collaborative Biotechnologies (W911NF-09-0001) funded the graduate school tuition of Abby Hill. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** I have read the journal's policy and the authors of this manuscript have the following competing interests: All authors were employed by Novartis when the work was completed, and some have equity interest in Novartis. AH is currently employed by Pfizer.

This is a PLOS Computational Biology Benchmarking paper.

## Methods

### Development of granular network

Granular networks for the present study were created using two different protocols. The first protocol uses a physics-based Discrete Element Method (DEM) that simulates true granular dynamics. In the second protocol, we synthetically generate a granular ensemble in which particles are positioned on a hexagonal lattice. We then use a square stencil to remove particles and thereby produce a patterned disordered region.

### DEM simulation to generate granular ensemble

The granular ensemble of the present study is created using a DEM simulation 27,28,29 of 7428 macroscopic 2D particles of uniform size (radius 0.01 *m*) by centripetal packing. In this approach, an externally applied centripetal force (magnitude effectively equal to the gravitational force) is directed towards the center of the box. The DEM simulation protocol adopted for this article is identical to that of refs 30,31, but is nevertheless described in Supplementary Information (SI)-1 for the sake of completeness. All particles move towards the center of the box under this centripetal force and form a packing. During this packing process, collisions between particles reduce their kinetic energy. Most particles in the final structure pack into a six-fold coordinated hexagonal structure; a very small fraction have a coordination number (number of physically touching neighbors) different from six (Fig. 1a). The network is constructed by placing an edge between any two granules that are in physical contact. Since most particles in the ensemble display six-fold coordination, the resultant network is composed mostly of 6-regular sub-graph regions separated by irregular regions (Fig. 1b). The irregular regions consist of weakly connected nodes and may be visually discriminated from the strongly connected regular regions.
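The edge rule above (connect two granules if they are in physical contact) can be sketched as follows. This is not the authors' DEM code: the positions, radii, and tolerance are toy values, and contact is approximated by comparing the center distance against the sum of radii.

```python
import math

# Illustrative sketch (not the authors' DEM code): build the contact network
# from particle centres and radii, drawing an edge whenever two particles are
# in physical contact. A small tolerance absorbs numerical overlap/gap noise.

def contact_network(centres, radii, tol=1.01):
    """Return edge list and per-node coordination numbers.

    Particles i, j are 'in contact' when dist(i, j) <= tol * (R_i + R_j).
    """
    n = len(centres)
    edges = []
    coord = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(centres[i], centres[j])
            if d <= tol * (radii[i] + radii[j]):
                edges.append((i, j))
                coord[i] += 1
                coord[j] += 1
    return edges, coord

# Three touching unit-radius disks in a row, plus one isolated disk far away.
centres = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
radii = [1.0, 1.0, 1.0, 1.0]
edges, coord = contact_network(centres, radii)
print(edges)   # [(0, 1), (1, 2)]
print(coord)   # [1, 2, 1, 0]
```

The coordination numbers returned here are exactly the quantity used to color Fig. 1a.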

Granular ensemble and corresponding granular network. (**a**) Packing of 7428 uniform sized disks obtained by centripetal packing through DEM simulation (see main text and SI-1). The particles are colored based on their coordination numbers (see the colorbar at the extreme left). The ordered and disordered regions can be visually discerned. A good modularity function should become maximal for the partition that matches these partitions obtained by visual inspection. **(b)** Contact network formed from ensemble (**a**) by following the recipe: an edge is drawn between two particles if they are in physical contact. The weight of all edges is fixed as unity to make it an un-weighted network. Six-regular graph regions and irregular regions are shown in inset (extreme right). [The files containing node connectivity (graph_srep.gml) and the coordinates of each node and their corresponding degree (Particle_pos_with_coord_no_Srep.xls) can be downloaded from https://sites.google.com/a/iitbbs.ac.in/kks-research-work/research-data].

### Synthetically generated granular ensemble

In this protocol, particles are positioned on a regular hexagonal lattice. We then use a square stencil to remove particles and thereby produce a patterned disordered region. These patterns form logical boundaries for community partitions. The network formation protocol for these ensembles is the same as discussed earlier.
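The synthetic protocol can be sketched in a few lines. The lattice size, spacing, and stencil placement below are illustrative choices, not the paper's actual parameters.

```python
import math

# Sketch of the synthetic-ensemble protocol: place particles on a hexagonal
# (triangular) lattice, then remove those falling inside a square stencil to
# create a patterned region. Sizes and stencil placement are illustrative.

def hex_lattice(rows, cols, spacing=1.0):
    """Hexagonal packing: odd rows shifted by half a spacing."""
    pts = []
    for r in range(rows):
        for c in range(cols):
            x = c * spacing + (spacing / 2 if r % 2 else 0.0)
            y = r * spacing * math.sqrt(3) / 2
            pts.append((x, y))
    return pts

def remove_square(points, x0, y0, side):
    """Drop points inside the axis-aligned square [x0, x0+side] x [y0, y0+side]."""
    return [(x, y) for (x, y) in points
            if not (x0 <= x <= x0 + side and y0 <= y <= y0 + side)]

lattice = hex_lattice(10, 10)
carved = remove_square(lattice, 3.0, 3.0, 2.0)
print(len(lattice), len(carved))  # fewer particles remain after stenciling
```

The contact-network construction is then applied to `carved` exactly as for the DEM ensemble.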

### Finding the communities

For finding and creating communities in the network, we employ two different protocols. In the first protocol, we use a Potts model based community detection scheme discussed next. In the second protocol, we manually set the communities.

### Potts model based community detection algorithm

The accuracy of a graph clustering algorithm is mainly controlled by the quality function it uses. The function should favor edges inside a community while penalizing a large number of missing edges through some penalization term. We have used the spin-glass-type Potts model community detection algorithm of Ronhovde and Nussinov 9 (hereafter referred to as the RN model). This algorithm is described in detail in SI-2, and the complete C++ code of the RN method can be downloaded from https://sites.google.com/a/iitbbs.ac.in/kks-research-work/research-data. The RN method attempts to iteratively find a partition that corresponds to the ground state of the energy function (or Hamiltonian *H*) given in Eq. (2).

The asymmetry between connected (*A*_{ij} = 1) and disconnected (*A*_{ij} = 0, so *J*_{ij} = 1 − *A*_{ij} = 1) node pairs may be reflected by individually setting the respective edge weights (denoted by *a* and *b*). The multiplicative factor γ is used as a structural resolution parameter. Altering the value of γ increases or decreases the weight of missing edges, thus providing control over the size and number of communities found by minimizing the Hamiltonian of Eq. (2). Typically, larger values of *γ* favor smaller communities, and vice versa.
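Eq. (2) itself is not reproduced in this excerpt, so as an illustration only, the sketch below evaluates a commonly quoted RN-style Potts energy consistent with the description above: intra-community edges are rewarded with weight *a*, intra-community missing edges are penalized with weight γ·*b*. The authors' exact equation may differ in detail.

```python
# Hedged sketch of an RN-style Potts Hamiltonian (Eq. (2) is not shown here):
#     H = -(1/2) * sum_{i != j} [ a*A_ij - gamma*b*(1 - A_ij) ] * delta(c_i, c_j)
# Lower H is better; gamma scales the penalty for missing intra-community edges.

def rn_hamiltonian(adj, labels, a=1.0, b=1.0, gamma=1.0):
    n = len(adj)
    h = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and labels[i] == labels[j]:
                h -= 0.5 * (a * adj[i][j] - gamma * b * (1 - adj[i][j]))
    return h

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by one bridge edge.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]

two = [0, 0, 0, 1, 1, 1]   # the natural two-community partition
one = [0, 0, 0, 0, 0, 0]   # everything merged into one community

# Merging everything accrues penalties from the many missing intra-community
# edges, so the two-community partition has lower (better) energy.
print(rn_hamiltonian(adj, two), rn_hamiltonian(adj, one))
```

Raising `gamma` in this sketch makes merged partitions even more costly, reproducing the stated tendency of large γ to favor smaller communities.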

The Potts Hamiltonian favors intra-community links and disfavors missing edges within the same community. In the optimized state of the Potts model Hamiltonian, similar spin entities are mostly in the same state and vice versa 32 .

### Manually setting the communities

In this protocol, we manually set the community boundaries without regard to optimality. If the community boundaries coincide with those of naturally occurring identifiable structures, clearly representing an optimal solution, then one may expect a high modularity value, provided the function is properly constructed. At other times, we deliberately set the community boundaries not to coincide with such structures; one then anticipates very low modularity values for such sub-optimal partitions. We will demonstrate that our new modularity function is more sensitive to such changes and outperforms the NG modularity.

### New Modularity function

Before presenting our new function, we briefly discuss the physical intuition that underlies our formulation. This intuition is deeply rooted in the study of naturally occurring magnetic materials. Specifically, the modularity function that we will shortly introduce is inspired by the formation of magnetic domains in ferromagnetic materials. In a magnetic material, there are “domains” in which all the moments tend to locally align (or become “polarized”) along the same direction. Such structures are of low energy. We wish to draw an analogy between such magnetic domains and clusters in a graph. Each node in the graph may be viewed as a local magnetic moment or “spin” (such as that associated with an individual atom). An edge in the graph represents a magnetic interaction between two local moments (atoms/nodes). Along these lines, each community in the graph can be viewed as a magnetic domain. Our modularity function is modeled by a Potts Hamiltonian 33 . The construction of this Hamiltonian essentially hinges upon the choices one makes to model the spin-spin interactions for four different scenarios. Adding these interactions together yields the full Hamiltonian describing our system. The four possible scenarios in this spin formulation are the following (see SI-3 for details): (i) interaction between spins of identical polarization within the same community, (ii) interaction associated with a missing edge between nodes within the same community (i.e. two nodes of identical spin polarization), (iii) interaction for an edge between two different communities (or two magnetic domains of different spin polarization), and (iv) interaction representing a missing edge between two nodes from two separate communities. We use a Heaviside step function to incorporate the geometrical constraint associated with the individual terms. A complete pictorial guide for all possible scenarios can be found in SI-4.
While the energy of a system (the Hamiltonian) is an extensive quantity, we make the modularity an intensive parameter by scaling it with the total number of edges. We do so because, if the architecture of the network is relatively homogeneous, we expect a comparable value of the modularity, capturing the system-size-independent “quality” of the partition. Our proposed modularity function reads:

Here *a*_{ij} and *b*_{ij} are the strengths (not to be confused with edge weights) of connected and missing edges between the *i*th and *j*th nodes, respectively. There are many possible choices for setting the strengths *a*_{ij} and *b*_{ij}. We employ a comparison of the local degree distribution between the *i*th and *j*th nodes with the average degree 〈*k*〉 of the network, and set

where 〈*k*〉 = (1/*N*) ∑_{i} *k*_{i}, *N* is the total number of nodes and *k*_{i} is the degree of node *i*.

Our proposition, therefore, has two important distinctions from almost all of the traditional modularity functions that we know of, including the NG modularity (Eq. (1)) as well as other modularity functions developed for spatial networks. First, it does not explicitly resort to a null model (although the function of Eqs (2) and (3) was largely inspired by such a comparison). Second, and most importantly, traditional methods neglect the inter-community interactions (both the existing and missing edges) whereas our function accounts for them. The key difference is embodied by the geometrical dependence that appears in our modularity of Eq. (3). The NG modularity penalizes all missing edges between two distant nodes within the same community. This penalty is imposed even when that edge might not be physically possible because of geometrical constraints. In order to avoid this incorrect over-penalization, we introduced, in Eq. (3), a Heaviside unit step function *θ(Δx*_{ij}) which incorporates the geometrical constraints in the form of a neighborhood definition. Here *Δx*_{ij} = *x*_{c} − *x*_{ij}, with *x*_{ij} the distance between nodes *i* and *j*; *x*_{c} defines the cutoff distance for the neighborhood and was chosen to be *x*_{c} = 1.05 (*R*_{i} + *R*_{j}), where *R* is the radius of a particle for the present study. The function *θ(Δx*_{ij}) introduces a penalty only for those missing links where an edge is geometrically possible.


Our function compares the local degree distribution at the node level with the average degree of the network (Eq. (4)). Its value will be high if the nodes of a community are highly linked with each other: highly linked communities exhibit a local degree distribution that is larger than the average over the entire network. We remark that employing the absolute value |*b*| in Eq. (3) is not mandatory; it takes care of a subtlety that we explain in SI-4, where the implications of Eq. (3) for all possible scenarios are also discussed exhaustively.
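Since Eq. (3) is only partially reproduced above, the sketch below isolates just its geometric ingredient as described in the text: the Heaviside factor θ(Δx_{ij}) that makes a missing intra-community edge count as a penalty only when a contact is geometrically possible, with the stated cutoff x_{c} = 1.05 (R_{i} + R_{j}).

```python
# Sketch of the geometric constraint in Eq. (3): theta(dx_ij) = 1 when the
# centre distance x_ij lies within the cutoff x_c = 1.05 * (R_i + R_j),
# i.e. when an edge between particles i and j is geometrically possible.

def theta(x_ij, r_i, r_j):
    """Heaviside step on dx_ij = x_c - x_ij."""
    x_c = 1.05 * (r_i + r_j)
    return 1 if x_c - x_ij >= 0 else 0

# Two unit-radius particles: a near pair (could touch) and a far pair (cannot).
print(theta(2.0, 1.0, 1.0))   # near: 2.0 <= 2.1 -> missing-edge penalty applies
print(theta(5.0, 1.0, 1.0))   # far: 5.0 > 2.1  -> no penalty
```

In contrast, the NG modularity would penalize the far pair just as much as the near one if both were assigned to the same community.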

## The gene regulatory network of mESC differentiation: a benchmark for reverse engineering methods

A large body of data has accumulated that characterizes the gene regulatory network of stem cells. Yet, a comprehensive and integrative understanding of this complex network is lacking. Network reverse engineering methods that use transcriptome data to derive these networks may help to uncover the topology in an unbiased way. Many methods exist that use co-expression to reconstruct networks. However, it remains unclear how these methods perform in the context of stem cell differentiation, as most systematic assessments have been made for regulatory networks of unicellular organisms. Here, we report a systematic benchmark of different reverse engineering methods against functional data. We show that network pruning is critical for reconstruction performance. We also find that performance is similar for algorithms that use different co-expression measures, i.e. mutual information or correlation. In addition, different methods yield very different network topologies, highlighting the challenge of interpreting these resulting networks as a whole.

This article is part of the theme issue ‘Designer human tissue: coming to a lab near you’.

### 1. Background

Despite huge efforts to map out gene regulatory interactions between genes in many different cell types using ChIP-seq and similar methods, a comprehensive understanding of the gene regulatory network governing cell state transitions is still lacking [1]. In addition to functional annotations of the genome, a large body of gene expression data has been collected during the past decades. We therefore decided to investigate if such data could be leveraged to derive the topology of the gene regulatory network governing the complex transition from a stem cell to a differentiated cell. A number of approaches that address the problem of inferring gene regulatory networks from high-throughput data have been developed in the past 20 years. Most network reconstruction methods that use expression data make the assumption that the rate of change of a gene's mRNA concentration is a function of the mRNA concentration of all other genes [2]. Such an approach assumes that the protein concentration of each gene is determined purely by its mRNA concentration, and it ignores the influence of post-translational modifications.

Many network reconstruction approaches have been systematically benchmarked at the DREAM project [3]. However, these benchmarks have mainly been done on data from microorganisms such as *Escherichia coli* and *Saccharomyces cerevisiae*. Also using data from *E. coli*, Allen *et al*. [4] studied network reconstruction using data from 500 microarrays, and examined the influence of the number of samples on reconstruction quality. As the complexity of mammalian regulatory networks is much larger than that of yeast or bacteria, the results of those benchmarks may not be transferable to reconstructing more complex networks. Only very few studies have benchmarked reconstruction algorithms on mammalian datasets, largely because of missing gold standards. One study compared different methods on multiple datasets [5], showing that the ARACNE algorithm, which is based on information theory, performs best on the B-cell data, where there is a good gold standard [6]. Another study used mammalian datasets but used only GO enrichment, an indirect measure, as a performance metric, limiting the interpretation [7].

Two recent publications attempted to reconstruct mammalian embryonic stem cell networks. Cegli *et al.* [8] used ARACNE on 171 microarrays to reconstruct a gene regulatory network in mouse embryonic stem cells (mESC). Reconstruction quality was assessed using Reactome and transcription factor perturbation data. Cahan *et al.* [9] used approximately 4000 samples from different mouse tissues to train a gene regulatory network using the CLR [10] algorithm. They used the area under the precision-recall curve (AUPR) to estimate the performance, using gold standards from ChIP-Chip data in the Escape database [11], a dataset measuring differential expression after overexpression, and the Encode ChIP-seq data. However, neither of these works included any comparison of reconstruction methods, making the improvement over random values hard to interpret.

Other attempts to reconstruct the gene regulatory network in mESC [10,12] used ChIP-seq data additionally or exclusively for training and are thus not directly comparable to approaches using only transcriptome data. Finally, some reconstruction efforts aim at only reconstructing a core pluripotency network containing a couple of nodes, using time-resolved data [13] or culture condition-dependent expression [14].

In this paper, we will use and compare different state-of-the-art network reconstruction methods to reconstruct the topology of the gene regulatory network governing the differentiation process of mESCs from a large body of transcriptome data. We evaluate these reconstructed networks using functional data on regulatory interactions, such as ChIP-seq data or transcription factor perturbation experiments.

### 2. Material and methods

#### (a) Algorithms and packages used

All calculations were performed in R. We compared different similarity measures (Pearson and Spearman correlation, mutual information, ARACNE [15], CLR [16], MRNET [17], partial correlation [18]), as implemented in base R, in the parmigene R-package, or in the parcor package. Most algorithms did not require a parameter, except for ARACNE (*τ*), partial correlation (parameter *k*, which we set to 3) and mutual information (estimation-related parameter *k*, which was set to 9). All topological properties were computed using the R package igraph.
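The study's own pipeline is in R (base R, parmigene, parcor). Purely to illustrate the two simplest co-expression scores compared here, the following is a language-agnostic, stdlib-only sketch of Pearson and Spearman correlation between two hypothetical expression profiles.

```python
# Illustration only (the paper used R implementations): Pearson correlation
# detects linear co-expression; Spearman, being Pearson on the ranks,
# detects any monotonic relation.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def ranks(x):
    """Ranks 1..n (ties broken by order; sufficient for this illustration)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman = Pearson on the ranks.
    return pearson(ranks(x), ranks(y))

g1 = [1.0, 2.0, 3.0, 4.0, 5.0]
g2 = [2.0, 4.0, 8.0, 16.0, 32.0]   # nonlinear but monotonic in g1

print(round(spearman(g1, g2), 3))  # perfect monotonic association
print(pearson(g1, g2) < 1.0)       # the linear score stays below 1
```

This is exactly the distinction exploited in §3(b): the benchmarked measures differ in which classes of relation (linear, monotonic, non-functional) they can detect.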

#### (b) Gold standards

A list of transcription factors in mouse ‘Gene list of TFs’ was obtained from the Animal Transcription Factor Database (ATFDB) [19], located at http://www.bioguo.org/AnimalTFDB/, which was used to limit the genes analysed to transcription factors. To define the ChIP-seq gold standard, we downloaded the Mouse ES Cell ChIP-Seq Compendium maintained by the Bioinformatics Core at Wellcome Trust—MRC Stem Cell Institute, available at http://lila.results.cscr.cam.ac.uk/ES_Cell_ChIP-seq_compendium_UPDATED.html. We computed the transcription factor association score for each gene in each dataset, as defined in [20], with characteristic length scale at 1000 bp. For the overexpression gold standard, we downloaded the GEO-datasets GSE31381 and GSE14559 using the R package GEOquery. Because these series contain a mix of already logarithmized expression values and linear expression values, all samples with median greater than 10 were logarithmized. To rank genes, we calculated empirical *z*-values. We first calculated the mean expression and the standard deviation for each gene using the four samples labelled as ‘Control’, and we then defined the standard deviation by a Loess fit on the standard deviation versus the mean with a span of 3. We then used a *z*-score threshold of 2.9 and a fold change of more than 2, which results in an estimated false discovery rate of less than 0.001. The loss of function (LoF) benchmark data were obtained from the Escape database [11]. The list of differential genes for each transcription factor assayed was downloaded from http://www.maayanlab.net/ESCAPE/download/logof.txt.zip. From these data, we kept only experiments marked as LoF. Gene symbols were translated to Ensembl gene ids using the symbolToGene function of the R package annmap (Ensembl v. 74). The transcription factor knockdown data were obtained from publication [21]. 
We downloaded the list of differential genes for each transcription factor assayed from http://www.nature.com/srep/2013/130306/srep01390/extref/srep01390-s1.xls. Gene symbols were translated to Ensembl gene ids using the symbolToGene function of the R package annmap (Ensembl v. 74). The literature-based PluriNetWork described in [22] was downloaded from WikiPathways (http://wikipathways.org/index.php/Pathway:WP1763) on 15 April 2014 using Cytoscape 3.1.0 and exported as plain text file. It was subsequently imported into R and restricted to node pairs that could be mapped to Ensembl gene ids using Ensembl v. 79. We furthermore retained only interactions where the first node was a transcription factor, yielding 362 interactions.

#### (c) Transcriptome data

Microarray samples were obtained from the GEO database automatically using the R packages GEOquery for data retrieval and GEOmetadb for data selection. Using GEOmetadb (time stamp 11/2013), the GEO database was searched for entries corresponding to the Affymetrix Mouse Gene St platform (GPL6246). These entries were filtered by the following criteria. First, the raw CEL data had to be present. Second, the description had to contain at least one of the following keywords: mESC, stem cell, stem cells, Oct4, Sox2, Nanog, Pou5f1, embryonic. The corresponding 1194 samples were then downloaded. These samples were normalized together using the rma function from the R package oligo. Probesets were annotated with Ensembl gene ids using the R package mogene10sttranscriptcluster.db. Probesets that were associated with more than one gene were omitted. The final expression matrix contained 19 615 genes measured in 1194 samples.

#### (d) Differentiated tissue annotations

We downloaded raw data for the microarray samples referenced in the manually curated CellNet tissue atlas [9] and normalized samples using RMA implemented in the R package oligo. Probesets were annotated with ensembl gene ids using the R package mogene10sttranscriptcluster.db. To compare the expression values of the CellNet samples with the keyword-based samples, both were re-normalized together using quantile normalization.

#### (e) Calculation of area under the precision-recall curve

We ranked all predictions based on the score, and calculated the precision-recall curve using the ROCR package [23] together with the interpolation algorithm of Boyd *et al*. [24], available at https://github.com/kboyd/raucpr [25]. The obtained AUPRs were used to calculate the improvement of the predictions over random.
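The paper computes AUPR with the ROCR package and the interpolation of Boyd *et al*.; as a rough stdlib-only illustration of the quantity being reported (AUPR divided by the random baseline), one can use simple step-wise summation over the ranked predictions. Scores and labels below are toy values.

```python
# Sketch of the evaluation step (ROCR + Boyd et al. interpolation in the
# paper; plain step-wise summation here): rank predictions by score, then
# accumulate precision at each recall step to approximate the AUPR.

def aupr(scores, labels):
    """Area under the PR curve by summing precision at each hit."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    tp = 0
    area = 0.0
    for k, (_, lab) in enumerate(ranked, start=1):
        if lab:
            tp += 1
            area += (tp / k) * (1 / n_pos)  # recall rises by 1/n_pos per hit
    return area

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]   # predicted interaction scores
labels = [1,   1,   0,   1,   0,   0]     # hits in the gold standard

# A random ranking has precision equal to the positive fraction everywhere.
random_baseline = sum(labels) / len(labels)
print(aupr(scores, labels) / random_baseline)  # improvement over random
```

A ratio above 1 means the ranking enriches true TF-target pairs at the top, which is the "improvement over random" reported in figure 3.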

#### (f) Size-dependent performance

From the expression dataset of 1194 samples, five random subsets were drawn at each of the sizes 16, 32, 64, 128, 256, 512 and 1024. The reconstruction algorithms were then applied to the same random subsets.

### 3. Results

Our approach to benchmark network reconstruction algorithms was as follows: we compiled a large collection of 1194 publicly available transcriptomes of mESC by querying the GEO database with keywords. By co-normalization, these data then represented a large data matrix of expression of genes in mESCs under different conditions and with different differentiation status. We subsequently applied a set of co-expression metrics and published state-of-the-art network reconstruction algorithms (figure 1*a*) to these data that generated ranked lists of potential transcription factor target gene pairs. We then compared the predictions of these algorithms with different sets of gold standards that link transcription factors and their target genes. To this end, we used ChIP-seq data, data on differential expression after transcription factor perturbation and a manually curated literature-based network. Using these gold standards, we could then rank individual algorithms by their average performance using the area under the precision-recall curve.

Figure 1. (*a*) Pipeline for the benchmarking of network prediction algorithms. (i) Microarray samples from a fixed Affymetrix mouse array type that are identified by keywords related to embryonic stem cells are downloaded from the GEO database. All samples are normalized together, yielding the gene expression matrix. Different algorithms that predict interactions based on co-expression are then applied to the expression matrix to obtain a list of gene pairs ranked by their interaction score. As we investigate gene regulation by TFs, only TF-gene pairs are retained. The quality of predictions is evaluated using two kinds of evidence for direct interaction, ChIP-seq and differential expression after TF perturbation. (ii) TF-gene pairs are classified as hits and misses according to ChIP-seq and TF perturbation evidence. For the ChIP-seq gold standard, a TF-gene pair is classified as a hit when the TF binds close enough to the promoter of the gene. For the perturbation gold standard, genes with a significant fold change upon TF perturbation are classified as targets of the respective TF. In addition, a literature-based network is used for benchmarking. From the rank of the hit and miss TF-gene pairs in the list ranked by interaction score, the precision-recall curve is calculated. The area under the precision-recall curve (AUPR) serves as the quality benchmark of the network predictions. (*b*) Visualization of keyword-based samples relative to annotated differentiated tissue and ESC samples using tSNE.

#### (a) A large dataset of gene expression in mESCs

To aggregate a large mESC transcriptome compendium, we used the GEO database and searched for the following keywords in the abstracts: mESC, stem cell, stem cells, Oct4, Sox2, Nanog, Pou5f1 and embryonic. We then downloaded all datasets as raw data that were from one array platform (Affymetrix Mouse Gene 1.0 ST Array (Gene St) platform). This yielded 1194 transcriptomes. These data were then normalized together to obtain an expression matrix (figure 1*a*). When we visualized the samples using t-distributed stochastic neighbour embedding (t-SNE) together with samples from annotated tissues (CellNet dataset, figure 1*b*), we noted that our selected samples form a heterogeneous group where most samples cluster with annotated stem cells, many are not assigned to specific tissues and some co-cluster with tissues. Such heterogeneity is potentially important for reconstructing the network, since the data then contain sufficient variance in the expression of pluripotency-associated TFs, which in turn makes it possible to identify genes that covary with these TFs.

#### (b) Selection of methods for network reverse engineering

Most network reconstruction methods use a measure for statistical association of two variables [26] to predict links. In this work, we decided to benchmark similarity measures that score non-functional (mutual information), functional monotonic nonlinear (Spearman correlation) and linear (Pearson correlation) relations [27–29] (figure 2). In addition, we added algorithms that are based on these measures but additionally prune the network, based either on mutual information (ARACNE [15], MRNET [17], CLR [16]) or on Pearson correlation (partial correlation).

Figure 2. Classification of scores with the help of the scheme proposed in [27]. Scores are classified by their ability to detect co-expression relationships with the indicated properties, nonlinear, non-monotonic and non-functional. The pruning column indicates whether the algorithm tries to remove indirect links.

All algorithms take a *n* × *m* expression matrix as input, where *n* is the number of genes and *m* is the number of (microarray) samples. The output is an *n* × *n* matrix that contains an association score for each gene pair. The ARACNE algorithm is the only one that needs an input parameter. This parameter *τ* determines how aggressively the algorithm tries to prune indirect links. The value *τ* = 1 corresponds to no pruning, while for *τ* = 0, the weakest link between three mutually connected nodes will always be deleted. In this work, we employed *τ* = 0.15 (denoted by aracne_15) and *τ* = 0.5 (denoted by aracne_50) corresponding to the preconfigured standard value and very weak pruning, respectively. To benchmark partial correlation, we chose different implementations, namely a sparse (pcor_lasso) and a non-sparse method (pcor_pls, partial least squares) for estimating partial correlation [18].
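ARACNE's pruning is based on the data processing inequality: in every fully connected triple of genes, the weakest mutual-information edge is suspected to be indirect. The sketch below uses one mapping of *τ* to a threshold that is consistent with the description above (*τ* = 1: never prune; *τ* = 0: always drop the clearly weakest edge of a triple); the published ARACNE tolerance parameter differs in detail, and the MI matrix is toy data.

```python
# Hedged sketch of data-processing-inequality pruning on a symmetric MI
# matrix. Mapping of tau to a threshold is one parameterization consistent
# with the text, not the exact published ARACNE rule.

def dpi_prune(mi, tau):
    """Return a pruned copy of a symmetric MI matrix."""
    n = len(mi)
    pruned = [row[:] for row in mi]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k in (i, j) or mi[i][k] == 0 or mi[j][k] == 0:
                    continue
                # Drop (i, j) if it is clearly the weakest edge of triangle (i, j, k).
                if mi[i][j] < (1 - tau) * min(mi[i][k], mi[j][k]):
                    pruned[i][j] = pruned[j][i] = 0
    return pruned

# Toy triple X -> Y -> Z: the X-Z association is indirect and weakest.
mi = [[0.0, 0.9, 0.3],
      [0.9, 0.0, 0.8],
      [0.3, 0.8, 0.0]]

print(dpi_prune(mi, tau=1.0)[0][2])   # tau = 1 keeps everything
print(dpi_prune(mi, tau=0.0)[0][2])   # tau = 0 removes the indirect X-Z link
```

This is the behavior the aracne_15/aracne_50 settings trade off: smaller *τ* prunes indirect links more aggressively.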

#### (c) Gold standards for determining direct TF-gene interactions

We compiled different complementary gold standards to benchmark the network reconstructions, covering TF binding and regulation by a TF [30]. First, we used a collection of ChIP-seq datasets (Mouse ES Cell ChIP-Seq Compendium [31]) that provides information about TF binding. Second, we used three collections of perturbation experiments, where transcriptomes are measured after TF knockout [11], TF knockdown (Kd) [21] and TF overexpression [32,33] (see §2). These gold standards emphasize different aspects. While TF overexpression will unveil the targets of TFs that are not active in mESCs, TF knockdown and knockout will only unveil targets of TFs that are already expressed in mESCs. Note also that the number and identity of the TFs are different between different gold standards. Additionally, we used the manually curated PluriNetWork [22], which concentrates on the core pluripotency factors. The overlap in interactions between these gold standards is shown in electronic supplementary material, figure S1.

Each gold standard was then filtered to those genes annotated as transcription factors in the animal transcription factor database (ATFDB [19]). For details, consult §2.

#### (d) Benchmarking network prediction algorithms: pruning is important

For benchmarking the algorithms, we decided first to concentrate on the top predictions, as one of the main aims of network reconstruction is to generate a list of interesting candidates for further testing. We thus profiled the algorithms based on the area under the precision-recall curve for the top 5% recall (AUPR_{0.05}; see full precision-recall curves in electronic supplementary material, figure S2). For each algorithm, TF–target interactions were ranked based on the absolute value of the corresponding score. As the pure AUPR_{0.05} score is hard to interpret, we divided it by the AUPR_{0.05} score for random predictions. The resulting scores quantify the improvement over random predictions for individual algorithms with respect to the different gold standards and are shown in figure 3*a*. The ChIP-seq and LoF gold standards are less sensitive to differences between the algorithms, as can be seen from the relatively small performance differences of the top-performing algorithms. By contrast, in the case of the overexpression and Kd benchmarks, two and three algorithms, respectively, clearly outperform the rest.

Figure 3. (*a*) Improvement of the AUPR_{0.05} over random predictions (performance) for networks predicted by individual scores. Performance is shown for gold standards based on ChIP-seq measurements (ChIP-seq), differential gene expression after TF knockdown (Kd), TF loss-of-function experiments (LoF), differential gene expression upon TF overexpression (overexpression) and a curated literature-based network (literature). Algorithms used for the comparison are Spearman correlation (spearman), Pearson correlation (pearson), mutual information (mi), ARACNE (with cut-off parameter 0.15 and 0.5, respectively), CLR, MRNET and partial correlation in the pls (pcor_pls) and lasso (pcor_lasso) implementations. (*b*) Mean rank of the algorithms across all gold standards. (*c*) Size dependence of the log_{2} performance (AUPR_{0.05} over random). Points and error bars indicate the median and standard deviation of the performance on five randomly sampled groups of arrays (the same samples were taken for the different scores) of the indicated size.

When comparing individual algorithms, partial correlation (pls implementation) and clr_mi attain the best average rank across gold standards (figure 3*b*). For every gold standard except ChIP-seq and the literature-based one, pls-based partial correlation ranks top. The two algorithms clr_mi and ARACNE with cut-off parameter *τ* = 0.5, both based on mutual information but using additional strategies for eliminating indirect links, perform similarly to pcor_pls. Thus, all three top-performing algorithms employ a strategy for pruning indirect links. By contrast, being restricted to linear relationships does not impact performance strongly, as indicated by the fact that partial correlation is among the best-performing algorithms and that Pearson correlation and mutual information attain similar average ranks. We concluded that the reconstruction success of an algorithm is determined more by its pruning strategy than by the measure used to score co-expression patterns.
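The mean-rank summary of figure 3*b* amounts to ranking the algorithms within each gold standard and averaging the ranks across gold standards. A sketch with hypothetical improvement-over-random values (the numbers below are invented for illustration, not taken from figure 3):

```python
import numpy as np

algorithms = ["pcor_pls", "clr_mi", "aracne_50", "pearson"]
# rows: algorithms; columns: five gold standards; higher is better
perf = np.array([[3.1, 4.0, 2.5, 1.9, 2.2],
                 [3.3, 3.6, 2.4, 1.8, 2.6],
                 [2.8, 3.4, 2.2, 1.7, 2.1],
                 [2.0, 2.1, 1.9, 1.5, 1.8]])

# rank 1 = best within each gold standard
ranks = perf.shape[0] - perf.argsort(axis=0).argsort(axis=0)
mean_rank = ranks.mean(axis=1)
for name, r in sorted(zip(algorithms, mean_rank), key=lambda t: t[1]):
    print(f"{name}: mean rank {r:.2f}")
```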

Somewhat surprisingly, the two implementations for partial correlation performed very differently, with the pls implementation ranking top, although this implementation has been shown to have the highest error for estimating the partial correlation matrix on synthetic data [18].

As experimental data are costly, it is also of interest how well an algorithm performs with respect to the amount of available data. We therefore also benchmarked the two top-performing algorithms clr_mi and pcor_pls on random subsamples of the entire expression dataset (figure 3*c*). The two algorithms differ markedly in their size-dependent performance. The algorithm pcor_pls shows top performance even for the smallest subsamples (16 arrays), whereas the performance of clr_mi only saturates at approximately 500 arrays. To understand why clr_mi needs large amounts of data for top performance, we additionally computed the size dependence of the CLR algorithm based on Pearson correlation, CLR_Pearson. For this algorithm, performance quickly saturates, similar to pcor_pls. This suggests that the slowly saturating behaviour of clr_mi is due to the relatively large amount of data needed to obtain good estimates of mutual information.
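The subsampling protocol can be sketched as follows: draw several random groups of arrays at each size, score the reconstruction on each group, and report the median and spread. The expression matrix and scoring function below are toy stand-ins (two co-regulated genes among unrelated ones), not the mESC data.

```python
import numpy as np

rng = np.random.default_rng(1)

def benchmark_subsamples(expr, score_fn, sizes, n_repeats=5):
    """Score a reconstruction method on random subsamples of the arrays
    (columns); returns {size: list of scores}."""
    results = {}
    for size in sizes:
        scores = []
        for _ in range(n_repeats):
            cols = rng.choice(expr.shape[1], size=size, replace=False)
            scores.append(score_fn(expr[:, cols]))
        results[size] = scores
    return results

# toy "performance": recovered correlation between two co-regulated genes
def toy_score(sub):
    return abs(np.corrcoef(sub[0], sub[1])[0, 1])

base = rng.normal(size=(1, 600))                        # shared regulator signal
expr = np.vstack([base + rng.normal(0, 0.5, (2, 600)),  # genes 0, 1 co-regulated
                  rng.normal(size=(8, 600))])           # unrelated genes
res = benchmark_subsamples(expr, toy_score, sizes=[16, 64, 256])
print({s: round(float(np.median(v)), 2) for s, v in res.items()})
```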

We were next interested to see whether some TF targets can be predicted better than others. We therefore assessed the performance of the different algorithms in predicting targets for individual TFs present in the different gold standards, and compared the AUPR_{0.05}. The resulting improvements over random predictions are shown in figure 4. We found that performance varies considerably among TFs, although some of this variation could be due to different numbers of true targets for different TFs (in the Kd, LoF and overexpression gold standards). However, even for the ChIP-seq gold standard, where we had a fixed number of targets, there is substantial variation. For example, the log_{2} improvement over random of the clr_mi algorithm ranges from approximately 1 to more than 4.

Figure 4. (*a*–*d*) Comparison of the performance improvement over random, log_{2}(AUPR_{0.05}/random AUPR_{0.05}), for individual TFs for aracne_50, clr_mi and pcor_pls. Scatter plots of the improvement of the AUPR_{0.05} over random predictions for the clr_mi and pcor_pls algorithms for the different gold standards as indicated.

When we clustered the different algorithms by using the correlations in their performance for the TFs (electronic supplementary material, figure S3), we noted that the pcor_pls score is the only score that is both not correlated to the mutual information-derived scores and performs well with respect to the overall improvement over random. That makes this score an interesting candidate for providing TF targets that complement the predictions of other well-performing scores. This is especially obvious when comparing the performance of pcor_pls and the clr_mi algorithm for individual TFs determined on the overexpression gold standard (figure 4).

#### (e) Predicted topologies of the TF–TF network differ strongly

We next used the top-performing algorithms to predict the topology of the gene regulatory network in differentiating mESC, and asked if there is a clear hierarchy between core pluripotency factors and ancillary pluripotency factors [14]. Another interesting aspect that could be inferred from the topology is how pluripotency factors are coupled to lineage-specific factors in order to regulate the transition from the maintenance of pluripotency to the commitment to certain lineages [34].

To derive the network topology, we applied the three top-performing algorithms and took the top 0.1% of interactions across the predicted network. The predicted networks are shown together with the literature-based network in figure 5*a*. These visualizations unveil that the networks exhibit very different topologies. We next compared the topological properties of the literature network and the predicted networks by quantifying some standard measures (table 1). The degree distribution (figure 5*b*) of the literature network shows an approximately linear decrease in the log–log plot. Contrary to this, the predicted networks show a marked deviation from a linear degree distribution for the highest degrees. The linear degree distribution of the literature network up to the highest degrees is reflected by the fact that the most important nodes, Nanog and Pou5f1, are involved in 50% of all interactions. Their targets typically have no interactions among themselves, leading to a star-like structure centred on Nanog and Pou5f1 (figure 5). This fact is reflected in the low transitivity of the literature network compared with the predicted networks. A related quantity, the degree correlation coefficient, indicates whether nodes of high degree are typically connected to other nodes of high degree (positive degree correlation) or to nodes of low degree (negative degree correlation). The degree correlation coefficient of −0.47 confirms the star-like structure.
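The contrast between a star-like and a modular topology can be reproduced with standard graph measures; a small sketch using `networkx` (the two toy graphs are ours, chosen only to mimic the qualitative difference):

```python
import networkx as nx

# star-like network: one hub whose targets do not interact
star = nx.star_graph(10)

# modular network: two dense communities joined by a single gateway edge
modular = nx.disjoint_union(nx.complete_graph(5), nx.complete_graph(5))
modular.add_edge(0, 5)

for name, g in [("star", star), ("modular", modular)]:
    print(name,
          "transitivity:", round(nx.transitivity(g), 2),
          "degree correlation:", round(nx.degree_assortativity_coefficient(g), 2),
          "diameter:", nx.diameter(g))
```

The star graph has zero transitivity and a negative degree correlation (the hub connects only to low-degree nodes), while the modular graph has high transitivity and a larger diameter, matching the qualitative patterns discussed for the literature-based and clr_mi networks.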

Figure 5. (*a*) Comparison of the topologies of the predicted TF–TF networks with the literature-based network for the top-ranking algorithms. The transcription factors Pou5f1, Sox2 and Nanog are indicated by magenta dots. (*b*) Degree distribution for the TF–TF gene regulatory networks predicted by the indicated algorithms, with aracne_15 and aracne_50 denoting the ARACNE algorithm with 0.15 and 0.5 as cut-off parameter and pcor_pls denoting partial correlation in the pls implementation.

Table 1. Quantification of topological properties of the predicted networks. The networks predicted by the algorithms clr_mi, partial correlation in the pls implementation and ARACNE with 0.50 as cut-off parameter are compared with the literature network. Mean rank OSN (Oct4, Sox2, Nanog): the mean degree rank of the core triad, with low values indicating high degrees; % in largest component: the fraction of vertices contained in the largest connected component; no. connected components: the number of connected components; modularity edge betweenness: the modularity of the graph according to the edge betweenness community measure; modularity fastgreedy: the modularity of the graph according to the fastgreedy community measure; diameter: the graph diameter; transitivity: the graph transitivity.

In the network reconstructed by ARACNE, interactions are focused on a small fraction of all nodes, which form the centre of the largest connected component. The centre of this connected component is formed by Oct4, Sox2 and Nanog (OSN), among others. This is reflected in the very low mean rank of the degrees of this core triad. Similar to the literature network, the ARACNE network has a low modularity and also a relatively low diameter. It differs from the literature network in its degree correlation, which is positive. This fact points towards a hierarchy of nodes, with the highest degree nodes connected to the next highest degree nodes, and so on.

The network predicted by partial correlation has the highest fraction of nodes with small degree among the predicted networks. Also, the degree correlation is near zero, indicating that there is little substructure in the largest connected component. There are no communities that are only connected to other communities via gateway nodes, resulting in a comparably low diameter and the largest connected component of all predicted networks. The mean rank of the degree of the core triad is higher than for ARACNE, reflecting a less central place of these TFs in the network.

Finally, the network predicted by the clr_mi algorithm shows the most pronounced modular structure with groups of nodes that have high connectivity among themselves but low connectivity with outside nodes. The modular structure is reflected in the highest modularity index of all predicted networks as well as the highest transitivity and degree correlation. Though the core triad is located in a community of nodes with high connectivity, the smallness of this community leads to the highest mean rank of the degree of OSN among the predicted networks. The large diameter of the clr_mi network is also a consequence of the modular structure. Nodes from different communities can only be connected by paths traversing the few gateway nodes that connect communities.

It is known that reconstruction algorithms, in general, tend to enrich different types of motifs [26], affecting the topology of the predicted network. Some of the different topological features of the predicted networks can be explicitly traced back to the way the employed algorithm works. Networks induced by correlation are more transitive than random networks, whereas those induced by partial correlation are less transitive than random networks [35]. The ARACNE algorithm also directly influences transitivity by cutting the weakest link in each triangle unless its strength is above the tolerance parameter. Here, ARACNE shows rather high transitivity because the cut-off parameter *τ* was set to the large value of 0.5. The clr_mi algorithm does not influence the topology in an explicit way, but induces a network that exhibits a unique modular structure among the predicted networks. In comparison with these reconstructed networks, the literature network is strongly shaped by the prominent roles of Pou5f1 and Nanog and, to a lesser degree, Sox2. Thus, its structure may to some degree also be a consequence of publication bias. Taking into account the bias introduced by the different algorithms and the extreme focus of the literature network on the core triad, it seems that each reconstruction attempt emphasizes different aspects of the topology of the TF–TF network in mESCs, and none of them can be considered unbiased.

### 4. Discussion and conclusion

Understanding how development and differentiation are controlled at the molecular level depends largely on understanding the underlying gene regulatory networks [36]. Both for human and mouse ESC, a large body of published transcriptome data has accumulated from which regulatory interactions can be deduced. Network reconstruction based on co-expression measures seems to be a promising approach for deducing interactions from this data.

Here, we analysed how useful different reconstruction algorithms are for inferring the gene regulatory network underlying differentiation of embryonic stem cells. We focused on mouse, as for mESCs several datasets are available that can serve as gold standards. We benchmarked the predictions generated by different algorithms using orthogonal high-throughput experiments. We could show that the best-performing algorithms use pruning schemes that remove indirect links. One of the top-performing algorithms, partial correlation, was particularly interesting because its predictions are not strongly correlated with those of the other well-performing algorithms. A topological analysis of the TF–TF regulatory network predicted by the three top-performing algorithms highlighted markedly different topologies whose features can be traced back to the design of the respective algorithm. Thus, while evidence from high-throughput data supports the usefulness of predictions based on co-expression, the observed network structures are strongly influenced by the employed algorithm. This is one of the aspects currently limiting the usefulness of network reconstruction algorithms.

Network reconstruction has been approached with differing scope and multiple tools. Reconstruction of small scale networks is often used to integrate information about multiple perturbation experiments into a single model. This model can then be used to predict untested perturbations without tedious or costly experiments. This approach to molecular networks has proved to be fruitful in devising ways to manipulate cellular signal transduction [37]. In the field of stem cells, it can be used to understand how the wiring between signalling pathways and pluripotency-associated factors determines maintenance of the pluripotent state under different culture conditions [14,38].

Network reconstruction on a larger scale was often done with the goal of generating lists of candidate genes for follow-up experiments [8,12]. Another aspect that has been studied with large-scale networks is the overall structure of the network and its relation to selective pressures that gave rise to the observed network structure [39]. Important concepts in this context are overrepresented motifs and the frequency with which feedback is encountered in biological networks [40].

Current limits in the quality of network reconstructions, as also observed in this work, show that the goal of obtaining a comprehensive reliable representation of the regulatory network is still distant. How can progress in this direction be made? New data may significantly enhance our ability to infer networks. In particular, single cell data may open up the possibility of observing distinct states of a network that get blurred in bulk data [41–43]. This may help to resolve the temporal sequence of gene silencing and activation. Perturbations in conjunction with well-resolved time series may also yield important insights as one can separate early, presumably direct, from late, presumably indirect effects. The currently available transcription factor perturbation data are usually optimized for detecting even weakly deregulated genes [32]. As this is achieved by assessing the transcriptome days after the start of the perturbation, a large amount of differential expression is caused by indirect effects.

How can these approaches help experimental researchers interested in studying differentiation? First, reverse engineering methods are good enough to leverage the large body of available transcriptomes to identify likely TF–target interactions that can then be validated experimentally. Second, our results also show that different methodologies unveil different sets of TF–target interactions, so it is valuable to use several algorithms. However, the network reconstruction algorithms are clearly not (yet) reliable enough for their results to be taken at face value; the predictions should serve only as starting points, providing candidates for further experimental validation.

## 2 Results

### 2.1 Validity of the simulation-based evaluation framework

As shown in the framework diagram (Figure 1B), the initial biological difference between the two conditions lay in the network difference, and this difference then flowed downward through the simulated expression data to the resultant ranking scores of candidate regulators. We first attempted to ensure that it was the initial network difference that governed the resultant ranking scores, ruling out the possibility that the observed regulator ranking was attributable to technical biases. For this purpose, we performed a "contrasting" simulation experiment, which involved dataset pairs originating from two contrasting regulatory networks, and a "homogeneous" experiment, which involved dataset pairs originating from two identical regulatory networks. We compiled 40 dataset pairs for each scenario, and hence obtained 40 ranked lists for the homogeneous case and another 40 for the contrasting case. In the homogeneous network comparisons, the resultant regulator ranking should be random, and the alternative rankings should not have substantial mutual consistency. In the contrasting network comparisons, however, if the regulator ranking algorithm had sufficient discrimination power, the persistent structural difference should drive the 40 trial results towards a consensus reflective of the genuine structural divergence.

For each algorithm's multiple results out of the 40 repetitive runs, we calculated the Spearman correlation values for each of the 780 combinations formed from the 40 score lists for the contrasting experiment and the homogeneous experiment, respectively, and we comparatively showed the mean Spearman correlation values as bar plots (Figure 2). Remarkably, each algorithm demonstrated more consistent results in the contrasting case than in the homogeneous case for all or most of the surveyed key parameter values (Mann-Whitney test *P*-value < 0.01; Figure 2). While Figure 2 involved perturbation of one specific regulator, the same conclusion held for the perturbation of every regulator (data not shown). Such an increase in result consistency in the presence of biological difference implied that all existing algorithms were able to reflect, in their ranking results, regulatory difference signals introduced through regulator inactivation. Therefore, we judged that our simulation framework (Figure 1B) was valid for our purpose of evaluating the discrimination ability of algorithms with regard to the (inactivated) differential regulator(s).
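The consistency measure can be sketched as follows: with 40 score lists per scenario, all C(40, 2) = 780 pairwise Spearman correlations are computed and averaged. The synthetic "signal plus noise" score lists below illustrate the expected pattern; they are not the simulated datasets of the study.

```python
from itertools import combinations
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_runs, n_regulators = 40, 47

# homogeneous case: the 40 score lists are pure noise
homogeneous = [rng.normal(size=n_regulators) for _ in range(n_runs)]
# contrasting case: a shared signal (the genuine structural difference)
# plus noise drives the lists towards a consensus
signal = np.arange(n_regulators, dtype=float)
contrasting = [signal + rng.normal(0, 10, n_regulators) for _ in range(n_runs)]

def mean_pairwise_spearman(score_lists):
    rhos = [spearmanr(a, b)[0] for a, b in combinations(score_lists, 2)]
    return float(np.mean(rhos))   # average over C(40, 2) = 780 pairs

h = mean_pairwise_spearman(homogeneous)
c = mean_pairwise_spearman(contrasting)
print(f"homogeneous: {h:.3f}, contrasting: {c:.3f}")
```

As in Figure 2, the contrasting lists show a clearly higher mean pairwise correlation than the homogeneous lists, which hover around zero.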

Regulator rankings are more consistent in the presence of biological difference than in its absence. Seven regulator-ranking algorithms (subplot titles) were implemented to rank 47 candidate regulators based on a pair of simulated expression datasets, which were derived either from two differential regulatory networks ("contrasting") or from two identical networks ("homogeneous"). Wherever applicable, a series of the algorithms' key parameter values was investigated. With the key parameter value fixed, 40 redundant dataset pairs were simulated for repeated testing. Each subplot shows the mean and standard deviation of the Spearman correlations of the 780 pairs formed from the 40 redundant ranking lists. An asterisk (*) indicates a significant difference between contrasting result consistency and homogeneous result consistency (Mann-Whitney test *P*-value < 0.01).

While only mutual result consistency is shown in Figure 2, we can glimpse some technical characteristics of the surveyed algorithms therein. The contrasting result consistency actually indicates the algorithms' robustness against variation arising from sample choice or technical noise; in this regard, TED, TDD, and TFactS appeared more stable than the other algorithms (Figure 2). The homogeneous result consistency indicates how much an algorithm is biased towards a certain default regulator ranking, and indeed we observed a certain result consistency in the absence of differential regulation for TED, TDD, TFactS, dCSA_t2t, and dCSA_r2t (Figure 2). Coincidentally, these five algorithms all require a defined regulator-target network (Table 1), and in particular, TED and TFactS rely on statistical tests in which a regulator's out-degree plays a decisive role (eqs. (1) and (3)). We presume that many of these algorithms may be biased towards regulators with larger out-degrees, a notion further supported in the next section. Nevertheless, we noticed that, in general, smaller key parameter values were associated with lower homogeneous result consistency but higher contrasting result consistency, a pattern most evident in TED yet still discernible in TDD and TFactS (Figure 2). This observation suggests that technical biases are less severe with reasonably small key parameter values.

### 2.2 TED and TFactS outperformed other algorithms in simulation evaluation

In single-regulator inactivation tests, we had 47 sets of results, each for a particular inactivated regulator. In each set of results, PTA scores were retrieved for the seven algorithms across a range of tested key parameter values. The two sets of results for the regulators with maximum and minimum out-degrees, respectively, are shown in Figure 3A and B, while all 47 sets of results were averaged and are shown in Table 2. In the results for the most extensively regulating regulator (Figure 3A), TED and TFactS overall outperformed the other algorithms, with the highest PTA scores throughout most of the key parameter value range. Indeed, when the results for all 47 regulators were summarized, TED and TFactS turned out to be the best and second-best algorithms, respectively, as measured by accuracy (Table 2, column "PTA"). TED and TFactS also had better robustness against data variations than most other algorithms (Table 2, column "RAV"), suggesting that their results might be more stable under varied sample recruitment in real practical usage. However, TED and TFactS were sensitive to the key parameter value (Table 2, column "RAP"), which means that special attention must be paid to the fraction of interesting genes (DEGs or DCGs) in real practical applications.

(color online) Performance comparison of seven methods in the scenario of one single regulator being inactivated. A, Results for the regulator with maximum out-degree (72). B, Results for the regulator with minimum out-degree (3).

### Table 2

Evaluation results based on 47 single-regulator inactivation experiments

| Algorithm | PTA value a) | PTA rank | RAP value b) | RAP rank | RAV value c) | RAV rank |
|---|---|---|---|---|---|---|
| TED | 0.621 ± 0.171 | 1 | 0.038 ± 0.017 | 5 | 0.332 ± 0.091 | 3 |
| TDD | 0.571 ± 0.188 | 5 | 0.036 ± 0.021 | 4 | 0.440 ± 0.055 | 2 |
| TFactS | 0.616 ± 0.225 | 2 | 0.110 ± 0.048 | 7 | 0.768 ± 0.032 | 1 |
| RIF1 | 0.509 ± 0.067 | 7 | 0.036 ± 0.014 | 3 | 0.035 ± 0.033 | 7 |
| RIF2 | 0.550 ± 0.128 | 4 | 0.054 ± 0.035 | 6 | 0.106 ± 0.054 | 6 |
| dCSA_t2t | 0.515 ± 0.159 | 6 | 0 d) | 1 | 0.317 ± 0.041 | 4 |
| dCSA_r2t | 0.597 ± 0.240 | 3 | 0 d) | 1 | 0.296 ± 0.060 | 5 |

As shown in the two subplots of Figure 3, there was a major difference in accuracy between the regulator with the maximum out-degree and the regulator with the minimum out-degree. As we reasoned earlier, differential regulator prioritization accuracy may be correlated with the regulator's out-degree. When we systematically analyzed this relationship for the 47 separately inactivated regulators, we did find a significant positive correlation between PTA scores and TF out-degrees for TED, TDD, TFactS, and RIF1 (Figure 4A), implying that an extensively regulating regulator was more discoverable. Another algorithm, dCSA_r2t, demonstrated a significant negative correlation between the PTA scores and the mean in-degrees of the targets of the perturbed regulators (Figure 4B). As small in-degrees correspond to a dominant influence of the single regulator on its targets, this indicates that an exclusively regulating regulator may likewise be readily discovered.
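The degree-accuracy relationship reported here is a plain Pearson correlation between per-regulator PTA scores and regulator out-degrees. A sketch with invented numbers (47 regulators with out-degrees between 3 and 72, as in the study; the PTA values themselves are synthetic):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)

# 47 regulators with out-degrees in the range reported in the text
out_degree = rng.integers(3, 73, size=47)
# synthetic PTA scores that rise mildly with out-degree, plus noise
pta = 0.4 + 0.004 * out_degree + rng.normal(0, 0.05, size=47)

r, p = pearsonr(out_degree, pta)
print(f"Pearson r = {r:.2f}, P = {p:.2e}")
```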

(color online) Correlation between the discrimination accuracy of 47 separately inactivated regulators and their regulatory characteristics. A, A significant positive Pearson correlation (*P*-value < 0.01) was observed between PTA (priority of true answer) scores and out-degrees of the regulators for four algorithms (TED, TDD, TFactS, and RIF1). B, A significant negative Pearson correlation (*P*-value < 0.01) was observed between PTA scores and the mean in-degree of each regulator's targets for the algorithm dCSA_r2t.

With the PTA scores of the four algorithms for each single regulator plotted in Figure 4A, we could compare the accuracies of the algorithms at a glance. It was evident that the upper portion of the plot, characterized by higher PTA scores, was dominated by TED and TFactS. Examining the points on each vertical line allows a degree-specific comparison of algorithm accuracies, and the results may suggest an advantage of TED over TFactS for differential regulators with higher out-degrees (Figure 4A).

Lastly, we examined the discrimination accuracies when multiple regulators were inactivated (Table 3). In total, we designed 11 simulation cases, of which the first eight (Table 3, cases A1–A8) shared a common baseline regulatory network, and the next three had separately selected baseline networks (Table 3, cases B, C, and D). On average, TED and TFactS were ranked the best and the second-best of the seven algorithms, respectively. However, there was considerable variation in the AUC values between cases, in particular between different baseline networks (Table 3). The AUC value did not appear to correlate with the fraction of differential regulators, as the toughest case for most algorithms (case B, Table 3) happened to have the smallest fraction of differential regulators. We found earlier that the out-degrees of individual regulators, and at times the in-degrees of the targets, were important factors affecting the discrimination accuracy of the algorithms. Thus, when multiple regulators were simultaneously inactivated, the scenario became much more complex. More rigorous tests are warranted to elucidate the mechanisms underlying multi-regulator inactivation scenarios.

### Table 3

Prioritization accuracies in multi-regulator inactivation experiments

| Case | # nodes | # edges | # regulators | DR fraction a) | AUC b): TED | TDD | TFactS | RIF1 | RIF2 | dCSA (t2t) | dCSA (r2t) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 1000 | 2309 | 103 | 30% | 0.69 | 0.61 | 0.60 | 0.62 | 0.48 | 0.53 | 0.57 |
| A2 | 1000 | 2309 | 103 | 29% | 0.70 | 0.67 | 0.62 | 0.51 | 0.67 | 0.51 | 0.61 |
| A3 | 1000 | 2309 | 103 | 28% | 0.69 | 0.58 | 0.64 | 0.62 | 0.43 | 0.48 | 0.61 |
| A4 | 1000 | 2309 | 103 | 27% | 0.77 | 0.60 | 0.60 | 0.48 | 0.68 | 0.51 | 0.54 |
| A5 | 1000 | 2309 | 103 | 26% | 0.76 | 0.62 | 0.63 | 0.42 | 0.51 | 0.58 | 0.49 |
| A6 | 1000 | 2309 | 103 | 24% | 0.66 | 0.64 | 0.58 | 0.69 | 0.77 | 0.59 | 0.53 |
| A7 | 1000 | 2309 | 103 | 23% | 0.69 | 0.58 | 0.55 | 0.63 | 0.68 | 0.61 | 0.52 |
| A8 | 1000 | 2309 | 103 | 18% | 0.63 | 0.56 | 0.55 | 0.68 | 0.65 | 0.54 | 0.62 |
| B | 1000 | 2293 | 95 | 16% | 0.50 | 0.37 | 0.47 | 0.44 | 0.38 | 0.42 | 0.78 |
| C | 1000 | 2322 | 105 | 49% | 0.65 | 0.71 | 0.54 | 0.33 | 0.63 | 0.48 | 0.50 |
| D | 1000 | 2301 | 98 | 20% | 0.63 | 0.55 | 0.69 | 0.54 | 0.62 | 0.49 | 0.47 |

Cases A1–A8 share a common baseline network, so their # nodes, # edges and # regulators are identical.

### 2.3 TED and TFactS replicated multiple differential regulators in two lung cancer datasets

Since TED and TFactS were found to be the most accurate algorithms in the above simulation evaluations, we extended the evaluation of these two algorithms using two real lung cancer expression datasets. The ranked TF lists output by TED and TFactS, respectively, were limited to a score threshold of 1.3, corresponding to a nominal *P*-value of 0.05. Depending on the network and the algorithm choice, anywhere from a couple to tens of differential regulators were retrieved (Table 4). In general, more differential regulators were associated with dataset Lung-I (with more samples) than with dataset Lung-II, and more were associated with the TRANSFAC-A network (with more regulatory relationships) than with the TRANSFAC-B network. A minor violation of this general pattern was found when TED was run on dataset Lung-II with the TRANSFAC-B network: this combination yielded 10 differential regulators, slightly greater than the number obtained with the larger network (9) or with the larger dataset (8).

### Table 4

Summary of differential regulators identified from two lung cancer datasets

| Network | Dataset | TED | TFactS | TED+TFactS |
|---|---|---|---|---|
| TRANSFAC-A | Lung-I | 21 | 63 | 3 |
| TRANSFAC-A | Lung-II | 9 | 30 | 3 |
| TRANSFAC-A | Replicated | 0 | 16 | 0 |
| TRANSFAC-B | Lung-I | 7 | 7 | 0 |
| TRANSFAC-B | Lung-II | 10 | 2 | 0 |
| TRANSFAC-B | Replicated | 2 | 1 | 0 |

We first compared TED and TFactS in terms of the number of prioritized regulators. As shown in Table 4, TFactS identified more differential regulators than TED with the larger network TRANSFAC-A (63 vs. 21, or 30 vs. 9), but equal or fewer with the smaller network TRANSFAC-B (7 vs. 7, or 2 vs. 10). Then, we checked the replication of each algorithm's results from dataset Lung-I to dataset Lung-II. Using the larger network TRANSFAC-A, none of TED's initial 21 regulators were reproduced, while 16 (25.4%) of TFactS's initial 63 regulators were. Using the smaller network TRANSFAC-B, two (28.6%) of TED's initial seven regulators were reproduced, and one (14.3%) of TFactS's initial seven regulators (Table 4). Given these two layers of comparative results, we might speculate that TFactS works better in a larger-scale, denser regulatory network, while TED is comparable to TFactS in a smaller-scale, sparser regulatory network. However, owing to the limited number of datasets, this comparative conclusion may not generalize to future cases. Of note, genes contained in dataset Lung-I were more discriminable from the DE perspective than from the DCE perspective, as genes with borderline DE features were not included (see details in [18]); accordingly, dataset Lung-II was also biased towards the DE feature. Indeed, from Lung-I to Lung-II, we observed significant consistency in the DEG/non-DEG classification (Fisher's exact test, *P*-value < 2.2 × 10^{−6}), but no significant consistency in the DCG/non-DCG classification. Although these two datasets were apparently favorable to TFactS, TED still showed comparable performance under the TRANSFAC-B network. TED may therefore show even better performance in real applications involving unbiased sets of genes.

Regardless of algorithm choice, quite a few differential TFs were reproduced from Lung-I to Lung-II. A total of 19 repeatedly identified regulators are listed in Table 5 as a reference for other researchers. Of these 19 TFs, five (ARID5B, IRF1, MAX, SPI1, and TCF3) were covered in our two expression data matrices. These five TFs generally had medium to high expression levels in the Lung-I dataset relative to the total genes, but some showed a dramatic decrease in expression level in the other dataset, Lung-II. Two TFs were considered DCGs in dataset Lung-I but not in Lung-II; three TFs were considered DEGs in dataset Lung-I, and two of them (TCF3 and SPI1) were again differentially expressed in dataset Lung-II. Based on these observations of specific cases, we might infer that differential regulators do not necessarily demonstrate remarkable and stable expression features on their own. The algorithms could discern their importance by analyzing the systematic expression changes among their target genes.

### Table 5

Differential TFs identified from both lung cancer datasets by TFactS or TED

Algorithm/network | TF | Lung-I score | Lung-I rank | Lung-II score | Lung-II rank
---|---|---|---|---|---
TFactS/TRANSFAC-A | GTF2I | 3.8 | 8 | 2.2 | 2
 | IRF1 | 3.4 | 15 | 1.7 | 16
 | RBPJ | 2.8 | 19 | 1.4 | 26
 | GLI1 | 2.5 | 23 | 2.1 | 3
 | NKX2-2 | 2.2 | 25 | 1.3 | 30
 | ZIC1 | 1.9 | 35 | 1.5 | 24
 | ZIC3 | 1.9 | 35 | 1.5 | 24
 | MYOD1 | 1.8 | 39 | 1.8 | 13
 | NR4A2 | 1.7 | 40 | 2.1 | 1
 | ASCL1 | 1.7 | 44 | 1.8 | 7
 | MYF5 | 1.7 | 44 | 1.8 | 7
 | MYF6 | 1.7 | 44 | 1.8 | 7
 | TCF4 | 1.7 | 44 | 1.8 | 7
 | ARID5B | 1.6 | 47 | 1.9 | 5
 | TCF3 | 1.6 | 51 | 1.8 | 14
 | HNF1B | 1.6 | 53 | 1.8 | 15
TFactS/TRANSFAC-B | SPI1 | 2.1 | 3 | 2.0 | 1
TED/TRANSFAC-B | MAX | 1.8 | 1.5 | 1.3 | 9.5
 | E2F1 | 1.4 | 7 | 1.4 | 8

We found that SPI1 was detected as a reproduced differential regulator by TFactS ( Table 5 ). The oncogenic TF SPI1 reportedly accelerates DNA replication and promotes genetic instability in the absence of DNA breakage in leukemia [24]. However, reports on the role of SPI1 in lung cancer development are rare. TED identified two TFs (MAX and E2F1) as reproducible in two independent lung cancer cohorts ( Table 5 ). Intriguingly, MAX inactivation in lung cancer disrupts the MYC-SWI/SNF program, and an aberrant MYC-SWI/SNF network is essential for lung cancer development [25]. Another TF, E2F1, is required for GCN5 (a lysine acetyltransferase that generally regulates gene expression) to mediate lung cancer cell growth and promote the proliferation of a lung cancer cell line [26]. This additional evidence from the literature indicates that the repeatedly identified differential regulators are likely to play causal roles in lung cancer development.

Beyond the three TFs (SPI1, MAX, and E2F1) discussed above, some other TFs in Table 5 may also be worth noting for follow-up investigation. According to the cancer gene compendium NCG v4.0 [27], *GLI1*, *ZIC3*, *TCF3*, and *HNF1B* are either known or candidate cancer genes, but existing studies have not yet linked them to lung cancer. Four TFs repeatedly identified by TFactS (GTF2I, GLI1, ZIC1, and ZIC3) were also supported by TED in either the Lung-I or Lung-II dataset. These highlighted TFs likely have greater pathogenic potential in the development of lung cancer.

## 5 DISCUSSION

In this article, we compared the performance of four different network inference algorithms: ARACNE, CLR, MRNET and RN. For this comparison, we used several local network-based measures in combination with ensemble simulations, allowing a detailed analysis down to the level of individual edges, which is the highest resolution achievable. The main purpose of our investigation was not only to reveal the general performance of these methods with respect to the studied novel measures but also to gain insight into possible biases of these methods. For example, our finding that repressor edges are easier to infer than activator edges for all four algorithms means that the regulatory networks inferred by these methods systematically discriminate against activating interactions. Hence, a biological interpretation of the inferred networks should take this bias into account to avoid spurious conclusions that are in fact induced by the working mechanism of the employed method. We found similar results for network motifs consisting of three genes; for these measures, too, all four algorithms behaved largely the same. This is different for the measure *D*_{s}: only ARACNE and CLR showed a significant dependence on *D*_{s}.

Applying our simulation results for ARACNE allowed us to make a prediction about the expected number of regulatory interactions in human B cells. Extending this discussion, we can also formulate a hypothesis about direct interaction partners of Myc as presented in Basso *et al.* (2005). Based on our results presented in Table 2, we hypothesize that these targets are likely to form leaf nodes in the underlying regulatory network. This means that many of these targets may interact only with Myc and no other gene products. This would make these genes peripheral in this network with respect to information processing, because they are, so to speak, one-way streets. More interestingly, because they connect directly with Myc, this transcription factor may also not form a central component of the information processing, since it is generally assumed that gene networks are organized hierarchically. Hence, either regulatory networks are hierarchically organized, in which case the closeness of Myc to leaf genes suggests its decentral character, or Myc is central, suggesting either a non-hierarchical organization of the network or the existence of genes that behave non-hierarchically in an otherwise hierarchically organized system. If the latter is true, this might be an indicator of a network characteristic that has so far remained hidden.

A further prediction we can make relates to the direction of interactions. Again, based on our results in Table 2, we predict that edges should be oriented toward leaf genes, making, e.g., Myc the source of outgoing edges. Because Myc is a transcription factor, this appears compelling. However, we want to emphasize that the methods employed in this article are not familiar with the semantics of *transcription factor*, so such a prediction is not straightforward for a theoretical method.

At least two suggestions for the experimental design of regulatory network inference from expression data (Margolin and Califano, 2007) follow from our results. First, despite the fact that the four inference methods we studied were introduced separately, we demonstrated, using motif statistics ( Table 1), that they behave quantitatively similarly. A possible explanation is that these methods are all based on *bivariate* estimates of MI and do not consider higher orders thereof. For this reason, it seems sensible to study multivariate information measures in combination with our measures, to improve the inference of certain regulatory combinations that current methods systematically discriminate against. Although the extension to motifs involving more than three genes is straightforward, interpreting such results to guide the design of *n*-variate information measures may be intricate. For this reason, we suggest focusing first on three-gene motifs and corresponding information measures. Second, it would be interesting to study differences between observational and interventional data with respect to, e.g., motif statistics. Specifically, it would be beneficial for future experiments to understand which *parts* of the regulatory network can or cannot be inferred well from observational data alone, or from interventional data, in order to identify an optimal combination of both data types balancing performance and economic constraints. Hence, our study may complement recent results from DREAM (Stolovitzky and Califano, 2007; Stolovitzky *et al.*, 2009) by introducing a novel network-based perspective that may not only help to evaluate methods but also guide the design of novel statistical estimators.
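Since all four methods share the same bivariate-MI core, the shared principle can be illustrated with a minimal relevance-network (RN-style) sketch. This is an illustrative toy, not any of the estimators benchmarked here: the histogram-based MI estimator, the threshold value, and the function names are all assumptions made for the example.

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y, bins=8):
    """Histogram-based bivariate MI estimate (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def relevance_network(expr, threshold=0.1, bins=8):
    """Connect every gene pair whose pairwise MI exceeds a fixed threshold."""
    edges = []
    for i, j in combinations(range(expr.shape[0]), 2):
        if mutual_information(expr[i], expr[j], bins=bins) > threshold:
            edges.append((i, j))
    return edges

# Demo on synthetic expression data: gene 1 tracks gene 0, gene 2 is independent
rng = np.random.default_rng(0)
g0 = rng.normal(size=2000)
expr = np.vstack([g0, g0 + 0.1 * rng.normal(size=2000), rng.normal(size=2000)])
print(relevance_network(expr))
```

Because the estimate is strictly pairwise, higher-order regulatory combinations are invisible to it, which is exactly the limitation the multivariate extension discussed above would address.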

This discussion underlines the importance of sound simulation studies for obtaining meaningful interpretations of inferred networks. Also, as demonstrated by our discussion of Myc and human B cells, such studies enable practical predictions and intriguing hypotheses about the intricate working mechanisms of biological pathways and their underlying information processing.

## Modularity score comparison between different algorithms in network analysis - Biology

I used different algorithms for splitting my large gene network into different sub-networks using Cytoscape. I compared the modularity scores and decided that the algorithm with the best modularity score will be used for clustering the network.

My colleague came up with the argument that modularity scores can only be compared for different number of clusters being created by the same algorithm.

So, my question is whether modularity scores can be used to compare different algorithms, or only different numbers of clusters produced by the same algorithm.

The modularity score of a graph is the sum, over all clusters, of the number of edges within the cluster minus the number of edges expected there by chance. There are a few different ways of computing it (essentially due to how one defines the edges expected by chance), but once a given definition of modularity is chosen, the modularity score only depends on the choice of clusters in the graph. What this means is: choose a definition of modularity, then compute it for every partition of the graph you're interested in (possibly produced by different clustering algorithms). Note, however, that there are already algorithms that directly search for the clusters that maximize modularity, so in principle, clustering algorithms that optimize another objective function are not expected to give the best modularity (unless, of course, they end up with the same partitioning of the graph).
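In code terms, the point is that a single modularity definition can score partitions from any algorithm. A minimal sketch with networkx (an illustrative choice of library; the same idea applies to Cytoscape's clustering apps):

```python
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    label_propagation_communities,
    modularity,
)

G = nx.karate_club_graph()

# Partitions produced by two *different* community detection algorithms
part_greedy = list(greedy_modularity_communities(G))
part_lpa = list(label_propagation_communities(G))

# One fixed modularity definition applied to both partitions,
# which makes the two scores directly comparable.
q_greedy = modularity(G, part_greedy)
q_lpa = modularity(G, part_lpa)
print(f"greedy: {q_greedy:.3f}  label propagation: {q_lpa:.3f}")
```

Note that label propagation does not optimize modularity, so, as stated above, it will generally not beat a dedicated modularity maximizer on this particular score.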


## Introduction

Identification of disease-associated genes is an important step toward enhancing our understanding of the cellular mechanisms that drive human diseases, with profound applications in modeling, diagnosis, prognosis, and therapeutic intervention [1]. Genome-wide linkage and association studies (GWAS) in healthy and affected populations identify chromosomal regions potentially containing hundreds of candidate genes possibly associated with genetic diseases [2]. Investigation of these candidates using experimental methods is an expensive task, thus not always a feasible option. Consequently, computational methods play an important role in prioritization and identification of the most likely disease-associated genes by utilizing a variety of data sources such as gene expression [3, 4], functional annotations [4–7], and protein-protein interactions (PPIs) [3, 8–14]. The scope of methods that rely on functional annotations is limited because only a small fraction of genes in the genome are currently annotated.

In recent years, several algorithms have been proposed to incorporate topological properties of PPI networks in understanding genetic diseases [3, 8, 13]. These algorithms mostly focus on prioritization of candidate genes and mainly exploit the notion that the products of genes associated with similar diseases have a higher chance of being connected in the network of PPIs. However, an important challenge for these applications is the incomplete and noisy nature of PPI data [15]. Missing interactions and false positives affect the accuracy of methods based on local network information, such as direct interactions and shortest distances. A few global methods based on simulating information flow in the network (*e.g*., random walks [8, 13] or network propagation [14]) get around this problem to a certain extent by considering multiple alternate paths and the whole topology of PPI networks. Nevertheless, as we demonstrate in this paper, these methods favor genes whose products are highly connected in the network and perform poorly in identifying loosely connected disease genes.
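As a sketch of the information-flow idea, here is a generic random walk with restart, not the exact implementation evaluated in this study; the function and parameter names are illustrative:

```python
import numpy as np

def random_walk_with_restart(adj, seed_idx, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 on a column-normalized adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0            # guard isolated nodes
    W = adj / col_sums                       # column-stochastic transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seed_idx)] = 1.0 / len(seed_idx)  # restart vector over known disease genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p_next

# Toy path network 0-1-2-3 seeded at node 0: scores decay with distance from the seed
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = random_walk_with_restart(A, seed_idx=[0])
print(np.round(scores, 3))
```

In real PPI networks, such scores tend to concentrate on hub proteins regardless of the seeds, which is exactly the degree bias that the statistical adjustments proposed in this study aim to correct.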

In this study, we propose novel statistical adjustment methods to correct for degree bias in information flow based disease gene prioritization. These methods aim to assess the statistical significance of the network connectivity of a candidate gene to known disease genes. For this purpose, we use three reference models that take into account the degree distribution of the PPI network: (*i*) reference model based on degree distribution of known disease gene products, (*ii*) reference model based on the degree of candidate gene products, and (*iii*) likelihood ratio test using eigenvector centrality as the reference model.

We present comprehensive experimental results demonstrating that the proposed statistical adjustment methods are very effective in detecting loosely connected disease genes, which are generally less studied and thus potentially more interesting in terms of generating novel biological knowledge. However, we observe that these methods might perform less favorably in identifying highly connected disease genes. Consequently, we develop three uniform prioritization methods that effectively integrate existing algorithms with the proposed statistical adjustment methods, with a view to delivering high accuracy irrespective of the network connectivity of target disease genes. These methods choose the measure used to rank candidate genes (raw scores vs. statistically adjusted scores) based on several criteria that take into account the network degree of candidates. The Results section presents comprehensive experiments showing that the resulting prioritization methods, implemented in Matlab as a suite called DADA, outperform existing approaches in identifying disease-associated genes.

## Conclusion

In this paper, we propose a phylogenetic framework for analyzing modularity in protein-protein interaction networks. Our approach is motivated by the premise that biomolecular interactions and their modularity are likely to provide direct functional information on the evolution of biological systems. We also develop a method based on the simulation of network evolution to evaluate phylogenetic tree reconstruction methods. Comprehensive experimental results on simulated, as well as real data show that our algorithm is highly successful in reconstructing the underlying phylogenies based on PPI networks, is quite robust to noise, and performs significantly better than existing network-based phylogeny reconstruction algorithms on available protein-protein interaction data. These results demonstrate the promise of modularity-based approaches in comparative network analysis and motivate the study of the evolution of network modularity within a phylogenetic framework.


Theodosiou et al. BMC Res Notes (2017) 10:278 DOI 10.1186/s13104-017-2607-8

BMC Research Notes Open Access

NAP: The Network Analysis Profiler, a web tool for easier topological analysis and comparison of medium‑scale biological networks Theodosios Theodosiou1†, Georgios Efstathiou1†, Nikolas Papanikolaou1, Nikos C. Kyrpides2, Pantelis G. Bagos3, Ioannis Iliopoulos1* and Georgios A. Pavlopoulos1,2*

Abstract Objective: Nowadays, due to the technological advances of high-throughput techniques, Systems Biology has seen a tremendous growth of data generation. With network analysis, looking at biological systems at a higher level in order to better understand a system, its topology and the relationships between its components is of great importance. Gene expression, signal transduction, protein/chemical interactions and biomedical literature co-occurrences are a few of the examples captured in biological network representations, where nodes represent certain bioentities and edges represent the connections between them. Today, many tools for network visualization and analysis are available. Nevertheless, most of them are standalone applications that often (i) burden users with computing and calculation time depending on the network's size and (ii) focus on handling, editing and exploring a network interactively. While such functionality is of great importance, limited efforts have been made towards the comparison of the topological analysis of multiple networks. Results: The Network Analysis Profiler (NAP) is a comprehensive web tool to automate network profiling and intra/inter-network topology comparison. It is designed to bridge the gap between network analysis, statistics, graph theory and, partially, visualization in a user-friendly way. It is freely available and aims to become a very appealing tool for the broader community. It hosts a great plethora of topological analysis methods, such as node and edge rankings. A few of its powerful characteristics are: its ability to enable easy profile comparisons across multiple networks, find their intersection, and provide users with simplified, high-quality plots of any of the offered topological characteristics against any other within the same network. It is written in R and Shiny, is based on the igraph library, and is able to handle medium-scale weighted/unweighted, directed/undirected and bipartite graphs. NAP is available at http://bioinformatics.med.uoc.gr/NAP. Keywords: Network biology, Network topology, Node and edge ranking, Centralities, Network comparison

*Correspondence: [email protected]; [email protected]. †Theodosios Theodosiou and Georgios Efstathiou contributed equally to this work. 1 Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete Medical School, 70013 Heraklion, Crete, Greece. 2 Joint Genome Institute, Lawrence Berkeley Lab, United States Department of Energy, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA. © The Author(s) 2017, published under the Creative Commons Attribution 4.0 International License.


Introduction

Metabolic reactions, signal transduction, gene expression, gene regulation, protein interactions and other biological concepts are often captured in network representations showing individual bioentities as nodes and their interconnections as edges. Each network is characterized by a different topology. In small-world networks, for example, any node in the graph can be reached from any other node in a small number of steps. In scale-free networks, highly connected nodes can be identified as hubs. Networks with densely connected neighborhoods have a high clustering coefficient and tend to form clusters. In social networks, robustness is sensitive to edges with high betweenness centrality, which are necessary to bridge distant communities. Protein–protein interaction networks (PPIs) are captured as undirected connected graphs following a scale-free topology with hierarchical modularity [1, 2]. While existing visualization tools often include topological network analysis [3–6], only a few of them focus purely on topological analysis, comparison and edge/node ranking. Cytoscape's [7] Network Analyzer [8], as well as Gephi [9], offer similar functionality but do not support direct comparison between topological features of multiple networks. ZoomOut [10] and Network Analysis Toolkit (NEAT) [11], on the other hand, are mostly focused on graph clustering. Stanford Network Analysis Platform (SNAP) [12] and igraph [13] offer a wide spectrum of functions and modules related to topological analysis but are offered as command-line libraries, making them less accessible to non-experts. To overcome these barriers, we offer NAP, a modest web application dedicated to making network topological analysis and inter/intra-network topological comparison simpler and more appealing to the broader community.

Main text

The GUI

NAP comes with a self-explanatory web interface, organized in several tabs.

Upload file tab

It is dedicated to file uploading and network naming (Fig. 1a). Once one or more networks have been uploaded, three sub-tabs will appear. In the first sub-tab, users can see the network as a binary list in the form of searchable tables (Fig. 1b), in the second sub-tab a static visualization of the network, and in the third sub-tab an interactive network visualization (Fig. 1c).

Topology tab

The second tab is dedicated to network topological analysis. Once one or more networks are loaded, users can interactively choose between several topological features.

While here users can explore one network at a time, in a second sub-tab users can automatically generate an inter-network topological analysis plot in order to directly compare one or more networks. Examples of these cases are depicted in Fig. 1d, e.

Ranking tab

This part is dedicated to node and edge ranking. Users can interactively choose between several node and edge topological features and sort the relevant nodes/edges accordingly. Moreover, users can plot the distribution of any topological feature of a network against any other and visualize it in a matrix-like plot. Examples are presented in Fig. 1f, g.

Clustering tab

This tab is dedicated to network clustering. While NAP is not intended to be a network clustering application, MCL Markov Clustering is incorporated [14]. This way, users can cluster medium-sized networks (Fig. 1h).

Intersection

This tab is dedicated to calculating the intersection between any pair of selected networks. Results are shown as Venn diagrams, and an export function to download the intersecting network is offered (Fig. 1i).

Input file

NAP supports loading of multiple weighted/unweighted, directed/undirected and bipartite graphs. Each network can be loaded as a two-column binary list of connections in a tab-delimited text file. After uploading, users must manually give a name and define the type of each network. In addition, random networks of various sizes (100, 1000, 10,000 nodes) and types (Barabási–Albert, Erdos–Renyi, Watts–Strogatz small-world and bipartite graphs) can be automatically generated and used as examples. Notably, NAP currently accepts networks of up to 50,000 edges. For this article, we used two yeast protein–protein interaction (PPI) networks: Gavin 2006 [15] and Gavin 2002 [16], the first consisting of 6531 edges and 1430 vertices and the second consisting of 3210 edges and 1352 vertices. For the first dataset, large-scale tandem affinity purification and mass spectrometry were used to characterize multiprotein complexes in Saccharomyces cerevisiae, whereas the second dataset represents the first genome-wide screen of complexes in yeast.

Basic visualization

Nodes and edges can be presented as dynamic, easy-to-filter, Excel-like tables, as well as static and dynamic 2D network visualizations. Tables are sortable by name and searchable using simple substring matching.
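The two-column tab-delimited input format described above is the same simple edge list most graph libraries read, so a NAP input file can also be loaded programmatically. A sketch with networkx (an illustrative substitute for NAP's igraph backend; the filename and gene IDs are made up for the example):

```python
import networkx as nx

# Write a NAP-style input file: one tab-separated edge per line
with open("yeast_ppi.tsv", "w") as fh:
    fh.write("YPR055W\tYJL011C\n")
    fh.write("YCR057C\tYJL011C\n")

G = nx.read_edgelist("yeast_ppi.tsv", delimiter="\t")
print(G.number_of_nodes(), G.number_of_edges())  # 3 2
```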


Fig. 1 NAP’s web interface. a Users can upload several networks in the form of a list (pairwise connections) and subsequently name them. Users can also generate graphs of various sizes (50, 100, 1000, 10,000) based on the Barabási–Albert, Erdos–Renyi or Watts–Strogatz small-world model. Additionally, users can generate bipartite graphs of various sizes. b Network contents in the form of searchable and sortable tables. c-left Static network visualization. c-right Interactive Cytoscape.js network visualization. d Selection of topological features and their values. e Inter-network comparisons of topological features. f Node/edge ranking in the view of searchable tables. g Intra-network topological feature comparison in the form of a matrix. h Implementation of MCL clustering algorithm. i Intersection of any two chosen networks


While NAP is not designed to be a visualization tool, its 2D static network visualization comes with a plethora of traditional layout algorithms (Random, Circle, Sphere, Fruchterman–Reingold, Reingold–Tilford, Kamada–Kawai, Grid, Lgl and SVD). After a layout has completed, nodes and their coordinates, along with their connections, can be exported as simple text files and imported into other, more advanced visualization tools [3–6].

NAP utilizes CytoscapeWeb/Cytoscape.js [17, 18] to additionally provide a dynamic network visualization. Users can interactively zoom in/out, relocate and select nodes, and choose between various edge/node colors and shapes and among standard graph layouts. We chose to provide both static and dynamic visualization at a basic level so that the user can get an at-a-glance view of the loaded network. Notably, NAP's visualization cannot scale very well due to browser limitations but is fair for middle-sized networks. For higher quality visualization, graph editing, manipulation and interactive network exploration, users are encouraged to use other available tools such as Cytoscape and Gephi. The input file format for NAP, Cytoscape and Gephi is the same (2-column tab-delimited file).

NAP is able to calculate several topological features, taken from the igraph library, for a selected network. While igraph's manual pages provide more detailed information about the calculations, most formulas and definitions are also explained in [19]. Table 1 summarizes a simplified explanation of NAP's aforementioned metrics.

Table 1 NAP's supported topological features and their explanation

Topological feature | Explanation
---|---
Number of edges | Shows the number of edges in the network. Moderate networks of several thousand connections are very acceptable
Number of nodes | Shows the number of nodes in the network. There is no limitation on the number of nodes
Diameter | Shows the length of the longest geodesic. The diameter is calculated using a breadth-first-search-like method. The graph-theoretic or geodesic distance between two points is defined as the length of the shortest path between them
Radius | The eccentricity of a vertex is its shortest path distance from the farthest other node in the graph. The smallest eccentricity in a graph is called its radius. The eccentricity of a vertex is calculated by measuring the shortest distance from (or to) the vertex, to (or from) all vertices in the graph, and taking the maximum
Density | The density of a graph is the ratio of the number of edges to the number of possible edges
 | Shows the number of edges in the network. If the network has more than 10,000 edges, it will take into account the first 10,000
Average shortest path length | The average number of steps needed to go from a node to any other
Clustering coefficient | A metric showing whether the network has the tendency to form clusters
Modularity | Calculates how modular a given division of a graph into subgraphs is
Number of self-loops | How many nodes are connected to themselves
Eccentricity | The eccentricity of a vertex is its shortest path distance from the farthest other node in the graph
Average eigenvector centrality | A measure of the influence of a node in a network
Assortativity degree | The assortativity coefficient is positive if similar vertices (based on some external property) tend to connect to each other, and negative otherwise
Is directed acyclic graph | Returns True (1) or False (0)
Is directed | Returns True (1) or False (0) depending on whether the edges are directed or not
Is bipartite | Returns True (1) or False (0) depending on whether the graph is bipartite or not
Is chordal | Returns True (1) or False (0). A graph is chordal (or triangulated) if each of its cycles of four or more nodes has a chord, which is an edge joining two nodes that are not adjacent in the cycle. An equivalent definition is that any chordless cycles have at most three nodes
Average number of neighbors | How many neighbors each node of the network has on average
Betweenness centrality | An indicator of a node's centrality in a network, equal to the number of shortest paths from all vertices to all others that pass through that node. Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes
Closeness centrality | Measures the speed with which randomly walking messages reach a vertex from elsewhere in the graph
Degree | Defined as the number of links incident upon a node
Minimum st-cut | Calculates the minimum st-cut between two vertices in a graph. The minimum st-cut between source and target is the minimum total weight of edges that need to be removed to eliminate all paths from source to target
Motifs (size 3) | Uses igraph to search the graph for motifs of size 3
Motifs (size 4) | Uses igraph to search the graph for motifs of size 4
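Global features of the kind listed above can be reproduced with any standard graph library. NAP itself relies on igraph; the sketch below uses networkx instead, as an illustrative substitute, on a random example graph:

```python
import networkx as nx

G = nx.erdos_renyi_graph(60, 0.1, seed=42)

summary = {
    "number of nodes": G.number_of_nodes(),
    "number of edges": G.number_of_edges(),
    "density": nx.density(G),
    "clustering coefficient": nx.average_clustering(G),
    "is directed": int(G.is_directed()),
}
# Distance-based features are only defined on connected graphs
if nx.is_connected(G):
    summary["diameter"] = nx.diameter(G)
    summary["radius"] = nx.radius(G)
    summary["average shortest path"] = nx.average_shortest_path_length(G)

for name, value in summary.items():
    print(f"{name}: {value}")
```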


Fig. 2 Direct comparison of the topological features of two yeast protein–protein interaction datasets. a Gavin 2002 dataset [16] consists of 3210 edges and 1352 vertices, whereas Gavin 2006 [15] consists of 6531 edges and 1430 vertices. b Comparison of the networks’ clustering coefficient, density, closeness, betweenness and degree


Inter‑network topological feature comparison

Selected topological features of a single network can be visualized as a multi-column bar chart. This way, a user can, for example, see the average closeness centrality, the average clustering coefficient and the average shortest path length of the whole graph as numerical values or as a bar chart. Notably, the chart is dynamic and is automatically updated upon selection of a set of features. When users want to directly compare one or more networks, a combined bar chart with adjusted colors indicating the selected networks can capture the average topological features of all selected networks next to each other. For example, a direct comparison of the aforementioned yeast protein–protein interaction datasets is presented in Fig. 2. While the two networks vary significantly in the number of edges, as shown in Fig. 2a, and despite the fact that they have similar density, they have significantly different clustering coefficients, as shown in Fig. 2b. Thus, the Gavin 2006 dataset tends to form tighter clusters than Gavin 2002.

Intra-network topological feature comparison

Users can select one network at a time and see the distribution of each topological metric. Figure 3a, b, for example, show the degree distributions of the Gavin 2002 and Gavin 2006 PPI networks, respectively.

In addition, users can generate a distribution plot showing any topological feature against any other within a selected network. A high-resolution 2D scatterplot is generated on the fly, displaying the distribution of a chosen topological parameter in a histogram-like view. Should the user desire to explore more than one topological parameter at a time, NAP can generate on-the-fly advanced plots by pairwise comparing any topological feature of a network against any other feature within the same network. This matrix-like plot, showing pairwise correlations of any combination of the selected topological features, is not limited in the number of features to be plotted. The upper triangular part of the plot shows the numerical correlation between each pair of topological features, whereas the lower triangular part shows the scatterplot of one feature against another. The diagonal shows the topological feature corresponding to that column and row. As before, two all-against-all plots comparing the degree, the clustering coefficient, the closeness and the betweenness centrality of the Gavin 2002 and 2006 PPI datasets are shown in Fig. 3c, d, respectively. Notably, figures can be downloaded as JPEG from the browser, while scatterplot coordinates can be downloaded as CSV files and visualized with external applications like Excel or STATA.

Fig. 3 Intra-network comparison of selected topological features within the Gavin 2002 and 2006 yeast PPI datasets [16]. a The degree distribution for the Gavin 2002 dataset. b The degree distribution for the Gavin 2006 dataset. c An all-against-all distribution matrix comparing the degree, the closeness, the betweenness and the clustering coefficient for the Gavin 2002 PPI network. d An all-against-all distribution matrix comparing the degree, the closeness, the betweenness and the clustering coefficient for the Gavin 2006 PPI network

Theodosiou et al. BMC Res Notes (2017) 10:278

Nodes and edges of a selected network (accessible via a drop-down menu) can be sorted according to a preferred topological feature using dynamic, easy-to-filter, Excel-like tables. Nodes and edges can be sorted in both descending and ascending order. Figure 4a, for example, shows the proteins of the Gavin 2006 PPI network sorted in descending order by their degree. It is evident that the PWP2 (YCR057C) protein, a conserved 90S pre-ribosomal component essential for proper endonucleolytic cleavage of the 35S rRNA precursor at the A0, A1 and A2 sites, is the protein with the most connections. Similarly, Fig. 4b shows that the connection between SEC8 (YPR055W) and RPC17 (YJL011C) has the highest betweenness centrality and is therefore a very important connection, acting as a bridge between different neighborhoods.
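Such rankings amount to sorting nodes by a node-level metric and edges by an edge-level metric. A small sketch with networkx and a toy graph (the proteins above are not reproduced here):

```python
# Illustrative sketch: ranking nodes by degree and edges by
# betweenness centrality, as in the tables of Fig. 4.
import networkx as nx

G = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")])

# Nodes sorted by degree, highest first (hub candidates at the top)
nodes_by_degree = sorted(G.degree(), key=lambda kv: kv[1], reverse=True)

# Edges sorted by betweenness centrality, highest first (bridge candidates)
edges_by_betweenness = sorted(
    nx.edge_betweenness_centrality(G).items(),
    key=lambda kv: kv[1],
    reverse=True,
)

print(nodes_by_degree[0])
print(edges_by_betweenness[0])
```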

While NAP is not a clustering visualization tool, the MCL Markov clustering algorithm has been implemented (Fig. 1h). Users can select a network and adjust the inflation value of MCL. A two-column searchable matrix will be generated showing each node name and the cluster to which it belongs. This way, users can easily find whether two nodes belong to the same cluster or not. This feature is recommended for small and medium-sized networks and should be avoided for larger ones. For a deeper clustering analysis, users are encouraged to use command-line tools or to try the clusterMaker Cytoscape plugin [20].
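NAP runs MCL in R; to illustrate what the inflation parameter controls, here is an independent minimal numpy sketch of the MCL idea (alternating expansion and inflation on a column-stochastic matrix), applied to a made-up graph of two triangles joined by one bridge edge:

```python
# Minimal MCL sketch (assumed simplification of the full algorithm):
# expansion spreads flow, inflation (the user-tunable parameter in NAP)
# sharpens it, and the converged matrix reveals the clusters.
import numpy as np

def mcl(adj, inflation=2.0, expansion=2, iters=50, tol=1e-8):
    """Return clusters of a graph given its adjacency matrix."""
    M = adj.astype(float) + np.eye(len(adj))      # add self-loops
    M /= M.sum(axis=0)                            # make columns stochastic
    for _ in range(iters):
        prev = M
        M = np.linalg.matrix_power(M, expansion)  # expansion step
        M = M ** inflation                        # inflation step
        M /= M.sum(axis=0)                        # re-normalize columns
        if np.allclose(M, prev, atol=tol):
            break
    # Non-zero rows of the converged matrix define the clusters
    clusters = set()
    for row in M:
        members = tuple(np.nonzero(row > 1e-6)[0])
        if members:
            clusters.add(members)
    return sorted(clusters)

# Two triangles (nodes 0-2 and 3-5) joined by the single bridge edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

print(mcl(A, inflation=2.0))
```

A higher inflation value tends to produce more, smaller clusters; a lower value merges them, which is why NAP exposes it as the tuning knob.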

Fig. 4 Node and edge ranking. a Proteins of the Gavin 2006 PPI dataset are sorted according to their degree. The PWP2 (YCR057C) protein has many neighbors and might behave as a hub. b Interactions of the Gavin 2006 PPI dataset are sorted according to their betweenness centrality. The edge between SEC8 (YPR055W) and RPC17 (YJL011C) behaves as a bridge between communities

Intersection

Users can automatically find the intersection between any pair of selected networks. Once two networks have been selected, two Venn diagrams are generated, showing the node and the edge overlap between the two selected networks. To visualize the intersecting part of the networks, users can download the network in CSV format and import it into third-party applications such as Cytoscape or Gephi. Figure 5 shows an example of finding the intersection between the Gavin 2002 and Gavin 2006 PPI datasets.
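The counts behind such Venn diagrams are plain set intersections over nodes and (order-independent) edges. A sketch with networkx and two made-up toy networks:

```python
# Illustrative sketch: node and edge overlap of two networks,
# the numbers NAP reports in its Venn diagrams (cf. Fig. 5b).
import networkx as nx

G1 = nx.Graph([("a", "b"), ("b", "c"), ("c", "d")])  # toy network 1
G2 = nx.Graph([("b", "c"), ("c", "d"), ("d", "e")])  # toy network 2

# Shared nodes
common_nodes = set(G1.nodes()) & set(G2.nodes())

# Shared edges; frozenset ignores endpoint order in undirected graphs
edges1 = {frozenset(e) for e in G1.edges()}
edges2 = {frozenset(e) for e in G2.edges()}
common_edges = edges1 & edges2

print(sorted(common_nodes))
print(sorted(tuple(sorted(e)) for e in common_edges))
```

The intersecting subnetwork itself is just the graph built from `common_edges`, which is what NAP exports for visualization elsewhere.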

Bipartite graphs

NAP is able to manage bipartite graphs. Given a bipartite graph, users can automatically extract its two monopartite projections and analyze them separately. In a gene–disease bipartite graph, for example, one can generate a disease–disease network through common genes and, vice versa, a gene–gene network through common diseases.
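The two monopartite projections connect nodes of one side whenever they share a neighbor on the other side. A sketch with networkx's bipartite module; the gene and disease names are made up for illustration:

```python
# Illustrative sketch: the two monopartite projections of a
# gene-disease bipartite graph (toy data, hypothetical names).
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
genes = ["g1", "g2", "g3"]
diseases = ["d1", "d2"]
B.add_nodes_from(genes, bipartite=0)
B.add_nodes_from(diseases, bipartite=1)
B.add_edges_from([("g1", "d1"), ("g2", "d1"), ("g2", "d2"), ("g3", "d2")])

# Disease-disease network through shared genes, and vice versa
disease_net = bipartite.projected_graph(B, diseases)
gene_net = bipartite.projected_graph(B, genes)

print(list(disease_net.edges()))
print(list(gene_net.edges()))
```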

Implementation

NAP's web interface is written in Shiny and its back-end functions are implemented in R. Topological features are calculated using the igraph-R library [13], and plots are generated through R and plotly [21]. Static network visualization is offered by the d3 library, whereas dynamic network visualization is provided by CytoscapeWeb/Cytoscape.js [17, 18].

Discussion

Network Analysis Provider (NAP) is designed to complement existing state-of-the-art visualization and analysis tools. It emphasizes topological network analysis and inter-/intra-network topological feature comparison. Overall, we believe that NAP can reach users beyond the broader network analysis community and aid non-experts in analyzing their networks in a simplified and highly interactive way.

Limitations

NAP runs in a browser and is therefore not optimized for large-scale networks. NAP's future versions will include a much richer and optimized set of clustering algorithms [22], richer motif extraction algorithms, network alignment methods such as Corbi [23] and GraphAlignment [24], more scalable visualization, user account profiles to store and load networks, incorporation of Arena3D [25, 26] for 3D multilayered network visualization, and better handling of bipartite graphs taking into account their special topological properties.

Fig. 5 NAP's functionality to find the intersection between any pair of selected networks. a Gavin 2006 and 2002 PPI datasets visualized by Cytoscape 3.4.0 using the Prefuse layout. b NAP's generated Venn diagrams showing the overlapping nodes and edges of the two networks. c NAP's intersection export function and visualization with Cytoscape


Authors' contributions

GAP conceived the concept, designed the analysis process and wrote the article; TT and GE implemented the tool; NP enriched the UI of the tool; PGB was the main actor behind the bipartite analysis; NCK and II provided user feedback. All authors read and approved the final manuscript.

Author details

1 Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete Medical School, 70013 Heraklion, Crete, Greece. 2 Joint Genome Institute, Lawrence Berkeley Lab, United States Department of Energy, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA. 3 Department of Computer Science and Biomedical Informatics, University of Thessaly, Papasiopoulou 2-4, Galaneika, 35100 Lamia, Greece.

Acknowledgements

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author upon request.

Funding

This work was supported by the US Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, under Contract Number DE-AC02-05CH11231, and used resources of the National Energy Research Scientific Computing Center, supported by the Office of Science of the US Department of Energy.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Received: 16 February 2017 Accepted: 7 July 2017

References

1. Koschutzki D, Schreiber F. Centrality analysis methods for biological networks and their application to gene regulatory networks. Gene Regul Syst Bio. 2008;2:193–201.
2. Yook SH, Oltvai ZN, Barabasi AL. Functional and topological characterization of protein interaction networks. Proteomics. 2004;4(4):928–42.
3. Gehlenborg N, O'Donoghue SI, Baliga NS, Goesmann A, Hibbs MA, Kitano H, Kohlbacher O, Neuweger H, Schneider R, Tenenbaum D, et al. Visualization of omics data for systems biology. Nat Methods. 2010;7(3 Suppl):S56–68.
4. Pavlopoulos G, Iacucci E, Iliopoulos I, Bagos P. Interpreting the omics 'era' data. In: Multimedia services in intelligent environments, vol. 25. New York: Springer International Publishing; 2013. p. 79–100.
5. Pavlopoulos GA, Malliarakis D, Papanikolaou N, Theodosiou T, Enright AJ, Iliopoulos I. Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future. Gigascience. 2015;4:38.
6. Pavlopoulos GA, Wegener AL, Schneider R. A survey of visualization tools for biological network analysis. BioData Min. 2008;1:12.
7. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–504.
8. Doncheva NT, Assenov Y, Domingues FS, Albrecht M. Topological analysis and interactive visualization of biological networks and protein structures. Nat Protoc. 2012;7(4):670–85.