PHI-Nets: A Network Resource for Ascomycete Fungal Pathogens to Annotate and Identify Putative Virulence Interacting Proteins and siRNA Targets

Interactions between proteins underlie all aspects of complex biological mechanisms. Therefore, methodologies based on complex network analyses can facilitate the identification of promising candidate genes involved in phenotypes of interest and place this information in an appropriate context. To facilitate discovery and gain additional insights into globally important pathogenic fungi, we have reconstructed computationally inferred interactomes using an interolog and domain-based approach for 15 diverse Ascomycete fungal species from nine orders, namely Aspergillus fumigatus, Bipolaris sorokiniana, Blumeria graminis f. sp. hordei, Botrytis cinerea, Colletotrichum gloeosporioides, Colletotrichum graminicola, Fusarium graminearum, Fusarium oxysporum f. sp. lycopersici, Fusarium verticillioides, Leptosphaeria maculans, Magnaporthe oryzae, Saccharomyces cerevisiae, Sclerotinia sclerotiorum, Verticillium dahliae, and Zymoseptoria tritici. Network cartography analysis was associated with functional patterns of annotated genes linked to the disease-causing ability of each pathogen. In addition, for the best-annotated organism, namely F. graminearum, the distribution of annotated genes with respect to network structure was profiled using a random walk with restart algorithm, which suggested possible co-location of virulence-related genes in the protein–protein interaction network. In a second ‘use case’ study involving two networks, namely B. cinerea and F. graminearum, previously identified small silencing plant RNAs were mapped to their targets. The F. graminearum phenotypic network analysis implicates eight B. cinerea targets and 35 F. graminearum predicted interacting proteins as prime candidate virulence genes for further testing. All 15 networks have been made accessible for download at www.phi-base.org, providing a rich resource for major crop plant pathogens.


Figure 1 A simple graph with three communities.
Nodes and links are depicted by circles and straight lines, respectively. Each dashed circle encloses one community (module).

Community structure detection
The community structure of the main connected component was identified with the greedy agglomerative algorithm known as the Louvain method (Blondel et al., 2008; Figure 2). Despite its greedy nature, the method is fast and accurate. The algorithm consists of two phases: modularity optimisation and community aggregation. In the first phase, each node is placed in its own community (module). The algorithm then iterates over all nodes and, for each node, evaluates the gain in modularity obtained by moving it into the community of each of its neighbours; the node is moved to the community giving the largest positive gain, or stays where it is if no move improves the modularity. This is repeated until no single move further improves the modularity. In the second phase, a new network is built in which the communities found in the first phase become nodes, and the link between two new nodes is given a weight equal to the sum of the weights of the links joining the two corresponding communities. Links between nodes of the same community become a self-loop on that community's node in the new network. The two phases are then repeated on the new network until no further gain in modularity is achievable. The result is a hierarchy of partitions, the last of which has the highest modularity found by the algorithm; because the method is greedy, this is a high-modularity partition of the initial network rather than a guaranteed optimum.
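The two phases described above can be sketched in pure Python as follows. This is a simplified illustration, not the reference implementation of Blondel et al. (2008): the helper names (build, modularity, one_level, aggregate, louvain) are hypothetical, nodes are visited in a fixed order, and the graph is assumed to be a symmetric dict of dicts holding adjacency weights A_uv, with a community's self-loop weight stored on the diagonal as in Figure 2. Modularity is computed as Q = Σ_c [in_c/2m − (tot_c/2m)²], where in_c is the sum of A_uv over ordered node pairs inside community c, tot_c is the summed degree of its nodes and 2m is the total weight.

```python
from collections import defaultdict


def build(edges):
    """Hypothetical helper: symmetric adjacency dict with A_uv = 1.0 per undirected edge."""
    adj = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return dict(adj)


def modularity(adj, comm):
    """Q = sum over communities c of in_c/2m - (tot_c/2m)^2."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    inside, tot = defaultdict(float), defaultdict(float)
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            tot[comm[u]] += w
            if comm[u] == comm[v]:
                inside[comm[u]] += w
    return sum(inside[c] / two_m - (tot[c] / two_m) ** 2 for c in tot)


def one_level(adj):
    """Phase 1 (modularity optimisation): greedy single-node moves until none helps."""
    k = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(k.values())
    comm = {u: u for u in adj}            # each node starts as its own community
    if two_m == 0:
        return comm
    tot = dict(k)                         # summed degree of each community
    improved = True
    while improved:
        improved = False
        for u in adj:
            old = comm[u]
            tot[old] -= k[u]              # take u out of its community
            links = defaultdict(float)    # weight from u into each neighbouring community
            for v, w in adj[u].items():
                if v != u:
                    links[comm[v]] += w

            def gain(c):
                # Modularity gain of inserting u into c, up to the constant factor 1/m.
                return links[c] - tot[c] * k[u] / two_m

            best = max(links, key=gain, default=old)
            if gain(best) - gain(old) <= 1e-12:
                best = old                # stay unless the move strictly helps
            comm[u] = best
            tot[best] += k[u]
            if best != old:
                improved = True
    return comm


def aggregate(adj, comm):
    """Phase 2 (community aggregation): communities become nodes.

    A_CD sums A_uv over u in C and v in D; for C == D this is the self-loop weight.
    """
    new = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            new[comm[u]][comm[v]] += w
    return {c: dict(nbrs) for c, nbrs in new.items()}


def louvain(adj):
    """Alternate both phases until a pass moves no node; return node -> community."""
    membership = {u: u for u in adj}
    while True:
        comm = one_level(adj)
        if all(comm[u] == u for u in adj):    # nothing merged at this level
            return membership
        membership = {u: comm[c] for u, c in membership.items()}
        adj = aggregate(adj, comm)


# Two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = build(edges)
membership = louvain(adj)
print(membership, round(modularity(adj, membership), 4))
```

On this two-triangle example the sketch places nodes {0, 1, 2} and {3, 4, 5} in two communities with Q = 5/14 ≈ 0.357; on the aggregated two-node network no merge increases the modularity, so the algorithm stops. For real interactomes an established implementation, such as `louvain_communities` in NetworkX (2.8 or later), would normally be preferred.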
Thus, in the example network depicted in Figure 2 (Blondel et al., 2008), the 13 nodes (illustrated in light blue) of the initial network each start as a separate community. After modularity optimisation and community aggregation, a new network of four nodes (green, blue, red and light blue) is created. Both phases of the algorithm are then repeated on this four-node network. As a consequence of the second pass, the green and blue nodes fall into one community, and the red and light blue nodes become part of a second community of the third network. This is because the weight of the link between the green and blue nodes (4) is higher than that of the links between the green node and the other two nodes (1), just as the weight of the link between the red and light blue nodes (3) is higher than that of the link between the red and green nodes (1). The blue node therefore joins the green node's community and the red node joins the light blue node's community, so that the link between the two newly created communities has a weight of 3 (the sum of the links between the green and light blue nodes, the green and red nodes, and the blue and light blue nodes). Finally, no further improvement in the modularity of the resulting third network can be obtained.
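Written out, the weight of the single link between the two final communities is the sum of the three inter-community links of unit weight named above, with G, B, R and L standing for the green, blue, red and light blue nodes of the four-node network:

```latex
w(\{G,B\},\{R,L\}) = w(G,L) + w(G,R) + w(B,L) = 1 + 1 + 1 = 3
```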

Figure 2 Visualisation of the steps of the Louvain algorithm.
Red, blue, green and light blue nodes indicate four different communities (modules). The weight of a link between two new nodes is the sum of the weights of the links between nodes in the corresponding two communities. The weight on each new node is its self-loop weight, calculated from the links between nodes in the same community. For example, the links of the blue community, L = {(3,7), (7,3), (7,6), (6,7)}, give a self-loop weight of 4 after the first pass of the algorithm (image copied from Blondel et al., 2008).
Node role assignment - cartography analysis
A node's role is characterised by two measures adopted from Guimera and Nunes Amaral (2005): the within-community degree z-score and the participation coefficient P. The z-score measures how strongly a node is connected to the other members of its own community (module) compared with the other nodes of that community, whereas the participation coefficient measures how the node's links are distributed among communities, i.e. its connectivity to other communities relative to its own. A high z-score indicates a high within-community degree. The participation coefficient of a node approaches 1 if its links are distributed evenly among all communities and equals 0 if all of its links are within its own community.
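For reference, Guimera and Nunes Amaral (2005) define the two measures as follows, where κ_i is the number of links of node i to other nodes in its own community s_i, κ̄_{s_i} and σ_{κ_{s_i}} are the mean and standard deviation of κ over all nodes of s_i, κ_{is} is the number of links of node i to nodes in community s, k_i is the total degree of node i and N_M is the number of communities:

```latex
z_i = \frac{\kappa_i - \bar{\kappa}_{s_i}}{\sigma_{\kappa_{s_i}}},
\qquad
P_i = 1 - \sum_{s=1}^{N_M} \left( \frac{\kappa_{is}}{k_i} \right)^{2}
```

If a node's links are spread evenly over all N_M communities, every ratio equals 1/N_M and P_i = 1 − 1/N_M, which approaches 1 as the number of communities grows.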
The node classification scheme applied in this work was defined previously (Guimera and Nunes Amaral, 2005) and can be summarised as follows. Based on their position in the parameter space of z-score and participation coefficient, nodes are categorised as hubs (nodes with many links within their own community, z ≥ 2.5) or non-hubs (z < 2.5). Non-hub nodes are further assigned to four roles: R1, ultra-peripheral node (all links within its community, P ≤ 0.05, i.e. P ≈ 0); R2, peripheral node (at least 60% of its links within its community, 0.05 < P ≤ 0.625); R3, non-hub connector node (half of its links, or at least 2 links, whichever is larger, within its community, 0.625 < P ≤ 0.8); and R4, non-hub kinless node (fewer than 35% of its links within its community, P > 0.8). R4 nodes have links spread homogeneously among all communities and therefore cannot clearly be assigned to any one community.
Hub nodes are further divided into three categories: R5, provincial hub (the great majority, at least 80%, of its links within its community, P ≤ 0.3); R6, connector hub (many links to other communities and at least half of its links within its community, 0.3 < P ≤ 0.75); and R7, global kinless hub (links spread homogeneously among all communities and fewer than half of its links within its community, P > 0.75). As with R4, such nodes cannot clearly be assigned to one community.
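The classification above can be sketched in Python. This is an illustration, not the pipeline used in this work: z_and_p and role are hypothetical helper names, the graph is assumed to be unweighted and undirected and given as a dict of neighbour sets, the z-score uses the population standard deviation (and is set to 0 when all nodes of a community have the same within-community degree), and boundary values of P are assigned to the lower role, which is an assumption about how ties are handled.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_and_p(adj, comm):
    """Within-community degree z-score and participation coefficient per node.

    z_i = (kappa_i - mean kappa of s_i) / sd of kappa in s_i
    P_i = 1 - sum_s (kappa_is / k_i)^2
    """
    kappa = {u: sum(1 for v in nbrs if comm[v] == comm[u]) for u, nbrs in adj.items()}
    by_comm = defaultdict(list)               # kappa values of each community
    for u in adj:
        by_comm[comm[u]].append(kappa[u])
    z, p = {}, {}
    for u, nbrs in adj.items():
        vals = by_comm[comm[u]]
        sd = pstdev(vals)                     # assumption: population standard deviation
        z[u] = (kappa[u] - mean(vals)) / sd if sd > 0 else 0.0
        per_comm = defaultdict(int)           # links from u into each community
        for v in nbrs:
            per_comm[comm[v]] += 1
        k = len(nbrs)
        p[u] = 1 - sum((c / k) ** 2 for c in per_comm.values()) if k else 0.0
    return z, p


def role(z, p):
    """Map (z, P) to R1..R7 using the thresholds quoted in the text."""
    if z < 2.5:                               # non-hubs
        if p <= 0.05:
            return "R1"
        if p <= 0.625:
            return "R2"
        if p <= 0.8:
            return "R3"
        return "R4"
    if p <= 0.3:                              # hubs
        return "R5"
    if p <= 0.75:
        return "R6"
    return "R7"


# Two triangles joined by edge 2-3, each triangle treated as one community.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
comm = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
z, p = z_and_p(adj, comm)
roles = {u: role(z[u], p[u]) for u in adj}
print(roles)
```

In this toy example the two bridge nodes (2 and 3) have P = 4/9 and are classified as peripheral (R2), while all other nodes keep every link inside their triangle and are ultra-peripheral (R1).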
Chi-square test of association
H0 (null hypothesis): there is no association between the node position in the network and its effect on the pathogenic lifestyle.
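The test can be sketched as a Pearson chi-square test of independence on a contingency table of node role against phenotype class. The sketch uses only the standard library; the function names and the counts in the example are hypothetical and are not data from this study. The upper-tail probability uses the recurrence Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1), which holds for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Closed forms for df = 1 and df = 2 seed the recurrence.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    # Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1), in log space.
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chi2_independence(table):
    """Pearson chi-square test (no continuity correction) on an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n   # count expected under H0
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts (NOT from this study): rows = node roles R1, R2, R3;
# columns = genes with a virulence-related phenotype vs. other annotated genes.
table = [[30, 70], [25, 45], [10, 8]]
stat, df, pval = chi2_independence(table)
print(f"chi2 = {stat:.3f}, df = {df}, p = {pval:.4f}")
```

H0 is rejected when the p-value falls below the chosen significance level. In practice `scipy.stats.chi2_contingency` performs the same test; for a 2 × 2 table (one degree of freedom) it applies Yates' continuity correction by default, so its statistic differs from this uncorrected one unless `correction=False` is passed. The chi-square approximation is generally considered reliable only when expected counts are not too small (a common rule of thumb is at least 5 per cell).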