DSNetwork: An Integrative Approach to Visualize Predictions of Variants’ Deleteriousness

One of the most challenging tasks of the post-genome-wide association study (GWAS) era is the identification of functional variants among those associated with a trait for an observed GWAS signal. Several methods have been developed to evaluate the potential functional implications of genetic variants. Each of these tools has its own scoring system, which forces users to become acquainted with each approach to interpret their results. Aware of the amount of work needed to analyze and integrate results for a single locus, we proposed a flexible and versatile approach designed to help prioritize variants by aggregating the predictions of their potential functional implications. This approach has been made available through a graphical user interface called DSNetwork, which acts as a single point of entry to almost 60 reference predictors for both coding and non-coding variants and displays predictions in an easy-to-interpret visualization. We confirmed the usefulness of our methodology by successfully identifying functional variants in four breast cancer and nine schizophrenia susceptibility loci.

In this context, we created a web application called DSNetwork, for Decision Support Network. This tool aims to provide users with deleteriousness predictions for human variants (hg19 build) retrieved from several sources, and to present these scores in a user-friendly web interface.
The following paragraphs describe DSNetwork's approach through the hypothetical analysis of a locus containing 5 variants (rs4233486, rs35054111, rs11808410, rs11804913 and rs7554973), using the deleteriousness scores of 5 distinct fictitious predictors, namely A, B, C, D and E. Table 1 summarizes the scores generated by these 5 predictors, reflecting their predictions regarding the functional impacts of the candidate variants.
DSNetwork integrates the characteristics of the different predictors and creates a reference frame containing the lower and upper boundaries as well as the direction (ascending [ASC] or descending [DESC]) of their prediction scores (Figure 1). The direction is used to rank variants from the most deleterious to the least deleterious on the basis of their respective scores. The boundaries are used to establish the absolute deleteriousness level of each variant.

Figure 1: Predictor reference frame
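As an illustration, a reference frame can be sketched as a small record holding the two boundaries and the direction of a predictor. This is a minimal sketch, not DSNetwork's actual implementation: the class name `ReferenceFrame`, the method `deleteriousness`, and the directions assigned to predictors A and B are hypothetical, and it assumes that ASC means "a higher score is more deleterious" and DESC the opposite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceFrame:
    """Score range and direction of one predictor (hypothetical structure)."""
    lower: float
    upper: float
    direction: str  # assumption: "ASC" = higher is more deleterious, "DESC" = lower is

    def deleteriousness(self, score: float) -> float:
        """Map a raw score to [0, 1], where 1 is the most deleterious end."""
        fraction = (score - self.lower) / (self.upper - self.lower)
        fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range scores
        return fraction if self.direction == "ASC" else 1.0 - fraction


# Illustrative frames only; the directions below are made up for the example.
frames = {
    "A": ReferenceFrame(lower=0.0, upper=1.0, direction="ASC"),
    "B": ReferenceFrame(lower=0.0, upper=1.0, direction="DESC"),
}
print(frames["A"].deleteriousness(0.75), frames["B"].deleteriousness(0.25))
```

Normalizing every predictor onto the same 0-to-1 scale is one possible way to make scores with different ranges and directions comparable.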
Once the different reference frames are integrated, they can be used to prioritize the variants according to 3 types of representations: the intra-predictor relative ranks, the intra-predictor absolute scores and the global ranks.
Here are the scores obtained for the 5 candidate variants for these 5 approaches:

Intra-predictor ranks
Intra-predictor ranks allow the prioritization of a list of variants relative to one another. According to the reference frames illustrated in Figure 1, the 5 predictors produce scores ranging from 0 to 1. We can classify the 5 variants of interest from the most deleterious (rank 1) to the least deleterious (rank 5) with each predictor.
In order to summarize this information in an easy-to-interpret representation, each variant is depicted as a pie chart where each slice represents the rank of the variant for one of the predictors. Thus, in the current analysis, five pie charts are generated and each pie chart is divided into five slices of the same size. The slices are ordered by predictor by default. We used a color gradient ranging from red to green, where red corresponds to the most deleterious variant (rank 1) among the candidates for a given predictor. The gray color represents missing data. Figure 3 depicts the pie charts generated for the five candidate variants. The slices can be ordered by color to allow easy identification of variants that appear the most deleterious across predictors.

Figure 3: Visualization of intra-predictor ranking
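The intra-predictor ranking can be sketched as follows. This is a hedged illustration rather than the tool's code: the function name `intra_predictor_ranks` is hypothetical, the scores are invented for the example, ASC is assumed to mean "higher score is more deleterious", and ties are broken by input order because the text does not describe tie handling within a predictor.

```python
from typing import Dict, Optional


def intra_predictor_ranks(
    scores: Dict[str, Optional[float]], direction: str
) -> Dict[str, Optional[int]]:
    """Rank variants for one predictor; rank 1 is the most deleterious.

    Variants without a score (None) keep a rank of None (shown in gray).
    """
    present = {variant: s for variant, s in scores.items() if s is not None}
    # Assumption: with "ASC" the highest score is the most deleterious.
    ordered = sorted(present, key=present.get, reverse=(direction == "ASC"))
    ranks: Dict[str, Optional[int]] = {variant: None for variant in scores}
    for position, variant in enumerate(ordered, start=1):
        ranks[variant] = position
    return ranks


# Made-up scores for a fictitious predictor; not taken from Table 1.
example_scores = {
    "rs4233486": 0.9,
    "rs35054111": 0.2,
    "rs11808410": None,
    "rs11804913": 0.5,
    "rs7554973": 0.7,
}
print(intra_predictor_ranks(example_scores, "ASC"))
```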

Intra-predictor absolute scores
Intra-predictor absolute scores allow prediction of variant deleteriousness in reference to the thresholds established for a particular predictor. Given these boundaries, we can determine where each variant is located on the deleteriousness spectrum for each predictor. We chose to divide the score range of each approach into 20 equal intervals. The first interval contains the most deleterious scores and the 20th, the least deleterious. Thus, the annotation scores obtained for each variant are translated into their corresponding intervals. This allows the user to know if a variant is predicted as deleterious by a particular approach without having to know the implementation details of this approach. For clarity, in this example the range of scores has been divided into 4 intervals (instead of 20) (Table 3).
As for intra-predictor ranks, each variant is depicted as a pie chart where each slice represents the score interval of the variant for a particular predictor. We used a color gradient ranging from red to blue. The red color represents the most deleterious interval for a given predictor. The gray color represents missing data. Figure 4 depicts the pie charts generated for the five candidate variants. The slices can be ordered by color to easily identify variants with the most predictions of deleteriousness.

Figure 4: Visualization of intra-predictor score intervals
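The translation of a score into its interval can be sketched as below. The function name `score_interval` and its parameters are hypothetical, ASC is assumed to mean "higher score is more deleterious", and the treatment of scores that fall exactly on an interval boundary is an arbitrary choice of this sketch, not something stated in the text.

```python
def score_interval(score, lower, upper, direction, n_intervals=20):
    """Return the 1-based interval of a score; interval 1 is the most deleterious.

    Missing scores (None) yield None, which corresponds to the gray slices.
    """
    if score is None:
        return None
    fraction = (score - lower) / (upper - lower)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range scores
    if direction == "ASC":
        # Assumption: for ASC predictors the upper bound is the deleterious end,
        # so measure the distance from the upper bound instead.
        fraction = 1.0 - fraction
    index = int(fraction * n_intervals)
    return min(index, n_intervals - 1) + 1


# Four intervals, as in the simplified example; the scores are invented.
for raw in (0.9, 0.6, 0.3, 0.1, None):
    print(raw, score_interval(raw, 0.0, 1.0, "ASC", n_intervals=4))
```

Because only the interval is reported, a user can read "interval 1" as "most deleterious" for any predictor without knowing its native scale.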

Global ranking
To further facilitate prioritization, we propose to summarize the relative ranks in an overall rank for each variant.
To do so, we calculate the average rank of each variant based on its intra-predictor ranks. Then, we order the variants according to their average rank. Variants with the lowest average ranks are considered the best candidates for being deleterious. Because some predictors may have missing values for a specific set of variants, we propose three strategies for calculating a consistent average rank that accounts for these missing values and remains comparable between variants: 1) replace missing values with the average value (the default, Table 4); or 2) replace missing values with the median value.
As for the intra-predictor scores and ranks, the global ranks are made available for each variant in the form of a pie chart where the rank is represented by a color gradient ranging from red to green. Red represents the most deleterious variant among the candidates across all approaches (Figure 5).
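The average-rank computation with substitution of missing values can be sketched as follows, covering only the mean and median strategies. Several points are assumptions of this sketch: the function names, the invented ranks, the choice to fill a missing rank with the mean or median of that same variant's available ranks (the text does not say which values the average is taken over), and the rule that variants with equal average ranks share the same global rank.

```python
from statistics import mean, median


def average_rank(ranks, strategy="mean"):
    """Average a variant's intra-predictor ranks after filling missing ones (None)."""
    observed = [r for r in ranks if r is not None]
    if not observed:
        return None  # no predictor scored this variant
    # Assumption: the fill value is computed from this variant's own ranks.
    fill = mean(observed) if strategy == "mean" else median(observed)
    return mean(fill if r is None else r for r in ranks)


def global_ranks(rank_table, strategy="mean"):
    """Global rank per variant; 1 is the best candidate, ties share a rank."""
    averages = {v: average_rank(r, strategy) for v, r in rank_table.items()}
    valid = [a for a in averages.values() if a is not None]
    return {
        v: None if a is None else 1 + sum(1 for other in valid if other < a)
        for v, a in averages.items()
    }


# Invented intra-predictor ranks for predictors A to E; None marks missing data.
rank_table = {
    "rs4233486": [1, 2, 1, None, 3],
    "rs7554973": [2, 1, 2, 1, None],
    "rs11804913": [3, 3, 3, 2, 1],
}
print(global_ranks(rank_table, "mean"))
print(global_ranks(rank_table, "median"))
```

Under this reading, mean substitution leaves a variant's average unchanged, whereas median substitution can shift it, which is one way the strategies could lead to different global ranks.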

Variants network
DSNetwork makes it possible to visualize scores and linkage disequilibrium (LD) between variants together in order to identify potential haplotypes. The scores are the nodes of the network, and the LD between variants is represented by the links between the nodes. The level of LD is estimated via the r2 measure and represented by a color gradient ranging from yellow to red, where red corresponds to complete linkage disequilibrium (r2 = 1). The gray color represents missing information (Figure 6).
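The mapping from r2 to an edge color can be sketched as below. This is only an illustration: the function names, the edge record layout, the exact hex colors and the linear interpolation between yellow and red are assumptions, and the r2 values are invented.

```python
def ld_color(r2):
    """Map r2 in [0, 1] to a hex color from yellow (0) to red (1); gray if missing."""
    if r2 is None:
        return "#808080"  # gray for missing LD information
    r2 = min(max(r2, 0.0), 1.0)
    green = round(255 * (1.0 - r2))  # yellow is (255, 255, 0), red is (255, 0, 0)
    return f"#FF{green:02X}00"


def ld_edges(pairwise_r2):
    """Turn {(variant_a, variant_b): r2} into a list of edge records."""
    return [
        {"from": a, "to": b, "r2": r2, "color": ld_color(r2)}
        for (a, b), r2 in pairwise_r2.items()
    ]


# Invented r2 values for illustration only.
edges = ld_edges({
    ("rs4233486", "rs7554973"): 0.9,
    ("rs4233486", "rs35054111"): None,
})
for edge in edges:
    print(edge)
```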

Conclusion
Considering the relative ranks, the two best candidate variants of our hypothetical analysis are rs4233486 and rs7554973. Indeed, depending on the substitution approach, these two variants are respectively first, tied for first, or second among the 5 analysed variants. However, despite this apparent draw, rs4233486 could be the better candidate when the absolute score intervals are taken into account: for three of the five approaches, rs4233486 falls in the most deleterious intervals. Nevertheless, one cannot exclude a functional impact of rs7554973, given its scores and its high LD with rs4233486.

Tool usage
The application is divided into 3 panels: 1) Input, 2) Selection, 3) Visualization. Once the variant list is uploaded, the "Fetch annotations" button will trigger the score retrieval process.

Input
Either paste your variant list in the text area ... or upload a text file containing one variant ID per line.
Once your data has loaded, you may choose to trigger SNPNexus data retrieval. As the data needs to be retrieved from the server, a certain processing time should be expected. You can configure the waiting time (in minutes) with a slider.
Press "Fetch annotations" to trigger annotation retrieval.
An overview of the results is presented in a scatter plot showing the requested variants along the map of sequence constraint given by the Context-Dependent Tolerance Score (CDTS), which is determined through alignment of genomes from thousands of individuals.
Legend:
• Color code: selected variants, unselected variants
• Shape code: distinguishes variants according to their regulatory consequences. 7 different shapes are available. If there are more than 7 different annotations, the rarer annotations all share the same shape, the square.
By default, the best variants with regard to the overall global mean ranking are selected (up to 30).
You can download all the annotations by pressing "Download results (TSV)".

Selection
In the second panel, for consistency, non-synonymous and regulatory variants are processed separately. Once the annotations are fetched, a summary table will appear in the selection panel. Without a user-specific selection, the first regulatory variants, up to a maximum of thirty, will be highlighted in the table and the CDTS plot.
The summary table contains 6 columns: query, HGVS ids, CADD consequences and 3 columns containing overall global mean ranks (OGMR) according to the three substitution approaches.
⚠ Please note that OGMRs are computed by taking into account all the variants and all the available annotations, and are not updated when some variants or annotations are excluded from the analysis in the following steps.
Users can change the type of variants to visualise through the dropdown list at the right of the selection panel. This will automatically update the CDTS plot and highlight the preselected coding variants.
You can also change the selected variants through this table. The number of selectable variants is restricted to 30 for ergonomic reasons.
You can use the global scores reported in the summary table to select a particular subset of variants. In the following picture, we selected regulatory variants whose global rank is below 5 with at least one of the substitution approaches.
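The same kind of selection can be reproduced on the downloaded annotations with a simple filter. This is a sketch under assumptions: the function name, the `ogmr` key holding the three overall global mean ranks, and the example values are hypothetical, and only the `query` column name comes from the summary table described above.

```python
def select_variants(summary, threshold=5, limit=30):
    """Keep variants whose global rank is below `threshold` for at least one
    substitution approach, capped at `limit` variants (30 in the interface)."""
    selected = [
        row["query"]
        for row in summary
        if any(rank is not None and rank < threshold for rank in row["ogmr"])
    ]
    return selected[:limit]


# Invented summary rows; "ogmr" lists one rank per substitution approach.
summary = [
    {"query": "rs4233486", "ogmr": [1, 1, 2]},
    {"query": "rs35054111", "ogmr": [9, 8, 9]},
    {"query": "rs7554973", "ogmr": [6, 4, 5]},
    {"query": "rs11808410", "ogmr": [None, None, None]},
]
print(select_variants(summary))
```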

Interactions with the network
Once your mouse is in the network area, you can use it to interact with the network in the following manner:
- Scroll in or out to zoom
- Drag and drop to move the network within the network area
- Grab the bottom-right border and drag to the desired width or height to adjust the size of the network
- Double-click on a variant pie chart to get variant annotation details
- Right-click in the network area and select the "Save image as" option to save the network image in "png" format
Visualization parameters are available at the left of the Network panel and are described below from top to bottom.

Focus
This first parameter lets you automatically focus the network on a particular variant. This is a useful option when dealing with large networks. The "none" option restores the initial visualization.

Linkage disequilibrium
By default, no linkage disequilibrium (LD) data are shown. To map LD on the network edges, choose a 1000 Genomes population and press "Add LD information" in the left panel.
This process can take a few seconds (up to 1 minute). Afterwards, you can restrict the LD range you want to display; press "Update" to update the network.
You can use the "Between" option to restrict the LD information to a particular variant. Press "Update" to update the visualization.