Project name: XSwap
Project homepage:
Operating system(s): macOS, Linux, Windows
Programming language: Python, C, C++
Other requirements: None
License: BSD 2-Clause
RRID: SCR_024802
biotools ID: xswap
AUROC ~ area under the receiver operating characteristic curve
PPI ~ protein-protein interaction
TF-TG ~ transcription factor-target gene
RWR ~ random walk with restart
This work was supported, in part, by Pfizer Worldwide Research, Development, and Medical.
{% set funders = {} -%} {%- for author in manubot.authors -%} {% for funder in author.get('funders', []) -%} {%- if funder in funders -%} {% set _ = funders[funder].append(author.initials) %} {%- else -%} {% set _ = funders.update({funder: [author.initials]}) %} {%- endif -%} {%- endfor -%} {%- endfor %} {% for funder, author_initials in funders.items() -%} {% if author_initials|length > 2 %} {{- ', '.join(author_initials[:-1]) }}, and {{ author_initials[-1] }} were funded by {{ funder }}. {% elif author_initials|length == 2 %} {{- ' and '.join(author_initials) }} were funded by {{ funder }}. {% else %} {{- author_initials[0] }} was funded by {{ funder }}. {% endif %} {%- endfor -%} The funders had no role in the study design, data analysis and interpretation, or writing of the manuscript.
Author contributions are noted here according to CRediT (Contributor Roles Taxonomy). {%- set roles = {} -%} {%- for author in manubot.authors -%} {% for role in author.get('roles', []) -%} {%- if role in roles -%} {% set _ = roles[role].append(author.initials) %} {%- else -%} {% set _ = roles.update({role: [author.initials]}) %} {%- endif -%} {%- endfor -%} {%- endfor %} {% for role, author_initials in roles.items() -%} {% if author_initials|length > 2 %} {{- role }} by {{ ', '.join(author_initials[:-1]) }}, and {{ author_initials[-1] }}. {% elif author_initials|length == 2 %} {{- role }} by {{ ' and '.join(author_initials) }}. {% else %} {{- role }} by {{ author_initials[0] }}. {%- endif -%} {%- endfor %}
The authors thank Blair Sullivan for her feedback on a draft of the manuscript.
Table: Applications of the modified XSwap algorithm to various network types with appropriate parameter choices. For simple networks, each node's degree is preserved. For bipartite networks, each node's number of connections to the other part is preserved, and the partite sets (node class memberships) are preserved. For directed networks, each node's in- and out-degrees are preserved, though parameter choices depend on the network being permuted. Some directed networks can include antiparallel edges or loops while others do not. {#tbl:xswap tag="S1"}
The performance of the XSwap algorithm depends on a number of network properties. We define network density to be the number of edges divided by the number of potential edges. Increasing network density lowers the asymptotic fraction of edges changed, as greater density prevents the algorithm from removing certain edges. Random graphs generated with a preferential attachment mechanism (via Barabási–Albert) can have a lower fraction of their edges swapped, asymptotically, as compared to uniform random graphs (via Erdős–Rényi).
{#fig:swap-percent width="100%" tag="S1"}
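The effect of density described above can be explored with a small simulation. The following is a minimal sketch, not the implementation in the `xswap` package: it assumes a simple undirected network, rejects any swap that would create a loop or a duplicate edge, and uses hypothetical helper names (`density`, `degree_preserving_swaps`, `fraction_changed`) and arbitrary network sizes chosen only for illustration.

```python
import itertools
import random


def density(n_nodes, edges):
    """Number of edges divided by the number of potential edges (simple, undirected)."""
    return len(edges) / (n_nodes * (n_nodes - 1) / 2)


def degree_preserving_swaps(edges, n_swaps, rng):
    """Attempt n_swaps swaps; reject any that would create a loop or a duplicate edge."""
    edges = [tuple(sorted(edge)) for edge in edges]
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # choose one of the two possible rewirings at random
        new_1, new_2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new_1 in present or new_2 in present:
            continue  # invalid swap: leave the network unchanged
        present -= {edges[i], edges[j]}
        present |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


def fraction_changed(original, permuted):
    """Fraction of the original edges that are absent from the permuted network."""
    original = {tuple(sorted(edge)) for edge in original}
    permuted = {tuple(sorted(edge)) for edge in permuted}
    return len(original - permuted) / len(original)


# Compare a sparse and a dense uniform random graph on the same 50 nodes.
rng = random.Random(0)
n_nodes = 50
pairs = list(itertools.combinations(range(n_nodes), 2))
for n_edges in (100, 1000):
    graph = rng.sample(pairs, n_edges)
    permuted = degree_preserving_swaps(graph, 10 * n_edges, rng)
    print(round(density(n_nodes, graph), 2), fraction_changed(graph, permuted))
```

In the extreme case of a complete graph, every candidate swap would duplicate an existing edge, so no edge can change. This is the limiting behavior of the density effect.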
To approximate the edge prior, we began by making two simplifications. First, we assumed independence between node pairs. This assumption does not actually hold for the XSwap algorithm, though it is a reasonable simplification for large, sparse networks. Second, we assumed that the XSwap process is stationary. This assumption also does not actually hold, but it was made because it significantly simplifies the problem. A single node pair has two possible states, "edge" and "no edge". These states are not transient, and they are not periodic so long as more than one possible swap exists in the network. In almost all cases, then, our simplified model of the algorithm gives the state of a node pair as an ergodic process, independent of other node pairs.
\begin{align*} P^T &= \begin{bmatrix} 1-q & r \\ q & 1-r \end{bmatrix} \end{align*}
The stationary distribution of this system should correspond to the distribution when the number of swaps goes to infinity.
It can be found by computing the eigenvectors of the system, as we know that the stationary distribution vector, $\mathbf{v}$, satisfies $P^T \mathbf{v} = \mathbf{v}$:
\begin{align*} \mathbf{v} = \frac{1}{r + q} \begin{bmatrix} r \\ q \end{bmatrix} \end{align*}
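The closed form for $\mathbf{v}$ can be checked numerically by applying $P^T$ repeatedly to an arbitrary starting distribution, which mimics letting the number of swaps grow. The sketch below is illustrative only: the function names are hypothetical and the values of `q` and `r` are arbitrary, not estimates from any network.

```python
import math


def step(state, q, r):
    """Apply P^T = [[1 - q, r], [q, 1 - r]] to a two-state probability vector."""
    first, second = state
    return ((1 - q) * first + r * second, q * first + (1 - r) * second)


def iterate(state, q, r, n_steps):
    """Apply the transition n_steps times."""
    for _ in range(n_steps):
        state = step(state, q, r)
    return state


def stationary(q, r):
    """Closed-form stationary vector: [r, q] / (r + q)."""
    return (r / (r + q), q / (r + q))


q, r = 0.1, 0.4  # arbitrary illustrative transition probabilities
limit = iterate((1.0, 0.0), q, r, 200)
expected = stationary(q, r)
# The iterated distribution agrees with the closed form in both components.
print(all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(limit, expected)))
```

Convergence is geometric with ratio $|1 - q - r|$, the magnitude of the second eigenvalue of $P^T$, so a few hundred steps are ample for these values.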
The asymptotic edge probability is therefore
Since node pairs are being treated as independent, the probability of an edge being created in one successful iteration, given that the edge does not currently exist, is the ratio of the number of edge choices involving nodes
Similarly, the probability of an edge being eliminated in one iteration is the ratio of the number of edge choices involving
The approximate edge prior is, therefore,
Unfortunately, the above approximation of the edge prior is poor in many cases. We found that the following modified form (introduced in Methods) affords a superior approximation:
\begin{equation} P_{i,j} = \frac{d(u_i) d(v_j)}{\sqrt{(d(u_i) d(v_j))^2 + (m - d(u_i) - d(v_j) + 1)^2}} \end{equation}
Interestingly, this expression can be derived by normalizing the eigenvector
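The modified form of the edge prior is simple to evaluate directly. The sketch below transcribes the expression for $P_{i,j}$ into Python, with `d_u` standing for $d(u_i)$, `d_v` for $d(v_j)$, and `m` for the number of edges. The function name and the example degrees are hypothetical and serve only as an illustration.

```python
import math


def approximate_edge_prior(d_u, d_v, m):
    """Modified approximate edge prior for a node pair.

    d_u: degree of the source node, d(u_i)
    d_v: degree of the target node, d(v_j)
    m:   number of edges in the network
    """
    product = d_u * d_v
    return product / math.sqrt(product ** 2 + (m - d_u - d_v + 1) ** 2)


# Hypothetical degrees in a network with 20 edges.
for d_u, d_v in [(0, 5), (1, 1), (3, 4), (10, 10)]:
    print(d_u, d_v, approximate_edge_prior(d_u, d_v, m=20))
```

Two boundary cases behave as one would hope: a node of degree zero gives a prior of zero, and in a network whose only edge joins the two nodes (`d_u = d_v = m = 1`) the prior is one.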
| Data | Network | Nodes | Edges |
|:-----|:--------|:------|------:|
| Hetionet | AdG | Source: 402, Target: 20945 | 102240 |
| | AeG | Source: 402, Target: 20945 | 526407 |
| | AlD | Source: 402, Target: 137 | 3602 |
| | AuG | Source: 402, Target: 20945 | 97848 |
| | BPpG | Source: 11381, Target: 20945 | 559504 |
| | CCpG | Source: 1391, Target: 20945 | 73566 |
| | CbG | Source: 1552, Target: 20945 | 11571 |
| | CcSE | Source: 1552, Target: 5734 | 138944 |
| | CdG | Source: 1552, Target: 20945 | 21102 |
| | CrC | 1552 | 6486 |
| | CuG | Source: 1552, Target: 20945 | 18756 |
| | DaG | Source: 137, Target: 20945 | 12623 |
| | DdG | Source: 137, Target: 20945 | 7623 |
| | DpS | Source: 137, Target: 438 | 3357 |
| | DuG | Source: 137, Target: 20945 | 7731 |
| | GuG | 20945 | 265672 |
| | GcG | 20945 | 61690 |
| | GiG | 20945 | 147164 |
| | GpMF | Source: 20945, Target: 2884 | 97222 |
| | GpPW | Source: 20945, Target: 1822 | 84372 |
| PPI | Sampled | 3992 | 255522 |
| | Literature | 3992 | 364743 |
| | Systematic | 3916 | 12913 |
| bioRxiv | Sampled | 4587 | 30686 |
| | <2018 | 4615 | 43691 |
| | All time | 4615 | 44963 |
| TF-TG | Sampled | Source: 142, Target: 1396 | 2689 |
| | Literature | Source: 144, Target: 1406 | 3496 |
| | Systematic | Source: 144, Target: 1417 | 29177 |
In the table that follows, let
| Feature | Definition | Citation |
|:--------|:-----------|:---------|
| Jaccard index | $\frac{\lvert k(u) \cap k(v) \rvert}{\lvert k(u) \cup k(v) \rvert}$ | |
| Preferential attachment score | $\lvert k(u) \rvert \cdot \lvert k(v) \rvert$ | |
| Resource allocation index | $\sum_{w \in k(u) \cap k(v)} \frac{1}{\lvert k(w) \rvert}$ | |
| Adamic/Adar index | $\sum_{w \in k(u) \cap k(v)} \frac{1}{\log \lvert k(w) \rvert}$ | |
| Random walk with restart score | | [@doi:10.1145/1014052.1014135;@raw:laplacian] |
| Inference score | $\frac{\lvert A(u) \cap D(v) \rvert}{\dots}$ | |
Table: Edge prediction features. {#tbl:edge-prediction tag="S3"}
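The neighborhood-based features in the table can be sketched in a few lines. The code below assumes that $k(u)$ denotes the set of neighbors of node $u$ in an undirected network, which is how the formulas read, and that the logarithm in the Adamic/Adar index is the natural logarithm. The function names and the toy edge list are hypothetical. The random walk with restart and inference scores are omitted because their definitions are not given in full here.

```python
import math


def neighbor_sets(edges):
    """Map each node u to its neighbor set k(u) in an undirected network."""
    k = {}
    for u, v in edges:
        k.setdefault(u, set()).add(v)
        k.setdefault(v, set()).add(u)
    return k


def jaccard_index(k, u, v):
    union = k[u] | k[v]
    return len(k[u] & k[v]) / len(union) if union else 0.0


def preferential_attachment(k, u, v):
    return len(k[u]) * len(k[v])


def resource_allocation(k, u, v):
    return sum(1 / len(k[w]) for w in k[u] & k[v])


def adamic_adar(k, u, v):
    # A common neighbor of two distinct nodes has degree >= 2, so log > 0.
    return sum(1 / math.log(len(k[w])) for w in k[u] & k[v])


# Toy network: a triangle a-b-c with a pendant node d attached to c.
k = neighbor_sets([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
for name, feature in [
    ("Jaccard index", jaccard_index),
    ("Preferential attachment score", preferential_attachment),
    ("Resource allocation index", resource_allocation),
    ("Adamic/Adar index", adamic_adar),
]:
    print(name, feature(k, "a", "b"))
```

Storing neighbors as Python sets keeps each feature a direct transcription of its formula, at the cost of recomputing intersections for every node pair.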