Availability of Supporting Source Code and Requirements

Project name: XSwap
Project homepage: https://github.com/greenelab/xswap-manuscript
Operating system(s): MacOS, Linux, Windows
Programming language: Python, C, C++
Other requirements: None
License: BSD 2-Clause
RRID: SCR_024802
biotools ID: xswap

Declarations

List of abbreviations

AUROC ~ area under the receiver operating characteristic curve

PPI ~ protein-protein interaction

TF-TG ~ transcription factor-target gene

RWR ~ random walk with restart

Competing interests

This work was supported, in part, by Pfizer Worldwide Research, Development, and Medical.

Funding

{% set funders = {} -%} {%- for author in manubot.authors -%} {% for funder in author.get('funders', []) -%} {%- if funder in funders -%} {% set _ = funders[funder].append(author.initials) %} {%- else -%} {% set _ = funders.update({funder: [author.initials]}) %} {%- endif -%} {%- endfor -%} {%- endfor %} {% for funder, author_initials in funders.items() -%} {% if author_initials|length > 2 %} {{- ', '.join(author_initials[:-1]) }}, and {{ author_initials[-1] }} were funded by {{ funder }}. {% elif author_initials|length == 2 %} {{- ' and '.join(author_initials) }} were funded by {{ funder }}. {% else %} {{- author_initials[0] }} was funded by {{ funder }}. {% endif %} {%- endfor -%} The funders had no role in the study design, data analysis and interpretation, or writing of the manuscript.

Authors' contributions

Author contributions are noted here according to CRediT (Contributor Roles Taxonomy). {%- set roles = {} -%} {%- for author in manubot.authors -%} {% for role in author.get('roles', []) -%} {%- if role in roles -%} {% set _ = roles[role].append(author.initials) %} {%- else -%} {% set _ = roles.update({role: [author.initials]}) %} {%- endif -%} {%- endfor -%} {%- endfor %} {% for role, author_initials in roles.items() -%} {% if author_initials|length > 2 %} {{- role }} by {{ ', '.join(author_initials[:-1]) }}, and {{ author_initials[-1] }}. {% elif author_initials|length == 2 %} {{- role }} by {{ ' and '.join(author_initials) }}. {% else %} {{- role }} by {{ author_initials[0] }}. {%- endif -%} {%- endfor %}

Acknowledgments

The authors thank Blair Sullivan for her feedback on a draft of the manuscript.

References {.page_break_before}

Supplementary Information

XSwap parameter settings for network types

Network type	Degree preserved	Figure	`allow_antiparallel`	`allow_loops`
simple	all	{height="85px"}	False	False
directed	in/out	{height="85px"}	Depends on networks	Depends on networks
bipartite	Depends on directedness	{height="85px"}	True	True

Table: Applications of the modified XSwap algorithm to various network types with appropriate parameter choices. For simple networks, each node's degree is preserved. For bipartite networks, each node's number of connections to the other part is preserved, and the partite sets (node class memberships) are preserved. For directed networks, each nodes' in- and out-degrees are preserved, though parameter choices depend on the network being permuted. Some directed networks can include antiparallel edges or loops while others do not. {#tbl:xswap tag="S1"}

Performance of the XSwap algorithm

The performance of the XSwap algorithm depends on a number of network properties. We define network density to be the number of edges divided by the number of potential edges. Increasing network density lowers the asymptotic fraction of edges changed, as greater density prevents the algorithm from removing certain edges. Random graphs generated with a preferential attachment mechanism (via Barabási–Albert) can have a lower fraction of their edges swapped, asymptotically, as compared to uniform random graphs (via Erdős–Rényi).

$Higher density networks have lower asymptotic fractions of edges swapped and take more attempts to reach these values. The Barabási–Albert model produces scale-free random graphs, while Erdős–Rényi generates random graphs where all edges are equally likely.$ {#fig:swap-percent width="100%" tag="S1"}

Approximate edge prior {#approx-prior-supp}

To approximate the edge prior, we began by making two simplifications. First, we assumed independence between node pairs. This assumption does not actually hold for the XSwap algorithm, though it is a reasonable simplification for large, sparse networks. Second, we assumed that the XSwap process is stationary. This assumption also does not actually hold, but it was made because it significantly simplifies the problem. A single node pair has two possible states, "edge" and "no edge". These states are not transient, and they are not periodic so long as more than one possible swap exists in the network. In almost all cases, then, our simplified model of the algorithm gives the state of a node pair as an ergodic process, independent of other node pairs.

Let $A_{i,j}$ represent the existence of edge $(i, j)$ For a given node pair, $(i, j)$, then, let $q_{i,j}$ represent the transition probability from the "no edge" state to the "edge" state in one successful iteration of the XSwap algorithm. Let $r_{i,j}$ represent the probability of the opposite transition ("edge" to "no edge") in one successful iteration. With "no edge" represented as $[1, 0]^T$ and "edge" represented as $[0, 1]^T$, the transition matrix, $P$, is given by the following:

\begin{align*} P^T &= \begin{bmatrix} 1-q & r \ q & 1-r \end{bmatrix} \end{align*}

The stationary distribution of this system should correspond to the distribution when the number of swaps goes to infinity. It can be found by computing the eigenvectors of the system, as we know that the stationary distribution vector, $\mathbf{v}$ satisfies $P^T \mathbf{v} = \mathbf{v}$. The eigenvector $\mathbf{v}$, normalized to sum to 1 as a probability vector, is given by

\begin{align*} \mathbf{v} = \frac{1}{r + q} \begin{bmatrix} r \ q \end{bmatrix} \end{align*}

The asymptotic edge probability is therefore

$$\frac{q}{r + q}.$$

Since node pairs are being treated as independent, the probability of an edge being created in one successful iteration, given that the edge does not currently exist, is the ratio of the number of edge choices involving nodes $i$ and $j$ to the total number of possible swaps, $S$. Let $d(u_i)$ represent the degree of source node $i$ and $d(v_j)$ represent the degree of target node $j$.

$$q_{i,j} = \frac{d(u_i)d(v_j)}{S}$$

Similarly, the probability of an edge being eliminated in one iteration is the ratio of the number of edge choices involving $(i,j)$ and any other valid edge to the total number of possible swaps. Let $m$ be the total number of edges in the network.

$$r_{i,j} = \frac{m - d(u_i) - d(v_j) + 1}{S}$$

The approximate edge prior is, therefore,

$$\frac{d(u_i)d(v_j)}{m - d(u_i) - d(v_j) + 1 + d(u_i)d(v_j)}.$$

Unfortunately, we found that the above edge prior approximation is a poor approximation in many cases. We found that the following modified form (introduced in Methods) affords a superior approximation:

\begin{equation} P_{i,j} = \frac{d(u_i) d(v_j)}{\sqrt{(d(u_i) d(v_j))^2 + (m - d(u_i) - d(v_j) + 1)^2}} \end{equation}

Interestingly, this expression can be derived by normalizing the eigenvector $\mathbf{v}$ to be a unit vector in the 2-norm instead of the 1-norm; that is, we use the value $q / \sqrt{r^2 + q^2}$ instead of $q/(r+q)$. Because the modified form of the approximation offers a much superior fit to the data, we chose to include only the modified version in the released Python package, and we used the modified form throughout our analysis.

Networks used for comparison {#networks}

Data	Network	Nodes	Edges
Hetionet	AdG	Source: 402, Target: 20945	102240
	AeG	Source: 402, Target: 20945	526407
	AlD	Source: 402, Target: 137	3602
	AuG	Source: 402, Target: 20945	97848
	BPpG	Source: 11381, Target: 20945	559504
	CCpG	Source: 1391, Target: 20945	73566
	CbG	Source: 1552, Target: 20945	11571
	CcSE	Source: 1552, Target: 5734	138944
	CdG	Source: 1552, Target: 20945	21102
	CrC	1552	6486
	CuG	Source: 1552, Target: 20945	18756
	DaG	Source: 137, Target: 20945	12623
	DdG	Source: 137, Target: 20945	7623
	DpS	Source: 137, Target: 438	3357
	DuG	Source: 137, Target: 20945	7731
	GuG	20945	265672
	GcG	20945	61690
	GiG	20945	147164
	GpMF	Source: 20945, Target: 2884	97222
	GpPW	Source: 20945, Target: 1822	84372
PPI	Sampled	3992	255522
	Literature	3992	364743
	Systematic	3916	12913
bioRxiv	Sampled	4587	30686
	<2018	4615	43691
	All time	4615	44963
TF-TG	Sampled	Source: 142, Target: 1396	2689
	Literature	Source: 144, Target: 1406	3496
	Systematic	Source: 144, Target: 1417	29177

Edge prediction features

In the table that follows, let $k(u)$ denote the set of neighbors of node $u$. Let $\mathbf{A}$ represent the normalized Laplacian adjacency matrix, and let $y_u$ be a vector with all ones except for a one in the $u$-th position. $x$ For a directed graph, let $A(u)$ denote the set of nodes that node $u$ points to and $D(u)$ the set of nodes that point to $u$. All definitions that follow are the score between nodes $u$ and $v$.

Feature	Definition	Citation
Jaccard index	$\frac{	k(u) \cap k(v)
Preferential attachment score	$	k(u)
Resource allocation index	$\sum_{w \in k(u) \cap k(v)} \frac{1}{	k(w)
Adamic/Adar index	$\sum_{w \in k(u) \cap k(v)} \frac{1}{log	k(w)
Random walk with restart score	$c \bigg[ \bigg( \mathbb{I} - (1-c) \mathbf{A}\bigg)^{-1} \mathbf{y}_u \bigg]_v$	[@doi:10.1145/1014052.1014135;@raw:laplacian]
Inference score	$\frac{	A(u) \cap D(v)

Table: Edge prediction features. {#tbl:edge-prediction tag="S3"}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

90.back-matter.md

90.back-matter.md

Availability of Supporting Source Code and Requirements

Declarations

List of abbreviations

Competing interests

Funding

Authors' contributions

Acknowledgments

References {.page_break_before}

Supplementary Information

XSwap parameter settings for network types

Performance of the XSwap algorithm

Approximate edge prior {#approx-prior-supp}

Networks used for comparison {#networks}

Edge prediction features

Files

90.back-matter.md

Latest commit

History

90.back-matter.md

File metadata and controls

Availability of Supporting Source Code and Requirements

Declarations

List of abbreviations

Competing interests

Funding

Authors' contributions

Acknowledgments

References {.page_break_before}

Supplementary Information

XSwap parameter settings for network types

Performance of the XSwap algorithm

Approximate edge prior {#approx-prior-supp}

Networks used for comparison {#networks}

Edge prediction features