What is [Con]tact score?

Sergey O edited this page Sep 15, 2022 · 1 revision

[WIP]

Besides the structure, AlphaFold2 also returns a distogram: a predicted distribution over distances for every pair of positions. The sharpness (entropy) of each distribution provides a level of confidence, so if the model is confident about its prediction, every pair of residues will have a sharp, low-entropy distribution. AlphaFold1 and TrRosetta used this in the past to estimate model accuracy; it has largely been replaced by pLDDT and PAE, which explicitly report model confidence.
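To make "sharpness" concrete, here is a minimal sketch of the entropy of a single pair's distance distribution. The four-bin logits are made up for illustration and the helper names (`softmax`, `entropy`) are hypothetical; this is not the ColabDesign code.

```python
import math

def softmax(logits):
    """Turn raw distogram logits for one residue pair into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; lower means a sharper, more confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Illustrative logits over four distance bins (assumed values, not model output).
sharp = softmax([10.0, 0.0, 0.0, 0.0])  # almost all mass in one bin
flat = softmax([0.0, 0.0, 0.0, 0.0])    # no preference between bins

print(entropy(sharp) < entropy(flat))   # the sharp distribution has lower entropy
```

A uniform distribution over n bins has the maximum possible entropy, log(n), so the entropy of a pair can be read as how far the prediction is from "no idea".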

For a protein, we know that each position makes, on average, at least 2 contacts once immediate neighbors are excluded (seqsep < 6). The [con]tact score therefore computes the entropy of the top 2 contacts per position, on the reasoning that at least 2 contacts per position should be within 14 angstroms and predicted with high confidence.
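The idea above can be sketched roughly as follows. Several things here are assumptions rather than the actual implementation: the per-pair term is taken to be a cross-entropy between the distribution restricted to bins below the cutoff and the full distribution, pairs with |i - j| >= seqsep are the ones kept, and the names `pair_loss` and `contact_score` are hypothetical.

```python
import math

def pair_loss(probs, bin_edges, cutoff=14.0):
    """Assumed per-pair term: cross-entropy between the distribution
    renormalised over bins below `cutoff` and the full distribution.
    It is low only when the mass is below the cutoff AND sharply peaked."""
    kept = [p if edge < cutoff else 0.0 for p, edge in zip(probs, bin_edges)]
    total = sum(kept)
    if total == 0.0:
        return float("inf")  # no probability mass under the cutoff
    return -sum((k / total) * math.log(p) for k, p in zip(kept, probs) if k > 0.0)

def contact_score(dgram, bin_edges, cutoff=14.0, seqsep=6, num=2):
    """dgram[i][j] is a list of bin probabilities for positions i and j.
    For each position, average the `num` lowest pair losses among partners
    at least `seqsep` apart, then average over positions (lower is better)."""
    n = len(dgram)
    per_position = []
    for i in range(n):
        losses = sorted(
            pair_loss(dgram[i][j], bin_edges, cutoff)
            for j in range(n)
            if abs(i - j) >= seqsep  # skip immediate neighbours
        )
        best = losses[:num]
        if best:  # short chains may leave some positions without partners
            per_position.append(sum(best) / len(best))
    if not per_position:
        raise ValueError("no pair satisfies the sequence-separation filter")
    return sum(per_position) / len(per_position)

# Toy 8-residue example with four distance bins (illustrative values only).
edges = [8.0, 12.0, 16.0, 20.0]
confident = [[[0.97, 0.01, 0.01, 0.01] for _ in range(8)] for _ in range(8)]
uncertain = [[[0.25, 0.25, 0.25, 0.25] for _ in range(8)] for _ in range(8)]
print(contact_score(confident, edges) < contact_score(uncertain, edges))
```

Taking only the best `num` partners per position means a residue is not penalised for the many pairs that are legitimately far apart; it only needs a couple of confident contacts.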

For binder hallucination there are two "con" options: "con" and "i_con". The first tries to maximize the [num]ber of [con]tacts with cb-cb distance < [cutoff] per position, with sequence separation > [seqsep]. The latter tries to maximize the [num]ber of contacts at the interface between binder and target, per binder position (the number of positions is controlled with [num_pos]), with cb-cb distance < [cutoff].
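As a rough illustration of the interface variant, the sketch below assumes the per-pair losses between binder and target positions have already been computed (lower meaning a more confident contact within the cutoff). The function name `interface_contact_score` is hypothetical, and reading [num_pos] as "keep the best-scoring binder positions" is an assumption, not a statement about the actual code.

```python
def interface_contact_score(pair_losses, num=1, num_pos=None):
    """pair_losses[i][j]: loss for binder position i against target position j.
    For each binder position, average its `num` lowest losses; then average
    over the `num_pos` best binder positions (all of them if num_pos is None)."""
    if not pair_losses or any(len(row) == 0 for row in pair_losses):
        raise ValueError("pair_losses must be a non-empty matrix")
    per_binder = []
    for row in pair_losses:
        best = sorted(row)[:num]  # best contacts for this binder position
        per_binder.append(sum(best) / len(best))
    per_binder.sort()
    kept = per_binder if num_pos is None else per_binder[:num_pos]
    return sum(kept) / len(kept)

# Three binder positions against three target positions (made-up losses).
losses = [
    [0.1, 2.0, 3.0],
    [1.0, 1.5, 0.5],
    [4.0, 5.0, 6.0],  # this binder position makes no confident contact
]
print(interface_contact_score(losses, num=1))             # all binder positions
print(interface_contact_score(losses, num=1, num_pos=2))  # ignore the worst one
```

Unlike the intra-chain case, no sequence-separation filter is applied here, because binder and target positions sit on different chains.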
