Grouping chains (receptor and ligand) for evaluation #33

Open
heol1 opened this issue Jun 10, 2024 · 11 comments

heol1 commented Jun 10, 2024

In the previous version of DockQ, the input arguments -native_chain{1,2} and -model_chain{1,2} allowed grouping chains into receptor and ligand. For example, if there is one chain (e.g., A) for the receptor and two other chains (e.g., H and L) for the ligand, and I am only interested in the interface between (A) and (H+L), how can I do this with DockQ v2? With the previous version, it could be done as follows:

DockQ.py [model] [native] -native_chain1 A -native_chain2 H L -model_chain1 A -model_chain2 H L

I noticed that DockQ v2 evaluates every interface between chains and reports averaged values as the metrics for the whole complex. However, I could not find optional arguments for grouping chains. I do not think averaging the metrics over the chain pairs of interest would work either, because the numerators and denominators of the RMSDs and fnat differ between evaluating (A and (H+L)) and averaging (A and H) with (A and L).
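
To spell out the fnat part of that argument (a sketch; the symbols $n$ and $c$ are notation introduced here, not DockQ output names): let $n_{AH}$ and $n_{AL}$ be the numbers of native contacts in the A-H and A-L interfaces, and $c_{AH}$ and $c_{AL}$ the numbers of those contacts reproduced in the model. Assuming every A-(H+L) contact belongs to exactly one of the two interfaces, the grouped and the averaged values are:

```latex
\[
f_{\mathrm{nat}}^{A|HL} = \frac{c_{AH} + c_{AL}}{n_{AH} + n_{AL}},
\qquad
\bar{f}_{\mathrm{nat}} = \frac{1}{2}\left(\frac{c_{AH}}{n_{AH}} + \frac{c_{AL}}{n_{AL}}\right)
\]
```

The two agree only in special cases, for example when $n_{AH} = n_{AL}$.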

Collaborator

clami66 commented Jun 11, 2024

Hi @heol1 ,

This feature is no longer supported, since it interferes with the automatic mapping of chains. But you can still make this work on your own by manually merging two chains into one before calling DockQ. Here is an example of a function merge_chains that merges chains B and C in one structure and L and H in the other, then calls DockQ on the resulting structures:

from DockQ.DockQ import load_PDB, run_on_all_native_interfaces

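def merge_chains(model, chains_to_merge):
    """Merge the listed chains of `model` into the first one, in the given order."""
    target = chains_to_merge[0]
    for chain in chains_to_merge[1:]:
        for res in model[chain]:
            # store the source chain ID in the first (hetero) field so residue IDs do not collide
            res.id = (chain, res.id[1], res.id[2])
            model[target].add(res)
        model.detach_child(chain)
    # the merged chain is named after its parts, e.g. "B" + "C" -> "BC"
    model[target].id = "".join(chains_to_merge)
    return model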

model = load_PDB("2j5l.cif.gz")
native = load_PDB("2j4w.cif.gz")

model = merge_chains(model, ["B", "C"])
native = merge_chains(native, ["L", "H"]) # automatic mapping will not work here, user must merge in the right order

# native:model chain map dictionary for two interfaces
chain_map = {"D":"A", "LH":"BC"}
# returns a dictionary containing the results and the total DockQ score
run_on_all_native_interfaces(model, native, chain_map=chain_map)

({('D', 'LH'): {'DockQ_F1': 0.8826043954496244,
   'DockQ': 0.8359010987463277,
   'F1': 0.7472527472527473,
   'irms': 0.4391250825975146,
   'Lrms': 1.2297489438154532,
   'fnat': 0.6071428571428571,
   'nat_correct': 34,
   'nat_total': 56,
   'fnonnat': 0.02857142857142857,
   'nonnat_count': 1,
   'model_total': 35,
   'clashes': 0,
   'len1': 34,
   'len2': 437,
   'class1': 'ligand',
   'class2': 'receptor',
   'is_het': False,
   'chain1': 'A',
   'chain2': 'BC',
   'chain_map': {'D': 'A', 'LH': 'BC'}}},
 0.8359010987463277)

You could stick the function in a separate script and call it on your input files if you prefer using DockQ from the command line, let me know if you need help with that.
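
If you wrap this in a script, one piece worth factoring out is the construction of chain_map from v1-style chain groups. The following is a minimal sketch, not part of DockQ: merged_id and build_chain_map are hypothetical helper names, and the sketch assumes merged chains are named by concatenating their IDs, as merge_chains does above. Groups are paired position by position, so the order inside each group still has to be chosen by hand.

```python
def merged_id(chains):
    """Name a merged chain like merge_chains does: concatenate the chain IDs."""
    return "".join(chains)


def build_chain_map(native_groups, model_groups):
    """Build a native:model chain map from two equally long lists of chain groups.

    Hypothetical helper: group i of the native is mapped to group i of the model.
    """
    if len(native_groups) != len(model_groups):
        raise ValueError("need the same number of native and model groups")
    return {
        merged_id(native): merged_id(model)
        for native, model in zip(native_groups, model_groups)
    }


# Same grouping as the example above: native D vs (L+H), model A vs (B+C)
native_groups = [["D"], ["L", "H"]]
model_groups = [["A"], ["B", "C"]]

chain_map = build_chain_map(native_groups, model_groups)
print(chain_map)  # {'D': 'A', 'LH': 'BC'}
```

Each multi-chain group would then be passed to merge_chains before calling run_on_all_native_interfaces with the resulting chain_map.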

A couple things I should mention:

  • This is largely untested, so be sure to double-check your results.
  • As I said, this will make automatic mapping of chains impossible in some cases, e.g. if your merged chains are homomeric like in the example above. This means that you will have to decide manually whether you merge H with L to form chain HL or vice versa to form chain LH. In the example above, the DockQ score would be incorrect if you merged them the other way around.
  • Don't discount the idea of letting DockQ just figure out the mapping and summarize the results for all interfaces. Results will not be the same but should correlate well, as you can see below.
model = load_PDB("2j5l.cif.gz")
native = load_PDB("2j4w.cif.gz")

chain_map = {"D":"A", "L":"B", "H":"C"}
# keep the return value: (per-interface results dictionary, total DockQ score)
res = run_on_all_native_interfaces(model, native, chain_map=chain_map)

# DockQ average across the two interfaces of interest
print((res[0][("D", "L")]["DockQ"] + res[0][("D", "H")]["DockQ"]) / 2)

0.8505

Author

heol1 commented Jun 11, 2024

Thank you for your detailed explanations!

heol1 closed this as completed Jun 11, 2024
Collaborator

clami66 commented Jun 12, 2024

Great! Reopening this in case someone else is looking for the same thing

clami66 reopened this Jun 12, 2024
clami66 added the solved label Jun 12, 2024
@AmeyaHarmalkar

Thanks for the explanation and sharing the merge_chains function!
One request: would it be possible to share the commit hash where the change was made? I would like to revert to the earlier version, where it was possible to define the chains explicitly. Or maybe I can just use release v1?

Collaborator

clami66 commented Jun 17, 2024

@AmeyaHarmalkar I think v1 is the safest option if you want to use the old flags

@floeshak

Hi, I just tested this and have a question regarding the merging of two chains into one if they are ligands. Merging the ligand chains will make the combined ligand chain longer than the receptor chain, which will affect the lrmsd calculation and thus the DockQ score. Is there a way to avoid this?

Thank you

@serbulent-av

Thanks for this. Merging is pretty important for antibody research. I compared this with the averaging option on ~15 structures and got a correlation of 0.99. So I think it is safe to use and to merge into the original code if you'd like.

Collaborator

clami66 commented Jul 9, 2024

@serbulent-av did you mean that averaging the scores from the default automapping function correlates at 0.99 with merging the chains?

If that is the case, would it not be best to just use the averaging?

Collaborator

clami66 commented Jul 9, 2024

Hi @floeshak

> Merging the ligand chains will make the combined ligand chain longer than the receptor chain, which will affect the lrmsd calculation and thus the DockQ score. Is there a way to avoid this?

Not off the top of my head, unfortunately. We might add an option to force a different ligand-receptor assignment, as was also requested for the previous version, but I don't know how soon that will be available.

@serbulent-av

@clami66 Yes, I meant that. I think the choice depends on what you want to see: if you'd like other metrics such as fnat separately (per chain), averaging may suit you. However, for antibodies, merging chains is often better than averaging when analyzing interaction metrics like fnat. This is because:

  • The heavy chain usually dominates antibody-antigen interactions.
  • Averaging can be misleading, while merging gives a more realistic picture.

Example:

  • H chain: fnat 0.5 (12 interacting residues)
  • L chain: fnat 1.0 (2 interacting residues)
  • Averaging: (0.5 + 1.0) / 2 = 0.75
  • Merging: (0.5 * 12 + 1.0 * 2) / 14 ≈ 0.57

The merged result (0.57) better reflects the overall antibody-antigen interface, since it weights each chain by its actual contribution.

Collaborator

clami66 commented Jul 9, 2024

@serbulent-av I understand.

Though you should in theory have all the information you need from the results dictionary to recompute the correct fnat as well (but not from the textual output if you run from command line).

We will also add the possibility to output the results to a JSON file soon, which should make it even simpler to parse and aggregate scores.
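
The recomputation of fnat from the results dictionary mentioned above could look roughly like this. It is a minimal sketch, assuming each per-interface entry carries the nat_correct and nat_total keys shown in the merged output earlier in the thread; pooled_fnat is a hypothetical helper, not a DockQ function, and the counts below are made up to mirror the H/L example (6 of 12 and 2 of 2 native contacts). Only the contact counts pool this way; the RMSD-based metrics cannot be combined like this.

```python
def pooled_fnat(per_interface, pairs):
    """Pool native-contact counts over several interfaces into one fnat value.

    per_interface: dict keyed by chain pairs, each value holding
                   'nat_correct' and 'nat_total' (assumed key names).
    pairs:         the chain pairs to pool, e.g. [("D", "H"), ("D", "L")].
    """
    correct = sum(per_interface[pair]["nat_correct"] for pair in pairs)
    total = sum(per_interface[pair]["nat_total"] for pair in pairs)
    if total == 0:
        raise ValueError("no native contacts in the selected interfaces")
    return correct / total


# Made-up counts for illustration only, not real DockQ output
example = {
    ("D", "H"): {"nat_correct": 6, "nat_total": 12},
    ("D", "L"): {"nat_correct": 2, "nat_total": 2},
}

fnat = pooled_fnat(example, [("D", "H"), ("D", "L")])
print(round(fnat, 2))  # 0.57
```

With real data, per_interface would be the first element of the tuple returned by run_on_all_native_interfaces.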
