
KeyError: 'cluster' Extract MIBIG data #140

Closed
theo-llewellyn opened this issue Apr 24, 2023 · 6 comments

@theo-llewellyn

Hello,

I am running NPLinker with the following commands in Python:

import sys, csv, os
sys.path.append('../src')
from nplinker.nplinker import NPLinker
npl = NPLinker('nplinker_demo1.toml')
npl.load_data()

And I am getting the following error at the stage of loading the MIBIG dataset:

11:20:25 [DEBUG] loader.py:598, make_mibig_bgc_dict(ProteoSAFe-FEATURE-BASED-MOLECULAR-NETWORKING-0c145140-network_components/mibig_json)
11:20:25 [INFO] genomics.py:538, Found 1817 MiBIG json files
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tbl19/miniconda3/envs/nplinker-env/lib/python3.11/site-packages/nplinker/nplinker.py", line 272, in load_data
    if not self._loader.load(met_only=met_only):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tbl19/miniconda3/envs/nplinker-env/lib/python3.11/site-packages/nplinker/loader.py", line 337, in load
    if not met_only and not self._load_genomics():
                            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tbl19/miniconda3/envs/nplinker-env/lib/python3.11/site-packages/nplinker/loader.py", line 599, in _load_genomics
    self.mibig_bgc_dict = make_mibig_bgc_dict(self.strains,
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tbl19/miniconda3/envs/nplinker-env/lib/python3.11/site-packages/nplinker/genomics.py", line 575, in make_mibig_bgc_dict
    accession, biosyn_class = extract_mibig_json_data(data)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tbl19/miniconda3/envs/nplinker-env/lib/python3.11/site-packages/nplinker/genomics.py", line 551, in extract_mibig_json_data
    accession = data['cluster']['mibig_accession']
                ~~~~^^^^^^^^^^^
KeyError: 'cluster'

The MIBIG dataset seems to have downloaded correctly and the .json files appear normal. Do you know what this error is referring to?
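A quick way to check why the loader raises this KeyError is to print the top-level keys of a few of the downloaded MiBIG json files, since extract_mibig_json_data in the traceback expects a top-level 'cluster' key. A minimal sketch, using the mibig_json folder from the debug log above (the rest is illustrative):

import json
from pathlib import Path

# Folder name taken from the debug log above; adjust if your layout differs.
mibig_dir = Path("ProteoSAFe-FEATURE-BASED-MOLECULAR-NETWORKING-0c145140-network_components/mibig_json")

# Print the top-level keys of the first few json files. The loader expects a
# 'cluster' key; files with different top-level keys would explain the KeyError.
for path in sorted(mibig_dir.glob("*.json"))[:5]:
    with open(path) as f:
        data = json.load(f)
    print(path.name, "->", sorted(data.keys()))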

Any help would be much appreciated.

Best,
Theo

@CunliangGeng
Member

Hi Theo, which version or branch of nplinker are you using?

@theo-llewellyn
Author

Hello, I'm using v1.3.2, installed on an M1 Mac in a Rosetta-enabled terminal using:

python -m venv env
source env/bin/activate

pip install nplinker

install-nplinker-deps

@CunliangGeng
Member

The installation looks good.

What toml file are you using? Is it this one (https://github.com/NPLinker/nplinker/blob/main/notebooks/nplinker_demo1.toml)?

@theo-llewellyn
Author

I tried two different toml files. When I run it with https://github.com/NPLinker/nplinker/blob/main/notebooks/nplinker_demo1.toml I get the same error as the one shown in the Jupyter notebook https://github.com/NPLinker/nplinker/blob/main/notebooks/nplinker_demo1.ipynb:

Exception: Failed to find *any* BiGSCAPE Network_Annotations tsv files under "/Users/clgeng/nplinker_data/pairedomics/extracted/MSV000079284/bigscape" (incorrect cutoff value? currently set to 30)

When I run it with my own toml file and my own data I get the KeyError: 'cluster' Extract MIBIG data error above. My toml file looks like this; I have already run bigscape.

loglevel = "DEBUG"
logfile = ""
repro_file = ""

[dataset]
root = "ProteoSAFe-FEATURE-BASED-MOLECULAR-NETWORKING-0c145140-network_components"
bigscape_cutoff = 45
run_bigscape = false
antismash_format = "flat"
antismash_delimiters = ["_scaffold"]

[webapp]
tables_metcalf_score = 3.0

Thanks for your help

@CunliangGeng
Member

Though NPLinker (v1.3.2) allows users to provide their own local data as input, it's not recommended to use that, since this feature was not tested and may have (un)foreseen problems. Instead, it's recommended to upload the data to the PODP platform and use the PODP ID as input.
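For context, the demo config referenced above drives NPLinker from a platform-hosted project rather than a local folder. Below is a minimal, hypothetical sketch of that style of toml; the platform: prefix and the MSV000079284 ID (taken from the error path earlier in this thread) are used purely for illustration, and the exact syntax should be checked against nplinker_demo1.toml.

loglevel = "DEBUG"

[dataset]
# Hypothetical: point NPLinker at a platform-hosted project by its ID instead
# of a local root; verify the exact key/prefix against nplinker_demo1.toml.
root = "platform:MSV000079284"

[webapp]
tables_metcalf_score = 3.0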

We have been refactoring the NPLinker code but have not yet released the latest version. I'd suggest you reinstall NPLinker using the code from the dev branch:

# create a new virtual environment (remove the old one and create a fresh one)
python -m venv env
source env/bin/activate

# clone the code and switch to the dev branch
git clone https://github.com/NPLinker/nplinker.git
cd nplinker
git checkout dev

# install nplinker from the local code
pip install -e .

# install nplinker non-pypi dependencies and databases
install-nplinker-deps

Note that it's better to let NPLinker run bigscape itself to get the GCFs. By default it runs bigscape with the parameters
--mibig --clans-off --mix --include_singletons (MIBiG v3.1 is used by default).
If your own bigscape run did not use the same parameters, NPLinker will fail to load its results.
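For anyone running bigscape by hand, a sketch of an equivalent invocation is shown below. Only the four flags quoted above come from this comment; the input/output paths and the --cutoffs value are placeholders (the cutoff should match the bigscape_cutoff in your toml, e.g. 30 corresponds to 0.30).

# hedged sketch of a manual BiG-SCAPE (v1.x) run with matching parameters
python bigscape.py -i path/to/antismash_results -o path/to/bigscape_output \
    --mibig --clans-off --mix --include_singletons --cutoffs 0.30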

@theo-llewellyn
Author

Thank you for this information. My misunderstanding; I am happy for you to close this issue now and will explore uploading the files to PODP.
Thanks again for your help.
