Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fast CIF file parsing #1183

Closed
ltalirz opened this issue Feb 22, 2018 · 3 comments
Closed

fast CIF file parsing #1183

ltalirz opened this issue Feb 22, 2018 · 3 comments
Assignees

Comments

@ltalirz
Copy link
Member

ltalirz commented Feb 22, 2018

I've just benchmarked the speed of parsing large CIF files (several hundreds of atoms each) into python representations with ASE, pymatgen and PyCifRW.
From my findings, by default ASE and PyCifRW are about on par (pymatgen is 2x slower).
However, using scan_type='flex' in PyCifRW results in another 4x speedup.

All one needs to do is to replace
https://github.com/aiidateam/aiida_core/blob/62a5258f09d92dd36cd75f5f462196c24037b078/aiida/orm/data/cif.py#L494
by c = CifFile.ReadCif(self.get_file_abs_path(), scan_type='flex')

The documentation of PyCifRW says

flex will tokenise the input CIF file using fast C routines, but is not available for CIF2/STAR2 files.

so I'm not sure it is the best choice for everyone, but I would definitely want to use it.

In my view, this should be an aiida-wide setting. Can someone tell me where I should put it?

@giovannipizzi
Copy link
Member

I'm not sure it should be a AiiDA-wide setting (that would be managed via properties: http://aiida-core.readthedocs.io/en/latest/verdi/properties.html?highlight=setproperties).
Instead, I think it should be a class method (or maybe attribute, so it's stored in the DB, and set in the init or with a specific method), because it depends on the format of the specific file (I think CIF2 refers to https://www.iucr.org/resources/cif/cif2).

@ltalirz ltalirz assigned ltalirz and unassigned nmounet and giovannipizzi Feb 22, 2018
@ltalirz
Copy link
Member Author

ltalirz commented Feb 22, 2018

Point taken... people might want to use both methods in one AiiDA instance.

I would like to avoid storing it for each CifData object and bloating the DB.
If I make the scan_type a class attribute of CifData (a user could then do something like CifData.pycifrw_scan_type='flex'), I guess it won't go to the DB when an object is stored, or will it?

@giovannipizzi
Copy link
Member

If it is not set as an attribute with set_attr, it does not go to the DB. On the other hand, it's a small string, and maybe good to know if the two methods give different results (expected or bugs)... We store much more anyway in the DB in many cases, my personal suggestion would be to store this bit, it's part of the provenance of how you parsed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants