Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CifData constructor parses cif file, and then throws parsed object away #1186

Closed
ltalirz opened this issue Feb 22, 2018 · 3 comments
Closed
Assignees

Comments

@ltalirz
Copy link
Member

ltalirz commented Feb 22, 2018

In the following piece of code

from aiida.data.orm.cif import CifData
a = CifData(file="/path/to/file.cif")

the constructor of CifData seems to first parse the CIF file, and then throw the information away again.

Below is the code for the constructor:
https://github.com/aiidateam/aiida_core/blob/62a5258f09d92dd36cd75f5f462196c24037b078/aiida/orm/data/cif.py#L513-L519
After line 517, self._values actually holds the parsed CIF object [1]. Then it is set to None.

This raises a few questions

  1. Is this a bug or am I missing something?
  2. Regardless of whether it is intentional or not, in my opinion the code needs to change, as it is very difficult to understand what is going on under the hood. This could be done either by changing some of the node-level logic that causes the _values to be set (which I would prefer) or by hiding the _values from this logic. Which of the two routes should be followed?
  3. I think for CifData in the end one wants at least two options for the constructor: a direct one, where the constructor parses the contents of the cif file and stores them, and a lazy one, where the constructor just stores the file name and the Cif file is parsed on demand.
    Perhaps even a third one, which is the current model of storing only formula & spacegroup and throwing away the python object so that it does not bloat the DB.
    Which of these options should be the default?

Mentioning @merkys who seems to have written these lines...
Also mentioning @giovannipizzi @nmounet

[1] This goes via a rather complicated route. The super calls
aiida/orm/implementation/sqlalchemy/node.py _init__
=> aiida/orm/implementation/general/node.py _set_with_defaults
=> aiida/orm/implementation/general/node.py _set_internal
=> (somehow) aiida/orm/data/cif.py set_file
=> aiida/orm/data/cif.py _set_attr('formulae', self.get_formulae())
which causes the CIF file to be read

Edit: Regarding 2., on reflection it is not that strange anymore, if one has learned that in the 'AiiDA way' calling the constructor with file=<filename> calls the set_file function
https://github.com/aiidateam/aiida_core/blob/62a5258f09d92dd36cd75f5f462196c24037b078/aiida/orm/data/cif.py#L528-L543
where get_formula then inevitably forces the parsing of the cif file.

@merkys
Copy link
Member

merkys commented Feb 23, 2018

This indeed seems to be a bug. Looking back to the introduction of __init__() it seems that set_file() was not supposed to parse the file and populate _values. Apparently, I have introduced such behavior later and forgot to review the __init__().

For the solution here I support the lazy spirit. Attributes formulae and spacegroup_numbers could be set at the moment of reference (CIF file is always present in the repository and can be parsed on demand). However, I recall a requirement of the attributes to be present from the moment of the initialization of Data in order to be searchable in the back-end database (I may be wrong here), and this requirement keeps us away from moving to lazy evaluation.

A quick fix would be just to move initialization of _ase and _values before the call to Super(CifData).__init__(), but I assume that we should think of a more general solution.

@ltalirz
Copy link
Member Author

ltalirz commented Feb 23, 2018

Thanks, I'm working on a solution now, pull request comes soon

I recall a requirement of the attributes to be present from the moment of the initialization of Data in order to be searchable in the back-end database (I may be wrong here), and this requirement keeps us away from moving to lazy evaluation.

Thanks for the notice, I'll have a look.
Anyhow, in order not to break people's code I will keep eager parsing the default and introduce the lazy parsing as an option.

@merkys
Copy link
Member

merkys commented Feb 23, 2018

Anyhow, in order not to break people's code I will keep eager parsing the default and introduce the lazy parsing as an option.

I think this is the best way to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants