CifData constructor parses cif file, and then throws parsed object away #1186

ltalirz · 2018-02-22T20:52:04Z

In the following piece of code

from aiida.data.orm.cif import CifData
a = CifData(file="/path/to/file.cif")

the constructor of CifData seems to first parse the CIF file, and then throw the information away again.

Below is the code for the constructor:
https://github.com/aiidateam/aiida_core/blob/62a5258f09d92dd36cd75f5f462196c24037b078/aiida/orm/data/cif.py#L513-L519
After line 517, self._values actually holds the parsed CIF object [1]. Then it is set to None.

This raises a few questions

Is this a bug or am I missing something?
Regardless of whether it is intentional or not, in my opinion the code needs to change, as it is very difficult to understand what is going on under the hood. This could be done either by changing some of the node-level logic that causes the _values to be set (which I would prefer) or by hiding the _values from this logic. Which of the two routes should be followed?
I think for CifData in the end one wants at least two options for the constructor: a direct one, where the constructor parses the contents of the cif file and stores them, and a lazy one, where the constructor just stores the file name and the Cif file is parsed on demand.
Perhaps even a third one, which is the current model of storing only formula & spacegroup and throwing away the python object so that it does not bloat the DB.
Which of these options should be the default?

Mentioning @merkys who seems to have written these lines...
Also mentioning @giovannipizzi @nmounet

[1] This goes via a rather complicated route. The super calls
aiida/orm/implementation/sqlalchemy/node.py _init__
=> aiida/orm/implementation/general/node.py _set_with_defaults
=> aiida/orm/implementation/general/node.py _set_internal
=> (somehow) aiida/orm/data/cif.py set_file
=> aiida/orm/data/cif.py _set_attr('formulae', self.get_formulae())
which causes the CIF file to be read

Edit: Regarding 2., on reflection it is not that strange anymore, if one has learned that in the 'AiiDA way' calling the constructor with file=<filename> calls the set_file function
https://github.com/aiidateam/aiida_core/blob/62a5258f09d92dd36cd75f5f462196c24037b078/aiida/orm/data/cif.py#L528-L543
where get_formula then inevitably forces the parsing of the cif file.

The text was updated successfully, but these errors were encountered:

merkys · 2018-02-23T15:37:10Z

This indeed seems to be a bug. Looking back to the introduction of __init__() it seems that set_file() was not supposed to parse the file and populate _values. Apparently, I have introduced such behavior later and forgot to review the __init__().

For the solution here I support the lazy spirit. Attributes formulae and spacegroup_numbers could be set at the moment of reference (CIF file is always present in the repository and can be parsed on demand). However, I recall a requirement of the attributes to be present from the moment of the initialization of Data in order to be searchable in the back-end database (I may be wrong here), and this requirement keeps us away from moving to lazy evaluation.

A quick fix would be just to move initialization of _ase and _values before the call to Super(CifData).__init__(), but I assume that we should think of a more general solution.

ltalirz · 2018-02-23T15:40:12Z

Thanks, I'm working on a solution now, pull request comes soon

I recall a requirement of the attributes to be present from the moment of the initialization of Data in order to be searchable in the back-end database (I may be wrong here), and this requirement keeps us away from moving to lazy evaluation.

Thanks for the notice, I'll have a look.
Anyhow, in order not to break people's code I will keep eager parsing the default and introduce the lazy parsing as an option.

merkys · 2018-02-23T15:44:35Z

Anyhow, in order not to break people's code I will keep eager parsing the default and introduce the lazy parsing as an option.

I think this is the best way to do.

ltalirz assigned ltalirz and giovannipizzi and unassigned giovannipizzi Feb 22, 2018

ltalirz mentioned this issue Feb 23, 2018

Issue 1183 fast cif parsing #1188

Closed

ltalirz mentioned this issue Feb 23, 2018

CifData: fast scanning and lazy parsing #1190

Merged

ltalirz closed this as completed Mar 8, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CifData constructor parses cif file, and then throws parsed object away #1186

CifData constructor parses cif file, and then throws parsed object away #1186

ltalirz commented Feb 22, 2018 •

edited

Loading

merkys commented Feb 23, 2018

ltalirz commented Feb 23, 2018 •

edited

Loading

merkys commented Feb 23, 2018

CifData constructor parses cif file, and then throws parsed object away #1186

CifData constructor parses cif file, and then throws parsed object away #1186

Comments

ltalirz commented Feb 22, 2018 • edited Loading

merkys commented Feb 23, 2018

ltalirz commented Feb 23, 2018 • edited Loading

merkys commented Feb 23, 2018

ltalirz commented Feb 22, 2018 •

edited

Loading

ltalirz commented Feb 23, 2018 •

edited

Loading