Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up creation of Nodes in the AiiDA ORM #2214

Merged
merged 2 commits into from
Nov 30, 2018

Conversation

ltalirz
Copy link
Member

@ltalirz ltalirz commented Nov 17, 2018

  • move AiiDAManager from aiida.work to aiida.manage
  • introduce caching of .objects in aiida.orm.entities.Collection
  • cache default user in aiida.orm.user.User.Collection (fix Add caching of object collections #2216)

Benchmark on MacBook Pro, python3

  • before: 58.5s to create 10k nodes (57.2 spent inside Node.__init__)
  • after: 2.8s to create 10k nodes (1.7s spent inside Node.__init__)

The timing of 58.5s is significantly worse than what we wrote down during the tests with @szoupanos and @giovannipizzi (which was ~10s for 10k nodes), i.e. probably someone broke the caching.

Anyhow, the target we set then was: 2s for 10k nodes, which we have reached now on python3 (it's the 1.7s that should scale linearly with the number of nodes).

giovannipizzi
giovannipizzi previously approved these changes Nov 18, 2018
Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ltalirz great!!
Can I suggest to add a step in travis (not necessarily a test, maybe an additional test) that runs the speed test (e.g. for creating 10k nodes, and for running say 10x a simple verdi command that a) require load_dbenv and b) does not require it, and output some timings? This might be great in the long term to go back in history and monitor when something broke speedup. Moreover, in Jenkins one can output "artifacts" that are stored, maybe also in travis there is something similar. If you don't have the time to do it but you're ok with this, feel free to copy this text in an issue (and even assign me).

@giovannipizzi
Copy link
Member

I realize tests are falling, there is at least one relative import that has not been moved

@ltalirz ltalirz changed the title Speed up creation of Nodes in the AiiDA ORM [WIP] Speed up creation of Nodes in the AiiDA ORM Nov 18, 2018
@ltalirz
Copy link
Member Author

ltalirz commented Nov 18, 2018

For performance tests, I opened #2215

This PR is still a little bit dirty and before I move forward I have a few questions, perhaps also for @sphuber or @muhrin :

  1. What exactly should be the scope of the AiiDAManager? Before this PR it looked backend-independent on the surface (I might be wrong) but now we are definitely adding backend-dependent stuff.

  2. @muhrin's work was making it possible to specify the backend for getting objects. I guess we should retain this functionality in the AiiDA manager. To me this means we either
    a. cache the backend in the AiiDA manager and use that and/or
    b. make it possible to specify the backend when accessing the cache (and effectively have a per-backend cache)
    If lots of stuff is going to end up here, Ideally, we should try to reuse some of the syntax used to access the uncached objects.

  3. My purpose here was just to speed up Node() but the uncached User.objects.get_default() is used in many other places in the code. We'll need to decide on a policy of where to use cached objects and where not.

@coveralls
Copy link

coveralls commented Nov 18, 2018

Coverage Status

Coverage increased (+0.02%) to 69.014% when pulling 2ff392a on ltalirz:speedup-node into def85b5 on aiidateam:provenance_redesign.

@muhrin
Copy link
Contributor

muhrin commented Nov 18, 2018

Dear Leo, thanks for the work - looks good.

My response:

  1. AiiDAManager is specifically act as the single place to construct/cache/access backend specific calsses. Particularly some which are quite complex to construct e.g. Runners and Communicators. So I'm not sure what exactly you mean by 'backend dependent' but if you mean it does different things based on the current backend (or more specifically profile) then that's exactly what it's there for.

  2. This is a good point and needs a little discussion. For now AiiDAManager is a singleton of sorts (given that it only has class methods) that uses the current profile to make decisions but that precludes the ability to use it with different profiles simultanously...that said, this is kind of the point in that it's there to do the right thing based on the currently set profile (which could potentially be changed at runtime).

  3. I would make this lazily cached in the User collection so it's fast for everyone using it

@ltalirz
Copy link
Member Author

ltalirz commented Nov 18, 2018

@muhrin thanks a lot for your thoughts!

Re 2.: Ok. So, the idea is that when you load a new profile (when this becomes possible...), the AiiDAManager cache will be reset, correct? I will add this to its docstring.

Re 3.: Could you elaborate on this a bit? In 1. you wrote that the AiiDAManager should be the single place to construct/cache/access backend specific classes.
So withlazily cached do you mean:

  • The default user will be cached inside the AiiDAManager
  • User.objects.get_default() will use/populate the cached default user in AiiDAManager._DEFAULT_USER

Furthermore, is there currently anything that invalidates the cache?
What if you pass to User.objects.get_default() a backend that is not the current one?

@muhrin
Copy link
Contributor

muhrin commented Nov 18, 2018

  1. Correct.

  2. The dependency relationship was envisaged to be that AiiDAManager know about the backend but not the other way around (where possible). This would mean that AiiDAManager should eventually contain the construct_backend() method and I would imagine that it would be User.objects.get_default() that would contain the caching itself. If you do User.objects(my_backend).get_default() the first part (User.objects(my_backend)) actually passes back a separate collection instance which would itself contain the same caching logic as the default one. Currently calling with a custom backend will not result in the collection being cached after instantiation which would have serious performance implications so it's a target for future optimisation.

@ltalirz
Copy link
Member Author

ltalirz commented Nov 18, 2018

Currently calling with a custom backend will not result in the collection being cached after instantiation which would have serious performance implications so it's a target for future optimisation.

This holds even when calling with the default backend, correct? I.e. caching stuff inside User.objects currently does not work.

In [1]: from aiida.orm import User

In [2]: User.objects._test = 1

In [3]: User.objects._test
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-998f4695a253> in <module>()
----> 1 User.objects._test

AttributeError: 'Collection' object has no attribute '_test'

So, for the moment, should we cache the default user inside the AiiDAManager and use this cache from User.objects.get_default() with a notice to move this cache into User.objects in the future (and knowing that this temporary solution will not do the right thing when passing a non-default backend to User.objects)?

@muhrin
Copy link
Contributor

muhrin commented Nov 19, 2018

Hmm...I thought it was there already but you're right, it's not. The place for caching (the default) objects collection is here:

https://github.com/aiidateam/aiida_core/blob/aa7907260906f01d8d7bb04b1b1421b24ab00d3f/aiida/orm/entities.py#L116

It's quick to add, happy to do it because the performance implications could be great.

If you get this PR through I can add the caching, you can track it here: #2216

I would avoid putting the caching in the AiiDAManager because 1) it will move anyway, and 2) I haven't really reflected on if this is even a good place to put things like this (although on the surface I don't see why not).

Now if a new collection is created via .objects it is cached for the
next time in a lazy store.  Similarly the default user for a User
collection is cached after being created for the first time.  This may
lead to issues if the default user can change for a given profile during
runtime, however for now we don't support this.
@muhrin muhrin changed the title [WIP] Speed up creation of Nodes in the AiiDA ORM Speed up creation of Nodes in the AiiDA ORM Nov 29, 2018
@muhrin
Copy link
Contributor

muhrin commented Nov 29, 2018

Ok, ready to go. @ltalirz please have a look. @sphuber please review


# pylint: disable=wildcard-import

from .profile import *
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a wildcard here?
Perhaps just import the functions we want (I guess the constants defined in profile.py are not needed here (?))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to what we do in other prats of the code (which I borrowed from the asyncio codebase):

from .profile import *

__all__ = profile.__all__

This way we delegate what gets imported to profile itself.

@@ -126,6 +126,28 @@ def rmq_prefix(self):
def is_test_profile(self):
return self._test_profile

@property
def configured_user_email(self):
Copy link
Member Author

@ltalirz ltalirz Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to introduce a different adjective (configured user) here?
Why not default user?

Edit: I see you just moved the function. Perhaps still a good time to rename it now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to default_user_email

email = self.config[DEFAULT_USER_CONFIG_FIELD]
# I do not catch the error in case of missing configuration, because
# it is already a ConfigurationError
except KeyError:
Copy link
Member Author

@ltalirz ltalirz Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this an exception before as well?
Should it be forbidden not to have a default user?

Edit: I see, it was...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's forbidden. Can't store any nodes without one.

from aiida.backends.djsite.db.testbase import DjangoTests
DjangoTests().clean_db()

def __clean_db_sqla(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this stuff no longer necessary, this creation of a new_user object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's now done at the ORM level in the aiida/backends/testimplbase.py insert_data so it's backend independent...this is the way, ideally, all these things would be but we can't delete everything from ORM function calls.

@ltalirz
Copy link
Member Author

ltalirz commented Nov 29, 2018

@muhrin added a few comments, otherwise looks good to me (+ tested that node creation is fast again :-) )!
just the failing tests to fix...

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments as well

@@ -25,22 +25,24 @@ class FixtureManagerTestCase(unittest.TestCase):
"""Test the FixtureManager class"""

def setUp(self):
print("Starting set up")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe get rid of the print statements or make them debug/info logs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@@ -55,6 +57,8 @@ def test_create_use_destroy_profile(self):
from aiida import is_dbenv_loaded
with Capturing() as output:
self.fixture_manager.create_profile()

print("1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left over debug print statements

Copy link
Contributor

@muhrin muhrin Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expunged

# user is recreated because it caches the user!
# In any case, store it in self.user though
# email = get_manager().get_profile().configured_user_email
# self.test_session.query(DbUser).filter(DbUser.email != email).delete()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still relevant if the accompanying line is commented out?

Copy link
Contributor

@muhrin muhrin Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted

def __init__(self, backend, entity_class):
"""Construct a new entity collection"""
# assert issubclass(entity_class, Entity), "Must provide an entity type"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we uncomment this assert, or if not just remove it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncommented

try:
return False, self.get(**kwargs)
except exceptions.NotExistent:
return True, User(backend=self.backend, **kwargs)

def get_default(self):
"""
Get the current default user
Get the current default user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong indentation in docstring

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unindented

# A label associated to the present class (coincides with the resource name)
__label__ = "computers"
# The AiiDA class one-to-one associated to the present class
from aiida.orm import Computer
_aiida_class = Computer
# from aiida.orm import Computer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete commented import

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erased


computer = Computer.get('deneb')
dbauthinfo = computer.get_dbauthinfo(get_automatic_user())
dbauthinfo = computer.get_dbauthinfo(orm.User.objects.get_default())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done more succinctly now

computer = Computer.get('deneb')
transport = computer.get_transport()
scheduler = computer.get_scheduler()
scheduler.set_transport(transport)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced

This way we have greater control over its creation and we don't have to
implement a reset method which starts to get complicate.  We just throw
the object away and instsantiate a new one when we need to reset

 * Removed construct_backend() (replaced by get_manager().get_backend())
 * Changed all tests to not specify the backend, they should just do
what user facing code does which is use the default one and if we ever
want to test with different backends we just implement a function to
load a new backend from a profile globally
* Moved getting default user email to the `Profile` class
@sphuber sphuber merged commit 0cce69f into aiidateam:provenance_redesign Nov 30, 2018
ltalirz added a commit to ltalirz/aiida-core that referenced this pull request Feb 14, 2019
PR aiidateam#2214 stripped of safeguards that prevented the default user from
being deleted in between tests.
Since AiiDA expects a default user in many operations, leaving the DB
in a state with no default user is not ideal.

This commit reintroduces safeguards that result in the default user
remaining untouched.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants