Verdi node delete 923 #1083

Merged
merged 13 commits into from
Feb 5, 2018

Conversation

lekah
Contributor

@lekah lekah commented Jan 26, 2018

verdi node delete via the command line is introduced, making the secret script no longer necessary.
Made a test for that functionality and updated the documentation.

lekah added 4 commits January 26, 2018 15:00
…the command-line interface. A new module in utils takes care of the abstract part (querying of nodes to delete, with the provenance followed in reverse). Backend-specific utilities take care of the deletion in the DB.
action='store_true')
# Commenting this option for now
# parser.add_argument('-f', '--force', help='force deletion, disables final user confirmation', action='store_true')
parser.add_argument('-v', '--verbosity', help='verbosity level', action='count', default=1)

for the verbosity level: not sure it's very clear like this...
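
For context on the verbosity question: a minimal standalone sketch of how argparse's `action='count'` behaves together with `default=1`. The parser below is a hypothetical stand-in (the program name and helper function are assumptions), not the actual verdi parser; only the `-v` option mirrors the snippet above.

```python
import argparse

# Hypothetical stand-in parser; only the -v option mirrors the reviewed snippet.
parser = argparse.ArgumentParser(prog='node-delete-demo')
parser.add_argument('-v', '--verbosity', help='verbosity level',
                    action='count', default=1)


def verbosity_for(argv):
    """Return the verbosity that argparse computes for the given arguments."""
    return parser.parse_args(argv).verbosity


# With default=1 the counter starts at 1, so every -v adds to that baseline.
levels = {flags: verbosity_for(flags.split()) for flags in ('', '-v', '-vv')}
print(levels)
```

In this sketch, no flag gives 1, `-v` gives 2 and `-vv` gives 3, so the command line can only raise the level above the default and never lower it.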

Contributor Author

This has been addressed in 3fcab81

@nmounet nmounet left a comment

Seems good, but since this is quite dangerous I think one needs at least one more review (and maybe an approval from GP?)

Contributor

@szoupanos szoupanos left a comment

Overall and from a fast look it looks OK (but the details are also important). For example, I am not totally sure about the link status and if we cover all the cases with the tests. Maybe we can chat directly.

It's not a bad idea to involve @giovannipizzi to the discussion.


session = sa.get_scoped_session()

with session.begin(subtransactions=True):
Contributor

At aiida.orm.implementation.sqlalchemy.code.delete_code I see the following

        from aiida.backends.sqlalchemy import get_scoped_session
        session = get_scoped_session()
        session.begin(subtransactions=True)
        try:
            code.dbnode.delete()
            session.commit()
        except:
            session.rollback()
            raise

Which should be the verbose way of what you wrote. I suppose that at the end of the with clause/statement, there is an auto-commit (according to http://docs.sqlalchemy.org/en/latest/orm/session_transaction.html#session-autocommit).

It is worth verifying that for any kind of error in the queries, there is a rollback. I suppose there is, but...
The same for Django
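
To illustrate the equivalence being discussed, here is a minimal sketch that uses the standard-library `sqlite3` module as an analogy. It is neither SQLAlchemy nor AiiDA code: the toy `node` table and all function names are assumptions, and the behaviour of SQLAlchemy's own session context manager should be checked against its documentation. The sketch compares an explicit try/except with a `with` block, once with valid queries and once with a deliberately wrong one.

```python
import sqlite3


def make_db():
    """Create an in-memory database with three rows in a toy 'node' table."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE node (id INTEGER PRIMARY KEY)')
    conn.executemany('INSERT INTO node (id) VALUES (?)', [(1,), (2,), (3,)])
    conn.commit()
    return conn


def count_nodes(conn):
    return conn.execute('SELECT COUNT(*) FROM node').fetchone()[0]


def delete_explicit(conn, fail):
    """Verbose form: commit on success, roll back and re-raise on error."""
    try:
        conn.execute('DELETE FROM node WHERE id = 1')
        if fail:
            conn.execute('DELETE FROM no_such_table')  # deliberately wrong query
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def delete_with_block(conn, fail):
    """Compact form: the connection's context manager commits or rolls back."""
    with conn:
        conn.execute('DELETE FROM node WHERE id = 1')
        if fail:
            conn.execute('DELETE FROM no_such_table')  # deliberately wrong query


results = {}
for name, func in (('explicit', delete_explicit), ('with', delete_with_block)):
    for fail in (False, True):
        conn = make_db()
        try:
            func(conn, fail)
        except sqlite3.OperationalError:
            pass  # expected for the deliberately wrong query
        results[(name, fail)] = count_nodes(conn)
        conn.close()
print(results)
```

In both forms the first deletion is kept when all queries succeed and undone when a later query fails.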

Contributor Author

The Django part is taken from the existing script, and the same logic is replicated for SQLAlchemy.
The Django code works on big databases; I have been using it for a while in its current form.
Regarding your comment on verbosity, I disagree: I do explicit deletion of the links, since it's more robust than relying on deletion cascades (which I'm not sure are implemented) for the links. Maybe the above example for code deletion should be checked?

From the docs that you cite I understand that using the session in a try-except clause or with a context manager is equivalent. As for the 'it is worth verifying': that's the reviewer's job ;)

Contributor

It is not actually a proof of correctness that it runs on big databases (it's an indication). I also never mentioned to do deletion propagation etc.

An overall comment. The reviewers' job is to make some constructive comments on the code and highlight some potential corner cases (of course they should be somehow answered constructively since the author asked the reviewer's opinion & the reviewer invested some time to give feedback).

Unfortunately, I don't have the time to check your code for all these corner cases.

Contributor Author

Me see problem, I make fix... Seriously, let's not have a big discussion here, about proofs of correctness, and the role of reviewers. I added the explicit try-and-except clause that you wanted and made some more tests with deliberately wrong queries. The rollback works. Actually, I found that, different from Django, the cascading is not implemented for group-membership, so I also delete on the dbnode_dbgroup table in 83b0478 .
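
A minimal sketch of the explicit-deletion idea, again using the standard-library `sqlite3` module rather than the actual AiiDA backends. The three-table schema (`node`, `link`, `node_group`) and the function names are simplified assumptions for illustration. Foreign keys are declared without any cascade, so rows that reference a node have to be removed before the node itself.

```python
import sqlite3

# Simplified, hypothetical schema: no ON DELETE CASCADE anywhere.
SCHEMA = """
CREATE TABLE node (id INTEGER PRIMARY KEY);
CREATE TABLE link (
    input_id INTEGER NOT NULL REFERENCES node(id),
    output_id INTEGER NOT NULL REFERENCES node(id)
);
CREATE TABLE node_group (
    node_id INTEGER NOT NULL REFERENCES node(id),
    group_name TEXT NOT NULL
);
"""


def make_db():
    conn = sqlite3.connect(':memory:')
    conn.execute('PRAGMA foreign_keys = ON')  # enforce the references
    conn.executescript(SCHEMA)
    conn.executemany('INSERT INTO node VALUES (?)', [(1,), (2,), (3,)])
    conn.executemany('INSERT INTO link VALUES (?, ?)', [(1, 2), (2, 3)])
    conn.execute("INSERT INTO node_group VALUES (2, 'demo')")
    conn.commit()
    return conn


def delete_nodes(conn, pks):
    """Delete links and group memberships first, then the nodes, in one transaction."""
    marks = ','.join('?' * len(pks))
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            f'DELETE FROM link WHERE input_id IN ({marks}) OR output_id IN ({marks})',
            pks + pks)
        conn.execute(f'DELETE FROM node_group WHERE node_id IN ({marks})', pks)
        conn.execute(f'DELETE FROM node WHERE id IN ({marks})', pks)


# Deleting a node directly fails while links or memberships still point at it.
naive = make_db()
try:
    naive.execute('DELETE FROM node WHERE id = 2')
    naive_failed = False
except sqlite3.IntegrityError:
    naive_failed = True
naive.close()

conn = make_db()
delete_nodes(conn, [2, 3])
remaining_nodes = [row[0] for row in conn.execute('SELECT id FROM node')]
remaining_links = conn.execute('SELECT COUNT(*) FROM link').fetchone()[0]
remaining_memberships = conn.execute('SELECT COUNT(*) FROM node_group').fetchone()[0]
conn.close()
print(naive_failed, remaining_nodes, remaining_links, remaining_memberships)
```

Keeping all three deletions inside one transaction means a failure in any of them leaves the database unchanged.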

Contributor

Thanks for the changes

@lekah
Contributor Author

lekah commented Jan 30, 2018

Welcome to the party @giovannipizzi
I tried to make the command both transparent in usage and safe.
By default, it asks for user confirmation before deleting anything. Also, there is a dry-run option.
This particular feature has been requested by a multitude of users. I agree that it's dangerous, which is why the precautions and the prompt for user confirmation are in place.
I'm very open to putting another confirmation question, and/or printing a warning in all-caps.
(ARE YOU REALLY SURE YOU WANT TO DELETE X NODES?)
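
A minimal sketch of the safety pattern described here (dry run plus confirmation before anything is deleted). The function name, its parameters and the prompt wording are hypothetical and not taken from the PR; the `ask` parameter exists only so the prompt can be replaced in tests.

```python
def delete_with_confirmation(pks, delete_func, dry_run=False, ask=input):
    """Call delete_func(pks) only after an explicit 'y' from the user.

    Returns True if the deletion was carried out, False otherwise.
    """
    print('The following {} node(s) would be deleted: {}'.format(
        len(pks), sorted(pks)))
    if dry_run:
        print('Dry run: nothing was deleted.')
        return False
    answer = ask('ARE YOU REALLY SURE YOU WANT TO DELETE {} NODES? [y/N] '.format(
        len(pks)))
    if answer.strip().lower() != 'y':
        print('Aborted.')
        return False
    delete_func(pks)
    return True


deleted = []  # stands in for the database
outcomes = [
    delete_with_confirmation({1, 2}, deleted.extend, dry_run=True, ask=lambda _: 'y'),
    delete_with_confirmation({1, 2}, deleted.extend, ask=lambda _: ''),
    delete_with_confirmation({1, 2}, deleted.extend, ask=lambda _: 'y'),
]
print(outcomes, sorted(deleted))
```

Defaulting to "no" on an empty answer means that pressing Enter by accident never deletes anything.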

@giovannipizzi
Member

I gave my comments to @lekah personally, after which he did the two new commits. I think the way it is now is acceptable. Note that the follow-returns option has been removed from the command line (I think it's too dangerous, and it's not what people want). It's still in the code, to avoid removing the implementation.

@giovannipizzi
Member

Important question: what to do when deleting outputs of sealed calculations? @sphuber @muhrin
It shouldn't be allowed, what do you think? Otherwise the calculation looks ok but it misses some important output. Is this a constraint we want to have? As a consequence, should there be a flag for deleting 'creating' calculations?

@giovannipizzi
Member

After further discussion with @lekah we realised that there might be a number of reasons why users want to delete outputs. They should at least be aware that they are doing it. So, proposed solution:

  1. there is a check flag, True by default, that checks
    • if there are any Data nodes to be deleted that were CREATEd by a calculation that is not going to be deleted
    • if there are any Calculation nodes to be deleted that were CALLed by a calculation that is not going to be deleted
  2. In this case, it will print out a message with the pairs of nodes (input -> output) and a message explaining why in general this is discouraged
  3. If the user, anyway, wants to continue, we add an entry to the CalculationLog table (visible with verdi calculation logshow or verdi work report) that says that the user asked to delete an output, similarly to what is done when a calculation is killed - so at least it is written down, and when checking back old calculations, if something weird is seen, i.e. missing outputs, one can check why this is so
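
The check proposed in the list above could be sketched roughly as follows. This is a plain-Python illustration under stated assumptions: links are represented as `(source_pk, target_pk, link_type)` tuples, and the function name and link-type strings are hypothetical labels, not the AiiDA API or the queries used in the PR.

```python
CREATE, CALL, INPUT = 'CREATE', 'CALL', 'INPUT'  # labels as used in this thread


def find_losses(links, to_delete):
    """Return (surviving_pk, deleted_pk, link_type) for every CREATE or CALL
    link whose source is kept while its target is going to be deleted."""
    return sorted(
        (src, dst, kind) for src, dst, kind in links
        if kind in (CREATE, CALL) and dst in to_delete and src not in to_delete
    )


links = [
    (10, 11, CREATE),  # calculation 10 created data 11
    (11, 12, INPUT),   # data 11 is an input of calculation 12
    (12, 13, CREATE),  # calculation 12 created data 13
    (20, 12, CALL),    # calculation 20 called calculation 12
]
losses = find_losses(links, to_delete={11, 12, 13})
for src, dst, kind in losses:
    print('WARNING: node {} would lose node {} (link type {})'.format(src, dst, kind))
```

The pair (12, 13) is not reported because both ends are deleted together; only nodes that survive and lose an output or a called instance are flagged.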

@sphuber
Contributor

sphuber commented Jan 31, 2018

Do these final checks hold for all calculations or only sealed ones? Do we now have a clear definition and implementation of the sealed concept. I for one am not sure what exactly it entails.

@lekah
Contributor Author

lekah commented Jan 31, 2018

For all calculations

…a logging mechanism for calculations that lose created data or called instance. Implemented a printout when this happens and storing to DbLog. To be reviewed for further improvements.
@lekah
Contributor Author

lekah commented Jan 31, 2018

Commit c0da5c4 addresses the first set of remarks by @giovannipizzi. There is a query to find all calculations that lose created data, and a query for calculations that lose called instances.
Messages printed to stdout inform the user, and in case of deletion, the information is written to DbLog.
There is no check flag yet; this will follow asap.

@szoupanos
Contributor

I had more time today and I looked at it more carefully.
What I still miss is which links are available now (this is more or less "easy" to find), and how these are created (apart from the obvious: CREATE, RETURN, INPUT etc). So I need a bit more input to closely follow the checks in aiida/utils/delete_nodes.py and to judge whether they are sufficient.

Do we have a draft on the existing link types?

@lekah
Contributor Author

lekah commented Feb 1, 2018

Current link types are enumerated (literally) here: http://aiida-core.readthedocs.io/en/latest/_modules/aiida/common/links.html
For how these are created, check for example the Node._add_dblink_from method and everything that calls it, if you're interested in higher-level implementation. @sphuber , is there a draft on currently existing link types?

@sphuber
Contributor

sphuber commented Feb 1, 2018

If by 'draft' you are referring to the document that I started writing and presented at some point, that is all still purely hypothetical and none of it has been implemented. You already pointed to the link types that currently exist in v0.11.0. As to how they are used, the best thing would be to look at the discussion in issue #687 where we defined the migration rules to retroactively write the link types for older databases. These should reflect the rules that the code currently uses to create new links.

…ations that would lose created data or called instance will get this action written into the log. Additional warnings are printed, but it is still allowed to delete data without its creator or called workflows without their callers, since there are use cases.
@lekah
Contributor Author

lekah commented Feb 1, 2018

71cb8df addresses the comments and suggestions by @giovannipizzi. Calculations will have it written to the log if created or called instances are deleted. There is a flag to disable these checks, but it cannot be set via the command line.

Member

@giovannipizzi giovannipizzi left a comment

I am going to approve this. The documentation on the link types is indeed missing but it is a different issue. When the documentation is ready, we can check if we need to add a few more checks (the only one coming to my mind is a warning for 'RETURN'ed stuff, similarly to CREATEd). The rest should be ok as we automatically delete all children via INPUT or CREATE, at least in this version.
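
As an illustration of "delete all children via INPUT or CREATE", a minimal breadth-first sketch in plain Python. The tuple representation of links, the function name and the link-type strings are assumptions made for this example; the PR itself performs this step with database queries.

```python
from collections import deque


def collect_descendants(links, start_pks, follow=('INPUT', 'CREATE')):
    """Return the start nodes plus everything reachable from them by
    following only the given link types from source to target."""
    children = {}
    for src, dst, kind in links:
        if kind in follow:
            children.setdefault(src, []).append(dst)

    to_delete = set(start_pks)
    queue = deque(start_pks)
    while queue:
        pk = queue.popleft()
        for child in children.get(pk, []):
            if child not in to_delete:
                to_delete.add(child)
                queue.append(child)
    return to_delete


links = [
    (1, 2, 'INPUT'),   # data 1 is an input of calculation 2
    (2, 3, 'CREATE'),  # calculation 2 created data 3
    (2, 4, 'RETURN'),  # not followed by default
    (5, 2, 'CALL'),    # node 5 is a caller of 2, not a child
    (3, 6, 'INPUT'),   # data 3 is an input of calculation 6
]
to_delete = collect_descendants(links, [1])
print(sorted(to_delete))
```

Node 4 is left out because RETURN links are not followed here; passing `follow=('INPUT', 'CREATE', 'RETURN')` would include it, which corresponds to the follow-returns behaviour that was removed from the command line.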

@giovannipizzi
Member

I am not merging yet, in case someone wants to give some final comments.

@szoupanos
Contributor

Yes, I agree. I also had a short chat with Leo and I also agree with what @giovannipizzi said.
We can add more checks in the future, if we discover that something is missing.

@nmounet nmounet merged commit 9ea1a5e into aiidateam:develop Feb 5, 2018
This was referenced Mar 1, 2018