REST API: POST endpoint for QueryBuilder queryhelp JSON payloads #4337

Merged

Contributor

@CasperWA CasperWA commented Aug 28, 2020

Closes #3646

This is the first pass at implementing a /query-endpoint with an HTTP POST functionality to pass a QueryBuilder queryhelp JSON object and receive the results back in the "standard" AiiDA REST API data format.

A few things differ for the returned "standard" data format:

  • All entities will have the full_type key. (See comment below).
    This is a small price to pay (in my opinion) in order to get the value for this for custom plugin Nodes when relevant (the NodeTranslator is used for this endpoint to be as general as possible).
  • Links will not have the link_type and link_label keys, but these will still be present as the regular type and label keys, respectively, which are returned normally from QueryBuilder.
    We may consider also removing these extra keys from the regular output in other endpoints with links, but that's for another time.

So far, I have tested this with a variety of Nodes, trying to get incoming and outgoing, projecting various and differing things. The latter has led to the code's current state, which ensures that what is returned is equivalent to what the QueryBuilder would normally return.
I have also tested retrieving different entities, e.g., the User related to a Node. This also works fine.

For now, I have implemented it such that the payload (the posted queryhelp JSON) will be taken at face value and passed on. This is done for simplicity and because I expect most will use this by setting up a QueryBuilder instance and passing the instance's queryhelp property as a payload. However, manually written queryhelp dictionaries dumped as JSON should work as well, as long as they work for the QueryBuilder.
One could create a parser that re-defines the queryhelp keys; however, I don't see the point of this right now.
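
A minimal sketch of what such a request could look like, using only the standard library. The base URL, the `build_query_request` helper, and the hand-written queryhelp are assumptions for illustration; the `/querybuilder` path is the endpoint name used later in this thread, and in practice the payload would typically come from a QueryBuilder instance's queryhelp property.

```python
import json
import urllib.request

# Assumed base URL of a locally running AiiDA REST API; adjust as needed.
BASE_URL = 'http://localhost:5000/api/v4'

# Hand-written stand-in for `QueryBuilder(...).queryhelp` (illustrative only).
QUERYHELP = {
    'path': [{'entity_type': 'data.Data.', 'tag': 'data'}],
    'filters': {'data': {'label': {'like': 'test%'}}},
    'project': {'data': ['id', 'uuid', 'label']},
    'limit': 10,
}


def build_query_request(queryhelp, base_url=BASE_URL):
    """Return a POST request carrying the queryhelp as a JSON body (hypothetical helper)."""
    return urllib.request.Request(
        url=f'{base_url}/querybuilder',
        data=json.dumps(queryhelp).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


request = build_query_request(QUERYHELP)
# Sending it requires a running server:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

The request is only constructed here, not sent, so the snippet can be run without a server.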

Currently missing in this PR:

  • From @giovannipizzi (see comment below): Allow a config file to enable/disable this endpoint (e.g., we would disable it on MC by default).
  • Tests.
  • Account for manually written queryhelp dictionaries that work with QueryBuilder, but may fail in the POST functionality. (This may include custom projection values.)
    Somewhat fixed, specifically in terms of the project keyword (using the wildcard (*) or not including the keyword).
  • Determine whether a custom Translator class should be created for this endpoint, or if it's overkill. This will, e.g., make it able to accommodate the differences in the returned "standard" data format mentioned above, but also adds a lot of extra code, which may obscure the functionality (or make it more transparent, depending on the code-reader).
    Edit: This does not seem to be necessary. It can still be done in the future, but using NodeTranslator or similar is fine.
  • Possibly add entry in documentation about this feature.

@CasperWA CasperWA added the pr/work-in-progress PR that is still work in progress but already needs discussion label Aug 28, 2020
@codecov

codecov bot commented Aug 28, 2020

Codecov Report

Merging #4337 (4c9d44a) into develop (2c5293d) will increase coverage by 0.06%.
The diff coverage is 94.74%.

@@             Coverage Diff             @@
##           develop    #4337      +/-   ##
===========================================
+ Coverage    79.46%   79.52%   +0.06%     
===========================================
  Files          484      484              
  Lines        35775    35820      +45     
===========================================
+ Hits         28426    28481      +55     
+ Misses        7349     7339      -10     
Flag Coverage Δ
django 73.76% <94.74%> (+0.07%) ⬆️
sqlalchemy 72.95% <94.74%> (+0.06%) ⬆️

Impacted Files Coverage Δ
aiida/restapi/common/config.py 100.00% <ø> (ø)
aiida/restapi/common/identifiers.py 78.69% <0.00%> (ø)
aiida/restapi/resources.py 97.26% <95.66%> (-0.35%) ⬇️
aiida/cmdline/commands/cmd_restapi.py 100.00% <100.00%> (ø)
aiida/restapi/api.py 95.66% <100.00%> (+19.57%) ⬆️
aiida/restapi/run_api.py 90.91% <100.00%> (+0.22%) ⬆️
aiida/restapi/translator/nodes/node.py 85.44% <100.00%> (+0.52%) ⬆️
aiida/transports/plugins/local.py 81.54% <0.00%> (-0.25%) ⬇️
aiida/engine/daemon/client.py 75.65% <0.00%> (+1.04%) ⬆️

@giovannipizzi
Member

Thanks! One quick comment - queries might become very expensive. Should we put in some safeguards to avoid users creating (inadvertently or on purpose) DOS attacks?
A few options:

  • allow in a config file to enable/disable this endpoint (e.g. we would disable it on MC by default)
  • put by default a limit (e.g. 1000?) to any submitted query? (might be unexpected, though, and not enough to avoid high CPU/disk/mem load for long times)
  • other ideas? e.g. is it possible to ask Postgres to stop running a query if it takes longer than XXX seconds?

@CasperWA
Contributor Author

Thanks! One quick comment - queries might become very expensive. Should we put in some safeguards to avoid users creating (inadvertently or on purpose) DOS attacks?

Very interesting! And good catch!

A few options:

  • allow in a config file to enable/disable this endpoint (e.g. we would disable it on MC by default)

Easily done, and probably a good idea to include in combination with some actual timeout (or other) handling when it's enabled.

  • put by default a limit (e.g. 1000?) to any submitted query? (might be unexpected, though, and not enough to avoid high CPU/disk/mem load for long times)

Do you mean for number of returned results?
If yes, then I definitely think something similar to pagination is needed here, either simply reusing the current pagination implementation in some way, or return a streaming response, where one can retrieve batches of the result.
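
A minimal sketch of the "default limit" idea, assuming the posted queryhelp is a plain dictionary with an optional `limit` key. The `apply_default_limit` helper and the value 1000 are hypothetical (the number is only the one floated above) and are not part of this PR.

```python
import copy

DEFAULT_LIMIT = 1000  # Value floated in the discussion; not a setting of the PR.


def apply_default_limit(queryhelp, max_limit=DEFAULT_LIMIT):
    """Return a copy of the queryhelp whose `limit` never exceeds `max_limit`."""
    capped = copy.deepcopy(queryhelp)
    requested = capped.get('limit')
    if requested is None or requested > max_limit:
        # No limit, or a too generous one, was requested: fall back to the cap.
        capped['limit'] = max_limit
    return capped
```

As noted in the suggestion, a cap like this only bounds the number of returned rows; it would not by itself prevent an expensive query from loading the database.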

  • other ideas? e.g. is it possible to ask Postgres to stop running a query if it takes longer than XXX seconds?

I have looked into this quickly, and it seems there is indeed a way to set a PostgreSQL timeout for handling a query; see statement_timeout in this stackexchange post.
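
A minimal sketch of how `statement_timeout` could be passed at connection time, assuming a libpq-based driver such as psycopg2, which forwards the `options` parameter to the server. The helper name, database name, and user are placeholders, and nothing like this is implemented in the PR.

```python
def connection_kwargs_with_timeout(dbname, user, timeout_ms=5000):
    """Build connection keyword arguments that ask PostgreSQL to abort slow statements.

    `options` is forwarded by libpq to the server; `statement_timeout` is in milliseconds.
    """
    return {
        'dbname': dbname,
        'user': user,
        'options': f'-c statement_timeout={int(timeout_ms)}',
    }


# Placeholder database name and user, for illustration only.
kwargs = connection_kwargs_with_timeout('aiida_db', 'aiida', timeout_ms=5000)
# With psycopg2 installed and a database available, one could then call:
# connection = psycopg2.connect(**kwargs)
```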

Member

@ltalirz ltalirz left a comment

Thanks @CasperWA - I think this is potentially extremely useful, and so I would encourage you to go further in this direction, e.g. by adding some basic tests (no need to go overboard, just a few is fine).

@ltalirz
Member

ltalirz commented Nov 27, 2020

@CasperWA I've just met with @flavianojs who is working on the REST API for the aiida-ginestra plugin.
This generic querybuilder endpoint might allow him to replace (or simplify the implementation of) some of the specific query-endpoints he added for this plugin, so he would be happy to help here, e.g. with review or e.g. adding some more tests if needed.

Please let him know what you think would be the best way forward to get this PR merged.

@CasperWA CasperWA force-pushed the close_3646_json-queryhelp-rest-posts branch from 7a60fce to f7471fd Compare December 10, 2020 13:55
@CasperWA
Contributor Author

CasperWA commented Dec 10, 2020

Update

  • I have turned the data value in the response into a dictionary/JSON object, where each key is the QueryBuilder tag, and the value (as usual) is a list of dictionaries/JSON objects with the desired projections.
  • Furthermore, I have forcefully removed all full_type "projections" from any entity (also Nodes). This is due to the realization that ensuring full_type is correct when node_type is explicitly not included as a projection is quite a substantial task, at least in a general way. I could "hack" the query done for the /querybuilder endpoint to always include node_type and process_type, and remove them later, but this will only solve it for that particular endpoint and not for the REST API in general. Getting that information in general is much more difficult due to the way the REST API is designed.
    Hence, my solution was simply to remove the key. This also solves the issue that it was included for non-Node entities, where it is not valid (e.g., for a User).
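
To illustrate the response shape described in the first point, here is a minimal sketch that regroups per-row results into one list per QueryBuilder tag. It assumes rows shaped like the output of `QueryBuilder.dict()` (one dictionary per row, keyed by tag); the sample values and the `group_by_tag` helper are invented and are not the code in this PR.

```python
from collections import defaultdict

# Rows in the shape returned by `QueryBuilder.dict()`: one dict per result row,
# keyed by tag, each holding the projected properties (sample values invented).
rows = [
    {'structure': {'id': 1, 'label': 'Si'}, 'calc': {'id': 10}},
    {'structure': {'id': 2, 'label': 'Ge'}, 'calc': {'id': 11}},
]


def group_by_tag(rows):
    """Collect the projections of every row into one list per tag."""
    data = defaultdict(list)
    for row in rows:
        for tag, projections in row.items():
            data[tag].append(projections)
    return dict(data)


data = group_by_tag(rows)
```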

@chrisjsewell
Member

chrisjsewell commented Dec 10, 2020

Hmm, I'm not averse to the solution here,
but I do think the more "full/standardised" solution for this would be to "finalise",
@dev-zero's https://github.com/dev-zero/aiida-graphql implementation.

What are your thoughts on your implementation vs the graphql one?

@CasperWA
Contributor Author

CasperWA commented Dec 10, 2020

Hmm, I'm not averse to the solution here,
but I do think the more "full/standardised" solution for this would be to "finalise",
@dev-zero's https://github.com/dev-zero/aiida-graphql implementation.

What are your thoughts on your implementation vs the graphql one?

The GraphQL one introduces a different kind of query language and query methodology.
This implementation utilizes the "QueryBuilder language" for querying, but the same method/protocol as for GraphQL (the only similarity).
They are two very distinct implementations, but may result in similar results.

I have already noted the connection to the GraphQL implementation in the related issue #3646.

In summary, I don't see any issue with having both implementations, especially since this is a natural extension of the existing REST API to utilize the existing Python functionality directly.

@CasperWA CasperWA marked this pull request as ready for review December 10, 2020 16:58
@dev-zero
Contributor

dev-zero commented Dec 10, 2020

Sorry for not participating much in the discussion before.

With respect to general interfaces like this one, I think one argument against them in REST was always that they break the possibility of having an (effective) intermediate cache of the queries: most REST queries could potentially be cached on the document level via a reverse proxy, whereas with such a general interface you always have to hit the DB (or an object cache).
The other point I came across is about maintenance: with a REST API you can easily get statistics on which endpoint gets hit, what impact your changes to it will have, and whether or not you have to implement compat layers if your data organization in the backend changes (even things like implementing ACL could break it).
With a general interface like this one, the query language and the database itself become the API, limiting you severely when it comes to reorganizations in the backend.

A GraphQL-approach shares some of the drawbacks when it comes to cacheability (see link below), but you still maintain an intermediate layer between the internal data representation and the query (which also has its downside of course). Due to its generic nature one can employ a GraphQL-specific caching mechanism for the objects (which is again a caching on the returned data-representation rather than on the source objects).
Furthermore, since GraphQL became something of an industry standard there is a lot of tooling around which helps when building frontends to it (which may also have disadvantages since it may force you to update your schema to stay compliant).

If this is about developing clients which might access the objects/data directly, I am slowly turning towards classical server-side generated sites again. The main reason is the latency and the browser's capability of progressive rendering: a new client will first hit you for the web application, which, once it has loaded, has to hit the server again for data (2 round trips plus application startup time), while the server-side generated web app works in 1. Authentication makes it even more complex. Unless of course you need really live data streams, for which GraphQL again has a solution; the question would only be how to wire this up to the backend data model again.

Now, many of those points apply to general purpose designs, while AiiDA operates in a rather narrow and specialized field, in which some of the points may be weighted differently.

@chrisjsewell
Member

chrisjsewell commented Dec 10, 2020

thanks @dev-zero; what he said ☝️ 😆
again I'm not averse to the solution in this PR, and yes this does not exclude parallel development of a graphql solution.
But in practice, if this is in aiida-core then there will be absolutely no motivation to develop/maintain/use a graphql solution.
So we should just check that we are not going to come across any issues that graphql has already solved and basically end up "re-inventing it"

@CasperWA
Contributor Author

CasperWA commented Dec 10, 2020

But in practice, if this is in aiida-core then there will be absolutely no motivation to develop/maintain/use a graphql solution.

I don't believe that this is true. Even in the response by @dev-zero there are clear differences and use cases for the two approaches. Furthermore, the aiida-graphql package is already live, although not up-to-date with the latest AiiDA version, I believe. So the development has already mainly been done, although it might lack some flesh here and there.
In the end, if a use case arises where GraphQL is the way to go, it's all about picking up the aiida-graphql package again, brushing it off and using it.

So we should just check that we are not going to come across any issues that graphql has already solved and basically end up "re-inventing it"

This is not a case of implementing a functionality similar to GraphQL, however it is indeed inspired by it. Specifically @dev-zero's work with the aiida-graphql package presented at an AiiDA retreat.
This is more a case of making the REST API more dynamic in its use, while considering the specific use case of @flavianojs and the INTERSECT project, where the aiida-post package was developed for a Java frontend to use AiiDA in a GUI. This implementation should serve as the most basic functionality to accommodate this use case, while also serving as a test-case to see whether it makes sense at all to have this functionality in aiida-core for the REST API.
At least that's how I see it.

And thanks @dev-zero for sharing your comments here as well 👍 The caching issue is something I have considered, but not more than a second, since this in my eyes is a first "test" implementation, to see whether this functionality is useful or not. One can always create a plugin and extend the REST API with similar functionality if needed, but starting to implement it in aiida-core should allow for some closer integration and optimization (at a later point) if necessary.

@chrisjsewell
Member

chrisjsewell commented Dec 10, 2020

Even in the response by @dev-zero there are clear differences and use cases for the two approaches.

Happy for @dev-zero to set me straight, but to my eyes this is exactly the same use case as GraphQL: Querying a database via a web-orientated API.

But anyhow, I've said my piece and however we do it I think this will certainly be very useful 👍

@CasperWA
Contributor Author

CasperWA commented Dec 10, 2020

Happy for @dev-zero to set me straight, but to my eyes this is exactly the same use case as GraphQL: Querying a database via a web-orientated API.

I would consider that the same as saying Windows and OS X are exactly the same due to the same use case of creating a graphical operating system... 😅

@chrisjsewell
Member

chrisjsewell commented Dec 10, 2020

I would consider that the same as saying Windows and OS X are exactly the same due to the same use case of creating a graphical operating system... 😅

Are you trying to make my point for me 😝 because yes I would say they're pretty much the same these days; sure I can use VS Code, Firefox and Zoom on both, which takes care of ~90% of all I do 😂

@CasperWA
Contributor Author

I would consider that the same as saying Windows and OS X are exactly the same due to the same use case of creating a graphical operating system... 😅

Are you trying to make my point for me 😝 because yes I would say they're pretty much the same these days; sure I can use VS Code, Firefox and Zoom on both, which takes care of ~90% of all I do 😂

Hehe, fair enough 😅. In an attempt to be painfully clear though, my point is more that the specific use case will change things substantially. In the most current use case a Java GUI application already exists that uses the aiida-post package, i.e., POST requests to the REST API with standard AiiDA REST API responses in order to function. To make this work this is what's needed. If they want to reconfigure to use GraphQL or another use case should arise with a more streaming-like functionality requirement, I think GraphQL might be the better choice. So they can co-exist - just as Windows is always seen running virtually on a Mac (not really like that, but you know)... co-existing 😅

@CasperWA
Contributor Author

CasperWA commented Dec 10, 2020

Further update:

  • A posting configuration option has been added when generating the API, which will make endpoints available that allow HTTP POST requests (currently only /querybuilder).
    This can be toggled either as a parameter to the aiida.restapi.run_api:configure_api() function or as a CLI option for verdi restapi (here it's a togglable flag --posting/--no-posting).
    Default: True, i.e., include /querybuilder.
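
A minimal sketch of the toggle's effect, written without Flask so it stays self-contained. The `build_endpoints` helper and the list of other endpoints are simplified assumptions; the actual wiring lives in `configure_api()` and the `--posting/--no-posting` flag of `verdi restapi`.

```python
def build_endpoints(posting=True):
    """Return the endpoint paths to register (hypothetical, simplified)."""
    endpoints = ['/nodes', '/users', '/computers', '/groups']
    if posting:
        # Endpoints accepting HTTP POST are only exposed when enabled.
        endpoints.append('/querybuilder')
    return endpoints
```

With `posting=False`, a deployment that should not expose arbitrary queries would simply never register the `/querybuilder` path.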

@chrisjsewell
Member

i.e., POST requests to the REST API with standard AiiDA REST API responses in order to function. To make this work this is what's needed.

Final, final note; this is exactly how you can use graphql: you can literally just add it as an endpoint to Flask: https://strawberry.rocks/docs/flask

@CasperWA
Contributor Author

i.e., POST requests to the REST API with standard AiiDA REST API responses in order to function. To make this work this is what's needed.

Final, final note; this is exactly how you can use graphql: you can literally just add it as an endpoint to Flask: https://strawberry.rocks/docs/flask

I know. I believe this is the approach used in the aiida-graphql package by @dev-zero.

This was not a matter of implementation but of use case and query language.

@CasperWA CasperWA added pr/ready-for-review PR is ready to be reviewed and removed pr/work-in-progress PR that is still work in progress but already needs discussion labels Jan 25, 2021
@CasperWA CasperWA requested a review from ltalirz January 25, 2021 15:15
ltalirz previously approved these changes Jan 25, 2021
Member

@ltalirz ltalirz left a comment

thanks @CasperWA - just one more question regarding the full type

node_entry['full_type'] = (
construct_full_type(node_entry.get('node_type'), node_entry.get('process_type'))
if node_entry.get('node_type') or node_entry.get('process_type') else None
)
Member

@ltalirz ltalirz Jan 25, 2021

Oh, I see, so here is where the full_type is set?
Just for me to understand: This is the NodeTranslator class, but for some reason it also seems to be used to translate objects that are not nodes - is that the source of the problem?

And, finally, I guess you tried before instead of setting full_type to None to simply not set it here, but then tests break?

Contributor Author

Yeah, so this is what I tried to explain in previous comment answers.
To get all the various REST API-specific information in the response (and make it as similar as possible to other REST API responses), I need to use NodeTranslator as the translator class for the /querybuilder-endpoint.
This is because the NodeTranslator is special, adding full_type, which is not a property that otherwise exists in AiiDA. It's a quirk of the REST API.
As far as I know, this is the only addition to an AiiDA entity, and as such, if I use NodeTranslator for all entities, I make sure I also include the special REST API properties. The need for the subsequent removal of full_type should now be clear - it's not a property of any other AiiDA entity than Nodes. And even then, to define full_type both node_type and process_type are needed. If these properties are not requested in the POSTed queryhelp, they will not be available.
I could here ensure that they're always requested, but that would demand a lot of logic to go through the POSTed queryhelp. Something I didn't feel was necessary for this PR at this point. So instead I've opted for the current solution.

It's worth noting that the construct_full_type utility function actually had a bug. It used process_type twice (for both node_type and process_type parts of full_type). This PR fixes that bug, as well as removing the necessity for try/except in the highlighted code.
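
A minimal sketch of the behaviour discussed here: `full_type` is set as in the highlighted snippet and then dropped when it is `None`. The `construct_full_type` below is a simplified stand-in that assumes a `|` separator between the two parts (the real utility lives in the REST API code base), and `add_full_type` is a hypothetical helper.

```python
def construct_full_type(node_type, process_type):
    """Simplified stand-in: join both parts with a `|` separator (assumed format)."""
    return f"{node_type or ''}|{process_type or ''}"


def add_full_type(entry):
    """Set `full_type` as in the highlighted snippet, then drop it if it is None."""
    entry = dict(entry)  # Work on a copy so the input is left untouched.
    entry['full_type'] = (
        construct_full_type(entry.get('node_type'), entry.get('process_type'))
        if entry.get('node_type') or entry.get('process_type') else None
    )
    if entry['full_type'] is None:
        # Non-Node entities, or Nodes without the needed projections.
        del entry['full_type']
    return entry
```

For example, an entry without `node_type` and `process_type` (such as a User, or a Node for which neither was projected) comes back without a `full_type` key.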

Contributor Author

And, finally, I guess you tried before instead of setting full_type to None to simply not set it here, but then tests break?

It was more of a conscious choice to make the response as AiiDA REST API-like as possible. Including all the extra properties expected from any other AiiDA REST API response for the various entities.

Contributor Author

The main issue here relates to the fact that the REST API was not built to return multiple types of entities (this has been mentioned in issue #4676 as well). So I need a "catch-em-all"/"works-for-all"-translator :)

@CasperWA CasperWA force-pushed the close_3646_json-queryhelp-rest-posts branch 2 times, most recently from 31c40dd to 428e5aa Compare January 25, 2021 16:37
@ltalirz
Member

ltalirz commented Jan 25, 2021

docs still failing?

@CasperWA
Contributor Author

docs still failing?

Yeah. I didn't touch any docs, so I'm confused. Also the pre-commit is failing due to files that I didn't touch either.

@CasperWA CasperWA force-pushed the close_3646_json-queryhelp-rest-posts branch from 428e5aa to 74441c9 Compare January 25, 2021 17:00
@CasperWA
Contributor Author

Alright @ltalirz, I've completed splitting this up in a first commit that fixes some minor things in the REST API code base:

  • Use node_type in the construct_full_type() function.
  • Remove a couple of try/except statements in favor of newer Python syntax.
  • Properly use API_CONFIG in the configure_api() function.
    This one imported a user specified API_CONFIG, but then didn't use it to instantiate the Flask API.

The second commit concerns the purpose of this PR specifically.

@sphuber
Contributor

sphuber commented Jan 25, 2021

docs still failing?

Yeah. I didn't touch any docs, so I'm confused. Also the pre-commit is failing due to files that I didn't touch either.

Both are due to release of plumpy==0.18.4. PR #4669 should fix it when merged

@chrisjsewell
Member

Both are due to release of plumpy==0.18.4. PR #4669 should fix it when merged

indeed, @sphuber were you intending to have another look at that, because I'm happy to merge if others are?

@CasperWA
Contributor Author

I'll just merge this with rebase+merge then, is that fine @sphuber, @ltalirz, @chrisjsewell? Or do you want to wait for #4669 and/or do a different kind of merge?

@sphuber
Contributor

sphuber commented Jan 25, 2021

Both are due to release of plumpy==0.18.4. PR #4669 should fix it when merged

indeed, @sphuber were you intending to have another look at that, because I'm happy to merge if others are?

I am giving it a final pass now

I'll just merge this with rebase+merge then, is that fine @sphuber, @ltalirz, @chrisjsewell? Or do you want to wait for #4669 and/or do a different kind of merge?

Please wait until #4669 is merged and then rebase. We still want the tests to pass before merging

@chrisjsewell
Member

Please wait until #4669 is merged and then rebase. We still want the tests to pass before merging

indeed, tests should always pass before merging; cough @ramirezfranciscof cough #4675 😉

@sphuber
Contributor

sphuber commented Jan 25, 2021

Alright, the other PR is merged. You can rebase this and the tests should pass. Unless there were other failures of course.

@CasperWA CasperWA force-pushed the close_3646_json-queryhelp-rest-posts branch from 74441c9 to f95df2c Compare January 26, 2021 08:37
@CasperWA
Contributor Author

CasperWA commented Jan 26, 2021

Right. So the pre-commit is still failing due to files I didn't touch. @sphuber, @chrisjsewell ?
I can add a commit that fixes this?

@chrisjsewell
Member

I can add a commit that fixes this?

yeh it's a known issue that has cropped up before. I thought it had been fixed, but obviously not

@CasperWA CasperWA force-pushed the close_3646_json-queryhelp-rest-posts branch from 65046e1 to fda9197 Compare January 26, 2021 09:45
- Use node_type in construct_full_type().
- Don't use try/except for determining full_type.
- Remove unnecessary try/except in App for catch_internal_server.
- Use proper API_CONFIG for configure_api.
The POST endpoint returns what the QueryBuilder would return, when
providing it with a proper queryhelp dictionary.
Furthermore, it returns the entities/results in the "standard" REST API
format - with the exception of `link_type` and `link_label` keys for
links. However, these particular keys are still present as `type` and
`label`, respectively.

The special Node property `full_type` will be removed from any entity,
if its value is `None`. There are two cases where this will be True:
- If the entity is not a `Node`; and
- If neither `node_type` nor `process_type` is among the projected
properties for any given `Node`.

Concerning security:
The /querybuilder-endpoint can be toggled on/off with the configuration
parameter `CLI_DEFAULTS['POSTING']`.
Added this to `verdi restapi` as `--posting/--no-posting` option.
The option is hidden by default, as the naming may be changed in the
future.

Reviewed by @ltalirz.
@CasperWA CasperWA force-pushed the close_3646_json-queryhelp-rest-posts branch from fe7c387 to 4c9d44a Compare January 26, 2021 10:54
@CasperWA
Contributor Author

@chrisjsewell / @sphuber feel free to review the PR based on the latest commit 4c9d44a.
@ltalirz has already approved the first two commits above and there have been no content changes in them since.

@CasperWA CasperWA merged commit 4c9d44a into aiidateam:develop Jan 26, 2021
@CasperWA CasperWA deleted the close_3646_json-queryhelp-rest-posts branch January 26, 2021 11:32
Labels
pr/ready-for-review PR is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JSON queryhelp injections to REST API
7 participants