Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/add scope filter #14

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Feature/add scope filter #14

wants to merge 10 commits into from

Conversation

daniellrgn
Copy link
Contributor

@daniellrgn daniellrgn commented Apr 3, 2024

Limit user access to requests for their patient resource and resources that reference their patient resource, based on the id found in their token.

Limit user access to requests for their patient resource and resources that reference their patient resource, based on the id found in their token.

from jwt_proxy.audit import audit_HAPI_change

blueprint = Blueprint('auth', __name__)
SUPPORTED_METHODS = ('GET', 'POST', 'PUT', 'DELETE', 'OPTIONS')

# TODO: to be pulled into its own module and loaded per config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i love the TODO. makes sense to me to move this to a separate module - by that i mean a separate .py file for easier migration later.

return False

user_id = token.get("sub")
identifier_pattern = rf"(https(:|%3[Aa])(\/|%2[Ff]){2}keycloak\.ltt\.cirg\.uw\.edu(%7[Cc]|\|))?{user_id}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is rather hardcoded to one keycloak install. it would be nice to look to a config value, similar to UPSTREAM_SERVER.

any reason to use re for URL parsing, vs. say urlparse from urllib.parse ?

Copy link
Contributor Author

@daniellrgn daniellrgn Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Paul, good points -

This pattern just matches identifier-like query param values like https://keycloak.ltt.cirg.uw.edu|keycloak-user-id
The code system https://keycloak.ltt.cirg.uw.edu is a dummy system url that cPRO adds to the LTT patient resources to identify keycloak user ids, and would certainly better live in config to support other codings/identifiers/systems.

I wasn't aware of urlparse, so thanks for the recommendation! I can't tell if it's quite equivalent to the use of this pattern though: it seems closer to a replacement for Flask's req.args call in the next line that parses the query params from the url, but if there's a better way to validate the content of a specific query param value there I'm all ears.

All that being said, I'll clean up this pattern - I'd hedged it a bit since I didn't know exactly what was url-encoded and what wasn't, but I've found that the query param values are always url-decoded by the req.args call here, and the later check against the identifier params in a POSTed resource's references will always come through url-encoded.

jwt_proxy/api.py Outdated
user_info=decoded_token.get("email") or decoded_token.get("preferred_username"),
)
return response_content
return jsonify(message="invalid request"), 400
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this implies a lack of authorization, a 401 is a more accurate code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that 400 generally doesn't fit. I'd advocate to keep the 400 response on parsing errors like get_json failing though.
Would a 403 in response be an even better semantic fit, as by this point the user is authenticated but unauthorized to make the request, and reauthenticating will not help?

Copy link
Contributor Author

@daniellrgn daniellrgn Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disregard that first bit, I didn't realize werkzeug's BadRequest errors automatically convert to 400 when left uncaught. They sure do think these things through 😛

jwt_proxy/api.py Outdated
if id_param_value is not None and re.search(identifier_pattern, id_param_value):
return True
# Search body for keycloak id
if req.is_json:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the request.json property is a nice shortcut. will be None if not included or wrong content-type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The Flask docs recommended preferring get_json(), though it looks like I don't need this check as get_json() includes it implicitly.

jwt_proxy/api.py Outdated
return response_content

# TODO: call new function here to dynamically load a filter call dependent on config; hardwired for now
if scope_filter(request, decoded_token):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably best to exit early (with the return jsonify... as you have). i fear we'll get a deep nest of it clauses when additional filters come online.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Paul! was going to mention something similar; sometimes called a "guard clause"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thanks both!

# TODO: to be pulled into its own module and loaded per config
def scope_filter(req, token):
# Check path
resource_pattern = rf"(Patient|DocumentReference)$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear this approach leaves a backdoor to well tailored requests. A user could include a string like "Patient" in say an or-clause of a search, or the like.

Any reason to not just limit to the requested resource in the request path? i.e. {url}/Patient or {url}/DocumentReference for this use case?

Copy link
Contributor Author

@daniellrgn daniellrgn Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not just limit to the requested resource in the request path? i.e. {url}/Patient or {url}/DocumentReference for this use case?

That's the intention of this pattern, to enforce the request path (sans params) ends in either Patient or DocumentReference (it's checked against req.path, which will typically be fhir/DocumentReference etc.). I preferred containing this check within the function to defining a url rule or something, but let me know if there's a better/more pythonic way!

Copy link
Member

@ivan-c ivan-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! There's more work remaining, so your TODOs/caveats are much appreciated when things settle down a bit

jwt_proxy/api.py Outdated
return response_content

# TODO: call new function here to dynamically load a filter call dependent on config; hardwired for now
if scope_filter(request, decoded_token):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Paul! was going to mention something similar; sometimes called a "guard clause"

Guard clause prior to proxy
Response code change to 403 on failure
json handling improvement
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants