-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/add scope filter #14
base: main
Are you sure you want to change the base?
Conversation
Limit user access to requests for their patient resource and resources that reference their patient resource, based on the id found in their token.
|
||
from jwt_proxy.audit import audit_HAPI_change | ||
|
||
blueprint = Blueprint('auth', __name__) | ||
SUPPORTED_METHODS = ('GET', 'POST', 'PUT', 'DELETE', 'OPTIONS') | ||
|
||
# TODO: to be pulled into its own module and loaded per config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i love the TODO. makes sense to me to move this to a separate module - by that i mean a separate .py file for easier migration later.
return False | ||
|
||
user_id = token.get("sub") | ||
identifier_pattern = rf"(https(:|%3[Aa])(\/|%2[Ff]){2}keycloak\.ltt\.cirg\.uw\.edu(%7[Cc]|\|))?{user_id}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is rather hardcoded to one keycloak install. it would be nice to look to a config value, similar to UPSTREAM_SERVER
.
any reason to use re
for URL parsing, vs. say urlparse
from urllib.parse
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Paul, good points -
This pattern just matches identifier
-like query param values like https://keycloak.ltt.cirg.uw.edu|keycloak-user-id
The code system https://keycloak.ltt.cirg.uw.edu
is a dummy system url that cPRO adds to the LTT patient resources to identify keycloak user ids, and would certainly better live in config to support other codings/identifiers/systems.
I wasn't aware of urlparse
, so thanks for the recommendation! I can't tell if it's quite equivalent to the use of this pattern though: it seems closer to a replacement for Flask's req.args
call in the next line that parses the query params from the url, but if there's a better way to validate the content of a specific query param value there I'm all ears.
All that being said, I'll clean up this pattern - I'd hedged it a bit since I didn't know exactly what was url-encoded and what wasn't, but I've found that the query param values are always url-decoded by the req.args
call here, and the later check against the identifier params in a POSTed resource's references will always come through url-encoded.
jwt_proxy/api.py
Outdated
user_info=decoded_token.get("email") or decoded_token.get("preferred_username"), | ||
) | ||
return response_content | ||
return jsonify(message="invalid request"), 400 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this implies a lack of authorization, a 401 is a more accurate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that 400 generally doesn't fit. I'd advocate to keep the 400 response on parsing errors like get_json failing though.
Would a 403 in response be an even better semantic fit, as by this point the user is authenticated but unauthorized to make the request, and reauthenticating will not help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disregard that first bit, I didn't realize werkzeug's BadRequest errors automatically convert to 400 when left uncaught. They sure do think these things through 😛
jwt_proxy/api.py
Outdated
if id_param_value is not None and re.search(identifier_pattern, id_param_value): | ||
return True | ||
# Search body for keycloak id | ||
if req.is_json: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the request.json
property is a nice shortcut. will be None
if not included or wrong content-type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The Flask docs recommended preferring get_json()
, though it looks like I don't need this check as get_json() includes it implicitly.
jwt_proxy/api.py
Outdated
return response_content | ||
|
||
# TODO: call new function here to dynamically load a filter call dependent on config; hardwired for now | ||
if scope_filter(request, decoded_token): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably best to exit early (with the return jsonify...
as you have). i fear we'll get a deep nest of it clauses when additional filters come online.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Paul! was going to mention something similar; sometimes called a "guard clause"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, thanks both!
# TODO: to be pulled into its own module and loaded per config | ||
def scope_filter(req, token): | ||
# Check path | ||
resource_pattern = rf"(Patient|DocumentReference)$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear this approach leaves a backdoor to well tailored requests. A user could include a string like "Patient" in say an or-clause of a search, or the like.
Any reason to not just limit to the requested resource in the request path? i.e. {url}/Patient
or {url}/DocumentReference
for this use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to not just limit to the requested resource in the request path? i.e. {url}/Patient or {url}/DocumentReference for this use case?
That's the intention of this pattern, to enforce the request path (sans params) ends in either Patient
or DocumentReference
(it's checked against req.path
, which will typically be fhir/DocumentReference
etc.). I preferred containing this check within the function to defining a url rule or something, but let me know if there's a better/more pythonic way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks! There's more work remaining, so your TODOs/caveats are much appreciated when things settle down a bit
jwt_proxy/api.py
Outdated
return response_content | ||
|
||
# TODO: call new function here to dynamically load a filter call dependent on config; hardwired for now | ||
if scope_filter(request, decoded_token): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Paul! was going to mention something similar; sometimes called a "guard clause"
Guard clause prior to proxy Response code change to 403 on failure json handling improvement
Limit user access to requests for their patient resource and resources that reference their patient resource, based on the id found in their token.