Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/add scope filter #14

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
34 changes: 33 additions & 1 deletion jwt_proxy/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,39 @@
import jwt
import requests
import json
import re

from jwt_proxy.audit import audit_HAPI_change

blueprint = Blueprint('auth', __name__)
SUPPORTED_METHODS = ('GET', 'POST', 'PUT', 'DELETE', 'OPTIONS')

# TODO: to be pulled into its own module and loaded per config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i love the TODO. makes sense to me to move this to a separate module - by that i mean a separate .py file for easier migration later.

def scope_filter(req, token):
# Check path
resource_pattern = rf"(Patient|DocumentReference)$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear this approach leaves a backdoor to well tailored requests. A user could include a string like "Patient" in say an or-clause of a search, or the like.

Any reason to not just limit to the requested resource in the request path? i.e. {url}/Patient or {url}/DocumentReference for this use case?

Copy link
Contributor Author

@daniellrgn daniellrgn Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not just limit to the requested resource in the request path? i.e. {url}/Patient or {url}/DocumentReference for this use case?

That's the intention of this pattern, to enforce the request path (sans params) ends in either Patient or DocumentReference (it's checked against req.path, which will typically be fhir/DocumentReference etc.). I preferred containing this check within the function to defining a url rule or something, but let me know if there's a better/more pythonic way!

if re.search(resource_pattern, req.path) is None:
return False

user_id = token.get("sub")
identifier_pattern = rf"(https(:|%3[Aa])(\/|%2[Ff]){2}keycloak\.ltt\.cirg\.uw\.edu(%7[Cc]|\|))?{user_id}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is rather hardcoded to one keycloak install. it would be nice to look to a config value, similar to UPSTREAM_SERVER.

any reason to use re for URL parsing, vs. say urlparse from urllib.parse ?

Copy link
Contributor Author

@daniellrgn daniellrgn Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Paul, good points -

This pattern just matches identifier-like query param values like https://keycloak.ltt.cirg.uw.edu|keycloak-user-id
The code system https://keycloak.ltt.cirg.uw.edu is a dummy system url that cPRO adds to the LTT patient resources to identify keycloak user ids, and would certainly better live in config to support other codings/identifiers/systems.

I wasn't aware of urlparse, so thanks for the recommendation! I can't tell if it's quite equivalent to the use of this pattern though: it seems closer to a replacement for Flask's req.args call in the next line that parses the query params from the url, but if there's a better way to validate the content of a specific query param value there I'm all ears.

All that being said, I'll clean up this pattern - I'd hedged it a bit since I didn't know exactly what was url-encoded and what wasn't, but I've found that the query param values are always url-decoded by the req.args call here, and the later check against the identifier params in a POSTed resource's references will always come through url-encoded.


# Search params for identifier-like param containing keycloak id
params = req.args
id_param_value = params.get("identifier", params.get("_identifier", params.get("subject.identifier")))
if id_param_value is not None and re.search(identifier_pattern, id_param_value):
return True

# Search body for subject.reference containing keycloak id
body = req.get_json() # Propegates a 400 BadRequest on failure
resource_type = body.get("resourceType")
if resource_type == "DocumentReference":
reference_string = body.get("subject", {}).get("reference")
if reference_string is not None and re.search(identifier_pattern, reference_string):
return True
return False



def proxy_request(req, upstream_url, user_info=None):
"""Forward request to given url"""
Expand Down Expand Up @@ -67,7 +94,12 @@ def validate_jwt(relative_path):
)
except jwt.exceptions.ExpiredSignatureError:
return jsonify(message="token expired"), 401


# TODO: call new function here to dynamically load a filter call dependent on config; hardwired for now
scope_filter_ok = scope_filter(request, decoded_token)
if not scope_filter_ok:
return jsonify(message="Forbidden"), 403

response_content = proxy_request(
req=request,
upstream_url=f"{current_app.config['UPSTREAM_SERVER']}/{relative_path}",
Expand Down