-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upload to existing indexd record #1147
Changes from 12 commits
24f2950
f4bd84a
110ef82
b175896
60a932e
46f1f94
ae584c5
6641fb5
b79cc73
5cd0b7d
e749870
71ec4e7
391865b
c551947
e9d51be
60f257c
caabf4d
77d2ca7
c3f7490
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -208,6 +208,7 @@ def _get_client_id(): | |
|
||
return client_id | ||
|
||
|
||
def prepare_presigned_url_audit_log(protocol, indexed_file): | ||
""" | ||
Store in `flask.g.audit_data` the data needed to record an audit log. | ||
|
@@ -232,7 +233,12 @@ class BlankIndex(object): | |
""" | ||
|
||
def __init__( | ||
self, uploader=None, file_name=None, logger_=None, guid=None, authz=None | ||
self, | ||
uploader=None, | ||
file_name=None, | ||
logger_=None, | ||
guid=None, | ||
authz=None, | ||
): | ||
self.logger = logger_ or logger | ||
self.indexd = ( | ||
|
@@ -253,8 +259,10 @@ def __init__( | |
self.file_name = file_name | ||
self.authz = authz | ||
|
||
# if a guid is not provided, this will create a blank record for you | ||
self.guid = guid or self.index_document["did"] | ||
self.guid = guid | ||
self.guid = self.index_document[ | ||
"did" | ||
] # .index_document is a cached property with code below, it creates/retrieves the actual record and this line updates the stored GUID to the returned record | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Python comments should be on the line above, try to avoid in-line unless required by static code checking exclusions etc |
||
|
||
@cached_property | ||
def index_document(self): | ||
|
@@ -266,6 +274,20 @@ def index_document(self): | |
response from indexd (the contents of the record), containing ``guid`` | ||
and ``url`` | ||
""" | ||
|
||
if self.guid: | ||
index_url = self.indexd.rstrip("/") + "/index/" + self.guid | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We may need to percent-encode the guid b/c it's user input, so I'm wondering if there's some security concerns here (like what if I pass in slashes, |
||
indexd_response = requests.get(index_url) | ||
if indexd_response.status_code == 200: | ||
document = indexd_response.json() | ||
self.logger.info(f"Record with {self.guid} id found in Indexd.") | ||
return document | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not a huge fan of early returns in a longer function like this since at a glance it's unclear where things end. Maybe just breaking some of this into a helper function(s) would help. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great updates, looks much cleaner now, thanks! |
||
else: | ||
raise NotFound(f"No indexed document found with id {self.guid}") | ||
|
||
return self._create_blank_record() | ||
|
||
def _create_blank_record(self): | ||
index_url = self.indexd.rstrip("/") + "/index/blank/" | ||
params = {"uploader": self.uploader, "file_name": self.file_name} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -565,6 +565,8 @@ paths: | |
Previous authorization check requires a more general, global upload permission: | ||
"file_upload" on "/data_file" resource. When "authz" is *not* provided, this | ||
endpoint will check for that permission for your user. | ||
|
||
Accepts a "guid" field in the request body. If "guid" is provided, it checks indexd for an existing record. If not found, a new blank record will be created. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It doesn't seem like this is the behavior anymore? If it's not found, it raises a 404 |
||
security: | ||
- OAuth2: | ||
- user | ||
|
@@ -666,7 +668,10 @@ paths: | |
flow, Fence needs to provide a list of endpoints for supporting multipart upload presigned url | ||
This is the first step on the API side for the multipart upload presigned url. This endpoint | ||
causes fence to make a request to indexd to create a new, blank index record, and returns | ||
the GUID for this new record and an uploadId for multipart upload presigned url | ||
the GUID for this new record and an uploadId for multipart upload presigned url. | ||
|
||
|
||
Accepts a "guid" field in the request body. If "guid" is provided, it checks indexd for an existing record. If not found, a new blank record will be created. | ||
security: | ||
- OAuth2: | ||
- user | ||
|
@@ -1686,6 +1691,10 @@ components: | |
description: >- | ||
the requested bucket to upload to. | ||
If not provided, defaults to the configured DATA_UPLOAD_BUCKET. | ||
guid: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In addition to this new param, we also need to update the example responses to include the 404 that can now happen |
||
type: string | ||
required: false | ||
description: GUID to be assigned to the object | ||
expires_in: | ||
type: integer | ||
description: optional integer specifying the presigned URL lifetime | ||
|
@@ -1696,6 +1705,7 @@ components: | |
description: requested authorization resources to be set on the | ||
resulting indexed record. You must have proper authorization to set this | ||
example: | ||
guid: "123456abcd" | ||
file_name: "my_file.bam" | ||
bucket: "bucket1" | ||
expires_in: 1200 | ||
|
@@ -1708,6 +1718,10 @@ components: | |
type: string | ||
required: true | ||
description: the file name to use for this upload | ||
guid: | ||
type: string | ||
required: false | ||
description: GUID to be assigned to the object | ||
expires_in: | ||
type: integer | ||
description: optional integer specifying the presigned URL lifetime | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -819,6 +819,26 @@ def test_public_authz_object_upload_file( | |
Test `GET /data/upload/1` in which the `1` Indexd record has authz | ||
populated with the public value. | ||
""" | ||
did = str(uuid.uuid4()) | ||
index_document = { | ||
"did": did, | ||
"baseid": "", | ||
"rev": "", | ||
"size": 10, | ||
"file_name": "file1", | ||
"urls": ["s3://bucket1/key-{}".format(did[:8])], | ||
"acl": ["phs000789"], | ||
"hashes": {}, | ||
"metadata": {}, | ||
"form": "", | ||
"created_date": "", | ||
"updated_date": "", | ||
} | ||
mock_index_document = mock.patch( | ||
"fence.blueprints.data.indexd.BlankIndex.index_document", index_document | ||
) | ||
mock_index_document.start() | ||
|
||
indexd_client_accepting_record(INDEXD_RECORD_WITH_PUBLIC_AUTHZ_POPULATED) | ||
mock_arborist_requests({"arborist/auth/request": {"POST": ({"auth": True}, 200)}}) | ||
headers = { | ||
|
@@ -837,6 +857,8 @@ def test_public_authz_object_upload_file( | |
assert response.status_code == 200 | ||
assert "url" in response.json | ||
|
||
mock_index_document.stop() | ||
|
||
|
||
def test_public_authz_and_acl_object_upload_file_with_failed_authz_check( | ||
client, | ||
|
@@ -885,6 +907,26 @@ def test_public_authz_and_acl_object_upload_file( | |
acl populated with public values. In this case, authz takes precedence over | ||
acl. | ||
""" | ||
did = str(uuid.uuid4()) | ||
index_document = { | ||
"did": did, | ||
"baseid": "", | ||
"rev": "", | ||
"size": 10, | ||
"file_name": "file1", | ||
"urls": ["s3://bucket1/key-{}".format(did[:8])], | ||
"acl": ["phs000789"], | ||
"hashes": {}, | ||
"metadata": {}, | ||
"form": "", | ||
"created_date": "", | ||
"updated_date": "", | ||
} | ||
mock_index_document = mock.patch( | ||
"fence.blueprints.data.indexd.BlankIndex.index_document", index_document | ||
) | ||
mock_index_document.start() | ||
|
||
indexd_client_accepting_record(INDEXD_RECORD_WITH_PUBLIC_AUTHZ_AND_ACL_POPULATED) | ||
mock_arborist_requests({"arborist/auth/request": {"POST": ({"auth": True}, 200)}}) | ||
headers = { | ||
|
@@ -903,6 +945,8 @@ def test_public_authz_and_acl_object_upload_file( | |
assert response.status_code == 200 | ||
assert "url" in response.json | ||
|
||
mock_index_document.stop() | ||
|
||
|
||
def test_non_public_authz_and_public_acl_object_upload_file( | ||
client, | ||
|
@@ -916,6 +960,26 @@ def test_non_public_authz_and_public_acl_object_upload_file( | |
Test that a user can successfully generate an upload url for an Indexd | ||
record with a non-public authz field and a public acl field. | ||
""" | ||
did = str(uuid.uuid4()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you need to modify these to ensure you capture these test cases:
I believe you're only capturing the second one right now |
||
index_document = { | ||
"did": did, | ||
"baseid": "", | ||
"rev": "", | ||
"size": 10, | ||
"file_name": "file1", | ||
"urls": ["s3://bucket1/key-{}".format(did[:8])], | ||
"acl": ["phs000789"], | ||
"hashes": {}, | ||
"metadata": {}, | ||
"form": "", | ||
"created_date": "", | ||
"updated_date": "", | ||
} | ||
mock_index_document = mock.patch( | ||
"fence.blueprints.data.indexd.BlankIndex.index_document", index_document | ||
) | ||
mock_index_document.start() | ||
|
||
indexd_record_with_non_public_authz_and_public_acl_populated = { | ||
"did": "1", | ||
"baseid": "", | ||
|
@@ -951,6 +1015,8 @@ def test_non_public_authz_and_public_acl_object_upload_file( | |
assert response.status_code == 200 | ||
assert "url" in response.json | ||
|
||
mock_index_document.stop() | ||
|
||
|
||
def test_anonymous_download_with_public_authz( | ||
client, | ||
|
@@ -1664,6 +1730,76 @@ def json(self): | |
assert "uploadId" in response.json | ||
|
||
|
||
@pytest.mark.parametrize("indexd_200_response", [True, False]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. these should be separate unit tests. One is testing when the GUID exists, one is testing when it doesn't. If you want to share some code, you can create helper functions, but I'd rather keep the positive/negative testing as separate unit tests |
||
def test_initialize_multipart_upload_with_guid_in_request( | ||
Avantol13 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
app, client, auth_client, encoded_creds_jwt, user_client, indexd_200_response | ||
): | ||
""" | ||
Test /data/multipart/init with guid parameter in request data | ||
""" | ||
|
||
class MockResponse(object): | ||
def __init__(self, data, status_code=200): | ||
self.data = data | ||
self.status_code = status_code | ||
|
||
def json(self): | ||
return self.data | ||
|
||
data_requests_mocker = mock.patch( | ||
"fence.blueprints.data.indexd.requests", new_callable=mock.Mock | ||
) | ||
arborist_requests_mocker = mock.patch( | ||
"gen3authz.client.arborist.client.httpx.Client.request", new_callable=mock.Mock | ||
) | ||
|
||
fence.blueprints.data.indexd.BlankIndex.init_multipart_upload = MagicMock() | ||
with data_requests_mocker as data_requests, arborist_requests_mocker as arborist_requests: | ||
did = str(uuid.uuid4()) | ||
if indexd_200_response: | ||
data_requests.get.return_value = MockResponse( | ||
{ | ||
"did": did, | ||
"baseid": "", | ||
"rev": "", | ||
"size": 10, | ||
"file_name": "file1", | ||
"urls": ["s3://bucket1/key"], | ||
"hashes": {}, | ||
"metadata": {}, | ||
"authz": ["/open"], | ||
"acl": ["*"], | ||
"form": "", | ||
"created_date": "", | ||
"updated_date": "", | ||
} | ||
) | ||
data_requests.get.return_value.status_code = 200 | ||
else: | ||
data_requests.get.return_value = MockResponse("no record found") | ||
data_requests.get.return_value.status_code = 404 | ||
arborist_requests.return_value = MockResponse({"auth": True}) | ||
arborist_requests.return_value.status_code = 200 | ||
fence.blueprints.data.indexd.BlankIndex.init_multipart_upload.return_value = ( | ||
"test_uploadId" | ||
) | ||
headers = { | ||
"Authorization": "Bearer " + encoded_creds_jwt.jwt, | ||
"Content-Type": "application/json", | ||
} | ||
file_name = "asdf" | ||
data = json.dumps({"file_name": file_name, "guid": did}) | ||
response = client.post("/data/multipart/init", headers=headers, data=data) | ||
|
||
if indexd_200_response: | ||
assert response.status_code == 201, response | ||
assert "guid" in response.json | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you also check that the content of the guid is the same value as the one passed into the endpoint? |
||
assert did == response.json.get("guid") | ||
assert "uploadId" in response.json | ||
else: | ||
assert response.status_code == 404, response | ||
|
||
|
||
def test_multipart_upload_presigned_url( | ||
app, client, auth_client, encoded_creds_jwt, user_client | ||
): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened if an invalid or non-existent GUID is specified? Should the behavior be to 400 or move forward creating a blank record? I think we need to consider things like this. If I supply a GUID to the public API, I probably don't want a blank record. So if I mistyped the GUID I'd actually expect an error, not for it to move forward and create a blank record anyway