Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upload to existing indexd record #1147

Merged
merged 19 commits into from
Aug 7, 2024
Merged

Upload to existing indexd record #1147

merged 19 commits into from
Aug 7, 2024

Conversation

BinamB
Copy link
Contributor

@BinamB BinamB commented May 21, 2024

Change reflected in gen3-client uc-cdis/cdis-data-client#131

Improvements

  • /upload endpoint accepts optional guid in request body. If the guid exists in indexd, it does not create a new record, else it creates a blank record with the provided guid.
  • /multipart/init endpoint accepts optional guid in request body. If the guid exists in indexd, it does not create a new record, else it creates a blank record with the provided guid.

@BinamB BinamB marked this pull request as ready for review May 23, 2024 18:44
@@ -134,6 +134,8 @@ def upload_data_file():
authz = params.get("authz")
uploader = None

guid = params.get("did")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer we leave did an implementation detail in indexd. Can we call this guid?

file_name=params["file_name"],
authz=params.get("authz"),
uploader=uploader,
guid=guid,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened if an invalid or non-existent GUID is specified? Should the behavior be to 400 or move forward creating a blank record? I think we need to consider things like this. If I supply a GUID to the public API, I probably don't want a blank record. So if I mistyped the GUID I'd actually expect an error, not for it to move forward and create a blank record anyway

document = indexd_response.json()
self.guid = document["did"]
self.logger.info(f"Record with {self.guid} id found in Indexd.")
return document
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of early returns in a longer function like this since at a glance it's unclear where things end. Maybe just breaking some of this into a helper function(s) would help. document = _create_blank_record() and put everything after 288 in there basically. This is nitpicky so you don't need to do this, I just think the readability suffers with early returns unless the function itself is pretty small

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great updates, looks much cleaner now, thanks!

@@ -916,6 +960,26 @@ def test_non_public_authz_and_public_acl_object_upload_file(
Test that a user can successfully generate an upload url for an Indexd
record with a non-public authz field and a public acl field.
"""
did = str(uuid.uuid4())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to modify these to ensure you capture these test cases:

  • When the guid is provided and it already exists
  • When the guid is provided and it doesn't exist
  • When the guid is not provided

I believe you're only capturing the second one right now

@@ -165,7 +167,10 @@ def upload_data_file():
)

blank_index = BlankIndex(
file_name=params["file_name"], authz=params.get("authz"), uploader=uploader
file_name=params["file_name"],
authz=params.get("authz"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was already there, but I think this can just be authz

@@ -266,6 +272,21 @@ def index_document(self):
response from indexd (the contents of the record), containing ``guid``
and ``url``
"""

if self.guid:
index_url = self.indexd.rstrip("/") + "/index/" + self.guid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to percent-encode the guid b/c it's user input, so I'm wondering if there's some security concerns here (like what if I pass in slashes, ../admin/ as the guid. I don't know if requests does that by default and this might get flagged as unsanitized user input in our security scans if it hasn't already. Not sure if/how we handle that elsewhere, just a thought

fence/blueprints/data/indexd.py Show resolved Hide resolved
indexd_response = requests.get(index_url)
if indexd_response.status_code == 200:
document = indexd_response.json()
self.guid = document["did"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also set outside this cached_property.. I'd remove this and mirror the way the blank ID works (it just returns the whole document and let's the init set the guid)

document = indexd_response.json()
self.guid = document["did"]
self.logger.info(f"Record with {self.guid} id found in Indexd.")
return document
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great updates, looks much cleaner now, thanks!

@@ -565,6 +565,8 @@ paths:
Previous authorization check requires a more general, global upload permission:
"file_upload" on "/data_file" resource. When "authz" is *not* provided, this
endpoint will check for that permission for your user.

Accepts a "did" field in the request body. If "did" is provided, it checks indexd for an existing record. If not found, a new blank record will be created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guid not did

the GUID for this new record and an uploadId for multipart upload presigned url.


Accepts a "did" field in the request body. If "did" is provided, it checks indexd for an existing record. If not found, a new blank record will be created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guid

tests/data/test_data.py Show resolved Hide resolved

if indexd_200_response:
assert response.status_code == 201, response
assert "guid" in response.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also check that the content of the guid is the same value as the one passed into the endpoint?

self.logger.info(f"Record with {self.guid} id found in Indexd.")
return document
else:
raise NotFound("No indexed document found with id {}".format(self.guid))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, yeah I think this is the right logic if I pass a guid 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you're using .format over f-strings here? There are other places with f-strings, I tend to prefer those. Not a big deal, just some inconsistency

self.guid = guid
self.guid = self.index_document[
"did"
] # .index_document is a cached property with code below, it creates/retrieves the actual record and this line updates the stored GUID to the returned record
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python comments should be on the line above, try to avoid in-line unless required by static code checking exclusions etc

@@ -565,6 +565,8 @@ paths:
Previous authorization check requires a more general, global upload permission:
"file_upload" on "/data_file" resource. When "authz" is *not* provided, this
endpoint will check for that permission for your user.

Accepts a "guid" field in the request body. If "guid" is provided, it checks indexd for an existing record. If not found, a new blank record will be created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like this is the behavior anymore? If it's not found, it raises a 404

@@ -1686,6 +1691,10 @@ components:
description: >-
the requested bucket to upload to.
If not provided, defaults to the configured DATA_UPLOAD_BUCKET.
guid:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to this new param, we also need to update the example responses to include the 404 that can now happen

@@ -1664,6 +1730,76 @@ def json(self):
assert "uploadId" in response.json


@pytest.mark.parametrize("indexd_200_response", [True, False])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be separate unit tests. One is testing when the GUID exists, one is testing when it doesn't. If you want to share some code, you can create helper functions, but I'd rather keep the positive/negative testing as separate unit tests

@coveralls
Copy link

coveralls commented Jul 2, 2024

Pull Request Test Coverage Report for Build 9764333492

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 41 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.03%) to 75.083%

Files with Coverage Reduction New Missed Lines %
resources/ga4gh/passports.py 2 87.43%
blueprints/data/blueprint.py 9 86.75%
blueprints/data/indexd.py 30 94.65%
Totals Coverage Status
Change from base Build 9713845019: 0.03%
Covered Lines: 7720
Relevant Lines: 10282

💛 - Coveralls

@coveralls
Copy link

Pull Request Test Coverage Report for Build 10184257323

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 41 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.03%) to 75.172%

Files with Coverage Reduction New Missed Lines %
resources/ga4gh/passports.py 2 87.43%
blueprints/data/blueprint.py 9 86.75%
blueprints/data/indexd.py 30 94.69%
Totals Coverage Status
Change from base Build 10164514863: 0.03%
Covered Lines: 7763
Relevant Lines: 10327

💛 - Coveralls

@BinamB BinamB merged commit 4d98387 into master Aug 7, 2024
9 of 10 checks passed
@BinamB BinamB deleted the chore/multpart-init-change branch August 7, 2024 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants