fix(gs-cache): cleanup cache usage, don't try to save key_db_entry no… #1037

Merged: BinamB merged 23 commits into master from fix/gs-cache on Sep 7, 2022

Avantol13 (Contributor) commented Aug 26, 2022

New Features

Breaking Changes

Bug Fixes

  • Fix intermittent issue with db cache containing info that can't be loaded into JSON

Improvements

Dependency updates

Deployment changes

github-actions bot commented Aug 26, 2022

The style in this PR agrees with black. ✔️

This formatting comment was generated automatically by a script in uc-cdis/wool.

coveralls commented Aug 26, 2022

Pull Request Test Coverage Report for Build 12795

  • 4 of 7 (57.14%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.003%) to 73.907%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| fence/blueprints/data/indexd.py | 4 | 7 | 57.14% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| fence/blueprints/data/indexd.py | 1 | 93.84% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 12686 | -0.003% |
| Covered Lines | 6866 |
| Relevant Lines | 9290 |

💛 - Coveralls

key_db_entry.expires,
)

db_entry = {}
db_entry["gcp_proxy_group_id"] = proxy_group_id
db_entry["gcp_private_key"] = json.dumps(str(private_key))
db_entry["gcp_key_db_entry"] = str(key_db_entry)
db_entry["gcp_private_key"] = str(private_key)

Contributor Author

Why don't we just json.dumps this? Then json.loads should work without the need for that string manipulation.

>>> import json
>>> test = {"foo": "bar"}
>>> db_entry = json.dumps(test)
>>> db_entry
'{"foo": "bar"}'
>>> json.loads(db_entry)
{'foo': 'bar'}
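
To make the contrast concrete, here is a minimal sketch of why the string manipulation was needed when the value is stored with str(). It assumes the private key is a plain dict; the `private_key` value below is made up for illustration and is not taken from the PR.

```python
import json

# Hypothetical stand-in for a service account key; not real data.
private_key = {"type": "service_account", "private_key_id": "abc123"}

# str() produces Python repr syntax with single quotes, which is not valid JSON.
as_str = str(private_key)
try:
    json.loads(as_str)
    str_round_trips = True
except json.JSONDecodeError:
    str_round_trips = False

# json.dumps() produces valid JSON, so json.loads() restores the original dict.
as_json = json.dumps(private_key)
json_round_trips = json.loads(as_json) == private_key
```

With json.dumps on the write side, the read side can call json.loads directly and no quote replacement is required.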

Contributor

Okay yes, done. Tested it out and it's working.

json.loads(cache.gcp_key_db_entry),
cache.expires_at,
private_key = json.loads(
str(cache.gcp_private_key).replace("'", '"')

Contributor Author

I'm not sure I understand why this is necessary now. See next comment
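
As a side note on the quote replacement in the line above, here is a small sketch of how that approach can break. It assumes a cached value that happens to contain an apostrophe; the `entry` dict is hypothetical and not from the PR.

```python
import json

# Hypothetical value containing an apostrophe; not taken from the PR.
entry = {"note": "it's a key"}

# str() switches to double quotes for this value: {'note': "it's a key"}
patched = str(entry).replace("'", '"')

# patched is now {"note": "it"s a key"}, which is not valid JSON.
try:
    json.loads(patched)
    hack_works = True
except json.JSONDecodeError:
    hack_works = False

# Storing with json.dumps avoids the problem entirely.
safe = json.loads(json.dumps(entry)) == entry
```

This is one reason to prefer a json.dumps / json.loads pair over patching the output of str().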

keydbentry = UserGoogleAccountToProxyGroup()
keydbentry.expires = 10
google_object._assume_role_cache_gs = {"1": ("key", keydbentry, 10)}
google_object._assume_role_cache_gs = {"1": ("key", 10)}

Contributor Author

We could probably add some more unit tests here to regression test this. What do you think? I'm not sure we need integration tests.

):
indexed_file = IndexedFile(file_id="some id")
google_object = GoogleStorageIndexedFileLocation("gs://some/location")
google_object._assume_role_cache_gs = {"1": ("key", 10)}

Contributor Author

maybe we should use a dictionary as the key to better simulate the real world data (and hopefully catch any issues with the JSON parsing)
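
One way to picture this suggestion, as a sketch only: the field names in `sa_private_key` are illustrative assumptions, and the plain dict below stands in for `_assume_role_cache_gs` rather than using the real `GoogleStorageIndexedFileLocation` class.

```python
import json

# Illustrative service-account-style key; field names and values are assumptions.
sa_private_key = {
    "type": "service_account",
    "private_key": "-----BEGIN PRIVATE KEY-----\nabc\n-----END PRIVATE KEY-----\n",
    "client_email": "test@example.com",
}

# A dict-valued key in the in-memory cache, mirroring the ("key", 10) tuple shape.
assume_role_cache_gs = {"1": (sa_private_key, 10)}

# What a database cache column would hold if the key is stored as JSON text.
stored = json.dumps(assume_role_cache_gs["1"][0])
restored = json.loads(stored)
```

A fixture shaped like this exercises nested quoting and newline escaping, which a bare "key" string never would.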

google_object = GoogleStorageIndexedFileLocation("gs://some/location")
google_object._assume_role_cache_gs = {"1": ("key", 10)}

assert google_object._assume_role_cache_gs

Contributor Author

This isn't really necessary, because you set it in the test yourself, so we're not asserting any behavior of the unit under test.

after_cache = db_session.query(AssumeRoleCacheGCP).first()

assert after_cache
assert before_cache != after_cache

Contributor Author

Why do we expect these to be different? Also I think we should test the actual contents of the cache, make sure it's a dictionary or JSON (whatever we actually expect in the code)

create presigned url again
make sure cache is set correctly
"""
# db_session.add(

Contributor Author

should this be commented out?


assert after_cache
assert (
str(type(after_cache))

Contributor Author

I'm not sure this assert is worth having. The important thing is that we can json.loads properly; this type check seems unnecessary.

)
# check if json loads can properly parse json string stored in cache
assert (
str(type(json.loads(after_cache.gcp_private_key)))

Contributor Author

I don't think we need this type check either. The assert below will make sure the json.loads works and creates a comparable object to the mock we provided. So I'd remove this line

assert json.loads(after_cache.gcp_private_key) == sa_private_key

db_session.delete(after_cache)
cleared_cache = db_session.query(AssumeRoleCacheGCP).first()

Contributor Author

This isn't the regression test we want. It's not bad to keep this too, but the test we need is:

  • Generate url
  • Simulate pod rolling (e.g. the in-memory cache being cleared, not the database one)
  • Generate url, ensure new in-memory cache has valid info from the db
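
The three steps above can be sketched roughly as follows. This is not fence's implementation: `FakeDbCache` and `FakeSigner` are hypothetical stand-ins for the `AssumeRoleCacheGCP` table and the object that owns the in-memory cache, used only to show the shape of the test.

```python
import json


class FakeDbCache:
    """Hypothetical stand-in for the AssumeRoleCacheGCP table."""

    def __init__(self):
        self.rows = {}


class FakeSigner:
    """Hypothetical stand-in for the object holding the in-memory cache."""

    def __init__(self, db):
        self.db = db
        self.memory_cache = {}
        self.keys_created = 0

    def get_private_key(self, proxy_group_id):
        # Fast path: in-memory cache hit.
        if proxy_group_id in self.memory_cache:
            return self.memory_cache[proxy_group_id][0]
        row = self.db.rows.get(proxy_group_id)
        if row is not None:
            # The stored value must be loadable as JSON for this path to work.
            key = json.loads(row["gcp_private_key"])
        else:
            # Nothing cached anywhere: create a key and persist it as JSON text.
            self.keys_created += 1
            key = {"private_key_id": f"key-{self.keys_created}"}
            self.db.rows[proxy_group_id] = {
                "gcp_private_key": json.dumps(key),
                "expires_at": 10,
            }
        self.memory_cache[proxy_group_id] = (key, 10)
        return key


db = FakeDbCache()
signer = FakeSigner(db)
first = signer.get_private_key("1")    # step 1: generate url
signer.memory_cache.clear()            # step 2: simulate pod rolling
second = signer.get_private_key("1")   # step 3: generate url again
```

The point of the second call is that it must rebuild the in-memory entry purely from the database row, so a row that cannot be parsed would surface here.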


assert redo_cache
assert (
str(type(redo_cache)) == "<class 'fence.models.AssumeRoleCacheGCP'>"

Contributor Author

See the comments above. I think we should remove all these type checks and only leave the json.loads assert. And we may want to do something like this to capture the failure correctly:

try:
  assert json.loads(redo_cache.gcp_private_key) == sa_private_key
except Exception:
  pytest.fail("Could not json.loads(cache)")

r_pays_project=None,
)

try:

Contributor Author

you only need the try catch around the json.loads
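
A possible shape for that narrower guard, as a sketch: `load_cached_key` is a hypothetical helper, and it raises AssertionError so the example runs without pytest installed. In a pytest test, `pytest.fail("Could not json.loads(cache)")` would take that place.

```python
import json


def load_cached_key(raw):
    """Parse a cached JSON string, failing with a clear message if it is not JSON."""
    try:
        # Only the parse is guarded; comparisons stay outside the try block.
        return json.loads(raw)
    except json.JSONDecodeError:
        # In a pytest test, pytest.fail("Could not json.loads(cache)") would go here.
        raise AssertionError("Could not json.loads(cache)")


sa_private_key = {"private_key_id": "abc123"}  # hypothetical expected value
loaded = load_cached_key(json.dumps(sa_private_key))
assert loaded == sa_private_key
```

Keeping the equality assert outside the try block means a mismatch is reported as a mismatch, not as a parse failure.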

Contributor Author

or I guess this makes sense, in case it never got added into the cache. But I'd just assert that explicitly. The only reason for this pattern was b/c the json.loads itself can fail in the test

Contributor Author

so here you can just

assert "1" in google_object._assume_role_cache_gs
assert len(google_object._assume_role_cache_gs["1"]) > 1
assert google_object._assume_role_cache_gs["1"][0] == sa_private_key

Contributor Author

and remove the try/except

BinamB (Contributor), Sep 7, 2022

Oh, I saw these comments after I had added the try-catch for the json.loads. Should I still do this? My reasoning was that whatever is stored in the database should be json.loads-able, which would be assert google_object._assume_role_cache_gs["1"][0] == sa_private_key

I suppose that would be cleaner to do. Okay, writing this comment convinced me that I should just do those asserts.

BinamB merged commit d05f94c into master on Sep 7, 2022
BinamB deleted the fix/gs-cache branch on September 7, 2022 21:34
5 participants