bugfix/index_chunks_not_updated #216


conantp
Contributor

@conantp conantp commented Oct 1, 2022

Link to Relevant Issue

This PR relates to this discussion on Slack:

https://councildataproject.slack.com/archives/CMMNC8Y7P/p1664648853737539?thread_ts=1663622963.782599&cid=CMMNC8Y7P

Description of Changes

For the Asheville CDP instance, we encountered an issue where our event index / search was not being updated for new events. After investigating, I realized that the GCS index-chunks files were not being updated.

The generate_event_index_pipeline.py routine calls chunk_index, which in turn calls fs_functions.upload_file.

The upload_file function checks if the file exists on the remote resource - if so, it just returns the existing URI. As a result, there is no ability to overwrite existing index-chunk files.

This PR adds an optional parameter, overwrite, to the upload_file function. It modifies generate_event_index_pipeline.py to set overwrite=True when upload_file is called.
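
For illustration, here is a minimal sketch of the behavior change described above. It is not the actual cdp_backend code: the FakeBucket class, the simplified upload_file signature, and the file name are all assumptions made up for this example, with an in-memory dict standing in for the GCS bucket.

```python
from typing import Dict, Optional


class FakeBucket:
    """In-memory stand-in for a remote bucket (hypothetical, illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[str, bytes] = {}

    def get_file_uri(self, filename: str) -> Optional[str]:
        # Return a URI only when the object already exists in the bucket.
        if filename in self.objects:
            return f"gs://{self.name}/{filename}"
        return None


def upload_file(
    bucket: FakeBucket,
    save_name: str,
    content: bytes,
    overwrite: bool = False,
) -> str:
    """Store content under save_name and return its URI (simplified signature)."""
    existing_uri = bucket.get_file_uri(save_name)
    # Default behavior: an existing file short-circuits the upload.
    if existing_uri is not None and not overwrite:
        return existing_uri
    # File is missing, or overwrite=True: write (or replace) the content.
    bucket.objects[save_name] = content
    return f"gs://{bucket.name}/{save_name}"


# "index-chunk-0" is a made-up file name for this example.
bucket = FakeBucket("bucket")
first_uri = upload_file(bucket, "index-chunk-0", b"old")
skipped_uri = upload_file(bucket, "index-chunk-0", b"new")
content_after_skip = bucket.objects["index-chunk-0"]
overwritten_uri = upload_file(bucket, "index-chunk-0", b"new", overwrite=True)
content_after_overwrite = bucket.objects["index-chunk-0"]
```

In this sketch all three calls return the same URI, but only the call with overwrite=True replaces the stored content, which is the stale-index symptom and its fix in miniature.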

I attempted to add tests for this change in behavior, including cases to confirm that the same URI is returned whether or not the file exists when overwrite=True.

In test_functions.py, I modified the EXISTING_FILE_URI variable - I felt that it indicated a URI that was not in alignment with what the upload_file function should be expected to produce. If the given SAVE_NAME is included in the call to upload_file, the EXISTING_FILE_URI would include that as part of the URI.

I'm not super familiar with how the mock.patch calls work - but I believe they are changing the behavior of "cdp_backend.file_store.functions.get_file_uri" in a way that caused the tests to pass previously, though they should have failed. For example, previously, EXISTING_FILE_URI = "gs://bucket/test_file.json". However, the test parameters indicated that there was an existing file with URI = EXISTING_FILE_URI, but the requested upload was for SAVE_NAME ("fakeSaveName"). The resulting URI should be "gs://bucket/fakeSaveName".
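
A sketch of how that mismatch could go unnoticed, assuming the patch replaces get_file_uri with a fixed return value. The Store class below is a hypothetical stand-in for the real functions module, not the cdp_backend code or its actual tests.

```python
from typing import Optional
from unittest import mock

BUCKET = "bucket"
SAVE_NAME = "fakeSaveName"


class Store:
    """Hypothetical stand-in for the file store functions (illustration only)."""

    @staticmethod
    def get_file_uri(bucket: str, filename: str) -> Optional[str]:
        # A real lookup would query the remote store; here nothing exists.
        return None

    @staticmethod
    def upload_file(bucket: str, save_name: str) -> str:
        existing = Store.get_file_uri(bucket, save_name)
        if existing is not None:
            return existing
        return f"gs://{bucket}/{save_name}"


def uri_under_mock(existing_file_uri: str) -> str:
    # The mock returns existing_file_uri for ANY filename, so the value is
    # handed back even when it does not match the requested SAVE_NAME.
    with mock.patch.object(Store, "get_file_uri", return_value=existing_file_uri):
        return Store.upload_file(BUCKET, SAVE_NAME)


old_style = uri_under_mock("gs://bucket/test_file.json")
new_style = uri_under_mock(f"gs://{BUCKET}/{SAVE_NAME}")
unpatched = Store.upload_file(BUCKET, SAVE_NAME)
```

Here old_style comes back as a URI for test_file.json even though the upload asked for fakeSaveName; a test asserting against that same constant would still pass. Building the expected URI from SAVE_NAME, as in new_style, keeps the mocked value consistent with what an unpatched upload_file would produce.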

I'm not completely familiar with writing tests in Python — any review or feedback is appreciated!


codecov bot commented Oct 1, 2022

Codecov Report

Merging #216 (4888131) into main (e883508) will increase coverage by 0.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #216      +/-   ##
==========================================
+ Coverage   72.51%   72.57%   +0.05%     
==========================================
  Files          64       64              
  Lines        3493     3493              
==========================================
+ Hits         2533     2535       +2     
+ Misses        960      958       -2     
Impacted Files Coverage Δ
..._backend/pipeline/generate_event_index_pipeline.py 97.29% <ø> (ø)
cdp_backend/file_store/functions.py 80.85% <100.00%> (ø)
cdp_backend/tests/file_store/test_functions.py 100.00% <100.00%> (ø)
cdp_backend/tests/test_utils.py 100.00% <0.00%> (+25.00%) ⬆️


Member

@evamaxfield evamaxfield left a comment


Thank you so much!

@evamaxfield evamaxfield merged commit 5bb5488 into CouncilDataProject:main Oct 4, 2022
@conantp conantp deleted the bugfix/index_chunks_not_updated branch March 5, 2023 19:43