Link to Relevant Issue
This PR relates to this discussion on Slack:
https://councildataproject.slack.com/archives/CMMNC8Y7P/p1664648853737539?thread_ts=1663622963.782599&cid=CMMNC8Y7P
Description of Changes
For the Asheville CDP instance, we encountered an issue where our event index / search was not being updated for new events. After investigating, I found that the GCS index-chunks files were not being updated.
The generate_event_index_pipeline.py routine calls chunk_index, which in turn calls fs_functions.upload_file.
The upload_file function checks if the file exists on the remote resource - if so, it just returns the existing URI. As a result, there is no ability to overwrite existing index-chunk files.
cdp-backend/cdp_backend/file_store/functions.py
Line 64 in e883508
This PR adds an optional parameter, overwrite, to the upload_file function. It modifies generate_event_index_pipeline.py to set overwrite=True when upload_file is called.
I added tests for this change in behavior, including cases that confirm the same URI is returned with overwrite=True whether or not the file already exists.
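As a rough illustration of the behavior change (not the actual cdp-backend code), the sketch below models the bucket as an in-memory dict. `FakeBucket`, `upload_file_sketch`, and the URI layout are hypothetical stand-ins; only the `overwrite` parameter name comes from this PR.

```python
from typing import Dict


class FakeBucket:
    """Hypothetical in-memory stand-in for a GCS bucket."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, str] = {}

    def uri(self, save_name: str) -> str:
        return f"gs://{self.name}/{save_name}"


def upload_file_sketch(
    bucket: FakeBucket, save_name: str, contents: str, overwrite: bool = False
) -> str:
    """Return the URI for save_name, writing contents only when allowed."""
    exists = save_name in bucket.files
    # Old behavior: an existing file short-circuits the upload.
    if exists and not overwrite:
        return bucket.uri(save_name)
    # With overwrite=True (or a new file): store the new contents.
    bucket.files[save_name] = contents
    return bucket.uri(save_name)


bucket = FakeBucket("bucket")
upload_file_sketch(bucket, "index-chunk-0", "old")

# Without overwrite, the stale contents survive a second upload.
uri_skip = upload_file_sketch(bucket, "index-chunk-0", "new")
contents_after_skip = bucket.files["index-chunk-0"]

# With overwrite=True, the same URI is returned and the contents are replaced.
uri_overwrite = upload_file_sketch(bucket, "index-chunk-0", "new", overwrite=True)
contents_after_overwrite = bucket.files["index-chunk-0"]
```

In both calls the returned URI is identical; only the stored contents differ, which is what the index pipeline needs when it regenerates chunks.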
In test_functions.py, I modified the EXISTING_FILE_URI variable because its value did not match what upload_file should be expected to produce. When SAVE_NAME is passed to upload_file, the resulting URI should include that save name, so EXISTING_FILE_URI should include it as well.
I'm not super familiar with how the mock.patch calls work, but I believe they change the behavior of "cdp_backend.file_store.functions.get_file_uri" in a way that let the tests pass previously when they should have failed. For example, EXISTING_FILE_URI was previously "gs://bucket/test_file.json". The test parameters indicated that an existing file had URI = EXISTING_FILE_URI, yet the requested upload was for SAVE_NAME ("fakeSaveName"), so the resulting URI should have been "gs://bucket/fakeSaveName".
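To illustrate why a patched helper can hide this kind of mismatch, here is a minimal, self-contained sketch. It does not use the real cdp-backend module: `fake_module`, `_real_get_file_uri`, and `upload_file_sketch` are hypothetical, and the real `get_file_uri` signature may differ. It only shows that a mock configured with `return_value` ignores the arguments it is called with.

```python
from types import SimpleNamespace
from unittest import mock


def _real_get_file_uri(bucket: str, filename: str) -> str:
    # Hypothetical helper: builds the URI from the requested save name.
    return f"gs://{bucket}/{filename}"


# Hypothetical stand-in for the module whose helper gets patched.
fake_module = SimpleNamespace(get_file_uri=_real_get_file_uri)


def upload_file_sketch(bucket: str, save_name: str) -> str:
    # Look the helper up at call time so a patch takes effect.
    return fake_module.get_file_uri(bucket, save_name)


# While patched, the helper returns the fixed value regardless of save_name.
with mock.patch.object(
    fake_module, "get_file_uri", return_value="gs://bucket/test_file.json"
):
    patched_uri = upload_file_sketch("bucket", "fakeSaveName")

# Outside the patch, the URI reflects the requested save name.
unpatched_uri = upload_file_sketch("bucket", "fakeSaveName")
```

Under these assumptions, a test that compares the result against the same fixed value used in `return_value` would pass even though the save name never appears in the URI.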
I'm not completely familiar with writing tests in Python — any review or feedback is appreciated!