-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Command to automatically prune tiles of interest #176
Merged
Merged
Changes from 4 commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
0e2abca
Initial framework for command to prune tiles of interest
iandees 254ac0f
Rows, column, x's, y's!!
iandees 88f6891
Add fixme's for code that still needs to be written
iandees 071cc52
Make days to query configurable
iandees 4049b24
Adjust the yaml format expectations, change ordering of steps
iandees 654e4a1
Refactor config a little, add tile bboxes to include in TOI
iandees a08bea5
Add code to delete from S3 and Redis
iandees 2aaa6d6
utils, not util
iandees f186db9
Be consistent about the name of the grouper function
iandees f3ce544
Merge branch 'master' into prune_toi_command
iandees 0164a64
Return the result of the redis srem
iandees e4138ed
Fix the yaml s3 config code
iandees c082775
Go back to the grouper that allowed iterables
iandees 376d48f
Remove fixme about multiprocessing for now
iandees File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also think about formalizing this a bit more and doing this out of process. What's the order of the amount that we've been managing in the past, several million?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't calculated the number of tiles that would be removed. I can do that now.
I was thinking about putting together an SQS queue and a worker process to do the deletes, but that seemed heavy-handed. Maybe a lambda task to do the delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the same. It's operationally heavier, but I think we'll need something like that if we want to scale past multiple processes/threads on a single instance.
Maybe a good use case for batch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is a comprehensive list of event sources for lambda, but thinking about it more I think that's a reasonable option. One idea is that we can split up the list into groups of 10k or so, push those groups to a location on s3, and have lambda listen to that. Lambda would perform the delete and remove that object from s3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tested this on dev and each run of the
delete_tile_of_interest()
function (which deletes 1000 tiles at a time) takes ~2 seconds. With ~9 million tiles to remove, that'll take ~5 hours or so. Subsequent runs should be faster.