Minimize sync lookups #1037
Conversation
Force-pushed from e13184d to db15da6
Hey @rmol, do you happen to have benchmark results? I'd like to know what you're seeing to compare. For instance, in your PR that made it so we no longer import source keys until we send a reply made …
I posted my results in #1024.
Force-pushed from a6129bf to a6eff29
securedrop_client/storage.py (Outdated)
remote_submissions_by_source[s.source_uuid].append(s)
for source_uuid, submissions in remote_submissions_by_source.items():
    source = session.query(Source).filter_by(uuid=source_uuid).first()
Before, we would look up the source right before adding a new submission to the database. Now it looks like we look up the source once for every unique source we get back from the server, so this piece of the change should be theoretically faster if there are many new submissions for one source, but slower if there are many more sources and not a lot of new submissions, right?
Yes, good catch; that would be a lot of unnecessary queries. I could move it back and add another cache map for sources, so the lookup still happens only once per source, and only if there are new submissions.
In that case, I think the only cache we would need is one that we check and add to in the else clause, in case there are more new submissions for the same source. Instead of doing what we were doing before:
_, source_uuid = submission.source_url.rsplit('/', 1)
source = session.query(Source).filter_by(uuid=source_uuid).first()
we could do this:
source = session.query(Source).filter_by(uuid=submission.source_uuid).first()
^ That, plus your addition of cache checking/adding for that source. And I think we could go back to looping through all the submissions without breaking them down by source.
Again, good call. I've added the source cache and reverted to just iterating the remote submissions/replies.
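The source cache agreed on above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: FakeSession stands in for a SQLAlchemy session, and the query counter exists only to show that each source hits the database at most once per sync.

```python
# Hypothetical sketch of the per-source cache discussed above; the real
# implementation lives in securedrop_client/storage.py and may differ.
from typing import Dict, Optional


class FakeSession:
    """Stand-in for a SQLAlchemy session, backed by a dict keyed by uuid."""

    def __init__(self, sources: Dict[str, object]) -> None:
        self.sources = sources
        self.query_count = 0  # counts "database" lookups, for illustration

    def lookup(self, uuid: str) -> Optional[object]:
        self.query_count += 1
        return self.sources.get(uuid)


class SourceCache:
    """Query each source at most once per sync, and only when needed."""

    def __init__(self, session: FakeSession) -> None:
        self.session = session
        self.cache: Dict[str, object] = {}

    def get(self, source_uuid: str) -> Optional[object]:
        if source_uuid not in self.cache:
            source = self.session.lookup(source_uuid)
            if source is not None:
                self.cache[source_uuid] = source
            return source
        return self.cache[source_uuid]
```

With this shape, iterating many new submissions for the same source triggers a single lookup; misses are deliberately not cached, since a source may appear mid-sync.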
Force-pushed from a6eff29 to 2d90bad
def get(self, source_uuid: str) -> Optional[db.Source]:
    if source_uuid not in self.cache:
        source = self.session.query(db.Source).filter_by(uuid=source_uuid).first()
Confirming that using first() is better than using one(), because it returns None instead of throwing NoResultFound.
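A minimal stand-in illustrating that semantics (this is not SQLAlchemy itself, just a stub mirroring the relevant behavior of Query.first() and Query.one()):

```python
# Stub mirroring the SQLAlchemy Query.first()/Query.one() semantics
# discussed above; not the real SQLAlchemy API.
from typing import List, Optional


class NoResultFound(Exception):
    """Mirrors sqlalchemy.orm.exc.NoResultFound."""


class StubQuery:
    def __init__(self, rows: List[object]) -> None:
        self.rows = rows

    def first(self) -> Optional[object]:
        # Returns the first row, or None when there are no rows.
        return self.rows[0] if self.rows else None

    def one(self) -> object:
        # Raises NoResultFound when there are no rows. (Real SQLAlchemy
        # also raises MultipleResultsFound for >1 rows, omitted here.)
        if not self.rows:
            raise NoResultFound("expected exactly one row")
        return self.rows[0]
```

During sync, a missing local source is an expected case rather than an error, so first() avoids wrapping every lookup in a try/except.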
Force-pushed from 2d90bad to 258306e
Description
Reduce the number of database queries during sync: cache sources or journalists instead of looking them up for each incoming object associated with them.
Use maps instead of sets to hold local objects in the update_ functions, so it's faster to check whether we already have a record of incoming objects.
This is based on and should be reviewed after #1036.
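The "maps instead of sets" change can be sketched as follows. The names here (Submission, index_local, split_new_and_known) are hypothetical; the PR's actual update_ functions are structured differently, but the idea is the same: key local objects by uuid so matching an incoming object against local state is a single dict lookup that also yields the local record.

```python
# Hypothetical sketch: hold local objects in a dict keyed by uuid, so
# checking for an existing record and retrieving it is one O(1) lookup.
from typing import Dict, List, NamedTuple, Tuple


class Submission(NamedTuple):
    uuid: str
    filename: str


def index_local(local_submissions: List[Submission]) -> Dict[str, Submission]:
    """Build a uuid -> object map once, instead of scanning a collection
    for every incoming object."""
    return {s.uuid: s for s in local_submissions}


def split_new_and_known(
    remote: List[Submission],
    local_by_uuid: Dict[str, Submission],
) -> Tuple[List[Submission], List[Submission]]:
    new, known = [], []
    for sub in remote:
        (known if sub.uuid in local_by_uuid else new).append(sub)
    return new, known
```

A set of uuids would make the membership check just as fast, but the map also hands back the local object for updating, avoiding a follow-up query.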
Test Plan
Run the SD core dev server. Ensure that syncing completes, that the source list and conversation view populate correctly, and that you can reply to sources.
Checklist
If these changes modify code paths involving cryptography, the opening of files in VMs, or network traffic (via the RPC service), Qubes testing in the staging environment is required. For fine-tuning of the graphical user interface, testing in any Qubes environment is required. Please check as applicable:
If these changes add or remove files other than client code, packaging logic (e.g., the AppArmor profile) may need to be updated. Please check as applicable: