
Minimize sync lookups #1037

Merged: 1 commit merged into master on Apr 2, 2020

Conversation

@rmol (Contributor) commented Mar 31, 2020

Description

Reduce the number of database queries during sync: cache sources or journalists instead of looking them up for each incoming object associated with them.

Use maps instead of sets to hold local objects in the update_ functions, so it's faster to check if we already have a record of incoming objects.
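The set-to-map change described above can be sketched as follows. This is a minimal illustration with hypothetical names (Submission, merge), not the actual securedrop-client code: a map keyed by UUID gives the same O(1) membership test a set does, but also hands back the local object without a second lookup.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    uuid: str
    size: int = 0

def merge(local_submissions, remote_submissions):
    """Return (created, updated) lists of submissions.

    A dict keyed by uuid replaces the old set of uuids: the membership
    check is still O(1) on average, and on a hit we get the local object
    back directly instead of querying for it again.
    """
    local_by_uuid = {s.uuid: s for s in local_submissions}
    created, updated = [], []
    for remote in remote_submissions:
        local = local_by_uuid.get(remote.uuid)
        if local is None:
            created.append(remote)       # no local record yet
        else:
            local.size = remote.size     # update in place, no extra lookup
            updated.append(local)
    return created, updated
```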

This is based on and should be reviewed after #1036.

Test Plan

Run the SD core dev server. Ensure that syncing completes, that the source list and conversation view populate correctly, and that you can reply to sources.

Checklist

If these changes modify code paths involving cryptography, the opening of files in VMs, or network traffic (via the RPC service), Qubes testing in the staging environment is required. For fine-tuning of the graphical user interface, testing in any Qubes environment is required. Please check as applicable:

  • I have tested these changes in the appropriate Qubes environment
  • I do not have an appropriate Qubes OS workstation set up (the reviewer will need to test these changes)
  • These changes should not need testing in Qubes

If these changes add or remove files other than client code, packaging logic (e.g., the AppArmor profile) may need to be updated. Please check as applicable:

  • I have submitted a separate PR to the packaging repo
  • No update to the packaging logic (e.g., AppArmor profile) is required for these changes
  • I don't know and would appreciate guidance

@eloquence (Member) commented:

(Per the PR body, labeled blocked until #1036 is merged.)

@rmol rmol force-pushed the minimize-sync-lookups branch from e13184d to db15da6 Compare April 1, 2020 22:28
@rmol rmol removed the blocked label Apr 1, 2020
@sssoleileraaa (Contributor) commented Apr 1, 2020

Hey @rmol, do you happen to have benchmark results? I'd like to know what you're seeing, for comparison. For instance, your PR that stopped importing source keys until we send a reply made update_local_storage take 50% less time, from what I saw. Curious what you saw for this one too.

@rmol (Contributor, Author) commented Apr 1, 2020

I posted my results in #1024.

@rmol rmol force-pushed the minimize-sync-lookups branch 3 times, most recently from a6129bf to a6eff29 Compare April 2, 2020 19:58
        remote_submissions_by_source[s.source_uuid].append(s)

    for source_uuid, submissions in remote_submissions_by_source.items():
        source = session.query(Source).filter_by(uuid=source_uuid).first()
@sssoleileraaa (Contributor) commented Apr 2, 2020

Before, we would look up the source right before adding a new submission to the database. Now it looks like we look up the source for every unique source we get back from the server, so this piece of the change should theoretically be faster if there are many new submissions for one source, but slower if there are many more sources and not a lot of new submissions, right?

@rmol (Contributor, Author) replied:

Yes, good catch, that would be a lot of unnecessary queries. I could move it back, and add another cache map for sources so it still only happens once per source, but also only if there are new submissions.

@sssoleileraaa (Contributor) replied:

In that case, I think the only cache we would need is one that we check and add to in the else clause, in case there are more new submissions for the same source. Instead of doing what we were doing before:

    _, source_uuid = submission.source_url.rsplit('/', 1)
    source = session.query(Source).filter_by(uuid=source_uuid).first()

we could do this:

    source = session.query(Source).filter_by(uuid=submission.source_uuid).first()

That, plus your addition of cache checking/adding for that source. And I think we could go back to looping through all the submissions without breaking them down by source.
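A toy sketch of that suggestion follows. All names here are hypothetical (the real client uses a SQLAlchemy session, not this CountingSession): the point is that the source is queried only on a cache miss, so many submissions from one source cost a single lookup.

```python
class CountingSession:
    """Stand-in for a database session that counts lookups."""

    def __init__(self, sources):
        self.sources = sources
        self.queries = 0

    def lookup(self, uuid):
        self.queries += 1
        return self.sources.get(uuid)

def get_source(session, cache, uuid):
    # Query the database only on a cache miss, so N submissions from the
    # same source trigger one lookup instead of N. Misses are cached as
    # None so an unknown source is also only queried once.
    if uuid not in cache:
        cache[uuid] = session.lookup(uuid)
    return cache[uuid]
```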

@rmol (Contributor, Author) replied:

Again, good call. I've added the source cache and reverted to just iterating the remote submissions/replies.

@rmol rmol force-pushed the minimize-sync-lookups branch from a6eff29 to 2d90bad Compare April 2, 2020 22:13

    def get(self, source_uuid: str) -> Optional[db.Source]:
        if source_uuid not in self.cache:
            source = self.session.query(db.Source).filter_by(uuid=source_uuid).first()
@sssoleileraaa (Contributor) commented Apr 2, 2020

Confirming that using first() is better than using one() here, because it returns None instead of raising NoResultFound.
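That difference can be seen with a minimal in-memory example. This is a standalone sketch, not the client's schema; the Source model, engine, and session here are hypothetical.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import NoResultFound

Base = declarative_base()

class Source(Base):
    __tablename__ = "sources"
    id = Column(Integer, primary_key=True)
    uuid = Column(String)

# In-memory SQLite database with no rows in it.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# first() returns None when there is no matching row ...
assert session.query(Source).filter_by(uuid="missing").first() is None

# ... while one() raises NoResultFound for the same query.
try:
    session.query(Source).filter_by(uuid="missing").one()
    raised = False
except NoResultFound:
    raised = True
assert raised
```

With first(), a caller can branch on None (as the cache above does) instead of wrapping every lookup in a try/except.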

@sssoleileraaa previously approved these changes Apr 2, 2020
Reduce the number of database queries during sync: cache sources or
journalists instead of looking them up for each incoming object
associated with them.

Use maps instead of sets to hold local objects in the update_
functions, so it's faster to check if we already have a record of
incoming objects.
@sssoleileraaa sssoleileraaa merged commit 77bbb4b into master Apr 2, 2020
@sssoleileraaa sssoleileraaa deleted the minimize-sync-lookups branch April 2, 2020 22:57