
Minimize sync lookups #1037

Merged: 1 commit merged into master on Apr 2, 2020

Conversation

@rmol (Contributor) commented Mar 31, 2020

Description

Reduce the number of database queries during sync: cache sources or journalists instead of looking them up for each incoming object associated with them.

Use maps instead of sets to hold local objects in the update_ functions, so it's faster to check if we already have a record of incoming objects.
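The set-to-map change described above can be sketched as follows. This is a minimal illustration with hypothetical names (Submission, merge), not the actual securedrop-client code: a map keyed by UUID gives the same O(1) membership test a set does, but also hands back the local object without a second lookup.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    uuid: str
    size: int = 0

def merge(local_submissions, remote_submissions):
    """Return (created, updated) lists of submissions.

    A dict keyed by uuid replaces the old set of uuids: the membership
    check is still O(1) on average, and on a hit we get the local object
    back directly instead of querying for it again.
    """
    local_by_uuid = {s.uuid: s for s in local_submissions}
    created, updated = [], []
    for remote in remote_submissions:
        local = local_by_uuid.get(remote.uuid)
        if local is None:
            created.append(remote)       # no local record yet
        else:
            local.size = remote.size     # update in place, no extra lookup
            updated.append(local)
    return created, updated
```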

This is based on and should be reviewed after #1036.

Test Plan

Run the SD core dev server. Ensure that syncing completes, that the source list and conversation view populate correctly, and that you can reply to sources.

Checklist

If these changes modify code paths involving cryptography, the opening of files in VMs, or network traffic (via the RPC service), Qubes testing in the staging environment is required. For fine-tuning of the graphical user interface, testing in any Qubes environment is required. Please check as applicable:

  • I have tested these changes in the appropriate Qubes environment
  • I do not have an appropriate Qubes OS workstation set up (the reviewer will need to test these changes)
  • These changes should not need testing in Qubes

If these changes add or remove files other than client code, packaging logic (e.g., the AppArmor profile) may need to be updated. Please check as applicable:

  • I have submitted a separate PR to the packaging repo
  • No update to the packaging logic (e.g., AppArmor profile) is required for these changes
  • I don't know and would appreciate guidance

@eloquence (Member) commented:

(Per the PR body, labeled blocked until #1036 is merged.)

@rmol rmol force-pushed the minimize-sync-lookups branch from e13184d to db15da6 Compare April 1, 2020 22:28
@rmol rmol removed the blocked label Apr 1, 2020
@sssoleileraaa (Contributor) commented Apr 1, 2020

Hey @rmol, do you happen to have benchmark results? I'd like to know what you're seeing, for comparison. For instance, your PR that stopped importing source keys until we send a reply made update_local_storage take 50% less time, from what I saw. Curious what you saw for this one too.

@rmol (Contributor, Author) commented Apr 1, 2020

I posted my results in #1024.

@rmol rmol force-pushed the minimize-sync-lookups branch 3 times, most recently from a6129bf to a6eff29 Compare April 2, 2020 19:58
        remote_submissions_by_source[s.source_uuid].append(s)

    for source_uuid, submissions in remote_submissions_by_source.items():
        source = session.query(Source).filter_by(uuid=source_uuid).first()
@sssoleileraaa (Contributor) commented Apr 2, 2020

Before, we would look up the source right before adding a new submission to the database. Now it looks like we look up the source for every unique source we get back from the server, so this piece of the change should theoretically be faster if there are many new submissions for one source, but slower if there are many more sources and not a lot of new submissions, right?

@rmol (Contributor, Author) replied:

Yes, good catch, that would be a lot of unnecessary queries. I could move it back, and add another cache map for sources so it still only happens once per source, but also only if there are new submissions.

@sssoleileraaa (Contributor) replied:

In that case, I think the only cache we would need is one that we check and add to in the else clause, in case there are more new submissions for the same source. Instead of doing what we were doing before:

    _, source_uuid = submission.source_url.rsplit('/', 1)
    source = session.query(Source).filter_by(uuid=source_uuid).first()

we could do this:

    source = session.query(Source).filter_by(uuid=submission.source_uuid).first()

That, plus your addition of cache checking/adding for that source. And I think we could go back to looping through all the submissions without breaking them down by source.
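A toy sketch of that suggestion follows. All names here are hypothetical (the real client uses a SQLAlchemy session, not this CountingSession): the point is that the source is queried only on a cache miss, so many submissions from one source cost a single lookup.

```python
class CountingSession:
    """Stand-in for a database session that counts lookups."""

    def __init__(self, sources):
        self.sources = sources
        self.queries = 0

    def lookup(self, uuid):
        self.queries += 1
        return self.sources.get(uuid)

def get_source(session, cache, uuid):
    # Query the database only on a cache miss, so N submissions from the
    # same source trigger one lookup instead of N. Misses are cached as
    # None so an unknown source is also only queried once.
    if uuid not in cache:
        cache[uuid] = session.lookup(uuid)
    return cache[uuid]
```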

@rmol (Contributor, Author) replied:

Again, good call. I've added the source cache and reverted to just iterating the remote submissions/replies.

@rmol rmol force-pushed the minimize-sync-lookups branch from a6eff29 to 2d90bad Compare April 2, 2020 22:13

    def get(self, source_uuid: str) -> Optional[db.Source]:
        if source_uuid not in self.cache:
            source = self.session.query(db.Source).filter_by(uuid=source_uuid).first()
@sssoleileraaa (Contributor) commented Apr 2, 2020

Confirming that using first() is better than using one() here, because it returns None instead of raising NoResultFound.
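That difference can be seen with a minimal in-memory example. This is a standalone sketch, not the client's schema; the Source model, engine, and session here are hypothetical.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import NoResultFound

Base = declarative_base()

class Source(Base):
    __tablename__ = "sources"
    id = Column(Integer, primary_key=True)
    uuid = Column(String)

# In-memory SQLite database with no rows in it.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# first() returns None when there is no matching row ...
assert session.query(Source).filter_by(uuid="missing").first() is None

# ... while one() raises NoResultFound for the same query.
try:
    session.query(Source).filter_by(uuid="missing").one()
    raised = False
except NoResultFound:
    raised = True
assert raised
```

With first(), a caller can branch on None (as the cache above does) instead of wrapping every lookup in a try/except.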

@sssoleileraaa previously approved these changes Apr 2, 2020
Reduce the number of database queries during sync: cache sources or
journalists instead of looking them up for each incoming object
associated with them.

Use maps instead of sets to hold local objects in the update_
functions, so it's faster to check if we already have a record of
incoming objects.
@sssoleileraaa sssoleileraaa merged commit 77bbb4b into master Apr 2, 2020
@sssoleileraaa sssoleileraaa deleted the minimize-sync-lookups branch April 2, 2020 22:57