Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash 3PID lookups #184

Merged
merged 27 commits into from
Aug 20, 2019
Merged

Hash 3PID lookups #184

merged 27 commits into from
Aug 20, 2019

Conversation

anoadragon453
Copy link
Member

@anoadragon453 anoadragon453 commented Jul 31, 2019

The implementation of matrix-org/matrix-spec-proposals#2134

TODO:

  • Support hashed lookups
  • Add column to the db for pre-computed hashes
  • Re-compute hashes for all 3PIDs on startup when pepper changes
  • Store previous pepper in the db so we know it changed
  • Hash addresses as they come in via /bind and replication servlets
  • Recompute lookup hashes if the algorithms in the config changed

We need to have a think about how this will work in the replication use case in general. Obviously we don't want to replicate lookup hashes as the receiving server may have a different pepper/algorithm in use than the sending server.

When we receive an association, we just rehash the address/medium combo with our own lookup pepper.

@anoadragon453 anoadragon453 marked this pull request as ready for review August 6, 2019 13:44
@anoadragon453 anoadragon453 requested a review from a team August 6, 2019 13:45
@anoadragon453
Copy link
Member Author

Heads up that CI is probably never going to pass on this.

@anoadragon453
Copy link
Member Author

Removed my assignment as this PR is currently waiting for review.

Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some preliminary thoughts.

Whats the deal with CI? This can't really get merged with it failing. Also, what about some unit tests?

sydent/db/hashing_metadata.py Outdated Show resolved Hide resolved
sydent/db/hashing_metadata.py Outdated Show resolved Hide resolved
sydent/db/hashing_metadata.py Outdated Show resolved Hide resolved
sydent/db/hashing_metadata.py Outdated Show resolved Hide resolved
sydent/db/hashing_metadata.py Outdated Show resolved Hide resolved
sydent/http/servlets/lookupservlet.py Show resolved Hide resolved
sydent/http/servlets/lookupv2servlet.py Outdated Show resolved Hide resolved
sydent/http/servlets/lookupv2servlet.py Outdated Show resolved Hide resolved
sydent/http/servlets/v2_servlet.py Show resolved Hide resolved
sydent/http/srvresolver.py Outdated Show resolved Hide resolved
@anoadragon453
Copy link
Member Author

Whats the deal with CI? This can't really get merged with it failing.

CI has been turned on but without a pipeline, hence it failing.

Also, what about some unit tests?

Blocked on #171

)
self.db.commit()
logger.info("v2 -> v3 schema migration complete")
self._setSchemaVersion(3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to go back and hash the existing stuff?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will be done on first run. Sydent checks to see if a pepper is configured in the database, and if not, generate one and rehash all 3PIDs:

sydent/sydent/sydent.py

Lines 186 to 194 in 424270b

hashing_metadata_store = HashingMetadataStore(self)
lookup_pepper = hashing_metadata_store.get_lookup_pepper()
if not lookup_pepper:
# No pepper defined in the database, generate one
lookup_pepper = generateAlphanumericTokenOfLength(5)
# Store it in the database and rehash 3PIDs
hashing_metadata_store.store_lookup_pepper(sha256_and_url_safe_base64,
lookup_pepper)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, OK. Though please add a comment there to say that it must be run before we start serving requests.

sydent/http/servlets/lookupv2servlet.py Outdated Show resolved Hide resolved
sydent/sydent.py Show resolved Hide resolved
sydent/util/hash.py Outdated Show resolved Hide resolved
sydent/util/hash.py Outdated Show resolved Hide resolved
@erikjohnston
Copy link
Member

Does this all work correctly with canonicalised emails? I.e. things like case-insensitive etc.

@anoadragon453
Copy link
Member Author

Does this all work correctly with canonicalised emails? I.e. things like case-insensitive etc.

Sydent continues to make use of the lower() function here: https://github.com/matrix-org/sydent/pull/184/files#diff-c5c56bd2a12d20e5d6f24879fd48d15eR154

@babolivier What is the status of updating this on master?

@anoadragon453
Copy link
Member Author

anoadragon453 commented Aug 19, 2019

@babolivier says the current state using lower() is fine. We're using lower case emails everywhere.

Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of trivial nits now

sydent/http/servlets/lookupv2servlet.py Show resolved Hide resolved
sydent/http/servlets/lookupv2servlet.py Show resolved Hide resolved
@anoadragon453
Copy link
Member Author

@erikjohnston https://matrix.org/docs/spec/identity_service/unstable#get-matrix-identity-api-v1 oh, maybe we should keep the empty 200 return on GET .../v2.

@anoadragon453
Copy link
Member Author

@erikjohnston https://matrix.org/docs/spec/identity_service/unstable#get-matrix-identity-api-v1 oh, maybe we should keep the empty 200 return on GET .../v2.

Have done this. Simple to remove in the future if we want to clean up that part of the spec for any reason.

@anoadragon453 anoadragon453 requested a review from dbkr August 20, 2019 12:25
Copy link
Member

@dbkr dbkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only other question is why the local threepid assocs have hashes? Wouldn't we only ever be matching against the global ones?

@anoadragon453
Copy link
Member Author

Only other question is why the local threepid assocs have hashes? Wouldn't we only ever be matching against the global ones?

Conclusion is it's not necessary but we'll remove it in a later PR. (issue #196)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants