From 9311038001e17da46ced4257da9f5f6561d3270b Mon Sep 17 00:00:00 2001
From: Oleg Avdeev
Date: Mon, 8 Mar 2021 14:08:45 -0800
Subject: [PATCH] fix hashing algorithm (#1373)

Signed-off-by: Oleg Avdeev
---
 docs/specs/online_store_format.md | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/docs/specs/online_store_format.md b/docs/specs/online_store_format.md
index e0ba4038f1..b8e99a5595 100644
--- a/docs/specs/online_store_format.md
+++ b/docs/specs/online_store_format.md
@@ -74,10 +74,15 @@ We use the following structure to store feature data in the Firestore:
 
 Document id for the feature document is computed by hashing entity key using murmurhash3_128 algorithm as follows:
 
-1. hash utf8-encoded entity names, sorted in alphanumeric order
-2. hash the entity values in the same order as corresponding entity names, by serializing them to bytes as follows:
-   - binary values are hashed as-is
-   - string values hashed after serializing them as utf8 string
+1. hash entity names, sorted in alphanumeric order, by serializing them to bytes using the Value Serialization steps below
+2. hash the entity values in the same order as corresponding entity names, by serializing them to bytes using the Value Serialization steps below
+
+Value Serialization:
+* Store the type of the value (ValueType enum) as little-endian uint32.
+* Store the byte length of the serialized value as little-endian uint32
+* Store the serialized value as bytes:
+   - binary values are serialized as is
+   - string values serialized as utf8 string
    - int64 and int32 hashed as little-endian byte representation (8 and 4 bytes respectively)
    - bool hashed as 0 or 1 byte
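The Value Serialization steps introduced by this patch can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Feast's own implementation: the numeric `ValueType` codes, the helper names `serialize_value` and `serialize_entity_key`, the reading of "bool hashed as 0 or 1 byte" as a single `0x00`/`0x01` byte, the use of Python's code-point `sorted` for "alphanumeric order", and the choice to concatenate all name bytes and then all value bytes into one buffer before hashing are all assumptions.

```python
import struct
from typing import Dict, Tuple, Union

# ASSUMPTION: numeric ValueType codes; verify against the ValueType enum in your Value.proto.
BYTES, STRING, INT32, INT64, BOOL = 1, 2, 3, 4, 7

EntityValue = Union[bytes, str, int, bool]


def serialize_value(value: EntityValue, value_type: int) -> bytes:
    """Hypothetical helper: type (uint32 LE) + payload length (uint32 LE) + payload."""
    if value_type == BYTES:
        payload = bytes(value)  # binary values are used as is
    elif value_type == STRING:
        payload = value.encode("utf-8")  # strings as utf8
    elif value_type == INT32:
        payload = struct.pack("<i", value)  # 4-byte little-endian signed int
    elif value_type == INT64:
        payload = struct.pack("<q", value)  # 8-byte little-endian signed int
    elif value_type == BOOL:
        payload = b"\x01" if value else b"\x00"  # ASSUMPTION: one byte, 0 or 1
    else:
        raise ValueError(f"unsupported value type: {value_type}")
    # "<II" packs two little-endian uint32 values: the type code, then the payload length.
    return struct.pack("<II", value_type, len(payload)) + payload


def serialize_entity_key(entity_key: Dict[str, Tuple[int, EntityValue]]) -> bytes:
    """Hypothetical helper: maps entity name -> (value_type, value) to the bytes to hash."""
    names = sorted(entity_key)  # ASSUMPTION: code-point order stands in for "alphanumeric order"
    out = bytearray()
    for name in names:  # step 1: entity names, serialized as STRING values
        out += serialize_value(name, STRING)
    for name in names:  # step 2: entity values, in the same order as their names
        value_type, value = entity_key[name]
        out += serialize_value(value, value_type)
    return bytes(out)
```

The returned bytes would then be passed to a murmurhash3_128 implementation (for instance `hash_bytes` from the third-party `mmh3` package) to derive the document id; that step is omitted here so the sketch runs with the standard library alone. Types are passed explicitly rather than inferred because a Python `int` cannot tell int32 from int64, and `bool` is a subclass of `int`.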