caching issues using dictionary #14
After running version 0.2.4 on a large data set, I have observed that caching a large number of key-value pairs sometimes causes cache errors and corrupts some of the cache entries. These occurrences are totally random and happen only when the cache reaches a significant size. I was not able to pinpoint the exact cause.
Upon further debugging, I found something very strange in the Encoder class. When this object's cache gets sufficiently large (about 69,145 entries), I start getting corrupted values. For example, the pointer value 1153669 decodes to two very different values, while `_encode_pointer` returns the correct value. Pointers 1153668-1153671 always decode to an incorrect value for this particular Encoder object; with a different object, the error happens on another range of pointers. I also captured an example of a correctly encoded pointer for comparison.
After confirming this behavior, I narrowed it down to the code below. I suspect the `type_decoder` mapping may be caching these functions and calling the incorrect one when the object grows to a significant size (just speculating). Anyhow, since I knew the root cause, I patched the code to call `_encode_pointer(value)` directly instead of going through the function mapping, and the random corruption of prefixes/data is completely gone:

```diff
  self.data_list.append(res)
  pointer_position = self.data_pointer
  self.data_pointer += len(res)
- pointer = self.encode(pointer_position, 1)
+ pointer = self._encode_pointer(pointer_position)
  self.data_cache[cache_key] = pointer
  return pointer
  return res
```

Do you foresee any issue with my change? I am not sure if this is the correct way to fix this issue.
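For illustration, the failure mode described above boils down to something like the following toy sketch (hypothetical names only, not the project's actual Encoder code): a cache keyed on the raw value, ignoring which MMDB type is being encoded, so an integer payload that happens to equal an earlier pointer position silently receives the pointer's bytes.

```python
def encode_uint(value: int) -> bytes:
    return b"U" + value.to_bytes(4, "big")   # stand-in for a uint encoding

def encode_pointer(value: int) -> bytes:
    return b"P" + value.to_bytes(4, "big")   # stand-in for a pointer encoding

class ToyEncoder:
    def __init__(self):
        self.cache = {}

    def encode(self, value: int, as_pointer: bool = False) -> bytes:
        if value in self.cache:              # BUG: cache key ignores the type
            return self.cache[value]
        res = encode_pointer(value) if as_pointer else encode_uint(value)
        self.cache[value] = res
        return res

enc = ToyEncoder()
enc.encode(1153669, as_pointer=True)         # caches a pointer encoding
clash = enc.encode(1153669)                  # a plain integer wants encoding here
assert clash == b"P" + (1153669).to_bytes(4, "big")  # wrong bytes silently reused
```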
Thank you for bringing up this issue. There is indeed a problem here: I think the bug is that pointer-type values should not be cached. I've created a fix for this bug in a new branch: fix/14-caching-issues-using-dictionary. Could you please test this fix and let me know if it resolves the problem for you? Your contribution in identifying this bug is greatly appreciated. Thank you for helping me improve the project.
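In terms of the toy example above, the shape of such a fix is for pointer-type encodings to bypass the value-keyed cache entirely (again just a sketch; the actual change lives in the fix/14-caching-issues-using-dictionary branch). It reuses `encode_uint` and `encode_pointer` from the earlier snippet:

```python
class FixedToyEncoder:
    """Same toy encoder, but pointer encodings bypass the cache."""

    def __init__(self):
        self.cache = {}

    def encode(self, value: int, as_pointer: bool = False) -> bytes:
        if as_pointer:
            # Pointer encodings depend on a data-section position rather than
            # on the value itself, so they never touch the value-keyed cache.
            return encode_pointer(value)
        if value in self.cache:
            return self.cache[value]
        res = encode_uint(value)
        self.cache[value] = res
        return res

enc = FixedToyEncoder()
enc.encode(1153669, as_pointer=True)         # no longer cached
assert enc.encode(1153669) == b"U" + (1153669).to_bytes(4, "big")
```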
I have run the code through my largest DB of 11 million prefix/data entries, and no pointer corruption was reported. I think this patch is good to go. Thanks for the fix.
Some interesting info: with this fix, I am seeing about a 20% drop in performance. With the old code, I observed a processing rate of about 10,000 prefixes per second; after the patch, I am seeing about 8,000 prefixes per second. I think data accuracy outweighs the performance, though. FYI.
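For anyone wanting to reproduce this throughput comparison, a rough micro-benchmark along these lines can be run against v0.2.4 and the fixed branch. This is a hypothetical sketch: the `MMDBWriter`, `insert_network`, and `to_db_file` names follow the project's README as I understand it, and the synthetic prefixes are made up for timing purposes only.

```python
import time

from netaddr import IPNetwork, IPSet
from mmdb_writer import MMDBWriter

writer = MMDBWriter()
n = 10_000
start = time.perf_counter()
for i in range(n):
    # Non-overlapping synthetic /24 prefixes, purely for timing.
    net = IPNetwork(f"10.{(i >> 8) & 0xFF}.{i & 0xFF}.0/24")
    writer.insert_network(IPSet([net]), {"id": i})
elapsed = time.perf_counter() - start
print(f"{n / elapsed:,.0f} prefixes/second inserted")
writer.to_db_file("bench.mmdb")
```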
I've released a new version, v0.2.5, please try it. Regarding performance, I'm trying to reimplement the entire logic in Rust using PyO3; that should give a big performance boost, but it is still a work in progress. 😂