[NEW] Valkey-Bloom: BloomFilter support for Valkey. #407

Open
KarthikSubbarao opened this issue Apr 30, 2024 · 41 comments
Labels
major-decision-approved Major decision approved by TSC team

Comments

@KarthikSubbarao
Member

The problem/use-case that the feature addresses

Bloom filters are a space-efficient probabilistic data structure that can be used to “check” whether an element exists in a set (with a defined false positive rate), and to “add” elements to a set. When checking whether an item exists, false positives are possible, but false negatives are not. https://en.wikipedia.org/wiki/Bloom_filter
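
As a rough illustration of the “add” and “check” behavior described above, here is a minimal sketch in Python. It is not the valkey-bloom implementation: the class name `TinyBloom`, the SHA-256 based double hashing, and the parameter names are assumptions chosen only to keep the example short.

```python
import hashlib


class TinyBloom:
    """Illustrative Bloom filter: num_bits bits, num_hashes positions per item."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive all positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, item: str) -> bool:
        # True only if every derived bit is set; may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bloom = TinyBloom(num_bits=1024, num_hashes=7)
bloom.add("apple")
print(bloom.exists("apple"))  # True: an added item is never reported as missing
```

Because `add` only ever sets bits, an item that was added is always found again; a false positive happens when other items happen to set all of an absent item's positions.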

Description of the feature

Valkey-Bloom is a Rust Valkey module that brings a native, space-efficient probabilistic module data type to Valkey. With it, users can create filters, add elements, perform “check” operations to test whether an element exists, check cardinality / INFO, auto-scale their filters, reserve filters, perform RDB save and load operations, etc.

Valkey-Bloom is built using bloomfilter::Bloom (https://crates.io/crates/bloomfilter which has a BSD-2-Clause license).

It is compatible with the BloomFilter (BF.*) command APIs of redislabs/rebloom from Redis Ltd. which has over 10M image pulls on Docker and is compatible with several client libraries.

The following commands are supported.

BF.EXISTS
BF.ADD
BF.MEXISTS
BF.MADD
BF.CARD
BF.RESERVE
BF.INFO
BF.INSERT

We would like to bring Valkey-Bloom into the valkey-io project as an open source Valkey-Module that is free to use, contribute to, etc.

Alternatives you've considered

A bloom filter module does exist today for Redis - https://github.com/goodform/rebloom. However, it uses an AGPL-3.0 license, which has additional obligations that are difficult to meet for many of the active contributors who are looking to provide Valkey as a service. AGPL is also widely disallowed by company open source program offices (including Amazon). Given that this package has not been significantly modified since it was created six years ago, it seems likely that the license is part of the issue.

@natoscott
Contributor

natoscott commented Apr 30, 2024

@KarthikSubbarao we are continuing the goodform.io modules as native valkey modules too. Personally I don't think the lack of activity relates to the license - it's more that the code is essentially done and that all modules generally get little attention once mature - but we're just speculating here.

Can we find a way to co-exist? I have used naming like valkey-bloom (all lower case) and the module shared library valkeybloom.so for a simple transition for users (this module will be in Fedora soon with this naming convention as we transition away from Redis). This matches up with the other goodform.io modules like valkey-search, valkey-json, valkey-graph, and so on.

Would it be possible to name this new module in a way that highlights the differences perhaps? (e.g. Valkey-Bloom-Rust?)

@madolson
Member

Can we find a way to co-exist?

Given your precedent, I think we shouldn't overwrite your naming. If you want to translate the names to valkey-*, I think we should respect that.

Would it be possible to name this new module in a way that highlights the differences perhaps? (e.g. Valkey-Bloom-Rust?)

We could call it Val-Bloom or something, more similar to how Redis was naming. Or we could name it based on the probability. Based on reading the docs (I've been historically advised not to read AGPL code while working in an AWS capacity), rebloom only supports the Bloom data types and not any of the newer ones supported by Redis (like Top-K or Cuckoo). I don't know how popular any of those are though.

@hpatro
Contributor

hpatro commented Apr 30, 2024

Thanks @KarthikSubbarao for creating this.

This is one of the most popular modules, and I've seen users resort to various alternatives, such as Lua scripts or custom applications built around the SETBIT command, when the prior modules weren't accessible (due to licensing). I believe it would be good if the Valkey organization can make it part of the project.

Key questions :

  1. How do we bundle modules? Should it be part of the binary/containers/release(s) by default?
  2. Integration tests? Each module having its own testing framework might make maintenance difficult over the years. I would rather continue with TCL tests or introduce a new lightweight framework and use it for each module.

@natoscott
Contributor

@hpatro there is an existing python-based test framework (BSD licensed) from the early days that has been kept and used with all of the goodform modules. The earlier version is named 'rmtest' (Redis Modules Test) and I've been working on transitioning it to 'vkmtest' (ValKey Modules Test). Maybe it'll work for the Rust module testing too - you can find the initial version here: https://github.com/goodform/valkey-module-test

@madolson
Member

@natoscott That is something I am very interested in taking over (specifically because I want a python based testing framework for the main project) if you have any interest in offloading the maintenance of it. Ideally it could be re-usable across all projects that run Valkey (or Redis even).

@madolson
Member

How do we bundle modules? Should it be part of the binary/containers/release(s) by default?

This isn't the question we should answer here. Can you make a separate issue for it?

@natoscott
Contributor

@madolson happy to either work with you on it or have you take it over - I have a lot on my plate (as I'm sure you do!) but I can definitely still dedicate some time to it. This test framework is also packaged in Fedora and I'd like to upload it to PyPI for ease of use within the Valkey modules too.

@natoscott
Contributor

natoscott commented Apr 30, 2024

@KarthikSubbarao another possibility if you're super keen on ValkeyBloom and not something with 'Rust' in the name would be for me to use valkey-module-bloom for the existing modules. In hindsight I see I've used that prefix for -test and -sdk (Python and C respectively) and that convention could be used on the C modules also perhaps? Anyway, let me know your thoughts, I'm happy to change it at this early stage. There was also mention of a new implementation of ValkeyJSON (not sure if it's using Rust) from someone at Alibaba IIRC - so this naming issue may not be an isolated problem.

@madolson
Member

madolson commented May 1, 2024

happy to either work with you on it or have you take it over

Cool! Not an immediate something to figure out, but would love to collaborate on this.

@hwware
Member

hwware commented May 1, 2024

Thanks @KarthikSubbarao for creating this.

This is one of the most popular modules, and I've seen users resort to various alternatives, such as Lua scripts or custom applications built around the SETBIT command, when the prior modules weren't accessible (due to licensing). I believe it would be good if the Valkey organization can make it part of the project.

Key questions :

  1. How do we bundle modules? Should it be part of the binary/containers/release(s) by default?
  2. Integration tests? Each module having its own testing framework might make maintenance difficult over the years. I would rather continue with TCL tests or introduce a new lightweight framework and use it for each module.

Here we are #408

@daniel-house
Member

Would it be possible to name this new module in a way that highlights the differences perhaps? (e.g. Valkey-Bloom-Rust?)

I like a name that highlights the differences in behavior but not one that gives the slightest hint about how it is implemented.

@madolson
Member

@valkey-io/core-team I guess maybe ask for a vote if we want to adopt this and continue developing it as an official bloom module? This is not committing to a specific date for when we will release it, just to start the ball rolling for a module based distribution.

Some things to consider: there are other modules, like the goodform modules. I believe Alibaba also has a module that implements bloom that they have not open sourced.

@zuiderkwast
Contributor

Regarding naming, I thought we had sort-of decided to reserve the valkey prefix for official modules and clients. OTOH, we agreed that the license precondition to become official is that it's open source / free software, which AGPL is, although cloud vendors and other enterprises don't like it. :)

Anyhow, I hope both can co-exist and that they're made API compatible. In that way, users don't need to worry about the differences running on their distro vs running against a hosted database as a service.

@KarthikSubbarao How complete is your module?

I'm fine with adding it, if you (or anyone else) promise to maintain it actively.

My name suggestion is "ValkeyBF", picking up the BF. prefix used in the command names.

@madolson madolson added the major-decision-pending Major decision pending by TSC team label Jun 12, 2024
@PingXie
Member

PingXie commented Jun 12, 2024

@zuiderkwast I think @KarthikSubbarao's bloom filter is licensed under BSD 3-clause and it is the one being proposed here. My vote is yes on the same conditions as Viktor mentioned above: 1) full command compat; 2) active maintenance. Name wise, my preference would be Valkey-Bloom. BF is too short IMO and I would also prefer a dash after Valkey

@madolson
Member

@PingXie Do you want to clarify what you mean by "1) full command compat"? I think right now there is not full command compatibility, since only some of the commands are implemented. Do you just mean that the APIs that do exist are compatible?

@zuiderkwast
Contributor

@zuiderkwast I think @KarthikSubbarao's bloom filter is licensed under BSD 3-clause and it is the one being proposed here.

Yes it is, but we're also discussing the already-existing AGPL "valkey-bloom" module here.

@natoscott It's good that you're willing to name the AGPL module "valkey-module-bloom" and we can name the BSD licensed module "Valkey-Bloom". That's no collision.

Even if we allow projects under the Valkey umbrella to be AGPL, it might be good to avoid it for modules that are to be included in the "Valkey+" (name TBD) package, which will be a container containing Valkey + some official modules.

@PingXie
Member

PingXie commented Jun 12, 2024

@PingXie Do you want to clarify what you mean with 1) full command compat;. I think right now there is not full command compatibility, since only some of the commands are implemented. Do you just mean that the APIs that do exist are compatible?

yeah existing commands being fully compatible is good for now. Also the maintainers (whoever they are) agree that eventual full compat (meaning new commands as well) is a p0 goal by default. We can always discuss exceptions on a case-by-case basis. "incremental perfection" (R) :-)

@KarthikSubbarao
Member Author

KarthikSubbarao commented Jun 12, 2024

@KarthikSubbarao How complete is your module?

What is done:

  • Support for the Bloom Filter Module commands (compatible with the ReBloom Module syntax): BF.ADD, BF.EXISTS, BF.INFO, BF.INSERT, BF.MADD, BF.MEXISTS, BF.RESERVE
  • Auto Scaling of Bloom Filters
  • RDB save and load for Bloom Filter data types
  • Configs for bloom filter expansion rate (used for scaling) and max size of bloom filters (number of elements that can be "added")
  • Additional Bloom data type callbacks: Copy command, Free, Memory Usage check, Defrag, Free Effort, etc.
  • Initial sanity Memory Usage and Performance tests
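
To make the "Auto Scaling" item above concrete, here is a hedged Python sketch of one common way to scale a Bloom filter: keep a list of sub-filters and, once the newest one reaches its capacity, append another whose capacity is multiplied by the expansion rate. The names `SubFilter` and `ScalingBloom`, the sizing formulas, and the choice to keep the same false positive rate for every sub-filter are assumptions for illustration; the thread does not describe the module's exact policy.

```python
import hashlib
import math


class SubFilter:
    """One fixed-size Bloom filter sized for `capacity` items at `fp_rate`."""

    def __init__(self, capacity: int, fp_rate: float) -> None:
        self.capacity = capacity
        self.count = 0
        # Textbook sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def exists(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class ScalingBloom:
    """Stack of sub-filters; a new, larger one is added when the last is full."""

    def __init__(self, capacity: int, fp_rate: float, expansion: int) -> None:
        self.fp_rate = fp_rate
        self.expansion = expansion
        self.filters = [SubFilter(capacity, fp_rate)]

    def exists(self, item: str) -> bool:
        # An item may live in any sub-filter, so all of them are checked.
        return any(f.exists(item) for f in self.filters)

    def add(self, item: str) -> bool:
        if self.exists(item):
            return False  # already present (or a false positive)
        last = self.filters[-1]
        if last.count >= last.capacity:
            # Scale out: the next sub-filter is `expansion` times larger.
            last = SubFilter(last.capacity * self.expansion, self.fp_rate)
            self.filters.append(last)
        last.add(item)
        return True
```

Note that a lookup has to consult every sub-filter, which is one reason the scaling and non-scaling cases are worth benchmarking separately.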

What is remaining:

  • Perf testing to set a baseline. We can decide on a baseline scenario and run tests & document results
  • Integration Testing / Unit testing coverage
  • Additional Bloom data type callbacks: AOF rewrite and Digest. These are generic Module data type callbacks that can be implemented in the Module.
  • Memory Based restrictions - If the expected memory that will be allocated upon a bloom write type operation (such as BF.RESERVE, BF.CREATE) would exceed the allowed memory, then we should reject the command. We need to check whether any additional logic needed to handle this should be added to the Module.
  • Additional Bloom specific Module configurations for customizing the created bloom objects & Tuning default/min/max config values.
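
For the "Memory Based restrictions" item above, a rough Python sketch of the kind of check being described: estimate the bit array size from the requested capacity and false positive rate, then refuse the operation if it would push usage past a limit. The function names, the textbook sizing formula, and the `used_bytes` / `max_bytes` parameters are assumptions; the estimate ignores per-object overhead and is not how the module necessarily accounts for memory.

```python
import math


def estimated_bloom_bytes(capacity: int, fp_rate: float) -> int:
    """Textbook Bloom filter size: m = -n ln p / (ln 2)^2 bits, rounded up to bytes."""
    bits = math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2))
    return (bits + 7) // 8


def can_allocate(capacity: int, fp_rate: float, used_bytes: int, max_bytes: int) -> bool:
    """Hypothetical admission check: reject if the new filter would exceed the limit."""
    return used_bytes + estimated_bloom_bytes(capacity, fp_rate) <= max_bytes


# A filter for one million items at a 0.001 false positive rate needs roughly 1.8 MB.
print(estimated_bloom_bytes(1_000_000, 0.001))
print(can_allocate(1_000_000, 0.001, used_bytes=0, max_bytes=1_000_000))
```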
  1. full command compat;

This Module supports every Bloom Filter command from ReBloom except BF.LOADCHUNK and BF.SCANDUMP, and the supported commands have been implemented with ReBloom compatibility. The two commands were not implemented because the Module can already load and save BloomModule data type items during RDB load and save. BF.LOADCHUNK and BF.SCANDUMP are APIs to load BloomModule data types through commands, but since we will provide RDB save & load and also AOF Rewrite, dedicated commands for the same purpose were not considered necessary. This can always be re-evaluated if we think it is useful.

  2. active maintenance.

I would be glad to help with maintenance of the Module by addressing issues and having discussions on missing aspects that we would like to build into the Module's functionality and testing

@hpatro
Contributor

hpatro commented Jun 12, 2024

@zuiderkwast I think @KarthikSubbarao's bloom filter is licensed under BSD 3-clause and it is the one being proposed here. My vote is yes on the same conditions as Viktor mentioned above: 1) full command compat; 2) active maintenance. Name wise, my preference would be Valkey-Bloom. BF is too short IMO and I would also prefer a dash after Valkey

Full command compat is one of the points I wanted clarification on for all the future modules we're planning to build/accept. As we don't have any data points for Redis Modules, one can't be sure which API(s) were really used. Do you think it's wise to build full compatibility? Right now the changes which @KarthikSubbarao has made support all the bloom filter related API(s) but leave out some of the other probabilistic filter(s). I think we should not require full command compatibility to accept a Module. Rather, we should accept one if it meets the performance/memory/language/coding standards of the project. We can always improve/add as per user requests.

@PingXie
Member

PingXie commented Jun 12, 2024

As we don't have any data points for Redis Modules, one can't be sure which API(s) were really used.

We can argue the opposite way too without concrete data and this would become pure speculation at the end.

If there is a legit reason to not be fully compatible we can always take an exception but I think it is important to aim at a higher compat bar so that existing Redis users can migrate their workload seamlessly to Valkey. Any incompatibility adds adoption friction and they add up. I am not saying a module needs to be bit by bit compatible in order to be adopted under Valkey. I am talking about directional alignment on helping all customers move on to Valkey with minimum possible friction.

@hpatro
Contributor

hpatro commented Jun 12, 2024

As we don't have any data points for Redis Modules, one can't be sure which API(s) were really used.

We can argue the opposite way too without concrete data and this would become pure speculation at the end.

If there is a legit reason to not be fully compatible we can always take an exception but I think it is important to aim at a higher compat bar so that existing Redis users can migrate their workload seamlessly to Valkey. Any incompatibility adds adoption friction and they add up. I am not saying a module needs to be bit by bit compatible in order to be adopted under Valkey. I am talking about directional alignment on helping all customers move on to Valkey with minimum possible friction.

Well, the bloom filter module proposed here has all the bloom filter commands implemented. The remaining commands technically don't fit under bloom filter; they would ideally be under probabilistic filter.

@KarthikSubbarao could we also list out the remaining commands not built yet?

@PingXie
Member

PingXie commented Jun 12, 2024

This Module supports every Bloom Filter command (from ReBloom) except for the BF.LOADCHUNK and BF.SCANDUMP and the commands have been implemented with ReBloom compatibility. The reason for not implementing the two cmds is because the Module provides the ability to load and save BloomModule data type items during RDB load and save. BF.LOADCHUNK and BF.SCANDUMP are APIs to load BloomModule data types through commands, but since we will provide RDB save & load and also AOF Rewrite, having specific commands for the same purpose was not considered as required.

This got me thinking about the on-disk format compatibility, which would be another very valuable property. Though I can see it being harder to achieve.

This can always be re-evaluated if we think it is useful

I agree.

Along the compat topic, I would also like the module maintainer to provide migration best practices, when applicable.

@hpatro
Contributor

hpatro commented Jun 13, 2024

@KarthikSubbarao could we also list out the remaining commands not built yet?

Realized the other probabilistic filter/algorithm commands are each under a different command namespace: Cuckoo filter commands are under CF.*, count-min sketch commands are under CMS.*, etc.

@zuiderkwast
Contributor

Ok, so we can eventually have separate modules for cuckoo and minsketch. Seems reasonable.

@madolson
Member

madolson commented Jun 13, 2024

If there is a legit reason to not be fully compatible we can always take an exception but I think it is important to aim at a higher compat bar so that existing Redis users can migrate their workload seamlessly to Valkey. Any incompatibility adds adoption friction and they add up. I am not saying a module needs to be bit by bit compatible in order to be adopted under Valkey. I am talking about directional alignment on helping all customers move on to Valkey with minimum possible friction.

I think we should start with first principles and decide what we want the APIs to look like, and then decide if we want to be API compatible with Redis. You are starting with the assumption that our users are migrating from Redis, but that need not be the case. They also might be net new developers, and we want to build the right application for them. We may want to alter the APIs to better suit those users.

It should always be evaluated case by case, and should not be a general tenet. I also would bias to skipping APIs that don't make a lot of sense. For example, I know in the search modules they implemented functionality like FT.CONFIG SET, which has largely been replaced with the module config functionality.

@madolson
Member

madolson commented Jun 13, 2024

This Module supports every Bloom Filter command (from ReBloom) except for the BF.LOADCHUNK and BF.SCANDUMP and the commands have been implemented with ReBloom compatibility. The reason for not implementing the two cmds is because the Module provides the ability to load and save BloomModule data type items during RDB load and save. BF.LOADCHUNK and BF.SCANDUMP are APIs to load BloomModule data types through commands, but since we will provide RDB save & load and also AOF Rewrite, having specific commands for the same purpose was not considered as required.

This got me thinking about the on-disk format compatibility, which would be another very valuable property. Though I can see it being harder to achieve.

This can always be re-evaluated if we think it is useful

I agree.

Along the compat topic, I would also like the module maintainer to provide migration best practices, when applicable.

On the compat topic, we have a lot to deal with because the Redis RDB opcode has changed. I documented the issue here: #645. We don't have a good compatibility story in general with Redis.

@hpatro
Contributor

hpatro commented Jun 25, 2024

@valkey-io/core-team Any TSC interested in helping shape this up? I think this would be a nice module to start with and help set the baseline for other modules in the future.

@zuiderkwast
Contributor

I'm not particularly interested in spending time with this, but I'm in favor of accepting it, with the relevant bloom filter commands being rebloom-compatible. It's no problem that it excludes non-bloom probabilistic filters (they can be provided by another module in the future) and the obsolete commands (dump/load).

@PingXie
Member

PingXie commented Jun 26, 2024

I think this would be a nice module to start with and help set the baseline for other modules in the future.

+1. I am in favor of accepting this module too.

@hwware
Member

hwware commented Jun 26, 2024

@hpatro I already spoke with Ping and Viktor privately; I will take on support for this module. Thanks

@KarthikSubbarao
Member Author

KarthikSubbarao commented Jul 18, 2024

Hello - I wanted to post an update here regarding the work remaining and also follow up on the next steps regarding the valkey-bloom Module's review.

From this list, do we want to close out on all these items before the Module can be accepted into the Valkey project?

Or instead, would we want to pick and address the "high priority" items from this list as a requirement? (And continue addressing the remaining as follow up issues)

Issues remaining to close on:

  • Performance Comparison with ReBloom - Both Memory Usage and Latency/TPS
  • Integration Testing strategy for Modules
  • Compatibility Story (with ReBloom). We are currently API compatible with the BloomFilter APIs of ReBloom. However, we are not RDB compatible.
  • AOF Rewrite Support (Datatype callback) - If we want, we can support AOF Rewrite. However, Bloom is not like data types such as String, where we can write a command to the AOF to recreate the value exactly. We can execute a command (BF.RESERVE / BF.INSERT) and it will result in an empty bloom object being created (without any items added / bits set). If we want the exact same bloom object to be created, we need a mechanism for restoring an item using a dump containing the bit arrays of existing objects. This means we need to support operations such as BF.LOADCHUNK, which restores from a dump of the BloomObject. On the other hand, if we are OK with AOF Rewrite resulting in empty bloom objects being created, we can just write BF.RESERVE / BF.INSERT to the AOF file.
  • Large value consideration: We can consider exempting bloom objects greater than X bytes from synchronous free and from defrag. We can also consider available memory based validation before BloomObject creation.
  • Configs and Tuning - In general, we need to review min/max/default of all existing configs - Default Capacity, Default Expansion Rate. We can also consider the following additional configs:
    • Default False Positive Rate.
    • Max number of sub-filters (scaling) per Bloom object.
    • Bytes Threshold after which we exempt items from synchronous free and from defrag.
  • Digest datatype callback: This will be invoked from DEBUG DIGEST and generates a checksum on the BloomObject.
  • Metrics / Counters (Optional)
    • Bytes and Number of items, number of defrag hits

Performance Comparison

Default parameters:

  • Number of requests = 1000000
  • Number of Filters per BloomFilter object = 1
  • Expansion rate = Scaling disabled
  • Number of BloomFilter objects = 1
  • False positive rate = 0.001
  • Number of cores on the machine = 4
  • Command Used: BF.EXISTS

Starting the server (pinned to 2 cores):

valkey-server --loadmodule <path_to_module>
sudo taskset -cp 0,1 <valkey-server pid>

Creating the BloomObject & Running the benchmark (pinned to 1 core):

127.0.0.1:6379> bf.reserve key 0.001 <capacity>
sudo taskset -c 2 /home/ec2-user/valkey-benchmark -n 1000000 BF.EXISTS key item

Performance Comparison (Non Scaling)

| Capacity | BloomFilter objects | ReBloom p50 | ValkeyBloom p50 | ValkeyBloom p50 % Increase | ReBloom p95 | ValkeyBloom p95 | ValkeyBloom p95 % Increase | ReBloom p99 | ValkeyBloom p99 | ValkeyBloom p99 % Increase | ReBloom TPS | ValkeyBloom TPS | ValkeyBloom TPS % Increase |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.24967 | 0.24433 | -2.14% | 0.303 | 0.30567 | 0.88% | 0.51633 | 0.495 | -4.13% | 98130.28667 | 99324.09667 | 1.22% |
| 100 | 1 | 0.24967 | 0.247 | -1.07% | 0.32167 | 0.25767 | -19.90% | 0.54033 | 0.431 | -20.23% | 97041.71333 | 103638.66333 | 6.80% |
| 10000 | 1 | 0.247 | 0.247 | 0.00% | 0.319 | 0.30833 | -3.34% | 0.52167 | 0.47367 | -9.20% | 97793.19667 | 99049.15667 | 1.28% |
| 1000000 | 1 | 0.247 | 0.247 | 0.00% | 0.30033 | 0.31367 | 4.44% | 0.50033 | 0.48433 | -3.20% | 99641.27 | 99115.8 | -0.53% |
| 100000000 | 1 | 0.24967 | 0.24967 | 0.00% | 0.327 | 0.327 | 0.00% | 0.559 | 0.53233 | -4.77% | 94195.21 | 95751.18 | 1.65% |

Regarding performance comparison - valkey-bloom performs roughly the same as ReBloom during Non Scaling tests.
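
For readers checking the "% Increase" figures, they appear to be the relative change of the ValkeyBloom value against the ReBloom value. A small Python sketch under that assumption (the function name is made up for illustration), using the capacity 1 row from the results above:

```python
def percent_increase(rebloom: float, valkey_bloom: float) -> float:
    """Relative change of the ValkeyBloom measurement versus ReBloom, in percent."""
    return (valkey_bloom - rebloom) / rebloom * 100.0


# Capacity = 1 row: p50 latency and TPS.
print(f"{percent_increase(0.24967, 0.24433):.2f}%")          # -2.14%
print(f"{percent_increase(98130.28667, 99324.09667):.2f}%")  # 1.22%
```

For the latency columns a negative value means ValkeyBloom was faster; for the TPS column a positive value means higher throughput.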

@hwware
Member

hwware commented Jul 19, 2024

Thanks for sharing the information. Referring to the comments above, I would prefer that you begin with the following items.

  1. Performance Test for Auto Scaling of Bloom Filters
  2. Create the Integration Testing framework (but I am unsure which language we should use: Rust, Python, or TCL, which I do not like ^_^). Do you have any suggestions?

I always want to build a proper framework first and then add more features, so that other contributors can work in the same way.
And we can use the existing features to build the baseline for performance; then it is easier to figure out how much a new feature influences the current system once it is added.

After the above 2 tasks are done, let us check which work we should do as the next step. What do you think?

@zuiderkwast
Contributor

  • On the other hand, if we are OK with AOF Rewrite resulting in empty bloom objects created, we can just write BF.REWRITE / BF.INSERT to the AOF file.

I don't understand. Does this mean that the data is lost on AOF rewrite?

@KarthikSubbarao
Member Author

On the other hand, if we are OK with AOF Rewrite resulting in empty bloom objects created, we can just write BF.REWRITE / BF.INSERT to the AOF file.
I don't understand. Does this mean that the data is lost on AOF rewrite?

Currently, we have not yet implemented the AOF callback and I was hoping to first discuss this here.

If we handle AOF rewrite by saving commands such as BF.RESERVE or BF.INSERT we will be able to re-create a Bloom Object with the same properties (expansion, capacity, false positive rate). However, the bits will not be set and when restored from the AOF, no items would "exist" (be set) on the bloom object. So, yes, this can be considered data loss.

This is not an issue with RDB Load and Save because we save the raw BloomObject's byte vector data during RDB Save and we are able to restore this during RDB Load.

For AOF rewrite to support saving the exact state of the Bloom object (including the items that were "set"), we need to include the dump in the AOF and will need to support a command that can restore this data. ReBloom supports a BF.LOADCHUNK command to restore a bloom object from its dump.
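
The difference between the two AOF rewrite options can be sketched in a few lines of Python. This is a toy model, not module code: the `Bloom` class, its `dump` method, and the hashing scheme are assumptions used only to show that replaying creation parameters yields an empty filter, while restoring the raw bits preserves membership.

```python
import hashlib


class Bloom:
    """Toy filter whose full state is (num_bits, num_hashes, raw bit array)."""

    def __init__(self, num_bits: int, num_hashes: int, bits: bytes = b"") -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(bits) if bits else bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def dump(self) -> bytes:
        # Raw bit array, analogous to what an RDB save would persist.
        return bytes(self.bits)


original = Bloom(1024, 7)
original.add("user:42")

# Option 1: the rewrite records only the creation parameters (like BF.RESERVE).
replayed = Bloom(original.num_bits, original.num_hashes)

# Option 2: the rewrite also carries the raw bits (the role BF.LOADCHUNK plays).
restored = Bloom(original.num_bits, original.num_hashes, original.dump())

print(replayed.exists("user:42"))  # False: same properties, but no bits set
print(restored.exists("user:42"))  # True: exact state recovered
```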

@zuiderkwast
Contributor

Thanks. It sounds to me that LOADCHUNK is needed. It seems wrong to me to assume a BF is only volatile cache data.

@hwware
Member

hwware commented Jul 23, 2024

As we discussed in the meeting, now we are blocked by 2 issues:

  1. RDB compatibility
  2. AOF rewrite problem

Please draft these 2 points in our rfc https://github.com/valkey-io/valkey-rfc

@KarthikSubbarao Thanks

@KarthikSubbarao
Member Author

Hello - I have documented the ValkeyBloom feature as an RFC here: valkey-io/valkey-rfc#4

Please do take a look when possible

@ashtul
Contributor

ashtul commented Aug 24, 2024

@hpatro there is an existing python-based test framework (BSD licensed) from the early days that has been kept and used with all of the goodform modules. The earlier version is named 'rmtest' (Redis Modules Test) and I've been working on transitioning it to 'vkmtest' (ValKey Modules Test). Maybe it'll work for the Rust module testing too - you can find the initial version here: https://github.com/goodform/valkey-module-test

I believe the current testing framework in use is https://github.com/RedisLabsModules/RLtest

@ashtul
Contributor

ashtul commented Aug 24, 2024

@hwware @zuiderkwast @KarthikSubbarao
Here is a link to an issue at RedisBloom about the AOF and RDB issues.

RedisBloom/RedisBloom#12

@zuiderkwast
Contributor

That issue mentions a limit of 512MB for DUMP/RESTORE due to the protocol limits. It's for any type, so not specific to bloom filters. A large string or hash has the same problem. This is configurable though. In valkey.conf, I find this comment:

# In the server protocol, bulk requests, that are, elements representing single
# strings, are normally limited to 512 mb. However you can change this limit
# here, but must be 1mb or greater
#
# proto-max-bulk-len 512mb

(We really should change "mb" to "MB" though, because in the metric system "m" = milli and "M" = mega, and internet standards define "b" = bit and "B" = byte. Millibit doesn't make much sense.)
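
One general way to stay under a bulk-length limit like this is to move a large dump in pieces. The Python sketch below only illustrates that idea; the helper names and the chunk size are hypothetical, and it does not describe how BF.SCANDUMP / BF.LOADCHUNK or DUMP/RESTORE are actually implemented.

```python
from typing import List


def split_into_chunks(payload: bytes, max_chunk_len: int) -> List[bytes]:
    """Split a serialized object into pieces no longer than max_chunk_len bytes."""
    if max_chunk_len <= 0:
        raise ValueError("max_chunk_len must be positive")
    return [payload[i:i + max_chunk_len] for i in range(0, len(payload), max_chunk_len)]


def join_chunks(chunks: List[bytes]) -> bytes:
    """Reassemble the original payload from its ordered chunks."""
    return b"".join(chunks)


dump = bytes(range(10))              # stand-in for a large filter dump
chunks = split_into_chunks(dump, 4)  # stand-in for the protocol limit
print([len(c) for c in chunks])      # [4, 4, 2]
print(join_chunks(chunks) == dump)   # True
```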

@madolson madolson added major-decision-approved Major decision approved by TSC team and removed major-decision-pending Major decision pending by TSC team labels Sep 1, 2024
@ashtul
Contributor

ashtul commented Sep 1, 2024

@madolson @zuiderkwast @KarthikSubbarao @gkorland

I am working on a new rust bloom filter implementation which I think valkey can benefit from.

It will have the following features:

  1. Scalability.
  2. Number of hash functions is floored (link).
  3. The filter can be locked.
    • No additional items can be added.
    • The filter is compressed to release unused memory and increase the false-positive rate to the user's defined rate.

Is there a timeline for the release of the module?
