
feat(rocksdb): Support more configurable items for bloom filter #522

Merged: 4 commits, merged Apr 26, 2020

acelyc111 commented Apr 24, 2020

What problem does this PR solve?

  • Make bits_per_key of the Bloom filter configurable, so that a lower false positive rate can be achieved
  • Make format_version of SST files configurable, so that RocksDB's faster and more accurate Bloom filter implementation (available with format_version = 5) can be used
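For intuition on why raising bits_per_key lowers the false positive rate, the textbook Bloom filter approximation is p ≈ (1 − e^(−k/b))^k for b bits per key and k probes, with k ≈ b·ln 2 being near-optimal. The sketch below only evaluates that classic model; it is not RocksDB's cache-local implementation, so the measured rates in the Result section are expected to differ from it.

```python
import math


def textbook_bloom_fp_rate(bits_per_key: float) -> float:
    """Classic approximation p = (1 - e^(-k/b))^k with k = round(b * ln 2)."""
    k = max(1, round(bits_per_key * math.log(2)))  # near-optimal probe count
    return (1.0 - math.exp(-k / bits_per_key)) ** k


for b in (10, 24):
    print(f"bits_per_key={b}: textbook fp rate ~ {textbook_bloom_fp_rate(b):.6%}")
```

Under this model, b = 10 gives roughly 0.8% and b = 24 roughly 0.001%.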

What is changed and how it works?

Two options are added to the [pegasus.server] section of the replica server's config.ini:

[pegasus.server]
rocksdb_bloom_filter_bits_per_key = 10
rocksdb_format_version = 2
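A minimal sketch of reading the two new options from a config.ini with Python's configparser, for example in a deployment check script. This is not how Pegasus itself parses the file, and the fallback values are hypothetical: they simply mirror the example above and are not taken from Pegasus source.

```python
import configparser
from typing import Tuple

# Hypothetical fallbacks mirroring the example config, NOT Pegasus defaults.
DEFAULT_BITS_PER_KEY = 10
DEFAULT_FORMAT_VERSION = 2


def read_bloom_filter_config(ini_text: str) -> Tuple[int, int]:
    """Return (rocksdb_bloom_filter_bits_per_key, rocksdb_format_version)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback is used when the section or the option is missing.
    bits = parser.getint("pegasus.server", "rocksdb_bloom_filter_bits_per_key",
                         fallback=DEFAULT_BITS_PER_KEY)
    fmt = parser.getint("pegasus.server", "rocksdb_format_version",
                        fallback=DEFAULT_FORMAT_VERSION)
    return bits, fmt


example = """
[pegasus.server]
rocksdb_bloom_filter_bits_per_key = 24
rocksdb_format_version = 5
"""
print(read_bloom_filter_config(example))
```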

Check List

Tests

  • Manual test (steps below)
  1. Write test data set A.
  2. Read test data set B, whose hashkey space is 100 times larger than that of data set A,
    so about 99% of the looked-up keys do not exist. Each point lookup consults the Bloom filter, which returns either negative (the key definitely does not exist) or positive (the key may exist).
  3. Observe point_fp_rate with the shell command app_stat.
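The manual test can be mimicked in miniature with a self-contained textbook Bloom filter: insert the keys of set A, then look up a keyspace 100 times larger and count how often the filter answers "may exist" for keys that were never written. This is an illustrative simulation under stated assumptions (SHA-256 based double hashing, hypothetical `hashkey_<i>` key names, a single filter for all keys); it does not reproduce RocksDB's per-SST filters or how Pegasus computes point_fp_rate.

```python
import hashlib
import math


class TextbookBloomFilter:
    """Classic Bloom filter with double hashing; NOT RocksDB's implementation."""

    def __init__(self, num_keys: int, bits_per_key: int) -> None:
        self.num_bits = max(64, num_keys * bits_per_key)
        # Near-optimal number of probes: k = bits_per_key * ln 2.
        self.num_probes = max(1, round(bits_per_key * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive two 64-bit hashes from SHA-256 and combine them as h1 + i*h2.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_probes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "may exist".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def measure_fp_rate(num_keys_a: int, bits_per_key: int, scale: int = 100) -> float:
    """Write set A, look up a keyspace `scale` times larger, and return the
    fraction of absent keys for which the filter still says "may exist"."""
    bf = TextbookBloomFilter(num_keys_a, bits_per_key)
    for i in range(num_keys_a):
        bf.add(f"hashkey_{i}".encode())  # hypothetical key naming
    # A Bloom filter never gives false negatives for keys that were added.
    assert all(bf.may_contain(f"hashkey_{i}".encode()) for i in range(num_keys_a))
    absent = range(num_keys_a, num_keys_a * scale)
    false_positives = sum(bf.may_contain(f"hashkey_{i}".encode()) for i in absent)
    return false_positives / len(absent)


for bits in (10, 24):
    print(f"bits_per_key={bits}: fp_rate={measure_fp_rate(1000, bits):.6f}")
```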

Result

  • When rocksdb_bloom_filter_bits_per_key = 10 and rocksdb_format_version = 2
>>> app_stat -a test
[app_stat]
pidx          GET  MGET   PUT  ...  point_n_rate  point_fp_rate
0            0.00  0.00  0.00  ...      0.960577       0.011309
1            0.00  0.00  0.00  ...      0.960897       0.010979
2            0.00  0.00  0.00  ...      0.961837       0.009758
3            0.00  0.00  0.00  ...      0.961982       0.009609
4            0.00  0.00  0.00  ...      0.960831       0.010793
5            0.00  0.00  0.00  ...      0.961616       0.009985
6            0.00  0.00  0.00  ...      0.961044       0.010783
7            0.00  0.00  0.00  ...      0.959726       0.012140
(total:8)    0.00  0.00  0.00  ...      0.961064       0.010670
  • When rocksdb_bloom_filter_bits_per_key = 24 and rocksdb_format_version = 2
>>> app_stat -a test_v2_b24
[app_stat]
pidx          GET  MGET   PUT  ...  point_n_rate  point_fp_rate
0            0.00  0.00  0.00  ...      0.997455       0.002545
1            0.00  0.00  0.00  ...      0.997706       0.002294
2            0.00  0.00  0.00  ...      0.997608       0.002392
3            0.00  0.00  0.00  ...      0.997616       0.002384
4            0.00  0.00  0.00  ...      0.997608       0.002392
5            0.00  0.00  0.00  ...      0.997591       0.002409
6            0.00  0.00  0.00  ...      0.997116       0.002884
7            0.00  0.00  0.00  ...      0.997156       0.002844
(total:8)    0.00  0.00  0.00  ...      0.997482       0.002518
  • When rocksdb_bloom_filter_bits_per_key = 24 and rocksdb_format_version = 5
>>> app_stat -a test_v5_b24
[app_stat]
pidx          GET  MGET   PUT  ...  point_n_rate  point_fp_rate
0            0.00  0.00  0.00  ...      0.999933       0.000067
1            0.00  0.00  0.00  ...      0.999944       0.000056
2            0.00  0.00  0.00  ...      0.999965       0.000035
3            0.00  0.00  0.00  ...      0.999920       0.000080
4            0.00  0.00  0.00  ...      0.999944       0.000056
5            0.00  0.00  0.00  ...      0.999955       0.000045
6            0.00  0.00  0.00  ...      0.999944       0.000056
7            0.00  0.00  0.00  ...      0.999965       0.000035
(total:8)    0.00  0.00  0.00  ...      0.999946       0.000054
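The (total:8) rows of the three outputs can be compared directly. The short computation below uses only the point_fp_rate totals copied from the app_stat outputs and reports how many times lower each rate is than the bits_per_key = 10, format_version = 2 baseline.

```python
# point_fp_rate from the "(total:8)" rows of the three app_stat outputs.
TOTAL_FP_RATE = {
    "bits_per_key=10, format_version=2": 0.010670,
    "bits_per_key=24, format_version=2": 0.002518,
    "bits_per_key=24, format_version=5": 0.000054,
}
BASELINE = "bits_per_key=10, format_version=2"


def reduction_vs_baseline(rates: dict, baseline: str) -> dict:
    """How many times lower each rate is than the baseline rate."""
    base = rates[baseline]
    return {name: base / rate for name, rate in rates.items()}


for name, factor in reduction_vs_baseline(TOTAL_FP_RATE, BASELINE).items():
    print(f"{name}: {factor:.1f}x lower than baseline")
```

On these totals, raising bits_per_key from 10 to 24 lowers point_fp_rate about 4.2 times, and switching format_version from 2 to 5 at 24 bits per key lowers it about 47 times more, roughly 198 times in total.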

Code changes

  • Has exported function/method change
    No
  • Has exported variable/fields change
    No
  • Has interface methods change
    No
  • Has persistent data change
    Yes

Side effects

  • Possible performance regression
    No
  • Increased code complexity
    No
  • Breaking backward compatibility
  1. The new version of Pegasus can read Bloom filters normally when rocksdb_format_version is 2 or 5.
  2. Older versions of Pegasus can read Bloom filters normally ONLY when rocksdb_format_version is 2.
  3. Older versions of Pegasus see the format_version = 5 filter structure as corrupt filter data and read the table as if it had no filter.
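The three compatibility notes can be encoded as a small lookup, for example for an upgrade checklist. This is a hypothetical helper that covers only format_version 2 and 5, the two values the notes mention; `reader_has_this_change` stands for whether the reading Pegasus version includes this PR.

```python
def bloom_filter_read_behavior(format_version: int, reader_has_this_change: bool) -> str:
    """Describe how a Pegasus replica treats the Bloom filter of an SST file
    written with `format_version` (hypothetical helper based on the notes)."""
    if format_version not in (2, 5):
        raise ValueError("only format_version 2 and 5 are covered by the notes")
    if reader_has_this_change or format_version == 2:
        return "filter read normally"
    # Older Pegasus + format_version 5: the new filter looks corrupt,
    # so the table is read as if it had no filter.
    return "table read without filter"


for fmt in (2, 5):
    for new_reader in (True, False):
        who = "new Pegasus" if new_reader else "old Pegasus"
        print(f"format_version={fmt}, {who}: {bloom_filter_read_behavior(fmt, new_reader)}")
```

One reading of note 3 is that format_version = 5 pays off only once every replica that may read those SST files runs a Pegasus version with this change; an older replica still reads the data but without the filter.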

Related changes

  • Need to cherry-pick to the release branch
    Yes
  • Need to update the documentation
    Yes
  • Need to be included in the release note
    Yes

acelyc111 force-pushed the bf_config branch 2 times, most recently from 31a80f4 to cc54c2c, April 25, 2020 14:50
acelyc111 marked this pull request as ready for review April 26, 2020 05:46
neverchanje changed the title from feat(bloom filter): Support more configurable items for bloom filter to feat(rocksdb): Support more configurable items for bloom filter Apr 26, 2020
levy5307 merged commit 221ac43 into apache:master Apr 26, 2020
neverchanje mentioned this pull request May 14, 2020
neverchanje added the type/config-change label (Added or modified configuration that should be noted on release note of new version.) May 14, 2020
neverchanje mentioned this pull request Jun 10, 2020
Labels: type/config-change, v2.0.0