Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cobra style surgery clear-page-elements command #417

Merged
merged 8 commits into from
Mar 28, 2023
Merged

Add cobra style surgery clear-page-elements command #417

merged 8 commits into from
Mar 28, 2023

Conversation

ahrtr
Copy link
Member

@ahrtr ahrtr commented Mar 10, 2023

Linked to #370. Please read #370 (comment)

It can be used to fix the corrupted db file in #402.

@ahrtr ahrtr marked this pull request as draft March 10, 2023 08:50
@ahrtr ahrtr marked this pull request as ready for review March 11, 2023 07:06
@ahrtr ahrtr added this to the v1.4.0 milestone Mar 11, 2023
@ahrtr
Copy link
Member Author

ahrtr commented Mar 11, 2023

Added a couple of test cases, ready for review.

Steps to fix the corrupted db file in #402 are as below.

$ ./bbolt surgery clear-page-elements ./local-kv.db --output ./new1.db --pageId 29 --from 5 --to 6

$ ./bbolt surgery clear-page-elements ./new1.db  --output ./new2.db --pageId 29 --from 7 --to 8

$ ./bbolt surgery clear-page-elements ./new2.db --output ./new3.db --pageId 29 --from 9 --to 10

$ ./bbolt check ./new3.db 
OK

@ahrtr
Copy link
Member Author

ahrtr commented Mar 15, 2023

@ptabor could you please let me finish this PR? This should be the last surgery series command. At least, no plan to add more in near future.

@ahrtr
Copy link
Member Author

ahrtr commented Mar 17, 2023

cc @tjungblu and @cenkalti to take a look as well. Please let me know if you need clarification.

@cenkalti
Copy link
Member

cenkalti commented Mar 17, 2023

@ahrtr Does it make more sense this command to take keys instead of integer indices? Couple of questions come into my mind when I see integers:

  • Do they start from 0 or 1?
  • If I want to clear items at 5 and 10 and clear 5 first do the items shift? Do I have to clear index 9 instead of 10 next?

If it takes list of keys it'll be easier to integrate with another CLI tools.
Example:

$ find-corrupt-keys | xargs bbolt surgery clear-page-elements ./new1.db  --output ./new2.db

cmd/bbolt/surgery_command_cobra.go Outdated Show resolved Hide resolved
cmd/bbolt/root.go Outdated Show resolved Hide resolved
cmd/bbolt/main.go Outdated Show resolved Hide resolved
cmd/bbolt/surgery_command_cobra.go Outdated Show resolved Hide resolved
cmd/bbolt/surgery_command_cobra.go Outdated Show resolved Hide resolved
cmd/bbolt/surgery_command_cobra.go Outdated Show resolved Hide resolved
cmd/bbolt/surgery_command_cobra_test.go Outdated Show resolved Hide resolved
cmd/bbolt/root.go Outdated Show resolved Hide resolved
cmd/bbolt/surgery_command_cobra.go Outdated Show resolved Hide resolved
@cenkalti
Copy link
Member

Sorry if I look obsessed with the naming but from the clear word, I understand that the command will scrub the contents of the element. If this is not intended, I suggest renaming the command to delete-page-elements.

@ahrtr
Copy link
Member Author

ahrtr commented Mar 18, 2023

Resolved all comments, thx. PTAL. cc @ptabor and @tjungblu to take a look as well.

@ahrtr Does it make more sense this command to take keys instead of integer indices? Couple of questions come into my mind when I see integers:

* Do they start from 0 or 1?

* If I want to clear items at 5 and 10 and clear 5 first do the items shift? Do I have to clear index 9 instead of 10 next?

If it takes list of keys it'll be easier to integrate with another CLI tools. Example:

$ find-corrupt-keys | xargs bbolt surgery clear-page-elements ./new1.db  --output ./new2.db

I tend to keep using indices. The keys are not always visible / printable strings. It also needs deep understanding on both boltDB and business data to perform the surgery commands. Users must do very deep analysis before performing the surgery operations.

The indices start from 0. I updated the flag usages.

$ ./bbolt surgery clear-page-elements -h
Clears elements from the given page

Usage:
  bbolt surgery clear-page-elements [options] [flags]

Flags:
      --from int        start element index (included) to clear, starting from 0
  -h, --help            help for clear-page-elements
      --output string   path to the target db file
      --pageId uint     page id
      --to int          end element index (excluded) to clear, starting from 0, -1 means to the end of page

Sorry if I look obsessed with the naming but from the clear word, I understand that the command will scrub the contents of the element. If this is not intended, I suggest renaming the command to delete-page-elements.

The naming is discussed & agreed between me and @ptabor . We have two commands: clear-page and clear-page-elements. clear-page pageId is just a syntactic sugar of clear-page-elements pageId --from 0 --to -1. clear-page clears all elements in the given page, but we do not remove the page from the tree structure for simplicity for now. It shouldn't a problem, because bbolt may automatically combine it with it's neighbor page when needed (e.g. after deleting elements). clear-page-elements is a little different, it indeed deletes the elements. We want to keep the naming consistent, so I tent to keep using clear-page and clear-page-elements.

cmd/bbolt/main.go Outdated Show resolved Hide resolved
cmd/bbolt/main.go Outdated Show resolved Hide resolved
@tjungblu
Copy link
Contributor

lgtm (non-binding). Also happy to help to move everything over to Cobra.

Shall we add some "warranty" warning to the help text? I don't expect average users to go and fix their bbolt files and know what they do with the surgery commands.

For higher level commands like proposed earlier:

find-corrupt-keys | xargs bbolt surgery clear-page-elements

it kinda makes sense, but then we could directly offer a "bbolt repair" command?

@ahrtr
Copy link
Member Author

ahrtr commented Mar 20, 2023

Shall we add some "warranty" warning to the help text? I don't expect average users to go and fix their bbolt files and know what they do with the surgery commands.

Makes sense to me. We can enhance it later & separately.

it kinda makes sense, but then we could directly offer a "bbolt repair" command?

We can discuss this separately. Please also see my comment #417 (comment).

@ahrtr
Copy link
Member Author

ahrtr commented Mar 20, 2023

Thanks @tjungblu

@cenkalti do you have additional comment?

@cenkalti
Copy link
Member

@ahrtr No

cmd/bbolt/main.go Outdated Show resolved Hide resolved
cmd/bbolt/main.go Outdated Show resolved Hide resolved
internal/surgeon/surgeon.go Outdated Show resolved Hide resolved
p.SetCount(uint16(start))
p.SetOverflow(0)
if preOverflow != 0 || p.IsBranchPage() {
if err := clearFreelist(path); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting philosophical tradeoff.
I'm on the side, that surgery by default should be really minimal.

  • I assume that 'leaking' (i.e. getting unreferenced) pages is not critical for correctness.
    -The cost of running etcd post freelist truncation might be significant.

So I would:

  • add a flag (to the method) whether to this method whether freelist should be truncated.
  • If the flag is false, the method should log Warning that following number of pages were potentially 'abandoned'.
  • Have separate command to 'surgery abandon-freelist'
  • [optional] Have a flag surgery clear-page-elements ... --abandon-freelist that might trigger both together. But as usage of this tools should be infrequent, I wouldn't optimize that aspect of user's experience.

Copy link
Member Author

@ahrtr ahrtr Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to do something on the freelist, otherwise some pages are neither reachable nor in the freelist, eventually it can't pass the bbolt check.

  • either cleanup the freelist, so bbolt will scan the whole db to read all the freelist again. The bad side is reduce of performance on next startup. The good side is the implementation is simple.
  • Or update its parent page, all the way to the root if needed. Obviously it complicates the implementation.
$ ./bbolt page ./db 36
Page ID:    36
Page Type:  branch
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 30

"0000": <pgid=4>
"0067": <pgid=2>
"0134": <pgid=3>
"10001": <pgid=9>
"10066": <pgid=10>
"10131": <pgid=5>
"10196": <pgid=6>
"20061": <pgid=7>
"20126": <pgid=11>
"20191": <pgid=12>
"30056": <pgid=13>
"30121": <pgid=8>
"30186": <pgid=15>
"40051": <pgid=16>
"40116": <pgid=14>
"40181": <pgid=18>
"50046": <pgid=19>
"50111": <pgid=17>
"50176": <pgid=21>
"60041": <pgid=22>
"60106": <pgid=20>
"60171": <pgid=24>
"70036": <pgid=25>
"70101": <pgid=23>
"70166": <pgid=27>
"80031": <pgid=28>
"80096": <pgid=26>
"80161": <pgid=30>
"90026": <pgid=31>
"90091": <pgid=32>

$ ./bbolt surgery clear-page-elements ./db --output ./new.db --pageId 36 --from 2 --to 3
All elements in [2, 3) in page 36 were cleared

$ ./bbolt check ./new.db 
page 3: unreachable unfreed
1 errors found
invalid value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or update its parent page, all the way to the root if needed. Obviously it complicates the implementation.
I didn't get how it would help. I assume we are loosing OVERFLOW

$ ./bbolt surgery clear-page-elements ./db --output ./new.db --pageId 36 --from 2 --to 3
WARNING: The clearing has abandoned 1 page that is not yet referenced from free list. 
Consider `./bbolt surgery abandon-freelist ...`.
All elements with indexes in [2, 3) in page 36 were cleared

$./bbolt surgery abandon-freelist ./new.db --output ./new.db2

$ ./bbolt check ./new.db2
check OK

Copy link
Member Author

@ahrtr ahrtr Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or update its parent page, all the way to the root if needed. Obviously it complicates the implementation.
I didn't get how it would help. I assume we are loosing OVERFLOW

Sorry for the confusion. I wanted to say: If it's clearing branch page items, then we need to traverse all sub pages referenced by the cleared items, so as to figure out the pages which should be added into the freelist.

$ ./bbolt surgery clear-page-elements ./db --output ./new.db --pageId 36 --from 2 --to 3
WARNING: The clearing has abandoned 1 page that is not yet referenced from free list.
Consider ./bbolt surgery abandon-freelist ....
All elements with indexes in [2, 3) in page 36 were cleared

If I understand it correctly, you are suggesting two things:

  1. Add one more command bbolt surgery abandon-freelist;
  2. Users must manually & explicitly execute the surgery abandon-freelist; we shouldn't do it automatically for users?

Generally I agree with this. Just with two minor comments:

  1. The page number may not be 1, so warning message will be something like "WARNING: The clearing has abandoned some page(s) that are not yet referenced from free list.";
  2. If no need to abandon the freelist, then do not print the warning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Let me finish the bbolt surgery abandon-freelist in a separate following PR.

Clear a branch element,

$ ./bbolt surgery clear-page-elements ./local-kv.db --output new.db --pageId 34 --from-index 1 --to-index 2
WARNING: The clearing has abandoned some pages that are not yet referenced from free list.
Please consider executing `./bbolt surgery abandon-freelist ...`
All elements in [1, 2) in page 34 were cleared

clear a leaf element,

$ ./bbolt surgery clear-page-elements ./local-kv.db --output new1.db --pageId 37 --from-index 1 --to-index 2
All elements in [1, 2) in page 37 were cleared

Copy link
Member Author

@ahrtr ahrtr Mar 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ptabor PTAL, thx

I rebased this PR, but please only take a look at the last commit, in which I addressed your latest comments.

internal/surgeon/surgeon.go Outdated Show resolved Hide resolved
cmd/bbolt/command_surgery_cobra_test.go Outdated Show resolved Hide resolved
internal/surgeon/surgeon.go Outdated Show resolved Hide resolved
cmd/bbolt/command_surgery_cobra.go Outdated Show resolved Hide resolved
cmd/bbolt/command_surgery_cobra.go Show resolved Hide resolved
cmd/bbolt/command_surgery_cobra_test.go Outdated Show resolved Hide resolved
@ahrtr
Copy link
Member Author

ahrtr commented Mar 22, 2023

@ptabor Resolved all your comments (thanks). Could you take another look?

@ahrtr
Copy link
Member Author

ahrtr commented Mar 24, 2023

ping @ptabor @serathius

Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
…ould have the same content

Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
…page

Signed-off-by: Benjamin Wang <wachao@vmware.com>
Also resolved a bug related to overflow page.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
…ndon freelist

Signed-off-by: Benjamin Wang <wachao@vmware.com>
@ahrtr
Copy link
Member Author

ahrtr commented Mar 28, 2023

Thank you @ptabor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants