Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blockstore: Allow fallback for AddressSignature index() #34440

Merged
merged 1 commit into from
Dec 14, 2023

Conversation

steviez
Copy link
Contributor

@steviez steviez commented Dec 13, 2023

Problem

A change landed somewhat recently in master that changed the key format of the transaction metadata columns. A compatibility backport was introduced to allow a blockstore that had been populated with this newer version to still be readable by v1.17 (backwards software compat).

However, there was an oversight in the backport. Namely, the index() function for AddressSignatures column did a regular unwrap() in try_current_index(). try_current_index() can fail if a key with an unknown size is encountered. This would be exactly the case for encountering a key that was populated by the newer software version with the different key format.

Namely, the following panic was observed when downgrading from recent tip of master to v1.17 with --enable-rpc-transaction-history flag:

thread '<unnamed>' panicked at ledger/src/blockstore_db.rs:878:88:
called `Result::unwrap()` on an `Err` value: UnpackError
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
   1: core::panicking::panic_fmt
             at ./rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at ./rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/result.rs:1652:5
   3: rocksdb::compaction_filter::filter_callback
   4: _ZNK26rocksdb_compactionfilter_t6FilterEiRKN7rocksdb5SliceES3_PNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPb
   5: _ZN7rocksdb18CompactionIterator20InvokeFilterIfNeededEPbPNS_5SliceE
   6: _ZN7rocksdb18CompactionIterator13NextFromInputEv
   7: _ZN7rocksdb18CompactionIterator11SeekToFirstEv
   8: _ZN7rocksdb13CompactionJob25ProcessKeyValueCompactionEPNS_18SubcompactionStateE
   9: _ZN7rocksdb13CompactionJob3RunEv
  10: _ZN7rocksdb6DBImpl20BackgroundCompactionEPbPNS_10JobContextEPNS_9LogBufferEPNS0_19PrepickedCompactionENS_3Env8PriorityE
  11: _ZN7rocksdb6DBImpl24BackgroundCallCompactionEPNS0_19PrepickedCompactionENS_3Env8PriorityE
  12: _ZN7rocksdb6DBImpl16BGWorkCompactionEPv
  13: _ZN7rocksdb14ThreadPoolImpl4Impl8BGThreadEm
  14: _ZN7rocksdb14ThreadPoolImpl4Impl15BGThreadWrapperEPv
  15: <unknown>
  16: start_thread
  17: clone
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2023-12-11T20:53:23.073363622Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="?" one=1i message="panicked at ledger/src/blockstore_db.rs:878:88:
    called `Result::unwrap()` on an `Err` value: UnpackError" location="ledger/src/blockstore_db.rs:878:88" version="1.17.9 (src:daf37308; feat:1428472342, client:SolanaLabs)"

And here is the line for v1.17.9 mentioned in the panic message:

<columns::AddressSignatures as ColumnIndexDeprecation>::try_current_index(key).unwrap()

Summary of Changes

So, use .unwrap_or_else() in the index() implementation for AddressSignatures; this will now be consistent with the implementation of index() for TransactionStatus column.

A change landed somewhat recently in master that changed the key format
of the transaction metadata columns. A compatibility backport was
introduced to allow a blockstore that had been populated with this newer
version to still be readable by v1.17 (backwards software compat).

However, there was an oversight in the backport. Namely, the index()
function for AddressSignatures column did a regular unwrap() on the
try_current_index() result. try_current_index() can fail if a key with
an unknown size is encountered. This would be exactly the case for
encountering a key that was populated by the newer software version with
the different key format.

So, use .unwrap_or_else() in the index() implementation for
AddressSignatures; this will now be consistent with the implementation
of index() for TransactionStatus column.
@steviez
Copy link
Contributor Author

steviez commented Dec 13, 2023

@CriesofCarrots - I'm going to try to manually reproduce this panic, but just throwing on your radar incase you want to mull on it as well.

One related item that I was thinking about... By returning Self::as_index(0) in the else case of .unwrap_or_else(), this item will be marked for purge as the slot will 0 for that default. So, this item will get cleaned even if it is not necessarily the oldest item.

For v1.17, I think that is fine given that v1.17 can't read the key/value pair anyways. But, some potential weirdness if someone is toggling back and forth between v1.17 and v1.18 (master). I think this is fine because that shouldn't be the common use case, but mentioning for completeness. I think the only way to avoid that would be backporting a lot more code to v1.17 in order to be able to read the new columns, which is probably not the way we want to go.

@steviez steviez changed the title blockstore: Allow fallback for AddressSignature index blockstore: Allow fallback for AddressSignature index() Dec 13, 2023
Copy link

codecov bot commented Dec 13, 2023

Codecov Report

Merging #34440 (663fff7) into v1.17 (928e384) will decrease coverage by 0.1%.
The diff coverage is 50.0%.

Additional details and impacted files
@@            Coverage Diff            @@
##            v1.17   #34440     +/-   ##
=========================================
- Coverage    81.8%    81.8%   -0.1%     
=========================================
  Files         803      803             
  Lines      218166   218167      +1     
=========================================
- Hits       178515   178504     -11     
- Misses      39651    39663     +12     

@CriesofCarrots
Copy link
Contributor

Ah nuts, I can't believe we (I) messed this up for AddressSignatures 🤕

So, this item will get cleaned even if it is not necessarily the oldest item.

I also tend to think this is fine, although if we care about this potential data loss, we could extend the v1.17 version of the ColumnIndexDeprecation trait, I guess.

@steviez
Copy link
Contributor Author

steviez commented Dec 13, 2023

Ah nuts, I can't believe we (I) messed this up for AddressSignatures 🤕

Emphasis on "we"; I missed this too 😢

I was able to reproduce the panic right after the Blockstore had opened with v1.17.10 tag:

[2023-12-13T22:33:25.132740705Z INFO  solana_validator] solana-validator 1.17.10 (src:00000000; feat:1428472342, client:SolanaLabs)
...
[2023-12-13T22:33:27.380597537Z INFO  solana_ledger::blockstore] Opening database at "/home/sol/ledger/rocksdb"
[2023-12-13T22:33:40.399092527Z INFO  solana_ledger::blockstore] "/home/sol/ledger/rocksdb" open took 13.0s
thread '<unnamed>' panicked at ledger/src/blockstore_db.rs:878:88:
[2023-12-13T22:33:49.546532801Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="?" one=1i message="panicked at ledger/src/blockstore_db.rs:878:88:
    called `Result::unwrap()` on an `Err` value: UnpackError" location="ledger/src/blockstore_db.rs:878:88" version="1.17.10 (src:00000000; feat:1428472342, client:SolanaLabs)"

And then after restart with my patch applied, confirmed that the node was able to restart:

[2023-12-13T22:35:33.841230173Z INFO  solana_validator] solana-validator 1.17.10 (src:00000000; feat:1428472342, client:SolanaLabs)
...
[2023-12-13T22:40:18.110862380Z INFO  solana_metrics::metrics] datapoint: validator-new id="83TTSHyHsrM9XHNSVAy5C3uXwifAxXMa1GwrHCVRNv77" version="1.17.10 (src:00000000; feat:1428472342, client:SolanaLabs)" cluster_type=1i

I also tend to think this is fine, although if we care about this potential data loss, we could extend the v1.17 version of the ColumnIndexDeprecation trait, I guess.

Cool, think we're on the same page here with "do nothing further" than this fix

@steviez steviez marked this pull request as ready for review December 13, 2023 22:42
Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this got into CI before the new v1.17 was cut 😅

@steviez
Copy link
Contributor Author

steviez commented Dec 13, 2023

I guess this got into CI before the new v1.17 was cut 😅

Ha, good point. Think I should let the v1.17 version update land before merging ?

@CriesofCarrots
Copy link
Contributor

Think I should let the v1.17 version update land before merging ?

Up to you; I don't think it really matters much. I do sometimes scan through commit lists and use those bump commits to determine what's in the previous release, but that's obv an imperfect technique already :)
I don't see a version-bump PR created yet, though...

@steviez
Copy link
Contributor Author

steviez commented Dec 13, 2023

Up to you; I don't think it really matters much. I do sometimes scan through commit lists and use those bump commits to determine what's in the previous release, but that's obv an imperfect technique already :) I don't see a version-bump PR created yet, though...

I do the same, so I'll hold off until my manual bump goes through in #34451

@steviez
Copy link
Contributor Author

steviez commented Dec 14, 2023

It'd probably be more proper to rebase this and re-run CI, but I think it should be fine and I will clean up the mess if something goes wrong

@steviez steviez merged commit 4d131b0 into solana-labs:v1.17 Dec 14, 2023
32 checks passed
@steviez steviez deleted the bstore_v117_compaction_panic branch December 14, 2023 01:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants