-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EaR: Refactor encryption header std::variant serializer and versioning #9345
Conversation
a4f74dc
to
a6f7d6c
Compare
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Result of foundationdb-pr-macos-m1 on macOS Monterey 12.x
|
Result of foundationdb-pr-macos on macOS Monterey 12.x
|
Doxense CI Report for Windows 10
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
a6f7d6c
to
e9b4d89
Compare
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Doxense CI Report for Windows 10
|
Result of foundationdb-pr on Linux CentOS 7
|
e9b4d89
to
c0c6eb6
Compare
Result of foundationdb-pr-clang on Linux CentOS 7
|
@@ -256,104 +261,139 @@ struct BlobCipherEncryptHeaderFlagsV1 { | |||
// 'encrypted buffer', compared to reading only encryptionHeader and ensuring its sanity; for instance: | |||
// backup-files. | |||
|
|||
template <uint32_t AuthTokenSize> | |||
struct AesCtrWithAuthV1 { | |||
template <class Params> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like the idea of passing class Params
than adding more parameters to the template if needed in the future.
However, am not sure if following is a good idea:
- Defining
AesCtrWithAuth
replacing version specific header, for instance:AesCtrWithAuthV1
. - Pulling header and flags version from main header into component headers.
I initially had the same implementation and then moved away from it for the following reasons:
- For instance, a header is defined today as follows:
struct AV1 { int a, int b, foo y; }
The usage of the code today is done as follows:
if ( struct_version == 1) { // parse and extract information from AV1 struct }
Now if the header in future gets involved (elements added and/or removed, we shall define
struct A2 {int a; int x; foo f; bar b; }
Now the usage in the code for backward compatibility will become
if (version == 1) { A1 a1 = <do necessary assignment> } else if (version == 2) { A2 a2 = <do necessary assignment> }
Compared to the proposed scheme it might end up looking like
A a; if ( version == 1) { a.<field1> = ... a.<field2> = ... } else (version == 2) { a.<field3> = ... }
I feel code #1 seems much better to read and handle (lesser accidental failures, as compiler helps catching bugs while assignment is done etc.); it may look verbose, but, then more simplistic and given header evolution might not be that common (yes we might change header, but, not super frequently I hope).
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still have both ways to evolve the schema: (1) introduce a new algo type, and (2) evolve an existing algo type. If the new schema diverge substantially from any existing algo header, then we shall do (1). If the new schema share enough common parts with one of the existing header, then a pattern like
a.a = ...;
if (version == 1) {
a.b = ...;
} else if (version == 2) {
a.x = ...;
a.y = ...;
}
is not a bad pattern, in that it allow sharing code with clarity. Does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that it allow sharing code with clarity
Am not so convinced about the clarity; in some ways am worried about the if-else
code clutter that needs to happen on per-field basis in extreme case. Otoh, explicit naming of struct with VersionNumber seems more clear and readable, especially given it is a on-disk structure, more explicit the code, easier for long term maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases the sharing part could be non-trivial, say if the two version share most of the fields and differ in just one field, and for the other fields we want to have a lot of validation logic for them, we will not want to repeat those logics.
that said, with this patch, you can still introduce a AesCtrWithAuthV2
making it one of the algoHeader
variant and rename the exiting AesCtrWithAuth
to AesCtrWithAuthV1
, or maybe given them a better naming than V1/V2. Say the new type can be AesCtrWithAuthAndSomeNewFeature
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases the sharing part could be non-trivial, say if the two version share most of the fields and differ in just one field, and for the other fields we want to have a lot of validation logic for them, we will not want to repeat those logics.
I think we can easily abstract it out in a common header and include the fields; also, this is even used in the current design, for instance: BlobCipherDetails
is used in both algorithm header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In reality both the version are ugly
, but, yes for more flexibility they are what they are. Am not sure I would say one is cleaner than other, as in near future one can easily evolve it to:
std::variant<AesCtr, AesGcm, TripleDES, RSA, BlowFish, TwoFish, RSA,....>
So even if we have one variation of the header, the list would look the same lengthy. We can easily abstract it out by using a using XXX = std::variant<xxx>
if the goal is beautification.
Again, my concern here is version contained in the component headers, requires client code to be written with extra care, one mistake of not
setting a required field for that particular version is a recipe for disaster. The compilation would be all fine, but, functionally it won't be. However, with explicit VersionHeader scheme, first, it makes the code more explicit while reading, one can abstract it out to define explicit functions and/or classes to handle different versions and maybe better asserts (if possible), plus, simple memory assignment would fail to compile to whatever its worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw with regard to the suggestion below:
struct AesCtrWithAuth { uint8_t version; union { AesCtrWithAuthV1 v1; AesCtrWithAuthV2 v2; } template <class Ar> void serialize(Ar& ar) { serializer(ar, version); switch (version) { case 1: serializer(ar, v1); break; case 2: serializer(ar, v2); break; } } };
Just to clarify I like the proposal above, however, all I meant is the client code still would have to do the same if-else
based on the header version, but, am advocate of header evolution happening in a more explicit manner.
All up for doing it, if you like the idea as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
std::variant<AesCtr, AesGcm, TripleDES, RSA, BlowFish, TwoFish, RSA,....>
is messy, then
std::variant<AesCtrV1, AesCtrV2, AesCtrV3, AesGcmV1, AesGcmV2, AesGcmV3, TripleDESV1, TripleDESV2, TripleDESV3, RSAV1, RSAV2, RSAV3, BlowFishV1, BlowFishV2, TwoFishV3, TwoFishV2, TwoFishV3, RSAV1, RSAV2, RSAV3,....>
is hell.. someone come here looking at algoHeader
probably want to figure out what are the list of supported algorithms, and not for their versioning. We are deliberately mixing two layers of abstraction.
Again, my concern here is version contained in the component headers, requires client code to be written with extra care...
If you want I can make the change I proposed in #9345 (comment), adding back the V1 header, but with it nested inside the non-versioned algorithm struct, which should address your concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we just crossed messages, read my reply at #9345 (comment). I think we are on the same page here. Great discussion, thanks for doing this, Yi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all I meant is the client code still would have to do the same if-else based on the header version
well, with that we can still enforce the "V1" client code to only deal with the V1 sub-headers and so on, achieving what you suggested, that the V1 client code don't deal with any versioning.
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Doxense CI Report for Windows 10
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
c0c6eb6
to
0383d25
Compare
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
CI failures are irrelevant. PR ready for review. |
Doxense CI Report for Windows 10
|
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr-macos on macOS Monterey 12.x
|
Result of foundationdb-pr on Linux CentOS 7
|
Doxense CI Report for Windows 10
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks for doing it.
Please rebase with current main
, I think some recent changes might needs updates as well.
Result of foundationdb-pr-clang-ide on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
@@ -40,6 +40,7 @@ | |||
#include "flow/Trace.h" | |||
#include "flow/UnitTest.h" | |||
#include "flow/xxhash.h" | |||
#include "include/fdbclient/BlobCipher.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: duplicate include?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix in separate PR.
@@ -202,6 +205,15 @@ struct hash<BlobCipherDetails> { | |||
}; | |||
} // namespace std | |||
|
|||
struct EncryptHeaderCipherDetails { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am confused: this should have been already present on main
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just being move from line 334 to here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Couple of nits though.
Doxense CI Report for Windows 10
|
I'm merging it too fast. will address final comments in separate PR. |
* main: (50 commits) Client status report API in Java and python bindings (apple#9366) EaR: Update encryption methods to make 'cipherHeaderKey' optional (apple#9378) EaR: Refactor encryption header std::variant serializer and versioning (apple#9345) Fix arm nightly tests by skipping "until" restarting tests if no old binaries (apple#9362) Backup Mutation Log Separates Tenant Map Modifications During Restore (apple#9292) Remove storageEngineExcludeTypes from blob tests EaR: fix BlobCipher.h build failure Fix one more test toml spec Include missing tenants in the restore output if their state is already an error state Fix a few minor restore bugs and add a dry-run mode. Some improvements to the fdbcli output. Fix restarting to 7.2 tests for sharded rocks Save shard_encode_location_metadata knob value for restarting tests fix anyExisted when beginTenant==endTenant Enable RocksDB restarting tests Fix some merge issues and review comments Fix some merge issues Fix get mapped range test assertion to account for the possibility of a range terminating early when it reaches the end of a shard Fix merge issue Remove unnecessary try/catch Make a few minor fixes, refactor some code for clarity, and improve throughput of repopulating a management cluster ...
Changes:
std::variant
. Serialize size is 1 byte (the type index, i.e.std::variant::index()
), plus the serialize size of the actual type stored in thestd::variant
. UpdateBlobCipherEncryptHeaderRef
to use thestd::variant
binary serializerflagsVersion
andalgoHeaderVersion
fromBlobCipherEncryptHeaderRef
. The former is replaced byflags.index() + 1
, and the latter is moved into each of the algorithm-specific sub-headers. Each sub-header types will have nesting version-specific subtypes to handle serialization of that specific version (e.g. forAesCtrNoAuth
it has aAesCtrNoAuthV1
subtype).Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch
ormain
if this is the youngest branch)