Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API for Bookie checksum verification writeFlags #2182

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

karanmehta93
Copy link
Member

@karanmehta93 karanmehta93 commented Oct 17, 2019

Descriptions of the changes in this PR:

Motivation

Reserving the bits from WriteFlag so that clients can send the checksum information to the bookies

Changes

Updated WriteFlag class
Bit no 2-4 (from the right) indicates checksum type.
0 --> No verification enabled
1 --> DIGEST_TYPE_CRC32
2 --> DIGEST_TYPE_MAC
3 -->DIGEST_TYPE_CRC32C

Bit no 0 (from the right) is currently being used for DEFERRED_SYNC option.

Master Issue: #2174

@karanmehta93
Copy link
Member Author

@eolivelli @reddycharan Check this out.

DEFERRED_SYNC(0x1 << 0);
DEFERRED_SYNC(0x1 << 0),
/**
* Bits 1,2,3 - value 0.
Copy link
Contributor

@reddycharan reddycharan Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. For digesttype we need 3 bits, instead of taking Bits 1, 2, 3 I would suggest you to take Bits 2, 3, 4. This gives some space for SYNC types (REGULAR_SYNC and DEFERRED_SYNC). Currently it might need just 2 values (0/1) but it is better to be safe to leave space for expansion.

  2. For reserving bits for digesttype, we should declare something like
    static final int DIGEST_TYPE_BITS = 0x7 << 2; as static int value. By doing so we are declaring that we need three bits and it would be bits 2, 3, 4.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. That's a great point, will update it.
  2. I am not sure what the benefit will be. Since this would require code changes, someone may still mess it up. Currently, we don't throw exceptions in getWriteFlags() method, silently ignoring unexpected values, otherwise I could add up some tests that would have failed if someone tried to re-use them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we have to use this static final int DIGEST_TYPE_BITS = 0x7 << 2; in getWriteFlags() method. Basically to know the digestType we should do logical and with DIGEST_TYPE_BITS first and then do the logical and with the corresponding enum value. Basically we are making it clear in getWriteFlags method that these bits are considered to be related to DigestType.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I did very lame mistake with bit manipulation here. 🤦‍♂️
With this logic, everything will become DigestType.DUMMY. I will add a test to cover this.

return ONLY_DEFERRED_SYNC;
writeFlags.add(DEFERRED_SYNC);
}
if ((flagValue & DIGEST_TYPE_DUMMY.value) == DIGEST_TYPE_DUMMY.value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw what if in this 'flagValue' int multiple digesttypes are enabled? How do we handle this error scenario?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot have multiple digestTypes being represented by a single int value. The server initializes the digestType based on the first ever flag value received and doesn't allow it to get updated (unless it was disabled earlier and then enabled).

We would essentially rely on client to ensure that it doesn't get corrupted. The writeFlags would be automatically set based on digestType. If the client re-opens a writeHandle again (for example, replication), then those bits would be automatically set based on metadata.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to define more constants, one for each supported digest type?
The writer already knows the current digest type.
We can just have only one single enum value.
We are not constrained to map the API 1-1 to the wire protocol.

Side note: we cannot merge this change until the work on the bookie side is not ready and committed.
We can release a bookie with some unused features but once we commit to the public client API the feature must be available on the server side.

@karanmehta93
Copy link
Member Author

Why do we have to define more constants, one for each supported digest type? The writer already knows the current digest type.

How would server know the digestType if we just have a single enum to represent them all. We would then need bit manipulation logic at other places to decode those bits to determine the type. This is helping consolidate some of it.

Side note: we cannot merge this change until the work on the bookie side is not ready and committed. We can release a bookie with some unused features but once we commit to the public client API the feature must be available on the server side.

That is fine, I wanted to put this PR out to get some feedback and also reserve the bits from writeFlags.

@eolivelli
Copy link
Contributor

eolivelli commented Oct 22, 2019

On the client API we should only have one WriteFlag CHECKSUM_VERIFICATION, as the LedgerHandle already knows the digest type.

On the wire we can still send more detail to the bookie, still using the 'flags' field

Another question: MAC needs a password, how will you handle it? We could say that MAC is not supported.

@karanmehta93
Copy link
Member Author

karanmehta93 commented Oct 22, 2019

On the client API we should only have one WriteFlag CHECKSUM_VERIFICATION, as the LedgerHandle already knows the digest type.

I guess that comment is coming based on the fact that the WriteFlag enum is based in org.apache.bookkeeper.client.api package. However the server also uses it from the same place. DEFERRED_SYNC flag has been implemented the same way. The server needs to know the information of digestType and I don't think that parsing raw integer value outside of this enum is a good idea.

That being said, I wouldn't ask the app developer to add the corresponding flag on their own. Rather, it would get added automatically as part of LedgerHandle. Does that seem fine?

Another question: MAC needs a password, how will you handle it? We could say that MAC is not supported.

We are sending ledger key as part of the request too, hence we can use it.

@karanmehta93
Copy link
Member Author

@eolivelli Any comments here?
I don't want to merge this now, just get an informal approval.

@eolivelli
Copy link
Contributor

the MAC part is okay.

Still I think we must have only one enum value on the WriteFlags enum.
We can change the server side and do use that class directly, we are free to decouple the wire protocol handling from the API.

@karanmehta93
Copy link
Member Author

So if I understand this correctly, we should copy over a writeFlag class to server side, have only 2 flags at the client level.
Since the server uses getWriteFlags() method to get a list of flags, we would still need those enum types on the server that map to the exact digest types. Another option is that I split over this method, and create separate methods that return different portions of write flag, boolean in case of DEFERRED_SYNC and an integer for digestType. Is that something you would like to do?

@karanmehta93
Copy link
Member Author

@eolivelli @sijie Thoughts?

@eolivelli
Copy link
Contributor

Sorry for late reply, I missed the notification.

Yes, on the client API we need only two flags.

For the server I like option 2, that is to create two methods: one for DEFERRED_SYNC and one for getting the DigestType

@eolivelli
Copy link
Contributor

@karanmehta93
How are you doing with this change ?

@karanmehta93
Copy link
Member Author

@eolivelli Currently I am not working, busy with some internal stuff.
Will update the PR when I get some time.

@eolivelli
Copy link
Contributor

@karanmehta93 @reddycharan @jvrao
what's the status of this work ?

@karanmehta93
Copy link
Member Author

@eolivelli Apologies for late reply.

I tried re-factoring the code but it's seems like a bigger effort.
We ended up implementing the code internally in the same way as this PR.

@eolivelli
Copy link
Contributor

I hope you will be able to port back your change to Apache Repo,
we had a time with several forks and it was hard to merge them to a common form

@karanmehta93
Copy link
Member Author

karanmehta93 commented Feb 13, 2020

Are you fine with the approach I have in this PR? @eolivelli

If yes, I can bring the other patches in too.
If we want to modify the public API for WriteFlags, I am afraid that is a bigger work and I wont be able to take it up. Your approach is definitely cleaner, I agree on that.

@eolivelli
Copy link
Contributor

eolivelli commented Feb 14, 2020

@karanmehta93
I would like to see a better API for the client.
I am ok on the wire protocol side.

We should see opinions for others.
I could accept this patch and let you finish your port but then I would like to clean up the final API before a new release, I can help if you want.

I don't want to see new diverging forks, especially about wire protocol

@sijie @fpj @ivankelly @jvrao

@StevenLuMT
Copy link
Member

fix old workflow,please see #3455 for detail

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants