doc: v3api rfc #2675
Conversation
raft_term = 0x1,
kvs = {
{
key = foo,
odd spacing
Force-pushed from 0e5783c to 44182a0
optional uint64 raft_term = 5;
}

message RangeRequest {
Is the plan for the store to implement MVCC based on index or the version of the key? If so, would a range request be able to filter on a maximum index? I didn't know whether the underlying store would include MVCC, or whether that would be the level above the store (imposed as the last key segment by the etcd simple api).
Range query is done atomically. The underlying store is MVCC, so it is low cost.
So then adding an optional int64 max_index to this struct would not break the underlying model and also allow you to do range queries where you get the latest resource up to index X?
The use case I'm asking about is being able to do consistent paged range queries:
- get initial range: min_key=start limit=100 max_index=0
- read the index Y and max key Z from the initial range response
- get next range: min_key=Z max_index=Y limit=100
- repeat
@smarterclayton
Let us be more clear here.
foo = bar [index 1]
foo = bar00 [index 2]
range query all -> foo = bar00; range query (all, max_index=1) -> nothing
foo1 = bar [index 3]
range query all -> foo = bar00; foo1 = bar; range query (all, max_index=2) -> foo = bar00
Is this what you want?
We are not going to keep all versions in the key value space.
Instead, you can get the old versions only by scanning the history.
You can range scan the history via watch.
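Under this reading of the example (my interpretation, not the proposed API; `KV` and `rangeAll` are illustrative names), max_index filters the *current* keys by their last modification index:

```go
package main

import "fmt"

// KV holds a current value plus the index at which it was last modified.
// Only current versions live in the key space; old versions are in history.
type KV struct {
	Key, Value string
	ModIndex   int64
}

// rangeAll returns the current keys whose last modification index is
// <= maxIndex; maxIndex == 0 means no filter. Illustrative only.
func rangeAll(store []KV, maxIndex int64) []KV {
	var out []KV
	for _, kv := range store {
		if maxIndex == 0 || kv.ModIndex <= maxIndex {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	// foo = bar [index 1]; foo = bar00 [index 2]; foo1 = bar [index 3]
	store := []KV{{"foo", "bar00", 2}, {"foo1", "bar", 3}}
	fmt.Println(len(rangeAll(store, 1))) // 0: foo was rewritten after index 1
	fmt.Println(len(rangeAll(store, 0))) // 2: no filter, everything current
}
```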
@smarterclayton All the old versions of keys can be retrieved from the history store, which is sorted in logical time order.
On Apr 14, 2015, at 12:03 PM, Xiang Li notifications@github.com wrote:
@smarterclayton The current API does not directly support consistent range over multiple requests. You can build your own via range + history.
Do you have a real use case? What are you trying to solve?
For Kubernetes, there are advantages to being able to do ranged snapshot queries to process data in chunks (instead of loading 100k keys in memory at once, make 100 1K range requests). We can do that via buckets in the keyspace, but the larger appeal of range was the consistent chunk aspect.
That being said, it does not break our use cases if that's not available. I believe in order to emulate that we would need delete tombstones in the key space and would have to track the highest modified index in the scan. Do you anticipate/expect there to be delete tombstones in the keyspace?
Do you anticipate/expect there to be delete tombstones in the keyspace?
The tombstones are in the event history. Do a range query with max_index. Fill in the gaps via another range query over the history from the max_index to now.
Does that make sense to you?
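One way this gap-filling could work (a sketch under my assumptions; `Event`, `snapshotAt`, and the tombstone flag are illustrative, not the proposed wire format): take the range result pinned at max_index, then replay history events newer than max_index, applying puts and tombstone deletes.

```go
package main

import "fmt"

// Event is one entry in the history: a put or a delete tombstone.
type Event struct {
	Index     int64
	Key, Val  string
	Tombstone bool
}

// snapshotAt builds a consistent current view: start from a range query
// pinned at maxIndex, then replay history events in (maxIndex, now]
// to fill the gaps that the pinned range filtered out.
func snapshotAt(pinned map[string]string, history []Event, maxIndex int64) map[string]string {
	out := map[string]string{}
	for k, v := range pinned {
		out[k] = v
	}
	for _, ev := range history {
		if ev.Index <= maxIndex {
			continue // already covered by the pinned range result
		}
		if ev.Tombstone {
			delete(out, ev.Key)
		} else {
			out[ev.Key] = ev.Val
		}
	}
	return out
}

func main() {
	pinned := map[string]string{"foo": "bar"} // range query with max_index=1
	history := []Event{
		{Index: 2, Key: "foo", Val: "bar00"},
		{Index: 3, Key: "baz", Val: "qux"},
		{Index: 4, Key: "foo", Tombstone: true},
	}
	now := snapshotAt(pinned, history, 1)
	fmt.Println(len(now), now["baz"]) // foo deleted by tombstone, baz added
}
```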
We can do the consistent query too. But it would make the API a little bit complicated. For each range query, we will return you a range ID and a timeout. Within the timeout, you can get consistent snapshot.
On Apr 14, 2015, at 12:21 PM, Xiang Li notifications@github.com wrote:
> Do you anticipate/expect there to be delete tombstones in the keyspace? The tombstones are in the event history. Do a range query with max_index. Fill in the gaps via another range query over the history from the max_index to now.
> Does that make sense to you?

Yes, I think so.

> We can do the consistent query too. But it would make the API a little bit complicated. For each range query, we will return you a range ID and a timeout. Within the timeout, you can get consistent snapshot.

I don't want to overcomplicate the store or API, certainly. I think consistent paged range queries are valuable on their own to a wide range of users, as long as it does not break the core consistency guarantees of the store. But I may also be bringing my own assumptions to the discussion. If it is easy to emulate above, or can lead to complex problems on failover, it doesn't need to be in the API.
@smarterclayton Sure. We will keep that in mind. We will have another iteration on the API soon.
@barakmich Can you take a look? Want to know your opinion on this.
- easy for people to write simple etcd application

## Protobuf Defined API
Split into a .proto file for further comment (and ultimately copy!)
Force-pushed from 59573f7 to 41017dc
// LeaseTnx likes Tnx. It does one additional thing that all the keys put into the
// store are attached with the given lease. This is rpc is useful when you want to
// put a key and attach it to a lease atomically.
rpc LeaseTnx(LeaseTnxRequest) returns (LeaseTnxResponse) {}
Should this be separate or be part of the generic Tnx?
i will explore more. i am not a big fan of this api either.
hmm... after thinking about this for a while, i think we should still let the lease know about the tnx (a key space operation) instead of letting the key operation know about the lease.
rpc LeaseKeepAlive(stream LeaseKeepAliveRequest) returns (stream LeaseKeepAliveResponse) {}
}

message RequestHeader {
Unused?
optional ResponseHeader header = 1;
}

message DeleteRangeRequest {
Is this part of RequestUnion?
@barakmich I fixed up the proto. Now it compiles. I am going to squash and merge this.
Looks like I just missed the merge window for this, but I do think #2857 (non-recursive Create/PUT behavior) would be a useful feature.
@schmichael In v3, there is no dir concept. So there is no recursive Create/PUT actually...
Can I store binary objects as values? If yes, is there a C binding which supports this API?
This is a real Request For Comments thing.
All feedback is welcome! Questions are also welcome!
For #2647