protect LeaseTimeToLive with RBAC #15656

Merged (2 commits) on May 2, 2023

Conversation

mitake (Contributor) commented Apr 6, 2023

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

Currently, the LeaseTimeToLive() API and the etcdctl lease timetolive command don't require any RBAC permission. However, their responses can include the names of the keys attached to the target lease. The values of those keys cannot be accessed through this API, but it is better to protect the key names as well. So this PR makes the API require read permission on the keys attached to a lease (if a lease isn't attached to any keys, the behavior doesn't change).
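From the client side, the intended behavior looks roughly like the following sketch using clientv3 (the endpoint, user, and lease ID are hypothetical; a user without read permission on the attached keys should get a permission-denied error when it asks for the key list):

package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect as a user that has NO read permission on the keys attached to
	// the lease (user name and endpoint are placeholders).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		Username:    "limited-user",
		Password:    "password",
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Assume a privileged user already created this lease and attached keys
	// to it; the ID here is a placeholder.
	leaseID := clientv3.LeaseID(0x1234)

	// Requesting the attached key names now requires read permission on them,
	// so this call is expected to fail with a permission-denied error.
	_, err = cli.TimeToLive(context.TODO(), leaseID, clientv3.WithAttachedKeys())
	fmt.Println(err)
}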

cc @ahrtr @serathius @spzala @ptabor

mitake changed the title from "protect LeaseTimeToLive with RBAC" to "WIP: protect LeaseTimeToLive with RBAC" on Apr 6, 2023
mitake force-pushed the lease-timetolive-auth branch 3 times, most recently from 67ba663 to 009ba5d on April 6, 2023 23:36
mitake changed the title from "WIP: protect LeaseTimeToLive with RBAC" to "protect LeaseTimeToLive with RBAC" on Apr 6, 2023
Review thread on tests/e2e/ctl_v3_auth_test.go (outdated, resolved)
mitake force-pushed the lease-timetolive-auth branch 4 times, most recently from ce757b1 to f18d257 on April 7, 2023 11:28
ahrtr (Member) commented Apr 23, 2023

@mitake Overall looks good to me, and sorry for the late response. Could you rebase this PR, even though there is no conflict?

mitake (Contributor, Author) commented Apr 24, 2023

@ahrtr Thanks, I rebased on the latest main. Could you check?

serathius (Member) commented:
Can we measure the performance impact when someone has enabled auth?

I think the authorization check for all other etcd methods can be done in constant time. Here it is linear in the number of keys attached to the lease. It would be good to measure the impact for a higher number of keys per lease. For example, K8s (which doesn't use auth, but just as an example) has up to 1000 keys per lease.

Comment on lines +333 to +340
// Look up the lease and require read permission on every key attached to it,
// since the response can expose those key names.
l := s.lessor.Lookup(leaseID)
if l != nil {
	for _, key := range l.Keys() {
		// Single-key read permission check; an error here means the caller
		// is not allowed to see this key name.
		if err := s.AuthStore().IsRangePermitted(authInfo, []byte(key), []byte{}); err != nil {
			return err, 0
		}
	}
}
Member:
Before iterating over all the keys, I think we also need to check LeaseTimeToLiveRequest.Keys; if it's false, it means the user doesn't want to query the keys at all, so there is no need to check the permission on the server side either.

We also need to add a test to cover such a case: the client has no range permission on a key, but it intentionally sets LeaseTimeToLiveRequest.Keys to false, so it should not get an error response.
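A rough sketch of the short-circuit (here r stands for the incoming *pb.LeaseTimeToLiveRequest; the other names follow the snippet above, and the exact signature of the surrounding function may differ from the final change):

// If the client did not ask for the attached key names (r.Keys is false),
// the response will not expose them, so the per-key permission checks can
// be skipped entirely.
if !r.Keys {
	return nil, 0
}

l := s.lessor.Lookup(leaseID)
if l != nil {
	for _, key := range l.Keys() {
		if err := s.AuthStore().IsRangePermitted(authInfo, []byte(key), []byte{}); err != nil {
			return err, 0
		}
	}
}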

mitake (Contributor, Author):

Thanks for your comment, I'll update tomorrow.

mitake (Contributor, Author):

Updated, could you check @ahrtr?

mitake (Contributor, Author) commented Apr 25, 2023

@serathius I measured the overhead with a simple benchmark that emulates 1000 keys/lease on my local laptop (just to compare the difference with and without the checking overhead). The results were as follows (unit: microseconds):

auth enabled:

  • 1st run: min: 50.000000, median: 116.000000, 99 percentile: 539.000000
  • 2nd run: min: 52.000000, median: 117.000000, 99 percentile: 617.000000
  • 3rd run: min: 48.000000, median: 115.000000, 99 percentile: 686.000000

auth disabled:

  • 1st run: min: 49.000000, median: 111.500000, 99 percentile: 700.000000
  • 2nd run: min: 47.000000, median: 112.000000, 99 percentile: 642.000000
  • 3rd run: min: 48.000000, median: 111.000000, 99 percentile: 666.000000

It's a noisy environment, so the 99th percentile may not be very informative. The workload is quite artificial (e.g. the cache hit rate should be quite high), but I think the overhead is low enough. What do you think?

serathius (Member) commented:
Looks good. For next time, I recommend looking into benchmarking best practices (https://about.sourcegraph.com/blog/go/gophercon-2019-optimizing-go-code-without-a-blindfold) and using benchstat (https://pkg.go.dev/golang.org/x/perf/cmd/benchstat) to account for the noisy environment and confirm that the results are statistically significant.
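For example (a hypothetical sketch, not the benchmark used in this thread; the endpoint and lease ID are placeholders), a standard Go benchmark run repeatedly and compared with benchstat avoids eyeballing noisy numbers:

package bench

import (
	"context"
	"testing"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// BenchmarkLeaseTimeToLive measures LeaseTimeToLive latency against a running
// etcd endpoint; a lease with many attached keys is assumed to exist already.
func BenchmarkLeaseTimeToLive(b *testing.B) {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		b.Fatal(err)
	}
	defer cli.Close()

	leaseID := clientv3.LeaseID(0x1234) // placeholder

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := cli.TimeToLive(context.TODO(), leaseID, clientv3.WithAttachedKeys()); err != nil {
			b.Fatal(err)
		}
	}
}

Running it with go test -bench=LeaseTimeToLive -count=10 before and after the change, and feeding the two outputs to benchstat, reports whether the difference is statistically significant.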

serathius (Member) commented:
auth enabled:

  • 1st run: min: 50.000000, median: 116.000000, 99 percentile: 539.000000
  • 2nd run: min: 52.000000, median: 117.000000, 99 percentile: 617.000000
  • 3rd run: min: 48.000000, median: 115.000000, 99 percentile: 686.000000

auth disabled:

  • 1st run: min: 49.000000, median: 111.500000, 99 percentile: 700.000000
  • 2nd run: min: 47.000000, median: 112.000000, 99 percentile: 642.000000
  • 3rd run: min: 48.000000, median: 111.000000, 99 percentile: 666.000000

One observation: shouldn't we expect the auth-enabled runs to take longer, since they are the ones that check the keys?

mitake (Contributor, Author) commented Apr 26, 2023

@serathius Sorry, I noticed that the benchmark wasn't attaching the lease correctly :( I ran it again and could observe the overhead:

auth enabled:

  • 1st run: min: 583.000000, median: 705.000000, 99 percentile: 3411.000000
  • 2nd run: min: 592.000000, median: 703.000000, 99 percentile: 3535.000000
  • 3rd run: min: 594.000000, median: 707.000000, 99 percentile: 3659.000000

auth disabled:

  • 1st run: min: 53.000000, median: 118.000000, 99 percentile: 708.000000
  • 2nd run: min: 53.000000, median: 119.000000, 99 percentile: 693.000000
  • 3rd run: min: 52.000000, median: 118.000000, 99 percentile: 631.000000

(checking 1000 permissions in a few microseconds is clearly impossible, even with a high cache hit rate...)
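For reference, a rough sketch of this kind of measurement (the actual benchmark code was not posted in this thread; the endpoint, credentials, and key names are assumptions):

package main

import (
	"context"
	"fmt"
	"sort"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		Username:    "root",
		Password:    "password",
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Emulate 1000 keys attached to a single lease.
	lease, err := cli.Grant(context.TODO(), 3600)
	if err != nil {
		panic(err)
	}
	for i := 0; i < 1000; i++ {
		if _, err := cli.Put(context.TODO(), fmt.Sprintf("key-%04d", i), "v",
			clientv3.WithLease(lease.ID)); err != nil {
			panic(err)
		}
	}

	// Measure LeaseTimeToLive latency with the attached keys requested.
	const n = 1000
	latencies := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		start := time.Now()
		if _, err := cli.TimeToLive(context.TODO(), lease.ID, clientv3.WithAttachedKeys()); err != nil {
			panic(err)
		}
		latencies = append(latencies, time.Since(start))
	}
	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	fmt.Printf("min: %v, median: %v, 99th percentile: %v\n",
		latencies[0], latencies[n/2], latencies[n*99/100])
}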

What do you think?

Looks good. For next time, I recommend looking into benchmarking best practices (https://about.sourcegraph.com/blog/go/gophercon-2019-optimizing-go-code-without-a-blindfold) and using benchstat (https://pkg.go.dev/golang.org/x/perf/cmd/benchstat) to account for the noisy environment and confirm that the results are statistically significant.

Thanks for sharing, I'll check these materials.

mitake and others added 2 commits April 26, 2023 20:35
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
ahrtr (Member) left a comment

LGTM

Thanks @mitake

spzala (Member) left a comment

Thanks @mitake
