Skip to content

Commit

Permalink
Address Wojciech's comments
Browse files Browse the repository at this point in the history
  • Loading branch information
chelseychen committed May 26, 2020
1 parent 5deaa6d commit 2df240d
Show file tree
Hide file tree
Showing 2 changed files with 25 additions and 19 deletions.
40 changes: 23 additions & 17 deletions keps/sig-instrumentation/383-new-event-api-ga-graduation/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -475,9 +475,8 @@ _This section must be completed when targeting alpha to a release._
- Will enabling / disabling the feature require downtime of the control
plane?

(1) Yes, enabling API requires to restart apiserver.

(2) No, enabling the use of the API doesn't require that.
Yes, enabling / disabling API requires to restart apiserver as well as
the components using that.

- Will enabling / disabling the feature require downtime or reprovisioning
of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
Expand Down Expand Up @@ -534,6 +533,10 @@ _This section must be completed when targeting beta graduation to a release._

* **What specific metrics should inform a rollback?**

[apiserver_request_total](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go#L66):
If the value is much higher or lower than expected, it might be due to bugs
in the library.

* **Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested?**
Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
Expand Down Expand Up @@ -618,16 +621,16 @@ previous answers based on experience in the field._

In the new EventRecorder, every 30 minutes a "heartbeat" call will be performed
to update Event status and prevent garbage collection in etcd. This heartbeat
is happening for events that are happening all the time (If an event didn't
is happening for events that are happening periodically (If an event didn't
happen for 6 minutes, it will be GC-ed).

* **Will enabling / using this feature result in introducing new API types?**

Yes, a new API type "eventsv1.Event" is being introduced.
The migration of Event API will cause creation of new types of Event objects.
The number of Event objects depends on cluster state, which theoretically
won't be too large due to deduplication logic and reasonable-cardinality
of objects in the system.
No, the new Events will be roundtripped to use the same representation as the
old Events in Etcd. The number of Event objects depends on cluster state and
its churn. Event deduplication and reasonable cardinality of the fields should
keep their number within reasonable boundaries (obviously dependent on cluster
size).

* **Will enabling / using this feature result in any new calls to cloud
provider?**
Expand All @@ -639,11 +642,14 @@ previous answers based on experience in the field._
Describe them providing:

The difference in size of the Event object comes from new Action and Related
fields. We can safely estimate the increase to be smaller than 30%. We'll
also emit additional Event per Pod creation, as currently Events for that
are being deduplicated. There are currently at least 6 Events emitted when
Pod is started, so impact of this change can be bounded by 20%. This means
that in the worst case the increase in Event size can be bounded by 56%.
fields. We can safely estimate the increase to be smaller than 30%. However,
more events may be emitted. For example, new Events will be emitted for Pod
creation done by standard controllers (e.g. ReplicaSet), as they are currently
deduplicated across all 'owner' objects. However, given that that are at least
5 other events being emitted during pod startup, the impact for it can be
bounded by 20%. In total, we estimated that increase in total size of all
Events can be conservatively bounded by around 50%, but practical boundary
should be much smaller.

* **Will enabling / using this feature result in increasing time taken by any
operations covered by [existing SLIs/SLOs][]?**
Expand All @@ -653,8 +659,8 @@ previous answers based on experience in the field._
* **Will enabling / using this feature result in non-negligible increase of
resource usage (CPU, RAM, disk, IO, ...) in any components?**

The potential increase of Event size might cause non-negligible storage
increase in Etcd.
The potential increase of Event size might cause non-negligible increase of
storage in Etcd, network bandwidth to send them, and CPU to process them.

### Troubleshooting

Expand All @@ -666,7 +672,7 @@ _This section must be completed when targeting beta graduation to a release._

* **How does this feature react if the API server and/or etcd is unavailable?**

The Events will be dropped if API server or etcd is unavailable,
The Events will be dropped if API server or etcd is unavailable.

* **What are other known failure modes?**

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,10 @@ reviewers:
- "@yastij"
- "@wojtekt"
approvers:
- "@wojtekt"
- "@wojtekt" # PRR Approver
- "@brancz"
creation-date: 2019-01-31
last-updated: 2020-05-19
last-updated: 2020-05-26
status: implementable
see-also:
replaces:
Expand Down

0 comments on commit 2df240d

Please sign in to comment.