Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Informer caching and event propagation #4077

Closed
shawkins opened this issue Apr 20, 2022 · 7 comments
Closed

Informer caching and event propagation #4077

shawkins opened this issue Apr 20, 2022 · 7 comments

Comments

@shawkins
Copy link
Contributor

There is one more feature that we support, is to not propagate the event that was a result of our update. Like if I update an ingress for example in the reconciler and I have an informer registered for this ingress. There we will not propagate event for this update by the informer ( InformerEventSource). So no additional reconciliation will happen on our own update. This would make sense probably to put into informer, the problem is that it is quite complex code, and also from the users perspective.

We can go into details in an other issue maybe. If there is a consensus that this might be useful in general.

Originally posted by @csviri in #3943 (comment)

@shawkins
Copy link
Contributor Author

Splitting off a new issue now that the pr has been resolved.

Taking a step back, my thought with the ReducedStateItemStore, is that application assumes the responsibility for maintaining a resource cache - referred to in other places as a fronting cache. For example modifying the operator sdk TemporaryResourceCache and keeping with the assumption of optimistically locked updates we have:

public class ResourceCache<T extends HasMetadata> implements ResourceEventHandler<T> {

  private final Map<ResourceID, T> cache = // some fancy cache
  private final ItemStore<T> store;

  ...

  public synchronized void removeResourceFromCache(T resource) {
    cache.remove(ResourceID.fromResource(resource));
  }

  public synchronized void putAddedResource(T newResource) {
    putUpdatedResource(newResource, null);
  }

  public synchronized void putUpdatedResource(T newResource, String previousResourceVersion) {
    var resourceId = ResourceID.fromResource(newResource);

    T current = cache.get(resourceId);
    boolean inCache = true;
    if (current == null) {
      current = store.get(toInformerId(resourceId));
      inCache = false;
    }    
    if (current == null) {
      if (previousResourceVersion == null) {
        cache.put(resourceId, newResource);
      } // else already removed
    } else if (!inCache || current.getMetadata().getResourceVersion().equals(previousResourceVersion)) {
      cache.put(resourceId, newResource);
    }  
  }

  public synchronized Optional<T> getResourceFromCache(ResourceID resourceID) {
    return Optional.ofNullable(cache.get(resourceID));
  }

  @Override
  public void onAdd(T obj) {
    putAddedResource(obj);
  }

  @Override
  public void onUpdate(T oldObj, T newObj) {
    putUpdatedResource(newObj, oldObj.getMetadata().getResourceVersion());
  }

  @Override
  public void onDelete(T obj, boolean deletedFinalStateUnknown) {
    removeResourceFromCache(obj);
  }
}

That I believe should maintain a customizable Temporary/Temporal cache - without modifying the store. But it doesn't address the concern raised in the issue - that you don't want to not consume local events.

If you try to update the store - with some additional assumptions, such as not using the informer for indexing, resync being disabled or at least sync events being filtered, and adding new ItemStore/ReducedStateItemStore methods you'd have:

  public synchronized void removeResourceFromCache(T resource, boolean fromEvent) {
    boolean remove = true;
    if (!fromEvent) {
      remove = store.remove(resource); // would need to be a new method that checks the resourceVersion
    }
    if (remove) {
      cache.remove(ResourceID.fromResource(resource));
    }
  }

  public synchronized void putAddedResource(T newResource, boolean fromEvent) {
    putUpdatedResource(newResource, null, fromEvent);
  }

  public synchronized void putUpdatedResource(T newResource, String previousResourceVersion, boolean fromEvent) {
    var resourceId = ResourceID.fromResource(newResource);
    boolean put = true;
    if (!fromEvent) {
      put = store.put(newResource, previousResourceVersion));
    }
    if (put) {
      cache.put(resourceId, newResource);
    }
  }

  @Override
  public void onAdd(T obj) {
    putAddedResource(obj, true);
  }

  @Override
  public void onUpdate(T oldObj, T newObj) {
    putUpdatedResource(newObj, oldObj.getMetadata().getResourceVersion(), true);
  }

  @Override
  public void onDelete(T obj, boolean deletedFinalStateUnknown) {
    removeResourceFromCache(obj, true);
  }

If you directly call putUpdatedResource(resource, version, false) and that successfully modifies the store, then no event will be emitted when the informer consumes the update event from the api server as it will appear to be a sync event (no change in resource version).

However this introduces the problem that if the application moves ahead by more than 1 update, then the event processing will not be correct - that is:

putUpdatedResource(resource, version, false);
// get resource from cache and modify it, before the watch update from the first update is processed
putUpdatedResource(resource, nextVersion, false);
// the first watch event is now processed and will emit an invalid update event

So either you need to further prevent that case, or I think you'd have to move application level event handling into the ResourceCache construct - such that it knows to make updates based upon cache state. However that seems inelegant given that it duplicates event processing that is already in the informer.

@lburgazzoli @metacosm @csviri is this inline with everyone's thoughts? And what if anything would be good to add in 6.0 to aid in your usage of the informer?

@csviri
Copy link
Contributor

csviri commented Apr 22, 2022

@shawkins so regarding this, as I mentioned in the comment:

This would make sense probably to put into informer, the problem is that it is quite complex code, and also from the users perspective.

The current implementation is basically based on a quite complex code (unfortunately), but this was the only way I found that is correct (not claiming there is not simpler solution but all simpler variations I tried had some flaws).

And the current solution mainly makes sense for us because we have a higher level abstraction, called Dependent Resources . So all the necessary call made for the InformerEventSource API can be handled in the reconciliation logic of this abstraction (so in one place).

The idea of the current solution is that if we know there will be a create or update operation we notify InformerEventSource about it. It basically stops propagating the events, just put's the new events into a temporal cache (this is other cache that solves the issues to see always the last updated resource) called EventRecorder:

https://github.com/java-operator-sdk/java-operator-sdk/blob/d9ebacef4de2cb316f45fbb557e1d29a84d35f4d/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/dependent/kubernetes/KubernetesDependentResource.java#L96-L97

Note the prepareEventFiltering part.

After the update is done, we again call the InformerEventSource to notify it about that update is done and the let it know about the new version of the resource. The event recording stops at this moment, and it takes a look what events were received meanwhile. Based on that it's not hard to determine if an event should be propagated or not - in other words if we received some additional event that is the result of our update or not:

https://github.com/java-operator-sdk/java-operator-sdk/blob/d9ebacef4de2cb316f45fbb557e1d29a84d35f4d/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerEventSource.java#L213-L249

(It event more complex because it has to play well with the TemporaryResourceCache from previous issue.)

So this is not ideal, and was thinking if adding this code (with such complexity) is "worth it", but at the end it solved this problem in the issue. We kinda considering this experimental, it is tested fairly in detail with unit and integration tests, and no issues with it until now. We can dive deeper if needed.

@stale
Copy link

stale bot commented Jul 22, 2022

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

@shawkins
Copy link
Contributor Author

Removed the never stale to see if we can move this towards a resolution. It's not clear to me yet what belongs down in the fabric8 client for this. One minor thing I do see in the go client is the notion of a filtering event handler. That looks to be a pretty straightforward class that has tests for the old and new object to see if the event should be passed along.

@csviri
Copy link
Contributor

csviri commented May 2, 2023

One minor thing I do see in the go client is the notion of a filtering event handler

there are filters for such events in JOSDK on top of Informer in fabric8. But, might be interesting to put it into fabric8 directly.

We can discuss the event propagation filtering on the weekly meeting. But might be just stick with this inside JOSDK.

@shawkins
Copy link
Contributor Author

shawkins commented May 2, 2023

But, might be interesting to put it into fabric8 directly.

I can also see that it would be pretty straight-forward to have event handler that omits locally generated events, but I'm not sure how common place that need is or if that would seen as needing to be coupled with cache mutations. For that there is also a mutation_cache construct in the go client, but it's based upon an assumption about resourceVersion format/ordering - that puts the onus on the user to understand that might not always work. As far as I can tell there's no way around that with cache mutations - you have to be able to tell what's latest to be consistent.

@stale
Copy link

stale bot commented Jul 31, 2023

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants