Informer caching and event propagation #4077

shawkins · 2022-04-20T15:09:20Z

There is one more feature that we support, is to not propagate the event that was a result of our update. Like if I update an ingress for example in the reconciler and I have an informer registered for this ingress. There we will not propagate event for this update by the informer ( InformerEventSource). So no additional reconciliation will happen on our own update. This would make sense probably to put into informer, the problem is that it is quite complex code, and also from the users perspective.

We can go into details in an other issue maybe. If there is a consensus that this might be useful in general.

Originally posted by @csviri in #3943 (comment)

shawkins · 2022-04-20T19:48:12Z

Splitting off a new issue now that the pr has been resolved.

Taking a step back, my thought with the ReducedStateItemStore, is that application assumes the responsibility for maintaining a resource cache - referred to in other places as a fronting cache. For example modifying the operator sdk TemporaryResourceCache and keeping with the assumption of optimistically locked updates we have:

public class ResourceCache<T extends HasMetadata> implements ResourceEventHandler<T> {

  private final Map<ResourceID, T> cache = // some fancy cache
  private final ItemStore<T> store;

  ...

  public synchronized void removeResourceFromCache(T resource) {
    cache.remove(ResourceID.fromResource(resource));
  }

  public synchronized void putAddedResource(T newResource) {
    putUpdatedResource(newResource, null);
  }

  public synchronized void putUpdatedResource(T newResource, String previousResourceVersion) {
    var resourceId = ResourceID.fromResource(newResource);

    T current = cache.get(resourceId);
    boolean inCache = true;
    if (current == null) {
      current = store.get(toInformerId(resourceId));
      inCache = false;
    }    
    if (current == null) {
      if (previousResourceVersion == null) {
        cache.put(resourceId, newResource);
      } // else already removed
    } else if (!inCache || current.getMetadata().getResourceVersion().equals(previousResourceVersion)) {
      cache.put(resourceId, newResource);
    }  
  }

  public synchronized Optional<T> getResourceFromCache(ResourceID resourceID) {
    return Optional.ofNullable(cache.get(resourceID));
  }

  @Override
  public void onAdd(T obj) {
    putAddedResource(obj);
  }

  @Override
  public void onUpdate(T oldObj, T newObj) {
    putUpdatedResource(newObj, oldObj.getMetadata().getResourceVersion());
  }

  @Override
  public void onDelete(T obj, boolean deletedFinalStateUnknown) {
    removeResourceFromCache(obj);
  }
}

That I believe should maintain a customizable Temporary/Temporal cache - without modifying the store. But it doesn't address the concern raised in the issue - that you don't want to not consume local events.

If you try to update the store - with some additional assumptions, such as not using the informer for indexing, resync being disabled or at least sync events being filtered, and adding new ItemStore/ReducedStateItemStore methods you'd have:

  public synchronized void removeResourceFromCache(T resource, boolean fromEvent) {
    boolean remove = true;
    if (!fromEvent) {
      remove = store.remove(resource); // would need to be a new method that checks the resourceVersion
    }
    if (remove) {
      cache.remove(ResourceID.fromResource(resource));
    }
  }

  public synchronized void putAddedResource(T newResource, boolean fromEvent) {
    putUpdatedResource(newResource, null, fromEvent);
  }

  public synchronized void putUpdatedResource(T newResource, String previousResourceVersion, boolean fromEvent) {
    var resourceId = ResourceID.fromResource(newResource);
    boolean put = true;
    if (!fromEvent) {
      put = store.put(newResource, previousResourceVersion));
    }
    if (put) {
      cache.put(resourceId, newResource);
    }
  }

  @Override
  public void onAdd(T obj) {
    putAddedResource(obj, true);
  }

  @Override
  public void onUpdate(T oldObj, T newObj) {
    putUpdatedResource(newObj, oldObj.getMetadata().getResourceVersion(), true);
  }

  @Override
  public void onDelete(T obj, boolean deletedFinalStateUnknown) {
    removeResourceFromCache(obj, true);
  }

If you directly call putUpdatedResource(resource, version, false) and that successfully modifies the store, then no event will be emitted when the informer consumes the update event from the api server as it will appear to be a sync event (no change in resource version).

However this introduces the problem that if the application moves ahead by more than 1 update, then the event processing will not be correct - that is:

putUpdatedResource(resource, version, false);
// get resource from cache and modify it, before the watch update from the first update is processed
putUpdatedResource(resource, nextVersion, false);
// the first watch event is now processed and will emit an invalid update event

So either you need to further prevent that case, or I think you'd have to move application level event handling into the ResourceCache construct - such that it knows to make updates based upon cache state. However that seems inelegant given that it duplicates event processing that is already in the informer.

@lburgazzoli @metacosm @csviri is this inline with everyone's thoughts? And what if anything would be good to add in 6.0 to aid in your usage of the informer?

csviri · 2022-04-22T14:32:56Z

@shawkins so regarding this, as I mentioned in the comment:

This would make sense probably to put into informer, the problem is that it is quite complex code, and also from the users perspective.

The current implementation is basically based on a quite complex code (unfortunately), but this was the only way I found that is correct (not claiming there is not simpler solution but all simpler variations I tried had some flaws).

And the current solution mainly makes sense for us because we have a higher level abstraction, called Dependent Resources . So all the necessary call made for the InformerEventSource API can be handled in the reconciliation logic of this abstraction (so in one place).

The idea of the current solution is that if we know there will be a create or update operation we notify InformerEventSource about it. It basically stops propagating the events, just put's the new events into a temporal cache (this is other cache that solves the issues to see always the last updated resource) called EventRecorder:

https://github.com/java-operator-sdk/java-operator-sdk/blob/d9ebacef4de2cb316f45fbb557e1d29a84d35f4d/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/dependent/kubernetes/KubernetesDependentResource.java#L96-L97

Note the prepareEventFiltering part.

After the update is done, we again call the InformerEventSource to notify it about that update is done and the let it know about the new version of the resource. The event recording stops at this moment, and it takes a look what events were received meanwhile. Based on that it's not hard to determine if an event should be propagated or not - in other words if we received some additional event that is the result of our update or not:

https://github.com/java-operator-sdk/java-operator-sdk/blob/d9ebacef4de2cb316f45fbb557e1d29a84d35f4d/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerEventSource.java#L213-L249

(It event more complex because it has to play well with the TemporaryResourceCache from previous issue.)

So this is not ideal, and was thinking if adding this code (with such complexity) is "worth it", but at the end it solved this problem in the issue. We kinda considering this experimental, it is tested fairly in detail with unit and integration tests, and no issues with it until now. We can dive deeper if needed.

stale · 2022-07-22T02:48:32Z

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

shawkins · 2023-04-29T13:10:54Z

Removed the never stale to see if we can move this towards a resolution. It's not clear to me yet what belongs down in the fabric8 client for this. One minor thing I do see in the go client is the notion of a filtering event handler. That looks to be a pretty straightforward class that has tests for the old and new object to see if the event should be passed along.

csviri · 2023-05-02T08:24:25Z

One minor thing I do see in the go client is the notion of a filtering event handler

there are filters for such events in JOSDK on top of Informer in fabric8. But, might be interesting to put it into fabric8 directly.

We can discuss the event propagation filtering on the weekly meeting. But might be just stick with this inside JOSDK.

shawkins · 2023-05-02T15:59:47Z

But, might be interesting to put it into fabric8 directly.

I can also see that it would be pretty straight-forward to have event handler that omits locally generated events, but I'm not sure how common place that need is or if that would seen as needing to be coupled with cache mutations. For that there is also a mutation_cache construct in the go client, but it's based upon an assumption about resourceVersion format/ordering - that puts the onus on the user to understand that might not always work. As far as I can tell there's no way around that with cache mutations - you have to be able to tell what's latest to be consistent.

stale · 2023-07-31T21:37:47Z

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

stale bot added the status/stale label Jul 22, 2022

shawkins added status/never-stale and removed status/stale labels Jul 22, 2022

shawkins removed the status/never-stale label Apr 29, 2023

stale bot added the status/stale label Jul 31, 2023

shawkins mentioned this issue Aug 3, 2023

converts the ingress logic to a conditional dependent resource keycloak/keycloak#22221

Merged

stale bot closed this as completed Aug 8, 2023

shawkins mentioned this issue Aug 8, 2023

draft of changes to simplify the recording mechanism operator-framework/java-operator-sdk#2005

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Informer caching and event propagation #4077

Informer caching and event propagation #4077

shawkins commented Apr 20, 2022

shawkins commented Apr 20, 2022

csviri commented Apr 22, 2022 •

edited

Loading

stale bot commented Jul 22, 2022

shawkins commented Apr 29, 2023

csviri commented May 2, 2023

shawkins commented May 2, 2023

stale bot commented Jul 31, 2023

Informer caching and event propagation #4077

Informer caching and event propagation #4077

Comments

shawkins commented Apr 20, 2022

shawkins commented Apr 20, 2022

csviri commented Apr 22, 2022 • edited Loading

stale bot commented Jul 22, 2022

shawkins commented Apr 29, 2023

csviri commented May 2, 2023

shawkins commented May 2, 2023

stale bot commented Jul 31, 2023

csviri commented Apr 22, 2022 •

edited

Loading