
fix #3587 #3472 adding more control over indexing, key function, storage #3943

Merged · 1 commit · Apr 13, 2022

Conversation

shawkins
Contributor

@shawkins shawkins commented Mar 7, 2022

Description

replaces #3479 and #3616

cc @attilapiros @lburgazzoli

Instead of adding additional Informable dsl methods, this has been implemented as logic on the SharedIndexInformer. It removes the separate SharedInformer interface so that the methods can act more like a builder without having to create additional overrides. While not as simple as top-level dsl methods, it's still pretty straightforward:

SharedIndexInformer<Pod> informer = client.pods()
        .runnableInformer(0)
        .itemStore(
            new ReducedStateItemStore<>(ReducedStateItemStore.NAME_KEY_STATE, Pod.class, "metadata.ownerReferences"))
        .removeNamespaceIndex()
        .addEventHandler(handler)
        .run(); // or start()

This allows for the full ItemStore to be supplied by the user as well.

Both itemStore and initialState can only be called before the informer is running, so you have to use runnableInformer or the SharedInformerFactory to get a reference to the SharedIndexInformer before it is started. An alternative would be to differentiate informers by whether they are running - SharedIndexInformer (presumed running) and RunnableSharedIndexInformer (not yet running, which would also expose these additional methods).

I would also like to deprecate Informable.withIndexers - I've changed the logic to support adding/removing indexes directly on the SharedIndexInformer, even while the informer is running.
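To illustrate what runtime add/remove of indexes amounts to, here is a generic, self-contained sketch of an indexer (illustrative names only - this is not the fabric8 implementation): each named index is a function from an item to its index values, maintained as a map of index value to the set of primary keys.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Illustrative sketch of informer-style secondary indexes.
class Indexer<V> {
    private final Map<String, Function<V, List<String>>> indexFuncs = new HashMap<>();
    // index name -> (index value -> set of primary keys)
    private final Map<String, Map<String, Set<String>>> indices = new HashMap<>();

    void addIndex(String name, Function<V, List<String>> func) {
        indexFuncs.put(name, func);
        indices.put(name, new HashMap<>());
    }

    void removeIndex(String name) {
        indexFuncs.remove(name);
        indices.remove(name);
    }

    // called on every add/update from the store
    void onUpdate(String key, V item) {
        indexFuncs.forEach((name, func) -> {
            Map<String, Set<String>> index = indices.get(name);
            // drop any stale entries for this key before re-indexing
            index.values().forEach(keys -> keys.remove(key));
            for (String value : func.apply(item)) {
                index.computeIfAbsent(value, v -> new HashSet<>()).add(key);
            }
        });
    }

    Set<String> byIndex(String name, String value) {
        return indices.getOrDefault(name, Map.of()).getOrDefault(value, Set.of());
    }
}
```

Adding or removing an index at runtime only touches these maps, which is why it can be supported while the informer is running (the real implementation additionally has to re-index the existing store contents when an index is added).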

From here the operator sdk or other users would introduce a fronting cache with whatever locking and retention semantics are required. We can talk about making that more of a built-in feature in later versions (it's unfortunate that the store and cache terms are already used interchangeably for informers - Store, Cache/CacheImpl, and now ItemStore - it would be nice if they were distinct concepts). The division of responsibility roughly envisioned is:

"Fronting Cache" - maintains a configurable cache of key to full object. This is only needed when using something other than the BasicItemStore (or when the user wants to modify the state directly), so it could eventually be coupled with the ItemStore. On a cache miss, it would check for existence in the store (likely a reduced state store) and perform a get if needed. It could be writable by user-level logic, but would default to being overwritten by events from the informer. It could take a merging function if it were desirable to override that behavior. If this were directly used by the Informer, it could use items from the cache in events - but only when the cache entries are immutable and match the expected resourceVersion.

CacheImpl - maintains the (optional) secondary in-memory indexes, fronts the ItemStore.

ItemStore - Stores all the items, indexed by primary key.
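As a rough illustration of that layering, the following sketch uses hypothetical interfaces (none of these are the actual fabric8 API): on a miss the fronting cache does an existence check against the store, and only then performs a full lookup against the api server.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative-only sketch of the division of responsibility above.
interface ItemStore<V> { // stores all the items, indexed by primary key
    V put(String key, V item);
    V remove(String key);
    V get(String key);
}

class FrontingCache<V> { // configurable cache of key -> full object
    private final Map<String, V> hot = new ConcurrentHashMap<>();
    private final ItemStore<V> store;                  // likely a reduced state store
    private final Function<String, V> apiServerLookup; // full get on a cache miss

    FrontingCache(ItemStore<V> store, Function<String, V> apiServerLookup) {
        this.store = store;
        this.apiServerLookup = apiServerLookup;
    }

    V get(String key) {
        // on a miss, check for existence in the store before a full lookup
        return hot.computeIfAbsent(key,
            k -> store.get(k) == null ? null : apiServerLookup.apply(k));
    }
}
```

The existence check keeps a flood of lookups for deleted resources from hitting the api server, at the cost of the pathological case mentioned later: an item evicted from the cache before it was ever added to the store.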

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Checklist

  • Code contributed by me aligns with current project license: Apache 2.0
  • I added a CHANGELOG entry regarding this change
  • I have implemented unit tests to cover my changes
  • I have added/updated the javadocs and other documentation accordingly
  • No new bugs, code smells, etc. in SonarCloud report
  • I tested my code in Kubernetes
  • I tested my code in OpenShift

@shawkins
Contributor Author

shawkins commented Mar 8, 2022

@attilapiros @metacosm For caching my thoughts are along the lines of:

    // some fancy caffeine cache or similar. For the purposes of this example
    // it just needs to provide a few map operations
    Map<String, HasMetadata> cache = ... ;

    // the resource doesn't really matter; this is just an informer keyed by uid
    var resources = client.pods();
    var informer = resources
        .withStoreValueFields("metadata.ownerReferences")
        .inform()
        .removeNamespaceIndex();

    // add just an event handler dealing with the cache
    // in general there would be others dealing with the creation of the events to process
    informer.addEventHandler(new ResourceEventHandler<HasMetadata>() {
      @Override
      public void onAdd(HasMetadata obj) {
        onUpdate(null, obj);
      }

      @Override
      public void onDelete(HasMetadata obj, boolean deletedFinalStateUnknown) {
        cache.remove(Cache.metaUidKeyFunc(obj));
      }

      @Override
      public void onUpdate(HasMetadata oldObj, HasMetadata newObj) {
        cache.compute(Cache.metaUidKeyFunc(newObj), (k, v) -> {
        if (v == null || Long.parseLong(v.getMetadata().getResourceVersion())
            < Long.parseLong(newObj.getMetadata().getResourceVersion())) {
            return newObj;
          }
          return v;
        });
      }
    });

It's safe to always remove an object from the application level cache because we shouldn't be reusing uids. However the add/update is where things get trickier. With the assumption that updates to the cache are only made with responses from the server:

cache.put(key, client...edit(...));

I think it's then safe as shown above to compare the resourceVersions. Any other usage model of the cache that involves putting in working state (especially values without resourceVersions), would not be safe.

The informer store / indexes can still be used in this scenario for quick existence checks when a value is not in cache - that's not fool-proof as the pathological case is that the resource was dropped from cache before it was ever added to the informer store.

Any full object not found in the cache would require a lookup again from the api server:

cache.computeIfAbsent(key, k -> resources.withField("metadata.uid", k).list().getItems().stream().findFirst().orElse(null))

If we only consider this caching application, and not additional handlers for level events / triggers, the features of an informer beyond a watch that are being used here are:

  • the ability to keep reconnecting if something goes wrong. Even with bookmarks it's still possible for the resource version to become too old if the app loses connectivity with the api server.
  • on a relist it already has the mechanics to provide onDelete events for things that are now missing.

With a cache keyed by uid you may not care about the latter - things that have been deleted will just roll out of the cache eventually under some eviction policy. So really only a Watch with more reconnection logic may be required to do the above. However my understanding is that you want to provide something for both caching and eventing - and those events could either be edge triggered (requiring the entire prior state) or level triggered (requiring only a small amount of prior state).

@shawkins
Contributor Author

shawkins commented Mar 9, 2022

A couple of follow-on thoughts.

Even cache.put(key, client...edit(...)); is not safe from a concurrency perspective. I think all cache modifications would have to check the resourceVersion.
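A minimal sketch of such a resourceVersion-checking modification, using plain strings in place of real resources (and assuming, as the earlier example does, that resourceVersions can be compared numerically - strictly speaking Kubernetes treats them as opaque):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: only replace a cached entry when the incoming
// resourceVersion is strictly newer. Not part of the fabric8 API.
class VersionGuardedCache {
    // key -> { resourceVersion, payload }
    private final Map<String, String[]> cache = new ConcurrentHashMap<>();

    void put(String key, String resourceVersion, String payload) {
        cache.compute(key, (k, v) -> {
            if (v == null || Long.parseLong(v[0]) < Long.parseLong(resourceVersion)) {
                return new String[] { resourceVersion, payload };
            }
            return v; // keep the newer entry already in the cache
        });
    }

    String get(String key) {
        String[] v = cache.get(key);
        return v == null ? null : v[1];
    }
}
```

Because compute runs atomically per key, a late informer event cannot race a concurrent put from an edit response into overwriting a newer version.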

cache.computeIfAbsent(key, k -> resources.withField("metadata.uid", k).list().getItems().stream().findFirst().orElse(null))

I haven't checked whether the uid field is indexed (there's an old reference in a 2016 post saying it is not). If that's the case, then either the caller would have to supply those fields, or the informer would also need to save the name/namespace, so that when an item is not found in the cache you can look up the name/namespace in the informer store to query the api server.

@shawkins
Contributor Author

What is shown in #3943 (comment) is essentially https://github.com/kubernetes/client-go/blob/f6ce18ae578c8cca64d14ab9687824d9e1305a67/tools/cache/mutation_cache.go - we could easily create a built-in class for that.

Another update needed on this pr is to expose a method on the Store to get the key for an object, so that if the key function is changeable, callers will have a better place to reference it.

@attilapiros @metacosm can we get a full understanding of what your usecase is?

  • do you need this as an internal only construct, or will informer events / store entries be exposed to implementors?
  • is this for the primary resource, dependent resources, or both?
  • related to that, if users are doing direct lookups they generally won't be doing so by uid. Dependent resources in particular would be known by namespace/name

That will help determine if we need to further refine or break this apart.

@lburgazzoli
Contributor

Informable.withStoreValueFields - an array of simple paths of what needs to be saved. For example, in our project all this would need to specify is the owner references. The resourceVersion, key, and whatever is specified here are the only things restored onto an object retrieved from the store.

This is very much needed +1

@shawkins do you think it would be possible to have the ability to attach arbitrary metadata to a key? As an example, in my application I need to track the time of the last update (without having to add it to the object itself). Right now I'm keeping a separate map which is fed by a dedicated ResourceEventHandler, but I think it would be nice to have such a capability at the informer storage level.

@shawkins
Contributor Author

shawkins commented Mar 28, 2022

do you think it would be possible to have the ability to attach arbitrary metadata to a key ?

@lburgazzoli With the current implementation attaching to the key would be hard. It's just flattened to a string.

As an example, in my application I need to track the time of the last update (without having to add it to the object itself)

If I understand correctly, you are maintaining a separate map of key -> update time, but you want to put that into the informer store. Is this because you see the informer as the system of record for the update time? That is, are you only determining the update time from when the informer events are processed? How much benefit is there to having this in the informer store vs maintaining a separate map?

@lburgazzoli
Contributor

do you think it would be possible to have the ability to attach arbitrary metadata to a key ?

@lburgazzoli With the current implementation attaching to the key would be hard. It's just flattened to a string.

Sorry, I mean to maintain some metadata along with the value associated to a key

As an example, in my application I need to track the time of the last update (without having to add it to the object itself)

If I understand correctly, you are maintaining a separate map of key -> update time, but you want to put that into the informer store. Is this because you see the informer as the system of record for the update time? That is, are you only determining the update time from when the informer events are processed? How much benefit is there to having this in the informer store vs maintaining a separate map?

Yeah, I'm not really interested in the effective time of when a resource was updated, just when the object reached the informer.

About maintaining an external cache, that's perfectly fine and that's what I'm doing, but I was thinking that since the informer has to maintain a cache anyway, it could also include some metadata. But if this is outside the scope of the informer, I'm happy with my cache.

@shawkins
Contributor Author

But if this is outside the scope of the informer, I'm happy with my cache

Ok, let's go with that for now.

Of the other things that were proposed in this pr which have not been committed, it seems like you have an interest in using uid as a key and in having a reduced state store. Is that correct? My hesitancy in committing those was that there wasn't confirmation that those proposals would work for the operator sdk scenarios. Can you confirm how these will be used?

@lburgazzoli
Contributor

Of the other things that were proposed in this pr which have not been committed, it seems like you have an interest in using uid as a key and in having a reduced state store. Is that correct? My hesitancy in committing those was that there wasn't confirmation that those proposals would work for the operator sdk scenarios. Can you confirm how these will be used?

This is not in the context of the operator sdk but of another application where I need to periodically update the status of some resources to an external system. In that case, to avoid flooding the external system, I use a very basic mechanism based on the last update time, and once in a while a full re-sync is performed.

Having an informer where I can maintain some basic data (not the whole resource) and some metadata would simplify my code a lot.

@shawkins
Contributor Author

@lburgazzoli and to double-check: you want the primary key to be uid instead of namespace/name?

@shawkins
Contributor Author

shawkins commented Apr 1, 2022

Now that the major conflicting work has been committed, I'll revive this pr next week based upon @lburgazzoli's stated needs.

@shawkins
Contributor Author

shawkins commented Apr 3, 2022

Updated the description to match the updated pr.

An alternative approach to hide the Basic/ReducedState stores from the api would be to have higher level methods on the SharedIndexInformer - SharedIndexInformer.itemStore(keyFunction) rather than SharedIndexInformer.itemStore(new BasicItemStore(keyFunction)), etc.

@manusa manusa added this to the 6.0.0 milestone Apr 12, 2022
@manusa manusa linked an issue Apr 12, 2022 that may be closed by this pull request
@csviri
Contributor

csviri commented Apr 12, 2022

do you need this as an internal only construct, or will informer events / store entries be exposed to implementors?
is this for the primary resource, dependent resources, or both?
related to that if users are doing direct lookups they generally won't be doing so by uid. Dependent resources in particular they would known by namespace/name

@shawkins sorry, "attilapiros" is another Attila I guess; you meant me.

do you need this as an internal only construct, or will informer events / store entries be exposed to implementors?

Ideally the informers will be wrapped in our InformerEventSource, and the underlying implementation would be configurable, but in the end it will be hidden.

is this for the primary resource, dependent resources, or both?

It is for both.

related to that if users are doing direct lookups they generally won't be doing so by uid. Dependent resources in particular they would known by namespace/name

Yes, that is true; right now it's done by name/namespace pair.

Sorry for the late response; I will dig deeper and try to explain one additional use case soon that might be useful in general.

@csviri
Contributor

csviri commented Apr 12, 2022

  1. Regarding the ReducedStateItemStore implementation: just out of curiosity, wouldn't it be simpler to have it implemented in a way (or also have an additional implementation) that just provides a pruning function (I remember we discussed something like that before, but not the details)? The function would just set null on the values the user is not interested in.

It's safe to always remove an object from the application level cache because we shouldn't be reusing uids. However the add/update is where things get trickier. With the assumption that updates to the cache are only made with responses from the server:
cache.put(key, client...edit(...));

Not sure if I understand this. Let's assume we have this cache where we just put item changes from the informer (using the event handler only). Even in that case it would not be safe to say that we store the latest version without comparing the resources? (AFAIK only the latest event should be received, in a single thread, so that should be ok)

What we do now, in case we want to make sure that our InformerEventSource (which is a wrapper around the fabric8 informer with some additional functionality) returns the most up-to-date value, is check something called the temporal cache.
See also how it's called

The problem this solves is that the InformerEventSource should return the most up-to-date value when we know we made an add/update on the resource, but the event has not been received by the informer yet.

So we check first if the resource is in this cache, and if not then check the informer's cache. (The resource is added explicitly after an add or update operation.) Similarly to what you say, we compare the resource version (just an equality check is enough) in this case. The resource is removed from the temp cache if any event with the same ResourceID (name+namespace) is received. Well, this is a little tricky but should work in all cases. (The actual situation is a little more complex - well, much more - since there is an additional related feature; I can describe that in more detail if needed.)

So basically the point is that the resource version is not checked on the received event, but by the other party that updates the cache.

Not sure if something like this would make sense to put into the client directly. It solves our particular use case.

Contributor

@csviri csviri left a comment


Made some comments that are basically clarifications. LGTM. We can always add our implementation of the item store mentioned there.

@shawkins
Contributor Author

Just out of curiosity, wouldn't it be simpler to have it implemented in a way (or also have an additional implementation) that just provides a pruning function (I remember we discussed something like that before, but not the details)? The function would just set null on the values the user is not interested in.

It would be simpler from the perspective of the store - that is, you only have to prune on the way in. But it's not as cross-cutting of a solution. Per type you would need to use a builder to explicitly set things like the spec and status to null. It also doesn't necessarily play nicely with extracting the additional fields from the key.

Even in that case it would not be safe to say that we store the latest version without comparing the resources?

I should clarify what I mean by safe - if you always overwrite with what comes from the informer, it will be eventually consistent, but it will not be guaranteed to represent the latest state known to the application.

We see situations where modifications happen very close in time together, typically from status updates. Consider the following:

There's an incoming resource modification event for version v1. The application reacts to this and performs some other update of the resource, the returned resource is actually v3 and that is put into your application level cache.

Almost immediately the application then sees the resource modification event for v2. As a side-effect of processing the v2 event, it will flip your application level cache from reporting the latest as v3 to v2. Depending on the details of the event processing, it will temporarily stay at v2, or become v3 again.

It will then definitely flip back to v3 once that event is processed.
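A toy illustration of this flip, with resourceVersions reduced to numbers in a plain map and blind puts standing in for cache writes:

```java
import java.util.HashMap;
import java.util.Map;

// Toy demonstration of the flip described above: blind overwrites let a
// late v2 event temporarily replace the newer v3 state. Keys and values
// are placeholders, not real resources.
class BlindCacheDemo {
    static long demo() {
        Map<String, Long> cache = new HashMap<>();
        cache.put("pod", 1L); // informer event for v1
        cache.put("pod", 3L); // our update returned v3; we put it directly
        cache.put("pod", 2L); // informer event for v2 arrives afterwards
        return cache.get("pod"); // reports v2 - older than the v3 we already saw
    }
}
```

A version-comparing compute (as in the earlier event handler example) keeps v3 in place instead of regressing to v2.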

See also how it's called

Yes that helps prevent stale writes into the cache, but won't guarantee that you will always see the latest.

Not sure if something like this would make sense to put into the client directly. It solves our particular use case.

I think we're still avoiding claiming any support for updates to the store once the informer is running.

@csviri
Contributor

csviri commented Apr 12, 2022

Yes that helps prevent stale writes into the cache, but won't guarantee that you will always see the latest.

Actually I think it does; I probably did not describe clearly what happens there:

  1. We always do optimistic locking when updating a resource. This guarantees that if the update succeeds (no conflict), we actually worked with the latest version v1 of the resource (read from the informer cache). We receive the updated version v2 back. We know that v1 and v2 happened one after the other (because of the optimistic locking) and no updates happened in between.

  2. In the temporal cache we cache the resource only when we still see the resource version v1 (on which we executed the update) in the informer cache.
    There can be two cases for what we see in the informer cache (which version of the resource):

    • No v1 - the informer already has the newer resource (the result of our update), or an even later update happened after that. In neither case do we want to put the resource into the temporal cache, since we know the informer already has the new or an even newer version of the resource. We know those are newer (not older) than v1, since for the update we used the resource from the cache.
    • v1 - if the informer still has the version we updated, we can safely put the updated resource into the temporal cache. Note that this happens in a synchronized block, together with the logic that removes this item on any event coming from the informer (so no race condition there). We know these events happen one after the other (v1, v2, and eventually even a newer v3) since we updated with optimistic locking. So the next event will probably be the event for our update, possibly followed by others, but we don't care; we just remove the resource from the temporal cache since we know we now have the newer event in the informer cache.

So we always see the latest (first checking the temporal cache, and if not present then the informer cache) in terms of what can be seen from our update / received by the process.
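The mechanics described in the two cases above could be sketched roughly as follows (hypothetical names; plain maps stand in for the informer cache, and resourceVersion strings stand in for full resources):

```java
import java.util.HashMap;
import java.util.Map;

// Rough sketch of the "temporal cache" described above - not the actual
// java-operator-sdk implementation.
class TemporalCache {
    private final Map<String, String> temporal = new HashMap<>();
    private final Map<String, String> informerCache; // stands in for the informer's store

    TemporalCache(Map<String, String> informerCache) {
        this.informerCache = informerCache;
    }

    // Called after an optimistically-locked update of v1 that returned v2.
    synchronized void onOwnUpdate(String key, String v1, String v2) {
        // Only cache if the informer still holds the version we updated from;
        // otherwise the informer already has our update (or something newer).
        if (v1.equals(informerCache.get(key))) {
            temporal.put(key, v2);
        }
    }

    // Called for every informer event for this key: the informer now holds a
    // version at least as new as our update, so the temporal entry is stale.
    synchronized void onInformerEvent(String key) {
        temporal.remove(key);
    }

    // Latest known state: temporal cache first, then the informer cache.
    synchronized String latest(String key) {
        String v = temporal.get(key);
        return v != null ? v : informerCache.get(key);
    }
}
```

The synchronized methods mirror the synchronized block mentioned above: the put-if-still-v1 check and the remove-on-event cannot interleave, which is what rules out the race.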

There's an incoming resource modification event for version v1. The application reacts to this and performs some other update of the resource, the returned resource is actually v3 and that is put into your application level cache.
Almost immediately the application then sees the resource modification event for v2. As a side-effect of processing the v2 event, it will flip your application level cache from reporting the latest as v3 to v2. Depending on the details of the event processing, it will temporarily stay at v2, or become v3 again.

This cannot happen, because we know there is no event between v1 and v2 (versions mentioned in my example), because of the optimistic locking.

@csviri
Contributor

csviri commented Apr 12, 2022

There is one more feature that we support: not propagating the event that was the result of our own update. For example, if I update an ingress in the reconciler and I have an informer registered for this ingress, we will not propagate the event for this update through the informer (InformerEventSource). So no additional reconciliation will happen on our own update. This would probably make sense to put into the informer; the problem is that it is quite complex code, also from the user's perspective.

We can go into details in another issue maybe, if there is a consensus that this might be useful in general.

@shawkins
Contributor Author

shawkins commented Apr 12, 2022

This cannot happen because we know there is not event between v1 and v2 (versions mentioned from my example). because the optimistic locking

Yes, if you are doing optimistic locking, that will indeed prevent the effect I'm talking about, because you are guaranteeing that you are just incrementing off a known version. Sorry, I had mentioned this on other issues/prs, but did not bring it up here.

Since those updates are out of the control of the informer, it's hard to make assumptions about whether they are enforced correctly - so I didn't think it was worth designing the store around them.

If you wanted to base a custom store off of the reduced state store, one modification that would be useful is retrieving the item resourceVersion without reconstituting the full item.
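A minimal sketch of that modification, under the assumption that the store keeps the resourceVersion beside the reduced field values (hypothetical names, not the actual ReducedStateItemStore):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: store the resourceVersion next to the reduced field state so a
// version check never requires reconstituting the full item.
class ReducedEntry {
    final String resourceVersion;
    final Object[] reducedFields; // only the configured field values

    ReducedEntry(String resourceVersion, Object[] reducedFields) {
        this.resourceVersion = resourceVersion;
        this.reducedFields = reducedFields;
    }
}

class ReducedStore {
    private final Map<String, ReducedEntry> items = new ConcurrentHashMap<>();

    void put(String key, String resourceVersion, Object... fields) {
        items.put(key, new ReducedEntry(resourceVersion, fields));
    }

    // Cheap version lookup: no object reconstruction needed.
    String resourceVersion(String key) {
        ReducedEntry e = items.get(key);
        return e == null ? null : e.resourceVersion;
    }
}
```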

We can go into details in an other issue maybe. If there is a consensus that this might be useful in general.

With the current pr store modifications are below the other cache functions (indexing and event notification). So for example if you update the store to the next state, and then the modification event comes in, it will produce a sync event - which will only be seen if you have resync turned on and are in a sync period. However that's not guaranteed - from the time you perform an update until the time you update the store, it's possible that the event has already been processed. Preventing that would require more locking.

@sonarcloud

sonarcloud bot commented Apr 13, 2022

SonarCloud Quality Gate failed.

Bugs: 0
Vulnerabilities: 0
Security Hotspots: 0
Code Smells: 2

Coverage: 56.0%
Duplication: 0.0%

Development

Successfully merging this pull request may close these issues.

  • Add a construct in between a Watch and an Informer
  • Make it possible to customize Informer Cache key function
5 participants