
Implement cache update deduplication per fetch cycle #5509

Merged: 36 commits, Oct 17, 2024

Conversation

@edwbuck (Contributor) commented Sep 20, 2024

Fixes #5349

This PR fixes #5349 by collecting all registration entry (or attested node) updates into a single set of items to fetch (using the Go "map as a set of keys" pattern) and, only after that set is built, fetching those items to update the cache.

This item blocks other work, such as the backoff polling algorithm and the polling query that would do batch selects.

The sections of the algorithm:

  • beforeFirstEvent (any event that arrives with a lower id than the first event)
  • polledEvents (any event that was skipped in a previous call of newEvents)
  • newEvents (any event that is detected past the previous last event)

Description of fix for #5349

The sections of the algorithm above no longer update the cache directly. Instead, as a section detects a registration entry / attested node that requires a cache update, it records the need to fetch that item by storing the appropriate key in a fetch map, whose keys are the ids of the items to be fetched.

Since a map cannot contain duplicate keys, the items to fetch are deduplicated. The map is then used to fetch each item exactly once, as sketched below.
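For illustration, here is a minimal, self-contained sketch of the "map as a set of keys" pattern; the names (`ids`, `fetchSet`) are hypothetical, not the PR's actual identifiers.

```go
package main

import "fmt"

func main() {
	// Events in one fetch cycle may reference the same item more than once.
	ids := []string{"entry-1", "entry-2", "entry-1", "entry-3", "entry-2"}

	// Storing IDs as map keys collapses duplicates automatically.
	fetchSet := make(map[string]struct{})
	for _, id := range ids {
		fetchSet[id] = struct{}{}
	}

	// Only after the set is built is each item fetched, exactly once.
	for id := range fetchSet {
		fmt.Println("fetch and update cache for", id)
	}
}
```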

@edwbuck (Contributor, Author) commented Sep 20, 2024

Converting to a draft until I can fix the unit tests, which haven't been updated to match the new structure of the old algorithm.

Maintainers are free to comment on items they'd like changed early, if they have the time to review what exists.

Improved testing of authorized_entryfetcher_attested_nodes.go
Minor fixes to authorized_entryfetcher_registration_entries.go
Fixed authorizedentries/cache_test.go
Better documentation for eventTracker.go
Fixed lack of sub-minute polling on eventTracker.go
Fixed unit tests to match sub-minute polling abilities.

Renamed items in authorized entryfetcher attested nodes to not conflict.

…ies.

Renamed similar test in attested nodes unit test to not conflict.

Rename equivalent attested nodes unit test to avoid collision.
Rename registration entries func call to match attested nodes pattern.

@edwbuck changed the title from "Backoff" to "Implement cache update deduplication per fetch cycle and backoff algorithm." (Oct 1, 2024)
Added unit testing for first three boundaries.

@edwbuck (Contributor, Author) commented Oct 7, 2024

Failing unit tests seem unrelated to the PR.

* are primarily set by the low order bits repeatedly by multiplication with
* a number designed to mix bits deterministically for better hash dispersion.
*/
func hash(event uint) uint {
Contributor:

Confirming that since this is a uint, we expect this to be valid for how many events until uint overflow?

@edwbuck (Contributor, Author):

4_294_967_295 is the max 32-bit uint (about 4 billion). At 800_000 a day, that would be 5_368 days, or about 14.7 years.

I imagine that in order to guard against this, we would have to dig into the database setup code that GORM creates. That's because the database is the origin of the event's id, set through autoincrement.

Since Uber is skipping one id for every entry, they would burn through these at twice the 800_000 rate, so you have about 7 years to address the issue.

Also, the odds are very good that your MySQL schema is using a signed integer, not an unsigned one, so you might have half of that. Note that MySQL (last I checked) handled hitting the limit in a very unusual way: it simply reused the max value over and over, so you would get a hard failure (the same primary key can't be used for two entries).

To reset this, you could shut down the servers, delete the event tables, and relaunch. Nothing about the next item to process is persisted, so the system would start again at 0.
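For reference, the arithmetic above checks out; a quick back-of-envelope computation (the 800_000/day rate is the discussion's assumption, not a measurement):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const eventsPerDay = 800_000 // assumed rate from the discussion above
	days := float64(math.MaxUint32) / eventsPerDay
	fmt.Printf("~%.0f days, ~%.1f years\n", days, days/365)
	// Prints: ~5369 days, ~14.7 years (halve it for a signed 32-bit column)
}
```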

Member:

This brings to mind, for me, the tangential issue #5501.

I wouldn't necessarily say it's blocking for me, but the proposed workaround sounds like a surprising amount of manual action, which the operator also needs to be aware of to even know it's needed, let alone perform safely.

It would be worthwhile to consider (over the 7 years, as you brought up :) ) how we can solve this more gracefully.

@azdagron (Member) commented Oct 9, 2024:

uint is arch dependent. We only ship 64-bit artifacts, so unless someone is building against 32-bit (unlikely), the uint value itself will be 64 bits. That said, GORM (maybe now, and at least with the version we are using) uses 32-bit integer database types for IDs of type uint.

Ideally, SPIRE should declare its own model type using an explicit 64-bit type and then provide a migration to move the IDs to a 64-bit database type.
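As a rough sketch of that suggestion (the struct and field names are hypothetical, not SPIRE's actual model), declaring the ID with an explicit 64-bit type would look something like this; an accompanying migration would still be needed to widen the existing column:

```go
// Hypothetical event model with an explicit, arch-independent 64-bit ID.
type AttestedNodeEvent struct {
	ID       uint64 `gorm:"primary_key"` // 64-bit regardless of platform uint width
	SpiffeID string
}
```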

@stevend-uber (Contributor) left a comment:

If my understanding of the added code and changed functionality is correct, these changes LGTM. Regarding the custom Boundary and Hash code, we'd expect it to benefit our environment by saving approximately 1.5 GB of memory in the worst-case scenarios, which is non-trivial but also not breaking. (Read: we would benefit, but we'd be happy with, and would fully understand, the maintainers going with a different approach.)

Thanks so much @edwbuck for making these changes!

@edwbuck (Contributor, Author) commented Oct 9, 2024

@stevend-uber Thanks for the LGTM. There are likely fine-grained optimizations still available, including one that @azdagron mentioned which would greatly improve memory performance.

In any case, the database performance should improve even if the server performance takes a hit. Server performance can be further improved by the memory optimization @azdagron suggested and, if the current algorithm is completely replaced, by whatever replacement is offered.

Thank you for the passing review.

@azdagron (Member) left a comment:

Without providing benchmark results (or better yet, providing a repeatable benchmark), the performance and impact of the event tracker (on CPU and memory) is hard to determine. This is of particular importance considering the scale of events we expect to be tracked in some of the larger SPIRE user deployments.
While I wouldn't block this PR on that, it certainly would be prudent to have them available before we'd consider graduating this feature. We've missed this kind of benchmarking on related features in the past and have at times been surprised by the impact.

I similarly won't block for nitpick comments but would welcome fixes for improved readability.

It has been pointed out in the review and on contributor calls, but there is some concern from maintainers and other parties about the readability/maintainability of the event tracker. An alternative approach has been prototyped and is available, with benchmarks, here and here. As mentioned in the last contributor sync, the maintainers are discussing how we'd like to move forward and will have an update soon.

Thank you for your patience.

pkg/server/datastore/sqlstore/sqlstore.go (outdated; resolved)
pkg/server/endpoints/eventTracker.go (outdated; resolved)
Comment on lines 96 to 98
if boundaryList == nil {
boundaryList = []uint{}
}
Member:

nit: why does the returned slice have to be non-nil even when empty?

@edwbuck (Contributor, Author):

It doesn't, but it makes the unit testing framework happier.
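The framework nuance here is likely the nil-versus-empty distinction made by reflect.DeepEqual, which assertion helpers such as testify's require.Equal rely on; a small self-contained illustration (the framework choice is an assumption, not stated in the PR):

```go
package main

import (
	"fmt"
	"reflect"
)

func main() {
	var nilSlice []uint
	// A nil slice and a non-nil empty slice are not "deeply equal",
	// so assertions expecting []uint{} fail when handed nil.
	fmt.Println(reflect.DeepEqual([]uint{}, nilSlice)) // false
	fmt.Println(reflect.DeepEqual([]uint{}, []uint{})) // true
}
```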

pkg/server/endpoints/eventTracker.go (outdated; resolved)
pkg/server/endpoints/eventTracker.go (outdated; resolved)
pkg/server/endpoints/eventTracker_test.go (outdated; resolved)
* the poll list.
*/
func (et *eventTracker) SelectEvents() []uint {
pollList := et.pool.Get().([]uint)
Member:

This isn't sufficient. In order to leverage the pool, memory needs to be returned to it. If you look at the pq branch, you'll see an additional "FreeEvents" method added to the tracker which is invoked by the caller of SelectEvents when it has finished to "return" the events slice back to the event tracker for reuse.
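For illustration, a minimal sketch of the missing step, assuming the eventTracker type from the snippet above; the method name FreeEvents mirrors the comment, but the body is a guess at the shape, not the actual pq-branch code:

```go
// FreeEvents returns the slice to the pool so a later SelectEvents call
// can reuse its backing array instead of allocating a fresh one.
func (et *eventTracker) FreeEvents(events []uint) {
	events = events[:0] // keep capacity, drop contents
	et.pool.Put(events) // note: staticcheck SA6002 prefers *[]uint (see later discussion)
}
```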

Member:

That method is only called in benchmarks in the pq branch. FWIW.

@edwbuck (Contributor, Author) commented Oct 9, 2024

> Without providing benchmark results (or better yet, providing a repeatable benchmark), the performance and impact of the event tracker (on CPU and memory) is hard to determine. This is of particular importance considering the scale of events we expect to be tracked in some of the larger SPIRE user deployments. While I wouldn't block this PR on that, it certainly would be prudent to have them available before we'd consider graduating this feature. We've missed this kind of benchmarking on related features in the past and have at times been surprised by the impact.

Currently we create a slice of all missed events and walk that slice unconditionally, each polling cycle. My submission does basically the same, with a bit of additional math carefully crafted to use increment, comparison by subtraction, pointer (base address + offset) memory semantics, and modulus division, all of which are relatively fast operations, many with single-machine-instruction support.

All of those items add some time to the calculation but keep it O(n), without heavy branching or pipeline-bubble creation. As a result, the algorithm is relatively fast. If the modulus division operation were not O(1), then hash table lookup would not be (effectively) O(1). My "n" comes from the fact that we do this O(1) calculation n times. A hedged sketch of the general shape of such a hash follows.
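For concreteness, here is a sketch of a multiplicative hash of the kind described; the constant (Knuth's 2^64 golden-ratio value) and the exact mixing steps are illustrative assumptions, not the PR's actual implementation, and the code assumes a 64-bit uint.

```go
// Illustrative multiplicative hash: one multiply spreads the low-order
// bits upward deterministically; the xor-shift folds high bits back down.
func hash(event uint) uint {
	const m = 0x9E3779B97F4A7C15 // 2^64 / golden ratio (assumes 64-bit uint)
	h := event * m
	return h ^ (h >> 32)
}

// Bucket selection is then one O(1) modulus division per event, e.g.:
//   bucket := hash(event) % uint(numBuckets)
```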

> I similarly won't block for nitpick comments but would welcome fixes for improved readability.

I don't mind improving readability. I'll even update the nits if I have the time to do so. Sometimes they really help readability.

> It has been pointed out in the review and on contributor calls, but there is some concern from maintainers and other parties about the readability/maintainability of the event tracker. An alternative approach has been prototyped and is available, with benchmarks, here and [here](https://github.com/azdagron/spire/blob/pq/pkg/server/endpoints/benchmark_results.txt). As mentioned in the last contributor sync, the maintainers are discussing how we'd like to move forward and will have an update soon.

And what wasn't pointed out in the contributor call is that the benchmark above didn't include dispersion of identically scheduled items, so all items would initially poll at the exact same offsets. It's also worth noting that, even without this being added yet, the benchmark runs at about 60% of the speed of the original submission.

Finally, the initial dispersion routine offered can't easily be unit tested, because it adds a random jitter of -3 to +3 seconds. This means polling for the next minute will "miss" a few elements (randomly) and potentially double-poll elements (also randomly).

While the hash presented above doesn't have perfect dispersion, it's good enough to be within 10% of the input value across all the currently tested partitions, and, more importantly, it is deterministic. That means we can write tests validating that event ID 323423 polls at polling intervals 1, 2, 3, 4, 5, 8, 11, 18, 23, and so on, without ever having to run a population of events to determine whether their average correctly polls within a minute.

In fact, the replacement algorithm would not even pass the same unit tests, because those tests validate that an item is polled only a certain number of times, and the random jitter would increase or decrease that count as it shortened or lengthened the polling delay.

> Thank you for your patience.

You're welcome, but we have a solution that's been in review for two weeks now, and the maintainers are resorting to rewriting it to meet some goal, discarding some of the desired characteristics as they redo the work offered.

Debugging a priority queue is not trivial. Debugging a priority queue with random jitter is a bit harder still. Even if we fix the drift so the next poll doesn't accumulate error over time, just writing the unit tests to prove some of the properties we have now will not be trivial. We cannot prove, for example, that something is polled only X times. We cannot trivially remove the polling of an item without letting it poll first.

Hiding the complexity in another package isn't a reduction of complexity. Both solutions are non-trivial, but one is deterministic and the other has random elements. We should prefer debugging deterministic items over relying on luck and chance with random ones. And while one uses actual times instead of poll counts, we only poll at certain time offsets (determined by the polling interval), so actual times matter less than one might think.

@azdagron (Member) commented Oct 9, 2024

> And what wasn't pointed out in the contributor call is that the benchmark above didn't include dispersion of identically scheduled items, so all items would initially poll at the exact same offsets.

You are correct that your solution immediately distributes bursty event arrival across the poll windows. That said, these events already arrive somewhat distributed. Further, even if there are bursts of changes, SPIRE server can only ingest registration entries/agents at a certain rate. Considering the largest "window" in the current backoff strategy is 1 minute, it seems reasonable to assume there will be enough natural distribution even without jitter to spread that load.

> It's also worth noting that, even without this being added yet, the benchmark runs at about 60% of the speed of the original submission.

On my machine it was 7 ms (this PR) vs 12 ms (priority queue). An extra 5 ms of CPU spent across a 5000 ms (5 s) interval does not seem like a high price to pay for readability and simplicity.

> Finally, the initial dispersion routine offered can't easily be unit tested

Jitter can be disabled during tests (and is, if you look at the pq branch) to assert the correct backoff strategy. The jitter distribution can be measured separately.

> , because it adds a random jitter of -3 to +3 seconds. This means polling for the next minute will "miss" a few elements (randomly) and potentially double-poll elements (also randomly).

The consequence of a miss or double poll of an individual event seems irrelevant? The time-to-noticing-a-skipped-event-has-arrived will still be within 57-63s vs 60s. Database load should also not be materially different. Over the aggregate you're still averaging once every 60s (assuming a good PRNG).

> You're welcome, but we have a solution that's been in review for two weeks now,

I would respectfully ask you to reframe your expectations on how long it takes to get PRs reviewed and merged. The project maintainers are busy with all varieties of work. We have a slice of time we can dedicate to SPIRE. Within that slice of time we have to fit in servicing myriad PRs, issues, releases, security responses, community support, community engagement, etc. Our time is largely volunteer and in many cases comes at a cost to our own personal time. A PR of this size takes time to review, often requiring a large chunk of time to keep context fresh. Finding that time is challenging and rarely immediate. Your expectation and continued pressure that these PRs merge according to your timetable is unhelpful and unwelcome.

> and the maintainers are resorting to rewriting it to meet some goal, discarding some of the desired characteristics as they redo the work offered.

Ultimately maintainers are responsible for the long-term maintenance of the project. Every piece of "work offered" must be evaluated on correctness, impact, and maintainability. Vetting a PR against alternative designs is a critical part of that.

> Both solutions are non-trivial

I'll leave that as an exercise to the reader. I personally find the PQ implementation to be much easier to reason about. I understand if you feel differently. I look forward to additional scrutiny and evaluation from other interested maintainers and contributors.

@rturner3 (Collaborator) commented:

After reviewing this PR, we as a maintainer group decided that we are going to put a pause on this PR for now given the complexity of the proposed approach. It's our opinion that the complexity of this approach may be difficult for operators to understand and debug should any issues come up related to entry distribution. Additionally, we feel it will be challenging for us to maintain as there is a lot of new state now being tracked in the server that, in our opinion, takes quite a bit of time to reason about. We feel maintainability and ease of understanding are important considerations for any changes we make in the project given its current adoption and maturity level.

We are currently reflecting on whether to pursue an alternative solution targeting the specific issue laid out in #5341 (such as the PQ implementation proposed by @azdagron or some other solution), or whether we should rethink the overall approach taken to solve this event-based entry sync with the database. After seeing this feature take shape over time, it's not clear whether the current solution is addressing the overall problem of optimizing synchronization of entry data between the server and its database in the simplest way. We plan to continue to discuss the future of this feature and help propose a path forward for where to take it.

@edwbuck In the future, if you are planning to propose a non-trivial change to the project, I would highly encourage you to document the design in an issue that several maintainers can review before we get to the implementation stage. I can see that you put a lot of thought and effort into this PR, and I truly appreciate the time you devoted to trying to help improve the project. However, in this instance, I believe it could have saved both of us some time and trouble if we had aligned earlier on a more detailed design in an issue upfront. Documenting the design in an issue also gives other members of the community an opportunity to provide early feedback in a more public forum.

That being said, I think we as project maintainers could have also done better to provide clearer guidelines to contributors to facilitate the change proposal process. We plan to improve on this and hopefully prevent future situations like this. We are tracking these process improvements in #5567.

Lastly, I want to acknowledge what is probably a frustrating experience for you with this PR. I hope you can appreciate why we have decided not to accept this change at this time and are open to revisiting how we collaborate to make future change proposals more successful.

@azdagron modified the milestones: 1.11.0, 1.11.1 (Oct 15, 2024)
This limits the solution to full polling as before, but still preserves
the work that would close spiffe#5349.

@azdagron changed the title from "Implement cache update deduplication per fetch cycle and backoff algorithm." to "Implement cache update deduplication per fetch cycle" (Oct 16, 2024)
@@ -351,6 +351,8 @@ func (ds *Plugin) UpdateAttestedNode(ctx context.Context, n *common.AttestedNode
if err != nil {
return err
}
// TODO: this is at the wrong level of the software stack.
// It should be created in the caller of the datastore interface.
Member:

I think we missed removal of this TODO. Can you please remove?

@edwbuck (Contributor, Author):

Fixed in next push.

@@ -367,6 +369,8 @@ func (ds *Plugin) DeleteAttestedNode(ctx context.Context, spiffeID string) (atte
if err != nil {
return err
}
// TODO: this is at the wrong level of the software stack.
Member:

I think we missed removal of this TODO. Can you please remove?

@edwbuck (Contributor, Author):

Fixed in next push.

@azdagron modified the milestones: 1.11.1, 1.11.0 (Oct 16, 2024)
@azdagron (Member) commented:
Ah, yes, there is a nuance with sync/pool use that is failing the linter. Pools should generally only be used with pointer values (e.g. pointer to slice in this situation). It won't really make a difference in this situation since we're only putting in one item, so we can either fix it or suppress the linter.

https://staticcheck.dev/docs/checks#SA6002
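A minimal sketch of the SA6002-friendly alternative (names illustrative): pool a pointer to the slice so that Put doesn't allocate when boxing the value in an interface.

```go
package main

import (
	"fmt"
	"sync"
)

// Pooling *[]uint instead of []uint avoids the allocation staticcheck
// SA6002 warns about: a bare slice header escapes to the heap on each Put.
var slicePool = sync.Pool{
	New: func() any { return new([]uint) },
}

func getSlice() *[]uint { return slicePool.Get().(*[]uint) }

func putSlice(s *[]uint) {
	*s = (*s)[:0] // reset length, keep capacity for reuse
	slicePool.Put(s)
}

func main() {
	s := getSlice()
	*s = append(*s, 1, 2, 3)
	fmt.Println(*s) // [1 2 3]
	putSlice(s)     // returned without the SA6002 boxing allocation
}
```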

edwbuck and others added 2 commits October 17, 2024 12:00
@azdagron merged commit 5186212 into spiffe:main on Oct 17, 2024
34 checks passed