feat(reports): reports sidecar, configurable caching #104

Merged
merged 45 commits into from
Jan 9, 2024

Conversation

andrewazores
Member

@andrewazores andrewazores commented Oct 13, 2023

Replaces #65


Based on #62
Depends on #62
Depends on cryostatio/cryostat-core#301

  1. Reimplements sidecar report generation using a cryostat-reports instance, configurable in smoketest.bash with the new -r flag, and
  2. implements two tiers of caching:
     • In-memory caching using Caffeine caches for both active and archived reports. Active report cache entries expire 10 seconds after write; archived report cache entries expire 10 minutes after the last access (read or write).
     • S3 caching for archived reports. These persist until the source archived recording is deleted through the Cryostat API. They also carry an S3 expiration of 1 day after creation, but this relies on the S3 provider proactively pruning expired objects, which not all providers do. The Cryostat server sends the expiry metadata along with the cached report object to S3, so it is up to the provider to act on it.
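
The difference between the two in-memory expiry policies can be sketched in plain Java. This is a simplified stand-in written only for illustration, not Caffeine's implementation and not code from this PR; the `ExpiringCache` class, its `Policy` enum, and the injectable clock are all hypothetical names.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the two expiry policies; not Caffeine's implementation. */
final class ExpiringCache<K, V> {
    enum Policy { AFTER_WRITE, AFTER_ACCESS }

    private static final class Entry<V> {
        final V value;
        final long deadlineNanos;

        Entry(V value, long deadlineNanos) {
            this.value = value;
            this.deadlineNanos = deadlineNanos;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Policy policy;
    private final long ttlNanos;
    private final LongSupplier clock; // injectable so expiry can be exercised without sleeping

    ExpiringCache(Policy policy, Duration ttl, LongSupplier clock) {
        this.policy = policy;
        this.ttlNanos = ttl.toNanos();
        this.clock = clock;
    }

    /** A write always (re)starts the timer, under either policy. */
    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlNanos));
    }

    Optional<V> get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        long now = clock.getAsLong();
        if (now >= entry.deadlineNanos) {
            entries.remove(key); // expired: drop it and report a miss
            return Optional.empty();
        }
        if (policy == Policy.AFTER_ACCESS) {
            // A read also restarts the timer, so frequently used entries stay cached.
            entries.put(key, new Entry<>(entry.value, now + ttlNanos));
        }
        return Optional.of(entry.value);
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, in nanoseconds
        ExpiringCache<String, String> active =
                new ExpiringCache<>(Policy.AFTER_WRITE, Duration.ofSeconds(10), () -> now[0]);
        active.put("example-recording", "{}");
        now[0] = Duration.ofSeconds(11).toNanos();
        System.out.println("active entry present after 11s: "
                + active.get("example-recording").isPresent());
    }
}
```

Under the write-based policy, reads never extend an entry's life, which suits active recordings whose data keeps changing. Under the access-based policy, a report that is read regularly stays in memory, which suits static archived recordings.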

Reports sourced from active recordings are never stored in S3-backed cache, because active recordings are assumed to be too dynamic and rapidly-changing to make sense to cache this way. They are cached in-memory only and for a short duration by default simply to reduce the cost of clients rapidly requesting report generations for the same recording.

Reports for archived recordings are served from the in-memory cache if available, otherwise from the S3-backed cache if available. If neither tier has the report, it is freshly generated, stored in S3, stored in memory, and then returned to the client. Since archived reports are (by definition) static data, their in-memory cache duration is longer by default, which improves performance while (hopefully) not retaining so many entries that they exert undue memory pressure on the Cryostat server. If an archived recording report is requested after it has been evicted from the in-memory cache, it is retrieved from S3 rather than regenerated, since that should be a much cheaper operation.

This S3-backed scheme also decouples the lifecycle of archived reports from the Cryostat server: the server can be restarted, and the cached reports for archived recordings can still be retrieved and used to repopulate the in-memory caches as needed.

Here is a log snippet illustrating the tiered caching in action for an archived report with a sidecar report generator:

s3_1                  | 2024-01-08T19:32:50.947  INFO --- [   asgi_gw_0] localstack.request.aws     : AWS s3.HeadObject => 404 (NoSuchKey)
s3_1                  | 2024-01-08T19:32:50.962  INFO --- [   asgi_gw_0] localstack.request.aws     : AWS s3.GetObject => 200
reports_1             | Jan 08, 2024 7:32:51 PM io.cryostat.reports.ReportResource handleUpload
reports_1             | INFO: Received request for null (2463387 bytes)
s3_1                  | 2024-01-08T19:32:52.252  INFO --- [   asgi_gw_0] localstack.request.http    : GET / => 200
auth_1                | [::1]:51844 - 9e48deea-4ed8-4cae-881f-bd2b8621a100 - - [2024/01/08 19:32:52] localhost:8080 GET - "/ping" HTTP/1.1 "Wget" 200 2 0.000
cryostat_1            | Jan 08, 2024 7:32:52 PM io.quarkus.vertx.http.runtime.filters.accesslog.JBossLoggingAccessLogReceiver logMessage
cryostat_1            | INFO: 10.89.1.96 - - [08/Jan/2024:19:32:52 +0000] "GET /health/liveness HTTP/1.1" 204 -
reports_1             | Jan 08, 2024 7:32:52 PM io.cryostat.reports.ReportResource cleanupHelper
reports_1             | INFO: Deleted /tmp/uploads/resteasy-reactive11145411609386885570upload
reports_1             | Jan 08, 2024 7:32:52 PM io.cryostat.reports.ReportResource cleanupHelper
reports_1             | INFO: Completed request for null after 1598ms
reports_1             | Jan 08, 2024 7:32:52 PM io.quarkus.vertx.http.runtime.filters.accesslog.JBossLoggingAccessLogReceiver logMessage
reports_1             | INFO: 10.89.1.96 - - [08/Jan/2024:19:32:52 +0000] "POST /report HTTP/1.1" 200 18254
s3_1                  | 2024-01-08T19:32:52.981  INFO --- [   asgi_gw_0] localstack.request.aws     : AWS s3.PutObject => 200
cryostat_1            | Jan 08, 2024 7:32:52 PM io.quarkus.vertx.http.runtime.filters.accesslog.JBossLoggingAccessLogReceiver logMessage
cryostat_1            | INFO: 10.89.1.99 - - [08/Jan/2024:19:32:52 +0000] "GET /api/v3/reports/YVNiM0JlSml6LVphN25xdzE1MnBmR1N1RGtMaFduWHdETXU4REoyanNqOD0vY29tcG9zZS1jcnlvc3RhdC0xX29uc3RhcnRfMjAyNDAxMDhUMTkzMjQyWi5qZnI HTTP/1.1" 200 18254

Here is an illustration of Cryostat retrieving a cached report from S3 after it has timed out and been evicted from the in-memory cache:

s3_1                  | 2024-01-08T19:53:30.346  INFO --- [   asgi_gw_0] localstack.request.aws     : AWS s3.HeadObject => 200
s3_1                  | 2024-01-08T19:53:30.350  INFO --- [   asgi_gw_0] localstack.request.aws     : AWS s3.GetObject => 200
cryostat_1            | Jan 08, 2024 7:53:30 PM io.quarkus.vertx.http.runtime.filters.accesslog.JBossLoggingAccessLogReceiver logMessage
cryostat_1            | INFO: 10.89.1.141 - - [08/Jan/2024:19:53:30 +0000] "GET /api/v3/reports/TTVTMzJobUxtVUd4Q2h5OGxvMGM2Y3dWQmhFNTVTb182bzhLMWgxVURUQT0vY29tcG9zZS1jcnlvc3RhdC0xX29uc3RhcnRfMjAyNDAxMDhUMTk1MzA2Wi5qZnI HTTP/1.1" 200 18152
auth_1                | 10.89.1.141:33564 - 725dc67d-1642-4161-8058-577d1b331f2f - user [2024/01/08 19:53:30] localhost:8080 GET cryostat "/api/v3/reports/TTVTMzJobUxtVUd4Q2h5OGxvMGM2Y3dWQmhFNTVTb182bzhLMWgxVURUQT0vY29tcG9zZS1jcnlvc3RhdC0xX29uc3RhcnRfMjAyNDAxMDhUMTk1MzA2Wi5qZnI" HTTP/1.1 "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0" 200 4690 0.016
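
The lookup order for archived reports shown in the two log excerpts (memory, then S3, then fresh generation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: both tiers are plain maps, the generator is an arbitrary function, and the names `TieredReportLookup`, `getArchivedReport`, and `evictFromMemory` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative only: all names are hypothetical and both tiers are plain maps. */
final class TieredReportLookup {
    private final Map<String, String> memoryTier = new HashMap<>(); // stands in for the in-memory cache
    private final Map<String, String> storageTier;                  // stands in for the S3 bucket
    private final Function<String, String> generator;               // stands in for report generation

    TieredReportLookup(Map<String, String> storageTier, Function<String, String> generator) {
        this.storageTier = storageTier;
        this.generator = generator;
    }

    String getArchivedReport(String recordingKey) {
        String report = memoryTier.get(recordingKey);
        if (report != null) {
            return report; // tier 1 hit: nothing else is consulted
        }
        report = storageTier.get(recordingKey);
        if (report == null) {
            // Miss in both tiers: generate, then persist to storage first.
            report = generator.apply(recordingKey);
            storageTier.put(recordingKey, report);
        }
        // Repopulate memory whether the report came from storage or was just generated.
        memoryTier.put(recordingKey, report);
        return report;
    }

    /** Simulates the in-memory entry timing out. */
    void evictFromMemory(String recordingKey) {
        memoryTier.remove(recordingKey);
    }

    public static void main(String[] args) {
        Map<String, String> bucket = new HashMap<>();
        int[] generations = {0};
        TieredReportLookup lookup = new TieredReportLookup(bucket, key -> {
            generations[0]++;
            return "{\"source\":\"" + key + "\"}";
        });
        lookup.getArchivedReport("example.jfr"); // generated, stored in both tiers
        lookup.evictFromMemory("example.jfr");
        lookup.getArchivedReport("example.jfr"); // served from the storage tier
        System.out.println("generator invocations: " + generations[0]);
    }
}
```

Because the storage tier is passed in rather than owned by the lookup object, a new `TieredReportLookup` built over the same map behaves like a restarted server: its memory tier starts empty but previously generated reports are still found in storage.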

The above parameters are configurable using the following new properties:

# URL for the sidecar report generator, single instance or load balancer
cryostat.services.reports.url=
# whether in-memory Caffeine caches are globally enabled
quarkus.cache.enabled=true
# whether the Caffeine caches for reports are specifically enabled
cryostat.services.reports.memory-cache.enabled=true
# how long after write to expire cached reports from active recordings
quarkus.cache.caffeine."activereports".expire-after-write=10s
# how long after last access to expire cached reports from archived recordings
quarkus.cache.caffeine."archivedreports".expire-after-access=10m
# whether the S3-backed cache for reports is enabled
cryostat.services.reports.storage-cache.enabled=true
# name of the S3 bucket used for archived report cache storage
cryostat.services.reports.storage-cache.name=archivedreports
# how long after creation to expire S3-cached reports from archived recordings
cryostat.services.reports.storage-cache.expiry-duration=24h
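
For readers unfamiliar with values such as `10s`, `10m`, and `24h`, here is a rough sketch of how they map onto `java.time.Duration`. It assumes the simplified duration format described in the Quarkus configuration documentation (a bare number means seconds, an `ms` suffix means milliseconds, and `h`/`m`/`s` or `d` suffixes are prefixed with `PT` or `P` before ISO-8601 parsing). The `SimplifiedDurations` class is a hypothetical helper for illustration, not Quarkus's own converter.

```java
import java.time.Duration;

/** Hypothetical helper approximating the simplified duration format; not Quarkus code. */
final class SimplifiedDurations {
    private SimplifiedDurations() {}

    static Duration parse(String raw) {
        String value = raw.trim();
        if (value.matches("\\d+")) {
            return Duration.ofSeconds(Long.parseLong(value)); // bare number: seconds
        }
        if (value.matches("\\d+ms")) {
            String digits = value.substring(0, value.length() - 2);
            return Duration.ofMillis(Long.parseLong(digits));
        }
        if (value.matches("\\d+[hms]")) {
            return Duration.parse("PT" + value); // e.g. "10m" becomes "PT10m"
        }
        if (value.matches("\\d+d")) {
            return Duration.parse("P" + value); // e.g. "1d" becomes "P1d"
        }
        return Duration.parse(value); // otherwise expect full ISO-8601, e.g. "PT1H30M"
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"10s", "10m", "24h"}) {
            System.out.println(sample + " -> " + parse(sample));
        }
    }
}
```

`Duration.parse` accepts the unit letters in either case, which is why the lowercase suffixes can be passed through unchanged.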

TODO: after rebasing, the in-memory active recording cache doesn't seem to expire on a timer; it just always returns cached entries.

TODO: the MemoryCachingReportsListener doesn't currently work as expected. When it observes Target losses, the Targets always appear to have no associated active recordings, so the listener cannot determine which entries to invalidate. This is a relatively minor issue since the cache entries will expire after a relatively short duration anyway, and each cache entry is simply the report JSON document, which is pretty small. I think this is actually a deeper issue to do with how the relation between the Target, ActiveRecording, and DiscoveryNode entities works, not specific to this PR - it's just made apparent by this cache invalidation listener.

As in previous Cryostat releases, if there is no sidecar report generator configured, then the Cryostat server itself will handle report generation requests.

@github-actions

Hi @andrewazores! Add at least one of the required labels to this PR

Required labels are : chore,ci,cleanup,docs,feat,fix,perf,refactor,style,test

@andrewazores andrewazores added the feat New feature or request label Oct 13, 2023
@andrewazores andrewazores force-pushed the reports-sidecar branch 2 times, most recently from 47563b9 to 2503918 Compare November 20, 2023 16:52
@andrewazores andrewazores force-pushed the reports-sidecar branch 2 times, most recently from f111b6a to 089ed5d Compare January 8, 2024 14:42
@github-actions github-actions bot removed the dependent label Jan 8, 2024

github-actions bot commented Jan 8, 2024

@andrewazores
Member Author

/build_test


github-actions bot commented Jan 8, 2024

Workflow started at 1/8/2024, 9:45:42 AM. View Actions Run.


github-actions bot commented Jan 8, 2024

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat3/actions/runs/7449027166

@andrewazores andrewazores marked this pull request as ready for review January 8, 2024 21:33
@andrewazores andrewazores requested a review from a team as a code owner January 8, 2024 21:33
@andrewazores
Member Author

/build_test


github-actions bot commented Jan 9, 2024

Workflow started at 1/9/2024, 10:28:45 AM. View Actions Run.


github-actions bot commented Jan 9, 2024

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat3/actions/runs/7463182255


github-actions bot commented Jan 9, 2024

CI build and push: At least one test failed ❌
https://github.com/cryostatio/cryostat3/actions/runs/7463182255

@andrewazores
Member Author

Guess I broke something. I'll figure it out.

@andrewazores
Member Author

/build_test


github-actions bot commented Jan 9, 2024

Workflow started at 1/9/2024, 1:49:21 PM. View Actions Run.


github-actions bot commented Jan 9, 2024

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat3/actions/runs/7465546076

aali309
aali309 previously approved these changes Jan 9, 2024
Contributor

@aali309 aali309 left a comment

Looks good to me!

@andrewazores andrewazores dismissed aali309’s stale review January 9, 2024 20:24

The merge-base changed after approval.

@andrewazores
Member Author

/build_test


github-actions bot commented Jan 9, 2024

Workflow started at 1/9/2024, 3:27:18 PM. View Actions Run.


github-actions bot commented Jan 9, 2024

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat3/actions/runs/7466524367

@andrewazores andrewazores merged commit b1f30c3 into main Jan 9, 2024
8 checks passed
@andrewazores andrewazores deleted the reports-sidecar branch January 9, 2024 20:39
Labels
feat New feature or request
Projects
No open projects
Status: Done
2 participants