
fix(archives): add target-specific archive API endpoints #1047

Merged · 58 commits · Sep 27, 2022
Conversation

@hareetd (Contributor) commented Aug 18, 2022

Fixes #1046
Depends on cryostatio/cryostat-web#499

Summary:

  1. I've added six new beta target-specific archived recording handlers: RecordingDeleteHandler, RecordingUploadPostHandler, RecordingGetHandler, RecordingGetWithJwtHandler, ReportGetHandler, and ReportGetWithJwtHandler. They're essentially equivalent to their v1/v2 counterparts, except with an added sourceTarget request parameter that expects the JMX connection URL of the source target for the archived recording in question.

  2. The WebServer now points to the beta RecordingGetHandler and ReportGetHandler endpoints when WebServer#getArchivedDownloadUrl and WebServer#getArchivedReportUrl are called, respectively. This means archived-recording-related notifications will now contain target-specific download URLs. Notifications have also been updated so that the target field equals "uploads" rather than the empty string for uploaded recordings, because the new target-specific handlers' sourceTarget parameter cannot be set to the empty string in place of a connectUrl. Since "uploads" is the logical alternative for this parameter, using it keeps things consistent, and notifications now explicitly identify a recording as having originated as a direct upload into Cryostat's archives.

  3. The implementation should still preserve the functionality of the older, non-target-specific v1/v2 handlers. I didn't have time to test manually, but unit and integration tests are all passing. When such an endpoint is called, the archives are searched using only the recording name, which is still error-prone for the reasons discussed in cryostat-web#445 ([Task] Deleting archived recordings can delete wrong file for identically-named but distinct files). Perhaps these handlers should now be marked as deprecated?

  4. For the cached report path, if no sourceTarget is provided, the report is placed inside a new default subdirectory within the temporary directory provided by the WebServer. Uploaded recordings are placed inside an uploads subdirectory, and target-specific recordings inside a subdirectory named with the Base32 encoding of the target's connectUrl (see the sketch after this list). See my comment below on issues with creating these subdirectories.

  5. The web-client in cryostat-web#499 (fix(archives): use target-specific archive API endpoints) has been updated to use the new beta handlers in place of the old ones (I might have missed something, though, so please double-check). Unfortunately, the View Report functionality is broken: it keeps timing out without the report being delivered. However, I'm experiencing this behavior on the main branch as well, so it could be a general issue or an error on my part. The report dropdown generation works as expected.

  6. See also cryostatio/cryostat#1045 when working on the backend implementation discussed above. The fix applied to the existing handler should also be present in the new handlers.
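
To illustrate the subdirectory naming in point 4, here is a minimal sketch, not the PR's actual code, of deriving a filesystem-safe archive subdirectory name from a sourceTarget; the Apache Commons Codec dependency is an assumption for illustration:

```java
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Base32;

// Minimal sketch, not the PR's actual code: derive the archive
// subdirectory name for a sourceTarget as described in point 4 above.
// Apache Commons Codec is assumed here purely for illustration; Base32
// output is filesystem-safe, unlike Base64, which can contain '/'.
public class ArchiveSubdirectorySketch {
    static String subdirectoryFor(String sourceTarget) {
        // "uploads" is used verbatim for recordings uploaded directly into
        // Cryostat's archives; real targets get their encoded connectUrl.
        if ("uploads".equals(sourceTarget)) {
            return "uploads";
        }
        return new Base32().encodeAsString(sourceTarget.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(subdirectoryFor("service:jmx:rmi:///jndi/rmi://cryostat:9091/jmxrmi"));
    }
}
```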

Original comment by @andrewazores: cryostatio/cryostat-web#445 (comment)

I ran out of time to implement this request. New tests and HTTP API documentation still need to be added as well.

Overall, I didn't have as much time as I would have liked to check things over, so please take a closer look at the code, since it touches a lot of the archived-recording-related functionality. Thanks!

TODOs [Max, Thuan]:

  • Fix archive sub-directory generation to check for existing directories, and update test mocks.
  • Deprecate the old v1/v2 handlers and update tests to expect rejection of requests sent to deprecated endpoints.
  • Fix the View Report concurrency issue that leads to a report-generation timeout error.
  • Update archive-related tests.
  • Overall clean-up.
  • Handle uploaded recordings' metadata.

@github-actions (Contributor)

This PR/issue depends on: cryostatio/cryostat-web#499

@hareetd (Contributor, Author) commented Aug 18, 2022

There are still some minor kinks to iron out; I'm hoping to have the implementation done and working by tomorrow. I likely won't have time to add new tests (I'm updating existing tests as I go).

@hareetd requested a review from andrewazores August 20, 2022 01:21
@andrewazores (Member)

@maxcao13 @tthvo, this is in a mostly completed state now, but as @hareetd says, it still needs some finishing touches before merge. Would either of you like to take over the development side as I review it? I don't think this is far off from being mergeable.

@andrewazores (Member)

Thanks for taking this on and getting so much of it done in the final stretch of your internship, @hareetd . This looks quite good as it is, and I'm really looking forward to getting it merged because it's also an important milestone for #520/cryostat-agent.

@hareetd (Contributor, Author) commented Aug 20, 2022

@andrewazores No problem, it was a fun way to end off the internship :-). Also, good to hear that it's an important part of those features; it'll definitely be nice to have a clearer link between targets and their archived data.

Well, this is my final sign-off. Best of luck with Cryostat and hopefully our paths cross in the future!

@tthvo (Member) commented Aug 22, 2022

I think we can start on this! There is also front-end work, so I will talk with @maxcao13 about dividing the work to get it done.

@tthvo (Member) commented Aug 31, 2022

> The implementation should still preserve the functionality of the older, non-target-specific v1/v2 handlers. I didn't have time to manually test but unit and integration tests are all passing. In the case where such an endpoint is called, the archives are searched using only the recording name, which is still error-prone for the reasons discussed in cryostatio/cryostat-web#445. Perhaps these handlers should now be marked as deprecated?

To avoid this issue, we've started disabling these handlers by setting isAvailable() to return false and marking the classes as @Deprecated for now. Though I believe we'll want to remove them later, as discussed in https://github.com/cryostatio/cryostat/pull/971#issuecomment-1227765350. Any thoughts?

Edit: With this change, Cryostat will return a 404 for requests to these non-target-specific routes.
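
A minimal, self-contained sketch of that approach; the RequestHandler interface below is a stand-in for Cryostat's real handler abstraction, whose names may differ:

```java
// Minimal sketch; RequestHandler is a stand-in for Cryostat's real
// handler abstraction, whose names may differ.
interface RequestHandler {
    boolean isAvailable();

    String path();
}

// The legacy handler is marked @Deprecated and reports itself unavailable,
// so the router never registers the route and requests receive a 404.
@Deprecated
class LegacyRecordingDeleteHandler implements RequestHandler {
    @Override
    public boolean isAvailable() {
        return false;
    }

    @Override
    public String path() {
        return "/api/v1/recordings/:recordingName"; // illustrative path
    }
}
```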

@andrewazores (Member)

Perhaps the most client-friendly thing to do for the next release is to mark the class as @Deprecated and return a 410 Gone status. We could also send a response header that informs the client that they should use the updated API endpoint(s) instead.
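
A sketch of that suggestion using Vert.x (which Cryostat's web layer is built on); the Deprecation and Link headers follow IETF drafts, and the successor path is illustrative, not a decision recorded in this PR:

```java
import io.vertx.core.Handler;
import io.vertx.ext.web.RoutingContext;

// Sketch: answer a deprecated route with 410 Gone and point clients at the
// replacement endpoint. The Deprecation/Link headers and successor path
// are illustrative choices, not ones recorded in this PR.
class GoneEndpointHandler implements Handler<RoutingContext> {
    @Override
    public void handle(RoutingContext ctx) {
        ctx.response()
                .setStatusCode(410) // Gone: endpoint intentionally removed
                .putHeader("Deprecation", "true")
                .putHeader("Link", "</api/beta/recordings>; rel=\"successor-version\"")
                .end("Use the target-specific beta archive endpoints instead.");
    }
}
```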

@andrewazores (Member)

Oh boy, that's a bunch of conflicts :/

Can we also add some logic, either to these new handlers or within e.g. RecordingArchiveHelper, so that the sourceTarget is verified to exist? I think this should be as simple as checking that the sourceTarget URL matches some connectUrl in discoveryStorage.listDiscoverableServices().
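
Something like this minimal sketch, reusing the discoveryStorage call named above; that ServiceRef exposes its connectUrl via getServiceUri() is an assumption here:

```java
// Sketch of the suggested check: accept the sourceTarget only if it
// matches the connectUrl of some currently discoverable service. That
// ServiceRef exposes its connectUrl via getServiceUri() is an assumption.
boolean sourceTargetExists(DiscoveryStorage discoveryStorage, String sourceTarget) {
    return discoveryStorage.listDiscoverableServices().stream()
            .anyMatch(ref -> ref.getServiceUri().toString().equals(sourceTarget));
}
```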

@maxcao13 (Member) commented Sep 15, 2022

> Oh boy, that's a bunch of conflicts :/

Yeah, I've been dreading these conflicts since I've been working on both :-)

> Can we also add some logic, either to these new handlers or within e.g. RecordingArchiveHelper, so that the sourceTarget is verified to exist? I think this should be as simple as checking that the sourceTarget URL matches some connectUrl in discoveryStorage.listDiscoverableServices().

For sure, thanks for the feedback. I'm wondering about task 6 (cryostatio/cryostat-web#445 (comment)): this PR no longer has to include that issue, right?

I think all we need to do about the uploaded-recording metadata issue now is update the web-client, via the associated PR cryostatio/cryostat-web#499, to handle the recently merged https://github.com/cryostatio/cryostat/pull/1063?

@andrewazores (Member)

I believe so, yes.

@maxcao13 (Member) commented Sep 16, 2022

> Can we also add some logic, either to these new handlers or within e.g. RecordingArchiveHelper, so that the sourceTarget is verified to exist? I think this should be as simple as checking that the sourceTarget URL matches some connectUrl in discoveryStorage.listDiscoverableServices().

If we do it this way, the endpoints won't work if the sourceTarget is something like cryostat:9091, because discoverableServices is a list of ServiceRefs whose connectUrl is something like service:jmx:rmi:///jndi/rmi://cryostat:9091/jmxrmi. We could go the route of testing a connection to the sourceTarget through the connectUrl to solve this, but then it just seems like re-using the functionality of a TargetRecording.

Also, if a user still wanted to delete an archived recording from their fs (from the command line), but the target was gone and removed from the database, I don't think either solution would work.

What do you think?

@andrewazores (Member) commented Sep 16, 2022

Command line (curl, httpie) vs. web-client shouldn't matter; in the end they're both talking to the API as the source of data about what is present and what actions can be performed.

> If we do it this way, the endpoints won't work if the sourceTarget is something like cryostat:9091, because discoverableServices is a list of ServiceRefs whose connectUrl is something like service:jmx:rmi:///jndi/rmi://cryostat:9091/jmxrmi.

So we can apply the transformation of cryostat:9091 -> service:jmx:rmi:///jndi/rmi://cryostat:9091/jmxrmi early, at the API endpoints, rather than allowing those "bare" host:port values to trickle all the way down. Anything that is already in storage can be pretty easily checked for proper JMXServiceURL format and migrated if it's not. Or if the plan is for the archives to also use the jvmId, which is pretty reasonable, then the storage and migration should be handled the same way that you did for the metadata.
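
For example, a minimal sketch of that early normalization (handler wiring omitted; the class and method names are illustrative):

```java
import javax.management.remote.JMXServiceURL;

// Sketch of that early normalization (handler wiring omitted): wrap a
// "bare" host:port sourceTarget in the standard JMX-over-RMI URL form
// before it trickles down; inputs already in service URL form pass through.
class SourceTargetNormalizer {
    static JMXServiceURL normalize(String sourceTarget) throws Exception {
        if (sourceTarget.startsWith("service:jmx:")) {
            return new JMXServiceURL(sourceTarget);
        }
        return new JMXServiceURL(
                String.format("service:jmx:rmi:///jndi/rmi://%s/jmxrmi", sourceTarget));
    }
}
```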

Though, this now has me wondering what to do about push-only agents. They won't publish JMXServiceURLs and won't be connectable over JMX, so Cryostat won't be able to compute a jvmId hash for them. Their connectUrl is probably going to be the http:// callback URL or maybe modified with some custom scheme like cryostat-agent:// to mark what they are. But the built-in discoveries use null for their callback URL.

Perhaps we need another set of similar endpoints for agents to push JFR files to? Rather than using the connectUrl as the targetId/sourceTarget, they would use their discovery registration ID.

> Also, if a user still wanted to delete an archived recording from their fs (from the command line), but the target was gone and removed from the database, I don't think either solution would work.

Fair - so just don't enforce that check on the deletion action, but do enforce it on the upload.

@maxcao13 (Member)

> So we can apply the transformation of cryostat:9091 -> service:jmx:rmi:///jndi/rmi://cryostat:9091/jmxrmi early, at the API endpoints, rather than allowing those "bare" host:port values to trickle all the way down. Anything that is already in storage can be pretty easily checked for proper JMXServiceURL format and migrated if it's not. Or if the plan is for the archives to also use the jvmId, which is pretty reasonable, then the storage and migration should be handled the same way that you did for the metadata.

I'll probably start to use jvmIds here as well, because of https://github.com/cryostatio/cryostat/issues/972.

> Though, this now has me wondering what to do about push-only agents. They won't publish JMXServiceURLs and won't be connectable over JMX, so Cryostat won't be able to compute a jvmId hash for them. Their connectUrl is probably going to be the http:// callback URL or maybe modified with some custom scheme like cryostat-agent:// to mark what they are. But the built-in discoveries use null for their callback URL.

> Perhaps we need another set of similar endpoints for agents to push JFR files to? Rather than using the connectUrl as the targetId/sourceTarget, they would use their discovery registration ID.

Seems reasonable; I guess that should be another PR for later.

@maxcao13 (Member)

Ignoring tests, thoughts on the structure or anything else?

The reason I put the migration logic and the TargetDiscoveryEvent archive-handling logic in the RecordingMetadataManager is that both do similar things, and I needed a way to manage the subdirectories again because jvmIds change after every run (since they include time as part of the hash). Putting them in different classes felt odd, as both would contend for the jvmId map; even though it's a ConcurrentMap, it made sense to me to put all the similar logic in the same class so the operations stay coherent. Maybe it even makes sense to rename RecordingMetadataManager to something like RecordingFilesManager.

Something else I added is a "metadata-like" file in each archives/* subdirectory, very helpfully named connectUrl; all it contains is the connectUrl for the associated archive directory. Something like that is needed, again, because of the way jvmIds work. We didn't need it in the metadata PR because the targetId was contained in the metadata itself, so it was easy to read from a JSON file; and it wasn't needed before that because the subdirectory name was literally just the Base32 encoding of the targetId. I was also thinking of including the encoded targetId as part of the folder name, appended with some delimiter like - or _, but a small disadvantage would be the huge subdirectory name and having to parse it. I did it the way it is in the PR as a quick-and-dirty method, but it doesn't have to stay this way.
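
A minimal sketch of that marker file using java.nio; the helper names are illustrative, not the PR's actual code:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the "metadata-like" marker described above: each archive
// subdirectory carries a file literally named "connectUrl" recording which
// connectUrl it belongs to. Helper names are illustrative.
class ConnectUrlMarker {
    static void write(Path subdirectory, String connectUrl) throws IOException {
        Files.writeString(subdirectory.resolve("connectUrl"), connectUrl, StandardCharsets.UTF_8);
    }

    static String read(Path subdirectory) throws IOException {
        return Files.readString(subdirectory.resolve("connectUrl"), StandardCharsets.UTF_8).trim();
    }
}
```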

I still need to update the web-client and update tests.

@andrewazores (Member)

> Ignoring tests, thoughts on the structure or anything else?

> The reason I put the migration logic and the TargetDiscoveryEvent archive-handling logic in the RecordingMetadataManager is that both do similar things, and I needed a way to manage the subdirectories again because jvmIds change after every run (since they include time as part of the hash). Putting them in different classes felt odd, as both would contend for the jvmId map; even though it's a ConcurrentMap, it made sense to me to put all the similar logic in the same class so the operations stay coherent. Maybe it even makes sense to rename RecordingMetadataManager to something like RecordingFilesManager.

Sure, this all makes sense. I will note that since the jvmId is a consistent hash of properties that are "fixed" for the target's lifecycle, various different classes could each compute and maintain their own jvmId maps and they would all end up agreeing with each other, because the computed hash is the same for a given target no matter who is looking at it. But keeping only one such jvmId map around makes practical sense for a lot of reasons, so the JvmIdHelper class makes sense. This does seem like it could possibly be merged with the TargetConnectionManager, but that can be a refactoring for later.
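
To illustrate that point, a toy example of a consistent hash over lifecycle-fixed properties; the actual properties and digest Cryostat uses are assumptions here, not the project's real implementation:

```java
import org.apache.commons.codec.digest.DigestUtils;

// Toy example only: hashing the same lifecycle-fixed properties yields the
// same jvmId no matter which class computes it. The real properties and
// digest Cryostat uses differ from these.
class JvmIdSketch {
    static String jvmId(String mainClass, String vmVersion, long startTimeMillis) {
        return DigestUtils.sha256Hex(
                String.join(";", mainClass, vmVersion, Long.toString(startTimeMillis)));
    }
}
```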

> Something else I added is a "metadata-like" file in each archives/* subdirectory, very helpfully named connectUrl; all it contains is the connectUrl for the associated archive directory. Something like that is needed, again, because of the way jvmIds work. We didn't need it in the metadata PR because the targetId was contained in the metadata itself, so it was easy to read from a JSON file; and it wasn't needed before that because the subdirectory name was literally just the Base32 encoding of the targetId. I was also thinking of including the encoded targetId as part of the folder name, appended with some delimiter like - or _, but a small disadvantage would be the huge subdirectory name and having to parse it. I did it the way it is in the PR as a quick-and-dirty method, but it doesn't have to stay this way.

I think this solution is fine. The subdirectory name could really explode in length if we encode the URL and append that on top of something else like an ID hash. If URLs are particularly long we might even run into the Linux filename length limit.

@maxcao13 (Member)

I missed this in the other PR, but I noticed that you deprecated the RecordingsPostHandler and RecordingsGetHandler in the HTTP docs. Those are not being replaced in this PR, so a new issue will have to be made.

@maxcao13 marked this pull request as ready for review September 23, 2022 15:48
@andrewazores (Member) commented Sep 27, 2022

Just to confirm my understanding - the purpose of the connectUrl file within each archive subdirectory is because the subdirectory itself is a hashed version of the source target's JVM ID. If/when the source target restarts its JVM ID will change because the JVM startup time goes into the hash. By also storing the connectUrl we can update the archives and move archived files from the old hashed ID to the new hashed ID when a target reappears using the same connectUrl we've seen before. Is that right?

@maxcao13 (Member)

> Just to confirm my understanding - the purpose of the connectUrl file within each archive subdirectory is because the subdirectory itself is a hashed version of the source target's JVM ID. If/when the source target restarts its JVM ID will change because the JVM startup time goes into the hash. By also storing the connectUrl we can update the archives and move archived files from the old hashed ID to the new hashed ID when a target reappears using the same connectUrl we've seen before. Is that right?

Yes, that's exactly right.
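
A minimal sketch of that re-association flow, under the assumption of a flat archives/<jvmId> layout containing the connectUrl marker file described earlier; helper and directory names are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of that re-association flow: on (re)discovery, find the archive
// subdirectory whose connectUrl marker matches the target and rename it to
// the freshly computed jvmId, so archived files follow the target across
// restarts. Layout and helper names are illustrative.
class ArchiveMigrationSketch {
    static void migrate(Path archivesRoot, String connectUrl, String newJvmId)
            throws IOException {
        try (var dirs = Files.list(archivesRoot)) {
            for (Path dir : (Iterable<Path>) dirs::iterator) {
                Path marker = dir.resolve("connectUrl");
                if (Files.isRegularFile(marker)
                        && Files.readString(marker).trim().equals(connectUrl)) {
                    Files.move(dir, archivesRoot.resolve(newJvmId));
                    return;
                }
            }
        }
    }
}
```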

@andrewazores (Member)

Okay, I think that's fine for now. We will want to revisit it in the future and replace the stored connectUrl with some other properties of the JVM, for example everything else that goes into the hash other than the startup time. The reason is that the connectUrl may actually change between lifecycles of the conceptually "same" target application instance. In an OpenShift environment we use the IP address of the target to come up with the connectUrl, and it's very possible that the IP address changes when the application comes back online, so the archives wouldn't get transferred.

That would necessitate connecting to every application that appears and retrieving those properties from it in order to compare them to every single archive we have. That's definitely not ideal: for large installations with a lot of services coming and going and a lot of archives, it becomes very slow and I/O-intensive.

On the other hand, when we have the Cryostat Agent pushing data into the archives, maybe the Agent can be a little smarter and compute its own hash ID for itself and use that as the sourceTarget so that the data can be properly correlated across instances.

@andrewazores (Member)

Ah but wait, if we look at the JVM properties other than the startup time and the connectUrl (which encodes information about the IP address or hostname), then how can we tell apart different replicas of the same microservice? I think they would all look the same, so we wouldn't be able to distinguish archives between instances anymore in that case. Bah.

The Agent is looking better and better every day.
