Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify monitor tracking API #42816

Closed
dcramer opened this issue Jan 5, 2023 · 12 comments · Fixed by #42819
Closed

Simplify monitor tracking API #42816

dcramer opened this issue Jan 5, 2023 · 12 comments · Fixed by #42819

Comments

@dcramer
Copy link
Member

dcramer commented Jan 5, 2023

Problem Statement

Currently triggering a monitor check-in requires two calls:

-> POST .../monitors/{monitor_id}/checkins/

-> PUT .../monitors/{monitor_id}/checkins/{checkin_in}/ {"status": "ok"}

This creates a minor hurdle as it requires you to track the checkin_id returned in the first step, to publish the second step (to mark a job as finished, for example).

Solution Brainstorm

Instead we could provide a simplified API along the lines of:

-> PUT .../monitors/{monitor_id}/checkins/latest/

This request would modify the most recent unfinished check-in, meaning a few behaviors could be implemented:

  • With no checkins, this returns a 404 (same as invalid ID).
  • With no unfinished checkin, this returns a 404 (same as invalid ID).
  • With one unfinished checkin, this would modify that checkin.
  • With multiple unfinished checkins, this would modify the most recent checkin. Meaning likely the previous checkin would time out and fail.

Refs GH-42283

@modernben
Copy link

Yes, this would be great. We could also let the user guide the process based on the request body similar to how it is functioning now.

i.e.

PUT /monitors/{monitor_id}/checkins {"status": "in_progress"}
PUT /monitors/{monitor_id}/checkins {"status": "completed"}

If the body is empty, then you can run the determineIntent() logic

@dcramer
Copy link
Member Author

dcramer commented Jan 5, 2023

From a documentation angle, we'd need to be clear that if you're using a single monitor to track jobs that might overlap or run in parallel you should not use this API.

@dcramer
Copy link
Member Author

dcramer commented Jan 5, 2023

One other option is to support "latest" for the checkin identifier, which would be a bit less of an overload of the main endpoint.

Start a checkin:

POST /monitors/{monitor_id}/checkins/

Complete a checkin:

PUT /monitors/{monitor_id}/checkins/latest/

@modernben
Copy link

Yeah I could be on board with that. Then at least there's some intent from the user's perspective.

@dcramer
Copy link
Member Author

dcramer commented Jan 5, 2023

One last thought, leaning towards the latest magic identifier, as you can already one-line register a result:

run-my-command.sh && curl -x POST .../monitors/my-cool-identifier/checkins/ -d '{"status": "finished"}'

We should just document that case (and caveat its not ideal)

@abkfenris
Copy link

At least the few other tools that I've used and explored have started from the 'ping when completed' endpoint, and then layered additional functionality on top of that.

By going that route, it might be easier to convince folks to migrate and then they can progressively take advantage of additional features.

@modernben
Copy link

At least the few other tools that I've used and explored have started from the 'ping when completed' endpoint, and then layered additional functionality on top of that.

By going that route, it might be easier to convince folks to migrate and then they can progressively take advantage of additional features.

I like the addition of the /start, /stop, and /fail endpoints from them. Then I don't even have to fuss with a request body.
POST .../monitors/my-cool-identifier/checkins/start
POST .../monitors/my-cool-identifier/checkins/finish

@modernben
Copy link

One last thought, leaning towards the latest magic identifier, as you can already one-line register a result:

run-my-command.sh && curl -x POST .../monitors/my-cool-identifier/checkins/ -d '{"status": "finished"}'

We should just document that case (and caveat its not ideal)

@dcramer Would your one-liner need to start a checkin as well?

curl -x POST .../monitors/my-cool-identifier/checkins/ && run-my-command.sh && curl -x POST .../monitors/my-cool-identifier/checkins/ -d '{"status": "finished"}'

@dcramer
Copy link
Member Author

dcramer commented Jan 5, 2023

At least the few other tools that I've used and explored have started from the 'ping when completed' endpoint, and then layered additional functionality on top of that.
By going that route, it might be easier to convince folks to migrate and then they can progressively take advantage of additional features.

I like the addition of the /start, /stop, and /fail endpoints from them. Then I don't even have to fuss with a request body. POST .../monitors/my-cool-identifier/checkins/start POST .../monitors/my-cool-identifier/checkins/finish

Somethign we could easily add as we've done elsewhere to proxy over. e.g. have start just forward a payload to the other endpoint, asme with finish. I'll start w/ the basic PUT endpoint and we can decide from there.

One last thought, leaning towards the latest magic identifier, as you can already one-line register a result:
run-my-command.sh && curl -x POST .../monitors/my-cool-identifier/checkins/ -d '{"status": "finished"}'
We should just document that case (and caveat its not ideal)

@dcramer Would your one-liner need to start a checkin as well?

curl -x POST .../monitors/my-cool-identifier/checkins/ && run-my-command.sh && curl -x POST .../monitors/my-cool-identifier/checkins/ -d '{"status": "finished"}'

Yeah it'd create the checkin and immediately mark it as finished. I'm not sure we have test cases covering desired outcome here so its possible it would not cascade all metadata appropriately but we can verify that.

@dcramer
Copy link
Member Author

dcramer commented Jan 5, 2023

Actually we already have branch coverage for creating a passing/failing checkin:

    def test_passing(self):
        self.login_as(self.user)

        for path_func in self._get_path_functions():
            monitor = self._create_monitor()
            path = path_func(monitor)

            resp = self.client.post(path, {"status": "ok"})
            assert resp.status_code == 201, resp.content

            checkin = MonitorCheckIn.objects.get(guid=resp.data["id"])
            assert checkin.status == CheckInStatus.OK

            monitor = Monitor.objects.get(id=monitor.id)
            assert monitor.status == MonitorStatus.OK
            assert monitor.last_checkin == checkin.date_added
            assert monitor.next_checkin == monitor.get_next_scheduled_checkin(checkin.date_added)

@mitsuhiko
Copy link
Member

From a documentation angle, we'd need to be clear that if you're using a single monitor to track jobs that might overlap or run in parallel you should not use this API.

A simple improvement to that could be to allow a client to send along a hostname and use that as disambiguation key of concurrent monitors but I'm not even sure if that currently makes a lot of sense. If you have a cron job that is supposed to run hourly on all machines (eg: temp dir clean up etc.) that's not easily trackable in Sentry anyways unless you make a monitor for each of those.

But if there was, having a host name as key would solve a very common case, and it would be easy enough to supply as a user.

dcramer added a commit that referenced this issue Jan 5, 2023
Add the ability to target `latest` as the `checkin_id` parameter, which
will pull the most recent (creation date) check-in that is not yet marked
as finished.

Fixes GH-42816
dcramer added a commit that referenced this issue Jan 5, 2023
Add the ability to target `latest` as the `checkin_id` parameter, which
will pull the most recent (creation date) check-in that is not yet
marked as finished.

Fixes GH-42816
@modernben
Copy link

Love the addition of /latest. I went ahead and created a Laravel Package for it.

https://packagist.org/packages/modernmcguire/sentry-cron-monitoring-laravel

@github-actions github-actions bot locked and limited conversation to collaborators Jan 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants