Step start times are reported inaccurately #1512
Comments
How do you get the start time of each step? The only thing I could get using the
OK, got it. Right now it looks like startedTime and finishedTime are taken from the corresponding container's times. When the step status is added, the container status is deep-copied and several fields are added to the copy. I am trying to locate where the actual container is executed.
@imjasonh In order to report accurate times, the start and end time of a specific entrypoint should be saved to disk and then read back so the status can be updated. Could you point me to where this status update should take place?
@othomann That's a good question. One possible solution is to have the entrypoint binary write data to So, roughly:
Code would live in |
Careful overwriting the termination message path: it's used by pipeline resources to emit resource results from the steps they inject. See https://github.com/tektoncd/pipeline/blob/master/pkg/termination/termination.go#L27 for the package that writes the messages and https://github.com/tektoncd/pipeline/blob/master/cmd/git-init/main.go#L60 for an example of its usage.
Ooh thanks for catching that! So it looks like we'll either need to find some other way to handle this, or at least have it only append to data already present there, so it can play nicely with output resources. Could |
This sounds like a solid approach to me!
So you would use the termination.writeMessage(..) call to write a JSON file that contains the start and end time of the corresponding entrypoint? Do you have a particular file in mind that can be reused?
I think we should block users from setting a With the user out of the way, we can point the path to wherever we want when running the user's steps. It only has to be a path that isn't likely to be used by anything else. We'll reserve |
It would need to be a different path for each entrypoint, since we want to report accurate start and end times in the status for each step. So the path must be unique for each step, right? Also, once this is done, it needs to be used to update the status of the corresponding step. What kinds of events can we use to trigger a status update?
It doesn't have to be a unique path, since the contents of that path won't be persisted across step containers (it's not backed by a volume). Each step only needs to report its own start time (step end times are already reported correctly), plus whatever output resource metadata the step writes. These aren't stupid questions at all, we're just designing this as we go! 😄
Is this something that I should expose as a new option for the entrypoint, or should it be kept hidden? Once the file is available, I don't quite see where the update needs to be done. Any hints?
I think we can just go with a convention that the entrypoint binary appends start time to a JSON file at Let me know if you'd like to pair on this over Hangouts or Zoom, it might be easier.
@imjasonh It might actually be great to pair on this. Let me start with the new JSON file first. I'll ping you tomorrow about this. Thanks.
@imjasonh Let me know how you want to proceed with Hangouts, if you have time for this of course :-).
Right now I am saving this in the /tekton/termination file:
In my case the two steps are 5 minutes apart, and we can see that the termination time is 5 minutes after the start time. Now I think it is a matter of updating the step's status with these values.
@othomann We actually already get accurate step finish times, because completed step containers exit and report their finish time. So just reporting start time should be fine. As for receiving the written information in the Pod status, first you'll need to set each container's |
As far as I can see, the finished times are identical for all containers. This is why I added the finished time as well in the entrypoint code.
I believe the finished times are not accurate.
Hmm, I'm seeing step
Produced these statuses (
@imjasonh Could you please assign this issue to me so that nobody else tries to fix it? I have the code working now and am looking at adding some tests for it.
/assign othomann
Expected Behavior
Step start times are recorded and reported accurately.
Actual Behavior
Step start times are all reported to be the same as the beginning of the first step, because according to k8s all the containers started at once -- the entrypoint binary ensures the actual step's work doesn't start until a previous step finishes.
Steps to Reproduce the Problem
After the TaskRun completes, its status will incorrectly show that the second step started at the same time as the first. It should report that the second step started 1000 seconds after the first.