Add written proposal for enhancement #103 #104
Conversation
Force-pushed 0d821eb to b1791b8
I think we can merge it, so that work can begin on this.
We should clean up the APIs between agent and server during the implementation, but this is out of scope for this enhancement.
@THS-on Thanks for giving the PR your approval. I agree that the APIs need a bit of a refactor/clean-up and I think it is natural to tackle that during implementation of the push model. Is anything else needed to get this merged?
Let's get at least 1-2 more approvals, but because we have discussed this for a long time now, I don't think there are going to be changes.
Makes sense. Thanks for clarifying the process.
That is a big one, and a very, very good one.
Honestly, I still do not have the full picture, but for the parts that I understand, I very much like this path.
103_agent-driven-attestation.md (Outdated)
3. The agent will gather the information required by the verifier (UEFI log, IMA entries and quote) and report these in a new HTTP request along with other information relevant to the quote (such as algorithms used).
4. The verifier will reply with the number of seconds the agent should wait before performing the next attestation and an indication of whether the request from the agent appeared well formed according to basic validation checks. Actual processing and verification of the measurements against policy can happen asynchronously after the response is returned.
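To make steps 3 and 4 concrete, here is a minimal sketch of the request/response shapes they imply. All field and function names here are illustrative assumptions, not the actual Keylime API:

```python
import json

# Hypothetical payload shapes for the exchange in steps 3 and 4 above;
# the real field names would be settled during implementation.
def build_attestation_request(uefi_log, ima_entries, quote, hash_alg):
    """Step 3: the agent bundles evidence and quote metadata into one request body."""
    return json.dumps({
        "uefi_event_log": uefi_log,
        "ima_entries": ima_entries,
        "tpm_quote": quote,
        "hash_algorithm": hash_alg,
    })

def build_verifier_response(seconds_until_next, request_well_formed):
    """Step 4: the verifier acknowledges receipt; policy verification happens later."""
    return {
        "next_attestation_in_seconds": seconds_until_next,
        # Reflects basic validation only, not the asynchronous policy check
        "accepted": request_well_formed,
    }

resp = build_verifier_response(60, True)
```

The key design point from step 4 is that `accepted` only reflects surface-level validation, so the verifier can return quickly and verify measurements against policy asynchronously.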
What should happen if the attestation is outside the window (too soon, too late)? Should that be described here, or is it maybe too much detail?
If it is too soon, @mheese has suggested that the verifier could reply with a 429 status which I think makes a lot of sense—I just forgot to add it. I'll amend the proposal now to include that detail.
For the "too late" case: I suppose it would make sense to specify a period of validity for the nonce, after which it would expire. If an agent submits an attestation using a nonce that is no longer valid, the verifier would then reply with a 400 status. What are your thoughts on this?
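The two timing cases above could be handled by a single verifier-side check. The following is a hedged sketch under assumed constants (`NONCE_TTL_SECONDS`, `MIN_INTERVAL_SECONDS` are hypothetical values, and `check_attestation_timing` is not an existing Keylime function):

```python
import time

NONCE_TTL_SECONDS = 90      # hypothetical validity window for a nonce
MIN_INTERVAL_SECONDS = 60   # hypothetical minimum time between attestations

def check_attestation_timing(nonce_issued_at, last_attestation_at, now=None):
    """Return an HTTP status for the 'too soon' / 'too late' cases discussed above.

    429: the agent attested before the interval it was told to wait elapsed.
    400: the nonce has expired, so the attestation cannot be accepted.
    200: timing is acceptable; proceed to queue the quote for verification.
    """
    now = time.time() if now is None else now
    if last_attestation_at is not None and now - last_attestation_at < MIN_INTERVAL_SECONDS:
        return 429  # too soon
    if now - nonce_issued_at > NONCE_TTL_SECONDS:
        return 400  # too late: nonce no longer valid
    return 200
```

Checking "too soon" before "too late" means an agent hammering the verifier sees 429 (with a retry hint) rather than burning through nonces into 400s.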
The "too many requests" response and the TTL for the nonce make a lot of sense 👍. We can make the TTL > TimeForNextAttestation, so we can allow for network delays. This clarifies the protocol.
What would be the consideration from the attestation PoV? If there is a 429, I guess that this will not mark the node as failed in the attestation state (it should be considered trusted), but maybe the 400 should require a change of status if it is repeated?
To reiterate, a 400 is returned when the verifier receives an attestation for a nonce that is invalid or has expired.
This may occur if the agent or node is behaving unexpectedly. However, it can also occur if the network or the verifier is behaving unexpectedly. E.g., if the verifier is overloaded and taking a long time to reply to requests, this could delay the delivery of nonces and cause them to be used outside their period of validity.
So, we cannot use repeated 400s as a reliable indicator of a problem with the node. The verifier should only change the status of the node if an attestation fails verification or if no valid attestation has been received for the configured period of time.
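The status rule described above (ignore 400 counts; only failed verification or a missed attestation window changes trust) could look like this. This is an illustrative sketch, not Keylime's actual state machine, and `ATTESTATION_TIMEOUT_SECONDS` is an assumed stand-in for the configured period:

```python
import time

ATTESTATION_TIMEOUT_SECONDS = 600  # hypothetical configured period

def node_status(last_valid_attestation_at, verification_failed, now=None):
    """Decide node trust status per the rule above: repeated 400s are ignored.

    A node is marked untrusted only if a quote failed verification, or if no
    valid attestation arrived within the configured period.
    """
    now = time.time() if now is None else now
    if verification_failed:
        return "untrusted"
    if now - last_valid_attestation_at > ATTESTATION_TIMEOUT_SECONDS:
        return "untrusted"
    return "trusted"
```

Note that the timeout branch still catches a genuinely broken node: if it keeps receiving 400s and never produces a valid attestation, it becomes untrusted once the window elapses, without 400s being counted directly.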
Maybe to add on to that: a successful response should always include a timestamp (or a time duration) indicating when to query again, so that the agent knows when to perform another request. If it is too early, it receives a 429. (At least, that was my idea when we discussed it with @stringlytyped.)
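A success body carrying the "when to query again" hint might look like the sketch below. The field names and the idea of sending both forms are assumptions for illustration, not part of the proposal text:

```python
import time

def success_response(interval_seconds, now=None):
    """Hypothetical 200 body telling the agent when to attest again.

    Sending both a relative duration and an absolute timestamp lets the
    agent pick whichever is more robust against clock skew on its side.
    """
    now = time.time() if now is None else now
    return {
        "retry_in_seconds": interval_seconds,        # duration form
        "next_attestation_at": now + interval_seconds,  # timestamp form
    }
```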
@mheese I agree that for 429s the verifier should include the seconds to wait before trying again (in the Retry-After header).
But for 400s, there is no way the verifier can know how long the agent should wait before trying again. Really, if everything is working fine, then the agent should be able to try again straight away by requesting a new nonce.
However, the key thing is that the 400 condition should not occur under normal operation. So, it makes sense for the agent to wait a while and then try again, increasing the wait time at each attempt until reaching a maximum wait time. This gives the verifier a chance to recover, if it happens that the condition has been caused by an overloaded verifier.
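The retry behaviour described above (increase the wait at each attempt up to a maximum) is a standard capped exponential backoff. A minimal sketch, with the base, factor, and cap values chosen purely for illustration:

```python
def backoff_delays(base=5.0, factor=2.0, max_delay=300.0, attempts=6):
    """Capped exponential backoff for retrying after a 400 (illustrative values).

    The agent retries quickly at first, then waits longer on each attempt,
    never exceeding max_delay, giving an overloaded verifier time to recover.
    """
    delay = base
    delays = []
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * factor, max_delay)
    return delays
```

In practice some random jitter is usually added to each delay so that many agents recovering at once do not all retry in lockstep, but the core shape is as above.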
Force-pushed b1791b8 to 0d20ee9
Signed-off-by: Jean Snyman <jean.snyman@hpe.com>
Force-pushed 0d20ee9 to 0a7c78b
I've made some changes to the proposal based on the above discussion, specifically:
I noticed the bit about HTTP proxy support was missing from the earlier draft by mistake, so I've added that back in also (see HTTP Proxy Support).
I believe we discussed it at a length and depth that allows the merge and the start of the actual implementation work. Of course, minor course corrections might be proposed/requested, @stringlytyped, but let's not hold it any longer. Again, thank you for the time, effort and thought put into it!
Add written proposal for enhancement #103, which has been adapted from the earlier Roadmap to Push Model Support in Keylime document.