Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add written proposal for enhancement #103 #104

Merged
merged 1 commit into from
Sep 21, 2023

Conversation

stringlytyped
Copy link
Contributor

Add written proposal for enhancement #103 which has been adapted from the earlier Roadmap to Push Model Support in Keylime document

Copy link
Member

@THS-on THS-on left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can merge it, so that work can start begin on this.

We should clean up the APIs between agent and server during the implementation, but this is out-of-scope for this enhancement

@stringlytyped
Copy link
Contributor Author

@THS-on Thanks for giving the PR your approval. I agree that the APIs need a bit of a refactor/clean-up and I think it is natural to tackle that during implementation of the push model. Is anything else needed to get this merged?

@THS-on
Copy link
Member

THS-on commented Sep 20, 2023

Is anything else needed to get this merged?

Let's get at least 1-2 more approvals, but because the discussed this now for a long time I don't think there is going to be changes.

@stringlytyped
Copy link
Contributor Author

Makes sense. Thanks for clarifying the process

Copy link

@aplanas aplanas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a big one, and a very very good one

honestly I still do not have the full picture, but what for the parts that I understand I like very much this path

103_agent-driven-attestation.md Show resolved Hide resolved

3. The agent will gather the information required by the verifier (UEFI log, IMA entries and quote) and report these in a new HTTP request along with other information relevant to the quote (such as algorithms used).

4. The verifier will reply with the number of seconds the agent should wait before performing the next attestation and an indication of whether the request from agent appeared well formed according to basic validation checks. Actual processing and verification of the measurements against policy can happen asynchronously after the response is returned.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what to do if the attestation is outside the window (too soon, too late)? Should be described here, or is maybe too much detail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is too soon, @mheese has suggested that the verifier could reply with a 429 status which I think makes a lot of sense—I just forgot to add it. I'll amend the proposal now to include that detail.

For the "too late" case: I suppose it would make sense to specify a period of validity for the nonce, after which it would expire. If an agent submits an attestation using a nonce that is no longer valid, the verifier would then reply with 400 status. What are your thoughts on this?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "too many requests" and the TTL for the nonce makes a lot of sense 👍. We can make the TTL > TimeForNextAttestation, so we can play with network delays. This clarify the protocol.

What would be the consideration from the attestation PoV? If there is a 429 I guess that this will not make the note failed in the attestation state (it should be considered trusted), but maybe the 400 should require a change of status if it is repeated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reiterate, a 400 is returned when the verifier receives an attestation for a nonce that is invalid or has expired.

This may occur if the agent or node is behaving unexpectedly. However, it can also occur if the network or the verifier is behaving unexpectedly. E.g., if the verifier is overloading and taking a long time to reply to requests, this could delay the delivery of nonces and cause them to be used outside their period of validity.

So, we cannot use repeated 400s as a reliable indicator of a problem with the node. The verifier should only change the status of the node if an attestation fails verification or if no valid attestation has been received for the configured period of time.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe to add on to that: a successful response should always include a timestamp (or a time duration) on when to query again, so that the agent knows when to perform another request. If it is too early, it receives a 429. (At least that was my idea when we discussed it with @stringlytyped )

Copy link
Contributor Author

@stringlytyped stringlytyped Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mheese I agree that for 429s the verifier should include the seconds to wait before trying again (in the Retry-After header).

But for 400s, there is no way the verifier can know how long the agent should wait before trying again. Really, if everything is working fine, then the agent should be able to try again straight away by requesting a new nonce.

However, the key thing is that the 400 condition should not occur under normal operation. So, it makes sense for the agent to wait a while and then try again, increasing the wait time at each attempt until reaching a maximum wait time. This gives the verifier a chance to recover, if it happens that the condition has been caused by an overloaded verifier.

103_agent-driven-attestation.md Show resolved Hide resolved
103_agent-driven-attestation.md Outdated Show resolved Hide resolved
103_agent-driven-attestation.md Outdated Show resolved Hide resolved
Signed-off-by: Jean Snyman <jean.snyman@hpe.com>
@stringlytyped
Copy link
Contributor Author

I've made some changes to the proposal based on the above discussion, specifically:

  • @aplanas you noted that you were having a bit of trouble grasping the full picture. I've gone ahead and lifted out the Agent Lifecycle section from my earlier draft and added it to the proposal. If you read through those steps, does that clarify things for you?

  • Thinking about it more, I've realised that we do in fact need the nonce to expire. So, I've gone ahead and added more detailed explanations on error handling in Proposed Attestation Protocol (which have also been carried through to the Agent Lifecycle section, mentioned above).

  • I've dropped the dependency change.

  • Clarified those bits that were confusing.

I noticed the bit about HTTP proxy support was missing from the earlier draft by mistake, so I've added that back in also (see HTTP Proxy Support).

@maugustosilva
Copy link
Contributor

I believe we discussed it at a length and depth that allows the merge and the starting of the actual implementation work. Of course minor course correction might be proposed/request @stringlytyped , but lets not hold it any longer. I again, thank you for the time, effort and thought put on it!

@maugustosilva maugustosilva merged commit 9578111 into keylime:master Sep 21, 2023
1 check passed
@stringlytyped stringlytyped mentioned this pull request Nov 21, 2023
24 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants