Luke Hinds edited this page Nov 27, 2019 · 30 revisions

Network Architecture

Purpose

With KVM, the Keylime verifier is unable to use a Xen hypervisor type deepquote function to directly communicate with the hardware TPM underneath the virtual TPM made via libvirt. Because of this, we need the provider (who is in charge of communicating with the hardware TPM) to set up an instance of Keylime so that a tenant verifier is able to request a quote from the hardware TPM using the provider verifier as a proxy.

System Diagram

(System diagram image)

This KVM implementation is divided into 3 main tasks:

  1. Develop a communication channel to allow one tenant verifier to request a quote from the hardware TPM
  2. Develop a batching data structure to handle multiple tenant verifiers requesting a quote from the hardware TPM
  3. Brainstorm/design/develop a registration process so the public AIK of the hardware TPM is registered/stored in the tenant registrar

Development Environment

We are currently using two Fedora 30 VMs (launched either via Vagrant/Ansible or VirtualBox). These VMs are connected in a NAT network and are assigned IPs. One VM acts as the provider (with a "hardware" TPM) and the other acts as the tenant. For development purposes, we turned TLS off and set require_ek_cert to False.

One tenant verifier requesting a quote from the hardware TPM

Verifier changes

  1. New agent dictionary key: need_provider_quote

This is a flag that identifies which tenant verifiers need a hardware TPM quote from the provider's Keylime instance. Tenant verifiers that require this quote have the flag set to True in their agent dictionary.

For now, this is hardcoded as True on the tenant side, so it may be useful to make it configurable, for example in keylime.conf.
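A minimal sketch of what reading the flag from a config file could look like, using Python's standard configparser. The section name `cloud_verifier` and option name are assumptions for illustration, not the actual keylime.conf layout:

```python
# Hypothetical sketch: reading the need_provider_quote flag from a config
# file instead of hardcoding it. Section/option names are assumptions.
import configparser

config = configparser.ConfigParser()
# Stand-in for reading keylime.conf from disk
config.read_string("""
[cloud_verifier]
need_provider_quote = True
""")

need_provider_quote = config.getboolean("cloud_verifier", "need_provider_quote")
print(need_provider_quote)  # True
```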

  2. New states: GET_PROVIDER_QUOTE and GET_PROVIDER_QUOTE_RETRY; new function: invoke_get_prov_quote

Upon receiving a POST, the verifier starts its state machine and enters the GET_QUOTE state, calling invoke_get_quote. If need_provider_quote is True, invoke_get_quote transitions to the new GET_PROVIDER_QUOTE state. In this state, the new (but similar) function invoke_get_prov_quote sends a GET request to the provider verifier. We await the response, so the tenant verifier blocks until it hears back. Currently, the provider verifier's IP and port are hardcoded.
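The transition above can be sketched with asyncio. This is a simplified stand-in, not the real Keylime verifier code: `fetch_provider_quote` is a hypothetical placeholder for the awaited GET request to the provider verifier, and the IP/port values are made-up examples of the hardcoded settings mentioned above:

```python
# Minimal asyncio sketch of the GET_QUOTE -> GET_PROVIDER_QUOTE transition.
# fetch_provider_quote is a hypothetical stand-in for the real HTTP GET.
import asyncio

GET_QUOTE = "GET_QUOTE"
GET_PROVIDER_QUOTE = "GET_PROVIDER_QUOTE"

PROVIDER_IP = "192.168.56.10"   # hardcoded for now, as in the prototype
PROVIDER_PORT = 8881            # assumed port

async def fetch_provider_quote(ip, port, nonce):
    # Stand-in for an awaited GET to the provider verifier
    await asyncio.sleep(0)      # simulate waiting on the network
    return {"quote": "hw-tpm-quote", "nonce": nonce}

async def invoke_get_quote(agent):
    agent["state"] = GET_QUOTE
    # ... normal vTPM quote retrieval would happen here ...
    if agent.get("need_provider_quote"):
        agent["state"] = GET_PROVIDER_QUOTE
        # Block (await) until the provider verifier responds
        agent["provider_quote"] = await fetch_provider_quote(
            PROVIDER_IP, PROVIDER_PORT, agent["nonce"])
    return agent

agent = asyncio.run(invoke_get_quote(
    {"need_provider_quote": True, "nonce": "abc123"}))
print(agent["state"])  # GET_PROVIDER_QUOTE
```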

  3. Additions to the GET handler function

Because the verifier can receive two different kinds of GET request (a request from the tenant for the agent's status, and a request from another verifier for a quote), the GET handler checks whether the REST params are labeled "agents" (a request from the tenant) or "verifier" (a request from another verifier). If labeled "verifier", the provider verifier sends a GET request to the agent/hardware TPM and waits (blocked via await) for a response. Once a response is received, it relays the body/contents back to the tenant verifier.
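The dispatch described above can be sketched as a plain function. The handler names are illustrative placeholders, not the real Keylime handlers:

```python
# Sketch of the GET-handler dispatch: the first REST path component decides
# whether the request came from the tenant ("agents") or from another
# verifier ("verifier"). Handler names are hypothetical.
def handle_agent_status():
    # Tenant asking for the agent's status
    return {"code": 200, "status": "agent status"}

def handle_provider_quote():
    # In the real flow this awaits a GET to the agent/hardware TPM and
    # relays the response body back to the tenant verifier.
    return {"code": 200, "status": "hardware TPM quote"}

def handle_get(path):
    resource = path.strip("/").split("/")[0]
    if resource == "agents":
        return handle_agent_status()
    elif resource == "verifier":
        return handle_provider_quote()
    return {"code": 400, "status": "unknown resource"}

print(handle_get("/verifier/quote")["status"])  # hardware TPM quote
print(handle_get("/agents/42")["status"])       # agent status
```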

Agent changes

None (for now).

Open issues and questions

  • Unable to verify this quote from the hardware TPM, since the tenant registrar lacks its public AIK (this will be addressed by the registration process)
  • IPs, ports, flags are hardcoded for now

Next steps

  • Figure out a way to set the need_provider_quote flag without hardcoding (instead, use keylime_tenant to do so)
  • Include the provider verifier's port and IP in the POST request from keylime_tenant to the verifier
  • Current work in progress patch is available here - this is to enable verifiers to talk to one another.
  • Test validity of encrypted quote by checking with a hardcoded public AIK

Multiple tenant verifiers requesting a quote from the hardware TPM

Merkle trees (for quote pooling)

We use Merkle trees as a way to store the nonces (random numbers) that come with a quote request from the verifier. This way, we can batch all requests together so that the hardware TPM only needs to deliver and sign a single quote.

(Note that some quote requests ask for different PCRs; we plan to combine all requested PCR masks with a bitwise OR.)
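The PCR combination can be illustrated with a short sketch, treating each request's PCR selection as a bit mask (bit i set means "include PCR i"), which is how PCR selections are commonly encoded:

```python
# Combine the PCR selections from several quote requests with bitwise OR,
# so a single hardware-TPM quote covers every requested PCR.
def combine_pcr_masks(masks):
    combined = 0
    for m in masks:
        combined |= m
    return combined

# Request A wants PCRs 0 and 10; request B wants PCRs 10 and 15
mask_a = (1 << 0) | (1 << 10)
mask_b = (1 << 10) | (1 << 15)
combined = combine_pcr_masks([mask_a, mask_b])
print(hex(combined))  # 0x8401 -> PCRs 0, 10, and 15
```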

We are currently using the merklelib library for implementing the Merkle tree structure. The repo for the library is here.

The library includes functions to initialize a Merkle tree, find its root, and generate a proof (hash path) from a target leaf containing a hashed nonce up to the root node. To verify that a nonce is in the Merkle tree, we first obtain a proof from the tree and the nonce; the proof is then used to verify whether or not the nonce is in the tree.

Our plan is to initialize a Merkle tree in the provider verifier and collect a nonce from every GET request sent by a tenant verifier. When the TPM is ready, we send the Merkle tree root in the GET request to the provider agent/hardware TPM and receive a quote. The quote is then sent back to the tenant verifier along with the tree, which the tenant needs in order to check whether its nonce is in the tree.
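The idea can be sketched with a tiny self-contained Merkle tree built on hashlib. Note this is NOT the merklelib API, just a minimal stand-in showing the same mechanism: hashed nonces as leaves, a single root sent to the hardware TPM, and a proof that lets each tenant verify its nonce was in the batch:

```python
# Minimal Merkle-tree sketch (stand-in for merklelib, not its API).
import hashlib

def h(data):
    return hashlib.sha256(data).hexdigest()

def build_levels(nonces):
    # Leaves are hashed nonces; odd levels duplicate their last node.
    level = [h(n.encode()) for n in nonces]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [h((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def get_proof(levels, index):
    # Collect each level's sibling hash plus its side (left/right).
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(nonce, proof, root):
    # Re-hash from the leaf up and compare against the root.
    node = h(nonce.encode())
    for sibling, is_left in proof:
        node = h(((sibling + node) if is_left else (node + sibling)).encode())
    return node == root

nonces = ["nonce-a", "nonce-b", "nonce-c"]   # one per tenant verifier
levels = build_levels(nonces)
root = levels[-1][0]                         # goes into the hw-TPM quote
proof = get_proof(levels, 1)                 # proof for "nonce-b"
print(verify("nonce-b", proof, root))        # True
print(verify("bogus", proof, root))          # False
```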

Open issues and questions

  • The merklelib library may need to be installed via pip3 before running Keylime (should it be added to installer.sh?)
  • The provider verifier GET request handler will block while waiting for a response. This block will prevent other GET requests from being handled and new nonces from being processed/added to the tree, because they are being processed on the same execution thread as the request handler.
  • If we were to switch to a multiple execution thread approach in the provider verifier handler, the Merkle tree becomes a critical data section, and mutexes/locks might need to be used. More research has to be done in this area to determine the path forward.
  • Can 2 tenant verifiers send a quote request with the same nonce?
    • If so, will this matter since the Merkle tree will just prove that this nonce exists in its data structure?
      • "Technically speaking, if two requesters pick the same nonce for some reason and a valid Merkle tree comes back with that nonce in it, then they should be happy"
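The locking concern raised above can be sketched as follows. `NonceBatch` is a hypothetical helper, not existing Keylime code: if handlers run on multiple threads, the pending-nonce collection becomes a critical section and appends/drains must happen under a mutex:

```python
# Sketch of mutex-protected nonce batching for a multithreaded handler.
# NonceBatch is a hypothetical helper, not part of Keylime.
import threading

class NonceBatch:
    def __init__(self):
        self._nonces = []
        self._lock = threading.Lock()

    def add(self, nonce):
        # Each handler thread appends under the lock
        with self._lock:
            self._nonces.append(nonce)

    def drain(self):
        # Atomically take the whole batch when the hardware TPM is ready
        with self._lock:
            batch, self._nonces = self._nonces, []
        return batch

batch = NonceBatch()
threads = [threading.Thread(target=batch.add, args=(f"nonce-{i}",))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
drained = batch.drain()
print(len(drained))  # 8
```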

Next steps

  • Investigate ways to resolve blocking issues
    • Observe how the verifier works concurrently (either using asyncio or multiple processes)
    • Observe if multiple GET requests can be handled by the agent (if so, we can put the Merkle mechanism there)
    • Determine if a "mini verifier" is needed to act as a middle man that collects all nonces and sends requests directly to the agent

Registration Process

Libvirt and swtpm_setup

When a new VM and vTPM are created for a tenant, we believe libvirt and the swtpm_setup script are involved in setting up the vTPM. swtpm_setup asks for a public EK, which Keylime currently generates randomly. We want to give swtpm_setup the hardware TPM's public EK when it creates the vTPM; this establishes a "link" between the vTPM and the hardware TPM. The vTPM's public AIK will then be delivered to the provider registrar.

Certificate authority

To send the public EK, we will need a CA to prove ownership of the key.

Open issues and questions

  • Where does the CA belong in the system?
  • Who controls the CA (Keylime, tenant or provider)?
  • Unsure how giving the public EK when creating the vTPM helps establish a "link" (what is this link?)
  • (Implementation) VirtualBox does not support nested virtualization on Intel processors. How do we set up the investigation environment without a nested VM, or how do we set Keylime up in VMware?
    • Changing v.memory = "2048" to v.memory = "4096" in the Vagrantfile fixes the "cannot allocate memory" error when running the vtpm_testing_setup script

Next steps

  • Figure out if libvirt calls swtpm_setup (if not, how to change libvirt to do so)
    • Investigate through nested virtualization: use VirtualBox for initial VM, then libvirt for the nested VM
    • When calling the vtpm_testing_setup script to create a nested VM, a swtpm-localca directory is created in /var/lib. This directory contains the certificate and private key files for the vTPM on the nested VM

Deployment Methods