
Provider and tenant verifier development #214

Closed
wants to merge 20 commits

Conversation


@cjustacoder cjustacoder commented Nov 19, 2019

What has been done:
@ericli21 also contributed to this PR.
How to differentiate tenant and provider:

  • set a flag in the verifier's agent library: cloud_verifier_tornado.py L281 d['need_provider_quote'] = False. The provider sets False and the tenant sets True

For the tenant side:

  • be able to send the request to the provider after getting a quote from its own agent
    • using the GET method to send the request to the provider-side endpoint (see the sketch after this list)
  • be able to verify the provider quote with provider information (hardcoded)
  • can get the provider quote only once, then enter the attestation loop
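
A minimal sketch of that tenant-side request (not the actual keylime code), assuming Tornado's AsyncHTTPClient; PROVIDER_URL is a hypothetical placeholder for the provider verifier address hardcoded at cloud_verifier_tornado.py L410:

```python
# Sketch only, not the actual keylime code. PROVIDER_URL stands in for the
# provider verifier address hardcoded at cloud_verifier_tornado.py L410.
import asyncio
from tornado.httpclient import AsyncHTTPClient

PROVIDER_URL = "http://127.0.0.1:8881"

async def get_provider_quote(nonce, mask, vmask):
    url = "%s/verifier?nonce=%s&mask=%s&vmask=%s" % (PROVIDER_URL, nonce, mask, vmask)
    # 'await' suspends this coroutine (not the whole event loop) until the
    # provider verifier responds with the quote
    response = await AsyncHTTPClient().fetch(url)
    return response.body

if __name__ == "__main__":
    print(asyncio.run(get_provider_quote("70159ea0d", "0x408000", "0x808000")))
```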

For the provider side:

  • develop an endpoint /verifier?nonce=%s&mask=%s&vmask=%s to receive and handle the request from the tenant
  • be able to forward these parameters to the agent API /quotes/integrity?nonce=%s&mask=%s&vmask=%s&partial=%s
    • using the GET method to send the request to the provider agent
  • be able to get a quote from the provider agent and send it back to the tenant verifier (see the sketch after this list)
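
A minimal sketch of that provider-side endpoint (not the actual keylime handler), assuming a Tornado RequestHandler; AGENT_URL and the partial=0 value are assumptions for illustration:

```python
# Sketch only, not the actual keylime handler. AGENT_URL and partial=0
# are assumptions for illustration.
import tornado.ioloop
import tornado.web
from tornado.httpclient import AsyncHTTPClient

AGENT_URL = "http://127.0.0.1:9002"

class ProviderQuoteHandler(tornado.web.RequestHandler):
    async def get(self):
        nonce = self.get_argument("nonce")
        mask = self.get_argument("mask")
        vmask = self.get_argument("vmask")
        # forward the tenant's parameters to the provider agent's quote API
        url = "%s/quotes/integrity?nonce=%s&mask=%s&vmask=%s&partial=%s" % (
            AGENT_URL, nonce, mask, vmask, "0")
        response = await AsyncHTTPClient().fetch(url)
        # relay the provider agent's quote back to the tenant verifier
        self.write(response.body)

if __name__ == "__main__":
    tornado.web.Application([(r"/verifier", ProviderQuoteHandler)]).listen(8881)
    tornado.ioloop.IOLoop.current().start()
```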

Problem:

  • still confused about the synchronous/asynchronous approach for getting a quote.
    • the current approach makes both the tenant and the provider get the quote with 'await', and it works for now
  • the request handler for the Tornado asynchronous server gets blocked and can't process multiple requests simultaneously
    • it seems the handler only runs on one thread even in this concurrent situation

Detail of the problem:
Explained in the following comments.

How to run the code:

  1. Run a full set of keylime instances for the provider
    • set cloud_verifier_tornado.py L281 d['need_provider_quote'] = False before running the code
  2. Run a full set of keylime instances for the tenant
    • set cloud_verifier_tornado.py L281 d['need_provider_quote'] = True before running the code
    • hardcode the provider's IP and port in cloud_verifier_tornado.py L410

What's the behavior of the code
After provisioning the tenant agent from the tenant's tenant terminal, you can see the quote requested by the tenant verifier in the tenant's verifier terminal, along with the result of validating the quote.
[screenshot: result]
@nabilschear @lukehinds, thanks a lot!
Resolves: #201

Luke Hinds and others added 18 commits November 8, 2019 19:00
callbacks have now been deprecated so we need to move to using
asyncio.

Resolves: keylime#196
catch up with the keylime team; change the framework of the verifier:
add an async implementation to support Tornado version 6

Note:
The current version has been rolled back and does not support multiple verifiers on the same machine (running on different ports) for now.
The support library has been changed; you need to run `python3 setup.py install` again.
Old functions work under the current framework.
Newly added functions have not been verified yet, but please continue working on this version.
@cjustacoder (Author) commented:

My confusion about the synchronous and asynchronous approaches:

The question concerns the two-stage request process (the tenant verifier getting the provider quote). @nabilschear, you mentioned the provider verifier shouldn't use ensure_future, and I'm confused about this part. From my understanding, for the tenant verifier, getting a provider quote is part of its bootstrapping (one state in the state machine). So the tenant should wait to proceed to provide_V until it gets a provider quote and the quote is valid; it should be blocked. Therefore, the request stage for the tenant verifier should be synchronous and use await.
For the provider verifier, it spins up before the tenant verifier, so it has basically already entered the attestation get-quote loop when it receives the get-provider-quote request from a tenant verifier. Besides, all tenant verifiers will send their get-provider-quote requests to the provider verifier, so the provider verifier should be able to handle these tenants while keeping the attestation loop running; it shouldn't be blocked. Therefore, the provider verifier should be asynchronous and use ensure_future. (See the sketch below.)
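
A toy illustration of the distinction described above (plain asyncio, not keylime code): the tenant awaits the quote as a blocking step in its state machine, while the provider schedules the work with ensure_future so its attestation loop keeps running:

```python
# Toy illustration of await vs. ensure_future; not keylime code.
import asyncio

async def get_quote(who):
    await asyncio.sleep(1)  # stand-in for the round trip to an agent
    return "quote-for-%s" % who

async def tenant_bootstrap():
    # Tenant: bootstrapping is sequential, so block this state until the
    # provider quote arrives and validates, then move on to provide_V.
    quote = await get_quote("tenant")
    print("validated %s -> proceed to provide_V" % quote)

async def provider_loop():
    # Provider: schedule the quote request in the background and keep the
    # attestation loop ticking instead of blocking on it.
    pending = asyncio.ensure_future(get_quote("provider"))
    for i in range(3):
        print("attestation loop tick %d" % i)
        await asyncio.sleep(0.5)
    print("background quote: %s" % await pending)

async def main():
    await asyncio.gather(tenant_bootstrap(), provider_loop())

asyncio.run(main())
```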

@lukehinds lukehinds self-requested a review November 21, 2019 11:31
@lukehinds (Member) left a comment:

A few things noted, but if you intend to keep this in its own branch for now and don't need to merge into master, I don't mind, as I figure most of the points I highlight are just to make testing and development easier.

@@ -23,7 +23,7 @@ revocation_notifier_ip = 127.0.0.1
revocation_notifier_port = 8992

# turn on or off TLS keylime wide
-enable_tls = True
+enable_tls = False
@lukehinds (Member):

We can't disable it by default. If it's for testing/development that you need this off, I recommend you have a debug value that sets this accordingly if recognised as running for testing. An example here is what @nabilschear and @jetwhiz used for a value when running in their IDE:

if common.DEVELOP_IN_ECLIPSE:
    argv = ['provider_platform_init.py','1','2']

@cjustacoder (Author):

OK, I'll follow that. I didn't know we could do this for development; I'll look into it.

@@ -312,7 +312,7 @@ max_retries = 10
# might provide a signed list of EK public key hashes. Then you could write
# an ek_check_script that checks the signature of the whitelist and then
# compares the hash of the given EK with the whistlist
-require_ek_cert = True
+require_ek_cert = False
@lukehinds (Member):

Same again, this needs to stay as True or it creates a security hole for anyone using a hardware TPM.

@cjustacoder (Author):

We can still solve this with keylime/keylime/provider_platform_init.py above during development, right?

@lukehinds (Member):

Honestly, it's not a big problem and it's fine being set this way in this branch; my main point was that we can't have False as the default for the master branch and a release.

@cjustacoder (Author):

Got it, understood.

@@ -45,6 +47,8 @@ def init_client_tls(config,section):

if not config.getboolean('general',"enable_tls"):
logger.warning("TLS is currently disabled, AIKs may not be authentic.")
+global enableTLS
+enableTLS = False
@lukehinds (Member):

Don't set this here please; just use a pre-run script with sed:

keylime/test/run_tests.sh

Lines 136 to 137 in 7c6e0ea

echo -e "Setting require_ek_cert to False"
sed -i 's/require_ek_cert = True/require_ek_cert = False/g' /etc/keylime.conf

@cjustacoder (Author):

Ok, I'll follow that.

Comment on lines +435 to +454
provider_agent = {'v': '6pffdsXraIoxcDc3QxVCJKJUqdAZTzle+XUdIV1rgOc=',
'ip': '127.0.0.1', 'port': 9002,
'operational_state': 3,
'public_key': '-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEApCVReaFJHqQl4kj0CCtw\nqP0YOvW+4Y4x5d0chZvCF77EIZpPG+4sANhfxPaXkkPiyRrrpgtsFMNPQWhDTgWE\n7hCCQeBXAQc3SUn+o2FmuN5xGYHoEBXjeZQrUUJN8kTqEtrftUgoBRfXfQauNRLE\nmxBpotLnuLOIWyBtPAzjcX4tvQOki+Cg5gZBRbwpSBmuigoto53+ZTZ4gd5K0yBz\n9sZt6jru/OAlpMbm5XO0qtbgW6JpdE/4+JPfF+SHcL7dJesGMtorPLNodKRUlVAr\nVk1YW7g7+dZZZ+esABwPpTsnWyykdxHquWY5in4p4cwgsFVoBkr7pgstT4FjmUty\nlQIDAQAB\n-----END PUBLIC KEY-----\n',
'tpm_policy': {'22': ['0000000000000000000000000000000000000001', '0000000000000000000000000000000000000000000000000000000000000001', '000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001', 'ffffffffffffffffffffffffffffffffffffffff', 'ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff', 'ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff'], '15': ['0000000000000000000000000000000000000000', '0000000000000000000000000000000000000000000000000000000000000000', '000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'], 'mask': '0x408000'},
'vtpm_policy': {'23': ['ffffffffffffffffffffffffffffffffffffffff', '0000000000000000000000000000000000000000'], '15': ['0000000000000000000000000000000000000000'], 'mask': '0x808000'},
'metadata': {},
'ima_whitelist': {},
'revocation_key': '',
'tpm_version': 2,
'accept_tpm_hash_algs': ['sha512', 'sha384', 'sha256', 'sha1'],
'accept_tpm_encryption_algs': ['ecc', 'rsa'],
'accept_tpm_signing_algs': ['ecschnorr', 'rsassa'],
'hash_alg': 'sha256',
'enc_alg': 'rsa',
'sign_alg': 'rsassa',
'need_provider_quote': False,
'registrar_keys': {'aik': '-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA1YDgoAABaEBMtDzZ7u0q\nD1MZpwxP0QGzDhs54F7iYt3Vee8x86EArvV9qnzylGu6JhQ+vc9VS6K6mZIDjUtc\nMdgXM5V6p2HDZveAr2w9aH4sCbVUNN8YcIp3G96WOzFcoa6k5Medt8LpAZjL9J7J\nhEFdwYhG4b4nVWP2YTHwsvEmpG7FBe46chWY46N3/spmvOi1NFQuzCz+oYQNZ/mG\nskBGQLO+zT+Fmv3sQHx/qPpxrLRtUrzQqWz3R6pyTUrn1FJcrFj2VDzs0zhc/WE2\nb2wvnR6IxoMsE/imRuJZXMlArT+ZpPEIYPmWnKZiU8Co7E5kxNjQ1HoQvC3yxhPM\n5QIDAQAB\n-----END PUBLIC KEY-----\n', 'ek': '-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0dLxdAABVJO6qxamjCMh\nyhWZgiFHZHnPEe0tMFyK3fNVr/w8lX9r+QOLxLmkT0IdgsEYtGZGefbD+qQl4O1s\nk25823Xzu5tEF8966rTdkfsv8CRrNaBLwWlnt/n+qjIoU3xZJMmR+mFfqTc3a6zV\nmPOYJstFtM8r4b9HPCUq6Mte/J3Wx4FxI9R4UrCUyiAeH++0QapIxuEGsVIYs92n\nGyvFQYBZFRU6cIt33iaqTrRCICJp+YblMnw54YJGAH2vTVQf6/fLAnQt5L1UfmTy\nR/ZA6advx8soekSBOIAW7XmV8Xp9mSquIHZdSXMJlcn/B35PU3BdkUtIYm5JuGGt\nPQIDAQAB\n-----END PUBLIC KEY-----\n', 'ekcert': 'emulator', 'regcount': 1},
'nonce': 'HjhabaRBE2Aiiyz5R0YH',
'b64_encrypted_V': b'c6B/uXCDIPpeEnGu64vF92aWuDrGhMtKyt61eg/Am1y/TFbmKFvhsyCoAQr6WnJTjoinllwfE7ou22wc4DOyWWWMG7L/E94I8fu2ooxdcFY+a5W5tr6RFa1i54ogbR/SM4s0IR7si3FANk30P66Ifu2fTM5lXd9u+ly4hkdOpYQIvH82gCf/J+S0m9+VhHtP5q7CyQzzVqu6pqTRERTwW6DQ2GsAB26CPepD3YOlXFcmLMFssB4lyvRcWKZ7CUk4FB6jcVruneqJkzdLiWd8icgJHdl7qwKdniRuwZiXAIAJ7ARZqPp4M5oOmJgKoy555MxOAglxmgAx6HeZP8CqHg==',
@lukehinds (Member):

I figure this is just to help testing / development?

@cjustacoder (Author):

Yes, this hardcoded information is for development. Since we haven't developed the registration process yet, we need the public key and the AIK public key to verify the provider quote. We will eliminate this part after the registration process is done.

@cjustacoder cjustacoder commented Nov 27, 2019

Update for this PR

What are we trying to do

Do the nonce aggregation in the provider verifier when it receives multiple get-provider-quote requests from tenant verifiers. We add a sleep function in the provider's GET handler to simulate the stall of a hardware TPM (details below).

What's the problem

Simulation of the stall

We add a sleep function in the provider's GET handler to simulate the stall of a hardware TPM. We originally wanted to add the sleep to the agent, but the agent's HTTPServer is multi-threaded, and a sleep added there would apply separately on each thread. The sleep we use is asyncio.sleep(). Is this approach to the simulation acceptable, or is there a more desirable way to simulate the stall? (A sketch follows.)
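
A sketch of the simulation under discussion, with a hypothetical handler and a made-up five-second stall:

```python
# Sketch only: a non-blocking asyncio.sleep() inside the provider verifier's
# GET handler stands in for the time a hardware TPM needs to produce a quote.
import asyncio
import tornado.web

class ProviderQuoteHandler(tornado.web.RequestHandler):
    async def get(self):
        nonce = self.get_argument("nonce")
        # (the nonce would be recorded here for aggregation)
        await asyncio.sleep(5)  # simulated hardware-TPM stall; duration is made up
        self.write("quote covering nonce %s" % nonce)
```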

Attempt to gather nonces with a global variable, asyncio.Queue, and the SQL DB

We want to batch all the nonces from the requests together. But since we have tornado.process.fork_processes(config.getint('cloud_verifier','multiprocessing_pool_num_workers')), the requests are randomly partitioned across the worker processes, and so are the nonces inside them, so we can't gather them all, as you can see from the screenshot below.

[screenshot 1127_01]

In this example, I send 10 requests simultaneously. I tried the async queue (first column) and a global variable (second column; corresponding code linked), and apparently the requests are partitioned when pooling jobs to workers (divided into 3 + 7). Then I remembered Nabil had mentioned sharing and synchronizing information with the SQL database. So I hacked a little inside the SQL DB, adding a new column 'quote_col' to store the tenants' nonces (change to SQLdb linked). Each time the provider verifier receives a get-provider-quote request from a tenant verifier, it reads the row from the DB, appends the nonce to the column, and updates the DB (corresponding code linked). This approach seems to work (third column); it gathers all ten nonces. (A sketch of the cycle follows.)
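
A sketch of that read-append-update cycle, assuming a hypothetical table and column layout ('quote_col' is the column added in this PR; the table name here is made up):

```python
# Sketch only: the table name and row layout are assumptions for illustration.
import json
import sqlite3

DB_PATH = "/var/lib/keylime/cv_data.sqlite"

def append_nonce(agent_id, nonce):
    conn = sqlite3.connect(DB_PATH)
    cur = conn.cursor()
    cur.execute("SELECT quote_col FROM agents WHERE agent_id=?", (agent_id,))
    row = cur.fetchone()
    # read the current batch, append this tenant's nonce, write it back
    nonces = json.loads(row[0]) if row and row[0] else []
    nonces.append(nonce)
    cur.execute("UPDATE agents SET quote_col=? WHERE agent_id=?",
                (json.dumps(nonces), agent_id))
    conn.commit()
    conn.close()
    return nonces
```

Unlike a per-process global or queue, the on-disk database is shared across the forked workers, which is presumably why this variant gathers all ten nonces.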

Then we have another problem. After gathering all the nonces and finishing the stall, we want to extract them by reading from the DB. But when we read from the DB, the column is empty (corresponding code linked).
[screenshot 1127_02]

It doesn't make sense: if the column were empty, the length of the list should be zero, yet the screenshot above (column three) shows all ten nonces were gathered. So it may be an async issue, and we don't know how to resolve it.

Another solution (double post)

@ericli21 please leave a comment below with your design and problems, with the link of code preferably.

Serialize and put Merkle tree into the response

@klwalsh please leave a comment below with your design and problems, with the link of code preferably.

How to run the code

(we haven't integrated @astoycos's resolved-hardcoding version yet)

  1. Start two identical virtual machines with keylime installed. One is the provider and the other is the tenant. Put the two machines on the same network. You can use @astoycos's Ansible/Vagrant file. (@astoycos please leave the link to the file below)
  2. Since the structure of the DB has changed, you need to remove the original DB file to resolve the incompatibility: rm /var/lib/keylime/cv_data.sqlite
  3. Some libraries have changed; run python3 setup.py install in the keylime root folder.
  4. Run the provider first
    i. Edit d['need_provider_quote'] = False (location)
    ii. Bring up the whole set of keylime instances as usual: keylime_verifier, keylime_registrar, keylime_agent and keylime_tenant -t 127.0.0.1 -f /home/zycai/Keylime_test/keylime/README.md
  5. Run the tenant
    i. Edit d['need_provider_quote'] = True (location)
    ii. Hardcode the provider's IP and the verifier's port in the URL (location)
    iii. Bring up the whole set of keylime instances as usual.

What you will see

From the tenant verifier terminal, you can see the quote from the provider verifier with the nonce you sent. It can also verify the quote with the hardcoded provider info (since we haven't implemented the registration process yet, we don't have the provider's public AIK).

How to test the batch get request

Turn off TLS and use curl to test (you can test on one VM using its own URL):

curl "http://127.0.0.1:8881/verifier?nonce=70159ea0d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=701561a1d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=708f2d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=70caea3d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=7800ea4d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=70891ea5d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=7016ea6d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=7050ea7d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=7043ea8d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=7098ea9d&mask=0x408000&vmask=0x808000";

(Note: the requests must be sent simultaneously, using &; otherwise each curl will wait for its response before the next one starts.)

@lukehinds (Member):

Thanks @cjustacoder, just need to fix #212 and then I will jump onto this.

@klwalsh klwalsh commented Dec 2, 2019

Merkle Tree

Design

We want to use the Merkle tree data structure to accumulate nonces sent to our provider verifier.

Ideally, what we wanted to do was to serialize the Merkle tree into a string or JSON format (to send back to the tenant verifiers in the GET response). Then, we would send the root to the provider agent, and the TPM could sign the quote with the root as the given nonce, which would then be sent back to the provider verifier. From there, there were two options:

  1. Send the entire tree back to each tenant verifier, and have nonce inclusion verification happen there.
  2. Have the provider verifier verify the inclusion of each tenant verifier nonce, and send back only the appropriate audit proof to each tenant verifier.

The first option is more straightforward, but transmits more data. The second option requires bookkeeping at the provider verifier level, but sends O(log n) data instead of O(n) data (see the sketch below). This was originally considered step two of our development process, since we needed to figure out how we were going to send the Merkle tree first.
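
A sketch of option 2 using merklelib (the library mentioned further down), assuming its MerkleTree / audit-proof API:

```python
# Sketch only: assumes merklelib's MerkleTree/get_proof/verify_leaf_inclusion API.
import hashlib
from merklelib import MerkleTree, verify_leaf_inclusion

def hashfunc(value):
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha256(value).hexdigest()

# batch of nonces collected from tenant verifiers (made-up values)
nonces = ["70159ea0d", "701561a1d", "708f2d"]
tree = MerkleTree(nonces, hashfunc)
root = tree.merkle_root  # the single nonce the provider agent's TPM signs

# option 2: send each tenant only its O(log n) audit proof, not the tree
proof = tree.get_proof("70159ea0d")
assert verify_leaf_inclusion("70159ea0d", proof, hashfunc, root)
```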

However, sending the Merkle tree has proven more difficult than we originally thought.

Current Issues

(Note: I have been testing Merkle tree functionality in this script before integrating it with Keylime for debugging purposes)

The main problem with sending the Merkle tree is that the typical serialization methods do not work: jsonify and json.dumps fail because the MerkleTree object is not serializable. Using the pickle library yielded the same results. Using the export function included in merklelib also fails (NameError: name 'io' is not defined).

If we can't serialize the Merkle tree, then we can't send it to the tenant verifiers after processing. This puts us in a pickle (no pun intended).

Moving forward, if we can't figure out a way to serialize the tree, then nonce inclusion verification needs to happen in the provider verifier. We would have to include some bookkeeping in the provider verifier to keep track of which nonces go to each tenant verifier, so we know where each individual proof needs to be sent. One way to do this is to tie each nonce to its tenant verifier's IP and port. (A possible serialization workaround is sketched below.)
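
One possible workaround sketch for the serialization problem, assuming the raw leaf nonces are enough for the tenants: send only the leaf list, which is plain JSON, and let each tenant verifier rebuild the tree and recompute the root locally:

```python
# Workaround sketch: serialize the leaves instead of the MerkleTree object.
import hashlib
import json
from merklelib import MerkleTree

def hashfunc(value):
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha256(value).hexdigest()

def serialize_batch(nonces):
    # plain strings are JSON-safe even though the tree object is not
    return json.dumps({"leaves": nonces})

def rebuild_root(payload):
    leaves = json.loads(payload)["leaves"]
    # rebuilding from the same leaves reproduces the same root
    return MerkleTree(leaves, hashfunc).merkle_root
```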

@ericli21 ericli21 commented Dec 2, 2019

The current solution is having the tenant verifier send a GET request with the nonce to the provider verifier, which handles the request using its get function and sends back the quote as a response. Before it sends the response, this get function must await invoke_get_quote, which must await the GET response from the provider agent. Because multiple get handler functions must await a single invoke_get_quote function that batches all nonces up, this becomes confusing to deal with using the asyncio tools.

An alternate solution to the REST API interface is to have the tenant verifier POST their nonce to the provider verifier, so it does not expect the quote back as a response. The tenant verifier will then wait in a new state (the state is called GET_STATUS for now).

The provider verifier will store the nonce + PCRs requested, IP, and port through the post handler function into a series of data structures. This series of data structures is the next batch of GET requests that will be processed when getting a quote. When the provider is ready to request a quote again, it will copy this next batch series of data structures into a current batch series of data structures. The next batch series is then emptied to allow for aggregation of new nonces.

Using this new batch of nonces in a Merkle tree, invoke_get_quote will send a single GET request using the root of this Merkle tree to the provider agent, and await the GET response. When receiving the quote via response, it will go through the current batch data structures to look for the port and IPs to POST the quote back to the correct tenant verifiers.

Upon receiving the POST, the tenant verifier will enter the next state, which is PROVIDE_V. Overall, this method is similar to the original, but tries to avoid having to deal with asyncio tweaks. A weakness is that if the tenant verifier never receives the second POST, it will be stuck in the waiting state. (A sketch of the batch bookkeeping follows.)
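
A sketch of the batch bookkeeping described above; the structure and function names are made up for illustration:

```python
# Sketch only: structure and function names are assumptions, not keylime code.
next_batch = []     # (nonce, ip, port) tuples accumulated by the POST handler
current_batch = []  # the batch currently being turned into one quote

def handle_post(nonce, ip, port):
    # each tenant verifier's POST lands in the next batch
    next_batch.append((nonce, ip, port))

def start_quote_cycle():
    global next_batch, current_batch
    # swap: freeze the current batch, empty the next one for new nonces
    current_batch, next_batch = next_batch, []
    nonces = [n for (n, _, _) in current_batch]
    # build a Merkle tree from `nonces`, send its root to the provider agent,
    # then POST the quote back to each (ip, port) in current_batch
    return nonces
```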

@stale stale bot commented Jan 31, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 31, 2020
@stale stale bot closed this Feb 7, 2020