
Provider and tenant verifier development #214

Closed
wants to merge 20 commits

Conversation


@cjustacoder cjustacoder commented Nov 19, 2019

What has been done:
@ericli21 also contributed to this PR.
How to differentiate tenant and provider:

  • set a flag in the verifier's agent library: cloud_verifier_tornado.py L281 d['need_provider_quote'] = False. The provider sets False and the tenant sets True

For the tenant side:

  • be able to send the request to the provider after getting a quote from its own agent
    • using the GET method to send the request to the provider-side endpoint (see the sketch after this list)
  • be able to verify the provider quote with provider information (hardcoded)
  • can get the provider quote only once, then enter the attestation loop
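
A minimal sketch of that tenant-side request (not the actual keylime code), assuming Tornado's AsyncHTTPClient; PROVIDER_URL is a hypothetical placeholder for the provider verifier address hardcoded at cloud_verifier_tornado.py L410:

```python
# Sketch only, not the actual keylime code. PROVIDER_URL stands in for the
# provider verifier address hardcoded at cloud_verifier_tornado.py L410.
import asyncio
from tornado.httpclient import AsyncHTTPClient

PROVIDER_URL = "http://127.0.0.1:8881"

async def get_provider_quote(nonce, mask, vmask):
    url = "%s/verifier?nonce=%s&mask=%s&vmask=%s" % (PROVIDER_URL, nonce, mask, vmask)
    # 'await' suspends this coroutine (not the whole event loop) until the
    # provider verifier responds with the quote
    response = await AsyncHTTPClient().fetch(url)
    return response.body

if __name__ == "__main__":
    print(asyncio.run(get_provider_quote("70159ea0d", "0x408000", "0x808000")))
```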

For the provider side:

  • develop an endpoint /verifier?nonce=%s&mask=%s&vmask=%s to receive and handle the request from the tenant
  • be able to forward these parameters to the agent API /quotes/integrity?nonce=%s&mask=%s&vmask=%s&partial=%s
    • using the GET method to send the request to the provider agent
  • be able to get a quote from the provider agent and send it back to the tenant verifier (see the sketch after this list)
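
A minimal sketch of that provider-side endpoint (not the actual keylime handler), assuming a Tornado RequestHandler; AGENT_URL and the partial=0 value are assumptions for illustration:

```python
# Sketch only, not the actual keylime handler. AGENT_URL and partial=0
# are assumptions for illustration.
import tornado.ioloop
import tornado.web
from tornado.httpclient import AsyncHTTPClient

AGENT_URL = "http://127.0.0.1:9002"

class ProviderQuoteHandler(tornado.web.RequestHandler):
    async def get(self):
        nonce = self.get_argument("nonce")
        mask = self.get_argument("mask")
        vmask = self.get_argument("vmask")
        # forward the tenant's parameters to the provider agent's quote API
        url = "%s/quotes/integrity?nonce=%s&mask=%s&vmask=%s&partial=%s" % (
            AGENT_URL, nonce, mask, vmask, "0")
        response = await AsyncHTTPClient().fetch(url)
        # relay the provider agent's quote back to the tenant verifier
        self.write(response.body)

if __name__ == "__main__":
    tornado.web.Application([(r"/verifier", ProviderQuoteHandler)]).listen(8881)
    tornado.ioloop.IOLoop.current().start()
```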

Problem:

  • still confused about the synchronous/asynchronous approach for getting a quote.
    • the current approach makes both the tenant and the provider get the quote with 'await', and it works for now
  • the request handler for the Tornado asynchronous server gets blocked and can't process multiple requests simultaneously
    • it seems the handler only runs on one thread even in this concurrent situation

Detail of the problem:
Explained in the following comments.

How to run the code:

  1. Run a full set of keylime instances for the provider
    • set cloud_verifier_tornado.py L281 d['need_provider_quote'] = False before running the code
  2. Run a full set of keylime instances for the tenant
    • set cloud_verifier_tornado.py L281 d['need_provider_quote'] = True before running the code
    • hardcode the provider's IP and port in cloud_verifier_tornado.py L410

What's the behavior of the code
After provisioning the tenant agent from the tenant's tenant terminal, you can see the quote requested by the tenant verifier in the tenant's verifier terminal, along with the result of validating the quote.
[screenshot: result]
@nabilschear @lukehinds, thanks a lot!
Resolves: #201

Luke Hinds and others added 18 commits November 8, 2019 19:00
callbacks have now been deprecated so we need to move to using
asyncio.

Resolves: keylime#196
catch up with the keylime team; change the framework of the verifier:
add an async implementation to support Tornado version 6

Note:
The current version has been rolled back and does not support multiple verifiers on the same machine (running on different ports) for now.
The support library has been changed; you need to run `python3 setup.py install` again.
Old functions work under the current framework.
Newly added functions have not been verified yet, but please continue working on this version.
@cjustacoder (Author) commented:

My confusion about the synchronous and asynchronous approaches:

The question concerns the two-stage request process (the tenant verifier getting the provider quote). @nabilschear, you mentioned the provider verifier shouldn't use ensure_future, and I'm confused about this part. From my understanding, for the tenant verifier, getting a provider quote is part of its bootstrapping (one state in the state machine). So the tenant should wait to proceed to provide_V until it gets a provider quote and the quote is valid; it should be blocked. Therefore, the request stage for the tenant verifier should be synchronous and use await.
For the provider verifier, it spins up before the tenant verifier, so it has basically already entered the attestation get-quote loop when it receives the get-provider-quote request from a tenant verifier. Besides, all tenant verifiers will send their get-provider-quote requests to the provider verifier, so the provider verifier should be able to handle these tenants while keeping the attestation loop running; it shouldn't be blocked. Therefore, the provider verifier should be asynchronous and use ensure_future. (See the sketch below.)
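
A toy illustration of the distinction described above (plain asyncio, not keylime code): the tenant awaits the quote as a blocking step in its state machine, while the provider schedules the work with ensure_future so its attestation loop keeps running:

```python
# Toy illustration of await vs. ensure_future; not keylime code.
import asyncio

async def get_quote(who):
    await asyncio.sleep(1)  # stand-in for the round trip to an agent
    return "quote-for-%s" % who

async def tenant_bootstrap():
    # Tenant: bootstrapping is sequential, so block this state until the
    # provider quote arrives and validates, then move on to provide_V.
    quote = await get_quote("tenant")
    print("validated %s -> proceed to provide_V" % quote)

async def provider_loop():
    # Provider: schedule the quote request in the background and keep the
    # attestation loop ticking instead of blocking on it.
    pending = asyncio.ensure_future(get_quote("provider"))
    for i in range(3):
        print("attestation loop tick %d" % i)
        await asyncio.sleep(0.5)
    print("background quote: %s" % await pending)

async def main():
    await asyncio.gather(tenant_bootstrap(), provider_loop())

asyncio.run(main())
```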

@lukehinds lukehinds self-requested a review November 21, 2019 11:31
@lukehinds (Member) left a comment:

A few things noted, but if you intend to keep this in its own branch for now and don't need to merge into master, I don't mind, as I figure most of the points I highlight are just to make testing and development easier.

@@ -23,7 +23,7 @@ revocation_notifier_ip = 127.0.0.1
revocation_notifier_port = 8992

# turn on or off TLS keylime wide
-enable_tls = True
+enable_tls = False
@lukehinds (Member):

We can't disable it by default. If it's for testing/development that you need this off, I recommend you have a debug value that sets this accordingly if recognised as running for testing. An example here is what @nabilschear and @jetwhiz used for a value when running in their IDE:

if common.DEVELOP_IN_ECLIPSE:
    argv = ['provider_platform_init.py','1','2']

@cjustacoder (Author):

OK, I'll follow that. I didn't know we could do this for development; I'll look into it.

@@ -312,7 +312,7 @@ max_retries = 10
# might provide a signed list of EK public key hashes. Then you could write
# an ek_check_script that checks the signature of the whitelist and then
# compares the hash of the given EK with the whistlist
-require_ek_cert = True
+require_ek_cert = False
@lukehinds (Member):

Same again, this needs to stay as True or it creates a security hole for anyone using a hardware TPM.

@cjustacoder (Author):

We can still solve this with keylime/keylime/provider_platform_init.py above during development, right?

@lukehinds (Member):

Honestly, it's not a big problem and it's fine being set this way in this branch; my main point was that we can't have False as the default for the master branch and a release.

@cjustacoder (Author):

Got it, understood.

@@ -45,6 +47,8 @@ def init_client_tls(config,section):

if not config.getboolean('general',"enable_tls"):
logger.warning("TLS is currently disabled, AIKs may not be authentic.")
+global enableTLS
+enableTLS = False
@lukehinds (Member):

Don't set this here please; just use a pre-run script with sed:

keylime/test/run_tests.sh

Lines 136 to 137 in 7c6e0ea

echo -e "Setting require_ek_cert to False"
sed -i 's/require_ek_cert = True/require_ek_cert = False/g' /etc/keylime.conf

@cjustacoder (Author):

Ok, I'll follow that.

Comment on lines +435 to +454
provider_agent = {'v': '6pffdsXraIoxcDc3QxVCJKJUqdAZTzle+XUdIV1rgOc=',
'ip': '127.0.0.1', 'port': 9002,
'operational_state': 3,
'public_key': '-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEApCVReaFJHqQl4kj0CCtw\nqP0YOvW+4Y4x5d0chZvCF77EIZpPG+4sANhfxPaXkkPiyRrrpgtsFMNPQWhDTgWE\n7hCCQeBXAQc3SUn+o2FmuN5xGYHoEBXjeZQrUUJN8kTqEtrftUgoBRfXfQauNRLE\nmxBpotLnuLOIWyBtPAzjcX4tvQOki+Cg5gZBRbwpSBmuigoto53+ZTZ4gd5K0yBz\n9sZt6jru/OAlpMbm5XO0qtbgW6JpdE/4+JPfF+SHcL7dJesGMtorPLNodKRUlVAr\nVk1YW7g7+dZZZ+esABwPpTsnWyykdxHquWY5in4p4cwgsFVoBkr7pgstT4FjmUty\nlQIDAQAB\n-----END PUBLIC KEY-----\n',
'tpm_policy': {'22': ['0000000000000000000000000000000000000001', '0000000000000000000000000000000000000000000000000000000000000001', '000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001', 'ffffffffffffffffffffffffffffffffffffffff', 'ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff', 'ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff'], '15': ['0000000000000000000000000000000000000000', '0000000000000000000000000000000000000000000000000000000000000000', '000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'], 'mask': '0x408000'},
'vtpm_policy': {'23': ['ffffffffffffffffffffffffffffffffffffffff', '0000000000000000000000000000000000000000'], '15': ['0000000000000000000000000000000000000000'], 'mask': '0x808000'},
'metadata': {},
'ima_whitelist': {},
'revocation_key': '',
'tpm_version': 2,
'accept_tpm_hash_algs': ['sha512', 'sha384', 'sha256', 'sha1'],
'accept_tpm_encryption_algs': ['ecc', 'rsa'],
'accept_tpm_signing_algs': ['ecschnorr', 'rsassa'],
'hash_alg': 'sha256',
'enc_alg': 'rsa',
'sign_alg': 'rsassa',
'need_provider_quote': False,
'registrar_keys': {'aik': '-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA1YDgoAABaEBMtDzZ7u0q\nD1MZpwxP0QGzDhs54F7iYt3Vee8x86EArvV9qnzylGu6JhQ+vc9VS6K6mZIDjUtc\nMdgXM5V6p2HDZveAr2w9aH4sCbVUNN8YcIp3G96WOzFcoa6k5Medt8LpAZjL9J7J\nhEFdwYhG4b4nVWP2YTHwsvEmpG7FBe46chWY46N3/spmvOi1NFQuzCz+oYQNZ/mG\nskBGQLO+zT+Fmv3sQHx/qPpxrLRtUrzQqWz3R6pyTUrn1FJcrFj2VDzs0zhc/WE2\nb2wvnR6IxoMsE/imRuJZXMlArT+ZpPEIYPmWnKZiU8Co7E5kxNjQ1HoQvC3yxhPM\n5QIDAQAB\n-----END PUBLIC KEY-----\n', 'ek': '-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0dLxdAABVJO6qxamjCMh\nyhWZgiFHZHnPEe0tMFyK3fNVr/w8lX9r+QOLxLmkT0IdgsEYtGZGefbD+qQl4O1s\nk25823Xzu5tEF8966rTdkfsv8CRrNaBLwWlnt/n+qjIoU3xZJMmR+mFfqTc3a6zV\nmPOYJstFtM8r4b9HPCUq6Mte/J3Wx4FxI9R4UrCUyiAeH++0QapIxuEGsVIYs92n\nGyvFQYBZFRU6cIt33iaqTrRCICJp+YblMnw54YJGAH2vTVQf6/fLAnQt5L1UfmTy\nR/ZA6advx8soekSBOIAW7XmV8Xp9mSquIHZdSXMJlcn/B35PU3BdkUtIYm5JuGGt\nPQIDAQAB\n-----END PUBLIC KEY-----\n', 'ekcert': 'emulator', 'regcount': 1},
'nonce': 'HjhabaRBE2Aiiyz5R0YH',
'b64_encrypted_V': b'c6B/uXCDIPpeEnGu64vF92aWuDrGhMtKyt61eg/Am1y/TFbmKFvhsyCoAQr6WnJTjoinllwfE7ou22wc4DOyWWWMG7L/E94I8fu2ooxdcFY+a5W5tr6RFa1i54ogbR/SM4s0IR7si3FANk30P66Ifu2fTM5lXd9u+ly4hkdOpYQIvH82gCf/J+S0m9+VhHtP5q7CyQzzVqu6pqTRERTwW6DQ2GsAB26CPepD3YOlXFcmLMFssB4lyvRcWKZ7CUk4FB6jcVruneqJkzdLiWd8icgJHdl7qwKdniRuwZiXAIAJ7ARZqPp4M5oOmJgKoy555MxOAglxmgAx6HeZP8CqHg==',
@lukehinds (Member):

I figure this is just to help testing / development?

@cjustacoder (Author):

Yes, this hardcoded information is for development. Since we haven't developed the registration process yet, we need the public key and the AIK public key to verify the provider quote. We will eliminate this part after the registration process is done.

@cjustacoder cjustacoder commented Nov 27, 2019

Update for this PR

What are we trying to do

Do the nonce aggregation in the provider verifier when it receives multiple get-provider-quote requests from tenant verifiers. We add a sleep function in the provider's GET handler to simulate the stall of a hardware TPM (details below).

What's the problem

Simulation of the stall

We add a sleep function in the provider's GET handler to simulate the stall of a hardware TPM. We originally wanted to add the sleep to the agent, but the agent's HTTPServer is multi-threaded, and a sleep added there would apply separately on each thread. The sleep we use is asyncio.sleep(). Is this approach to the simulation acceptable, or is there a more desirable way to simulate the stall? (A sketch follows.)
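
A sketch of the simulation under discussion, with a hypothetical handler and a made-up five-second stall:

```python
# Sketch only: a non-blocking asyncio.sleep() inside the provider verifier's
# GET handler stands in for the time a hardware TPM needs to produce a quote.
import asyncio
import tornado.web

class ProviderQuoteHandler(tornado.web.RequestHandler):
    async def get(self):
        nonce = self.get_argument("nonce")
        # (the nonce would be recorded here for aggregation)
        await asyncio.sleep(5)  # simulated hardware-TPM stall; duration is made up
        self.write("quote covering nonce %s" % nonce)
```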

Attempt to gather nonces with a global variable, asyncio.Queue, and the SQL DB

We want to batch all the nonces from the requests together. But since we have tornado.process.fork_processes(config.getint('cloud_verifier','multiprocessing_pool_num_workers')), the requests are randomly partitioned across the worker processes, and so are the nonces inside them, so we can't gather them all, as you can see from the screenshot below.

[screenshot 1127_01]

In this example, I send 10 requests simultaneously. I tried the async queue (first column) and a global variable (second column; corresponding code linked), and apparently the requests are partitioned when pooling jobs to workers (divided into 3 + 7). Then I remembered Nabil had mentioned sharing and synchronizing information with the SQL database. So I hacked a little inside the SQL DB, adding a new column 'quote_col' to store the tenants' nonces (change to SQLdb linked). Each time the provider verifier receives a get-provider-quote request from a tenant verifier, it reads the row from the DB, appends the nonce to the column, and updates the DB (corresponding code linked). This approach seems to work (third column); it gathers all ten nonces. (A sketch of the cycle follows.)
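
A sketch of that read-append-update cycle, assuming a hypothetical table and column layout ('quote_col' is the column added in this PR; the table name here is made up):

```python
# Sketch only: the table name and row layout are assumptions for illustration.
import json
import sqlite3

DB_PATH = "/var/lib/keylime/cv_data.sqlite"

def append_nonce(agent_id, nonce):
    conn = sqlite3.connect(DB_PATH)
    cur = conn.cursor()
    cur.execute("SELECT quote_col FROM agents WHERE agent_id=?", (agent_id,))
    row = cur.fetchone()
    # read the current batch, append this tenant's nonce, write it back
    nonces = json.loads(row[0]) if row and row[0] else []
    nonces.append(nonce)
    cur.execute("UPDATE agents SET quote_col=? WHERE agent_id=?",
                (json.dumps(nonces), agent_id))
    conn.commit()
    conn.close()
    return nonces
```

Unlike a per-process global or queue, the on-disk database is shared across the forked workers, which is presumably why this variant gathers all ten nonces.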

Then we have another problem. After gathering all the nonces and finishing the stall, we want to extract them by reading from the DB. But when we read from the DB, the column is empty (corresponding code linked).
[screenshot 1127_02]

It doesn't make sense: if the column were empty, the length of the list should be zero, yet the screenshot above (column three) shows all ten nonces were gathered. So it may be an async issue, and we don't know how to resolve it.

Another solution (double post)

@ericli21 please leave a comment below with your design and problems, with the link of code preferably.

Serialize and put Merkle tree into the response

@klwalsh please leave a comment below with your design and problems, with the link of code preferably.

How to run the code

(we haven't integrated @astoycos's resolved-hardcoding version yet)

  1. Start two identical virtual machines with keylime installed. One is the provider and the other is the tenant. Put the two machines on the same network. You can use @astoycos's Ansible/Vagrant file. (@astoycos please leave the link to the file below)
  2. Since the structure of the DB has changed, you need to remove the original DB file to resolve the incompatibility: rm /var/lib/keylime/cv_data.sqlite
  3. Some libraries have changed; run python3 setup.py install in the keylime root folder.
  4. Run the provider first
    i. Edit d['need_provider_quote'] = False (location)
    ii. Bring up the whole set of keylime instances as usual: keylime_verifier, keylime_registrar, keylime_agent and keylime_tenant -t 127.0.0.1 -f /home/zycai/Keylime_test/keylime/README.md
  5. Run the tenant
    i. Edit d['need_provider_quote'] = True (location)
    ii. Hardcode the provider's IP and the verifier's port in the URL (location)
    iii. Bring up the whole set of keylime instances as usual.

What you will see

From the tenant verifier terminal, you can see the quote from the provider verifier with the nonce you sent. It can also verify the quote with the hardcoded provider info (since we haven't implemented the registration process yet, we don't have the provider's public AIK).

How to test the batch get request

Turn off TLS and use curl to test (you can test on one VM using its own URL):

curl "http://127.0.0.1:8881/verifier?nonce=70159ea0d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=701561a1d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=708f2d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=70caea3d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=7800ea4d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=70891ea5d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=7016ea6d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=7050ea7d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=7043ea8d&mask=0x408000&vmask=0x808000" & \
curl "http://127.0.0.1:8881/verifier?nonce=7098ea9d&mask=0x408000&vmask=0x808000";

(Note: the requests must be sent simultaneously, using &; otherwise each curl will wait for its response before the next one starts.)

@lukehinds (Member):

Thanks @cjustacoder, just need to fix #212 and then I will jump onto this.

@klwalsh klwalsh commented Dec 2, 2019

Merkle Tree

Design

We want to use the Merkle tree data structure to accumulate nonces sent to our provider verifier.

Ideally, what we wanted to do was to serialize the Merkle tree into a string or JSON format (to send back to the tenant verifiers in the GET response). Then, we would send the root to the provider agent, and the TPM could sign the quote with the root as the given nonce, which would then be sent back to the provider verifier. From there, there were two options:

  1. Send the entire tree back to each tenant verifier, and have nonce inclusion verification happen there.
  2. Have the provider verifier verify the inclusion of each tenant verifier nonce, and send back only the appropriate audit proof to each tenant verifier.

The first option is more straightforward, but transmits more data. The second option requires bookkeeping at the provider verifier level, but sends O(log n) data instead of O(n) data (see the sketch below). This was originally considered step two of our development process, since we needed to figure out how we were going to send the Merkle tree first.
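
A sketch of option 2 using merklelib (the library mentioned further down), assuming its MerkleTree / audit-proof API:

```python
# Sketch only: assumes merklelib's MerkleTree/get_proof/verify_leaf_inclusion API.
import hashlib
from merklelib import MerkleTree, verify_leaf_inclusion

def hashfunc(value):
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha256(value).hexdigest()

# batch of nonces collected from tenant verifiers (made-up values)
nonces = ["70159ea0d", "701561a1d", "708f2d"]
tree = MerkleTree(nonces, hashfunc)
root = tree.merkle_root  # the single nonce the provider agent's TPM signs

# option 2: send each tenant only its O(log n) audit proof, not the tree
proof = tree.get_proof("70159ea0d")
assert verify_leaf_inclusion("70159ea0d", proof, hashfunc, root)
```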

However, sending the Merkle tree has proven more difficult than we originally thought.

Current Issues

(Note: I have been testing Merkle tree functionality in this script before integrating it with Keylime for debugging purposes)

The main problem with sending the Merkle tree is that the typical serialization methods do not work: jsonify and json.dumps fail because the MerkleTree object is not serializable. Using the pickle library yielded the same results. Using the export function included in merklelib also fails (NameError: name 'io' is not defined).

If we can't serialize the Merkle tree, then we can't send it to the tenant verifiers after processing. This puts us in a pickle (no pun intended).

Moving forward, if we can't figure out a way to serialize the tree, then nonce inclusion verification needs to happen in the provider verifier. We would have to include some bookkeeping in the provider verifier to keep track of which nonces go to each tenant verifier, so we know where each individual proof needs to be sent. One way to do this is to tie each nonce to its tenant verifier's IP and port. (A possible serialization workaround is sketched below.)
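
One possible workaround sketch for the serialization problem, assuming the raw leaf nonces are enough for the tenants: send only the leaf list, which is plain JSON, and let each tenant verifier rebuild the tree and recompute the root locally:

```python
# Workaround sketch: serialize the leaves instead of the MerkleTree object.
import hashlib
import json
from merklelib import MerkleTree

def hashfunc(value):
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha256(value).hexdigest()

def serialize_batch(nonces):
    # plain strings are JSON-safe even though the tree object is not
    return json.dumps({"leaves": nonces})

def rebuild_root(payload):
    leaves = json.loads(payload)["leaves"]
    # rebuilding from the same leaves reproduces the same root
    return MerkleTree(leaves, hashfunc).merkle_root
```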

@ericli21 ericli21 commented Dec 2, 2019

The current solution is having the tenant verifier send a GET request with the nonce to the provider verifier, which handles the request using its get function and sends back the quote as a response. Before it sends the response, this get function must await invoke_get_quote, which must await the GET response from the provider agent. Because multiple get handler functions must await a single invoke_get_quote function that batches all nonces up, this becomes confusing to deal with using the asyncio tools.

An alternate solution to the REST API interface is to have the tenant verifier POST their nonce to the provider verifier, so it does not expect the quote back as a response. The tenant verifier will then wait in a new state (the state is called GET_STATUS for now).

The provider verifier will store the nonce + PCRs requested, IP, and port through the post handler function into a series of data structures. This series of data structures is the next batch of GET requests that will be processed when getting a quote. When the provider is ready to request a quote again, it will copy this next batch series of data structures into a current batch series of data structures. The next batch series is then emptied to allow for aggregation of new nonces.

Using this new batch of nonces in a Merkle tree, invoke_get_quote will send a single GET request using the root of this Merkle tree to the provider agent, and await the GET response. When receiving the quote via response, it will go through the current batch data structures to look for the port and IPs to POST the quote back to the correct tenant verifiers.

Upon receiving the POST, the tenant verifier will enter the next state, which is PROVIDE_V. Overall, this method is similar to the original, but tries to avoid having to deal with asyncio tweaks. A weakness is that if the tenant verifier never receives the second POST, it will be stuck in the waiting state. (A sketch of the batch bookkeeping follows.)
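
A sketch of the batch bookkeeping described above; the structure and function names are made up for illustration:

```python
# Sketch only: structure and function names are assumptions, not keylime code.
next_batch = []     # (nonce, ip, port) tuples accumulated by the POST handler
current_batch = []  # the batch currently being turned into one quote

def handle_post(nonce, ip, port):
    # each tenant verifier's POST lands in the next batch
    next_batch.append((nonce, ip, port))

def start_quote_cycle():
    global next_batch, current_batch
    # swap: freeze the current batch, empty the next one for new nonces
    current_batch, next_batch = next_batch, []
    nonces = [n for (n, _, _) in current_batch]
    # build a Merkle tree from `nonces`, send its root to the provider agent,
    # then POST the quote back to each (ip, port) in current_batch
    return nonces
```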

@stale stale bot commented Jan 31, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 31, 2020
@stale stale bot closed this Feb 7, 2020