Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scenario testing - Restarts and upgrades #3269

Open
jamshale opened this issue Oct 3, 2024 · 35 comments
Open

Scenario testing - Restarts and upgrades #3269

jamshale opened this issue Oct 3, 2024 · 35 comments
Assignees

Comments

@jamshale
Copy link
Contributor

jamshale commented Oct 3, 2024

In the acapy tools repo there is some really useful testing using docker containers that allows testing of upgrade scripts and restarts. https://github.com/hyperledger/aries-acapy-tools/tree/main/askar_tools/tests/e2e.

It should be possible to support this in the scenarios directory from acapy itself. It would be awesome to be able to test a supported upgrade path from the last LTS version and test a few different restarts with changed configurations such as making public dids and using seeds etc.

@ianco ianco self-assigned this Nov 26, 2024
@ianco
Copy link
Contributor

ianco commented Nov 26, 2024

@jamshale are there instructions for running the e2e tests?

@jamshale
Copy link
Contributor Author

In [acapy-tools](https://github.com/openwallet-foundation/acapy-tools) I don't think there is any instructions :/ . It's pretty simple though. I just poetry install and then navigate to askar_tools and run poetry run pytest ..

You can use other pytest commands to run an individual test.

@ianco
Copy link
Contributor

ianco commented Nov 26, 2024

ok thanks, I'll add this to the README

@ianco
Copy link
Contributor

ianco commented Nov 26, 2024

Probably more useful to be able to test this with a back-end postgres database rather than having to worry about import/export for sqlite wallets (and then the tests could run in a separate docker container, and just point to the external database).

@jamshale
Copy link
Contributor Author

I agree. For this scenario testing against a postgres db would be easier and better. The sqlite testing was much more difficult with restarts in acapy-tools. Also would be good to have postgres getting tested more here in general.

@ianco
Copy link
Contributor

ianco commented Nov 27, 2024

@jamshale I'm getting errors running the tests.

I ran poetry install --with=dev (to include the dev dependencies, basically acapy-controller), which failed with the following error:

HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /indicio-tech/acapy-minimal-example.git/info/refs?service=git-upload-pack (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

I updated the project to use acapy-controller = "~=0.2.0" (the published version of the library) and then the install worked.

After running poetry run pytest . however everything failed:

FAILED tests/e2e/test_export.py::TestPgExport::test_export_pg - aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host selfserve.indiciotech.io:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:...
FAILED tests/e2e/test_export.py::TestPgExport::test_export_sqlite - docker.errors.APIError: 409 Client Error for http+docker://localhost/v1.43/containers/create?name=alice: Conflict ("Conflict. The container name "/alice" is already in use by container "81137e408a975071c15cfb032...
FAILED tests/e2e/test_mt_convert_to_mw.py::TestMultitenantConvertToMultiwallet::test_conversion_pg - aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host selfserve.indiciotech.io:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:...
FAILED tests/e2e/test_mt_convert_to_mw.py::TestMultitenantConvertToMultiwallet::test_conversion_sqlite - docker.errors.APIError: 409 Client Error for http+docker://localhost/v1.43/containers/create?name=admin: Conflict ("Conflict. The container name "/admin" is already in use by container "14b1b2cf31f1c7e6c9f075f70...
FAILED tests/e2e/test_tenant_import.py::TestTenantImport::test_tenant_import_pg - aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host selfserve.indiciotech.io:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:...

Any thoughts? Looks like a certificate error and then the container ... already is use is because a previous test failed.

@ianco
Copy link
Contributor

ianco commented Nov 27, 2024

(the above is in the askar_tools directory, for the tests under acapy_wallet_upgrade some work and some fail)

(most of the tests fail)

@jamshale
Copy link
Contributor Author

It doesn't clean up very well when there's a fail. Sometimes you need to manually delete the containers or the network.

I'm not sure about the certificate error though. I'm not getting any and the tests are passing for me. You might be able to switch the genesis file. I'm just trying that now.

@ianco
Copy link
Contributor

ianco commented Nov 27, 2024

What's running at selfserve.indiciotech.io:443?

@jamshale
Copy link
Contributor Author

jamshale commented Nov 27, 2024

The acapy-tools tests use the indicio indy test ledger. I guess that's where it's trying to register a did. I changed the scenario tests in acapy to use http://test.bcovrin.vonx.io/. There's something automatic that the acapy-controller indy_anoncred_onboard function does that requires https that prevents just switching easily :/

Not sure why your getting certificate problems though.

@ianco
Copy link
Contributor

ianco commented Nov 27, 2024

Looks like all that is hard-coded into the unit tests. I'll update to use env vars (if available) and also it looks like some tests have a postgres dependency.

Eventually would be nice to run the tests in docker (I always have to mess around when running locally to get all the proper versions, like python 3.10 which isn't a default on my local ...)

@jamshale
Copy link
Contributor Author

Ya. These test in acapy-tools aren't the best, but the container process that allows restarts is what I wanted to replicate for acapy.

I got around the hardcoded stuff in the library for the ledger here https://github.com/openwallet-foundation/acapy/blob/main/scenarios/examples/presenting_revoked_credential/example.py#L50.

@ianco
Copy link
Contributor

ianco commented Nov 27, 2024

I notice that the test container logic starts up a bunch of services within the python code. Typically in other projects we do this within a docker-compose file, and then pass parameters in (overrides) via environment variables.

I'd like to convert this project to use a similar structure - docker compose and then build a docker image to run the unit tests.

Before I dive into the code, does this make sense? It should make it easier to use these components within other repo's (like aca-py).

@jamshale @swcurran @dbluhm

@jamshale
Copy link
Contributor Author

Yes. That would be good. In the scenario tests in acapy we use docker compose files. It's really just about how to persist the storage containers for the length of the test and restart the agents with a different image and configs. The acapy-tools e2e tests do this so I thought we could use a similar design.

@ianco
Copy link
Contributor

ianco commented Nov 27, 2024

It's really just about how to persist the storage containers for the length of the test and restart the agents with a different image and configs.

I'll work on updating the acapy-tools tests to use docker and think about how to do this for the aca-py scenario tests ...

@dbluhm
Copy link
Contributor

dbluhm commented Nov 27, 2024

The reason the tests start the container in python is because that seemed like the best way to control starting and stopping containers. I also generally prefer to keep it simple and just use docker compose wherever possible but, at the time, I was not finding good solutions that permitted both simple docker compose and controlling startup parameters and starting and stopping the containers. So docker compose got nixed.

@ianco
Copy link
Contributor

ianco commented Dec 3, 2024

@jamshale I'm getting errors running the tests.

FYI managed to fix the certificate issue with this: https://stackoverflow.com/questions/52805115/certificate-verify-failed-unable-to-get-local-issuer-certificate

@ianco
Copy link
Contributor

ianco commented Dec 5, 2024

@jamshale I think the scenario testing approach will have to change to support acapy restarts (especially if we are shutting down one acapy version and then starting up another).

Right now the scenario test all start their agents via docker compose, so there is no mechanism for the test (which runs in one container) to restart the agent (which runs in a different container). (The test creates an instance of an acapy_controller.Controller to control the running agent.)

One possibility is to change the acapy-test container to include the acapy_controller.Controller wrapper, and then expose an additional endpoint to start, stop or restart the agent. (The agent would then be controlled from within the python code, similar to how the askar_tools tests work.) Of course all the other protocols would have to be exposed in a similar way.

Another possibility is to ditch the use of docker compose to start the agents and just manage all the agents within python code, similar to how the askar_tools tests work.

One wrinkle is the requirement to support different versions of acapy - i.e. startup an older version of acapy, run some scenarios, shut down the agent and then start up a new agent (with a newer acapy version) pointing to the same wallet. (Or running a wallet upgrade in between.)

@jamshale @dbluhm any thoughts?

@jamshale
Copy link
Contributor Author

jamshale commented Dec 5, 2024

What if it was separate from the scenario tests? Like, it was a different set of tests called migration tests. Maybe it could be in the same poetry project but started as a different poetry script?

I think the suite of tests runs the tests in the examples folder sequentially. If the migration tests were in another folder, it might be easier to have different setup/structure.

@ianco
Copy link
Contributor

ianco commented Dec 5, 2024

What if it was separate from the scenario tests?

I'll setup a separate test ... One option for the "acapy version" issue is to have two Controller containers based on two different acapy versions (for example the latest acapy release and the local acapy source code) and then starting an agent in one container, executing some protocols, shutting down the agent, and then starting a new agent from the other Controller container (pointing to the same wallet).

The upgrade (if we're running a wallet upgrade) could be run directly from the example code.

@ianco
Copy link
Contributor

ianco commented Dec 11, 2024

@jamshale @swcurran @dbluhm ...

After a bit of beating my head against a wall I've come to the conclusion that the way we're testing the aca-py tools vs the aca-py scenario tests are basically incompatible:

  • for aca-py tools, the tests are run locally and the docker containers are started/stopped from within the python code (using https://docker-py.readthedocs.io/en/stable/)
  • for the scenario tests, there is a python script that is run locally, but it's just a "driver" that uses docker compose to start some containers, run tests, and then shut down the containers

The python docker library use by aca-py tools tests can't be run within a docker container (can't connect to docker on the host) so it can't be used within the scenario tests, the way they are currently structured.

A couple of options:

  • we could write a process that starts/stops the aca-py agent, and run this in a container (this was my original thought, but it can't use the docker library so would need to run the agent as a sub-process, the way the alice/faber demo does it)
  • we could modify the whole test approach, so that the driver script (https://github.com/openwallet-foundation/acapy/blob/main/scenarios/conftest.py) runs multiple steps for each scenario - ie. start containers, run tests, stop agent(s), run upgrades, start agent(s), etc.

Either of these would be fairly complicated so not sure it would be worth the effort, vs just using the aca-py tools repo to do the upgrade testing.

Thoughts?

@jamshale
Copy link
Contributor Author

jamshale commented Dec 11, 2024

Ok. That's a bit of a bummer. I didn't expect it to be this complicated. Maybe it would be easier to just have a test in acapy-tools that tests the upgrade to latest acapy with the scenarios we think are important. I still think it would be very useful for migration stuff like anoncreds upgrade and the upcoming DIDInfo work.

We could have a github action that clones acapy-tools and runs the test on a release like we are doing with OATH right now.

@ianco
Copy link
Contributor

ianco commented Dec 11, 2024

I'm not sure the python docker library will work under github actions (it needs a docker instance to talk to)

@jamshale
Copy link
Contributor Author

Ok. Maybe we can leave this for now. I didn't intend for it to be a lot of work. Maybe we can think of another way to test these scenarios.

@dbluhm
Copy link
Contributor

dbluhm commented Dec 12, 2024

It's possible to share the docker socket with a container to enable it to use docker on the host machine. This usually involves adding a volume to the container. A quick ChatGPT generated example:

version: "3.8"

services:
  docker-client:
    image: docker:latest
    container_name: docker_client
    privileged: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./workspace:/workspace
    environment:
      - DOCKER_HOST=unix:///var/run/docker.sock
    entrypoint: ["sh", "-c", "while true; do sleep 3600; done"]

# Example Usage:
# To run a Docker command inside the container:
# docker exec -it docker_client docker ps

@ianco
Copy link
Contributor

ianco commented Dec 12, 2024

It's possible to share the docker socket with a container to enable it to use docker on the host machine. This usually involves adding a volume to the container.

I ran a quick test and this worked! I'll put together a simple example and open a PR.

@ianco
Copy link
Contributor

ianco commented Dec 13, 2024

I'm working on a simple test scenario with an agent restart (including updating the agent from the previous release to the latest code). The stop/start is working but the networking config on the new agent isn't working yet :-S I've created a draft PR in case anyone wants to take a look.

Note that the new agent is started by docker directly rather than docker-compose, so docker-compose doesn't know anything about it. This will complicate the test cleanup a bit ...

@dbluhm
Copy link
Contributor

dbluhm commented Dec 13, 2024

Explicitly creating a docker network might help with that. There's a lot of gotchas in docker networking though -- hopefully we can work through this one

@WadeBarnes
Copy link
Contributor

Explicitly creating a docker network might help with that. There's a lot of gotchas in docker networking though -- hopefully we can work through this one

You can use indy-test-automation as a reference, it sets up a docker network in several places (GHA workflows and scripts) for the purpose of setup and testing of a network. Search for docker network.

@ianco
Copy link
Contributor

ianco commented Dec 13, 2024

Turns out I didn't need to create a network just explicitly connect to the network docker compose already created ...

The restart is now working (shut down alice aca-py 1.0.1 and then start a new alice with acapy-test (built from local code) connecting to the same database. Docker logs are all screwy because some containers are started by docker compose and some aren't, and also the post-test cleanup will be a bit trickier.

I'll work on including an upgrade step in between the alice shutdown and restart.

@ianco
Copy link
Contributor

ianco commented Dec 19, 2024

Good news (or bad news) - I've implemented a fairly robust anoncreds upgrade test and I'm now getting an error in the upgrade step. I'll dig into this but it looks like an aca-py issue. Alice has issued 6 credentials to 3 holders (2 each) and revoked 3 credentials (1 each). I set the revocation registry size to 5 so a new registry should have been created during this scenario.

See PR #3410

alice-1           | 2024-12-19 18:57:15,708 aiohttp.access INFO 172.23.0.8 [19/Dec/2024:18:57:15 +0000] "POST /anoncreds/wallet/upgrade?wallet_name=alice HTTP/1.1" 200 248 "-" "Python/3.10 aiohttp/3.11.10"
alice-1           | 2024-12-19 18:57:15,710 acapy_agent.wallet.anoncreds_upgrade INFO Upgrade in process for wallet: alice
alice-1           | 2024-12-19 18:57:15,720 acapy_agent.wallet.anoncreds_upgrade ERROR Error when upgrading wallet alice : cannot access local variable 'is_active' where it is not associated with a value 
alice-1           | 2024-12-19 18:57:15,728 acapy_agent.wallet.anoncreds_upgrade ERROR Error when upgrading records for wallet alice : cannot access local variable 'is_active' where it is not associated with a value 
alice-1           | 2024-12-19 18:57:15,728 acapy_agent.wallet.anoncreds_upgrade INFO Retry attempt 1 to upgrade wallet alice
alice-1           | 2024-12-19 18:57:16,745 acapy_agent.wallet.anoncreds_upgrade ERROR Error when upgrading records for wallet alice : cannot access local variable 'is_active' where it is not associated with a value 
alice-1           | 2024-12-19 18:57:16,745 acapy_agent.wallet.anoncreds_upgrade INFO Retry attempt 2 to upgrade wallet alice
alice-1           | 2024-12-19 18:57:17,759 acapy_agent.wallet.anoncreds_upgrade ERROR Error when upgrading records for wallet alice : cannot access local variable 'is_active' where it is not associated with a value 
alice-1           | 2024-12-19 18:57:17,760 acapy_agent.wallet.anoncreds_upgrade INFO Retry attempt 3 to upgrade wallet alice
alice-1           | 2024-12-19 18:57:18,776 acapy_agent.wallet.anoncreds_upgrade ERROR Error when upgrading records for wallet alice : cannot access local variable 'is_active' where it is not associated with a value 
alice-1           | 2024-12-19 18:57:18,776 acapy_agent.wallet.anoncreds_upgrade INFO Retry attempt 4 to upgrade wallet alice
alice-1           | 2024-12-19 18:57:19,799 acapy_agent.wallet.anoncreds_upgrade ERROR Error when upgrading records for wallet alice : cannot access local variable 'is_active' where it is not associated with a value 
alice-1           | 2024-12-19 18:57:19,799 acapy_agent.wallet.anoncreds_upgrade INFO Retry attempt 5 to upgrade wallet alice
alice-1           | 2024-12-19 18:57:20,814 acapy_agent.wallet.anoncreds_upgrade ERROR Error when upgrading records for wallet alice : cannot access local variable 'is_active' where it is not associated with a value 
alice-1           | 2024-12-19 18:57:20,814 acapy_agent.wallet.anoncreds_upgrade ERROR Failed to upgrade wallet: alice after 5 retries. 
alice-1           |                 Try fixing any connection issues and re-running the update

@jamshale
Copy link
Contributor Author

OK. Thanks for testing this. I wasn't super confident about some of this more complicated revocation stuff. There's some subtle differences between revocation with anoncreds.

Maybe we can keep the failing test as a separate branch, and I can try and look at it soon.

@ianco
Copy link
Contributor

ianco commented Dec 19, 2024

Doing a very quick review of the aca-py upgrade code, I suspect the issue is with the record for the "used up" revocation registry (used for the first 5 credentials).

@ianco
Copy link
Contributor

ianco commented Dec 19, 2024

Found the issue!

@swcurran
Copy link
Contributor

This is awesome — nice work! Testing is really helpful! Having a good ways to test — amazing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants