Docker files #713

Merged
merged 27 commits into from
Aug 30, 2022

airenas
Contributor

@airenas airenas commented Aug 19, 2022

Resolves #695
Prepared docker files for:

  • lotus-dev and lotus-miner-dev - based on official lotus image file
  • boost-dev (contains boostd, boostx, boost, lotus, lotus-miner). I considered whether we need a separate container for testing the client and decided to go with a single container. It initially runs the boost service, but users can attach to it to test the client and make a sample deal.
  • boost-gui - based on nginx

I intentionally added the dev suffix because the images contain tools that are built in debug mode. As the images are for demo purposes, I made some simplifications: for example, the containers' processes run with root permissions. Doing otherwise would require solving issues with mapped volumes. It is doable, but not worth the effort for demo containers.

Docker compose contains:

  • all containers noted above
  • plus an http server for serving static files. It acts as a public http server during a demo deal process.

I documented:

  • how to build/publish containers
  • how to start devnet on docker
  • a sample script, with an explanation, that initializes the boost client and makes a deal

In order to test the devnet now:

  • first, you need to build images using a prepared script, and then start the devnet
  • or you can use the public containers I published under my account while testing. Open the examples/devnet/.env file and change the docker user from filecoin to airenas.

If the pull request is OK, what next?
Someone with access to the filecoin dockerhub account should publish the images. That would allow users to test the devnet without building the images on their local machines.

@LaurenSpiegel
Collaborator

LaurenSpiegel commented Aug 20, 2022

This is great!

I was able to build the docker containers. From within build/devnet, run:

make build/all lotus_version=1.17.1-rc2 boost_version=1.3.0-rc1

Then, to start the containers, from examples/devnet run:

docker compose up -d

and
docker compose logs -f

I ran ./sample/make-a-deal.sh and got stuck on GasEstimateMessageGas error: estimating gas used: CallWithGas failed

@nonsense, will you please review the PR next week to provide any feedback to @airenas? As we discussed at colo a few weeks ago this is meant for devs to be able to spin up a lotus and boost for testing quickly and for people generally new to the project to be able to try it quickly.

@airenas
Contributor Author

airenas commented Aug 22, 2022

I ran ./sample/make-a-deal.sh and got stuck on GasEstimateMessageGas error: estimating gas used: CallWithGas failed

Perhaps the error comes from the command boostx market-add 1. If I'm right, the error indicates that the wallet has no funds yet. We need to wait a bit: the money transfer was initiated by the previous command, lotus send ..., and we need to wait for it to complete. For now I handle it in the script by displaying the error and allowing the user to retry the boostx market-add 1 command.
We could also solve it in the script another way, such as checking the wallet for funds every second and continuing with market-add once the funds are in.
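The retry idea described above could be automated with a small helper. This is only a sketch, not the script from this PR: the function name retry_until_ok and the attempt and delay values are hypothetical, and it simply re-runs a command until it exits successfully rather than inspecting the wallet balance.

```shell
# retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
# (Hypothetical helper; not part of the PR's make-a-deal.sh.)
retry_until_ok() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Possible use in the demo script (command taken from this thread,
# limits chosen arbitrarily):
#   retry_until_ok 60 1 boostx market-add 1
```

Retrying the command itself avoids parsing balance output, at the cost of printing the gas-estimation error on each failed attempt.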

Fix problem when nginx caches boost IP and GUI is not functioning
after `docker compose up` finishes
@nonsense
Member

nonsense commented Aug 22, 2022

Overall looks good, great work @airenas

Here are my comments having tried this setup:

  1. I think in the README, we should explain that after starting up the docker compose deployment, it will take a few minutes (about 10 minutes on a good connection) for the network to come up, as Lotus needs to download a few GiB worth of proof parameter files. (In the past we've also included the proof parameters in the image itself, but I am not sure if we want to do that here -- I guess downloading them during runtime is fine too).

  2. Ideally we should publish the deal after initiating it, and then try to retrieve it -- at the moment it seems like the demo stops after we have initiated the deal, and it will just block on the publish stage.

@LaurenSpiegel
Collaborator

I ran ./sample/make-a-deal.sh and got stuck on GasEstimateMessageGas error: estimating gas used: CallWithGas failed

Perhaps the error comes from the command boostx market-add 1. If I'm right, the error indicates that the wallet has no funds yet. We need to wait a bit: the money transfer was initiated by the previous command, lotus send ..., and we need to wait for it to complete. For now I handle it in the script by displaying the error and allowing the user to retry the boostx market-add 1 command. We could also solve it in the script another way, such as checking the wallet for funds every second and continuing with market-add once the funds are in.

I rebuilt the containers and reran and got through the whole make a deal script! I then used the UI to publish and got this error --

Paused at 'Announcing': failed to add index and announce deal: failed to announce deal to network indexer: failed to announce deal to index provider: failed to get iterable index: failed to get iterable index: open /var/lib/boost/dagstore/index/baga6ea4seaqhwu3somf6p27ivsxsjulmpykwjrs7ykbu66x3vuahi4yasm64gkq.full.idx: no such file or directory

Something wrong with the volume mounting?

@airenas
Contributor Author

airenas commented Aug 23, 2022

Paused at 'Announcing': failed to add index and announce deal: failed to announce deal to network indexer: failed to announce deal to index provider: failed to get iterable index: failed to get iterable index: open /var/lib/boost/dagstore/index/baga6ea4seaqhwu3somf6p27ivsxsjulmpykwjrs7ykbu66x3vuahi4yasm64gkq.full.idx: no such file or directory

Something wrong with the volume mounting?

I have reproduced the error, but don't know what is going on. The boost container runs with root user permissions and has full control of the /var/lib/boost directory. The only error I see in the log is sealer/piece_provider.go:189 failed to SectorsUnsealPiece: cannot unseal piece (sector: {{1000 2} 5}, offset: 0 size: 508) - unsealed cid is undefined. I'm attaching the combined log of lotus, miner, and boost from the start of a deal to the error in the GUI. Any ideas on what could be wrong?

713_0.log

@airenas
Contributor Author

airenas commented Aug 23, 2022

2. Ideally we should publish the deal after initiating it, and then try to retrieve it -- at the moment it seems like the demo stops after we have initiated the deal, and it will just block on the publish stage.

Thanks, @nonsense for the review. I can extend the demo with the publish and the retrieve. The current demo was based on this: https://boost.filecoin.io/tutorials/how-to-store-files-with-boost-on-filecoin. So it stops on the deal stage :). Is there any similar info/doc on how to publish/retrieve? What tools to use for it?

@nonsense
Member

@airenas you can trigger a publish of all pending deals with the GraphQL interface:

curl -X POST \
-H "Content-Type: application/json" \
-d '{"query":"mutation { dealPublishNow }"}' \
http://localhost:8080/graphql/query | jq

See more docs at: https://boost.filecoin.io/graphql-api


You can trigger a retrieval from a client with:

lotus client retrieve --miner <MINER_ID> <DATA_CID> <OUTPUT_FILE>
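For use in a demo script, the publish call above could be wrapped in a function. This is a minimal sketch under stated assumptions: the function names graphql_payload and publish_pending_deals are hypothetical, the endpoint default is the URL from the comment above, and the payload builder only handles queries that contain no double quotes or backslashes.

```shell
# graphql_payload QUERY
# Wraps a GraphQL document in the JSON body used for the POST request.
# Assumes QUERY contains no double quotes or backslashes.
graphql_payload() {
    printf '{"query":"%s"}' "$1"
}

# publish_pending_deals [ENDPOINT]
# Triggers the dealPublishNow mutation; ENDPOINT defaults to the URL
# quoted in this thread.
publish_pending_deals() {
    endpoint=${1:-http://localhost:8080/graphql/query}
    curl --fail --silent --show-error -X POST \
        -H "Content-Type: application/json" \
        -d "$(graphql_payload 'mutation { dealPublishNow }')" \
        "$endpoint"
}
```

With --fail, curl exits non-zero on an HTTP error status, so a calling script can stop instead of continuing to the retrieval step.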

@LaurenSpiegel
Collaborator

Paused at 'Announcing': failed to add index and announce deal: failed to announce deal to network indexer: failed to announce deal to index provider: failed to get iterable index: failed to get iterable index: open /var/lib/boost/dagstore/index/baga6ea4seaqhwu3somf6p27ivsxsjulmpykwjrs7ykbu66x3vuahi4yasm64gkq.full.idx: no such file or directory

Something wrong with the volume mounting?

I have reproduced the error, but don't know what is going on. The boost container runs with root user permissions and has full control of the /var/lib/boost directory. The only error I see in the log is sealer/piece_provider.go:189 failed to SectorsUnsealPiece: cannot unseal piece (sector: {{1000 2} 5}, offset: 0 size: 508) - unsealed cid is undefined. I'm attaching the combined log of lotus, miner, and boost from the start of a deal to the error in the GUI. Any ideas on what could be wrong?

713_0.log

@LexLuthr , any ideas here?

@kylehuntsman kylehuntsman mentioned this pull request Aug 25, 2022
@LexLuthr
Collaborator

LexLuthr commented Aug 25, 2022

@airenas I was testing this pull request and keep running into a rustup-init error when building the lotus image.

#8 6.241 qemu-x86_64: Could not open '/lib64/ld-linux-x86-64.so.2': No such file or directory
#8 6.248 chmod: cannot access '/usr/local/rustup': No such file or directory
#8 6.248 chmod: cannot access '/usr/local/cargo': No such file or directory
#8 6.248 /bin/sh: 1: rustup: not found
#8 6.248 /bin/sh: 1: cargo: not found
#8 6.248 /bin/sh: 1: rustc: not found
------
executor failed running [/bin/sh -c wget "https://static.rust-lang.org/rustup/dist/x86_64-unknown-linux-gnu/rustup-init";     chmod +x rustup-init;     ./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION;     rm rustup-init;     chmod -R a+w $RUSTUP_HOME $CARGO_HOME;     rustup --version;     cargo --version;     rustc --version;]: exit code: 127
make: *** [prepare/lotus-test] Error 1

The problem is that I am running it on an M1 Mac, which requires the image to be compatible with the platform. The issue seems to come from the lotus repo, but our Makefile should be compatible with Apple silicon as well as Intel. Can you create a conditional to set the platform based on the architecture?

A similar issue can be found in the boost-related images as well.

I am trying to debug the issue with dagstore.

I don't think this PR should be merged till we sort out all the issues.
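The conditional requested above could look roughly like the following. This is a sketch, not the Makefile change from this PR: the function name docker_platform_for is hypothetical, and selecting a platform does not by itself guarantee that the lotus image builds on that platform.

```shell
# docker_platform_for ARCH
# Maps a `uname -m` value to a Docker --platform value.
# (Hypothetical helper for illustration.)
docker_platform_for() {
    case "$1" in
        x86_64|amd64)  echo "linux/amd64" ;;
        arm64|aarch64) echo "linux/arm64" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Possible use in a build script:
#   platform=$(docker_platform_for "$(uname -m)") || exit 1
#   docker build --platform "$platform" .
```

An M1 Mac reports arm64 from uname -m, while Linux ARM hosts typically report aarch64, so both spellings are mapped.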

@airenas
Contributor Author

airenas commented Aug 25, 2022

The problem is that I am running it on an M1 Mac, which requires the image to be compatible with the platform. The issue seems to come from the lotus repo, but our Makefile should be compatible with Apple silicon as well as Intel. Can you create a conditional to set the platform based on the architecture?

Thanks, @LexLuthr, good point. I didn't test on ARM. I will try it on an AWS ARM instance now. The solution definitely must run on Mac, as this is a demo for developers. From your request to change the Makefile, I understood that you were able to run the solution using the --platform=linux/amd64 flag. Correct?

I am trying to debug the issue with dagstore.

Thank you. On my side, I'll try to complete https://github.com/filecoin-project/boost/blob/main/documentation/devnet.md locally without docker again. The first time, when starting this task, I completed it up to the deal step but did not go further.

@LexLuthr
Collaborator

The problem is that I am running it on an M1 Mac, which requires the image to be compatible with the platform. The issue seems to come from the lotus repo, but our Makefile should be compatible with Apple silicon as well as Intel. Can you create a conditional to set the platform based on the architecture?

Thanks, @LexLuthr, good point. I didn't test on ARM. I will try it on an AWS ARM instance now. The solution definitely must run on Mac, as this is a demo for developers. From your request to change the Makefile, I understood that you were able to run the solution using the --platform=linux/amd64 flag. Correct?

I am trying to debug the issue with dagstore.

Thank you. On my side, I'll try to complete https://github.com/filecoin-project/boost/blob/main/documentation/devnet.md locally without docker again. The first time, when starting this task, I completed it up to the deal step but did not go further.

Yes. But it seems building on linux/arm64 does not work, as the required libraries are not found. Thus, I had to force the build on linux/amd64, and then buildkit started causing problems. It won't let me build the lotus-dev, lotus-miner-dev and boost images because of the amd64 vs arm64 mismatch. So I had to build manually, disabling buildkit within the Makefile for each of these images.

At this point, it is very clear that the lotus team needs to fix their Dockerfile and ensure that both arm64 and amd64 are supported. I don't think boost will fail compilation on arm64 once lotus is fixed.

Let me know if you have any questions. Unfortunately, I am still building/testing different combinations, so I can't confirm that the final images will work.

@LexLuthr
Collaborator

boost keeps crashing for me. As soon as it sends fil to the boost wallets, the containers for boost and lotus both crash.
Another problem is that the boost entry script is not idempotent. It keeps adding a new wallet and sending fil on each restart.

Perhaps, we should use init containers for boost and lotus (to add default wallet)
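One common way to make an entry script idempotent, shown here only as a sketch and not as the approach taken in this PR, is to guard each one-time step with a marker file. The helper name run_once and the marker path in the usage comment are hypothetical.

```shell
# run_once MARKER_FILE COMMAND [ARGS...]
# Runs COMMAND only if MARKER_FILE does not exist. The marker is created
# only after COMMAND succeeds, so a step that crashed is retried on the
# next container start. (Hypothetical helper for illustration.)
run_once() {
    marker=$1
    shift
    if [ -e "$marker" ]; then
        return 0
    fi
    "$@" || return $?
    : > "$marker"
}

# Possible use in an entry script (marker path and step name are made up):
#   run_once /var/lib/boost/.wallets-initialized create_and_fund_wallets
```

The marker has to live on a persisted volume; otherwise it is lost on restart and the step runs again.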

@airenas
Contributor Author

airenas commented Aug 25, 2022

I just tested on an AWS arm64 instance (20.04.1-Ubuntu aarch64 GNU/Linux). The outcome is this:

Building

Running - I used the published amd64 containers on dockerhub with the DOCKER_DEFAULT_PLATFORM=linux/amd64 setting.

  • Containers started, but the initialization failed the same way as noted by @LexLuthr - lotus and boost crash.

So it looks like it won't work with amd64 images. What do we do?
Do we need to inform the Lotus team and wait for the lotus arm images? Or do you want me to try to prepare a Dockerfile for lotus that works on arm64?

@airenas
Contributor Author

airenas commented Aug 25, 2022

Another problem is that the boost entry script is not idempotent. It keeps adding a new wallet and sending fil on each restart.

Perhaps, we should use init containers for boost and lotus (to add default wallet)

The initialization is idempotent in the happy-path scenarios (surely not today :)). Since the containers are crashing, an init container would not help: it would leave the system in a semi-initialized state.

@nonsense
Member

I think we should focus on getting this PR to be working for x86_64 and focus on other architectures later, as a separate issue, to keep scope reasonable for this PR.

@airenas
Contributor Author

airenas commented Aug 29, 2022

sealer/piece_provider.go:189 failed to SectorsUnsealPiece: cannot unseal piece (sector: {{1000 2} 5}, offset: 0 size: 508) - unsealed cid is undefined.

I debugged boost and found the problem. It turned out that lotus-miner was advertising the wrong URL. The boost container was not able to retrieve a file from the miner via http://127.0.0.1:2345/... (inside the boost container, 127.0.0.1 refers to the boost container itself, not the miner). I had to set LOTUS_API_REMOTELISTENADDRESS for the miner container.

Will update the PR soon.
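A guard along the following lines could catch this misconfiguration early in a miner entry script. It is a sketch under assumptions: the helper name is_loopback_addr is hypothetical, the variable is assumed to hold a host:port value, and IPv6 addresses are not handled.

```shell
# is_loopback_addr HOST:PORT
# Exits 0 when the host part is a loopback address, which other
# containers cannot use to reach this one.
# (Hypothetical helper; assumes HOST:PORT form, no IPv6.)
is_loopback_addr() {
    case "${1%%:*}" in
        127.*|localhost) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use before starting the miner:
#   addr=${LOTUS_API_REMOTELISTENADDRESS:-127.0.0.1:2345}
#   if is_loopback_addr "$addr"; then
#       echo "warning: miner advertises $addr, unreachable from other containers" >&2
#   fi
```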

@airenas
Copy link
Contributor Author

airenas commented Aug 29, 2022

lotus client retrieve --miner <MINER_ID> <DATA_CID> <OUTPUT_FILE>

@nonsense, I have just tested it. I ran the command in the demo boost container, and it retrieves the file. But the file is saved on the machine running the lotus daemon :) !
That does not work well for the demo. Is there a parameter or another command to download a file from boost/lotus to the local machine?

Solved: found lotus client cat

@nonsense
Member

nonsense commented Aug 30, 2022

As a follow up (doesn't have to be in this PR):

  • 1. Ideally, before startup, we would also want to modify the lotus-miner configuration (~/.lotusminer/config.toml) to:
    -- disable BatchPreCommits
    -- disable AggregateCommits
    -- reduce WaitDealsDelay

(see https://github.com/filecoin-project/boost/blob/main/itests/framework/framework.go#L114)

  • 2. We should probably also use an image/configuration that supports 8MiB sectors, rather than 2KiB sectors.
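The configuration changes in item 1 could be applied before startup with a small filter. This is a sketch under assumptions: the function name patch_miner_config is hypothetical, each setting is assumed to appear on its own line in config.toml (either active or commented out with a leading #), and the WaitDealsDelay value "10s" is an arbitrary illustration, not a value taken from this thread.

```shell
# patch_miner_config
# Reads a lotus-miner config.toml on stdin and writes the patched file to
# stdout. Uncomments the three settings if needed and overrides their values.
# (Hypothetical helper; "10s" is an illustrative value.)
patch_miner_config() {
    sed -E \
        -e 's/^([[:space:]]*)#?[[:space:]]*BatchPreCommits[[:space:]]*=.*/\1BatchPreCommits = false/' \
        -e 's/^([[:space:]]*)#?[[:space:]]*AggregateCommits[[:space:]]*=.*/\1AggregateCommits = false/' \
        -e 's/^([[:space:]]*)#?[[:space:]]*WaitDealsDelay[[:space:]]*=.*/\1WaitDealsDelay = "10s"/'
}

# Possible use in a miner entry script:
#   patch_miner_config < ~/.lotusminer/config.toml > /tmp/config.toml
#   mv /tmp/config.toml ~/.lotusminer/config.toml
```

Writing to a temporary file and moving it into place avoids relying on sed -i, whose syntax differs between GNU and BSD sed.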

Co-authored-by: Anton Evangelatov <anton.evangelatov@gmail.com>
@nonsense
Member

@airenas why do we need a separate boost-gui container? Can't we just use/run the GUI as part of the boost container? Is there any benefit to having the boost-gui? Asking, as it seems to me we could drop this container and have less complexity without it (i.e. no nginx, fewer containers, etc.)

@airenas
Contributor Author

airenas commented Aug 30, 2022

@airenas why do we need a separate boost-gui container? Can't we just use/run the GUI as part of the boost container? Is there any benefit to having the boost-gui? Asking, as it seems to me we could drop this container and have less complexity without it (i.e. no nginx, fewer containers, etc.)

I got the answer on this from @jacobheun : #695 (comment)

@nonsense
Member

@airenas right, sorry, I just saw this comment too... Still I don't see the necessity for a separate container, we can just hit 8080 on boostd and access the GUI (which is embedded).

@nonsense
Member

@airenas approving as I think this is already a good first step and mostly works as we need it to. I suggest merging and continuing with outstanding tasks in separate PRs, as this one is getting quite large and hard to follow for any new changes. We are also relying on this setup for the metrics/tracing docker setup, so it'd make it easier if this is already merged.

@nonsense nonsense merged commit 3bc2b7d into filecoin-project:main Aug 30, 2022
@nonsense
Member

Thank you @airenas , great work!!!

@nonsense
Member

I've added a new issue - #742 - feel free to add any outstanding tasks there that I might have missed.

Successfully merging this pull request may close these issues.

Sample Docker compose with a Boost container for deal-making
4 participants