relayer deployment failing due to unsupported value type #2330
Comments
Progress with the updated protobufs from PR into upstream. With the latest code, we still encounter failure, but the config building step shows a different error:
Tracking that message down, it comes from Recall that the final step in config-building for the relayer is
It's that last step that's failing immediately, on the "invalid identifier" error.
The relayer reports that connections aren't established:
But I'm not sure how to look up the corresponding values via
Hmm, I don't think that error is coming from ibc-go, because there's no ibc-go in the path when testing between two Penumbra testnets. In the past, when we had this error, it was due to the identifier not getting parsed and persisted correctly on the relayer side. Can we look at the events for connection creation to verify that they're emitting IDs correctly?
We sure can! Here's a gist containing the full logs from an end-to-end configure-and-try-run invocation: https://gist.github.com/conorsch/615e6002ae806a04ffcb02db6bea27cd Grepping through that text, it looks to me like the client_ids are populated, but I still see some "unsupported value type" messages, e.g.
The steps I used to generate these logs were:
We can at any point build a custom image and run that on the cluster, but for iterating on the debugging, I expect local rebuilds will be fastest.
Not currently working due to a mismatch in the protos between preview & testnet. Refs #2330.
We've tracked this down to breaking changes in the protos between preview & testnet. To resolve, we must:
It'll break again if the corresponding protos break. Let's also consider tackling #2349 to get a bit more of a heads up.
Confirmed the relayer is working between testnet and local devnet, as described in 89fa953. The relayer repo has the most recent proto definitions as well. Will check again prior to the testnet release on Monday. Post-release, we'll reactivate the relayer deployment between testnet & preview, and then we can close this ticket.
Redeployed the relayer, but it's failing again on the same "invalid identifier" error. Added some debug statements on a local build, and traced that error to the ibc-go code that validates the connection ID string, specifically for the
Querying testnet connection-0 via rly
Querying testnet connection-0 via pcli
But thereafter, i.e. all connections greater than
which is why the connection_id fails to validate, because the ibc-go code ensures it isn't an empty string: https://github.com/cosmos/ibc-go/blob/87ac1a9ac3e28983ccf9fa5b6cefe6ff60ddee20/modules/core/24-host/validate.go#L40-L42
Note that the
Next, I'll try repeating these procedures between preview and a local devnet, to rule out recent code changes breaking things again.
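To make the failure mode concrete, here is a minimal sketch of the blank-identifier rule described above. It is not the ibc-go code (that is Go, linked above); the function name and error type are hypothetical, and only the "must not be empty" check is modeled.

```python
class InvalidIdentifierError(ValueError):
    """Hypothetical error type standing in for the "invalid identifier" failure."""


def validate_connection_id(connection_id: str) -> str:
    """Reject blank identifiers and return valid ones unchanged.

    Assumption: only the emptiness rule is modeled. Any other rules the
    real validator applies (length, allowed characters) are out of scope.
    """
    if connection_id.strip() == "":
        raise InvalidIdentifierError("identifier cannot be blank")
    return connection_id
```

With this rule, a populated ID such as `connection-0` passes, while a query result whose connection ID comes back as an empty string is rejected, which matches the behavior reported in this comment.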
Shuts off the relayer again, because it's stuck in a crash loop. See details in #2330. We should consider replacing the full relayer deployment with a simpler CI check that tries to build the configs, then reports an error and stays stopped if it fails. The current setup will continue to retry forever, which junks up the chains with "init" connections. This reverts commit fbc80b7.
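The "simpler CI check" idea could look roughly like the sketch below: run the config-building command exactly once, report a failure, and hand back the exit code without retrying. The function name and the injectable `runner` parameter are assumptions made for illustration; the actual config-building command is not named here and would be supplied by the caller.

```python
import subprocess
import sys
from typing import Callable, Sequence


def _run_once(command: Sequence[str]) -> int:
    """Default runner: execute the command a single time and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_config_check(
    command: Sequence[str],
    runner: Callable[[Sequence[str]], int] = _run_once,
) -> int:
    """Run the config-building command once; never retry on failure."""
    exit_code = runner(command)
    if exit_code != 0:
        # Report and stay stopped, so failed attempts don't pile up on the chains.
        print(
            f"config build failed with exit code {exit_code}; not retrying",
            file=sys.stderr,
        )
    return exit_code
```

A CI job would pass the returned value to `sys.exit`, so a failed build marks the job as failed instead of looping.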
Indeed, was able to build a path between preview and local devnet:
Additionally, I was able to build several paths, all of which stayed open:
However, still no evidence of successfully built channels yet. While it's possible I'm not querying them correctly—the
Compare the malformed data block with other successful queries, such as:
After some discussion with @avahowell, we expect we'll need to patch the golang relayer code and see if we can coax out channel construction. Will work on that tomorrow.
The client and connection calls use the
Spent some time on this with @avahowell today. We suspect that the
Regarding the querying logic above, that remains an issue and one we intend to solve, but it's lower priority than ensuring our Rust code conforms to the IBC spec. We'll focus on that near-term, and follow up on querying improvements once we have channels built successfully again.
Today we determined that this change to the golang relayer code is indeed necessary, because the channel-creation logic is gated on the queries being populated correctly. We should update the relayer code to 1) make working query calls and 2) error out if they aren't made successfully. The fact that the relayer currently exits 0 prior to channel creation obscured this problem for too long.
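The two requirements above (make working queries, and error out when they fail) amount to a fail-fast gate in front of channel creation. A minimal sketch follows; every name and the dictionary-shaped query result are hypothetical and do not reflect the relayer's actual Go types.

```python
class QueryError(RuntimeError):
    """Hypothetical error raised when a chain query comes back unpopulated."""


def require_populated(query_name: str, result: dict, required_fields: list[str]) -> dict:
    """Raise instead of silently continuing when required fields are missing or empty."""
    missing = [field for field in required_fields if not result.get(field)]
    if missing:
        raise QueryError(f"{query_name}: missing or empty fields: {', '.join(missing)}")
    return result


def create_channel_if_ready(connection: dict) -> str:
    """Gate channel creation on a populated connection query (illustrative only)."""
    require_populated("connection", connection, ["connection_id", "client_id"])
    return f"would open a channel over {connection['connection_id']}"
```

Because the gate raises, an unpopulated query surfaces as a non-zero exit rather than a quiet exit 0 before channel creation.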
Second, we also identified a logic bug in our IBC implementation, in which we mix up which
Finally, as to "how did channel creation ever work?" we now believe we had originally tested with the
After the most recent changes landed in #2482, we have channel creation restored. Behold:
We got clients:
We got connections:
And—drumroll, please—we even have channels:
As part of this push, we've been testing with locally built versions of the relayer, with the most recent protobuf definitions. Those are now PR'd into the upstream repo here: cosmos/relayer#1181
Describe the bug
The CI step to deploy the relayer has started failing, post merge of #2315.
To Reproduce
Easily reproducible locally:
Expected behavior
Relayer config scripts can build a connection between the Penumbra chains.
Additional context
Log messages contain failure messages such as:
Note the ubiquitous "unsupported value type". Let's figure out what broke and see if we can get it working again.