-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: extract gossipsub / discv5 dependency gauge prometheus metrics #9710
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Maddiaa0
changed the title
wip: extract gossipsub / libp2p / discv5 dependency prometheus metrics
feat: extract gossipsub / discv5 dependency gauge prometheus metrics
Nov 5, 2024
Maddiaa0
force-pushed
the
md/gossip-sub-metrics
branch
from
November 5, 2024 05:50
b23a205
to
fe6974c
Compare
This stack of pull requests is managed by Graphite. Learn more about stacking. |
This was referenced Nov 5, 2024
This redoes #9704 with the PR title pattern needed for an ARM build --------- Co-authored-by: Tom French <tom@tomfren.ch>
Was broken due to not all tests being ran when using e2e all when implementing tagging outgoing logs.
Adds notes to the docs that indicate that authwits only work in single player mode, describes the problems with not simulating simulations, and updates the bberg readme install instructions closes AztecProtocol/dev-rel#422 closes AztecProtocol/dev-rel#433 closes #9256 closes AztecProtocol/dev-rel#423 closes #6865
Minimal repro of issue in public processor, in which state is not updated during simulation, so each tx comes out with the same `.startState`. This hasn't been caught because all of our testing either does: - One tx per block with public calls, so the initial db state is correct - Passes the *same db* to the public processor and the txHandler (see `TestContext`), so when the handler runs processedTxHandler.addNewTx(processedTx) it actually updates the db of both itself and the public processor EDIT: Note that failure is visible in `prover-client` CI tests [here](https://github.com/AztecProtocol/aztec-packages/actions/runs/11618042996/job/32356873092#step:5:788) --- Fix now included in this PR: - Update `this.db` for each processed tx in `public_processor` (this is eventually queried in `public_kernel_tail_simulator.ts` to get the state reference to use for the kernel tail) - Ensure that if a `txHandler` is supplied, its db is independent of the public processor's
…rics collection (#9754) Simplifies local docker compose using otel-lgtm, allow native metrics collection using local testnet by just setting m flag if the docker compose is running
I had seen a sequencer trying to build a block with 100+ transactions, which results in a death spiral. Also increase the timeout on discv5 for when that is supported across multiple nodes.
This PR removes the RethrowableError hack in the AVM simulator, and relies on the PublicContext's [rethrow](https://github.com/AztecProtocol/aztec-packages/blob/master/noir-projects/aztec-nr/aztec/src/context/public_context.nr#L88) to propagate the errors. There are two caveats. First, because currently Aztec-nr does not keep track of the cause chain, it would be impossible to have the call stack and original contract address available, so that the PXE can interpret the error and show debug information. Solidity has the same problem. I'm introducing a heuristic to keep track of the call stack for the simple case where the caller always rethrows: the simulator keeps a running trace in the machineState, and the caller uses this trace IF the revertData coincides. That is, if you are (re)throwing the same as what we were tracking. Second, while this all works well for errors in user code (e.g., `assert` in Noir), because they generate a revertData with an error selector and data, the "intrinsic" errors from the simulator (aka exceptional halts) do not work as well. E.g., "division by zero", "duplicated nullifier", "l1 to l2 blah blah". These errors are exceptions in typescript and do not have an associated error selector, and do not add to the revertdata. This _could_ be done with enshrined error selectors. That's easy in the simulator, but it's not easy in the circuit for several reasons that are beyond the scope of this description. The current solution is to propagate the error message (the user will see the right error) but you cannot "catch and process" the error in aztec.nr because there is no selector. This is not a limitation right now because there's no interface in the PublicContext that would let you catch errors. To be continued. Part of #9061.
Construct the _hiding_ circuit, which recursively verifies the last folding proof and the decider proof in Client IVC and amend the e2e prover accordingly. The ClientIVC proof becomes a mega proof for the hiding circuit and a goblin proof which simplifies the work required to transform this in a zero knowledge proof.
Per #9746 , mac os compilation on ARM fails by default because of `std::execution::par_unseq`. This PR fixes that by explicitly disabling it on MacOS since apple clang doesn't support it. There's probably a better plug somewhere but I'm not ultra-familiar with how you setup the build system, should this be a preset, should this be based on apple-clang specifically, etc. Can adapt the PR as needed. I also rename DISABLE_TBB here because it seems to me that this should be part of the native C++20 support on other platforms, but I have to admit I'm really unfamiliar with the details here and I'm not sure if intel TBB is needed, or indeed what it does exactly. Can cut this from the PR. Likewise, this turns parallel algorithms ON by default on the same C++20 assumption, but that could well not work.
TODO also stop in prover node
# Change Log - **UDP and TCP load balancers for Validator stateful set replicas** When using `--set network.public=true`, each Validator stateful set replica gets 2 new load balancers. This configuration can also be added for other stateful sets if needed in the future. - **Removed invalid `{{- if not .Values.ethereum.external }}` template code** `not` is not a recognized based case within Helm templating. The code being removed was not functional in Kubernetes deployments. More information: https://helm.sh/docs/chart_template_guide/control_structures/#ifelse - **Removed duplicate Otel env vars in `prover-node.yaml`** This resolves the following Helm output warning during deployment: ``` W1105 14:56:12.948976 57996 warnings.go:70] spec.template.spec.containers[0].env[19]: hides previous definition of "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT", which may be dropped when using apply W1105 14:56:12.949029 57996 warnings.go:70] spec.template.spec.containers[0].env[20]: hides previous definition of "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", which may be dropped when using apply W1105 14:56:12.949040 57996 warnings.go:70] spec.template.spec.containers[0].env[21]: hides previous definition of "OTEL_EXPORTER_OTLP_LOGS_ENDPOINT", which may be dropped when using apply ``` Previous prover ENV VARS: ``` - name: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT value: {{ include "aztec-network.otelCollectorMetricsEndpoint" . | quote }} - name: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT value: {{ include "aztec-network.otelCollectorTracesEndpoint" . | quote }} - name: OTEL_EXPORTER_OTLP_LOGS_ENDPOINT value: {{ include "aztec-network.otelCollectorLogsEndpoint" . | quote }} ... (additional vars excluded for brevity) ... - name: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT value: {{ include "aztec-network.otelCollectorMetricsEndpoint" . | quote }} - name: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT value: {{ include "aztec-network.otelCollectorTracesEndpoint" . | quote }} - name: OTEL_EXPORTER_OTLP_LOGS_ENDPOINT value: {{ include "aztec-network.otelCollectorLogsEndpoint" . | quote }} ```
subrepo: subdir: "barretenberg" merged: "c97c11293d" upstream: origin: "https://github.com/AztecProtocol/barretenberg" branch: "master" commit: "c97c11293d" git-subrepo: version: "0.4.6" origin: "???" commit: "???" [skip ci]
subrepo: subdir: "noir-projects/aztec-nr" merged: "ac58205331" upstream: origin: "https://github.com/AztecProtocol/aztec-nr" branch: "master" commit: "ac58205331" git-subrepo: version: "0.4.6" origin: "???" commit: "???" [skip ci]
Maddiaa0
requested review from
fcarreiro,
dbanks12 and
charlielye
as code owners
November 6, 2024 10:08
Maddiaa0
requested review from
just-mitch
and removed request for
fcarreiro,
charlielye and
dbanks12
November 6, 2024 10:11
just-mitch
reviewed
Nov 7, 2024
@@ -257,6 +260,8 @@ export class LibP2PService extends WithTracer implements P2PService { | |||
heartbeatInterval: config.gossipsubInterval, | |||
mcacheLength: config.gossipsubMcacheLength, | |||
mcacheGossip: config.gossipsubMcacheGossip, | |||
metricsRegister: otelMetricsAdapter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow. Nice work.
just-mitch
approved these changes
Nov 7, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
peers for each topic
Get access to all of the p2p metrics counts to help investigate issues
fixes: #9691
The conversion is non trivial so only gauge operations are activated for now