
Adding ProfilerService to Node for diagnostics. Closes #223 #225

Merged: 5 commits into main, Aug 7, 2023

Conversation

edavalosanaya
Collaborator

Added a ProfilerService for the Node and made non-breaking changes to DataChunk, ProcessorService, and PollerService to compute latency and payload sizes.

Still need to add a method to enable/disable diagnostic logging from the front-facing API of the Engine.

…nges in non-breaking DataChunk, ProcessorService, and PollerService to compute latency and payload sizes.
@edavalosanaya edavalosanaya self-assigned this Aug 5, 2023
…low making unittest transferring the data from the Node->Worker->Manager.
… to the Orchestrator, the NodeDiagnostics dataclass was embedded to the NodeState.
@edavalosanaya edavalosanaya marked this pull request as ready for review August 7, 2023 01:41
@edavalosanaya edavalosanaya merged commit 662d976 into main Aug 7, 2023
2 checks passed
@edavalosanaya edavalosanaya deleted the 223-diagnostics-service branch August 7, 2023 01:41
@edavalosanaya
Collaborator Author

These are the new configuration options in the chimerapyrc.yaml:

diagnostics:
  deque-length: 10000
  interval: 10
  logging-enabled: false

Instead of adding a new parameter to the class constructor, you can enable the logging of diagnostics info via the logging-enabled variable. The diagnostics information is transmitted from Node->Worker->Manager->Orchestrator at a fixed interval, set via the interval config variable (not sure whether 10 seconds is too fast or just right).
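The interval-based reporting described above can be sketched as a background asyncio task. This is a minimal illustration, not the actual ChimeraPy code; the `collect` and `send` callables stand in for whatever the real ProfilerService and Worker expose:

```python
import asyncio

# Hypothetical sketch of interval-based diagnostics reporting: a background
# task wakes up every `interval` seconds and forwards a diagnostics snapshot
# up the Node->Worker->Manager chain, until `stop` is set.
async def report_loop(interval: float, collect, send, stop: asyncio.Event):
    while not stop.is_set():
        await asyncio.sleep(interval)
        send(collect())

# Usage (names are illustrative):
#   asyncio.create_task(report_loop(10, profiler.snapshot, worker.send, stop_event))
```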
These are the agreed metrics (NodeDiagnostics):

latency (ms)
payload_size (KB)
memory_usage (KB)
cpu_usage (%)
num_of_steps (int)
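The agreed metrics could be carried in a dataclass along these lines. This is a hypothetical sketch whose field names simply mirror the list above; the actual NodeDiagnostics definition in the codebase may differ:

```python
from dataclasses import dataclass

# Hypothetical sketch of the NodeDiagnostics dataclass (fields mirror the
# agreed metrics; not necessarily the actual definition).
@dataclass
class NodeDiagnostics:
    latency: float = 0.0       # average step latency, in ms
    payload_size: float = 0.0  # total payload transmitted, in KB
    memory_usage: float = 0.0  # process memory usage, in KB
    cpu_usage: float = 0.0     # process CPU utilization, in %
    num_of_steps: int = 0      # steps completed during the interval
```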

In the Latency Node implementation, all the metrics were computed during the step. Now, in the ProfilerService, only the latency and payload size information is extracted during the step. Memory, CPU, and num_of_steps are determined at fixed intervals (with an async timer). Latency and payload information are computed after receiving data from predecessor nodes, i.e., source nodes don't have these metrics, while step and sink nodes do.

Since multiple steps might occur during the fixed interval, the DataChunk's payload size and latency metrics are stored in deques. The reported latency is the average of all recorded instances, and the reported payload size is the sum of all the DataChunks' payloads.
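The per-interval aggregation described above boils down to an average over one deque and a sum over the other. A minimal sketch, assuming a hypothetical `summarize` helper rather than the actual ProfilerService API:

```python
from collections import deque

# Hypothetical helper showing the aggregation rule: latency is averaged
# over all recorded steps, payload size is summed across all DataChunks
# seen during the interval.
def summarize(latencies: deque, payload_sizes: deque):
    avg_latency = sum(latencies) / len(latencies) if latencies else 0.0
    total_payload = sum(payload_sizes)
    return avg_latency, total_payload

# maxlen corresponds to the deque-length config option above
latencies = deque([2.0, 4.0, 6.0], maxlen=10000)  # ms
payloads = deque([1.5, 0.5, 2.0], maxlen=10000)   # KB
summarize(latencies, payloads)  # → (4.0, 4.0)
```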
