
Set some default resource requests on the workspace pod #698

Closed
blampe opened this issue Oct 1, 2024 · 7 comments
Labels: kind/enhancement (Improvements or new features), resolution/fixed (This issue was fixed)

Comments

@blampe (Contributor)

blampe commented Oct 1, 2024

The manager already has resource limits set, so it currently has guaranteed QoS.

Related to #694 and probably a prerequisite: set a small resource request to give the workspace pod burstable QoS.

Additional considerations:

  • Use GOMEMLIMIT to tell the agent not to use more than the requested memory for itself. Consider setting it using the Downward API. Avoid setting it on the child processes (use SetMemoryLimit in code?).
  • Check for zombie sub-processes.
@cleverguy25's comment has been minimized.
@pulumi-bot pulumi-bot added the needs-triage Needs attention from the triage team label Oct 1, 2024
@blampe blampe added this to the 0.111 milestone Oct 1, 2024
@blampe blampe removed the needs-triage Needs attention from the triage team label Oct 1, 2024
@EronWright (Contributor)

Baseline stats for random-yaml with a 1-minute resync interval:
(image: baseline resource usage)

@EronWright (Contributor) commented Oct 2, 2024

Zombie processes do seem to accumulate in the workspace pod, given a per-minute resync:

pulumi@random-yaml-workspace-0:/$ ps auxwww 
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
pulumi       1  0.0  0.3 1248856 14268 ?       Ssl  16:07   0:01 /share/agent serve --workspace /share/workspace --skip-install
pulumi      46  0.0  0.0      0     0 ?        Z    16:07   0:00 [pulumi-language] <defunct>
pulumi      75  0.0  0.0      0     0 ?        Z    16:07   0:00 [pulumi-language] <defunct>
pulumi     236  0.0  0.0      0     0 ?        Z    16:07   0:00 [pulumi-language] <defunct>
pulumi     256  0.0  0.0      0     0 ?        Z    16:07   0:00 [pulumi-resource] <defunct>
pulumi     271  0.0  0.0      0     0 ?        Z    16:07   0:00 [pulumi-resource] <defunct>
pulumi     400  0.0  0.0      0     0 ?        Z    16:08   0:00 [pulumi-language] <defunct>
pulumi     415  0.0  0.0      0     0 ?        Z    16:08   0:00 [pulumi-resource] <defunct>
pulumi     431  0.0  0.0      0     0 ?        Z    16:08   0:00 [pulumi-resource] <defunct>
pulumi     563  0.0  0.0      0     0 ?        Z    16:09   0:00 [pulumi-language] <defunct>
pulumi     579  0.0  0.0      0     0 ?        Z    16:09   0:00 [pulumi-resource] <defunct>
pulumi     594  0.0  0.0      0     0 ?        Z    16:09   0:00 [pulumi-resource] <defunct>
pulumi     724  0.0  0.0      0     0 ?        Z    16:10   0:00 [pulumi-language] <defunct>
pulumi     739  0.0  0.0      0     0 ?        Z    16:10   0:00 [pulumi-resource] <defunct>
pulumi     753  0.0  0.0      0     0 ?        Z    16:10   0:00 [pulumi-resource] <defunct>
pulumi     886  0.0  0.0      0     0 ?        Z    16:11   0:00 [pulumi-language] <defunct>
pulumi     901  0.0  0.0      0     0 ?        Z    16:11   0:00 [pulumi-resource] <defunct>
pulumi     917  0.0  0.0      0     0 ?        Z    16:11   0:00 [pulumi-resource] <defunct>
pulumi    1044  0.0  0.0      0     0 ?        Z    16:12   0:00 [pulumi-language] <defunct>
pulumi    1059  0.0  0.0      0     0 ?        Z    16:12   0:00 [pulumi-resource] <defunct>
pulumi    1075  0.0  0.0      0     0 ?        Z    16:12   0:00 [pulumi-resource] <defunct>
pulumi    1205  0.0  0.0      0     0 ?        Z    16:13   0:00 [pulumi-language] <defunct>
pulumi    1220  0.0  0.0      0     0 ?        Z    16:13   0:00 [pulumi-resource] <defunct>
pulumi    1236  0.0  0.0      0     0 ?        Z    16:13   0:00 [pulumi-resource] <defunct>
pulumi    1368  0.0  0.0      0     0 ?        Z    16:14   0:00 [pulumi-language] <defunct>
pulumi    1383  0.0  0.0      0     0 ?        Z    16:14   0:00 [pulumi-resource] <defunct>
...

@justinvp (Member) commented Oct 2, 2024

Likely related to pulumi/pulumi#17361

@EronWright (Contributor) commented Oct 2, 2024

These measurements were made after the "zombie" process issue was fixed.

After another hour of periodic execution:
(image: resource usage after one hour)

And another:
(image: resource usage after two hours)

@EronWright (Contributor)

A case of failed updates causing a lot more interactions with the workspace:
(image: resource usage during failed updates)

@EronWright (Contributor)

With all fixes:
(image: resource usage with all fixes)

@mikhailshilkov mikhailshilkov added the kind/enhancement Improvements or new features label Oct 7, 2024
EronWright added a commit that referenced this issue Oct 7, 2024

### Proposed changes

Implements good defaults for the workspace resources, using a
["burstable"](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#burstable)
approach.
Since a workspace pod's utilization is bursty, with low resource usage
while idle and high resource usage during deployment operations, the pod
requests a small amount of resources (64Mi memory, 100m CPU) so that it
can idle. A deployment operation is able to use much more memory, up to
all available memory on the host.
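In pod-spec terms, the burstable defaults described above would look roughly like the following sketch (container name and exact values are assumptions; requests are set but no memory limit, which is what yields the Burstable QoS class):

```yaml
spec:
  containers:
    - name: pulumi
      resources:
        requests:
          memory: 64Mi   # enough to idle between deployment ops
          cpu: 100m
        # no memory limit: deployment ops may burst up to the
        # node's available memory
```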

Users may customize the resources, e.g. to apply different requests
and/or limits. For large or complex Pulumi apps, it might make sense to
reserve more memory and/or use
#694.

The agent takes some pains to stay within the requested amount, using a
programmatic form of the
[GOMEMLIMIT](https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications)
environment variable. The agent detects the requested amount via the
Downward API. We don't set the `GOMEMLIMIT` environment variable itself,
both to avoid propagating it to sub-processes and because the value is
formatted as a Kubernetes 'quantity'.
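One way to expose the memory request to the agent is a Downward API volume; a sketch, with volume name and path as assumptions (note that `resourceFieldRef` with `divisor: "1"` yields a plain integer byte count):

```yaml
volumes:
  - name: podinfo
    downwardAPI:
      items:
        - path: mem_request
          resourceFieldRef:
            containerName: pulumi
            resource: requests.memory
            divisor: "1"   # expose the request as bytes
```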

It was observed that zombie processes weren't being reaped, which was
leading to resource exhaustion. This was fixed by using
[tini](https://github.com/krallin/tini/) as the entrypoint process (PID
1).
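The tini fix amounts to running a minimal init as PID 1 so it can reap the `<defunct>` children seen in the `ps` output above. A hypothetical Dockerfile excerpt (base image and paths are assumptions, not the actual workspace image):

```dockerfile
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y tini

# tini runs as PID 1, forwards signals, and reaps zombie sub-processes
# (pulumi-language, pulumi-resource) left behind by the agent.
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["/share/agent", "serve", "--workspace", "/share/workspace"]
```

An alternative without changing the image is to set `shareProcessNamespace: true` on the pod, in which case the pause container reaps zombies.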

### Related issues (optional)

Closes #698
@EronWright EronWright added the resolution/fixed This issue was fixed label Oct 7, 2024