Spike: Investigate the use of swap for OCP-4.15 to deal with default memory requirements #861

Open
praveenkumar opened this issue Mar 7, 2024 · 9 comments

@praveenkumar
Member

As per https://docs.openshift.com/container-platform/4.15/nodes/nodes/nodes-nodes-managing.html#nodes-nodes-swap-memory_nodes-nodes-managing it is possible to use swap, but this is in Tech Preview. I tried it out to see how reliably we can start the cluster without increasing the resources on the crc side, because for 4.15 OVN-K is the default and requires more memory (~1.5G) than SDN for the network operator.

  1. Enable the Tech Preview feature set: this can be done using install-config or as a day-2 operation, see https://docs.openshift.com/container-platform/4.15/nodes/clusters/nodes-cluster-enabling-features.html#nodes-cluster-enabling-features-cli_nodes-cluster-enabling
  2. Add the kernel argument swapaccount=1, which can be done with the following machine config:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 05-kernelarg-swapon
spec:
  kernelArguments:
    - swapaccount=1
  3. Apply a custom kubelet configuration to enable swap: label the machine config pool, then create the KubeletConfig that selects it:
oc label machineconfigpool master kubelet-swap=enabled
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: swap-config
spec:
  machineConfigPoolSelector:
    matchLabels:
      kubelet-swap: enabled
  kubeletConfig:
    failSwapOn: false
    memorySwap:
      swapBehavior: UnlimitedSwap
  4. Have a swap partition in the VM
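
For the day-2 variant of step 1, a minimal sketch based on the linked OpenShift documentation is to edit the cluster-scoped FeatureGate resource as below. Keep in mind that, per that documentation, enabling TechPreviewNoUpgrade cannot be undone and prevents minor version updates.

```yaml
# Sketch: enable the Tech Preview feature set on an existing cluster.
# The FeatureGate resource named "cluster" already exists; only spec.featureSet changes.
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
```

When done at install time instead, the same value goes into the `featureSet` field of install-config.yaml.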

After all those steps, swap is used for the workload and takes care of the extra memory requirement, but it comes with the caveats described in https://kubernetes.io/blog/2023/08/24/swap-linux-beta/. On the OpenShift side, enabling the TechPreview feature gate means that everything behind this gate is enabled automatically, which is a lot of things, as mentioned in the doc.

Node resources when swap is on (you can see memory is overcommitted, with swap absorbing the excess). I started this cluster with the default memory setting (9G):

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests       Limits
  --------           --------       ------
  cpu                3245m (85%)    0 (0%)
  memory             9967Mi (117%)  0 (0%)
  ephemeral-storage  0 (0%)         0 (0%)
  hugepages-1Gi      0 (0%)         0 (0%)
  hugepages-2Mi      0 (0%)         0 (0%)
<crc_vm>$ sudo swapon
NAME              TYPE SIZE   USED PRIO
/var/vm/swapfile1 file 7.9G 360.6M   -2
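
For reference, a swap file like the /var/vm/swapfile1 shown above could in principle be provisioned through a MachineConfig that carries a systemd unit. This is an untested sketch and not necessarily how the file was created for this experiment: the MachineConfig name, the unit name, the 8 GiB size, and the Ignition version are all assumptions, and SELinux labeling of the file has not been verified.

```yaml
# Hypothetical sketch: create and activate a swap file at boot on the master node.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 90-master-swapfile          # assumed name
spec:
  config:
    ignition:
      version: 3.2.0                # assumed Ignition spec version
    systemd:
      units:
        - name: swap-provision.service   # assumed unit name
          enabled: true
          contents: |
            [Unit]
            Description=Provision and enable a swap file
            Before=kubelet.service

            [Service]
            Type=oneshot
            RemainAfterExit=yes
            # Create the file only once; the 8 GiB size is an assumption.
            ExecStart=/bin/sh -c 'test -f /var/vm/swapfile1 || (mkdir -p /var/vm && dd if=/dev/zero of=/var/vm/swapfile1 bs=1M count=8192 && chmod 600 /var/vm/swapfile1 && mkswap /var/vm/swapfile1)'
            # Activate the swap file on every boot.
            ExecStart=/bin/sh -c 'swapon /var/vm/swapfile1'

            [Install]
            WantedBy=multi-user.target
```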

Should we go with this option and not update the resource limit on crc side or should we not use it because it is tech preview?

All this is done as day-2 operation on our existing 4.15 bundle so I am not sure how much bundle size increase if we do it.

@cfergeau
Contributor

> Should we go with this option and not update the resource limit on crc side or should we not use it because it is tech preview?

In general swap is no magic bullet, it helps to overcommit, but the price to pay is slower performance. The more you overcommit, the slower your system will get. What is the impact here?

@praveenkumar
Member Author

> In general swap is no magic bullet, it helps to overcommit, but the price to pay is slower performance. The more you overcommit, the slower your system will get. What is the impact here?

@cfergeau Do you mean the impact on cluster performance? I didn't see any, but I also didn't run any workload. The Kubernetes docs already list the caveats: https://kubernetes.io/blog/2023/08/24/swap-linux-beta/#caveats

@cfergeau
Contributor

Is there an impact on cluster startup time?

@gbraad changed the title from "Spike: Can we use swap for OCP-4.15 so we don't have to increase the memory resource" to "Spike: Investigate the use of swap for OCP-4.15 to deal with default memory requirements" on Mar 13, 2024
@gbraad
Collaborator

gbraad commented Mar 13, 2024

I am less concerned about the startup time than about overall use: introducing swap to avoid increasing the default memory might have effects there.


> As such, we do not advocate the utilization of swap memory for workloads or environments that are subject to performance constraints. Furthermore, it is recommended to employ LimitedSwap, as this significantly mitigates the risks posed to the node.

'Performance constraints' might already apply to getting the cluster into a stable state (startup time), though I want to see an actual and representative payload to test this.
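
Following the LimitedSwap recommendation quoted above would only require changing one field of the KubeletConfig from the original post. A minimal sketch, reusing the names and the pool label from that post (it has not been tested on a crc bundle):

```yaml
# Sketch: same KubeletConfig as in the original post, with LimitedSwap
# instead of UnlimitedSwap.
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: swap-config
spec:
  machineConfigPoolSelector:
    matchLabels:
      kubelet-swap: enabled
  kubeletConfig:
    failSwapOn: false
    memorySwap:
      swapBehavior: LimitedSwap
```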

@praveenkumar
Member Author

> Is there an impact on cluster startup time?

During my testing I didn't see any impact but let me create the bundle and then see.

@cfergeau
Contributor

> Enable Tech- preview feature

What are the implications of this? This allows us to use swap, but does this also enable automatically other features we may want or not want?

@praveenkumar
Member Author

> > Enable Tech- preview feature
>
> What are the implications of this? This allows us to use swap, but does this also enable automatically other features we may want or not want?

https://docs.openshift.com/container-platform/4.15/nodes/clusters/nodes-cluster-enabling-features.html#nodes-cluster-enabling-features-about_nodes-cluster-enabling has all the details about which features are automatically enabled (whether we want them or not).

@cfergeau
Contributor

> Pod security admission enforcement. Enables the restricted enforcement mode for pod security admission. Instead of only logging a warning, pods are rejected if they violate pod security standards. (OpenShiftPodSecurityAdmission)

This one might be problematic? Though it looks like we can change back the value to be more permissive.
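
One possible way to relax this per namespace, which may or may not be the value change meant here, is to set the pod security admission labels explicitly and turn off OpenShift's automatic label synchronization. A sketch, with a hypothetical namespace name:

```yaml
# Sketch: make pod security admission permissive for a single namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: my-workload                 # hypothetical namespace name
  labels:
    # Stop OpenShift from managing the pod security labels of this namespace.
    security.openshift.io/scc.podSecurityLabelSync: "false"
    # Allow pods that violate the restricted profile instead of rejecting them.
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: privileged
    pod-security.kubernetes.io/audit: privileged
```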

@praveenkumar
Member Author

After a bit more experimentation, it looks like swap is not as stable as I thought, and stop => start always fails. Filed 2 different issues around swap.
