Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fast/warm-reboot] add cpufreq.default_governor=performance to BOOT_OPTIONS #3435

Merged
merged 2 commits into from
Oct 14, 2024

Conversation

stepanblyschak
Copy link
Contributor

@stepanblyschak stepanblyschak commented Jul 19, 2024

What I did

Set cpufreq.default_governor to performance for faster boot time. We observe consistent 1 sec improvement across several devices.

NOTE: This will apply to upgrades starting from 202405 since this is set in shutdown path to avoid any extra scripts running at boot time. Upgrade from older versions/branches will require a runtime patch to fast-reboot and warm-reboot script.

DEPENDS ON: sonic-net/sonic-buildimage#19634

How I did it

Append this option to BOOT_OPTIONS variable.

How to verify it

Run fast-reboot or warm-reboot. Check:

admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance

After boot is finalized check that it is reset back to default:

admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
schedutil

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

…PTIONS

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@vaibhavhd
Copy link
Contributor

We observe consistent 1 sec improvement across several devices.

What platform and T0 topology were this change tested on? Can you share what changed in the boot up sequence when default_governor=performance?

We need to make sure that this inbuilt performance improvement will not lead to some unexpected bootup sequence, such as race conditions, unexpected preemption sequence, etc.

1s improvement is too tricky to be rightfully attributed to this change. How certain are we that 1s improvement is due to this change? If we know that a certain sequence that got changed due to enabling performance mode, can we not just address that as an enhancement, rather than depending on the default_governor=performance


# Set governor to performance to speed up boot process.
# The governor is reset back to kernel default in warmboot-finalizer script.
BOOT_OPTIONS="${BOOT_OPTIONS} cpufreq.default_governor=performance"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that you don't add this as a default behavior. This needs to be set by an option provided to warm/fast-reboot script.

That way we can tightly control when/if this is used.

Copy link
Contributor Author

@stepanblyschak stepanblyschak Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vaibhavhd

I don't think an optimization is an optinal feature.

We already did enhancements which are enabled by default, e.g. LACP retry count, that even had a larger impact as the changes are applied also on the peer side.

Also, from the testing perspective we would need to test all combinations of possible optimization flags user may pass to the script.

From the user perspective, it becomes unclear what flags he needs to use, what is heavily tested and what is not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need an architecture check? Is this supported on armhf and arm64 as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be useful in some scenarios and I think user should be given an option to control this behavior. Some examples I can think of are:

  1. High CPU usage during boot up path.
  2. High thermal load due to use new file location for sys eeprom and sfp eeprom on msn2700 #1.
  3. If this mode has dependency on underlying hardware then it depends on the hardware if this is supported.

Mostly these concerns are not critical on new platforms. But some old/slow platforms do not really want this high CPU, high fan activity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vaibhavhd @saiarcot895 I limit this setting for nvidia where this is tested and working.

@stepanblyschak
Copy link
Contributor Author

@vaibhavhd

We observe consistent 1 sec improvement across several devices.

What platform and T0 topology were this change tested on?

Results were shared over emails.

Can you share what changed in the boot up sequence when default_governor=performance?

Could you please clarify what information you are requesting? This PR does not change the boot sequence.

We need to make sure that this inbuilt performance improvement will not lead to some unexpected bootup sequence, such as race conditions, unexpected preemption sequence, etc.

I consider this a good thing, if this change exposes unexpected bootup sequence or a race condition we will catch it and fix it rather then waiting for a hidden bug to occur in production with .1% repro rate without this change.
Please note, we are not requesting this enhancement for a release branch.

1s improvement is too tricky to be rightfully attributed to this change. How certain are we that 1s improvement is due to this change? If we know that a certain sequence that got changed due to enabling performance mode, can we not just address that as an enhancement, rather than depending on the default_governor=performance

fast-reboot test gives a consistent result with deviations less then 1s on nvidia platform. Another our platform with another CPU gives 1.5 sec improvement. A reduction of the average time may be considered an improvement. If you'd like us to prove it with statistics, please suggest the number of runs that will raise the confidence.

@bingwang-ms
Copy link
Contributor

@vaibhavhd Can you please help review?

@vaibhavhd
Copy link
Contributor

@saiarcot895 please review this PR.

@vaibhavhd
Copy link
Contributor

Why is this being done only for reboot? Why not for service warm-restart? Has it been considered what other scenarios will/may benefit from this?

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@stepanblyschak
Copy link
Contributor Author

Why is this being done only for reboot? Why not for service warm-restart? Has it been considered what other scenarios will/may benefit from this?

@vaibhavhd I do not find performance issues with service warm-restart

@vaibhavhd vaibhavhd merged commit 319f58d into sonic-net:master Oct 14, 2024
7 checks passed
qiluo-msft pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Nov 4, 2024
#### Why I did it

Set cpufreq.default_governor to *performance* for faster boot time. We observe consistent 1 sec improvement across several devices.

The change in finalize-warmboot.sh restores the default governor after fast or warm boot is finished.

**NOTE**: This will apply to upgrades starting from 202405 since this is set in shutdown path to avoid any extra scripts running at boot time. Upgrade from older versions/branches will require a runtime patch to fast-reboot and warm-reboot script.

#### How I did it

After fast or warm boot is finished restore to default governor.

#### How to verify it

Run fast-reboot or warm-reboot. Check:
```
admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
```

After boot is finalized check that it is reset back to default:
```
admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
schedutil
```

Tested with sonic-net/sonic-utilities#3435
rkavitha-hcl pushed a commit to rkavitha-hcl/sonic-buildimage that referenced this pull request Nov 15, 2024
…9634)

#### Why I did it

Set cpufreq.default_governor to *performance* for faster boot time. We observe consistent 1 sec improvement across several devices.

The change in finalize-warmboot.sh restores the default governor after fast or warm boot is finished.

**NOTE**: This will apply to upgrades starting from 202405 since this is set in shutdown path to avoid any extra scripts running at boot time. Upgrade from older versions/branches will require a runtime patch to fast-reboot and warm-reboot script.

#### How I did it

After fast or warm boot is finished restore to default governor.

#### How to verify it

Run fast-reboot or warm-reboot. Check:
```
admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
```

After boot is finalized check that it is reset back to default:
```
admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
schedutil
```

Tested with sonic-net/sonic-utilities#3435
aidan-gallagher pushed a commit to aidan-gallagher/sonic-buildimage that referenced this pull request Nov 16, 2024
…9634)

#### Why I did it

Set cpufreq.default_governor to *performance* for faster boot time. We observe consistent 1 sec improvement across several devices.

The change in finalize-warmboot.sh restores the default governor after fast or warm boot is finished.

**NOTE**: This will apply to upgrades starting from 202405 since this is set in shutdown path to avoid any extra scripts running at boot time. Upgrade from older versions/branches will require a runtime patch to fast-reboot and warm-reboot script.

#### How I did it

After fast or warm boot is finished restore to default governor.

#### How to verify it

Run fast-reboot or warm-reboot. Check:
```
admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
```

After boot is finalized check that it is reset back to default:
```
admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
schedutil
```

Tested with sonic-net/sonic-utilities#3435
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Dec 4, 2024
…9634)

#### Why I did it

Set cpufreq.default_governor to *performance* for faster boot time. We observe consistent 1 sec improvement across several devices.

The change in finalize-warmboot.sh restores the default governor after fast or warm boot is finished.

**NOTE**: This will apply to upgrades starting from 202405 since this is set in shutdown path to avoid any extra scripts running at boot time. Upgrade from older versions/branches will require a runtime patch to fast-reboot and warm-reboot script.

#### How I did it

After fast or warm boot is finished restore to default governor.

#### How to verify it

Run fast-reboot or warm-reboot. Check:
```
admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
```

After boot is finalized check that it is reset back to default:
```
admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
schedutil
```

Tested with sonic-net/sonic-utilities#3435
mssonicbld pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Dec 5, 2024
#### Why I did it

Set cpufreq.default_governor to *performance* for faster boot time. We observe consistent 1 sec improvement across several devices.

The change in finalize-warmboot.sh restores the default governor after fast or warm boot is finished.

**NOTE**: This will apply to upgrades starting from 202405 since this is set in shutdown path to avoid any extra scripts running at boot time. Upgrade from older versions/branches will require a runtime patch to fast-reboot and warm-reboot script.

#### How I did it

After fast or warm boot is finished restore to default governor.

#### How to verify it

Run fast-reboot or warm-reboot. Check:
```
admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
```

After boot is finalized check that it is reset back to default:
```
admin@sonic:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
schedutil
```

Tested with sonic-net/sonic-utilities#3435
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants