Support hot upgrade or smooth upgrade, Upgrade smoothly, Gracefully Upgrade, Source cleaning #1579
Users can choose:
|
When SRS is deployed on K8S, the service needs to support upgrades, rollbacks, and canary releases. The basic requirement for all of these is that SRS supports Gracefully Quit/Upgrade. Only when SRS does its own part well can K8S or other release mechanisms meet the requirements of production-level releases. The SRS cluster is divided into Origin and Edge clusters, and the issue can be considered separately for each.
Therefore, we focus on Gracefully Quit in the Edge cluster, which can follow the mechanism used by Nginx.
Nginx starts the new master with execve and lets it inherit the listening file descriptor, which makes the process more complex. SRS can instead use REUSEPORT to start a new process that listens on the same port directly, which makes the solution simpler. Additionally, SRS3 has been released with the following plans:
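As a rough illustration of the REUSEPORT idea described above, the sketch below opens two listeners on the same TCP port with `SO_REUSEPORT`, standing in for an old and a new process. This is not SRS code: the helper name `listen_reuseport` and the loopback address are assumptions made for the example, and it assumes a platform where `SO_REUSEPORT` is available (for example Linux).

```python
import socket


def listen_reuseport(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Open a TCP listener whose port may be shared by other listeners.

    Hypothetical helper for illustration only; not part of SRS.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT must be set on every socket sharing the port, before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


if __name__ == "__main__":
    old = listen_reuseport(0)          # port 0: let the kernel pick a free port
    port = old.getsockname()[1]
    new = listen_reuseport(port)       # a second listener on the same port
    old.close()                        # the "old process" stops accepting
    print("new listener still bound to port", new.getsockname()[1])
    new.close()
```

With this approach the new process never needs to inherit a descriptor from the old one; each process binds independently, and the old one simply closes its listener when it starts to quit.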
|
To publish updates, rollbacks, and gray releases, there are two main requirements for SRS:
The key to Gracefully Quit is to no longer accept new connections and wait for the existing connections to exit. We can achieve this by closing the listening file descriptor (fd) in SRS. Another approach is to remove the backend Pod from the SLB (Server Load Balancer), which will naturally prevent new fds from being created.
|
SRS adds a new signal, SIGQUIT, which means Gracefully QUIT. On SIGQUIT, SRS closes the listening file descriptor (fd) and waits for the existing connections to finish. Once they have finished, it waits for a final period, 3.2 seconds by default, to complete the last cleanup, and then exits. For example, if there are no connections, only the listener needs to be closed.
When there are connections, it will keep waiting.
You can see that the listener is closed while the streaming connection is still open. SRS exits only after this streaming connection has finished. A new configuration item sets the waiting time before exiting, with a default value of 3.2 seconds.
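The quit sequence described above can be modelled roughly as follows. This is a toy state model, not SRS code: the class `GracefulQuit`, its method names, and the use of an explicit `now` timestamp are all assumptions for illustration. Only the behaviour (stop listening, wait for connections, then wait a final 3.2 seconds by default) comes from the description.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class GracefulQuit:
    """Toy model of the SIGQUIT sequence; all names here are hypothetical."""
    final_wait: float = 3.2                  # seconds, the default quoted above
    listening: bool = True
    draining: bool = False
    connections: Set[str] = field(default_factory=set)
    idle_since: Optional[float] = None       # when the last connection went away

    def accept(self, conn_id: str) -> bool:
        # New connections are refused once the listener is closed.
        if not self.listening:
            return False
        self.connections.add(conn_id)
        return True

    def on_sigquit(self) -> None:
        # Close the listener, keep serving the existing connections.
        self.listening = False
        self.draining = True

    def close(self, conn_id: str) -> None:
        self.connections.discard(conn_id)

    def should_exit(self, now: float) -> bool:
        # Exit only when draining, with no connections, after the final wait.
        if not self.draining:
            return False
        if self.connections:
            self.idle_since = None
            return False
        if self.idle_since is None:
            self.idle_since = now
        return now - self.idle_since >= self.final_wait
```

A caller would poll `should_exit` from its main loop with a monotonic clock value. While a streaming connection is still open the method keeps returning `False`, which matches the "keep waiting" behaviour described above.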
|
We also need a configuration because when K8S stops a Pod, after running preStop, it sends a SIGTERM signal to SRS. SIGTERM is the fast quit signal and causes SRS to exit quickly. SRS handles this signal even while it is in the Gracefully Quit period. Therefore, SRS needs a configuration that makes it treat SIGTERM as a Gracefully Quit signal.
This is disabled by default, which means that SRS exits quickly when it receives SIGTERM. The default suits general scenarios, such as origin servers or situations where smooth upgrades are not required.
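The signal handling described above amounts to a small decision table, sketched below. The function `quit_mode` and its return strings are hypothetical names for this example; the flag name mirrors the `force_grace_quit` option mentioned in this issue, and the sketch assumes a Unix platform where `signal.SIGQUIT` exists.

```python
import signal


def quit_mode(signum: int, force_grace_quit: bool = False) -> str:
    """Map a signal to a quit mode (illustrative only, not SRS code)."""
    if signum == signal.SIGQUIT:
        return "graceful"                    # SIGQUIT always means Gracefully QUIT
    if signum == signal.SIGTERM:
        # SIGTERM is a fast quit unless the option promotes it to graceful.
        return "graceful" if force_grace_quit else "fast"
    return "other"                           # other signals are outside this sketch


if __name__ == "__main__":
    print(quit_mode(signal.SIGTERM))
    print(quit_mode(signal.SIGTERM, force_grace_quit=True))
```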
|
SRS3 already supports graceful shutdown. It can also support smooth upgrades in the K8S and SLB architectures. Please refer to: https://github.com/ossrs/srs/wiki/v4_CN_K8s#srs-cluster-update-rollback-gray-release-with-zero-downtime
|
Just need to clean up one Source, as described in other Issues:
For more progress, please refer to: #413
|
Usage
SRS supports two signals: SIGTERM, a fast quit, and SIGQUIT, a Gracefully QUIT.
Enable force_grace_quit to consider SIGTERM as Gracefully QUIT as well.
K8S waits for terminationGracePeriodSeconds, and it will force exit after waiting for this long. If there are no connections, it will wait for grace_final_wait before exiting.
Other
To keep the handling simple, SRS does not clean up in-memory objects when a stream stops, because the stream may be pushed again. Cleaning them up would require complex and careful handling of Source objects, which works against keeping the problem simple.
Not cleaning up Source objects causes memory to grow continuously. This may not be noticeable in scenarios with few publishers and many players. However, in scenarios with many published streams, such as monitoring and conferencing, cleaning up the streams becomes necessary. Reference:
Currently, partial optimizations have been implemented to alleviate this issue.
At the same time, we are also considering the most stable and easiest solution. There is another idea to make SRS support smooth exit and smooth upgrade, roughly as follows:
This way, the old SRS can easily and safely release the created sources and potential other memory issues. Users can smoothly upgrade and exit SRS during off-peak periods according to their business needs, minimizing the impact on users.
The only issue is that when both the new and old SRS are providing services, the API is provided by the new SRS, which means that the system count is not accurate, and the number of users served by the old SRS may be missed.