This repository has been archived by the owner on Sep 14, 2021. It is now read-only.
use http-server-stabilizer + 4 worker subprocesses
This change makes syntect_server resilient to the two classes of problems we've seen in production usage of it:

1. Some specific language grammar/file pairs can cause syntect to panic internally. This is usually because syntect doesn't implement a specific sublime-syntax feature in some way and it [panics instead of returning result types](trishume/syntect#98).
2. Much rarer, some specific language grammar/file pairs can cause syntect to get stuck in an infinite loop internally, never returning and consuming an entire CPU core until it is restarted manually.

Previously we tried to solve #1 through stack unwinding (c5773da), but once the 2nd issue above also appeared, stack unwinding proved insufficient on its own. It is still useful, though, because it can do per-request recovery of the first failure scenario above, and as such it will be added back in.

Even without stack unwinding, http-server-stabilizer helps in both cases above by running and monitoring replicas of syntect_server. See the README in https://github.com/slimsag/http-server-stabilizer for details.

It is important to note that all this does is stop these individual file failures from harming other requests to syntect_server. They are still issues on their own. Logging and Prometheus monitoring are now in place so that we can identify when a failure occurs and in which file, track down the issue, and make small reproduction cases to file and fix upstream.

Since only one instance of syntect_server was previously running and we now run multiple, more memory is needed. Each instance requires about 1.1 GB at peak (depending on which languages are used). The default is now to run 4 workers, so 4 x 1.1 GB = 4.4 GB is the minimum required and 6 GB is suggested.

If only one worker is run (by setting the env var `WORKERS=1`), stability is still greatly improved, since the 2nd failure case above can only last a short period of time instead of lasting until the container is restarted manually.
Part of sourcegraph/sourcegraph#5406
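To illustrate why moving work into monitored subprocesses covers both failure classes, here is a minimal sketch in Python. It is not how http-server-stabilizer or syntect_server is implemented (see that project's README for its actual behavior); the function name `run_isolated`, the outcome labels, and the timeout values are all hypothetical and chosen only for illustration. A child process that exits abnormally stands in for a panic, and a child that never returns stands in for the infinite loop.

```python
import subprocess
import sys


def run_isolated(code: str, timeout_s: float) -> str:
    """Run `code` in a child Python process and classify the outcome.

    Hypothetical helper for illustration only: the child stands in for
    one worker handling one request.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so a stuck
        # worker stops consuming CPU once the deadline passes.
        return "timed_out"
    # A non-zero exit code stands in for a worker that panicked.
    return "ok" if result.returncode == 0 else "crashed"


if __name__ == "__main__":
    # Healthy request.
    print(run_isolated("print('highlighted')", 5.0))
    # Failure class 1: the worker dies; the supervisor keeps running.
    print(run_isolated("raise RuntimeError('simulated panic')", 5.0))
    # Failure class 2: the worker spins forever; the deadline ends it.
    print(run_isolated("while True: pass", 1.0))
```

In both failure cases the supervising process survives and can keep serving other requests, which is the property described above; the underlying bug in the worker is still there and still needs to be logged and fixed.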