Testing zig solutions is slow on Exercism #63
@ErikSchierboom is this and this the current and complete source of truth for how Exercism runs a container in production?
Yes, that is correct.
Better conform with Dockerfile best practices [1]: if a service can run without privileges, use `USER` to change to a non-root user. See also the Docker docs on `USER` [2], and the Alpine Linux wiki page on creating a new user [3].

The commit [4] that added a zig cache significantly speeds up a `RUN zig test foo` command that's added at the bottom of the Dockerfile. But it doesn't seem to be a speedup in production. I suspect that the zig cache isn't found in production, due to differences in how containers are run (see Exercism's terraform and tooling-invoker repos [5][6]). I suspect that this commit may be insufficient to fix the problem, but it's a step in the right direction.

Possible follow-up work includes:

- Setting zig cache environment variables, or passing zig cache location options to `zig test`
- Changing the paths of solution files
- Updating Zig. I saw a recent commit [7] that touches Cache.zig.

Closes: #60
Closes: #62
Refs: #63

[1] https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
[2] https://docs.docker.com/engine/reference/builder/#user
[3] https://wiki.alpinelinux.org/wiki/Setting_up_a_new_user
[4] 82b56af, 2023-08-23, ".dockerignore, Dockerfile: add a zig cache"
[5] https://github.com/exercism/terraform/blob/3ff7b3b5a2f6/terraform/tooling_invoker/ami.sh#L68-L88
[6] https://github.com/exercism/tooling-invoker/blob/88a9d826a746/lib/tooling_invoker/job_processor/exec_docker.rb#L150-L162
[7] ziglang/zig@020105d0dde6, 2023-08-20, "Cache: Fix findPrefix when paths are slightly out of the ordinary"
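A minimal sketch of that `USER` pattern on Alpine, following the best-practice docs cited in [1][2][3] (the base image tag and paths here are illustrative, not taken from this repo's Dockerfile):

```dockerfile
# Sketch only: create an unprivileged user and switch to it, so the
# service no longer runs as root inside the container.
FROM alpine:3.18
RUN adduser -D ziggy          # BusyBox adduser; -D means no password
USER ziggy
WORKDIR /home/ziggy
```

Everything after the `USER` line (including the container's entrypoint) then runs as `ziggy` instead of root.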
This uses a worse pattern [1], but it might help Zig use the cache. Follow-up from recent commits [2][3].

Refs: exercism#63

[1] https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#use-multi-stage-builds
[2] 82b56af, 2023-08-23, ".dockerignore, Dockerfile: add a zig cache"
[3] b8b9d02, 2023-08-27, "Dockerfile, bin(run-tests), tests: set USER"
Two recent commits [1][2] tried to speed up running user solutions in production, but weren't sufficient. Make the `ziggy` user, rather than the user in the earlier stage, create the cache. This is a worse Dockerfile pattern [3], but it might help Zig use the cache.

A zig cache directory like `~/.cache/zig` contains an `h` subdirectory, with `.txt` files like:

```
0
18137 11111111 1692020106000000000 0d5def4e330aca188b4480e29786a6c0 1 c.zig
18137 11111111 1692020106000000000 0d5def4e330aca188b4480e29786a6c0 1 c.zig
9163 22222222 1692020106000000000 514b9d62f95db9ea86d2cd5008c5d946 1 std/std.zig
9163 22222222 1692020106000000000 514b9d62f95db9ea86d2cd5008c5d946 1 std/std.zig
22609 33333333 1692020106000000000 0538267a6cdadf6d7dadf7693fce1bed 1 std/start.zig
[...]
```

It looks like the format is [4]: a manifest header (`0`), then rows of:

```
{size} {inode} {mtime_ns} {crypto.auth.siphash.SipHash128 hash} {prefix} {sub_path}
```

where the prefixes are commented as:

```
/// A set of strings such as the zig library directory or project source root, which
/// are stripped from the file paths before putting into the cache. They
/// are replaced with single-character indicators. This is not to save
/// space but to eliminate absolute file paths. This improves portability
/// and usefulness of the cache for advanced use cases.
```

But maybe the cache didn't allow what we were doing before this commit: creating it with one user at some location, then moving it to another location to be used by a different user.

If this commit still isn't sufficient, we could try updating Zig. Exercism currently uses the Zig 0.11.0 release, which doesn't include a recent commit to the zig cache system [5].

Refs: #63

[1] 82b56af, 2023-08-23, ".dockerignore, Dockerfile: add a zig cache"
[2] b8b9d02, 2023-08-27, "Dockerfile, bin(run-tests), tests: set USER"
[3] https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#use-multi-stage-builds
[4] https://github.com/ziglang/zig/blob/0.11.0/lib/std/Build/Cache.zig#L840-L851
[5] ziglang/zig@020105d0dde6, 2023-08-20, "Cache: Fix findPrefix when paths are slightly out of the ordinary"
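Reading those fields back out of a sample row makes the layout concrete. The row below is the `std/std.zig` line from the manifest sample, and the field names follow the format guessed from Cache.zig:

```shell
# Sketch: split one zig cache manifest row into its documented fields:
#   {size} {inode} {mtime_ns} {hash} {prefix} {sub_path}
line='9163 22222222 1692020106000000000 514b9d62f95db9ea86d2cd5008c5d946 1 std/std.zig'
set -- $line    # word-split the row into positional parameters
echo "size=$1 inode=$2 mtime_ns=$3"
echo "hash=$4 prefix=$5 sub_path=$6"
```

Note that field 2 is an inode number and field 3 a modification time, both of which are properties of the filesystem the cache was created on, not of the file contents.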
Three recent commits [1][2][3] tried to speed up running user solutions in production, but weren't sufficient. Continue trying.

With Exercism's test runner architecture, the x86_64 CPU that creates the cache is likely to have different features than the x86_64 CPU that runs `zig test` in production. If that caused a cache miss, this commit might help.

Refs: exercism#63

[1] 82b56af, 2023-08-23, ".dockerignore, Dockerfile: add a zig cache"
[2] b8b9d02, 2023-08-27, "Dockerfile, bin(run-tests), tests: set USER"
[3] 3f2c0c2, 2023-08-27, "Dockerfile: create zig cache later"
Three recent commits [1][2][3] tried to speed up running user solutions in production, but weren't sufficient. Continue trying, by running `zig test` indirectly.

Extra advantage: we have a single source of truth for `zig test` options, which helps if we want to change them later. Disadvantage: `docker build` no longer fails if this line produces a failing test. That's why I didn't do it this way originally.

Refs: #63

[1] 82b56af, 2023-08-23, ".dockerignore, Dockerfile: add a zig cache"
[2] b8b9d02, 2023-08-27, "Dockerfile, bin(run-tests), tests: set USER"
[3] 3f2c0c2, 2023-08-27, "Dockerfile: create zig cache later"
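The "single source of truth" idea can be sketched as a small wrapper in the spirit of `bin/run.sh`. The function name and the overridable `ZIG` variable here are illustrative, not the repo's actual script:

```shell
#!/usr/bin/env sh
# Hypothetical sketch: keep the `zig test` invocation (and any options
# added later, e.g. -target or cache flags) in exactly one place, so the
# Dockerfile, CI, and production all run tests the same way.
run_tests() {
  # ZIG is overridable so the sketch can be exercised without zig installed.
  "${ZIG:-zig}" test "$1"
}
```

For example, `ZIG=echo run_tests foo.zig` prints the command that would run instead of executing it.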
Four recent commits [1][2][3][4] tried to speed up running user solutions in production, but weren't sufficient. Continue trying.

With Exercism's test runner architecture, the x86_64 CPU that creates the cache is likely to have different features than the x86_64 CPU that runs `zig test` in production. If that meant that Zig couldn't use the cache, this commit might help.

This commit produces the below diff to a `builtin.zig` file in a `/home/ziggy/.cache/zig/o/` subdirectory in the image. Note the reduced set of CPU features.

```diff
@@ -12,66 +12,22 @@
 pub const single_threaded = false;
 pub const abi = std.Target.Abi.musl;
 pub const cpu: std.Target.Cpu = .{
     .arch = .x86_64,
-    .model = &std.Target.x86.cpu.icelake_server,
+    .model = &std.Target.x86.cpu.x86_64,
     .features = std.Target.x86.featureSet(&[_]std.Target.x86.Feature{
         .@"64bit",
-        .adx,
-        .aes,
-        .allow_light_256_bit,
-        .avx,
-        .avx2,
-        .avx512bw,
-        .avx512cd,
-        .avx512dq,
-        .avx512f,
-        .avx512vl,
-        .bmi,
-        .bmi2,
-        .clflushopt,
         .cmov,
-        .crc32,
-        .cx16,
         .cx8,
-        .ermsb,
-        .f16c,
-        .fast_15bytenop,
-        .fast_gather,
-        .fast_scalar_fsqrt,
-        .fast_shld_rotate,
-        .fast_variable_crosslane_shuffle,
-        .fast_variable_perlane_shuffle,
-        .fast_vector_fsqrt,
-        .fma,
-        .fsgsbase,
-        .fsrm,
         .fxsr,
         .idivq_to_divl,
-        .invpcid,
-        .lzcnt,
         .macrofusion,
         .mmx,
-        .movbe,
         .nopl,
-        .pclmul,
-        .popcnt,
-        .prefer_256_bit,
-        .prfchw,
-        .rdrnd,
-        .rdseed,
-        .rtm,
-        .sahf,
+        .slow_3ops_lea,
+        .slow_incdec,
         .sse,
         .sse2,
-        .sse3,
-        .sse4_1,
-        .sse4_2,
-        .ssse3,
         .vzeroupper,
         .x87,
-        .xsave,
-        .xsavec,
-        .xsaveopt,
-        .xsaves,
     }),
 };
 pub const os = std.Target.Os{
@@ -79,14 +35,14 @@ pub const os = std.Target.Os{
     .version_range = .{ .linux = .{
         .range = .{
             .min = .{
-                .major = 5,
-                .minor = 15,
+                .major = 3,
+                .minor = 16,
                 .patch = 0,
             },
             .max = .{
                 .major = 5,
-                .minor = 15,
-                .patch = 0,
+                .minor = 10,
+                .patch = 81,
             },
         },
```

Refs: #63

[1] 82b56af, 2023-08-23, ".dockerignore, Dockerfile: add a zig cache"
[2] b8b9d02, 2023-08-27, "Dockerfile, bin(run-tests), tests: set USER"
[3] 3f2c0c2, 2023-08-27, "Dockerfile: create zig cache later"
[4] 0636420, 2023-08-27, "Dockerfile: create zig cache via run.sh, not zig test"
It looks like b58066f helped. See #71 (comment). Next I'll try to compile more of the zig stdlib and add it to the image's zig cache. (Edit: that doesn't seem to produce a significant speedup.)
I still don't understand why, if I run it locally:

it takes about 3 seconds on a slow machine (definitely using the container's zig cache), but it takes Exercism about 7-8 seconds to run the tests.

Any ideas?
It must be the container startup time, because https://github.com/exercism/zig-test-runner/actions/runs/5992634417/job/16252228546 indicates a successful deploy, which I can verify by going to https://exercism.org/maintaining/tracks/zig.
Yeah, I know about this. But it's possible for it to vary. For example, it seemed like b58066f really was a consistent 2x speedup over the immediately preceding state.

I believe that Zig builds for the native CPU by default, even in debug mode, with the rationale that otherwise it'd give up performance when running debug builds, which is sometimes important.

Locally, the CPU at execution time was the same as the CPU at build time, so Zig could use the image's cache. But that wasn't true for the image that was deployed in production, because the GitHub Actions CPU features in general won't match those of the AWS machine. So I'm trying to remove other possible differences in production that stop Zig from fully using the image's cache.

For example, I noticed that the cache manifest contains inodes, but I haven't checked yet whether that means the cache cannot be used if transferred to another filesystem at the same path, or whether there's an extra cost if you do that. But it's designed to avoid absolute paths, at least. I'll ask some Zig people if I can't find out myself.
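The inode question is easy to probe with GNU `stat` (a sketch; it only shows that copying a file changes the inode the manifest would have recorded, not how zig reacts to that):

```shell
# Sketch: a copied file gets a fresh inode even on the same filesystem,
# so a cache transplanted by copying could fail an inode-based check.
f=$(mktemp)
stat -c 'inode=%i mtime=%Y' "$f"   # GNU stat; %Y is whole seconds
cp -p "$f" "$f.copy"               # -p preserves mtime, but not the inode
stat -c 'inode=%i' "$f.copy"       # prints a different inode
rm -f "$f" "$f.copy"
```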
Is there anything I can run in production that will figure out the right architecture?
Yes, but:
82b56af is a significant speedup for a `RUN zig test foo` command that's added at the bottom of the `Dockerfile`. But it doesn't seem to be a speedup in production.

Some possible short-term ways to improve:

- Set `USER` (done in b8b9d02)
- Run via `bin/run.sh`, not `zig test` directly (done in 0636420)
- Pass `-target` (done in b58066f)
- Run `zig test` on a `tmpfs`.

Longer term: