Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent doing excessive file system checks in require calls #40890

Merged
merged 1 commit into from
Jun 2, 2021

Conversation

KristofferC
Copy link
Member

Ref #40570

Some background on the code loading code:

The parsing of TOML files used to be implemented completely stateless. When asking for example what UUID a package the TOML parsing system would look through the TOML files line by line until it found it. Asking for another package, it would start from scratch again. This caused these files to have to be opened and closed a large number of times #27414 (comment).

In #36018 code loading was made to use a real TOML parser and in #37906 a cache was introduced to avoid having to reparse the same TOML file over and over.

However, a lot of the "statelessness" from the old code loading still exists. For example, every time load_path is called, Julia goes through the file system and looks around for project files. This is called many many times for a single using call. One way to measure how much julia hits the file system is by running with strace and looking for different stat calls:

❯ strace -o stat_master julia -e 'using GR_jll'

❯ cat stat_master | grep -c statx
9322

Looking in the file we can for example see

lstat("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/home/kc", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/home/kc/.julia", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
lstat("/home/kc/.julia/environments", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
lstat("/home/kc/.julia/environments/v1.6", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
lstat("/home/kc/.julia/environments/v1.6/Project.toml", {st_mode=S_IFREG|0664, st_size=3711, ...}) = 0

being called over and over (corresponding to

realpath(project_file)
).

This PR assumes that within a single require calls, things like the load path, the content of Project files, etc will not change. This is a bit similar to the restriction we have on "world age".

With this PR we get:

❯ strace -o stat_pr ./julia -e 'using GR_jll'

❯ cat stat_pr | grep -c stat
2252

which is a quite significant reduction.

I admit that the implementation here is ugly but I consider this just as an ephemeral state of things for 1.7. For 1.8 I want to do a proper update of the code loading and do something similar that was started in #37632 to propagate path info to precompile workers so that only the main worker has to do any package location lookups.

@mkitti perhaps you could do some measurements on Windows and see if this improves #40570?

@KristofferC KristofferC added the packages Package management and loading label May 20, 2021
@KristofferC KristofferC force-pushed the kc/load_path_cache_v1 branch from 28994e9 to 4ea7216 Compare May 20, 2021 14:19
@mkitti
Copy link
Contributor

mkitti commented May 20, 2021

I hope to have some measurements by this weekend. Thanks for your work on this.

@KristofferC
Copy link
Member Author

Might as well @nanosoldier runtests(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt

@KristofferC
Copy link
Member Author

Looks like there are some load path things that need to be looked at, but there also seems to be some big problems with PkgEval getting stuck when installing dependencies that I think are unrelated to this PR.

@mkitti
Copy link
Contributor

mkitti commented May 21, 2021

Maybe I should hold off benchmarking then?

@KristofferC
Copy link
Member Author

Nah, the performance will be the same I'm pretty sure. I must just have missed something small.

@KristofferC KristofferC force-pushed the kc/load_path_cache_v1 branch from 65571ab to eb6cb24 Compare May 24, 2021 09:22
@KristofferC
Copy link
Member Author

The bug should be fixed, let's try again

@nanosoldier runtests(ALL, vs = ":master")

@KristofferC KristofferC force-pushed the kc/load_path_cache_v1 branch from eb6cb24 to 1172c6b Compare May 24, 2021 11:41
@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
packages Package management and loading
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants