Add optional server process for caching file hashes #248
Conversation
How do I run clcachesrv.py as a server process from the command line? Will it be possible to dump and load the cache from this process, or how could this be used in a case like AppVeyor where we start from a clean system every time? https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.383/job/5fh1qd2m5h26jf17
The idea is that the server is simply run on the command line. In this rough proof of concept, it opens a named pipe (…). For what it's worth, I'm currently experimenting with a reimplementation of (…).
(force-pushed from 6316458 to d190959)
Current coverage is 89.27% (diff: 31.81%)

|          | master | #248 | diff |
|----------|-------:|-----:|-----:|
| Files    |      1 |    1 |      |
| Lines    |   1015 | 1035 |  +20 |
| Methods  |      0 |    0 |      |
| Messages |      0 |    0 |      |
| Branches |    171 |  173 |   +2 |
| Hits     |    919 |  924 |   +5 |
| Misses   |     68 |   82 |  +14 |
| Partials |     28 |   29 |   +1 |
Closer, but still some errors: https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.398/job/62a67yy2tiaos9fd
Thanks for the notice - I found an exception in line 3023. It seems that the server tries to get the hash for a truncated path. I think the problem is that clcache sends so much data (so many paths) that either the buffer size (currently 64k) was exceeded or the server side did not read it in one chunk, i.e. a single read() call was not enough. I'll play with it some more later today. Thanks for testing the implementation. :-)
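For context, here is a minimal sketch of how the server side might accumulate a complete request across several reads. The framing (newline-separated paths terminated by a blank line) and the read primitive are assumptions for the sake of the example, not necessarily what clcachesrv actually does:

```python
def read_request(read_chunk, bufsize=65536):
    """Accumulate data until a blank line marks the end of the request.

    read_chunk stands for whatever single-read primitive the server uses
    (e.g. a bound pipe.read); a single call may return only part of the
    message, so we loop until the terminator (or EOF) is seen.
    """
    buffer = b''
    while not buffer.endswith(b'\n\n'):
        chunk = read_chunk(bufsize)
        if not chunk:  # peer closed the pipe before the terminator arrived
            break
        buffer += chunk
    return buffer.rstrip(b'\n').decode('utf-8').splitlines()
```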
Yes, if you search for "fail" you will normally find all the real issues quickly... Just note that I don't always fully understand what I am doing, so if I should test it in some different way, just say so ;)
(force-pushed from a8f374b to f14d1a9)
So yes, it looks like something is working :) I actually did not expect it to show any speedup on the second run, since I thought it would be necessary to dump, store and reload the file hashes from clcachesrv to be able to do that... https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.399 msbuild time 36:27.80
@sasobadovinac Thanks for providing those numbers! If I read it correctly, in build 1.0.399 you built FreeCAD twice: (…)
Later, build 1.0.400 started: (…)
That looks like a nice speedup, but I wonder: how much of this is because of setting CLCACHE_SERVER? I.e. how long would a build take (one with an empty cache and then one with a filled cache) if you did not use CLCACHE_SERVER?
@frerich Yes, I do a 64-bit and a 32-bit build every time, and each of them saves/restores its own cache. These are about the "normal" times I am getting for these builds: about 35 min if clcache is not used or the cache is empty (setting up clcache and saving/restoring the cache can add a few minutes, so in this case we end up at about 42 min), and about 15 min if clcache is used on the same commit (no code change, all cache hits). So I don't really see any visible speedup from clcachesrv compared to just running clcache directly. Here is one build to compare https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.355 and one with xxhash https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.360
Interesting! Can you maybe run both a 'cold' and a 'warm' build (i.e. with empty and filled caches), but this time also set the CLCACHE_PROFILE environment variable? You can simply set it to 1 for both runs.
I am just now running a build with clcachesrv where I have fully cleaned and rebuilt the cache, and will run another build next, since that way it shows the best times to compare (I am often just running build after build with different parameters, so the cache gets mixed and big). After that I will run with CLCACHE_PROFILE; should I do that with clcachesrv or plain clcache? I can do both, I just don't know if I'll get to it today :)
A profile with (…) One more thing came to my mind: I see that there is also some CMake output in the logs, so I guess not all of the time is spent on actually compiling things. Maybe you could print some timestamps before/after the actual compilation so that we get a better idea of the relative improvement?
@frerich msbuild prints just the build time at the end, so it is indeed best to compare that ( https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.415/job/fbk1qike6mkxob6l#L8962 ). Also, if you hover your mouse over each line of the report, it will show the time :) But am I doing something wrong, or is clcachesrv not supported by the profiler ( https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.415/job/fbk1qike6mkxob6l#L9252 )?
Thanks for the tip with the tooltips - that's useful, I didn't know that! As for the issue with running the (…): when setting (…)
[profile reports: clcachesrv, clcache]
Thanks! Alas, it appears that (…), i.e. the script is called in the directory (…).
Yes, sorry, I ran it from the directory where I run clcache, and when I saw some output I thought it was OK :) Does this one look good: https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.423 ? It is just a test to make sure that showprofilereport.py is showing correct data; I will rerun the other builds after it...
Yes, that's looking much better. 👍
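As an aside, aggregating the per-invocation profiles could look roughly like the sketch below. This only mirrors the spirit of showprofilereport.py, and the `clcache-*.prof` naming pattern is an assumption:

```python
import glob
import pstats

# Merge all per-invocation profile files and print the hot spots.
# The file pattern is assumed; adjust it to whatever clcache actually produced.
files = glob.glob('clcache-*.prof')
if files:
    stats = pstats.Stats(files[0])
    for name in files[1:]:
        stats.add(name)
    stats.sort_stats('cumulative').print_stats(20)
```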
[profile reports: clcachesrv, clcache]
Thanks! Here's a quick analysis of the profile reports: (…)
One thing to add: I believe this is running with 2 CPU cores.
(force-pushed from 85783f3 to a2ad069)
This makes clcache honour a new CLCACHE_SERVER environment variable; if it is set, clcache will not compute hash sums for files itself but rather expect a running clcachesrv instance. This should improve the runtime for cache hits. A response starting with '!' indicates that an exception occurred; the rest of the response can be deserialised using pickle to get an exception value which can be thrown. What's a bit ugly is that I had to duplicate the pipe name.
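To illustrate the error convention described above, here is a hedged sketch of how a client might interpret a server response. The helper name and the assumption that a successful response is a newline-separated list of hashes are mine, not necessarily the PR's exact code:

```python
import pickle

def parse_server_response(response):
    """Interpret a raw clcachesrv response (bytes).

    A response starting with b'!' carries a pickled exception which is
    re-raised; otherwise the payload is assumed to be newline-separated
    hash sums, one per requested path.
    """
    if response.startswith(b'!'):
        raise pickle.loads(response[1:])
    return response.decode('utf-8').splitlines()
```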
Here are two builds from the commit before... I see you made some updates; should I rerun them? Is the run jobs fix from master included here? https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.479 msbuild time 38:09.34
@sasobadovinac It appears that the version you tried is okay; the changes which happened after that are mostly cosmetic. For what it's worth, it was recently noted that the (…)
I am afraid I don't know how to run it by omitting (…): https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.489/job/l0thxsba4k579pfr
Googling for that error message yields a lot of hits; it's apparently a symptom of a Python installation mixup, which may well be the case here, given that the AppVeyor VMs come with multiple Python installations. My superficial understanding is that the message means you're using a Python version which tries to pick up libraries from a different Python installation. Maybe it's caused by your build script setting PATH to point to a different Python, but not (…). Try setting (…).
For me, clcachesrv now works. I also produced a kind of unbelievable result, but I have not been able to find any problems/errors yet. So here we go: (…)
@akleber Thanks for giving it another try - those numbers are encouraging, I hope there's nothing blatantly wrong. Those timings are better than I expected: the previous profiling report already clearly showed that the runtime for warm caches is dominated by reading files and hashing their contents (for the sake of accessing cache entries), so I kind of expected things to improve. I didn't know how fast (or slow) named pipes would be, though, i.e. what the maximum speedup would be. As for the file monitoring: I just looked at the C code of libuv (which is what the pyuv module is based on) and see that it uses (…)
Knowing this, I'm not sure how to proceed. Maybe we should whitelist/blacklist directories for clcachesrv such that it ignores certain directories (e.g. anything in the build directory or the temporary directory)? I think that would still give a lot of benefit, assuming that common system headers and the includes of 3rd-party libraries account for a large portion of the runtime. Or maybe clcachesrv could be made to watch directories recursively, so that we only watch certain 'base directories' - the downside of this being that we may get a lot more events than expected.
Some ideas: it would make sense to run the caching server during one build only and still get most of its benefits, i.e. in the build job: (…) (roughly as sketched below). On the other hand, if a long-running process is desired, it could also make sense to watch only folders on a whitelist (or, complementarily, ignore those on a blacklist). For example, in our use case, we would watch only the compiler headers, the Windows headers and the folder where we put the external dependencies. Even with those two solutions in place, the first one seems more appealing to distribute on a build node.
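A rough sketch of the "server per build" idea under discussion, assuming clcachesrv.py can simply be started with no arguments and that CLCACHE_SERVER only needs to be set; the build command is just a placeholder:

```python
import os
import subprocess

# Start the hash server only for the duration of one build, then stop it,
# so that no long-running process keeps watching directories that the
# build job later deletes.
server = subprocess.Popen(['python', 'clcachesrv.py'])
try:
    env = dict(os.environ, CLCACHE_SERVER='1')
    subprocess.check_call(['cmake', '--build', '.'], env=env)  # placeholder build command
finally:
    server.terminate()
    server.wait()
```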
There's still the risk that the build tries to delete directories which are being watched, so for the sake of being defensive I tend to favor the whitelisting idea. However, it might be kind of hard to tell which directories to whitelist...
I think for my use case it would be helpful to specify a regular expression of directory paths that should not be watched, e.g. something like (…). Another possibility, although a lot more complicated to implement, might be the following: upon exiting, clcachesrv persists the hash cache, e.g. via pickle, and on start it loads the hash cache and makes sure it is up to date. This way one could have the clcachesrv process running while compiling and not running during, e.g., the cleanup of intermediate directories.
Yes, I think a blacklist would be much easier to configure, and it would work equally well. I think it's much easier to tell which directories are the 'volatile' ones (because your own build system tinkers with them) than to enumerate the (possibly long) list of directories from which you are pulling in headers without even knowing.
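A minimal sketch of what such a blacklist could look like on the server side; the environment variable name and the default pattern are purely illustrative assumptions, not part of this PR:

```python
import os
import re

# Directories matching this pattern would not be watched (and hence not
# have their hashes cached), e.g. volatile build or temp trees.
BLACKLIST = re.compile(
    os.environ.get('CLCACHE_SERVER_BLACKLIST', r'[\\/](build|temp)[\\/]'),
    re.IGNORECASE)

def should_watch(directory):
    """Return True if the directory may be watched for file changes."""
    return BLACKLIST.search(directory) is None
```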
I got some more numbers. The baseline is the Ninja build as before, on the same 24-core machine. clcache is invoked via batch/Python. (…)
Thanks a lot! I'd just like to say that I very much appreciate you reporting all these numbers. The project is all about performance, and when talking about performance you need cold, hard numbers. Alas, I don't have access to build machines which are nearly as big as yours, so I very much appreciate that you share your findings!
This is quite sobering. With clcache and a warm cache, it's basically the same time as without clcache, i.e. avoiding the compilation does not actually help much because computing hash sums eats up all the gain?
...and the other conclusion is that for warm caches, computing the hash sums is a significant overhead for you, no? I.e. the general idea of caching the sums does seem to help for your use case. I forgot - in your use case, do you commonly invoke clcache with single source files or multiple source files? @TiloW seems to have made some very promising progress in #255 towards improving the runtime (much more so than I would have expected...) when invoking clcache with multiple source files.
@frerich As I am using Ninja for my build, which is generated by CMake, I only ever have a single source file per compiler call.
I'm not sure I understand - you're saying that (…)? Maybe I misunderstood something?
Sorry for not being clear.
@akleber Aaah, yes - that's true of course. I think the use case you describe (computing millions of file hashes for a couple thousand different files) is very typical, so the cache maintained by (…). Still, I think there's no real alternative to the file watching feature, because we must notice file changes. However, I can imagine that you indeed don't need any blacklisting/whitelisting just yet but simply start/stop clcachesrv as needed. So, if we assume that blacklisting is not required right now, I wonder - what are the remaining items to be tackled before this work can be merged?
From my side, there are no remaining items right now. I am now happily waiting for this PR to be merged, and for the next release ;-)
In order to make pylint find the module(s) used by clcachesrv, we have to set up the virtual environment first such that all dependencies are available.
Merging this; it appears to work well enough for the first couple of projects, so let's pull it in and see how it goes. Future plans for the server process are to add a little bit of command-line API to configure 'verbose' output (to simplify debugging), and to make use of the 'errors' arguments passed via the pyuv API. All that can be done in subsequent PRs, though.
This somewhat experimental patch introduces a new program called `clcachesrv` to the distribution. It's a server process which opens a named pipe and waits for incoming messages via that pipe - each message is a newline-separated list of file paths. For each message, `clcachesrv` will compute and cache the file hashes. The cached hashes are invalidated automatically, since `clcachesrv` uses a file system monitoring API to detect changes to the files for which the hashes were cached.

The PR also introduces a new `CLCACHE_SERVER` variable which, when set, makes `clcache` use `clcachesrv` instead of calculating the hashes itself.

My hope is that this resolves (or at least improves the situation described in) #239 -- while discussing the unexpectedly poor performance of cache hits, the code for reading files and hashing them turned out to be dominant in the runtime profile. The `clcachesrv` server hopefully improves this, since it will only read and cache files when needed, even across multiple `clcache` invocations.
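For illustration, a client round trip might look roughly like the sketch below (using pywin32). The pipe name, the buffer size and the shape of the response are assumptions made for the example, not necessarily what this PR implements:

```python
import win32file
import win32pipe

PIPE_NAME = r'\\.\pipe\clcachesrv'  # placeholder; the real pipe name may differ

# Connect to the server's named pipe in message read mode.
handle = win32file.CreateFile(
    PIPE_NAME,
    win32file.GENERIC_READ | win32file.GENERIC_WRITE,
    0, None, win32file.OPEN_EXISTING, 0, None)
win32pipe.SetNamedPipeHandleState(
    handle, win32pipe.PIPE_READMODE_MESSAGE, None, None)

# Send a newline-separated list of file paths and read back the reply,
# which is assumed to contain one hash per requested path.
request = '\n'.join([r'C:\project\main.cpp', r'C:\project\util.cpp'])
win32file.WriteFile(handle, request.encode('utf-8'))
_, response = win32file.ReadFile(handle, 65536)
print(response.decode('utf-8').splitlines())
win32file.CloseHandle(handle)
```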