This repository has been archived by the owner on Feb 4, 2020. It is now read-only.

Add optional server process for caching file hashes #248

Merged: 5 commits, Dec 14, 2016

Conversation

@frerich (Owner) commented Nov 18, 2016

This somewhat experimental patch introduces a new program called clcachesrv to the distribution. It is a server process which opens a named pipe and waits for incoming messages on that pipe; each message is a newline-separated list of file paths. For each message, clcachesrv computes and caches the file hashes. The cached hashes are invalidated automatically: clcachesrv uses the file system monitoring API to detect changes to the files for which hashes were cached.

The PR also introduces a new CLCACHE_SERVER environment variable which, when set, makes clcache use clcachesrv instead of calculating the hashes itself.

My hope is that this resolves (or at least improves) the situation described in #239 -- while discussing the unexpectedly poor performance of cache hits, the code for reading and hashing files turned out to dominate the runtime profile. The clcachesrv server should improve this since it reads and hashes each file only when needed, caching the result across multiple clcache invocations.
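To illustrate the idea (this is a sketch, not the actual implementation -- the function names, the use of MD5, and the reply format are assumptions for illustration), the server-side handling of one message might look like this:

```python
import hashlib

# Hypothetical sketch: each request is a newline-separated list of file
# paths; the reply is a newline-separated list of content hashes, served
# from an in-memory cache where possible.
_hash_cache = {}  # file path -> hex digest

def hash_file(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def handle_message(message):
    paths = message.strip().split("\n")
    hashes = []
    for path in paths:
        if path not in _hash_cache:
            _hash_cache[path] = hash_file(path)  # read + hash only once
        hashes.append(_hash_cache[path])
    return "\n".join(hashes)
```

The point is that the `_hash_cache` dictionary survives across compiler invocations, so each header is read and hashed at most once until it changes on disk.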

@sasobadovinac

How do I run clcachesrv.py as a server process from the command line? Will it be possible to dump and load the cache from this process? And how could this be used in a case like AppVeyor, where we start from a clean system every time? https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.383/job/5fh1qd2m5h26jf17

@frerich (Owner, Author) commented Nov 21, 2016

The idea is that the server is simply run on the command line. In this rough proof of concept, it opens a named pipe (\\.\pipe\clcache_srv) and waits for incoming messages. There is currently no way to make it save or restore the in-memory data, but maybe we will add that later.

For what it's worth, I'm currently experimenting with a reimplementation of clcachesrv based on pyuv, Python bindings to libuv. Maybe this simplifies stuff.
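For reference, running it might look roughly like this (a hypothetical usage sketch based on this thread; the script name and pipe name come from the comments above, while the CLCACHE_SERVER value and the build command are placeholders):

```bat
rem Console 1: start the hash server; it opens \\.\pipe\clcache_srv
python clcachesrv.py

rem Console 2: make clcache ask the server for file hashes, then build as usual
set CLCACHE_SERVER=1
msbuild MyProject.sln
```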

@frerich frerich force-pushed the clcachesrv branch 2 times, most recently from 6316458 to d190959 Compare November 22, 2016 18:36
@codecov-io commented Nov 22, 2016

Current coverage is 89.27% (diff: 31.81%)

Merging #248 into master will decrease coverage by 1.26%

@@             master       #248   diff @@
==========================================
  Files             1          1          
  Lines          1015       1035    +20   
  Methods           0          0          
  Messages          0          0          
  Branches        171        173     +2   
==========================================
+ Hits            919        924     +5   
- Misses           68         82    +14   
- Partials         28         29     +1   

Powered by Codecov. Last update b611ea6...32db227

@sasobadovinac

@frerich (Owner, Author) commented Nov 23, 2016

Thanks for the notice - I found an exception in line 3023.

It seems that the server tries to compute the hash for a truncated path. I think the problem is that clcache sends so much data (so many paths) that either the buffer size (currently 64k) was exceeded or the server side did not read it in one chunk, i.e. a single read() call was not enough. I'll play with it some more later today.

Thanks for testing the implementation. :-)
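The chunked-read problem described above is the classic "one read() is not one message" pitfall. A minimal sketch of a fix (the terminator byte and the helper name are assumptions for illustration, not the actual clcachesrv protocol):

```python
def read_message(read_chunk, terminator=b"\x00", bufsize=65536):
    """Accumulate chunks from read_chunk (a callable returning up to
    bufsize bytes per call) until the terminator is seen, so a request
    larger than one pipe buffer is still received completely."""
    data = b""
    while terminator not in data:
        chunk = read_chunk(bufsize)
        if not chunk:  # peer closed the pipe before finishing the message
            raise ConnectionError("incomplete message")
        data += chunk
    body, _, _trailing = data.partition(terminator)
    return body
```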

@sasobadovinac

Yes, if you search for "fail" you will normally find all the real issues quickly... Just note that I don't always fully understand what I am doing, so if I should test it in some different way, just say ;)

@frerich frerich force-pushed the clcachesrv branch 2 times, most recently from a8f374b to f14d1a9 Compare November 24, 2016 09:42
@sasobadovinac commented Nov 25, 2016

So yes, it looks like something is working :) I actually did not expect it to show any speed-up on the second run, since I thought it would be necessary to dump, store and reload the file hashes from clcachesrv to be able to do that...

https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.399 msbuild time 36:27.80
https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.400 msbuild time 09:27.24

@frerich (Owner, Author) commented Nov 25, 2016

@sasobadovinac Thanks for providing those numbers!

If I read it correctly, in build 1.0.399 you built freecad twice:

  1. First, freecad was built in 64bit mode. The job started with an empty cache and clcachesrv running. The build finished about 42 minutes later and resulted in a cache 741MB in size and 1221 misses (0 hits).
  2. Next, freecad was built in 32bit mode. The job started with restoring (?) a previously populated cache which was 492MB in size and had 1221 entries (all cache misses, no cache hits). 42 minutes later, the build finished -- the cache now contains 2442 entries, i.e. twice the number it had at the beginning. Still no cache hits -- and now the cache is 985MB in size.

Later, build 1.0.400 started:

  1. First, freecad was built in 64bit mode. The job starts by restoring a cache, apparently the cache from the 64bit build in job 1.0.399? The restored cache is 741MB in size with 1221 entries. clcachesrv is started, and the compile starts. 15 minutes later, the build finished. At this point, the cache is still 741MB in size with 1221 entries -- and 1221 hits, i.e. the entire build consisted of cache hits.
  2. Similarly, the 32bit build restores the cache from the previous 32bit build. It also finished 15 minutes later, with 100% cache hits.

That looks like a nice speedup, but I wonder: how much of this is because of setting CLCACHE_SERVER? I.e. how long would a build take (a build with an empty cache and then one with a filled cache) if you would not use CLCACHE_SERVER?

@sasobadovinac commented Nov 25, 2016

@frerich Yes, I do a 64bit and a 32bit build every time, and each of them saves/restores its own cache. These are about the "normal" times I am getting for these builds: about 35 min if clcache is not used or the cache is empty (setting up clcache and saving/restoring the cache can add a few minutes, so in this case about 42 min), and about 15 min if clcache is used on the same commit (no code change, all cache hits).

So I don't really see any visible speed up because of clcachesrv compared to just running clcache directly. Here is one to compare https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.355 and one with xxhash https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.360

@frerich (Owner, Author) commented Nov 25, 2016

Interesting! Could you maybe run both a 'cold' and a 'warm' build (i.e. with an empty and a filled cache), but this time also set the CLCACHE_PROFILE environment variable and then use the showprofilereport.py script to get a runtime profile? That would allow us to determine where the time is spent for cache misses (i.e. during the cold build) and cache hits (during the warm build).

You can simply set CLCACHE_PROFILE to 1, then run clcache as usual, and at the end just run showprofilereport.py -- it will collect all generated profile data and print a report to standard output.

@sasobadovinac commented Nov 25, 2016

I am just now running a build with clcachesrv where I have fully cleaned and rebuilt the cache, and I will run another build next, since this way it shows the best times to compare (I am often just running build after build with different parameters, so the cache gets mixed and big). After that I will run with CLCACHE_PROFILE; should I do that with clcachesrv or clcache? I can do both, I just don't know if today :)

@frerich (Owner, Author) commented Nov 25, 2016

A profile with CLCACHE_SERVER set (and clcachesrv running) would be perfect, thank you!

One more thing came to my mind: I see that in the logs there is also some CMake output, so I guess not all of the time is spent on actually compiling stuff. Maybe you could print some time stamps before/after the actual compilation so that we get a better idea of the relative improvement?

@sasobadovinac commented Nov 25, 2016

@frerich msbuild writes the build time at the end; it is indeed best to compare this ( https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.415/job/fbk1qike6mkxob6l#L8962 ). Also, if you hover your mouse over each line of the report it will show the time :)

But am I doing something wrong, or is clcachesrv not supported by the profiler ( https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.415/job/fbk1qike6mkxob6l#L9252 )?

@frerich (Owner, Author) commented Nov 25, 2016

Thanks for the tip with the tooltips - that's useful! I didn't know that!

As for the issue with running the showprofilereport.py: it's a Python script, so I think all that's missing is adding python in front of the command line. Or maybe it's executed in the wrong working directory.

When setting CLCACHE_PROFILE, clcache will generate a .prof file for each invocation which contains the runtime information. The showprofilereport.py script then recursively aggregates all .prof files in the current working directory to build a report.
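The aggregation step can be approximated with the standard library alone. This is a sketch of what such a report script might do (the real showprofilereport.py may differ in details):

```python
import glob
import pstats

def print_profile_report(directory="."):
    # Collect all .prof files below the given directory (recursively),
    # merge them into one statistics object, and print an aggregate report.
    prof_files = glob.glob(directory + "/**/*.prof", recursive=True)
    if not prof_files:
        return None
    stats = pstats.Stats(prof_files[0])
    for extra in prof_files[1:]:
        stats.add(extra)  # runtimes of all invocations are summed
    stats.sort_stats("cumulative").print_stats(20)
    return stats
```

Because `pstats.Stats.add` sums the recorded times, running this over the profiles of many concurrent clcache invocations yields a total that can exceed the wall-clock time of the build, as seen in the numbers discussed later.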

@sasobadovinac commented Nov 26, 2016

@frerich (Owner, Author) commented Nov 26, 2016

Thanks! Alas, it appears that showprofilereport.py was not called in the right working directory: I see output like this:

python showprofilereport.py
Reading C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-46129633486cca23d9f911ff568eff96.prof...
Reading C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-61714fd51d7a8a307763744808a886d8.prof...
Reading C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-7a083a10cc7a2bb17dcca48e072ba07d.prof...
Sat Nov 26 02:22:16 2016    C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-46129633486cca23d9f911ff568eff96.prof
Sat Nov 26 02:22:15 2016    C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-61714fd51d7a8a307763744808a886d8.prof
Sat Nov 26 02:22:15 2016    C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-7a083a10cc7a2bb17dcca48e072ba07d.prof
         48931 function calls (48915 primitive calls) in 0.407 seconds

I.e. the script is called in the directory C:\Program Files (x86)\Microsoft Visual Studio 12.0 or so, not in the actual build directory of your application. Hence, it only picks up three .prof files (i.e. the runtime data of three compiler invocations) and thus generates a very short report. It seems the directory C:\projects\freecad\build might be better, maybe you can give that a try?

@sasobadovinac commented Nov 26, 2016

Yes, sorry - I ran it from where I run clcache, and when I saw some output I thought it was OK :) Does this one look good: https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.423 ? It is just a test to make sure showprofilereport.py is showing correct data; I will rerun the other builds after it...

@frerich (Owner, Author) commented Nov 26, 2016

Yes, that's looking much better. 👍

@sasobadovinac commented Nov 26, 2016

@frerich (Owner, Author) commented Nov 26, 2016

Thanks! Here's a quick analysis of the profile reports:

  1. Without clcachesrv, the profile report says that 1243.841 seconds were spent on executing the Python code in clcache; this is about 21 minutes, i.e. longer than the entire job. This probably happens because clcache is called with multiple source files, in which case it reinvokes itself for each source file and builds them concurrently, so the runtime of each concurrent invocation is summed on top of the parent process' runtime. Most of that runtime (924 seconds) is spent in WaitForSingleObject, and 329 of those seconds are caused by the waitForAnyProcess function in clcache waiting for any of the recursively launched processes to finish. So I think it's safe to say that the 'effective' runtime is more like 1243-329 = 914 seconds.

  2. With clcachesrv, 990 seconds were spent in the clcache Python code -- 820 seconds in WaitForSingleObject alone! 230 of those seconds are caused by waitForAnyProcess, i.e. if we deduct that we are left with 990-230 = 760 seconds. So that would be a 20% improvement, but I wonder whether the runtime of the clcachesrv script should be included (I don't think so though; the runtime of clcachesrv is implicitly contained in the runtime of the getFileHashes function since it waits for the server).

@sasobadovinac

One thing to add: I believe this is running with 2 CPU cores.

@frerich frerich force-pushed the clcachesrv branch 2 times, most recently from 85783f3 to a2ad069 Compare December 6, 2016 15:28
This makes clcache honour a new CLCACHE_SERVER environment variable; if
it is set, clcache will not compute hash sums for files itself but
rather expect that there is a clcachesrv instance running. This should
improve the runtime for cache hits.

A response starting with '!' indicates that an exception occurred; the rest of the response can be deserialised using pickle to get an exception value which can be thrown.

What's a bit ugly is that I had to duplicate the pipe name.
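The '!'-prefix convention from the commit message above can be sketched like this (the helper names and exact framing are assumptions for illustration):

```python
import pickle

def encode_response(result=None, error=None):
    # A reply starting with b"!" carries a pickled exception object;
    # any other reply is the normal payload.
    if error is not None:
        return b"!" + pickle.dumps(error)
    return result

def decode_response(response):
    if response.startswith(b"!"):
        raise pickle.loads(response[1:])  # re-raise on the client side
    return response
```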
@sasobadovinac commented Dec 6, 2016

Here are two builds from the commit before... I see you made some updates; should I rerun them again? Is the run jobs fix from master included here?

https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.479 msbuild time 38:09.34
https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.480 msbuild time 09:30.97

@frerich (Owner, Author) commented Dec 6, 2016

@sasobadovinac It appears that the version you tried is okay; the changes which happened after that are mostly cosmetic. For what it's worth, it was recently noted that the --onefile argument for pyinstaller makes things quite a bit slower. You could try omitting it and see whether it improves things for you.

@sasobadovinac commented Dec 7, 2016

I am afraid I don't know how to run it without --onefile on AppVeyor :\

https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.489/job/l0thxsba4k579pfr

@frerich (Owner, Author) commented Dec 7, 2016

Googling for that error message yields a lot of hits, it's apparently a symptom of a Python installation mixup -- which may well be the case, given that the AppVeyor VMs come with multiple Python installations. My superficial understanding is that the message means that you're using some Python version which tries to pick up libraries from a different Python installation. Maybe caused by your build script setting PATH to point to a different Python, but not PYTHONPATH (or PYTHONHOME).

Try setting PYTHONPATH and/or PYTHONHOME to C:\Python34; does that help?

@akleber (Contributor) commented Dec 7, 2016

For me, clcachesrv now works. I also produced a somewhat unbelievable result, but I was not able to find any problems/errors yet. So here we go:
Our build without clcache takes 45 min. With clcache and a cold cache it now takes 35 min, which I do not yet understand. But the best part is that the build with clcache and a warm cache now takes 7 min. clcachesrv was started before the cold-cache build and ran until after the warm-cache build. Its memory footprint was 23 MB at the end. 7 min is unexpectedly fast, so please take these numbers with a grain of salt until I have verified them.
Also, I had to disable the directory watching. It was not needed for the test because no files change. I needed to do this because the watching seems to keep some kind of handle open, so that I am not able to delete watched directories. In our build directory, headers are being generated and subsequently watched. In preparation for the next build, this build directory is deleted, which failed with an "Access denied" error. So I had to remove the watching feature for this cold/warm test. In production, however, this feature is absolutely necessary to invalidate hash cache entries. I have not looked into this code, but maybe @frerich has an idea?

@frerich (Owner, Author) commented Dec 7, 2016

@akleber Thanks for giving it another try - those numbers are encouraging; I hope there's nothing blatantly wrong. Those timings are better than I expected: a previous profiling report already clearly showed that the runtime for warm caches is dominated by reading files and hashing their contents (for the sake of accessing cache entries), so I kind of expected things to improve. I didn't know how fast (or slow) named pipes would be though, i.e. what the maximum speedup would be.

As for the file monitoring: I just looked at the C code of libuv (which is what the pyuv module is based on) and see that it uses ReadDirectoryChangesW on Windows to implement the file system monitoring. This function roughly works by accepting a directory name and a callback and then the callback is invoked whenever anything in that directory changes. Some blog article I found explains:

Another potential pitfall of this function [ReadDirectoryChangesW] is that the referenced directory itself is now “in use” and so can't be deleted. To monitor files in a directory and still allow the directory to be deleted, you would have to monitor the parent directory and its children.

Knowing this, I'm not sure how to proceed. Maybe we should support a whitelist/blacklist of directories for clcachesrv such that it ignores certain directories (e.g. anything in the build directory or the temporary directory)? I think that would still give a lot of benefit, assuming that common system headers and includes of 3rd party libraries account for a large portion of the runtime.

Or maybe clcachesrv can be made to watch directories recursively, so that we only watch certain 'base directories' -- the downside being that we may get a lot more events than expected.

@siu (Contributor) commented Dec 8, 2016

Some ideas:

It would make sense to run the caching server during one build only and still get most of its benefits, i.e. in the build job: clcachesrv start; (make|ninja|msbuild...); clcachesrv shutdown. It could actually reduce the number of cases the server has to handle, like an ever-growing number of watched folders/files, folders being in use, etc.

On the other hand, if a long running process is desired, it could also make sense to watch only folders in a white list (or complementarily ignore from a black list). For example, in our use case, we would watch only the compiler headers, windows headers and the folder where we put the external dependencies.

Even with those two solutions in place, the first one seems more appealing to distribute on a build node.

@frerich (Owner, Author) commented Dec 8, 2016

It would make sense to run the caching server during one build only and still get most of its benefits. I.e. in the build job: clcachesrv start; (make|ninja|msbuild...); clcachesrv shutdown.

There's still the risk that the build is trying to delete directories which are being watched, so for the sake of being defensive I tend to favor the whitelisting idea.

However, it might be kind of hard to tell what directories to whitelist...

@akleber (Contributor) commented Dec 9, 2016

I think for my use case it would be helpful to specify a regular expression for directory paths that should not be watched, e.g. something like --exclude ".*/build/.*". I do not have a list of paths I would like to be watched or not watched, but I can describe via a regular expression what the paths that get deleted look like. This should be fairly simple to implement.
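The proposed --exclude option could be as small as this sketch (the helper name is hypothetical; only the example pattern comes from the comment above):

```python
import re

def make_watch_filter(exclude_pattern):
    # Directories whose forward-slash-normalised path matches the
    # pattern are never registered with the file system watcher.
    rx = re.compile(exclude_pattern)
    def should_watch(directory):
        return rx.match(directory.replace("\\", "/")) is None
    return should_watch
```

With the pattern ".*/build/.*", anything under a build directory would be skipped while system include directories would still be watched.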

Another possibility, although a lot more complicated to implement, might be the following: upon exit, clcachesrv persists the hash cache, e.g. via pickle, and on start it loads the hash cache and makes sure it is up to date. This way one could have the clcachesrv process running while compiling, but not while, e.g., intermediate directories are being cleaned up.
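That persist-and-revalidate idea might look like this sketch (storing the mtime next to each hash is an assumption for illustration; the real cache layout may differ):

```python
import os
import pickle

def save_hash_cache(cache, path):
    # cache maps file path -> (mtime at hashing time, hex digest)
    with open(path, "wb") as f:
        pickle.dump(cache, f)

def load_hash_cache(path):
    with open(path, "rb") as f:
        cache = pickle.load(f)
    # Drop entries whose files vanished or changed while the server was down.
    fresh = {}
    for file_path, (mtime, digest) in cache.items():
        if os.path.exists(file_path) and os.path.getmtime(file_path) == mtime:
            fresh[file_path] = (mtime, digest)
    return fresh
```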

@frerich (Owner, Author) commented Dec 9, 2016

I think for my use case it would be helpful to specify a regular expression of directory paths I would like not to be watched.

Yes, I think a blacklist would be much easier to configure, and it would work equally well. It's much easier to tell which directories are the 'volatile' ones (because your own build system tinkers with them) than to enumerate the (possibly long) list of directories from which you are pulling in headers, possibly without even knowing.

@akleber (Contributor) commented Dec 12, 2016

I got some more numbers. The baseline is the Ninja build as before, on the same 24-core machine. clcache is invoked via batch/python.
Without clcache, the compile time is 33 min. With clcache and a cold cache it's 45 min, and with a warm obj cache it's 30 min.
With clcachesrv, a cold obj cache and a cold hash cache, the time is 37 min (8 min improvement from clcachesrv). With a warm obj cache and a warm hash cache, the time is 8 min (22 min improvement from clcachesrv).
As I was still not convinced that the overhead of header hashing in clcache should be so large, I disabled the hash cache inside clcachesrv. So I was using clcachesrv, but it essentially did what clcache normally does, just with a named pipe in between. This build took 30 min, which is the same time as clcache with a warm obj cache and without clcachesrv. So the conclusion here is that using named pipes has no notable performance penalty in my environment.
I think my next step will be to implement my proposed blacklisting functionality and try to use clcache/clcachesrv in our CI build system, to get a better feeling for the effect of clcache with actual header/source changes. My simple rebuild tests are nice as some kind of baseline, but in the end they are not what is going on in our CI system...

@frerich (Owner, Author) commented Dec 12, 2016

I got some more numbers.

Thanks a lot! I'd just like to say that I very much appreciate you reporting all these numbers. The project is all about performance, and when talking about performance you need cold, hard numbers. Alas, I don't have access to build machines which are nearly as big as yours, so I'm very glad you share your findings!

Without clcache the compiletime is 33min. With clcache and a cold cache its 45min and with a warm obj cache its 30min.

This is quite sobering. With clcache and a warm cache, it's basically the same time as without clcache, i.e. avoiding the compilation does not actually help much because computing hash sums eats up all the gain?

So the conclusion here is, that using named pipes has no notable performance penalty in my environment.

...and the other conclusion is that for warm caches, computing the hash sums is a significant overhead for you, no? I.e. the general idea of caching the sums does indeed seem to help for your use case.

I forgot -- in your use case, do you commonly invoke clcache with single source files or multiple source files? @TiloW seems to have made some very promising progress in #255 to improve the runtime (much more so than I would have expected...) when invoking clcache with multiple source files.

@akleber (Contributor) commented Dec 13, 2016

@frerich As I am using Ninja for my build, which is generated by CMake, I only ever have a single source file per compiler call.
I just created PR #259 with the exclude-from-watching feature; I did not find out how I could contribute to this PR directly.
However, I now think I do not even need this feature. The build time with a warm obj cache and a cold vs. warm hash cache is the same in my environment. Thinking about it, I might not even need the watching feature at all if I start the clcachesrv process before my build and kill it afterwards.

@frerich (Owner, Author) commented Dec 13, 2016

However I now think I do not even need this feature. The build time with a warm obj cache and a cold vs warm hash cache is the same in my environment.

I'm not sure I understand - you're saying that clcachesrv does not seem to make a difference in your environment? I wonder why that is, since in an earlier comment you wrote

With clcachesrv, a cold obj cache and a cold hash cache the time is 37min (8min improvement by clcachesrv). With a warm obj cache and a warm hash cache the time is 8min (22min improvement by clcachesrv).

Maybe I misunderstood something?

@akleber (Contributor) commented Dec 14, 2016

Sorry for not being clear. clcachesrv improves the performance tremendously for our use case. What I was trying to say was that for us it is not important whether the clcachesrv-internal hash cache is warm or cold.
Previously (without hard numbers) I thought that for maximum efficiency we would need to have clcachesrv running constantly to keep its in-memory hash cache warm, using its directory watching feature to keep the cache consistent with the files on disk. That's why I implemented the directory exclude feature, so that the directory watching does not interfere with our CI setup.
After implementing this, I went back to verify whether it is important that clcachesrv has a warm hash cache. I tested with a warm clcache obj cache, and it turned out that for us it does not make a difference whether the clcachesrv hash cache is warm at the beginning of the build.
Previously I saw that during a build, clcache computes about 2 million file hashes, from, I would guess, maybe 10000 different files. So it seems the time to compute each hash once in clcachesrv is small compared to the improvement when it can then serve the hash from its in-memory cache.
Hope that clears it up a little bit.

@frerich (Owner, Author) commented Dec 14, 2016

@akleber Aaah, yes - that's true of course. I think the use case you describe (computing millions of file hashes for a couple thousand different files) is very typical, so the cache maintained by clcachesrv is typically small and 'warms up' very quickly.

Still, I think there's no real alternative to the file watching feature because we must notice file changes. However, I can imagine that you indeed don't need any blacklisting/whitelisting just yet but simply start/stop clcachesrv for the builds.

So, if we assume that any blacklisting is not required right now, I wonder - what are the remaining items to be tackled before this work could be merged?

@akleber (Contributor) commented Dec 14, 2016

From my side, there are no remaining items right now. I am now happily waiting for this PR to be merged and the next release ;-)

Frerich Raabe added 2 commits December 14, 2016 09:00
In order to make pylint find the module(s) used by clcachesrv, we have to set up the virtual environment first such that all dependencies are available.
@frerich (Owner, Author) commented Dec 14, 2016

Merging this; it appears to work well enough for the first couple of projects, so let's pull it in and see how it goes. Future plans for the server process are to add a little bit of command line API to configure 'verbose' output (to simplify debugging), and to make use of the 'errors' arguments passed via the pyuv API. All that can be done in subsequent PRs though.
