Build cache #10

Sipkab · 2020-01-12T09:27:06Z

This issue serves as a place to discuss the implementation of the build cache.

As of 2020-01-12, there is a basic implementation of the build cache that passes tests. It is not finished: the build daemons don't support this feature yet, and there is no persistence behind the build caches. There is a memory-based implementation that is used for testing only.

For the implementation, we should consider the following.

Task requirements

Evaluate what requirements we impose on tasks for them to be cacheable.

Communication

Communication with the build cache can be done in two ways: either using the saker.rmi library, or using a more common protocol.

Since using the build cache for purposes other than the saker.build system is not a design goal, the saker.rmi solution seems more appropriate.

Either way, the build cache will be accessible through an abstract interface and the protocol could be replaced without disruption.
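To make the idea of a protocol-independent interface concrete, here is a minimal sketch. The names `BuildCacheAccessor`, `lookup`, `publish` and `MemoryBuildCache` are assumptions made for illustration; they are not the actual saker.build API. Any transport (saker.rmi or something else) would sit behind another implementation of the same interface.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface; the names are illustrative, not the saker.build API.
interface BuildCacheAccessor {
    // Returns the cached bytes for the given key, or empty on a cache miss.
    Optional<byte[]> lookup(String key);

    // Stores the bytes under the given key, replacing any previous entry.
    void publish(String key, byte[] data);
}

// In-memory implementation, similar in spirit to a test-only cache.
final class MemoryBuildCache implements BuildCacheAccessor {
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<byte[]> lookup(String key) {
        byte[] data = entries.get(key);
        // Return a copy so callers cannot mutate the stored entry.
        return data == null ? Optional.empty() : Optional.of(data.clone());
    }

    @Override
    public void publish(String key, byte[] data) {
        // Store a copy so later changes to the caller's array are not visible.
        entries.put(key, data.clone());
    }
}
```

Callers would depend only on `BuildCacheAccessor`, so swapping the memory-based implementation for a remote one should not require changes on the calling side.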

Security

The build caches will usually run on a shared server that is accessible from outside. There need to be security measures that ensure that only authorized clients can access data from the cache, and only authorized clients can publish to it.

The authorization could be implemented using certificates that will be used for an SSL connection with the build cache server. The server examines this certificate, and provides access to the features that the client is allowed to use.

The certificates don't need to be issued by a known provider; they can be managed in-house by the maintainer of the build cache. In general, there should be read and write certificates that the server recognizes.

The read certificates can be used to download content from the build cache, but don't allow publishing. They can be used by the developers. The write certificate allows publishing to the cache. It should be used on CI servers that publish their results to the cache. These results can later be retrieved by the clients of the cache.
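The server-side permission check could look roughly like the sketch below. Everything here is hypothetical: the class and method names are made up for illustration, and the certificate is represented by a fingerprint string (for example a hex digest of the encoded certificate) that the server would obtain from the SSL session. The sketch also assumes that a write certificate implies read access, which the text above does not specify.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

enum CachePermission { READ, WRITE }

// Hypothetical registry that maps certificate fingerprints to permissions.
final class CertificateAuthorizer {
    private final Map<String, Set<CachePermission>> grants = new ConcurrentHashMap<>();

    // Registers a certificate that may only download from the cache.
    void registerReadCertificate(String fingerprint) {
        grants.put(fingerprint, EnumSet.of(CachePermission.READ));
    }

    // Registers a certificate that may publish to the cache.
    // Assumption: write access also implies read access.
    void registerWriteCertificate(String fingerprint) {
        grants.put(fingerprint, EnumSet.of(CachePermission.READ, CachePermission.WRITE));
    }

    // Unknown certificates get no access at all.
    boolean isAllowed(String fingerprint, CachePermission permission) {
        Set<CachePermission> granted = grants.get(fingerprint);
        return granted != null && granted.contains(permission);
    }
}
```

A developer machine would present a certificate registered with `registerReadCertificate`, while a CI server would present one registered with `registerWriteCertificate`.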

Performance

We need to determine the suitable use-cases for using the build cache during build execution. If the build cache is contacted for small incremental changes, then it can degrade performance. However, if we only use the build cache for clean project builds, then it may be used too rarely to provide an advantage.

This part is open for discussion. We should probably use some heuristic to decide when to try the cache.
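As one possible starting point for that discussion, the sketch below compares the estimated cost of running the work locally against the estimated cost of a cache round trip. This is only an illustration of the trade-off, not a proposed design: the class name, the method name and the idea of having usable time estimates are all assumptions.

```java
// Hypothetical policy; how the estimates are obtained is left open.
final class CacheLookupPolicy {
    private final long estimatedRoundTripMillis;

    CacheLookupPolicy(long estimatedRoundTripMillis) {
        this.estimatedRoundTripMillis = estimatedRoundTripMillis;
    }

    // Query the cache only when the local work is expected to take longer
    // than contacting the cache, so small incremental changes skip it.
    boolean shouldQueryCache(long estimatedLocalWorkMillis) {
        return estimatedLocalWorkMillis > estimatedRoundTripMillis;
    }
}
```

With such a policy, a small incremental change would skip the cache, while a clean build of a large project would consult it.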

Settings

In order for the build cache to work, one very likely needs hash based file change tracking.

When a build is run with a build cache, we could change the default mechanism to be hash based instead of file attribute based. The user can still override it, but the default may be changed.
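For illustration, hash-based change tracking can be sketched as follows. The class `ContentHasher` and the choice of SHA-256 are assumptions, not what saker.build uses. The point is that two machines with identical file contents get identical hashes, while file attributes such as modification times generally differ between machines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; SHA-256 is an assumed choice of hash function.
final class ContentHasher {
    // Returns the lowercase hex SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] content) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // A file counts as changed only if its contents hash differently,
    // regardless of its modification time or other attributes.
    static boolean hasChanged(Path file, String previousHash) throws IOException {
        return !sha256Hex(Files.readAllBytes(file)).equals(previousHash);
    }
}
```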

Persistence

The published build cache data should be persisted by the server. The frequently queried data could be kept in memory.

A mechanism for efficient lookup and storing should be implemented. The build cache works with byte blobs and not structured data. We could either use some third party library/software, or implement our own solution. Using third party software may impose license and other maintenance related restrictions on the build system.
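If we implement our own solution, one simple option is a directory of blobs keyed by a hex string. The sketch below is an assumption-laden illustration, not a proposal for the final design: the class name `DirectoryBlobStore`, the two-level directory layout and the requirement that keys are lowercase hex strings are all made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical file-backed blob store; keys are assumed to be lowercase hex.
final class DirectoryBlobStore {
    private final Path root;

    DirectoryBlobStore(Path root) {
        this.root = root;
    }

    private Path pathFor(String hexKey) {
        // Rejecting anything but hex also prevents path traversal via the key.
        if (!hexKey.matches("[0-9a-f]{4,}")) {
            throw new IllegalArgumentException("Invalid key: " + hexKey);
        }
        // The first two characters pick a subdirectory to keep directories small.
        return root.resolve(hexKey.substring(0, 2)).resolve(hexKey);
    }

    void store(String hexKey, byte[] blob) {
        Path target = pathFor(hexKey);
        try {
            Files.createDirectories(target.getParent());
            // Write to a temporary file first and then move it into place, so an
            // interrupted write does not leave a truncated blob under the final name.
            Path temp = Files.createTempFile(target.getParent(), hexKey, ".tmp");
            Files.write(temp, blob);
            Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Optional<byte[]> load(String hexKey) {
        try {
            return Optional.of(Files.readAllBytes(pathFor(hexKey)));
        } catch (NoSuchFileException e) {
            return Optional.empty(); // Cache miss.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping frequently queried data in memory could then be layered on top of such a store, and it avoids third party dependencies at the cost of maintaining the storage code ourselves.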

Sipkab added the roadmap and enhancement labels on Jan 12, 2020
