PyPI caching mirror
- Host a proxy PyPI mirror server with caching
- Cache the index (package list and packages' file list)
- Cache the package files
- Support multiple indices
- Set index cache times-to-live (individually for each index)
- Set files cache max-size on disk
- Manually invalidate index cache
pip install proxpi
Install coloredlogs as well to get coloured logging.
Either run flask locally:
FLASK_APP=proxpi.server flask run
Or use Docker
docker run -p 5000:5000 epicwink/proxpi
See flask run --help for more information on address and port binding, and certificate
specification to use HTTPS. Alternatively, bring your own WSGI server.
Use pip's index-URL flag to install packages via the proxy:
pip install --index-url http://127.0.0.1:5000/index/ simplejson
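To manually invalidate the index cache, send a DELETE request for a package or for the package list:
curl -X DELETE http://127.0.0.1:5000/cache/simplejson
curl -X DELETE http://127.0.0.1:5000/cache/list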
If you need to invalidate a locally cached file, restart the server: files should never change in a package index.
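The same invalidation requests can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_invalidate_request` and `invalidate` are hypothetical, and the base URL assumes the local server address used in the curl examples above.

```python
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # assumed: address of the local proxpi server


def build_invalidate_request(target, base_url=BASE_URL):
    """Build a DELETE request for <base_url>/cache/<target>.

    `target` is a package name such as "simplejson", or "list",
    mirroring the two curl examples above.
    """
    url = f"{base_url.rstrip('/')}/cache/{target}"
    return urllib.request.Request(url, method="DELETE")


def invalidate(target, base_url=BASE_URL):
    """Send the request and return the HTTP status code (hypothetical helper)."""
    with urllib.request.urlopen(build_invalidate_request(target, base_url)) as response:
        return response.status
```

Calling `invalidate("simplejson")` requires a running server; `build_invalidate_request` can be inspected without one.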
- PROXPI_INDEX_URL: index URL, default: https://pypi.org/simple/
- PROXPI_INDEX_TTL: index cache time-to-live in seconds, default: 30 minutes. Disable index-cache by setting this to 0
- PROXPI_EXTRA_INDEX_URLS: extra index URLs (comma-separated)
- PROXPI_EXTRA_INDEX_TTLS: corresponding extra index cache times-to-live in seconds (comma-separated), default: 3 minutes, cache disabled when 0
- PROXPI_CACHE_SIZE: size of downloaded package files cache (bytes), default 5GB. Disable files-cache by setting this to 0
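To illustrate how the comma-separated extra index URLs and their times-to-live correspond position by position, here is a small sketch. It is not proxpi's own parsing code: the function `pair_extra_indices` is hypothetical, the example hosts are placeholders, and raising an error on mismatched lengths is an assumption rather than documented behaviour.

```python
import os

DEFAULT_EXTRA_TTL = 180  # 3 minutes, the documented default for extra indices


def pair_extra_indices(environ=os.environ):
    """Pair each extra index URL with its time-to-live in seconds."""
    raw_urls = environ.get("PROXPI_EXTRA_INDEX_URLS", "")
    urls = [u.strip() for u in raw_urls.split(",") if u.strip()]

    raw_ttls = environ.get("PROXPI_EXTRA_INDEX_TTLS", "")
    if raw_ttls.strip():
        ttls = [int(t) for t in raw_ttls.split(",")]
    else:
        # No TTLs given: fall back to the default for every extra index
        ttls = [DEFAULT_EXTRA_TTL] * len(urls)

    if len(ttls) != len(urls):
        # Assumption: one TTL is expected per URL
        raise ValueError("expected one TTL per extra index URL")
    return list(zip(urls, ttls))


example = {
    "PROXPI_EXTRA_INDEX_URLS": "https://a.example/simple/,https://b.example/simple/",
    "PROXPI_EXTRA_INDEX_TTLS": "60,0",  # second index: cache disabled
}
pairs = pair_extra_indices(example)
```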
proxpi was designed with three goals (particularly for continuous integration (CI)):
- to reduce load on PyPI package serving
- to reduce pip install times
- to not require modification to the current workflow
Specifically, proxpi was designed to run for CI services such as Travis, Jenkins,
GitLab CI, Azure Pipelines and GitHub Actions.
proxpi works by caching index requests (ie which versions, wheel-types, etc are
available for a given package; the index cache) and the package files themselves (to a
local directory; the package cache). This means it serves identical requests from the
cache after the first request, and provides no benefit for a single pip install.
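The idea of an index cache with a time-to-live can be sketched as follows. This is an illustration only, not proxpi's actual implementation: the class `TTLCache`, its `fetch` callback and the injectable `clock` are all hypothetical names introduced for the example.

```python
import time


class TTLCache:
    """Illustrative time-to-live cache for index lookups."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self.fetch = fetch    # callable that queries the upstream index
        self.ttl = ttl        # seconds; 0 disables caching
        self.clock = clock    # injectable for testing
        self._entries = {}    # key -> (expiry time, value)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]          # still fresh: serve from cache
        value = self.fetch(key)      # first request, or entry expired
        if self.ttl > 0:
            self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Drop a cached entry so the next request goes upstream."""
        self._entries.pop(key, None)
```

Only the second and later requests for the same package within the time-to-live avoid the upstream call, which is why a single pip install gains nothing.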
As a basic end-user of these services, for at least most of them you won't be able to
keep a proxpi server running between multiple invocations of your project's CI
pipeline: CI invocations are designed to be independent. This means the best you can do
is start the cache for just the current job.
A more advanced user of these CI services can bring their own runner (personally, my
needs are for running GitLab CI). This means you can run proxpi on a fully-controlled
server (eg EC2 instance), and proxy PyPI requests (during a pip command) through the
local cache. See the instructions below.
Hopefully, in the future these CI services will all implement their own transparent
caching for PyPI. For example, Azure already has Azure Artifacts, which provides much
more functionality than proxpi, but won't reduce pip install times for CI services not
using Azure.
This implementation leverages pip's configurable index URL and Docker networks.
This is to be run on a server you have console access to.
- Start a GitLab CI Docker runner using their documentation
- Run the proxpi Docker container:
  docker run -d --name proxpi epicwink/proxpi:latest
  You don't need to expose a port (the -p flag) as we'll be using the internal Docker (bridge) network.
- Discover the proxpi container's IP address:
  docker inspect proxpi
  The relevant value is at $[0].NetworkSettings.Networks.bridge.IPAddress
- Set pip's index URL to the proxpi server by setting it in the runner environment. Add PIP_INDEX_URL=http://<IPAddress>:5000/index/ and PIP_TRUSTED_HOST=<IPAddress> to runners.environment in the GitLab CI runner configuration TOML. For example, you may end up with the following configuration:
  [[runners]]
  name = "awesome-ci-01"
  url = "https://gitlab.com/"
  token = "SECRET"
  executor = "docker"
  environment = [
    "DOCKER_TLS_CERTDIR=/certs",
    "PIP_INDEX_URL=http://172.17.0.3:5000/index/",
    "PIP_TRUSTED_HOST=172.17.0.3",
  ]
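The IP lookup and the resulting environment entries from the steps above could be scripted along these lines. This is a sketch, not part of proxpi: the function names are hypothetical, port 5000 is taken from the examples above, and `inspect_container` assumes the docker CLI is on the PATH.

```python
import json
import subprocess


def inspect_container(name="proxpi"):
    """Return the raw JSON printed by `docker inspect <name>`."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def bridge_ip(inspect_output):
    """Read $[0].NetworkSettings.Networks.bridge.IPAddress from the JSON."""
    data = json.loads(inspect_output)
    return data[0]["NetworkSettings"]["Networks"]["bridge"]["IPAddress"]


def runner_environment(ip, port=5000):
    """Build the entries to add to runners.environment."""
    return [
        f"PIP_INDEX_URL=http://{ip}:{port}/index/",
        f"PIP_TRUSTED_HOST={ip}",
    ]


# Trimmed, made-up sample of `docker inspect` output, for illustration only
sample = '[{"NetworkSettings": {"Networks": {"bridge": {"IPAddress": "172.17.0.3"}}}}]'
entries = runner_environment(bridge_ip(sample))
```

On the runner host, `runner_environment(bridge_ip(inspect_container()))` would produce the entries for the live container.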
This is designed to not require any changes to the GitLab CI project configuration (ie
.gitlab-ci.yml), unless it already sets the index URL for some reason (if that's the
case, you're probably already using a cache).
Another option is to set up a proxy, but that's more effort than the above method.