Add support for azure-cli v2.53+.
In 2.53 the azure-cli added calls to older API versions. These changes scan the azure-cli code for those versions and add them to the list of versions to keep. This makes the final trimmed azure package slightly bigger, but improves compatibility.
sodul committed Feb 7, 2024
1 parent 530c5ca commit 265c4cd
Showing 5 changed files with 108 additions and 53 deletions.
12 changes: 7 additions & 5 deletions .github/workflows/ci.yml
@@ -31,21 +31,23 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [ '3.8', '3.9' ]
name: Python ${{ matrix.python-version }} sample
python-version: [ '3.8', '3.9', '3.10', '3.11', '3.12' ]
name: Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v2
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Display Python version
run: python -c "import sys; print(sys.version)"
- name: Install
run: make dev-install
- name: pylint
if: matrix.python-version == '3.12'
run: make pylint
- name: pylint
- name: mypy
if: matrix.python-version == '3.12'
run: make mypy
- name: Unit Tests
run: make test
18 changes: 9 additions & 9 deletions README.md
@@ -2,15 +2,15 @@

Simple Python script to purge mostly useless Azure SDK API versions.

The Azure SDK for python is over 600MB and growing. The main reason for the
The Azure SDK for python is 1.2GB and growing. The main reason for the
size and growth is that each release gets added internally and all prior
release are kept. This is a troublesome design which does not seem to be
addressed in the near future. This deleted most but not all API versions as
multiple versions are required for importing the models. This keep a high
compatibility level while trimming more than half of the space used.
compatibility level while trimming half of the space used.

This has been tested with Python 3.9, but the unittests pass with 3.8.
For the Azure versions it has been tested with azure-cli 2.25.0 to 2.27.1.
The latest version has been tested with Python 3.12.1, but the unittests pass with 3.8.
For the Azure versions the latest version has been tested with azure-cli 2.53.0 to 2.57.0.

So Long & Thanks For All The Fish.

@@ -25,10 +25,10 @@ wget https://raw.githubusercontent.com/clumio-code/azure-sdk-trim/main/azure_sdk
```

```shell
pip install git+https://github.com/clumio-code/azure-sdk-trim@v0.1.0#egg=azure-sdk-trim
pip install git+https://github.com/clumio-code/azure-sdk-trim@v0.2.0#egg=azure-sdk-trim
```

The script has been developed and tested with Python 3.9.6, but compatibility
The script has been developed and tested with Python 3.12.1, but compatibility
with older releases of Python 3 should be possible.


@@ -42,9 +42,9 @@ version is a good enough workaround but this could lead to unsuspected behavior,
so we do not intend to add symlinks automatically. We recommend filing bugs
against the upstream maintainers so that they stop pointing to obsolete APIs.

We use newer Python syntax so 3.8 is required, but the code can be modified for
backward compatibility with 3.7 if needed. We will not accept any PR to add
support for unsupported versions of Python (no python 2.7 or 3.5).
We use newer Python syntax so py3.12 is recommended, but the code can be modified for
backward compatibility with 3.8 if needed. We will not accept any PR to add
support for unsupported versions of Python (anything older than 3.8).


## Style Guide
100 changes: 78 additions & 22 deletions azure_sdk_trim/azure_sdk_trim.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python3
#
# Copyright 2021 Clumio, Inc.
# Copyright 2021. Clumio, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -20,32 +20,48 @@

"""Simple script to purge mostly useless Azure SDK API versions.
The Azure SDK for python is over 600MB and growing. The main reason for the
size and growth is that each release gets added internally and all prior
release are kept. This is a troublesome design which does not seem to be
addressed in the near future. This deleted most but not all API versions as
multiple versions are required for importing the models. This keep a high
compatibility level while trimming more than half of the space used.
The Azure SDK for python is 1.2GB and growing. The main reason for the size and
growth is that each release gets added internally and all prior releases are
kept. This is a troublesome design that does not seem to be seriously addressed,
even after being reported and acknowledged for several years.
This tool deletes most but not all API versions as multiple versions depend on
each other, and the az cli itself does not always use the latest versions.
This keeps a high-compatibility level while trimming almost half of the space.
Note that the az cli team has since created their own trim_sdk.py script, linked
below.
So Long & Thanks For All The Fish.
https://github.com/Azure/azure-sdk-for-python/issues/11149
https://github.com/Azure/azure-sdk-for-python/issues/17801
https://github.com/Azure/azure-cli/issues/26966
https://github.com/Azure/azure-cli/blob/dev/scripts/trim_sdk.py
"""

from __future__ import annotations

import argparse
import importlib.util
import json
import logging
import pathlib
import re
import shutil
import subprocess
import sys
from typing import Optional, Sequence
from typing import Sequence

from humanize import filesize

try:
# Import the az cli profiles module, if present, to detect which SDK versions are used.
from azure.cli.core import profiles # type: ignore
except ImportError:
profiles = None

from humanize import filesize # type: ignore

logger = logging.getLogger(__name__)

@@ -69,6 +85,31 @@ def parse_args(argv: Sequence[str]) -> argparse.Namespace:
return parser.parse_args(argv[1:])


def get_az_cli_versions() -> dict[str, str]:
"""Returns list of SDK versions used by the az cli.
The az cli has its own opinionated set of old SDKs to deal with. This is
very unfortunate. We also have to resort to do a runtime import which is a
bad practice, but we need to access the az CLI internals.
"""
if profiles is None:
logger.info('No az cli detected.')
return {}
latest = profiles.API_PROFILES['latest']
versions = {}
for sdk, version in latest.items():
if version is None:
continue
# Use .removeprefix() instead of .replace() when we drop python 3.8 support.
sdk_dir = sdk.import_prefix.replace('azure.', '', 1).replace('.', '/')
version_string = version if isinstance(version, str) else version.default_api_version
version_dir = 'v' + version_string.replace('.', '_').replace('-', '_')
versions[sdk_dir] = version_dir
logger.info('Detected az cli with %d SDKs to keep.', len(versions))
logger.debug(json.dumps(versions, indent=4, sort_keys=True))
return versions


def disk_usage(path: pathlib.Path) -> int:
"""Returns the disk usage size in byte for the given path.
@@ -91,23 +132,30 @@ class VersionedApiDir:
Such directories will contain one or more version folder such as v7.0,
v2020_12_01 or v2021_02_01_preview and a file named models.py which will
imports the models from specific versions. The most recent, default, version
is assumed to be in that list of imports.
import the models from specific versions. The most recent, default, versions
are assumed to be in that list of imports.
We scrape the imports lines in the models file to detect that this is a
multi versioned API with potential folders to be trimmed. The import list is
use to whitelist the folders we need to keep.
We scrape the import lines in the models.py file to detect that this is a
multi-versioned API with potential folders to be trimmed. The import list is
then used as a keep list.
"""

def __init__(self, path: pathlib.Path):
def __init__(self, path: pathlib.Path, base_dir: pathlib.Path) -> None:
self._path = path.parent if path.name == 'models.py' else path
self._versions: Optional[tuple[str, ...]] = None
self._base_dir = base_dir
self._versions: set[str] | None = None
self._keep: set[str] = set()

@property
def path(self) -> pathlib.Path:
"""Returns the path of the API directory."""
return self._path

@property
def sub_dir(self) -> str:
"""Returns the subdirectory of the API directory."""
return str(self.path.relative_to(self._base_dir))

def _parse_models(self) -> tuple[str, ...]:
"""Parse models.py to find which versions are in imported and in use."""
models_path = self._path / 'models.py'
@@ -120,11 +168,18 @@ def _parse_models(self) -> tuple[str, ...]:
versions.append(match.group(1))
return tuple(versions)

def keep(self, versions: str | Sequence[str] = ()):
"""Keep the given versions."""
self._keep.update((versions,) if isinstance(versions, str) else versions)
self._versions = None

@property
def versions(self) -> tuple[str, ...]:
def versions(self) -> set[str]:
"""Returns the versions declared in models.py."""
if self._versions is None:
self._versions = self._parse_models()
versions = set(self._parse_models())
versions.update(self._keep)
self._versions = versions
return self._versions

@property
@@ -153,7 +208,7 @@ def find_api_dirs(base_dir: pathlib.Path) -> set[VersionedApiDir]:
"""Find the API directories with multiple versions."""
api_dirs = set()
for sub_path in base_dir.rglob('models.py'):
api_dir = VersionedApiDir(sub_path)
api_dir = VersionedApiDir(sub_path, base_dir)
if api_dir.is_versioned:
api_dirs.add(api_dir)
return api_dirs
@@ -175,19 +230,20 @@ def purge_old_releases(base_dir: pathlib.Path):
usage = disk_usage(base_dir)
logger.info('%s is using %s.', base_dir, filesize.naturalsize(usage))

cli_versions = get_az_cli_versions()
api_dirs = find_api_dirs(base_dir)
for api_dir in api_dirs:
if api_dir.sub_dir in cli_versions:
api_dir.keep(cli_versions[api_dir.sub_dir])
purge_api_dir(api_dir)

new_usage = disk_usage(base_dir)
logger.info('%s is now using %s.', base_dir, filesize.naturalsize(new_usage))
logger.info('Saved %s.', filesize.naturalsize(usage - new_usage))


def main(argv: Optional[Sequence[str]] = None):
def main(argv: Sequence[str]):
"""Main."""
if argv is None:
argv = sys.argv
args = parse_args(argv)
if args.verbose:
logging.basicConfig(
16 changes: 8 additions & 8 deletions dev-requirements.txt
@@ -5,12 +5,12 @@

-e .

black>=21.6b0
gray>=0.10.1
green>=3.3.0
mypy>=0.910
mypy-extensions>=0.4.3
pyfakefs>=4.5.0
pylint>=2.9.6
pyupgrade>=2.23.3
black>=24.1.1
gray>=0.14.0
green>=4.0.0
mypy>=1.8.0
mypy-extensions>=1.0.0
pyfakefs>=5.3.4
pylint>=3.0.3
pyupgrade>=3.15.0
unify>=0.5
15 changes: 6 additions & 9 deletions setup.py
@@ -16,21 +16,15 @@

"""Setup script for azure-sdk-trim."""

import sys

from setuptools import find_packages
from setuptools import setup

if sys.version_info[0] < 3:
with open('README.md') as fh:
long_description = fh.read()
else:
with open('README.md', encoding='utf-8') as fh:
long_description = fh.read()
with open('README.md', encoding='utf-8') as fh:
long_description = fh.read()

setup(
name='azure-sdk-trim',
version='0.1.0',
version='0.2.0',
description='Python SDK for Clumio REST API',
long_description=long_description,
long_description_content_type='text/markdown',
Expand All @@ -41,6 +35,9 @@
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3 :: Only',
'Programming Language :: Python :: 3.9',
'Programming Language :: Python :: 3.10',
'Programming Language :: Python :: 3.11',
'Programming Language :: Python :: 3.12',
],
packages=find_packages(),
entry_points={'console_scripts': ['azure-sdk-trim=azure_sdk_trim.azure_sdk_trim:main']},
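
Note on the keep-list mapping: the string handling in `get_az_cli_versions()` can be tried in isolation without azure-cli installed. The sketch below is only an illustration. The helper names `version_to_dir` and `import_prefix_to_sub_dir` and the `sample_profile` data are hypothetical, and plain strings stand in for the profile objects that the real code reads from `profiles.API_PROFILES['latest']`.

```python
from __future__ import annotations


def version_to_dir(api_version: str) -> str:
    """Turn an API version string into the version folder name to keep."""
    return 'v' + api_version.replace('.', '_').replace('-', '_')


def import_prefix_to_sub_dir(import_prefix: str) -> str:
    """Turn an import prefix into a path relative to the azure package."""
    # Only the leading 'azure.' is dropped, mirroring .replace('azure.', '', 1).
    return import_prefix.replace('azure.', '', 1).replace('.', '/')


# Hypothetical sample data, not taken from a real azure-cli profile.
sample_profile: dict[str, str | None] = {
    'azure.mgmt.compute': '2022-11-01',
    'azure.mgmt.storage': '2021-02-01-preview',
    'azure.mgmt.unused': None,  # Entries without a version are skipped.
}

keep = {
    import_prefix_to_sub_dir(prefix): version_to_dir(version)
    for prefix, version in sample_profile.items()
    if version is not None
}
print(keep)
```

In the commit itself, the resulting dictionary is matched against `VersionedApiDir.sub_dir`, and any matching folder name is passed to `keep()` before `purge_api_dir()` runs.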