Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform a shallow clone of templates on first use #1855

Merged
merged 14 commits into from
Jun 6, 2024

Conversation

freakboy3742
Copy link
Member

@freakboy3742 freakboy3742 commented Jun 4, 2024

Cookiecutter's default behavior is to perform a full clone of any repository passed as a URL. This is proving problematic for the macOS and Windows app templates, because the full history contains a lot of old binaries. Unfortunately, cookiecutter doesn't appear to have any setting to perform a shallow clone. However, it can operate on a local file path, and we already have tools to manage branches and remotes in place.

This PR makes 3 changes:

  • It moves all git repo handling to be handled by Briefcase (via GitPython), rather than via cookiecutter. This means than cookiecutter is only ever invoked with a path to a local filesystem checkout
  • The initial checkout of a template repository is a shallow clone
  • The template cache directory is now managed as a Briefcase cache, rather than using the ~/.cookiecutters directory.

This is a partial fix for #933; as it reduces the size of the template clone that is required. The full fix is #1849, as that will only download the stub for the Python version that is in use.

Refs #933.

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

@freakboy3742 freakboy3742 requested a review from mhsmith June 4, 2024 05:59
Copy link
Member

@rmartin16 rmartin16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea but I'm not sure I understand how it's going to work currently.

If we do a shallow clone of the repo, then all Briefcase can know about the repo is the last commit of the default branch. Therefore, it will not be able to checkout the necessary branch for the version of Briefcase....or an arbitrary branch from the CLI.

For instance, trying to use v0.3.18 of the template:

❯ rm -rf build/ ~/.cache/briefcase/templates/briefcase-linux-system-template

❯ briefcase create -vvv --config='template_branch="v0.3.18"'

[helloworld] Finalizing application configuration...
Targeting pop:jammy (Vendor base debian)
Determining glibc version... done
Targeting glibc 2.35
Targeting Python3.10

[helloworld] Generating application template...
Using app template: https://github.com/beeware/briefcase-linux-system-template.git, branch v0.3.18
Cloning template 'https://github.com/beeware/briefcase-linux-system-template.git'...
DEBUG git.cmd: Popen(['git', 'clone', '-v', '--depth=1', '--', 'https://github.com/beeware/briefcase-linux-system-template.git', '/home/russell/.cache/briefcase/templates/briefcase-linux-system-template'], cwd=/home/russell/tmp/beeware/helloworld, stdin=None, shell=False, universal_newlines=True)
DEBUG git.repo.base: Cmd(['git', 'clone', '-v', '--depth=1', '--', 'https://github.com/beeware/briefcase-linux-system-template.git', '/home/russell/.cache/briefcase/templates/briefcase-linux-system-template'])'s unused stdout:
DEBUG git.cmd: Popen(['git', 'remote', 'set-url', '--', 'origin', 'https://github.com/beeware/briefcase-linux-system-template.git', 'https://github.com/beeware/briefcase-linux-system-template.git'], cwd=/home/russell/.cache/briefcase/templates/briefcase-linux-system-template, stdin=None, shell=False, universal_newlines=False)
DEBUG git.cmd: Popen(['git', 'fetch', '-v', '--', 'origin'], cwd=/home/russell/.cache/briefcase/templates/briefcase-linux-system-template, stdin=None, shell=False, universal_newlines=False)

Could not find a template branch for Briefcase v0.3.18.

I don't have a recommended implementation but this GitHub blogpost looks insightful.

src/briefcase/commands/base.py Outdated Show resolved Hide resolved
@rmartin16
Copy link
Member

rmartin16 commented Jun 4, 2024

A separate comment that may warrant independent effort.

While this strategy gets us away from depending on the ~/.cookiecutter directory, the ~/.cookiecutter_replay directory is still utilized. While we don't use replay functionality and I don't see a way to turn it off, we may want to consider using a custom location to avoid this noise in the user's home directory.

The only configuration I can find for this is via a file in the file system; for instance:

❯ cat ~/.cache/briefcase/templates/cookiecutter.yml 
replay_dir: "/home/russell/.cache/briefcase/templates/cookiecutter-replay"

This file path can be passed to cookiecutter() via config_file.

@freakboy3742
Copy link
Member Author

I like this idea but I'm not sure I understand how it's going to work currently.

If we do a shallow clone of the repo, then all Briefcase can know about the repo is the last commit of the default branch. Therefore, it will not be able to checkout the necessary branch for the version of Briefcase....or an arbitrary branch from the CLI.

Dang... I was afraid I might have missed something in my testing. Back to the drawing board, I guess...

@mhsmith
Copy link
Member

mhsmith commented Jun 4, 2024

If we do a shallow clone of the repo, then all Briefcase can know about the repo is the last commit of the default branch. Therefore, it will not be able to checkout the necessary branch for the version of Briefcase....or an arbitrary branch from the CLI.

In a shallow clone, I think that could be done with a git fetch command specifically asking for the version branch, and maybe a --depth argument to the fetch command as well.

But if we're about to remove the binaries from the templates anyway, it might not be worth the effort of getting a shallow clone working. When we remove the binaries, we can force-push the main branch to a commit which either has no history, or a rewritten history with the binaries omitted. After that, as long as we're careful not to add anything large to the templates, a full clone should be no problem.

@mhsmith
Copy link
Member

mhsmith commented Jun 4, 2024

We also noted just now that git clone has --branch and --single-branch options which are independent of the shallow setting.

@freakboy3742
Copy link
Member Author

In a shallow clone, I think that could be done with a git fetch command specifically asking for the version branch, and maybe a --depth argument to the fetch command as well.
...
But if we're about to remove the binaries from the templates anyway, it might not be worth the effort of getting a shallow clone working. When we remove the binaries, we can force-push the main branch to a commit which either has no history, or a rewritten history with the binaries omitted. After that, as long as we're careful not to add anything large to the templates, a full clone should be no problem.

Even if we do this, the full repo history will include the blobs for all the historical branches - so the repo will still be ~500MB, even though only 1MB is actually checked out on any recent branches.

Another option may be a blobless-clone - this would download all the hashes (so there's full branch history), but only download the blobs when they're needed. I'll investigate further.

@freakboy3742
Copy link
Member Author

Looks like replacing depth=1 with filter=[blob:none] does the job without any additional handling. Using the current state of the repos, an initial macOS template clone is 59 MB (236MB without blob filtering). Switching to the v0.3.15 template increases the repo size to 61MB, with the extra blob content being downloaded when the branch is checked out.

I've also explicitly disabled replay usage on the cookiecutter calls; that should disable the use of the replay folder, without the need to write a configuration file.

@rmartin16
Copy link
Member

Looks like replacing depth=1 with filter=[blob:none] does the job without any additional handling. Using the current state of the repos, an initial macOS template clone is 59 MB (236MB without blob filtering). Switching to the v0.3.15 template increases the repo size to 61MB, with the extra blob content being downloaded when the branch is checked out

So, since we're still requiring a non-trivial amount of data to be downloaded and we've taken over the Git clone process from Cookiecutter, do we want to consider implementing a progress bar for GitPython? There's an example here. Before I read your comment fully that there's still 59MB to download....I was surprised when the process seemed to freeze for a minute in my VM.

I'll be able to do more robust testing of this strategy tomorrow.

@freakboy3742
Copy link
Member Author

So, since we're still requiring a non-trivial amount of data to be downloaded and we've taken over the Git clone process from Cookiecutter, do we want to consider implementing a progress bar for GitPython? There's an example here. Before I read your comment fully that there's still 59MB to download....I was surprised when the process seemed to freeze for a minute in my VM.

It's 59MB right now - but once #1849 lands, that should drop dramatically (hopefully to < 4MB, like the Xcode template).

That said, I'm not opposed to adding a progress bar. The only hesitation I have about leaning more heavily on GitPython specifically is that the project is currently marked as "maintenance mode only", so there's an outside chance we may need to replace that package at some point in the not-too-distant future. However, if it's low effort to implement, then we might as well lean on it while we can; if we need to fall back to a simple waitbar in the future, then so be it.

@mhsmith mhsmith removed their request for review June 5, 2024 12:57
@mhsmith
Copy link
Member

mhsmith commented Jun 5, 2024

I'll leave this to @rmartin16 as he's more familiar with this area.

Copy link
Member

@rmartin16 rmartin16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few in-line comments/suggestions.

It's 59MB right now - but once #1849 lands, that should drop dramatically (hopefully to < 4MB, like the Xcode template).

All right...although, if the plan is to rewrite the currently large repos' histories, then this seems like it will mostly be a service for large third-party repos since the benefits will be marginal for Briefcase defaults....which lends credence towards Malcolm's comment about "why do this at all?". However, we have already done it at this point and the change doesn't seem that risky to me.

That said, I'm not opposed to adding a progress bar. The only hesitation I have about leaning more heavily on GitPython specifically is that the project is currently marked as "maintenance mode only", so there's an outside chance we may need to replace that package at some point in the not-too-distant future. However, if it's low effort to implement, then we might as well lean on it while we can; if we need to fall back to a simple waitbar in the future, then so be it.

Briefly looking at GitPython's requirements, I don't think it's trivial....but once the histories are rewritten, the immediate need will be removed anyway...so, may as well push this off as well.

src/briefcase/commands/base.py Outdated Show resolved Hide resolved
src/briefcase/commands/base.py Outdated Show resolved Hide resolved
src/briefcase/commands/base.py Outdated Show resolved Hide resolved
@freakboy3742
Copy link
Member Author

freakboy3742 commented Jun 6, 2024

It's 59MB right now - but once #1849 lands, that should drop dramatically (hopefully to < 4MB, like the Xcode template).

All right...although, if the plan is to rewrite the currently large repos' histories, then this seems like it will mostly be a service for large third-party repos since the benefits will be marginal for Briefcase defaults....which lends credence towards Malcolm's comment about "why do this at all?". However, we have already done it at this point and the change doesn't seem that risky to me.

To be clear - the current plan is to not rewrite the repo histories, as the blobless clone means users won't have any large binary artefacts in their repo as soon as #1849 (and the related template changes) land.

Without this change, #1849 will mean the final checked out template is tiny... but you will still be cloning 200+MB of repo history. Rewriting repo history is one way to address that without this change, but given that we've got a clean solution that results in only downloading what you need, we might as well lean into that.

Copy link
Member

@rmartin16 rmartin16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we haven't spent a lot of time in this code, I spent a bit of time nitpicking....but I'm done digging in after these comments :) I expect to approve once you send it back tonight/in the morning.

changes/933.feature.1.rst Outdated Show resolved Hide resolved
src/briefcase/commands/base.py Outdated Show resolved Hide resolved
src/briefcase/commands/base.py Outdated Show resolved Hide resolved
src/briefcase/commands/base.py Outdated Show resolved Hide resolved
src/briefcase/commands/base.py Show resolved Hide resolved
@freakboy3742
Copy link
Member Author

I've cleaned up those comments and error messages; #1849 has also been merged, and with the new "stables" repo, a clean macOS app template is 1.3 MB. Ready for final(?) review

Copy link
Member

@rmartin16 rmartin16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small typo fix and we're good to go 👍🏼

src/briefcase/commands/base.py Outdated Show resolved Hide resolved
Co-authored-by: Russell Martin <russell@rjm.li>
@freakboy3742 freakboy3742 merged commit 7c5dde1 into beeware:main Jun 6, 2024
44 checks passed
@freakboy3742 freakboy3742 deleted the shallow-clone branch June 6, 2024 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants