Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert the submodule star-sys/STAR to a git subtree #29

Merged
merged 594 commits into from
Aug 14, 2020
Merged

Conversation

sjackman
Copy link
Contributor

@sjackman sjackman commented Aug 13, 2020

Remove the Git submodule star-sys/STAR.
Add the Git subtree star-sys/STAR/source.

Thanks to Lance for this suggestion!

Files under star-sys/STAR that are not under star-sys/STAR/source have been removed, in particular the large binaries in star-sys/STAR/bin. There's thankfully no need to rewrite the history of Orbit. Since star-sys/STAR was a Git submodule, once it has been removed, its Git objects will not be fetched by default. The Git commit history of STAR, and the Orbit-specific changes to STAR, has been preserved.

This change addresses two issues:

  1. the ~500 MB of large binaries in star-sys/STAR/bin
  2. the cargo bug git submodules are not cached git submodules are not cached rust-lang/cargo#7987

The challenges are that git subtree is an uncommon tool and will be unfamiliar to most developers, and merging upstream STAR into Orbit is somewhat more involved. https://gist.github.com/tswaters/542ba147a07904b1f3f5#fetching-upstram

See https://www.atlassian.com/git/tutorials/git-subtree
and https://stackoverflow.com/questions/23937436/add-subdirectory-of-remote-repo-with-git-subtree
and https://jrsmith3.github.io/merging-a-subdirectory-from-another-repo-via-git-subtree.html
and https://gist.github.com/tswaters/542ba147a07904b1f3f5

alexdobin and others added 30 commits December 1, 2017 11:43
…ctory path) for the file names in --readFilesIn . Implemented read group ID output as the last column of the Chimeric.out.junction file.
Don't exit when fastq contains a zero-length read
…alized memory access. The chimeric output may change for a very small number of reads.
…ing bins. Icnreasing this number reduces the amount of RAM required for sorting.
…it, use --peOverlapNbasesMin and --peOverlapMMp oprions.
sjackman and others added 6 commits May 12, 2020 11:51
@sjackman sjackman self-assigned this Aug 13, 2020
@sjackman sjackman added the enhancement New feature or request label Aug 13, 2020
Copy link
Contributor

@pmarks pmarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have to learn the details at some but, but definitely seems like this is worth a shot.

Copy link
Contributor

@pmarks pmarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh -- we should move the license file from STAR to the top of the subtree. Don't want to elide the license accidentally.

@nlhepler
Copy link
Contributor

I agree w/ Pat re: License. I may also be easier to just copy into place. Otherwise, dope!

@sjackman
Copy link
Contributor Author

Good catch, Pat. I've added star-sys/STAR/LICENSE.

Copy link
Contributor

@nlhepler nlhepler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌 SHIP IT 👌

@sjackman
Copy link
Contributor Author

sjackman commented Aug 13, 2020

$ time git clone https://github.com/10XGenomics/orbit
Cloning into 'orbit'...
remote: Enumerating objects: 3997, done.
remote: Counting objects: 100% (3997/3997), done.
remote: Compressing objects: 100% (969/969), done.
remote: Total 4378 (delta 2942), reused 3988 (delta 2940), pack-reused 381
Receiving objects: 100% (4378/4378), 7.36 MiB | 3.73 MiB/s, done.
Resolving deltas: 100% (3112/3112), done.
git clone https://github.com/10XGenomics/orbit orbit-subtree  0.39s user 0.16s system 19% cpu 2.787 total 13504 MB
$ du -hs orbit
 10M	orbit-subtree

😁

@sjackman sjackman marked this pull request as ready for review August 13, 2020 16:50
@adam-azarchs
Copy link
Member

LGTM, though it would be good to include a script for updating the subtree.

@sjackman
Copy link
Contributor Author

sjackman commented Aug 13, 2020

# Extract the source/ subdirectory of STAR into a Git subtree.
orbit-subtree:
	git clone https://github.com/10XGenomics/orbit orbit-subtree
	git -C orbit-subtree remote add star https://github.com/10XGenomics/STAR
	git -C orbit-subtree fetch star

	# Remove the STAR submodule.
	git -C orbit-subtree switch -c orbit/subtree remotes/origin/master
	git -C orbit-subtree rm star-sys/STAR .gitmodules
	git -C orbit-subtree commit -m 'Remove the submodule star-sys/STAR'

	# Create the subtree of star/master.
	git -C orbit-subtree switch -c star/master remotes/star/master
	git -C orbit-subtree subtree split -P source -b star/subtree

	# Add the subtree of star/master to orbit.
	git -C orbit-subtree switch orbit/subtree
	git -C orbit-subtree subtree add -P star-sys/STAR/source star/subtree

	# Create the subtree of star/orbit.
	git -C orbit-subtree switch -c star/orbit remotes/star/orbit
	git -C orbit-subtree subtree split -P source -b star/subtree

	# Add the subtree of star/orbit to orbit.
	git -C orbit-subtree switch orbit/subtree
	git -C orbit-subtree subtree merge -P star-sys/STAR/source star/subtree

	# Check for large files.
	git -C orbit-subtree lfs migrate info --include-ref=orbit/subtree --above=1MB
	git -C orbit-subtree rev-list --objects orbit/subtree \
	| git -C orbit-subtree cat-file --batch-check='%(objectname) %(objecttype) %(objectsize) %(rest)' \
	| awk '$$2 == "blob" && $$3 >= 1000000 { print substr($$1, 1, 7), $$3, $$4; fail=1 } END { exit fail }'

Here's the script used to create the subtree.

@sjackman
Copy link
Contributor Author

# Check for large files.
$ git -C orbit-subtree lfs migrate info --include-ref=orbit/subtree --above=1MB
migrate: Sorting commits: ..., done.                                            
migrate: Examining commits: 100% (671/671), done.                               
STAR      	7.4 MB	4/4 files(s)	100%
*.a       	6.2 MB	2/2 files(s)	100%
STARstatic	3.6 MB	1/1 files(s)	100%
$ git -C orbit-subtree rev-list --objects orbit/subtree \
	| git -C orbit-subtree cat-file --batch-check='%(objectname) %(objecttype) %(objectsize) %(rest)' \
	| awk '$2 == "blob" && $3 >= 1000000 { print substr($1, 1, 7), $3, $4; fail=1 } END { exit fail }'
40b79fe 3106550 liborbit.a
9b28e14 3097906 orbit.a
8a26056 1793054 STAR
5f27ff2 1792798 STAR
43e95fd 3646356 STARstatic
e532a98 1792790 STAR
9c8e339 2019903 STAR

Note that there are 17 MB of large binary files stored in STAR/source/. It's not in my opinion worth rewriting history to either remove these large files or store them in Git LFS, as it would significantly complicate the process of merging STAR/master into Orbit.

@sjackman
Copy link
Contributor Author

LGTM, though it would be good to include a script for updating the subtree.

Yes, agreed. I'll do that in a subsequent PR. Those commands scan be extracted from #29 (comment)

Copy link
Contributor

@evolvedmicrobe evolvedmicrobe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because sometimes the bin is a trash bin, I love it!

@sjackman
Copy link
Contributor Author

@sjackman sjackman merged commit b7914fb into master Aug 14, 2020
@sjackman sjackman deleted the sj/subtree branch August 14, 2020 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.