Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow building all docs with a local repo #659

Merged
merged 24 commits into from
Mar 11, 2019
Merged

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Mar 1, 2019

We current offer two ways to build the docs:

  1. Build an arbitrary book with --doc and its companion arguments like
    --resource and --asciidoctor and --chunk.
  2. Build all of the books that Elastic publishes with --all.

Building an arbitrary book is nice and quick because it is have very
little overhead on top of the time it takes to build the actual book.
But it isn't always the best test because the command that you use to
build the book might not exactly match the invocation that --all uses
to build the book. Furthermore, building a single book is not usually
enough! When you are testing documentation changes that you've made
locally you often don't know all of the books that your changes might
effect. The Elasticsearch repository is used in about a dozen books in
combination with a hand full of other repositories. Also! If you want
to build any books that require multiple repositories then you have to
set all of them up appropriately.

Building all of the docs doesn't have any of those problems but it isn't
wasn't compatible with building from the local filesystem. This change
fixes that by adding a --sub_dir option to "substitute" and "directory"
in place of some branch of some repository when building all of the docs.
This overrides the standard up-to-date checks to force all books being
built with these subsitutions to be rebuilt every time. It looks like:

./build_docs --all --target_repo git@github.com:elastic/built-docs.git \
                --keep_hash \
                --sub_dir elasticsearch:master:~/Code/elasticsearch

This is nice because it:

  1. Rebuilds all of the books that your local directory is involved with
  2. Uses standard switches when building each of those books
  3. Does not rebuild books that don't use your local directory

That third thing is actually caused by that new --keep_hash flag. It
causes the build's up-to-date checking to use the same hash that was
used the last time the book was built. --keep_hash has uses beyond
being paired with --sub_dir. In particular, it is useful when testing
what happens when you switch a book to Asciidoctor. You can switch the
book to Asciidoctor in conf.yaml and rebuild with --keep_hash and
the build will only have the asciidoctor change in it.

This also drops --no_fetch because --keep_hash is better in almost
every way.

The goal of --sub_dir isn't so much that people will execute it
locally, but that we can execute it on CI. I'm sure folks will execute
it locally but it pretty heavy because, just like a normal --all build
it has to:

  1. Clone the built-docs repo or fetch updates if it is already cloned.
    many, many files in it.
  2. Check out the built-docs repo. This is truly slow because it has
  3. Clone all of the repos used to build every book or fetch updates if
    they are already cloned.

I suspect in time we can fix some of these. But for now, this is a heavy
thing.

We current offer two ways to build the docs:
1. Build an arbitrary book with `--doc` and its companion arguments like
`--resource` and `--asciidoctor`.
2. Build *all* of the books that Elastic publishes with `--all`.

Building an arbitrary book is nice and quick because it is have very
little overhead on top of the time it takes to build the actual book.
But it isn't always the best test because the command that you use to
build the book might not exactly match the invocation that `--all` uses
to build the book. Furthermore, building a single book is not usually
enough! When you are testing documentation changes that you've made
locally you often don't know all of the books that your changes might
effect. The Elasticsearch repository is used in about a dozen books in
combination with a hand full of other repositories.

Building all of the docs is quite slow and you might run into errors
that you didn't cause. It is exactly what we need to build the docs for
the public web site but it isn't right for testing some changes that you
make locally. It also doesn't line up properly with local development as
it only knows how to pull repos from the Elastic organization at github.

This change adds a "third way" to build test the docs that splits the
difference between the two options. It uses the same mechanisms that we
usually use to build all of the docs but allows you to substitute local
directories in place of some branch of some repo:

```
./build_docs --all --target_repo git@github.com:elastic/built-docs.git \
             --keep_hash \
             --sub_dir elasticsearch:master:~/Code/elasticsearch
```

This is nice because it:
1. Rebuilds all of the books that your local directory is involved with
2. Uses standard switches when building each of those books
3. Does not rebuild books that don't use your local directory

That third thing is actually caused by that new `--keep_hash` flag. It
causes the build's up-to-date checking to use the same hash that was
used the last time the book was built. `--keep_hash` has uses beyond
being paired with `--sub_dir`. In particular, it is useful when testing
what happens when you switch a book to Asciidoctor. You can switch the
book to Asciidoctor in `conf.yaml` and rebuild with `--keep_hash` and
the build will *only* have the asciidoctor change in it.

This also drop `--no_fetch` because `--keep_hash` is better in almost
every way.

The goal of `--sub_dir` isn't so much that people will execute it
locally, but that we can execute it on CI. I'm sure folks will execute
it locally but it pretty heavy because, just like a normal `--all` build
it has to:
1. Clone the `built-docs` repo or fetch updates if it is already cloned.
2. Check out the `built-docs` repo. This is truly slow because it has
many, many files in it.
3. Clone all of the repos used to build every book or fetch updates if
they are already cloned.

I suspect in time we can fix some of these. But for now, this is a heavy
thing.
build_docs Outdated
@@ -103,8 +104,7 @@ def run_build_docs(args):
'-v',
'%s:/doc/%s:ro,cached' % (repo_root, repo_name)
])
build_docs_args.append(
'/doc/' + repo_name + path.replace(repo_root, ''))
return '/doc/' + repo_name + path.replace(repo_root, '')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped automatically adding it to the list because we don't want to do that for --sub_dir.

@@ -489,6 +489,7 @@ sub init_repos {
user => $Opts->{user},
url => $Opts->{target_repo},
reference => $reference_dir,
keep_hash => 0,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't keep_hash on the target repo because the target_repo contains the list of hashes to keep!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's worth a review comment, is it worth a code comment?

@@ -532,7 +534,7 @@ sub init_repos {
else {
$pm->start($name) and next;
eval {
$repo->update_from_remote() unless $Opts->{no_fetch};
$repo->update_from_remote();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I drop this feature because --keep_hash is better in most cases.

build_docs.pl Show resolved Hide resolved
.PHONY: simple_all_expected_files
simple_all_expected_files:
# Test the simplest possible `--all` invocation
$(call BUILD_MINIMAL_ALL,/tmp/simple_all)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to these call invocations because they don't bump into make's broken up-to-date checking for directories.

@@ -44,10 +44,10 @@ contents:
prefix: test
current: master
branches: [ master ]
index: minimal.asciidoc
index: index.asciidoc
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index.asciidoc allows me to reuse this file in other tests.

$new .= '|asciidoctor' if $asciidoctor;

return $old ne $new if $self->keep_hash;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're keeping the hash then we don't need to check to see if any of the files that we're using changed so we can just return here.

}

#===================================
sub tree {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tree subroutine was not used.

my ($msg) = ( $log =~ /^\w+\s+([^\n]+)/ );
my $msg;
if ( $sha eq 'local' ) {
$msg = 'local changes';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is worth looking at the substitution branch to try and find a commit message here. We don't even know that the substitution directory is in a git repo....

@@ -316,6 +341,7 @@ sub checkout_to {
#===================================
my ( $self, $destination ) = @_;

die 'sub_dir not supported with checkout_to' if %{ $self->{sub_dirs}};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is only used by the target_repo so we don't need to support sub_dirs. In fact, it doesn't make sense to have it!

@lcawl
Copy link
Contributor

lcawl commented Mar 1, 2019

This seems like it helps with some of the issues described in #249

I initially thought that it could help with the scenario where we need to know which books to rebuild when we make changes to shared files like https://github.com/elastic/docs/blob/master/shared/attributes.asciidoc. On closer reading, however, I don't think that's right--these parameters wouldn't catch that scenario and we'll still need to forcefully rebuild all.

@nik9000
Copy link
Member Author

nik9000 commented Mar 1, 2019

I initially thought that it could help with the scenario where we need to know which books to rebuild when we make changes to shared files like https://github.com/elastic/docs/blob/master/shared/attributes.asciidoc. On closer reading, however, I don't think that's right--these parameters wouldn't catch that scenario and we'll still need to forcefully rebuild all.

I think that is true.

This seems like it helps with some of the issues described in #249

I think it catches the "Check cross-doc links and perform other post-build verification on specific books using the already-built versions of everything else." bit. I don't think it covers the rest though!

@nik9000
Copy link
Member Author

nik9000 commented Mar 6, 2019

@jarpy, could you have a look at this one when you get a chance please?

@nik9000
Copy link
Member Author

nik9000 commented Mar 6, 2019

Woops, there are a lot of conflicts! I'll see about fixing those as soon as I finish what I've got locally.

# First build a repository to use as the source.
# REPLACE the minimal documentation in the source repo
cp ../README.asciidoc /tmp/source/index.asciidoc
cd /tmp/source && \
Copy link
Contributor

@ninaspitfire ninaspitfire Mar 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this ever be run in parallel with itself? Might need a proper tmpdir.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be difficult to get it to run in parallel with itself but I may as well make a proper tmpdir. You are right about that!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess Docker kinda takes care of that, doesn't it? I may have been thinking too old-school.

mount_docs_repo_and_dockerify_path(resource_dir, resource_dir)
build_docs_args.append(mount_docs_repo_and_dockerify_path(
resource_dir, resource_dir))
elif arg == '--sub_dir':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you add features, all the manual parsing of argv is going to cost you more and more. argparse isn't great, but is there any value in considering it over full DIY?

(Normally, I would suggest Click, but I know you can't do packages.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've considered it on and off for a while. I think the value that DIY gets me is being able to pass through arguments that I don't know about. I only rewrite the arguments that have anything to do with paths. The cost of DIY is that I need this everywhere. At this point I think it is net simpler, but I'd be receptive to any argument to the contrary. Like, I wonder if it'd be helpful to force me to list all of the supported arguments in the python file because we plan to drop the perl file one day.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"I've considered it"

Cool. I trust your judgement.

@@ -489,6 +489,7 @@ sub init_repos {
user => $Opts->{user},
url => $Opts->{target_repo},
reference => $reference_dir,
keep_hash => 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's worth a review comment, is it worth a code comment?

build_docs.pl Show resolved Hide resolved
@ninaspitfire ninaspitfire self-requested a review March 8, 2019 21:11
@nik9000 nik9000 merged commit 5b06c2d into elastic:master Mar 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants