Make doc builds more modular #249

Closed
2 tasks done
debadair opened this issue Nov 1, 2017 · 18 comments
Labels
enhancement Something we'd like to improve

@debadair
Contributor

debadair commented Nov 1, 2017

We've outgrown the monolithic approach to building, verifying, and publishing the docs. Along the way, the doc repo has grown to the point where it's problematic for people to clone.

The most obvious signs of this are our release day delays. People don't always build the docs before merging and very rarely build everything, which is the only way to check cross doc links. Last minute and unrelated changes frequently require us to restart the doc build, and because we have to build everything, that takes a non-trivial amount of time.

This also affects our day to day workflow, which features frequent diversions to chase down unrelated build failures so we can finish the tasks at hand.

Ideally, we'd be able to:

  • Run pre-build checks on the asciidoc files to find common problems that are slipping through the cracks--undefined attributes, broken external links, maybe even spellcheck?
  • Check cross-doc links and perform other post-build verification on specific books using the already-built versions of everything else.
  • Run the Kibana link checks independently/get a report that shows the topics that are being linked to in a specific book.
  • Publish selected books/branches. Updating the current version for Curator, for example, shouldn't require rebuilding and publishing everything else that's changed. Doc build failures on master shouldn't block a version release. Cloud releases shouldn't be blocked by doc build failures in the individual product docs.

In addition to making the doc build itself more modular, we need to figure out how to make the docs repo easier to deal with--perhaps by splitting the build infrastructure and generated docs apart. It can literally take people hours to clone the repo, so people avoid putting other doc tools there and opt out of contributing to the docs.

We've started to chip away at these problems in the following PRs:

@debadair debadair added the enhancement Something we'd like to improve label Nov 1, 2017
@dedemorton
Contributor

It would be great to have intelligent spell checking (based on a dictionary we define), so we can catch misspellings of product names and such while avoiding false positives (code, API names, etc.).
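A minimal sketch of the dictionary-based idea in Python, assuming two word lists are passed in: general-English words (compared case-insensitively) and a team-maintained list of exact-case terms such as product names. The names `unknown_words`, `common_words`, and `project_terms` are hypothetical. It skips listing blocks delimited by `----` and backtick inline code, which is where most code and API-name false positives would come from.

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
INLINE_CODE_RE = re.compile(r"`[^`]*`")


def unknown_words(text, common_words, project_terms):
    """Return (line, word) pairs that appear in neither dictionary.

    common_words: general-English words, compared case-insensitively.
    project_terms: product names and similar, compared with exact case.
    Lines inside ---- listing blocks and `inline code` spans are skipped.
    """
    common = {word.lower() for word in common_words}
    unknown = []
    in_listing = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip() == "----":
            in_listing = not in_listing  # a listing block opens or closes
            continue
        if in_listing:
            continue
        prose = INLINE_CODE_RE.sub(" ", line)  # drop inline code before tokenizing
        for word in WORD_RE.findall(prose):
            if word in project_terms or word.lower() in common:
                continue
            unknown.append((lineno, word))
    return unknown
```

Keeping product names in a separate exact-case list is what lets a check like this flag "ElasticSearch" while accepting "Elasticsearch".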

@nrichers

nrichers commented Nov 1, 2017

Re: Run pre-build checks on the Asciidoc files: There are additional things we should check for that we do not check today:

  • Accessibility issues with tables that do not have proper headers, with spatial and color references, etc.
  • Forbidden words and phrases, such as "It's easy to ..." and "simply".
  • (Future: Style checks, maybe? We have a wide range of skills at Elastic, and when I edit content there is often low-hanging fruit that an automated check could flag just as well as I can, at a fraction of the cost/time.)

If not part of the initial scope of this issue, then at least as something we'd want to check for in the future.
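A rough sketch of a forbidden-phrase check in Python. The phrase list here is only the two examples from this comment; a real list would presumably live in a style-guide file, and `find_forbidden` and `STYLE_PHRASES` are hypothetical names. Matching is case-insensitive and on word boundaries, so "simply" is reported on its own but not when it is only the start of a longer word.

```python
import re


def find_forbidden(text, phrases):
    """Return (line, phrase, column) for each forbidden phrase found.

    Matching is case-insensitive and anchored on word boundaries, so a
    phrase is not reported when it is only the start of a longer word.
    Assumes each phrase starts and ends with a letter or digit.
    """
    patterns = [
        (phrase, re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE))
        for phrase in phrases
    ]
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for phrase, pattern in patterns:
            for match in pattern.finditer(line):
                hits.append((lineno, phrase, match.start() + 1))  # 1-based column
    return hits


# Example phrases taken from the comment above; a real list would be longer.
STYLE_PHRASES = ["it's easy to", "simply"]
```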

@lcawl
Contributor

lcawl commented Nov 3, 2017

It would also be helpful if the full builds could be launched as needed on Jenkins/CI or some other managed resource, since doing the full build and push on my local machine is really, really slow.

In particular, I'm referring to the process listed as "Pushing new versions or releases of documentation to the web" here: https://wiki.elastic.co/display/DOC. For some reason we seem to need to do this for almost every release.

@lcawl
Copy link
Contributor

lcawl commented Nov 7, 2017

One more that occurred to me after a long night of tracking down broken links: It would be good if a link checker could be run before changes are merged. Now even when I try to proactively prevent them, it's common that more pop up only after the PR is merged, since that's when the full build is done and links are checked.
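One way to get a pre-merge link check without building everything, sketched in Python: rebuild only the changed book, then check its cross-doc links against the ids harvested from the already-published HTML of the other books (the "already-built versions of everything else" idea from the original issue). Extracting links and book names from the asciidoc source is left out, and `CrossDocLink`, `collect_ids`, and `find_broken_links` are hypothetical names, not part of the existing docs tooling.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects every id="..." attribute from a built HTML page."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def collect_ids(html_text):
    parser = IdCollector()
    parser.feed(html_text)
    parser.close()
    return parser.ids


@dataclass(frozen=True)
class CrossDocLink:
    source: str  # file in the book being changed
    book: str    # target book (book names here are hypothetical)
    anchor: str  # id the link points at in the target book


def find_broken_links(links, built_index):
    """Return links whose target id is missing from the already-built books.

    built_index maps a book name to the set of ids collected from its
    published HTML, so only the changed book needs rebuilding.
    """
    return [
        link for link in links
        if link.anchor not in built_index.get(link.book, set())
    ]
```

An index like this could be refreshed whenever a book is published, so a PR check only pays for building the book it touches.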

@drewr

drewr commented Nov 8, 2017

@mgreau I believe you have some thoughts around improving our asciidoc performance?

@debadair
Contributor Author

debadair commented Nov 9, 2017

Two of our biggest pain points are not discovering broken cross doc links before we merge, and pushing updates that are missing content due to undefined attributes.

@tsantos You were working on a script to check for undefined attributes in asciidoc files, right? Having a way to run a pre-flight check to catch them before we build would be super helpful.
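A minimal sketch of such a pre-flight check in Python, assuming attributes are defined on lines of the form `:name: value` (and unset with `:name!:` or `:!name:`), referenced as `{name}`, with `\{name}` treated as escaped. It does not follow `include::` directives or see attributes passed on the command line, so anything defined that way (shared attributes, for example) has to be supplied through `predefined`. It also ignores per-block substitution settings, so it may report references that asciidoc would never substitute. `undefined_attributes` is a hypothetical name.

```python
import re

# :name: value, :name!: (unset), or :!name: (unset) at the start of a line
DEFINE_RE = re.compile(r"^:(!?)([A-Za-z0-9_][A-Za-z0-9_-]*)(!?):")
# {name}, unless escaped as \{name}
REFERENCE_RE = re.compile(r"(?<!\\)\{([A-Za-z0-9_][A-Za-z0-9_-]*)\}")


def undefined_attributes(text, predefined=()):
    """Return (line, name) for each reference to an attribute not defined yet."""
    defined = set(predefined)
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = DEFINE_RE.match(line)
        # On a definition line, only the value can contain references.
        body = line[match.end():] if match else line
        for name in REFERENCE_RE.findall(body):
            if name not in defined:
                problems.append((lineno, name))
        if match:
            name = match.group(2)
            if match.group(1) or match.group(3):
                defined.discard(name)  # attribute is unset from here on
            else:
                defined.add(name)
    return problems
```

Since an undefined attribute leads to missing content rather than a failed build, reporting line numbers up front is what turns these into visible errors.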

@mgreau
Member

mgreau commented Nov 9, 2017

> @mgreau I believe you have some thoughts around improving our asciidoc performance?

Yes, sure, but I have always worked with the Asciidoctor toolchain:

Having said that, this issue is not about using Asciidoctor (I don't know whether that has already been studied), so here are some links (not tested personally) about discovering broken cross-doc links:

> pushing updates that are missing content due to undefined attributes.

The "Catch a Missing or Undefined Attribute" section can help you understand how this is implemented in Asciidoctor, but again, the behavior is not the same in the Python implementation.

Let me know if I can help you on this subject.

@clintongormley
Contributor

@mgreau I've tried asciidoctor in the past. Although it claims to be compatible with asciidoc, there are a number of differences. Also, when I tried to run it on the definitive guide, it just hung (this may have improved since then). Plus, we've made a number of customizations to asciidoc that we'd have to migrate across.

> A tool for verifying links in text-based files (go binary, developed by a Gradle Inc employee, Gradle uses it to check their guides)

We already have a tool for this; the question is more about the need to build all the docs before being able to run it.

> This section Catch a Missing or Undefined Attribute can help you understand how it is implemented in Asciidoctor but again it's not the same behavior with the Python implementation.

Correct. Being able to have undefined attributes is considered a feature in asciidoc. Not sure I agree, but it is pretty deeply ingrained.

@mgreau
Member

mgreau commented Nov 10, 2017

> Although it claims to be compatible with asciidoc, there are a number of differences

Correct, there are some differences, and they are documented in this Changed Syntax documentation. Also, there is a compat-mode attribute; when it's enabled, Asciidoctor accepts the AsciiDoc Python syntax.

@clintongormley
Contributor

I've just tested out the latest asciidoctor, and it builds the def guide (which it didn't before) and is significantly faster than asciidoc. To produce DocBook 4.5 output, it took 1 second, while asciidoc took 19s. So it is definitely worth exploring. (That said, a significant amount of time is also taken by the XSL transformations: a further 38 seconds.) DocBook output is required because the asciidoctor HTML output doesn't support HTML chunking, plus DocBook gives us strict link checking.

It would still be a big speedup. That said, I don't know about the difference in output and error checking behaviour. Plus the customizations need to be ported over. This is a huge undertaking.

@mgreau
Member

mgreau commented Nov 10, 2017

@clintongormley cool that it works now!

Also I just saw this on the asciidoc.org website:

> The Asciidoctor project now maintains the official definition of the AsciiDoc syntax.

This comes from a commit on the asciidoc project. I was not aware of it (the commit is from September), so I have sent a message to Dan Allen to find out more.

> Plus the customizations need to be ported over. This is a huge undertaking.

Yes, sure, there is some work to do. I can take a look if we plan to work on it (cc @drewr).

@debadair
Contributor Author

From today's festivities, this is a big one:

Automate testing of conf.yaml changes--add a CI check that runs the full build from the PR for the update.
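A small sketch of how such a CI job could decide when to run the full build, assuming it runs on a PR branch with `origin/master` fetched and that `conf.yaml` at the repo root is the only trigger. The trigger set and the function names are hypothetical. It uses `git diff --name-only base...HEAD`, which lists the files changed on the branch since it diverged from `base`.

```python
import subprocess

# Hypothetical: repo-root-relative paths whose change should force a full build.
FULL_BUILD_TRIGGERS = {"conf.yaml"}


def changed_files(base="origin/master"):
    """List files changed on this branch since it diverged from base."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [path for path in result.stdout.splitlines() if path]


def needs_full_build(paths):
    """True if any changed path is one that should trigger the full build."""
    return any(path in FULL_BUILD_TRIGGERS for path in paths)
```

A job would call `needs_full_build(changed_files())` and start the full build only when it returns True, leaving ordinary content PRs to cheaper book-level checks.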

@kevinkluge
Member

@debadair between now and having a CI check, wouldn't things be much better if we ran a manual full build a few days before the release? Then we could fight through the issues with less time pressure.

@rjernst
Member

rjernst commented Nov 15, 2017

One general suggestion I have is to have 1 branch per minor version. This would match with the branching we use for the rest of the stack, and it means improvements can be made to unreleased versions without destabilizing the build for older versions. If we did this, I think we could integrate the docs build into unified release, so that the docs are fully built and verified along with all the artifacts. At minimum, a docs build could be triggered along with the commit shas of the projects so that we can know they will build (see https://github.com/elastic/release-manager/issues/217).

@debadair
Contributor Author

@kevinkluge We absolutely should run a full build ahead of time--that's part of our regular procedure & should have happened this time around. The trick is that, given the volume of changes that generally occur in the days leading up to a release, running it a few days ahead doesn't guarantee smooth sailing. So we tend to leave it to the bitter end, because it's a time-consuming, resource-intensive task to repeat. (At a time when we are super-busy trying to test and merge updates--and we can only run one doc build at a time locally.)

This is really a matter of getting the docs side of things up to speed with the level of testing & automation we've put in place for everything else.

@rjernst Do you mean branch the doc repo for each release, as we do on the product side? There would certainly be advantages to doing that. I'm all for making changes so that we can release specific docs and specific versions. However, not all of the docs use the same versioning as the stack. And some of our infra requires everything to be updated at once when we add new versions. That said, I would love it if we could get the docs integrated with the unified release.

@lcawl
Contributor

lcawl commented Jan 18, 2018

Related to #249 (comment), it would be helpful to have some way to run the https://github.com/elastic/docs/blob/master/release_docs.sh on jenkins or some other remote server, since it takes a very long time to run on my local machine. We seem to need to run it on most release days. I also had to disable the automatic doc builds, since they were finishing sooner and preventing the release_docs.sh from uploading successfully. This overlap between the two jobs might need to be considered.

@dedemorton
Contributor

@debadair For spell checking, check out this contribution: https://github.com/checkstyle/checkstyle/blob/master/.ci/test-spelling-unknown-words.sh (referenced in elastic/beats#7456).

@debadair
Contributor Author

A number of these things have been addressed:

Closing this issue in favor of individual issues for the remaining pieces:

  • Incorporate validation checks #1336
  • Generate book-level link reports
  • Enable publishing select docs irrespective of issues in future branches or docs unrelated to a release

9 participants