Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phase 1 - New Docs Deployment #8659

Merged
merged 1 commit into from
Oct 22, 2023
Merged

Phase 1 - New Docs Deployment #8659

merged 1 commit into from
Oct 22, 2023

Conversation

bitsondatadev
Copy link
Collaborator

@bitsondatadev bitsondatadev commented Sep 26, 2023

This PR is a culmination of my work in the making-iceberg-docs repository.

To see a preview, you can check out the GitHub page. We will eventually want to migrate this to using the iceberg GitHub pages to deploy the docs.

Design Discussion: https://lists.apache.org/thread/nq283g83p9cgx518jwo7dg85fm5gjrvv
Design Doc: https://docs.google.com/document/d/1WJXzcwC6isfoywcLY2lZ9gZN6i0JU1I2SIZmCkumbZc/edit#heading=h.gli9mc2ghfz1

Phase 1 Goals:

  • Avoid changing the words and content themselves.
  • Migrate away from Hugo front matter and syntax and migrate to MkDocs-Material syntax.
  • Fix broken links and verify using linkchecker.
  • Combine the static (evergreen) pages and the versioned documentation into a single repository.
  • Adjust the structure to match the layout of the website itself, where ./site/*.md contains the evergreen content and ./site/docs/<version>/*.md contains the versions of Iceberg.
  • Keep all versioned javadocs in their own tag.
  • Keep each version of the versioned Iceberg docs in their own tag separate from the equivalent release tag to enable future updates to older doc versions.
  • Set up an all encompassing build using Spotify's backstage/mkdocs-monorepo-plugin and git worktrees to build multiple sub-repositories to include the static root pages, the javadoc, and docs tags, and deploy these in a single site that is searched on.
  • Ensure backwards compatibility with the existing site urls to avoid breaking permalinks.

Phase 2/3 and Todos:

  • Add Vale and style/grammar fixes.
  • Style the webpage to match the existing https://iceberg.apache.org.
  • Getting Started guide.
  • Split content based on diataxis framework.
  • Fix linting errors and add Vale to the build process.
  • Add linkChecker to the build process.
  • Move python docs to pyIceberg repo and build a submodule.
  • Set up Rust/Go docs sites.

@bitsondatadev bitsondatadev self-assigned this Sep 26, 2023
@bitsondatadev bitsondatadev changed the title Phase 1: New Docs Deployment WIP: Phase 1 - New Docs Deployment Sep 26, 2023
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @bitsondatadev

Looks really nice and shiny! The layout reminds me of another side that I like. Left a few comments and pointers, my main concern is that we don't clearly state the version (or dev) on the side, so it is not obvious to a user which version they are looking at.

docs-new/.github/workflows/ci.yml Outdated Show resolved Hide resolved
docs-new/.github/workflows/ci.yml Outdated Show resolved Hide resolved
on:
push:
branches:
- master
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an issue:

Adjust the structure to match the layout of the website itself, where ./home/.md contains the evergreen content and ./home/docs//.md contains the versions of Iceberg.

This would mean that it shows features that haven't been released yet. I think it is great to build these pages, but we have to clearly state that it is the dev version. I think Arrow is a nice example:

image

Copy link
Collaborator Author

@bitsondatadev bitsondatadev Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would mean that it shows features that haven't been released yet.

Not on the live site. I assumed that this build would only trigger on a release, so (latest) and the current version (e.g. 1.3.1 today) would just be duplicate docs (Unfortunately redirects from latest > <version> that maintain the /docs/latest/ url would require some fancy DNS records that @rdblue said we should avoid for now).

Later, we should defnitely consider adding a (nightly) or (dev) build and we would label that as such. For now, just assume that we only build this during a release. Unless the concern is around when people build the site locally.

@nastra also brought this point up to me directly but I forgot to add this in the design so thanks for bringing this up so I can explain it here!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, then you should not run the workflow on a push event on the main/master branch.

Then you want to replace it with:

on:
    tags:        
      - apache-iceberg-*

Also, in PyIceberg we have it as a step in the release:

on:
workflow_dispatch:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Yeah, let's hit this deployment on the next PR.

@bitsondatadev
Copy link
Collaborator Author

Great work @bitsondatadev

Looks really nice and shiny! The layout reminds me of another side that I like. Left a few comments and pointers, my main concern is that we don't clearly state the version (or dev) on the side, so it is not obvious to a user which version they are looking at.

Thanks @Fokko!

You touched on a few points I missed around my intended next steps.

Finishing Phase 1:

  • Once 1.4.0 is released, cherry-pick/merge the changes between docs and build out all the release tags from 1.4.0 back to the earliest documentation versions we have.

  • Work with Fokko/ASF INFRA to get a preview site deployed (that doesn't step on any other existing site's urls.

  • Initially merge this under the docs_new folder to complete this first round of work.

Phase 2:

  • Run a few cleanup PRs that address strictness links and build process. This will include adding and applying vale rules, linkchecker to the build, getting mkdocs to compile docs without any error, fixing search index around 'spark' keyword, working on version labels (dev) etc...

  • Simultaneously, fix the look and feel of these sites to move to the look and feel of the existing site.

  • Finally, once the links/syntax are all in check, replace docs/ with new_docs/ and move the site over to build in iceberg.apache.org domain.

Phase 3: Will then be addressing the organization of the site (which is outlined under "Phase 2" in the design docs.

@ajantha-bhat
Copy link
Member

@bitsondatadev: Is it possible to split into two PRs?
One is to move back the existing docs from iceberg-docs repo and another one is for refactoring?
It may be easy to review the incremental change for the second PR.

@bitsondatadev
Copy link
Collaborator Author

bitsondatadev commented Sep 27, 2023

@ajantha-bhat,what level of granularity do you suggest for the refactoring?

If you want me to treat the static content and versioned docs as separate changes, I don't see much value there. The hard part about this phase was making a single build process that could stitch multiple versioned docs and static docs into a single build. I need both of these elements to create such a system and showcase it to the community. There is little value in just copying the iceberg-docs over in its own PR IMO.

One possible thing I can do to minimize strain is to remove the Vale configs for now. This would actually remove about 60 files.

I've also noted quite a bit of deadweight in the assets folder I can remove.

I think that will limit the focus to the merging of the two sides side of the content.

@bitsondatadev bitsondatadev force-pushed the new-docs branch 5 times, most recently from 000344e to d63f2ce Compare September 29, 2023 04:54
@bitsondatadev bitsondatadev requested a review from Fokko September 29, 2023 04:54
@bitsondatadev
Copy link
Collaborator Author

Hey @ajantha-bhat @Fokko, I've now updated the PR to remove the unecessary static files which took away more than half of the files so this should be much easier to look at.

For the rest of it, I explain the layout and rationale a bit more in the README now around the build process and what we are trying to achieve.

If you have any qeustions do let me know please! I think this is now in a state where I feel good sharing on the mailing list.

@bitsondatadev bitsondatadev changed the title WIP: Phase 1 - New Docs Deployment Phase 1 - New Docs Deployment Sep 29, 2023
@RussellSpitzer
Copy link
Member

How should we be reviewing this? What is new, what is old? Is it just style and build?

@ajantha-bhat
Copy link
Member

How should we be reviewing this? What is new, what is old? Is it just style and build?

That was my intention of asking #8659 (comment) too.

docs-new/LICENSE Outdated Show resolved Hide resolved
docs-new/.vale.ini Outdated Show resolved Hide resolved
- limitations under the License.
-->

Apache Software Foundation links
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this missing content? Or was it just not accessible in the old site?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to play with this again, mkdocs wasn't liking that I had an entire section without any markdown file and only external links so I added this as an index, I imagine there's some trick to this to not do what I'm doing now.

└── variables.yml
```

All of the documentation versions are saved in special `docs-<version>` tags that only contain the root of the docs version they contian. There is also a `javadoc` tag that contains all prior versions of the javadocs in a single tag. These are generated and loaded only at build time using the [git-worktree](https://git-scm.com/docs/git-worktree) docs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: contian.

Why use a tag for older javadoc instead of a branch? It seems odd to re-create a tag every time we need to change it instead of adding a new version of Javadoc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Branches seem to be the more clear choice for both old versions and javadoc. Is that not possible for some reason?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It's complicated.

First, docs url path and javadocs url path are in different locations, so they each have their own mount path when considering we use git worktree to pull the different versions together at build time.

So the git worktree will mount whatever is in the tag at a given subdirectory. My initial plan was actually to do git worktree and git sparse checkout to define specifically what sections of the tag or remote tag I needed and place them in the proper points in the docs. This would allow me the flexibility to just define a tag per version that includes Java docs and Iceberg versioned docs. However, I was unable to get sparse checkout to work properly with worktree and I also noticed this in the docs:

THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE.

So instead I need the tags to contain the exact directory at the root that needs to be mounted for the docs. So that's what my deploy script handles.

Now there's some debate we can have on how we want to do this.

  • If someone is a git wizard and can help me understand the behavior of worktree in tandem with sparse checkout, I'm happy to dig back into that if we think the future changes won't be disruptive but I doubt that's the path we want to take.
  • We can also consider sticking all docs and Javadocs into a single tag and just update that tag.
  • We could pair docs and javadocs into a tag together and every time I add a worktree, I'll have another part of the script that moves javadocs to the correct location.
  • Just do what we're doing now.

I'm fine with any of these approaches really. I just chose the current one thinking it would make things easier to build from older versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bitsondatadev, I don't understand what sparse checkout has to do with branches vs tags.

According to the git worktree docs:

To instead work on an existing branch in a new worktree, use git worktree add <path> <branch>

So you can use git worktree to check out an existing branch in a subdirectory. I think we should use that rather than tags because tags are hard to work with. Branches can be updated and we can open PRs against them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was originally trying to do this as an extension of the main release tags. When we realized we wanted the flexibility to update the older versions branches made more sense but for some reason I was thinking tags were a limitation of worktrees.

Anyways, yeah we can easily just do the same thing we're doing with tags and just update the branches that are rooted in the release tag.

docs-new/README.md Outdated Show resolved Hide resolved
docs-new/README.md Outdated Show resolved Hide resolved
docs-new/README.md Outdated Show resolved Hide resolved
docs-new/README.md Outdated Show resolved Hide resolved
@bitsondatadev
Copy link
Collaborator Author

@rdblue I have this now in a good state relative to the 1.4.0 docs in nightly, could you PTAL? Thanks!


```
git worktree add site/docs/docs/1.3.1 docs-1.3.1
git worktree add site/docs/javadoc javadoc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't exist yet, so I had to comment out the 1.3.1 entry in mkdocs.yml to get mkdocs serve working. Not a blocker.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a warning to the README

1. Cut a new release from the current branch revision. This creates a new branch `docs-<version>`,.

```
.github/bin/deploy_docs.sh -v 1.4.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move this to maybe a dev folder with scripts rather than .github which seems like an odd place for it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note here as well to ignore while we remove the script.


## Build

Run the build command in the root directory, and optionally add `--clean` to force MkDocs to clear previously generated pages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "root directory" I assume you mean site/ because that's where I'm running it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving forward should this be built from ./site/ then?

-not \( -path ./docs-new -prune \) \
! -name LICENSE \
! -name NOTICE \
-exec rm -rf {} +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this require -f? That's not a good thing to have in scripts.

-exec rm -rf {} +

# move the nightly docs to the root and change from 'nightly'
mv docs-new ./tmp && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be updated to be site.

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I'm +1 for this change.

The only thing that needs to be fixed is the deploy_docs.sh script. The blocker is that the script still attempts to use docs-new rather than site. There's also a .orig file from a bad merge that needs to be removed.

I'd also prefer to be able to run the deploy_docs.sh script locally. I added some comments to it about how to change it to make that happen. That doesn't need to be a blocker, but we do need to make sure people know that the script is destructive. I nearly deleted files unintentionally in my local copy! It's a good thing that the script was broken.

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start.

@rdblue
Copy link
Contributor

rdblue commented Oct 22, 2023

I'm going to merge this to unblock the next phase of work. Building the site works and I think once this is in we can move on to:

  1. Update the style to match the current site
  2. Add release branches for the older docs
  3. Add automation scripts to build the site

I'd also like to move the docs in main from nightly to a docs area (its own mkdocs site) and remove duplicate markdown files. I think that's probably the next step.

@rdblue rdblue merged commit e2b56da into apache:main Oct 22, 2023
45 checks passed
@manuzhang
Copy link
Contributor

@bitsondatadev have you created a project to track progress and to-do list? Maybe others can help and pick up some tasks as well.

@bitsondatadev
Copy link
Collaborator Author

@bitsondatadev have you created a project to track progress and to-do list? Maybe others can help and pick up some tasks as well.

I want to but I don't have the ability to set it up. Seeing that @Fokko added an emoji response maybe he can help me with this!

@Fokko
Copy link
Contributor

Fokko commented Oct 23, 2023

@bitsondatadev Yes, I'm so excited to see this happening. Great work here! 🙌

I've created a project here. Maybe jus a label might also suffice? I'll leave it up to ou

@bitsondatadev
Copy link
Collaborator Author

Yeah, I think the projects make sense to have a better visual of progress broken down by phase. The first phase was a bunch of research and discussion on the design, so that would have been difficult to coordinate. The second phase is rather boring and though it is something that's easier to task out, my goal is to move quickly on that one so we don't have two versions of docs for long. The third phase is where I will build a bunch of tasks and get us to business as usual.

The second phase is updating older docs to the mkdocs format and tracking down a web dev to get mkdocs stylized to the current site. If anyone knows any open source web devs please let them know to reach out on the mailing list!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants