Generate documentation page for third-party dependencies #2611

Merged (6 commits) on Mar 2, 2020

Contributor

@charith-elastic commented Feb 24, 2020

Generates a documentation page containing third-party dependency information.

This change refactors the licence-detector to add the following functionality:

  • Detect licence type of each dependency.
  • Allow manual overrides to provide licence and URL information for dependencies that cannot be scanned automatically.
  • Add template for dependency list page.
  • Force refresh of Go module cache to obtain the latest information about dependencies.
  • Add script to generate dependency information about the ECK container image.
  • Add validation option to ensure that all dependency URLs are valid.
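
As an illustration of the URL validation item above, here is a minimal sketch in Go. It assumes that "valid" means "answers an HTTP HEAD request with a 2xx status"; the validation code in this PR may work differently. `checkURL`, `fakeClient` and `roundTripFunc` are hypothetical names, and the fake transport exists only so that the example runs without network access.

```go
package main

import (
	"fmt"
	"net/http"
)

// roundTripFunc lets a plain function act as an http.RoundTripper so that
// this example needs no network access. It is a hypothetical helper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) {
	return f(r)
}

// checkURL issues a HEAD request and treats any non-2xx status as invalid.
// This is an assumed definition of "valid", not the PR's actual logic.
func checkURL(client *http.Client, url string) error {
	resp, err := client.Head(url)
	if err != nil {
		return fmt.Errorf("failed to fetch %s: %w", url, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("unexpected status for %s: %d", url, resp.StatusCode)
	}
	return nil
}

// fakeClient returns a client that answers 200 for knownURL and 404 for
// everything else.
func fakeClient(knownURL string) *http.Client {
	return &http.Client{
		Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
			status := http.StatusNotFound
			if r.URL.String() == knownURL {
				status = http.StatusOK
			}
			return &http.Response{
				StatusCode: status,
				Header:     make(http.Header),
				Body:       http.NoBody,
				Request:    r,
			}, nil
		}),
	}
}

func main() {
	client := fakeClient("https://example.com/mod/a")
	urls := []string{"https://example.com/mod/a", "https://example.com/mod/missing"}
	for _, u := range urls {
		if err := checkURL(client, u); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Println("ok:", u)
	}
}
```

In real use, `http.DefaultClient` (ideally with a timeout) would replace the fake client.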

Dependency page preview: http://cloud-on-k8s_2611.docs-preview.app.elstc.co/guide/en/cloud-on-k8s/master/k8s-dependencies.html

Fixes elastic/k8s-dev#103

@charith-elastic added the ">docs" (Documentation) and ":ci" (Things related to Continuous Integration, automation and releases) labels Feb 24, 2020
@anyasabo mentioned this pull request Feb 24, 2020
@charith-elastic marked this pull request as ready for review February 24, 2020 16:54
@charith-elastic
Contributor Author

Jenkins test this please

This directory contains the scripts to generate licence notice and dependency information documentation.

- `generate-notice.sh`: Invoked by `make generate-notice-file` to automatically generate `NOTICE.txt` and `docs/dependencies.asciidoc`.
- `generate-image-deps.sh`: Manually invoked script to update the container image dependency information in `docs/container-image-dependencies.csv`.
Contributor

Can you clarify what the CSV is intended for? It doesn't look like it's used as an input to anything else. I was also a little surprised when I was testing it out and it needed sudo.

Contributor Author

The CSV is included by docs/dependencies.asciidoc and contains the licence information for the Docker image. The sudo request by Tern is indeed a bit worrying but apparently it's needed to mount the procfs file system:

> Tern requires root privileges to run because it needs to mount procfs in order to run commands within a chroot environment and call the Docker CLI. It is enough if you have configured sudo; Tern will ask for your password before running any privileged commands.

I wish they supported OCI images rather than relying so heavily on the Docker daemon and CLI, but it's the only reliable tool I have managed to find. The alternative is to manually maintain a list of dependencies in the Docker image.

Contributor

Got it, thanks for the clarification. I wonder if there's a clearer phrase than "container image dependency" since our direct dependencies are also included in the image (just inside the binary). Maybe changing to "container base image dependency..."?

"os"
"path/filepath"

securejoin "github.com/cyphar/filepath-securejoin"
Contributor

I learned about a new lib, nice

}

decoder := json.NewDecoder(f)
for {
Contributor

This might be me overthinking it, but we mostly parse it this way because we want to end up with a map for mkDepInfo(), correct? If we wanted a slice, we could store overrides.json as a JSON array and unmarshal it straight into a slice (and also get to pretty-print the JSON file rather than keeping each item on one line).

Contributor Author

I chose the JSON lines format because it is well-known and Go supports it out of the box.
The amount of data stored on each line is pretty minimal and I don't expect it to be edited too often. If it starts to become too complex and requires pretty formatting, I think we can consider switching to a more human-friendly file format at that point.
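
For readers unfamiliar with the pattern, here is a minimal sketch of decoding a JSON lines stream with `json.NewDecoder` into a map keyed by dependency name. The `override` struct, its field names and the sample data are hypothetical; the actual record layout of overrides.json may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// override is a hypothetical record shape used for illustration only.
type override struct {
	Name    string `json:"name"`
	Licence string `json:"licenceType"`
	URL     string `json:"url"`
}

// sampleOverrides holds one JSON object per line (JSON lines format).
const sampleOverrides = `{"name":"example.com/mod/a","licenceType":"MIT"}
{"name":"example.com/mod/b","url":"https://example.com/b"}
`

// loadOverrides decodes consecutive JSON values from r until EOF and
// returns them in a map keyed by name.
func loadOverrides(r io.Reader) (map[string]override, error) {
	overrides := make(map[string]override)
	decoder := json.NewDecoder(r)
	for {
		var o override
		err := decoder.Decode(&o)
		if errors.Is(err, io.EOF) {
			return overrides, nil
		}
		if err != nil {
			return nil, fmt.Errorf("failed to decode override: %w", err)
		}
		overrides[o.Name] = o
	}
}

// loadSample runs loadOverrides against the embedded sample data.
func loadSample() (map[string]override, error) {
	return loadOverrides(strings.NewReader(sampleOverrides))
}

func main() {
	overrides, err := loadSample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(overrides), overrides["example.com/mod/a"].Licence)
}
```

The decoder reads one value at a time, so no wrapping array is needed, at the cost of keeping each record on a single line by convention.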

func MkClassifier(dataPath string) (*licenseclassifier.License, error) {
	absPath, err := filepath.Abs(dataPath)
	if err != nil {
		return nil, fmt.Errorf("failed to determine absolute path of licence data file: %w", err)
Contributor

nit: it might be worth considering using pkg/errors so we get those sweet, sweet stack traces

Contributor Author

I think pkg/errors is now in maintenance mode in favour of the changes to errors introduced by Go 1.13. I am using the error wrapping method introduced by 1.13 here. I prefer error messages with added context like this over stack traces, as they are easier to read and search for in the code.
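
For illustration, a minimal sketch of the Go 1.13 wrapping style: `%w` adds context to the message while keeping the underlying error reachable through `errors.Is`. The `detect`, `scan` and `isLicenceNotFound` functions are hypothetical and are not taken from this PR; only the sentinel error mirrors the one in the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// errLicenceNotFound mirrors the sentinel error shown in the diff.
var errLicenceNotFound = errors.New("failed to detect licence")

// detect is a hypothetical stand-in that always fails, to keep the
// example self-contained.
func detect(dep string) error {
	return errLicenceNotFound
}

// scan wraps the underlying error with context using the %w verb.
func scan(dep string) error {
	if err := detect(dep); err != nil {
		return fmt.Errorf("failed to scan %s: %w", dep, err)
	}
	return nil
}

// isLicenceNotFound reports whether err wraps errLicenceNotFound anywhere
// in its chain.
func isLicenceNotFound(err error) bool {
	return errors.Is(err, errLicenceNotFound)
}

func main() {
	err := scan("example.com/mod")
	fmt.Println(err)
	fmt.Println(isLicenceNotFound(err))
}
```

The wrapped message reads as a searchable chain of context strings, which is the readability benefit described above; no stack trace is captured.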

Contributor

I think it's been in maintenance mode for a bit now, but you are correct: 1.13 adds the error wrapping functionality that pkg/errors had. There's still no easy support, afaik, for including stack traces as well, though. Like I said, it was just a personal style suggestion, definitely not a blocker.

@anyasabo
Contributor

It's a little odd the base image deps are formatted differently than the module deps, but it matches the ECE doc and I expect this will probably be the least visited page in our entire docs, so it is definitely not worth changing.


This script generates licence information for the contents of the ECK container image. As the container base image is rarely ever changed and the tool used ([Tern](https://github.com/vmware/tern)) is slow to run, this script is not invoked automatically by the build process.

To generate the dependency list (`docs/container-image-dependencies.csv`) for a particular image tag, invoke the script as follows:
Contributor

Suggested change
To generate the dependency list (`docs/container-image-dependencies.csv`) for a particular image tag, invoke the script as follows:
To generate the dependency list (`docs/container-image-dependencies.csv`) for a particular image tag, invoke the script as follows. Note that this will prompt for root access, which [is necessary](https://github.com/vmware/tern/blob/master/docs/releases/v0_1_0.md#notes):

"github.com/karrick/godirwalk"
)

var errLicenceNotFound = errors.New("failed to detect licence")
// detectionThreshold is the minimum confidence score required from the licence classifier.
const detectionThreshold = 0.85
Contributor

Contributor Author

It's mainly for self-documentation here so that someone reading the code would be aware of the existence of a confidence score. Also, all of our current dependencies had scores higher than 0.85 so I thought it would be good to have a higher threshold to reduce the potential for a false positive slipping through.
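
A minimal sketch of how such a threshold might be applied, assuming the classifier yields candidate matches with a confidence score between 0 and 1. The `match` type and `pickLicence` function are hypothetical stand-ins and do not reflect the licenseclassifier API.

```go
package main

import (
	"errors"
	"fmt"
)

var errLicenceNotFound = errors.New("failed to detect licence")

// detectionThreshold is the minimum confidence score required from the
// licence classifier.
const detectionThreshold = 0.85

// match is a hypothetical stand-in for a classifier result.
type match struct {
	Name       string
	Confidence float64
}

// pickLicence returns the highest-confidence match, or errLicenceNotFound
// when no match reaches detectionThreshold.
func pickLicence(matches []match) (string, error) {
	var best match
	for _, m := range matches {
		if m.Confidence > best.Confidence {
			best = m
		}
	}
	if best.Confidence < detectionThreshold {
		return "", errLicenceNotFound
	}
	return best.Name, nil
}

func main() {
	name, err := pickLicence([]match{{"MIT", 0.97}, {"BSD-3-Clause", 0.40}})
	fmt.Println(name, err)

	_, err = pickLicence([]match{{"MIT", 0.60}})
	fmt.Println(err)
}
```

A low-confidence result is rejected rather than reported, so a dependency in that state would need a manual override.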

@charith-elastic
Contributor Author

> It's a little odd the base image deps are formatted differently than the module deps

Since Go dependencies have long names, having the URL as a separate column makes the table too wide for the content area -- causing the text to get squished and unreadable.

Collaborator

@pebrc left a comment

LGTM! A few minor questions around naming.
