Nisha Kumar edited this page Dec 1, 2017 · 3 revisions

Tern

What is Tern

Tools like Docker make it easy to build and distribute Linux containers that run microservices. Typically, a container is built by running the instructions in a Dockerfile. It is often assumed that the Dockerfile serves as the software bill of materials for the distributed container, when in fact it is simply a list of instructions to install and run software components. This is troublesome from a compliance standpoint because there is no record of what actually got installed. Furthermore, depending on how the Dockerfile is written, Docker builds can be quite difficult to reproduce as sources get versioned and repositories go stale.

The Tern project was created to address the need to identify the software components that are installed within a container and to collect metadata for the purpose of verifying things like versions, licenses and source URLs.

Tern checks against a knowledge base that holds either actual package metadata or shell scripts that retrieve it, and uses that information to create a catalog of the software packages installed in a Docker container.

Tern helps you begin to better manage the compliance of your containers and work towards more identifiable and reproducible builds.

User personas

  • Compliance Validation: Tern can be used by any developer who wants to check existing containers to ensure that the Open Source Software within them meets their organization's source and licensing requirements, or to give feedback to the Open Source community.
  • Build and Release: Tern can be used as part of a build and release pipeline, where a bill of materials (BOM) is created by running Tern against a Dockerfile, checked against a set of rules or an expected BOM (either manually or through automation), and released as one of the artifacts.
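
As an illustration of the Build and Release check, here is a minimal sketch of comparing a generated BOM against an expected one. The function name `compare_bom` and the `{package name: version}` shape of a BOM are assumptions made for this example; they are not Tern's actual interface or output format.

```python
def compare_bom(actual, expected):
    """Compare two BOMs given as {package name: version} mappings.

    Hypothetical helper for illustration; not part of Tern.
    """
    # Packages the expected BOM lists but the container does not have
    missing = sorted(set(expected) - set(actual))
    # Packages found in the container that the expected BOM does not list
    unexpected = sorted(set(actual) - set(expected))
    # Packages present in both but with different versions
    mismatched = sorted(
        name for name in set(actual) & set(expected)
        if actual[name] != expected[name]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}
```

A pipeline step could fail the release whenever any of the three lists is non-empty.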

Milestones

Phase 1

  • Retrieve sources from the packages installed in a simple Dockerfile
  • A knowledge base that can contain either this package information or methods to retrieve it for a given package

Phase 2

  • A cache to store layers with the packages that are installed in those layers
  • A report containing a line-by-line 'walkthrough' of the Dockerfile to say what packages were installed in each line
  • A summary report
  • Retrieval of the complete list of dependencies

Phase 3

  • Allow for source tarball retrieval
  • Recurse through images built from other images
  • SPDX document output

Phase 4

  • Allow for identification of security updates that may be present in a container
  • Allow for application of security updates

Current Functionality

  • Parse a Dockerfile to get FROM and RUN instructions
  • Retrieve package information from a base image using the base OS's package management system. For example, if the base OS is Photon OS, Tern can use tdnf to retrieve package source URL, version and license information.
  • If the image is not buildable, retrieve the packages that may have been installed from the RUN commands
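
To make the first item concrete, here is a minimal sketch of extracting FROM and RUN instructions from Dockerfile text. The helper `parse_dockerfile` is hypothetical and is not Tern's actual parser; it assumes shell-form RUN instructions and only handles backslash line continuations and `&&` chaining.

```python
import re


def parse_dockerfile(text):
    """Return (base_images, run_commands) from Dockerfile text.

    Hypothetical helper for illustration; not Tern's parser.
    """
    # Join lines that end with a backslash continuation
    joined = re.sub(r"\\[ \t]*\n", " ", text)
    base_images, run_commands = [], []
    for line in joined.splitlines():
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        instruction, argument = parts[0].upper(), parts[1].strip()
        if instruction == "FROM":
            base_images.append(argument)
        elif instruction == "RUN":
            # Split a shell-form RUN into individual commands on '&&'
            run_commands.append([c.strip() for c in argument.split("&&")])
    return base_images, run_commands
```

Each RUN instruction becomes a list of shell commands, which can then be inspected for package manager invocations.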

Architecture

The input to Tern is a Dockerfile. Tern will attempt to build the Docker image using the Dockerfile. If it cannot, it will try to determine which packages may have been installed using just the Dockerfile.

The flow looks something like this:

  1. Parse Dockerfile to get instructions associated with the base image
  2. Attempt to pull the base image from Docker Hub. If it doesn't exist then exit.
  3. Check the cache to see if the packages associated with the filesystem layers exist. If they do, retrieve them.
  4. If there is no listing of the filesystem layer in the cache then check the command library for snippets to retrieve package metadata.
  5. Start a container with the base image and execute the commands in the running container. Store the results in the cache.
  6. Parse each RUN directive in the Dockerfile to retrieve a list of commands and packages installed with them.
  7. Attempt to build the Dockerfile.
  8. If the build is successful then retrieve the filesystem layers and check against the cache for installed packages. If there is a match then retrieve the list of packages.
  9. If there is no listing then check the command library for snippets to retrieve package metadata.
  10. Start a container with the built image and execute the commands in the running container. Store the results in the cache.
  11. If the build is not successful then just report the list of packages that may have been installed from the parsed information.
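
The cache-then-command-library lookup that appears twice in the flow above (steps 3 to 5 for the base image, steps 8 to 10 for the built image) can be sketched as follows. Everything here is an assumption made for illustration: the cache and command library are modeled as plain dictionaries, and `run_in_container` stands in for whatever actually executes a snippet in a running container.

```python
def packages_for_layer(layer_id, base_os, cache, command_library, run_in_container):
    """Return the package list for one filesystem layer.

    All names are hypothetical stand-ins; this is not Tern's code.
    """
    # Steps 3 and 8: reuse the cached listing if this layer was seen before
    if layer_id in cache:
        return cache[layer_id]
    # Steps 4 and 9: otherwise look for a snippet that retrieves metadata
    snippet = command_library.get(base_os)
    if snippet is None:
        # No known way to query this OS; nothing is cached
        return []
    # Steps 5 and 10: run the snippet in a container and cache the result
    packages = run_in_container(snippet)
    cache[layer_id] = packages
    return packages
```

Passing the container runner in as a callable keeps the lookup logic testable without a Docker daemon.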

Tern collates all of the package information in a detailed report intended to help users determine which line in the Dockerfile installed which packages, including the lines it couldn't parse. It can also produce a sparse report containing only the package information. We recommend starting with the detailed report.

Functional blocks

tern
|__ report.py
|       |__ common.py
|              |__ classes/*
|              |__ utils/*
|__ sources.py
        |__ common.py
               |__ classes/*
               |__ utils/*

tern is the main executable which imports functions like report and sources. These modules in turn use the common functions, classes and utils (not necessarily in the hierarchical order shown above).

  • report.py - the main module used to create a report
  • sources.py (not present right now) - the main module used to retrieve sources
  • common.py - common functions
  • classes - currently containing the Layer class and the Package class
    • Layer - represents the filesystem layer of a container image
    • Package - represents a package that is installed in a filesystem layer
  • utils - utility modules which may be used anywhere
    • dockerfile - utility modules related to parsing the Dockerfile
    • commands - utility modules related to running docker commands
    • metadata - utility modules for extracting a container image's metadata
    • cache - utility commands to perform CRUD operations on the cache
    • constants - constants used in the other utilities
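
For orientation, the relationship between the Layer and Package classes might look like the sketch below. The attribute and method names are assumptions based on the metadata mentioned earlier (version, license and source URL); Tern's actual class definitions may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    """A package installed in a filesystem layer (hypothetical fields)."""
    name: str
    version: str = ""
    license: str = ""
    src_url: str = ""


@dataclass
class Layer:
    """A filesystem layer of a container image (hypothetical fields)."""
    layer_id: str
    packages: List[Package] = field(default_factory=list)

    def add_package(self, package: Package) -> None:
        # Avoid listing the same package name twice in one layer
        if package.name not in self.package_names():
            self.packages.append(package)

    def package_names(self) -> List[str]:
        return [p.name for p in self.packages]
```

A layer owns its list of packages, so a cached layer can be returned together with everything installed in it.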

Contributing to Tern

See the CONTRIBUTING.md to get started.
