
Alpheus: A tool for organizing and managing computational experiments


novikov-alexander/alpheus

 
 


Alpheus


A tool for organizing and managing computational experiments. Reproducible research is a way of conducting research that makes it possible to trace the provenance of any result and to compute it again. Alpheus makes a researcher follow this approach.

Advantages: todo: complete the list

  • Builds a dependency graph of operations.
  • Incrementally recomputes only the data affected by a change.
  • Allows reproducing the data.
  • Allows using your usual tools and is based on folders/files.

Installation

You need to have .NET Core SDK 2.1.300 or newer installed.

Once you have it, install the latest version of Alpheus with:

dotnet tool install --global Alpheus-cli

Usage

You can call alpheus from the command line:

C:\project1>alpheus help
USAGE: alpheus [help] [<subcommand> [<options>]]

SUBCOMMANDS:

    init <options>        Make the current directory an Alpheus experiment directory
    config <options>      Modify configuration of research directory
    build <options>       Creates an experiment graph node
    compute <options>     Tries to compute the graph to make the outdated node up to date
    status <options>      Prints the graph status for particular .alph file
    save <options>        Save a copy of file/directory to the storage(s)
    restore <options>     Restore a copy of the file/directory from storage

    Use 'alpheus <subcommand> help' for additional information.

OPTIONS:

    help                  display this list of options.

Build

Before building the code, you need to make sure the machine has the following tools installed:

  1. .NET Core SDK 2.1.300 or newer.
  2. Node.js 8.11.3 or higher.
  3. Yarn package manager.

Clone the repository and run the following command in the root of the repository:

dotnet build

You can also open Alpheus.sln using Visual Studio 2019 (or newer) and build the solution.

Tests

Run the following command in the root of the repository:

dotnet test

Documentation

Initialization

In the root folder of the experiment run the following command:

alpheus init

This creates a new folder, .alpheus, with default settings described in .alpheus/config.json. This folder should be committed to the git repository.

Now the root folder can be called the experiment folder.

Adding a new method

An experiment is a composition of methods that produce and consume artefacts. Each method is a command line operation registered with the alpheus build command. An artefact is a file or a folder located within the experiment folder.

For example, the following command registers a method which produces an output artefact author.txt by running command whoami > author.txt:

alpheus build -o "author.txt" "cmd /c whoami > $out1"

Note that this command doesn't actually run anything. It only creates the author.txt.alph file, which describes how author.txt can be produced. When there are many methods, these description files allow Alpheus to build a dependency graph for the methods of the experiment.
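The command text uses placeholders such as $out1 that stand for the registered paths. The following Python sketch illustrates that substitution. It is not Alpheus code: the function name expand_command is hypothetical, and the numbering rule ($inN is the N-th -d path, $outN is the N-th -o path) is inferred from the examples in this README.

```python
import re


def expand_command(template, inputs, outputs):
    """Replace $inN / $outN with the N-th (1-based) input or output path.

    Assumption: placeholders are numbered in the order in which the
    -d (inputs) and -o (outputs) options are given.
    """
    def substitute(match):
        paths = inputs if match.group(1) == "in" else outputs
        index = int(match.group(2)) - 1  # placeholders are 1-based
        if not 0 <= index < len(paths):
            raise ValueError(f"placeholder {match.group(0)} has no matching path")
        return paths[index]

    return re.sub(r"\$(in|out)(\d+)", substitute, template)


# The method registered above, with its single output substituted.
print(expand_command("cmd /c whoami > $out1", [], ["author.txt"]))
```

With no inputs and author.txt as the only output, the template expands to the command quoted earlier: cmd /c whoami > author.txt.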

Suppose the scripts/count.py script takes two arguments, an input file and an output file, and writes the number of characters in the input file to the output file. The following command registers a method which runs the script on author.txt and builds count.txt:

alpheus build -o "count.txt" -d "scripts/count.py" -d "author.txt" "python $in1 $in2 $out1"

Note that we declare that the new method depends on the output of the first method, author.txt. This information is stored in the created file count.txt.alph.
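The README does not include scripts/count.py itself. A minimal sketch of what such a script might look like is given below. The function name count_chars, the UTF-8 encoding and the usage message are assumptions made for illustration.

```python
import sys


def count_chars(input_path, output_path):
    """Write the number of characters in input_path to output_path."""
    # Assumption: the input is text; undecodable bytes are replaced, not fatal.
    with open(input_path, encoding="utf-8", errors="replace") as source:
        count = len(source.read())
    with open(output_path, "w", encoding="utf-8") as target:
        target.write(str(count))
    return count


if __name__ == "__main__":
    # Expected call: python scripts/count.py <input file> <output file>
    if len(sys.argv) == 3:
        count_chars(sys.argv[1], sys.argv[2])
    else:
        print("usage: count.py <input file> <output file>", file=sys.stderr)
```

The script only reads its first argument and only writes its second, so the artefacts declared with -d and -o match what the script really touches.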

All *.alph files must be committed to the git repository, so the experiment workflow is shared.

Computing an artefact

To compute an artefact, use alpheus compute. For instance, the following command computes count.txt:

alpheus compute count.txt

Alpheus builds the dependency graph of the methods needed to produce the required file and then runs only those methods whose outputs are not up to date. Alpheus automatically detects changes in files and directories, so you don't need to worry about whether the output is consistent. As a result, we get both author.txt and count.txt.
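The idea of running only outdated methods can be illustrated with a small Python model. This is a simplified sketch, not how Alpheus is implemented: the function plan, the methods dictionary and the changed set are all hypothetical, and the set of changed artefacts is passed in directly instead of being detected from the file system. Cycles are not checked.

```python
def plan(target, methods, changed):
    """List the outputs to recompute for target, dependencies first.

    methods: maps each registered output to the list of its dependencies.
    changed: artefacts known to be missing or modified (given, not detected).
    """
    stale = set(changed)
    visited = set()
    order = []

    def visit(artefact):
        if artefact in visited:
            return
        visited.add(artefact)
        deps = methods.get(artefact)
        if deps is None:
            return  # a plain source file: no method produces it
        for dep in deps:
            visit(dep)
        if artefact in stale or any(dep in stale for dep in deps):
            stale.add(artefact)  # staleness propagates to dependants
            order.append(artefact)

    visit(target)
    return order


# The two methods registered earlier in this README.
methods = {
    "author.txt": [],
    "count.txt": ["scripts/count.py", "author.txt"],
}
print(plan("count.txt", methods, {"author.txt", "count.txt"}))  # nothing computed yet
print(plan("count.txt", methods, {"scripts/count.py"}))         # only the script was edited
print(plan("count.txt", methods, set()))                        # everything is up to date
```

In this model, the first call schedules author.txt and then count.txt, the second schedules only count.txt, and the third schedules nothing.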

It is up to you whether to commit these files to the git repository, push them to an external storage, or keep them only on the local machine. In the latter case, the files must be recomputed on other machines when they are needed.

Removing an artefact/method

Just delete the corresponding *.alph files. Note that you can break dependencies by deleting artefacts required by other methods. In this case, the computation of those methods will fail.

Getting status of an experiment

Implicit dependencies

Suppose there are the following methods:

  • Method M produces folder data/.
  • Method N depends on data/birds.csv.

N shouldn't run until M completes, since M can potentially change data/birds.csv. So the rule is:

If a dependency path of one method is equal to or located under an output path of another method, then the first method runs after the second method ends.
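The path test in this rule can be sketched in Python as follows. The function name runs_after is hypothetical, and the sketch assumes relative paths with forward slashes, as used throughout this README. It compares whole path components, so database/ is not treated as being under data/.

```python
from pathlib import PurePosixPath


def runs_after(dependency_path, output_path):
    """True if dependency_path equals output_path or lies under it."""
    dependency = PurePosixPath(dependency_path)
    output = PurePosixPath(output_path)  # a trailing slash is ignored
    return dependency == output or output in dependency.parents


# Method N depends on data/birds.csv; method M produces data/.
print(runs_after("data/birds.csv", "data/"))      # N must wait for M
print(runs_after("database/birds.csv", "data/"))  # unrelated folder
```

The first call returns True and the second returns False.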

Vector operations

If you need to perform an identical operation with multiple artefacts, you should provide both input and output paths with an asterisk (*) when declaring a method:

alpheus build -o "counts/*.txt" -d "scripts/count.py" -d "files/*.txt" "python $in1 $in2 $out1"

In this example, the script count.py is executed for each text file in the files folder, and the counts folder receives text files with the same names containing the counts.

The maximum number of asterisks among the inputs must equal the number of asterisks in the outputs. The number of asterisks defines the dimensionality of the vector operation.
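The pairing of inputs and outputs can be illustrated with a Python sketch. It is a model of the behaviour described above, not Alpheus code. The names vector_rank and expand_vector are hypothetical, and two details are assumptions: an asterisk matches part of a single path segment, and the k-th asterisk of the input maps to the k-th asterisk of the output.

```python
import re


def vector_rank(inputs, outputs):
    """Return the dimensionality, checking the asterisk-count rule."""
    rank = max((path.count("*") for path in inputs), default=0)
    for path in outputs:
        if path.count("*") != rank:
            raise ValueError(f"{path} must contain {rank} asterisk(s)")
    return rank


def expand_vector(input_pattern, output_pattern, existing_paths):
    """Pair each path matching input_pattern with its output path."""
    if input_pattern.count("*") != output_pattern.count("*"):
        raise ValueError("patterns must contain the same number of asterisks")
    # Assumption: an asterisk never matches across a '/' separator.
    regex = re.compile(
        "([^/]+)".join(re.escape(part) for part in input_pattern.split("*"))
    )
    out_parts = output_pattern.split("*")
    pairs = []
    for path in sorted(existing_paths):
        match = regex.fullmatch(path)
        if match is None:
            continue
        # Interleave the captured names with the fixed parts of the output.
        output = out_parts[0] + "".join(
            name + part for name, part in zip(match.groups(), out_parts[1:])
        )
        pairs.append((path, output))
    return pairs


print(vector_rank(["scripts/count.py", "files/*.txt"], ["counts/*.txt"]))
print(expand_vector("files/*.txt", "counts/*.txt",
                    ["files/a.txt", "files/b.txt", "files/notes.md"]))
```

For the method above, the rank is 1, files/a.txt is paired with counts/a.txt, files/b.txt with counts/b.txt, and files/notes.md is skipped because it does not match the input pattern.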

Using external storage for artefacts

Support for standard tools and languages

todo

Common workflow

todo: how the user builds the experiment, saves, shares. We recommend starting by adding and debugging scripts manually, then registering them in the dependency graph.

Migration from a bunch of scripts and data files

todo: how to build an Alpheus experiment when you already have a bunch of scripts and files.

Sharing and collaborating

todo

Running in Cloud

todo
