Alpheus is a tool for organizing and managing computational experiments. Reproducible research is a way of conducting research in which the provenance of any result can be traced and the result can be computed again. Alpheus makes a researcher follow this approach.
Advantages (todo: complete the list):
- Builds a dependency graph of operations.
- Incrementally recomputes only the data affected by a change.
- Allows reproducing the data.
- Allows using your usual tools and is based on folders/files.
You need to have .NET Core SDK 2.1.300 or newer installed.
If you have it, install the latest version of Alpheus with
dotnet tool install --global Alpheus-cli
You can then call alpheus from the command line:
C:\project1>alpheus help
USAGE: alpheus [help] [<subcommand> [<options>]]
SUBCOMMANDS:
init <options> Make the current directory an Alpheus experiment directory
config <options> Modify configuration of research directory
build <options> Creates an experiment graph node
compute <options> Tries to compute the graph to make the outdated node up to date
status <options> Prints the graph status for particular .alph file
save <options> Save a copy of file/directory to the storage(s)
restore <options> Restore a copy of the file/directory from storage
Use 'alpheus <subcommand> help' for additional information.
OPTIONS:
help display this list of options.
Before building the code, make sure the machine has the following tools installed:
- .NET Core SDK 2.1.300 or newer.
- Node.js 8.11.3 or newer.
- Yarn package manager.
Clone the repository and run the following command in the root of the repository:
dotnet build
You can also open Alpheus.sln using Visual Studio 2019 (or newer) and build the solution.
To run the tests, run the following command in the root of the repository:
dotnet test
In the root folder of the experiment run the following command:
alpheus init
This creates a new folder, .alpheus, with default settings described in .alpheus/config.json. This folder should be committed to the git repository.
Now the root folder can be called the experiment folder.
An experiment is a composition of methods that produce and consume artefacts. Each method is a command-line operation registered with the command alpheus build. An artefact is a file or a folder located within the experiment folder.
For example, the following command registers a method which produces an output artefact author.txt by running the command whoami > author.txt:
alpheus build -o "author.txt" "cmd /c whoami > $out1"
Note that this command doesn't actually run anything. It only creates the file author.txt.alph, which describes how author.txt can be produced. When there are many methods, these description files make it possible to build a dependency graph of the methods of the experiment.
Suppose the script scripts/count.py takes two arguments, an input file and an output file, and writes the number of characters in the input file to the output file. The following command registers a method which runs the script on author.txt and builds count.txt:
alpheus build -o "count.txt" -d "scripts/count.py" -d "author.txt" "python $in1 $in2 $out1"
Note that we declare that the new method depends on the output of the first method, author.txt. This information is stored in the created file count.txt.alph.
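The script itself is not shown here. A minimal sketch of what scripts/count.py could look like, based only on the description above, follows; the function name count_chars and the UTF-8 encoding are assumptions made for illustration.

```python
"""Hypothetical sketch of scripts/count.py: count the characters in a file."""
import sys


def count_chars(input_path: str, output_path: str) -> int:
    """Write the number of characters in input_path to output_path."""
    # Assumption: the input is text encoded as UTF-8.
    with open(input_path, encoding="utf-8") as src:
        n_chars = len(src.read())
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write(f"{n_chars}\n")
    return n_chars


if __name__ == "__main__":
    # Expected invocation: python scripts/count.py <input file> <output file>
    if len(sys.argv) == 3:
        count_chars(sys.argv[1], sys.argv[2])
    else:
        print("usage: python count.py <input file> <output file>", file=sys.stderr)
```

With the registration above, $in1 is the script, $in2 is author.txt and $out1 is count.txt, so the script receives the input file and the output file as its two arguments.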
All *.alph files must be committed to the git repository, so that the experiment workflow is shared.
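To illustrate how the two registered methods form a dependency graph, here is a small sketch using Python's standard graphlib module (Python 3.9 or newer). The dictionary is a hand-written, hypothetical view of the dependencies declared with -o and -d; it does not read or reflect the actual *.alph file format.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory view of the two methods:
# output artefact -> artefacts it depends on.
dependencies = {
    "author.txt": [],
    "count.txt": ["scripts/count.py", "author.txt"],
}


def execution_order(deps):
    """Return artefacts in an order where every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

In any order returned by this sketch, author.txt appears before count.txt, which is the property a tool needs in order to run methods in a valid sequence.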
To compute an artefact, use alpheus compute. For instance, the following command computes count.txt:
alpheus compute count.txt
Alpheus builds the dependency graph of the methods needed to produce the required file and then runs only those methods whose outputs are not up to date. Alpheus automatically detects changes in files and directories, so you don't need to worry about whether the output is consistent. As a result, we get both author.txt and count.txt.
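How Alpheus detects changes is not described here. One common way to decide whether an output is up to date is to compare content hashes with the hashes recorded after the last successful run. The sketch below shows that idea only; the functions file_digest and is_outdated and the recorded dictionary are hypothetical and are not part of Alpheus. It handles files, not directories.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_outdated(output: Path, inputs: list[Path], recorded: dict[str, str]) -> bool:
    """Decide whether the method producing `output` needs to run again.

    `recorded` maps a path (as a string) to the digest stored after the
    last successful run. This is an assumed bookkeeping scheme.
    """
    for path in [output, *inputs]:
        if not path.exists():
            return True  # a missing file always forces a run
        if recorded.get(str(path)) != file_digest(path):
            return True  # contents differ from the recorded state
    return False
```

Under this scheme, editing author.txt would change its digest, so count.txt would be reported as outdated while unrelated artefacts would not.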
It is up to you whether to commit these files to the git repository, push them to an external storage, or keep them only on the local machine. In the last case, the files must be recomputed on other machines when they are needed.
To remove a method, just delete the corresponding *.alph file. Note that you can break dependencies by deleting artefacts required by other methods. In this case, the computation of those methods will fail.
Suppose there are the following methods:
- Method M produces the folder data/.
- Method N depends on data/birds.csv.
N shouldn't be run until M completes, since M can potentially change data/birds.csv. So the rule is:
If a dependency path of a method is equal to or located under an output path of another method, then the first method runs only after the second method completes.
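The rule can be expressed as a simple path check. The following is a sketch of the rule as stated, not Alpheus's implementation; the function name must_run_after is hypothetical, and paths are assumed to be relative to the experiment folder and written with forward slashes.

```python
from pathlib import PurePosixPath


def must_run_after(dependency: str, output: str) -> bool:
    """Return True if `dependency` is equal to or located under `output`."""
    dep = PurePosixPath(dependency)
    out = PurePosixPath(output)
    # `dep.parents` lists every ancestor folder of the dependency path.
    return dep == out or out in dep.parents


# Method N depends on data/birds.csv, method M produces data/.
print(must_run_after("data/birds.csv", "data/"))
```

Comparing whole path components rather than string prefixes avoids false matches such as database/birds.csv against data/.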
If you need to perform an identical operation on multiple artefacts, provide both the input and the output paths with an asterisk (*) when declaring the method:
alpheus build -o "counts/*.txt" -d "scripts/count.py" -d "files/*.txt" "python $in1 $in2 $out1"
In this example, the script count.py will be executed for each text file in the files folder, and for each of them a text file with the same name, containing the count, will be created in the counts folder.
The maximum number of asterisks among the input paths must equal the number of asterisks in the output paths. The number of asterisks defines the dimensionality of the vector operation.
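To make the pairing of inputs and outputs concrete, the sketch below maps one concrete input path to its output path for patterns with the same number of asterisks. It is an illustration only: the function map_output is hypothetical, and it assumes that an asterisk matches part of a single path segment and never crosses a slash, which may differ from how Alpheus matches paths.

```python
import re
from typing import Optional


def map_output(input_pattern: str, output_pattern: str, input_path: str) -> Optional[str]:
    """Map a concrete input path to the corresponding output path.

    Returns None if `input_path` does not match `input_pattern`.
    """
    if input_pattern.count("*") != output_pattern.count("*"):
        raise ValueError("patterns must contain the same number of asterisks")
    # Turn each asterisk into a capture group that does not cross "/".
    regex = "([^/]*)".join(re.escape(part) for part in input_pattern.split("*"))
    match = re.fullmatch(regex, input_path)
    if match is None:
        return None
    # Substitute the captured pieces for the asterisks of the output, in order.
    pieces = output_pattern.split("*")
    result = pieces[0]
    for captured, piece in zip(match.groups(), pieces[1:]):
        result += captured + piece
    return result


print(map_output("files/*.txt", "counts/*.txt", "files/birds.txt"))
```

With two asterisks, the same function pairs, for instance, data/x/y.csv under the pattern data/*/*.csv with out/x/y.txt under out/*/*.txt, which corresponds to a two-dimensional operation.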
todo
todo: how the user builds the experiment, saves it, and shares it. We recommend starting by adding and debugging scripts manually, and then registering them in the dependency graph.
todo: how to build an Alpheus experiment when you already have a bunch of scripts and files.
todo
todo