Pymake (pmk) is an environment for making reproducible research. It provides tools adapted to ease the creation, maintenance, tracking and sharing of experiments. It has two main paradigms:
- Manage and navigate in your experiments, as a command-line interface.
- Models and workflows for Machine Learning experiments, as a framework.
It follows a Don't-Repeat-Yourself (DRY) philosophy and propose a workflow called Model-Spec-Action (MSA) which is in the spirit of former Model-View-Controller (MVC) design pattern but adapted for computer simulations, generally speaking.
It can be represented as follows:
- Specification of design of experimentation with a simple grammar,
- Indexation of specs, models, scripts and corpus, powered by Whoosh,
- Customisable Command-line for quick design and experiment testing, powered by argparse
- Command-line auto-completion for specs and scripts,
- Simple grid search specification and navigation,
- Support experiments rules filtering (experimental)
- Support disks I/O management for training/input data and outputs results.
- Automatic filesystem I/O for data persistence.
- Automatic compression.
- Pickle and Json format are currently supported.
- Support plotting and table printing facilities powered by matplotlib and pandas
- Support experiments parallelization powered by gnu-parallel,
- Browse, design and test several models and corpus found in the literature.
- Integration of models from scikit-learn,
Perspectives:
- Web server UI and notebook automatic builder.
- Better documentation (see also wiki/).
pip install pmk
apt-get install python3-setuptools python3-pip python3-tk libopenblas-dev gfortran parallel
brew install parallel
git clone https://github.com/dtrckd/pymake
cd pymake && make
Public projects that uses Pymake :
- ml : Machine Learning models and experiments.
- docsearch : self hosted search engine for your pdf documents.
- run or expe: It is the term that design one single experiment. it is related to an atomic, sequential code execution.
- model: A class that have a method named
fit
and located inmodel/
. - spec: A spec is a design of experience. it is defines by a subset of Expspaces, ExpTensors and ExpGroups.
- script: A script is a file containing a list of actions, (see ExpeFormat).
- actions: An action is basically one method in a script that can be triggered by users. The term script is often used instead of action by misuse language.
- ExpSpace: A dict-like object used to stored the settings of one expe.
- ExpTensor: A dict-like object to represent a set of expe with common parameters. Each entries that are instance of
list
orset
are used to build the Cartesian product of those entries. It is used to defined grid search over parameters. - ExpGroup: A list-like object to defined a set of heterogeneous expes.
- ExpeFormat: A base class used to create scripts. It acts like a sandbox for the runs. The classes that inherit ExpeFormat should be located in
script/
. - ExpDesign: A base class used to create design of experience. The experience of type ExpSpace, ExpTensor and ExpGroup should be defined within class that inherit ExpDesign and located in
spec/
. - pymake.cfg: the pymake configuration file, where, for example, the name of the location (model/, spec, model/) can be changed among other settings.
- gramarg: It refers to a file, by default in gramarg.py, where you can tune the command line options of pmk by adding your onw. The command line option grammar is powered by the python module argparse.
pmk diff spec1 spec2
The pymake.cfg have a settings, by default gramarg = project_name.grammarg
, which point to the python file gramarg.py. Inside this file you can add command-line options, fully compatible with the argparse
python module. By default the file contains an empty list. If you want, let's say to set a parameter in your expe with the command line like this pmk --my_key my_value
you can add a element in the list as follows:
_gram = [
'--my_key',dict(type=str, help='simple option from command-line'),
]
Now suppose that you want to run several expe with different value for an argument, for example --my-key 10 20
will result in a expTensor with two expe, one with "my-key" at 10 the other at 20. To activate this you can proceed as follows:
from pymake.core.gram import exp_append
_gram = [
'--my_key',dict(nargs='*', action=exp_append),
]
Thus the argument you will get is a str for "my-key". If you want a int let's say, you can proceed as follows:
from functools import partial
from pymake.core.gram import exp_append
_gram = [
'--my_key',dict(nargs='*', action=partial(exp_append, _t=int)),
]
Finally if you argument "my-key" should be a list of values (int here) and should not create several expe, you can proceed like this:
from functools import partial
from pymake.core.gram import exp_append_uniq
_gram = [
'--my_key',dict(nargs='*', action=partial(exp_append_uniq, _t=int)),
]
Pymake provide a magic command line argument to specify any field in an expe. Let's say you want to give the value my_value
in the field my_key
in your expe, then you can do pmk [...] --pmk my_value=my_key
. You can chain as many key=value pairs like this.
If a spec has several run/expe and if the run/expe are launched sequentially (without --cores
option), then one can use a global container defined in the ExpeFormat sandbox classes in the variable self.D
. Typically one would init variables at the first experience, process it, and at the final run, do some processing with that variable, as illustrated in the following example:
class MyScripts(ExpeFormat):
def my_action(self):
if self.is_first_expe():
self.D.my_shared_var = 0
my_shared_var = self.D.my_shared_var
my_shared_var += 1
if self.is_last_expe():
print('Expe total: %d' % self.D.my_shared_var)
If the runs are parallelized (with --cores
options), there is no current implemented way to do it although it is likely to be developed in the future.
If one parameter is accessible from the command line. You can deactivate it from the command line by giving the argument _null
, from example pmk a_complex_spec --my_key _null
. Thus the associated value will takes no value (or its default value.)
The command, pymake update
build the auto-completion file for bash. To enable it, put the following lines at the end of your ~/.bashrc
:
if [ -d $HOME/.bash_completion.d ]; then
if [ ! -z $(ls $HOME/.bash_completion.d) ]; then
for bcfile in $HOME/.bash_completion.d/*; do
. $bcfile
done
fi
fi
If you want to enable the auto-completion, open a new terminal or just run source ~/.bashrc
.
- Workflow / directory structure
- pymake commands
- Designing Experiments
- Track your data and results
- pymake.cfg
- Search and indexation
(to be completed)
In a pymake project there is 4 main components, associated to 4 directories (you can change those names in the pymake.cfg):
data/
: Where are storer input/output of any experiments,- contains datasets (and saved results) ,
model/
: It represents our understanding of the data,- contains models -- every class with a
fit
method ,
- contains models -- every class with a
script/
: Code that operate with the data and models,- contains scripts for actions, -- every class that inherit
ExpeFormat
- contains scripts for actions, -- every class that inherit
spec/
: It is the specifications of the context of an experiment. In order words, the parameters of an experiment.- contains specification of (design) experiments (
ExpSpace
,ExpTensor
andExpGroup
), -- can be given as an argument of pymake.
- contains specification of (design) experiments (
Along with those directory there is two system files:
- pymake.cfg: at the root of a project (basically define a project) specify the paths for the
data | model | script | spec
and other global options, - gramarg.py: defines the command-line options for a project.
Initialize a new project in the current directory:
pymake init
If new models or scripts are added in the project, you'll need to update the pymake index:
pymake update
List/Search information:
pmk -l spec # show available designs of experimentation
pmk -l model # show available models
pmk -l script # show available scripts
pmk show expe_name # or just pymake expe_name
Run experiments:
pmk run [expe_name] --script script_name [script options...]
# Or shortly (alias):
pmk [expe_name] -x script_name
# Run in parallel:
pmk [expe_name] -x script_name --cores N_CORES
Show Paths for disks I/O:
pmk path [expe_name] [script options...]
Show individuals commands for asynchronously purpose (@deprecated):
pmk cmd [expe_name] [script options...]
A design of experiment is defined as one of the following type:
- ExpSpace: A subclass of
dict
=> 1 experiment - ExpTensor: A subclass of
dict
=> many experiments (Cartesian Product of alllist
entrie of the dict) - ExpGroup: A subclass of
list
=> group of ExpSpace or ExpTensor.
Design of experiment (ExpSpace, ExpTensor or ExpGroup) must live inside a class that inherit ExpDesign
. Those classes live in files inside the spec/
directory. You'll need the following import:
from pymake import ExpDesign, ExpSpace, ExpTensor, ExpGroup
The following examples need to be instantiated in class that inherits ExpDesign
: class MyDesign(ExpDesign)
.
To specify an unique experiment, one can use the ExpSpace
class:
exp1 = ExpSpace(name = 'myexpe',
size = 42,
key1 = 100,
key2 = 'johndoe'
_format = '{name}-{size}-{key1}_{key2}'
)
To specify a grid search, one can use the ExpTensor
class:
exp2 = ExpTensor(name = 'myexpe',
size = [42, 100],
key1 = list(range(20, 1000))
key2 = 'johndoe'
_format = '{name}-{size}-{key1}_{key2}'
)
Which will results in four experiments where "size" and "key1" settings take different values.
The third class is the ExpGroup
which allows to group several design of experiments (for example if they have different settings name):
exp3 = ExpGroup([exp1, exp2])
You can then run pmk -l
to see our design of experiments.
Basically, A model is a class inside model/
that have a method fit
.
(Doc in progress for more fancy use cases of design.)
A script is a piece of code that you execute which is parameterized by a specification. More specifically, Scripts are methods of class that inherits a ExpeFormat
and that lives inside the script/
folder.
Once you defined some scripts, you'll be able to list them with pmk -l script
, and to run them, by their name, with pmk [specification_id] -x script_name
.
Then each experiments defined in your design (or _default_expe if no specification_id is given), will go through the script method. Then, a bunch of facilities are living inside the method at run-time:
self.expe
: The settings of the current experiment,self._it
: The ith script running inside the script,- and more (doc in progress)
If a your expe contain models, you can automatically load and save it in a expe if your spec have a field named "model", and that its value point to a valid model in your pmk path. Then you can load your model in a script by calling self.load_model()
. If you give the argument -w
in the command-line, or (equivalent) your expe have have a pair _write=True
, the model is automatically saved at the end of the expe, after the model have been updated. Then you can reload from its file br calling self.load_model(load=True)
.
In order to save and analyze your results, each unique experiment need to be identified in a file. To do so, Pymake comes with its own mechanism to map the settings/specification to an unique s. Pymake use the following conventions:
- .inf: csv file where each lines contains the state of the iterative process of an experiment, (see _scv_format)
- .pk.gz: to save compressed binary object usually at the end of an experiments, and load it after for analysis/visualization,
- .json: to save information in a JSON format.
There is a bunch of special spec parameters to customize the behaviours of pymake describe in the following sections.
The choice of the filename will depends on the settings of the experiments. In order to specify the format of the filename, there is the special settings --format str_fmt
. str_fmt
is a string template for the filename, with braces delimiter to specify what settings will be replaced, example:
Suppose we have the following settings:
settings = ExpSpace(name = 'myexpe',
size = 42,
key1 = 100,
key2 = 'johndoe'
_format = '{name}-{size}-{key1}_{key2}'
)
The filename for this unique experiment will be 'myexpe-42-100_johndoe'
To give an unique identifier of an expe belonging to a group of experiments (ExpTensor
or ExpGroup
) one can use the special term {_id}
(unique counter identifier) and ${name}
(name of experiment as given in the ExpDesign class) in the _format
attribute.
The path of the filename identifying an expe is automatically inferred by pymake. Thus, if you want to better partition your results, there is two parameters to control the output_path. By default it is something like .pmk/results/training/<refdir>/<repeat>/output_path
. Thus you can control in your spec parameter the two level of sub-directory customizable with the keys _refdir
and _repeat
(in spec). If not given, the default parameters are "default" and '' (void) for respectively _refdir
and _repeat
. Note that you can format it with the same syntax explained for _format
.
to complete...
- explain the
_scv_typo
parameters.. - the model need to have a method injected a the end of its iterative process..
pymake (pmk) command-line reference.
init = command$;
command = 'pmk' [command_name] [expedesign_id]* [expe_id]* [pmk_options];
command_name = 'run' | 'runpara' | 'path' | 'cmd' | 'update' | 'show' | 'hist' | '' ;
expe_id = int; # int identifier of an expe from 0 to; size(exp) -1.
expedesign_id = [exp id/name]; # string identifier to an exp
pmk_options = [pymake special options + project options];
If 'expe_name' is empty and -x
is given, pymake assumes run
command. If no design spec is given, then the parameters are empty unless the script defines a _default_expe
expe settings. All settings undefined in a design but defined in the _default_expe
will take this value. Further, _default_expe
can point to an existing spec in spec/
; to do so use the following setting inside _spec='my_expe_name'
.
Remark: -l and -s (--simulate) options don't execute, they just show things up.
Pick among all (design of) experiments in {spec}. To list them pmk -l spec
.
Here are all the special options that own pymake, such as --refdir, --format, --script, -w, -l, -h etc. Additionally, all the options for the current project should be added in the grammarg.py
file.