Predict python


Reference

If you use the code from this repository, please cite the original paper:

@inproceedings{DBLP:conf/bpm/RizziSFGKM19,
  author    = {Williams Rizzi and
               Luca Simonetto and
               Chiara Di Francescomarino and
               Chiara Ghidini and
               T{\~{o}}nis Kasekamp and
               Fabrizio Maria Maggi},
  title     = {Nirdizati 2.0: New Features and Redesigned Backend},
  booktitle = {Proceedings of the Dissertation Award, Doctoral Consortium, and Demonstration
               Track at {BPM} 2019 co-located with 17th International Conference
               on Business Process Management, {BPM} 2019, Vienna, Austria, September
               1-6, 2019},
  pages     = {154--158},
  year      = {2019},
  url       = {http://ceur-ws.org/Vol-2420/paperDT8.pdf},
  timestamp = {Fri, 30 Aug 2019 13:15:06 +0200},
  biburl    = {https://dblp.org/rec/bib/conf/bpm/RizziSFGKM19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
} 

Django backend server for machine learning on event logs.

Running in a new environment

The Docker build is available at https://hub.docker.com/r/nirdizatiresearch/predict-python/. If you prefer to set up your environment on your own, you can refer to the Dockerfile.
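
For instance, a minimal sketch of both options (the local image tag used in the build command is an assumption, not an official name):

# pull the published image from Docker Hub
docker pull nirdizatiresearch/predict-python

# or build the image yourself from the provided Dockerfile
docker build -t predict-python .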

Docker Compose

On first run, to set up the database, you can run:

docker-compose run server python manage.py migrate

To run the project:

docker-compose up redis server scheduler worker

To access a generic remote Django server you can use SSH tunneling, as shown in the following sample:

ssh -L 8000:127.0.0.1:8000 <user>@<host>

Run an instance of the project

If you are familiar with docker-compose, the docker-compose file is available; otherwise, if you use PyCharm as your IDE, run the provided run configurations.

Finally, from the command line you can use the following sample commands to interact with our software.

Start the server with

python manage.py runserver

Run tests with one of the following

python manage.py test
./manage.py test

NB: always run a redis-server in the background if you want your server to accept any incoming POST requests!
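
For example, a local Redis instance can be started in the background like this (the Docker port mapping below is the Redis default and an assumption about your setup):

redis-server --daemonize yes

# or via Docker
docker run -d -p 6379:6379 redis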

Start by running migrations and adding sample data

python manage.py migrate
python manage.py loaddata <your_file.json>

Start jobs from the command line

curl --request POST \
  --header 'Content-Type: application/json' \
  --data-binary '{
    "type": "classification",
    "split_id": 1,
    "config": {
      "encodings": ["simpleIndex"],
      "clusterings": ["noCluster"],
      "methods": ["randomForest"],
      "label": {"type": "remaining_time"},
      "encoding": {"prefix_length": 3, "generation_type": "only", "padding": "zero_padding"}
    }
  }' \
http://localhost:8000/jobs/multiple
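
The same job can also be submitted programmatically, for example with Python's requests library; the following is a minimal sketch using the same payload and endpoint as the curl call above (the response handling assumes the server returns JSON describing the created jobs):

import requests

payload = {
    "type": "classification",
    "split_id": 1,
    "config": {
        "encodings": ["simpleIndex"],
        "clusterings": ["noCluster"],
        "methods": ["randomForest"],
        "label": {"type": "remaining_time"},
        "encoding": {"prefix_length": 3, "generation_type": "only", "padding": "zero_padding"},
    },
}

# submit the job to the running server
response = requests.post("http://localhost:8000/jobs/multiple", json=payload)
response.raise_for_status()
print(response.json())  # inspect the jobs created by the server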

Creating a single split. Options:

  • $SPLIT_TYPE has to be one of split_sequential, split_random, split_temporal, split_strict_temporal. Defaults to split_sequential.
  • test_size has to be between 0 and 1. Defaults to 0.2.
curl --request POST \
  --header 'Content-Type: application/json' \
  --data-binary '{
    "type": "single",
    "original_log": 1,
    "config": {
      "test_size": 0.2,
      "split_type": "$SPLIT_TYPE"
    }
  }' \
http://localhost:8000/splits/

Advanced configuration

Prediction methods accept configuration for the sklearn classification/regression methods. The job config must contain a dict with only the supported options for that method, stored under a key of the format "type.method". For the classification randomForest method this would be classification.randomForest. Advanced configuration is optional. Look at jobs/job_creator.py for the default values.

For example, the configuration for classification with KNN would look like:

curl --request POST \
  --header 'Content-Type: application/json' \
  --data-binary '{
    "type": "classification",
    "split_id": 1,
    "config": {
      "encodings": ["simpleIndex"],
      "clusterings": ["noCluster"],
      "methods": ["knn"],
      "classification.knn": {
        "n_neighbors": 5,
        "weights": "uniform"
      },
      "label": {"type": "remaining_time"},
      "encoding": {"prefix_length": 3, "generation_type": "up_to", "padding": "no_padding"}
    }
  }' \
http://localhost:8000/jobs/multiple

Labelling job

Log encoding and labelling can be tested before prediction. A labelling job supports the same values as classification and regression jobs, except for the method and clustering.

curl --request POST \
  --header 'Content-Type: application/json' \
  --data-binary '{
    "type": "labelling",
    "split_id": 5,
    "config": {
      "label": {"type": "remaining_time"},
      "encoding": {"prefix_length": 3, "generation_type": "up_to", "padding": "no_padding"}
    }
  }' \
http://localhost:8000/jobs/multiple

Documentation

This project allows documentation to be built automatically using Sphinx. All the documentation-related files are in the docs/ folder, structured as:

└── docs/
    ├── build/
    │   ├── doctrees/
    │   └── html/
    ├── source/
    │   ├── _static/
    │   ├── _templates/
    │   ├── api/
    │   ├── readme/
    │   ├── conf.py
    │   └── index.rst
    ├── generate_modules.sh
    └── Makefile

The built html files are placed in html/, whereas source/ contains all the source files. The _static/ folder contains the images used in the final html files, such as the logo: place eventual screenshots etc. here. The api/ folder contains the files used to automatically fetch docstrings from the project; you shouldn't edit them, as they are all replaced when re-building the documentation. The readme/ folder contains the .rst copies of the readmes; when updating the project's readme, please also update those accordingly. conf.py contains all the Sphinx settings, along with the theme used (sphinx-rtd-theme).

The index.rst file is the documentation entry point; change this file to adjust the main documentation structure as needed. After updating the docstrings in the project, please re-run the generate_modules.sh script, which simply uses the sphinx-apidoc command to re-create the api .rst files.

Finally, the Makefile is used when building the entire documentation; please run make clean followed by make html when you want updated docs.

To summarize, after changing docstrings or the readme.rst files, simply run:

sh generate_modules.sh
make clean
make html

Documentation is also hosted on readthedocs.com and built automatically after each commit on the master or development branch; make sure to have the api files updated in advance.

Note on CUDA enabled systems

As this project detects when a compatible GPU is present in the system and tries to use it, please set CUDA_VISIBLE_DEVICES=0 as an environment variable if you encounter problems.
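
For example (a sketch; whether you export the variable or set it inline depends on how you start the server):

# make only the first GPU visible for this shell session
export CUDA_VISIBLE_DEVICES=0
python manage.py runserver

# or set it inline for a single invocation
CUDA_VISIBLE_DEVICES=0 python manage.py runserver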

Contributors