Demo project for the prediction service API in jBPM.
First we will go through the steps needed to set up the demo, and then we will look at some implementation details of how the prediction API works. This will allow you to learn how to create your own machine learning (ML) based prediction services and how to integrate them with jBPM.
Download and install jBPM from here.
This repository contains two example prediction service implementations as Maven modules and a REST client that populates the project with tasks so that the predictive model can be trained. Start by downloading or cloning the repository:
$ git clone git@github.com:ruivieira/jbpm-recommendation-demo.git
For this demo, two random forest-based services will be used: one using the SMILE library and another using a Predictive Model Markup Language (PMML) model.
The services, located respectively in services/jbpm-recommendation-smile-random-forest and services/jbpm-recommendation-pmml-random-forest, can be built with (using SMILE as an example):
$ cd services/jbpm-recommendation-smile-random-forest
$ mvn clean install -T1C -DskipTests -Dgwt.compiler.skip=true \
-Dfindbugs.skip=true -Drevapi.skip=true -Denforcer.skip=true \
-Dcheckstyle.skip=true
The resulting JAR files can then be included in the Workbench's kie-server.war, located in the standalone/deployments directory of your jBPM server installation. To do this, create a WEB-INF/lib directory, copy the compiled JARs into it and run
$ zip -r kie-server.war WEB-INF
The PMML-based service expects to find the PMML model in META-INF, so after copying the PMML file in jbpm-recommendation-pmml-random-forest/src/main/resources/models/random_forest.pmml into META-INF, it should also be included in the WAR by using
$ zip -r kie-server.war META-INF
jBPM will search for a prediction service with an identifier specified by a Java property named org.jbpm.task.prediction.service. Since in our demo the random forest service has the identifier SMILERandomForest, we can set this value before starting the Workbench, for instance as an environment variable:
$ export JAVA_OPTS="-Dorg.jbpm.task.prediction.service=SMILERandomForest"
For the purpose of this documentation we will illustrate the steps using the SMILE-based service. The PMML-based service can be used by setting the above environment variable as
$ export JAVA_OPTS="-Dorg.jbpm.task.prediction.service=PMMLRandomForest"
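The lookup by identifier can be pictured with the following minimal sketch. It is not jBPM's actual code: the class ServiceSelectionSketch, the method select and the fallback argument are hypothetical names used for illustration, and only the property name comes from the text above.

```java
import java.util.List;

// Hypothetical helper, not part of jBPM: matches the identifier found in the
// system property against a list of identifiers of available services.
final class ServiceSelectionSketch {
    static final String PROPERTY = "org.jbpm.task.prediction.service";

    // Returns the requested identifier if it is available, otherwise the fallback.
    static String select(List<String> availableIdentifiers, String fallback) {
        String requested = System.getProperty(PROPERTY);
        if (requested != null && availableIdentifiers.contains(requested)) {
            return requested;
        }
        return fallback;
    }
}
```

With the property set to PMMLRandomForest and both demo identifiers in the list, select would return PMMLRandomForest; with the property unset it would return the fallback.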
Start the Workbench (WB) by running
./bin/standalone.sh
Once the WB has completed the startup, you can go to http://localhost:8080/business-central/ and log in using the default admin credentials wbadmin/wbadmin. After choosing the default workspace (or creating your own), select "Import project" and use the project git URL:
https://github.com/ruivieira/jbpm-recommendation-demo-project.git
The project consists of a single Human Task, which can be inspected using the WB. The task is generic and simple enough to demonstrate how jBPM's prediction API works.
For the purposes of the demonstration, this task will be used to model a simple purchasing task where the purchase of a laptop of a certain brand is requested and must, eventually, be manually approved. The task's inputs are:
- item - a String with the brand's name
- price - a Float representing the laptop's price
- ActorId - a String representing the user requesting the purchase
The task provides as outputs:
- approved - a Boolean specifying whether the purchase was approved or not
This repository contains a REST client (under client) which allows Human Tasks to be added in batch, in order to have enough data points to train the model so that we can get meaningful predictions.
NOTE: Before running the REST client, make sure that the Workbench is running and the demo project is deployed and also running.
The class org.jbpm.recommendation.demo.RESTClient performs this task and can be executed from the client directory with:
$ mvn exec:java -Dexec.mainClass="org.jbpm.recommendation.demo.RESTClient"
The client will then simulate the creation and completion of human tasks, during which the model will be trained.
The tasks' completion will adhere to the following logic:
- The purchase of a laptop of brand Lenovo requested by user John or Mary will be approved if the price is around $1500
- The purchase of a laptop of brand Apple requested by user John or Mary will be approved if the price is around $2500
- The purchase of a laptop of brand Lenovo requested by user John or Mary will be rejected if the price is around $2500
The prices for Lenovo and Apple laptops are drawn from Normal distributions with respective means of 1500 and 2500 (pictured below). Although the prediction service is not aware of the deterministic rules we've used to set the task outcome, it will train the model based on the data it receives.
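The simulation logic described above can be sketched roughly as follows. This is not the RESTClient code: the class and method names are hypothetical, and both the standard deviation and the tolerance used to decide what counts as "around" a price are assumed values, since the text only gives the means.

```java
import java.util.Random;

// Hypothetical sketch of the simulated approval rules, not the demo's RESTClient.
final class PurchaseSimulationSketch {
    // Assumed values: the text only states the means (1500 and 2500).
    static final double ASSUMED_STD_DEV = 40.0;
    static final double ASSUMED_TOLERANCE = 250.0;

    // Draws a price from a Normal distribution with the given mean.
    static double samplePrice(Random rng, double mean) {
        return mean + ASSUMED_STD_DEV * rng.nextGaussian();
    }

    // Lenovo is approved around 1500, Apple around 2500; anything else is rejected.
    static boolean approved(String item, double price) {
        double expected;
        if ("Lenovo".equals(item)) {
            expected = 1500.0;
        } else if ("Apple".equals(item)) {
            expected = 2500.0;
        } else {
            return false; // unknown brand: rejected in this sketch
        }
        return Math.abs(price - expected) <= ASSUMED_TOLERANCE;
    }
}
```

Under these assumptions a Lenovo priced near 2500 falls outside the tolerance around 1500 and is rejected, which mirrors the third rule.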
In the following sections we will explain the internal working of a prediction service, how to test this project in the Workbench and how to create your own prediction service.
jBPM offers an API which allows predictive models to be trained with Human Task (HT) data, and allows HTs to incorporate the model's predictions as outputs or even to be completed by them.
This is achieved by connecting the HT handling to a prediction service. A prediction service is simply any third-party class which implements the org.kie.internal.task.api.prediction.PredictionService interface.
This interface consists of three methods:
- getIdentifier() - this method simply returns a unique (String) identifier for your prediction service
- predict(Task task, Map<String, Object> inputData) - this method takes the task information and the task's inputs, as a map, from which we will derive the model's inputs. The method returns a PredictionOutcome instance, which we will look at in closer detail later on
- train(Task task, Map<String, Object> inputData, Map<String, Object> outputData) - this method, similarly to predict, takes the task information and the task's inputs, but now we also need to provide the task's outputs, as a map, for training
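To make the shape of these three methods concrete, here is a self-contained sketch. It does not use the real jBPM types: PredictionServiceSketch and OutcomeSketch are simplified stand-ins (the task argument is a plain Object), and the implementation uses a per-item approval frequency rather than a random forest, purely to keep the example short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the jBPM interface; in jBPM the first argument is a
// Task and predict returns a PredictionOutcome.
interface PredictionServiceSketch {
    String getIdentifier();
    OutcomeSketch predict(Object task, Map<String, Object> inputData);
    void train(Object task, Map<String, Object> inputData, Map<String, Object> outputData);
}

// Stand-in for the prediction result: output values plus a confidence.
final class OutcomeSketch {
    final Map<String, Object> outcome;
    final double confidence;

    OutcomeSketch(Map<String, Object> outcome, double confidence) {
        this.outcome = outcome;
        this.confidence = confidence;
    }
}

// Swapped-in technique: approval frequency per item, not a random forest.
final class FrequencyServiceSketch implements PredictionServiceSketch {
    // item -> {number of approvals, number of observations}
    private final Map<String, int[]> counts = new HashMap<>();

    public String getIdentifier() {
        return "FrequencySketch"; // hypothetical identifier
    }

    public OutcomeSketch predict(Object task, Map<String, Object> inputData) {
        int[] c = counts.get(String.valueOf(inputData.get("item")));
        if (c == null) {
            // Nothing learned yet for this item: return an empty prediction.
            return new OutcomeSketch(Collections.<String, Object>emptyMap(), 0.0);
        }
        double approvedShare = (double) c[0] / c[1];
        boolean approved = approvedShare >= 0.5;
        Map<String, Object> outcome = new HashMap<>();
        outcome.put("approved", approved);
        return new OutcomeSketch(outcome, approved ? approvedShare : 1.0 - approvedShare);
    }

    public void train(Object task, Map<String, Object> inputData, Map<String, Object> outputData) {
        int[] c = counts.computeIfAbsent(String.valueOf(inputData.get("item")), k -> new int[2]);
        if (Boolean.TRUE.equals(outputData.get("approved"))) {
            c[0]++;
        }
        c[1]++;
    }
}
```

A real service would implement the jBPM interface directly and replace the frequency table with an actual model.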
By default, if no other prediction service is specified, jBPM will use a no-op service as defined in org.jbpm.services.task.prediction.NoOpPredictionService. This service returns an empty prediction and performs no training. jBPM processes will behave as if no prediction service is present.
It is important to note that the prediction service makes no assumptions about which features will be used for model training and prediction. The API exposes the task information, inputs and outputs, but it is up to the developer/data scientist to select which inputs and outputs will be used for training, or if pre-processing is necessary, for instance.
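As an illustration of this feature selection step, the sketch below turns the demo task's input map into a numeric feature vector. The class name, the method name and the particular encoding (Apple and Mary mapped to 1.0, anything else to 0.0) are assumptions made for this example, not what the demo services actually do.

```java
import java.util.Map;

// Hypothetical pre-processing step: picks three inputs and encodes them as numbers.
final class FeatureSelectionSketch {
    // Returns {item code, actor code, price}; the encoding is an assumption.
    static double[] toFeatures(Map<String, Object> inputData) {
        String item = String.valueOf(inputData.get("item"));
        String actor = String.valueOf(inputData.get("ActorId"));
        Object price = inputData.get("price");

        double itemCode = "Apple".equals(item) ? 1.0 : 0.0;
        double actorCode = "Mary".equals(actor) ? 1.0 : 0.0;
        // A missing or non-numeric price becomes NaN so the caller can detect it.
        double priceValue = price instanceof Number ? ((Number) price).doubleValue() : Double.NaN;

        return new double[] {itemCode, actorCode, priceValue};
    }
}
```

Keeping this step in one place means predict and train derive their model inputs in the same way.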
The PredictionOutcome is a class which encapsulates the model's prediction for a certain Map<String, Object> inputData.
This class will contain:
- A Map<String, Object> outcome containing the prediction outputs, where each entry represents an output attribute name and value. This map can be empty, which corresponds to the model not providing any prediction.
- A confidence value. The meaning of this field is left to the developer. As an example, it could represent a probability between 0.0 and 1.0. Its relevance is related to the confidenceThreshold below.
- A confidenceThreshold - this value represents the confidence cutoff after which an action can be taken by the HT item handler.
As an example, let's assume our confidence represents a prediction probability between 0.0 and 1.0. If the confidenceThreshold is 0.7, that would mean that for confidence > 0.7 the HT outputs would be set to the outcome and the task automatically closed. If the confidence < 0.7, then the HT would set the prediction outcome as suggested values, but the task would not be closed and would still need human interaction. If the outcome is empty, then the HT lifecycle would proceed as if no prediction was made.
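The three cases above can be summarised in a small decision function. This is a sketch of the logic as described, not the jBPM HT item handler: the class, enum and method names are hypothetical, and treating a confidence exactly equal to the threshold as a suggestion is an assumption, since the text only covers the strictly greater and strictly smaller cases.

```java
import java.util.Map;

// Hypothetical summary of the decision taken for a prediction.
final class ConfidenceDecisionSketch {
    enum Action { NO_PREDICTION, SUGGEST, AUTO_COMPLETE }

    static Action decide(Map<String, Object> outcome, double confidence, double confidenceThreshold) {
        if (outcome == null || outcome.isEmpty()) {
            return Action.NO_PREDICTION; // lifecycle proceeds as if no prediction was made
        }
        if (confidence > confidenceThreshold) {
            return Action.AUTO_COMPLETE; // outputs set to the outcome, task closed
        }
        return Action.SUGGEST; // outcome offered as suggested values, task stays open
    }
}
```

For a threshold of 0.7, a confidence of 0.91 would lead to AUTO_COMPLETE and a confidence of 0.5 to SUGGEST.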
The initial step is then, as defined above, the predict step.
In the scenario where the prediction's confidence is above the threshold, the task is automatically completed. If the confidence is not above the threshold, however, then when the task is eventually completed both the inputs and the outputs will be used to further train the model by calling the prediction service's train method.
As we've seen previously, creating and completing a batch of tasks simultaneously trains the predictive model. The service implementation is based on a random forest model, a popular ensemble learning method.
When running the RESTClient, 1200 tasks will be created and completed to allow for a reasonably sized training dataset. The prediction service initially has a confidence threshold of 1.0 and, after a sufficiently large number (arbitrarily chosen as 1200) of observations are used for training, the confidence threshold drops to 0.75. This is simply to demonstrate the two possible actions, i.e. prediction without completing and completing the task. This also allows us to avoid any cold start problems.
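A threshold that changes with the number of observations could be sketched as below. The class and method names are hypothetical and the demo service may implement this differently; the values 1.0, 0.75 and 1200 are the ones quoted above, and switching exactly when the count reaches 1200 is an assumption.

```java
// Hypothetical sketch of a confidence threshold that relaxes after enough training data.
final class ThresholdScheduleSketch {
    static final int MIN_OBSERVATIONS = 1200;

    private int observations = 0;

    // Called once for every completed task used for training.
    void recordTrainingExample() {
        observations++;
    }

    // 1.0 while the model is still "cold", 0.75 once enough data has been seen.
    double confidenceThreshold() {
        return observations >= MIN_OBSERVATIONS ? 0.75 : 1.0;
    }
}
```

If the confidence is a probability, a threshold of 1.0 can never be exceeded, so no task is closed automatically until the threshold drops.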
After the model is trained with the tasks from RESTClient, we will now create a new Human Task.
If we create a HT requesting the purchase of an Apple laptop from John with the price $2500, we should expect it to be approved.
In fact, when claiming the task, we can see that the prediction service recommends the purchase to be approved with a "confidence" of 91%.
If we now create a task for the request of a Lenovo laptop from Mary with the price $1437, we would expect it to be approved. We can see that this is the case: the form is filled in by the prediction service with an approved status and a "confidence" of 86.5%.
We can also see, as expected, what happens when John tries to order a Lenovo for $2700. The prediction service fills the form as "not approved" with a "confidence" of 71%.
In this service, the confidence threshold is set as 0.95 and as such the task was not closed automatically.
The second example implementation is the PMML-based prediction service. PMML is a predictive model interchange standard, which allows for a wide variety of models to be reused in different platforms and programming languages.
The service included in this demo consists of a pre-trained model (trained with a dataset similar to the one generated by the RESTClient) which is executed by a PMML engine. For this demo, the engine used was jpmml-evaluator, the de facto reference implementation of the PMML specification.
There are two main differences when comparing this service to the SMILE-based one:
- The model doesn't need the training phase. The model has already been trained and serialised into the PMML format. This means that we can start using predictions from jBPM straight away.
- The train API method is a no-op in this case. This means that whenever the service's train method is called, the data will not be used for training in this example (only the predict method is needed for a "read-only" model), as we can see from the figure below.
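The "read-only" pattern can be sketched as follows. This is not the demo's PMML service and does not call jpmml-evaluator: the class name is hypothetical, the task argument is omitted for brevity, and the pre-trained model is represented by an arbitrary function passed in by the caller.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-only service: predictions come from a model supplied at
// construction time, and training calls are ignored.
final class ReadOnlyServiceSketch {
    private final Function<Map<String, Object>, Map<String, Object>> pretrainedModel;

    ReadOnlyServiceSketch(Function<Map<String, Object>, Map<String, Object>> pretrainedModel) {
        this.pretrainedModel = pretrainedModel;
    }

    // Available immediately, since no training phase is required.
    Map<String, Object> predict(Map<String, Object> inputData) {
        return pretrainedModel.apply(inputData);
    }

    // Intentionally a no-op: the model was trained beforehand and is never updated here.
    void train(Map<String, Object> inputData, Map<String, Object> outputData) {
        // nothing to do
    }
}
```

In a real service the function would be replaced by a call into the PMML engine that evaluates the serialised model.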
A demonstration video is available here (description in subtitles).