Pamplemousse is an optimizing transpiler for converting PMML models to Lua scripts. It aims to generate Lua that faithfully represents the exact behaviour of a model, as specified by the PMML standard, while being efficient and readable.
Pamplemousse is for pragmatic people who want to solve real problems using Machine Learning, particularly in situations where latency is critical.
Pamplemousse lets you use advanced ML models, generated by a wide variety of machine learning software, within your project. To start using Pamplemousse in production today, you only need to integrate Lua with your solution. Lua is a mature, well-known and very efficient scripting language that is extremely easy to integrate. Pamplemousse takes the complexity of machine learning models out of the real-time core of your system, leaving only a simple script that calculates what you need and no more.
Pamplemousse can be used wherever Lua can. It is an external tool/library that can be invoked as part of whatever pipeline you use to deploy models; it does not have to be integrated into your real-time system. The tool itself is regularly tested on Linux (CentOS) and macOS, but the generated scripts can run on any platform that supports Lua. Furthermore, you can thoroughly test any script Pamplemousse has generated before deploying it.
Pamplemousse uses CMake and depends on tinyxml2 and Lua.
Make sure these dependencies are installed, check out Pamplemousse from GitHub, and you are ready to go!
$ git clone https://github.com/ThreatMetrix/Pamplemousse.git
$ mkdir Pamplemousse_build
$ cd Pamplemousse_build
$ cmake ../Pamplemousse
$ make
Start off by trying the included conversion application.
You can see a help message by running pamplemousse without parameters.
Alternatively, if your requirements become more complex, you may use Pamplemousse as a library and implement your own input/output logic for the model.
Pamplemousse supports the following PMML model types:
- Trees
- Neural Networks
- Naïve Bayes
- Regression
- Scorecard
- Support Vector Machines
- Model Composition, Ensembles, and Segmentation
Take the following PMML for example:
<?xml version="1.0" encoding="UTF-8"?>
<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-3_2 http://www.dmg.org/v3-2/pmml-3-2.xsd">
  <Header copyright="Copyright (c) 2012 DMG" description="RPart Decision Tree Model">
    <Extension name="user" value="DMG" extender="Rattle/PMML"/>
    <Application name="Rattle/PMML" version="1.2.29"/>
    <Timestamp>2012-09-27 12:46:08</Timestamp>
  </Header>
  <DataDictionary numberOfFields="5">
    <DataField name="class" optype="categorical" dataType="string">
      <Value value="Iris-setosa"/>
      <Value value="Iris-versicolor"/>
      <Value value="Iris-virginica"/>
    </DataField>
    <DataField name="sepal_length" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="4.3" rightMargin="7.9"/>
    </DataField>
    <DataField name="sepal_width" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="2" rightMargin="4.4"/>
    </DataField>
    <DataField name="petal_length" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="1" rightMargin="6.9"/>
    </DataField>
    <DataField name="petal_width" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="0.1" rightMargin="2.5"/>
    </DataField>
  </DataDictionary>
  <TreeModel modelName="RPart_Model" functionName="classification" algorithmName="rpart" splitCharacteristic="binarySplit" missingValueStrategy="defaultChild">
    <MiningSchema>
      <MiningField name="class" usageType="predicted"/>
      <MiningField name="sepal_length" usageType="supplementary"/>
      <MiningField name="sepal_width" usageType="supplementary"/>
      <MiningField name="petal_length" usageType="active"/>
      <MiningField name="petal_width" usageType="supplementary"/>
    </MiningSchema>
    <Output>
      <OutputField name="class" optype="categorical" dataType="string" feature="predictedValue"/>
      <OutputField name="Probability_Iris-setosa" optype="continuous" dataType="double" feature="probability" value="Iris-setosa"/>
      <OutputField name="Probability_Iris-versicolor" optype="continuous" dataType="double" feature="probability" value="Iris-versicolor"/>
      <OutputField name="Probability_Iris-virginica" optype="continuous" dataType="double" feature="probability" value="Iris-virginica"/>
    </Output>
    <Node id="1" score="Iris-virginica" recordCount="105" defaultChild="3">
      <True/>
      <ScoreDistribution value="Iris-setosa" recordCount="33" confidence="0.314285714285714"/>
      <ScoreDistribution value="Iris-versicolor" recordCount="35" confidence="0.333333333333333"/>
      <ScoreDistribution value="Iris-virginica" recordCount="37" confidence="0.352380952380952"/>
      <Node id="2" score="Iris-setosa" recordCount="33">
        <SimplePredicate field="petal_length" operator="lessThan" value="2.6"/>
        <ScoreDistribution value="Iris-setosa" recordCount="33" confidence="1"/>
        <ScoreDistribution value="Iris-versicolor" recordCount="0" confidence="0"/>
        <ScoreDistribution value="Iris-virginica" recordCount="0" confidence="0"/>
      </Node>
      <Node id="3" score="Iris-virginica" recordCount="72" defaultChild="7">
        <SimplePredicate field="petal_length" operator="greaterOrEqual" value="2.6"/>
        <ScoreDistribution value="Iris-setosa" recordCount="0" confidence="0"/>
        <ScoreDistribution value="Iris-versicolor" recordCount="35" confidence="0.486111111111111"/>
        <ScoreDistribution value="Iris-virginica" recordCount="37" confidence="0.513888888888889"/>
        <Node id="6" score="Iris-versicolor" recordCount="37">
          <SimplePredicate field="petal_length" operator="lessThan" value="4.85"/>
          <ScoreDistribution value="Iris-setosa" recordCount="0" confidence="0"/>
          <ScoreDistribution value="Iris-versicolor" recordCount="34" confidence="0.918918918918919"/>
          <ScoreDistribution value="Iris-virginica" recordCount="3" confidence="0.0810810810810811"/>
        </Node>
        <Node id="7" score="Iris-virginica" recordCount="35">
          <SimplePredicate field="petal_length" operator="greaterOrEqual" value="4.85"/>
          <ScoreDistribution value="Iris-setosa" recordCount="0" confidence="0"/>
          <ScoreDistribution value="Iris-versicolor" recordCount="1" confidence="0.0285714285714286"/>
          <ScoreDistribution value="Iris-virginica" recordCount="34" confidence="0.971428571428571"/>
        </Node>
      </Node>
    </Node>
  </TreeModel>
</PMML>
Running Pamplemousse without specifying any explicit outputs:
$ ./pamplemousse --convert --input_table --output_table example.xml
Will give something like the following:
function func ( input )
  local petal_length = input and input ["petal_length"]
  local class_1 = nil
  local probabilities_Iris_setosa = nil
  local probabilities_Iris_versicolor = nil
  local probabilities_Iris_virginica = nil
  if petal_length and petal_length < 2.6 then
    class_1 = "Iris-setosa"
    probabilities_Iris_setosa = 1
    probabilities_Iris_versicolor = 0
    probabilities_Iris_virginica = 0
  elseif (petal_length == nil or petal_length >= 2.6) then
    if petal_length and petal_length < 4.85 then
      class_1 = "Iris-versicolor"
      probabilities_Iris_setosa = 0
      probabilities_Iris_versicolor = 0.91891891891891897
      probabilities_Iris_virginica = 0.081081081081081086
    elseif (petal_length == nil or petal_length >= 4.85) then
      class_1 = "Iris-virginica"
      probabilities_Iris_setosa = 0
      probabilities_Iris_versicolor = 0.028571428571428571
      probabilities_Iris_virginica = 0.97142857142857142
    end
  end
  petal_length = {}
  petal_length ["Probability_Iris-virginica"] = probabilities_Iris_virginica or 0
  petal_length ["Probability_Iris-versicolor"] = probabilities_Iris_versicolor or 0
  petal_length ["Probability_Iris-setosa"] = probabilities_Iris_setosa or 0
  petal_length ["class"] = class_1
  return petal_length
end
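As a rough illustration of how the generated function might be called from your own Lua code, the sketch below reproduces the function from above and invokes it with a plain table. The variable names `result` and `fallback` and the sample value 5.1 are made up for this example. Note that output names containing a hyphen, such as "Probability_Iris-virginica", have to be read with bracket syntax.

```lua
-- Generated function from the example above, reproduced so this snippet runs on its own.
function func ( input )
  local petal_length = input and input ["petal_length"]
  local class_1 = nil
  local probabilities_Iris_setosa = nil
  local probabilities_Iris_versicolor = nil
  local probabilities_Iris_virginica = nil
  if petal_length and petal_length < 2.6 then
    class_1 = "Iris-setosa"
    probabilities_Iris_setosa = 1
    probabilities_Iris_versicolor = 0
    probabilities_Iris_virginica = 0
  elseif (petal_length == nil or petal_length >= 2.6) then
    if petal_length and petal_length < 4.85 then
      class_1 = "Iris-versicolor"
      probabilities_Iris_setosa = 0
      probabilities_Iris_versicolor = 0.91891891891891897
      probabilities_Iris_virginica = 0.081081081081081086
    elseif (petal_length == nil or petal_length >= 4.85) then
      class_1 = "Iris-virginica"
      probabilities_Iris_setosa = 0
      probabilities_Iris_versicolor = 0.028571428571428571
      probabilities_Iris_virginica = 0.97142857142857142
    end
  end
  petal_length = {}
  petal_length ["Probability_Iris-virginica"] = probabilities_Iris_virginica or 0
  petal_length ["Probability_Iris-versicolor"] = probabilities_Iris_versicolor or 0
  petal_length ["Probability_Iris-setosa"] = probabilities_Iris_setosa or 0
  petal_length ["class"] = class_1
  return petal_length
end

-- Hypothetical input: a flower with a long petal (5.1 is an arbitrary sample value).
local result = func({ petal_length = 5.1 })
print(result["class"])                       -- Iris-virginica
print(result["Probability_Iris-virginica"])  -- hyphenated key, so brackets are required

-- A missing petal_length follows the defaultChild path of the tree (nodes 3, then 7).
local fallback = func({})
print(fallback["class"])                     -- Iris-virginica
```

Only `petal_length` is read from the input table because it is the only active field in the model's MiningSchema; the supplementary fields are ignored.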
Alternatively, if you specify what you actually want, it will produce something much more concise and efficient:
$ ./pamplemousse --convert --input_table --output_table --prediction class example.xml
function func ( input )
  local petal_length = input and input ["petal_length"]
  local class_1 = nil
  if petal_length and petal_length < 2.6 then
    class_1 = "Iris-setosa"
  elseif (petal_length == nil or petal_length >= 2.6) then
    if petal_length and petal_length < 4.85 then
      class_1 = "Iris-versicolor"
    elseif (petal_length == nil or petal_length >= 4.85) then
      class_1 = "Iris-virginica"
    end
  end
  petal_length = {}
  petal_length ["class"] = class_1
  return petal_length
end
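Because the output is an ordinary Lua script, it can be checked before deployment with nothing more than a Lua interpreter. The following is a minimal sketch of such a check, not part of Pamplemousse itself: it loads the generated source from a string (in practice you would probably read it from a file) and compares predictions against expected classes. The `cases` table, the `compile` helper and the sample values are assumptions made for illustration; the expected classes were worked out by hand from the thresholds in the PMML tree above.

```lua
-- Generated script from the example above, held in a string for this sketch.
local source = [[
function func ( input )
  local petal_length = input and input ["petal_length"]
  local class_1 = nil
  if petal_length and petal_length < 2.6 then
    class_1 = "Iris-setosa"
  elseif (petal_length == nil or petal_length >= 2.6) then
    if petal_length and petal_length < 4.85 then
      class_1 = "Iris-versicolor"
    elseif (petal_length == nil or petal_length >= 4.85) then
      class_1 = "Iris-virginica"
    end
  end
  petal_length = {}
  petal_length ["class"] = class_1
  return petal_length
end
]]

-- Lua 5.1 and LuaJIT compile strings with loadstring; Lua 5.2 and later use load.
local compile = loadstring or load
local chunk = assert(compile(source))
chunk()  -- running the chunk defines the global function `func`

-- Hypothetical test cases, including both split thresholds and a missing value.
local cases = {
  { input = { petal_length = 1.4 },  expected = "Iris-setosa" },
  { input = { petal_length = 2.6 },  expected = "Iris-versicolor" },
  { input = { petal_length = 4.85 }, expected = "Iris-virginica" },
  { input = {},                      expected = "Iris-virginica" },
}

local failures = 0
for i, case in ipairs(cases) do
  local got = func(case.input)["class"]
  if got ~= case.expected then
    failures = failures + 1
    print(string.format("case %d: expected %s, got %s", i, case.expected, tostring(got)))
  end
end
print(string.format("%d of %d cases failed", failures, #cases))
```

Testing the exact threshold values (2.6 and 4.85) is worthwhile because the PMML predicates mix `lessThan` and `greaterOrEqual`, so an off-by-one comparison would show up there first.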