Pamplemousse

What is Pamplemousse?

Pamplemousse is an optimizing transpiler for converting PMML models to Lua scripts. It aims to generate Lua that faithfully represents the exact behaviour of a model, as specified by the PMML standard, while being efficient and readable.

Who is it for?

Pamplemousse is for pragmatic people who want to solve real problems using Machine Learning, particularly in situations where latency is critical.

Why do I want it?

Pamplemousse allows you to use advanced ML models, generated by a wide variety of machine learning software, within your project. To start using Pamplemousse in production today, you only need to integrate Lua with your solution. Lua is a mature, well-known and very efficient scripting language that is extremely easy to integrate. Pamplemousse takes the complexity of machine learning models out of the real-time core of your system, leaving only a simple script that calculates what you need and no more.

Where can I use it?

Pamplemousse can be used wherever Lua can. It is an external tool/library that can be invoked as part of whatever pipeline you use to deploy models; it does not have to be integrated into your real-time system. The tool itself is regularly tested on Linux (CentOS) and macOS, but the generated scripts can run on any platform that supports Lua. Furthermore, you can thoroughly test any script Pamplemousse has generated before deploying it.
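As a rough illustration of running a generated script from a host written in Lua, the sketch below loads script text into an empty environment so that it cannot read or modify the host's globals. The script here is a hand-written stand-in, not real Pamplemousse output: the field names `x` and `is_large` are hypothetical, and only the global function name `func` follows the generated code shown in this README. It assumes Lua 5.2 or newer, where `load` accepts an environment table.

```lua
-- Hypothetical stand-in for a generated script; real output comes from pamplemousse.
local source = [[
function func ( input )
  local x = input and input ["x"]
  local output = {}
  output ["is_large"] = (x ~= nil and x >= 10)
  return output
end
]]

-- Load the text-only chunk with an empty table as its environment.
-- Assumes Lua 5.2+; on Lua 5.1 or LuaJIT, loadstring plus setfenv plays the same role.
local env = {}
local chunk = assert(load(source, "model", "t", env))
chunk()  -- running the chunk defines env.func, not a host global

local result = env.func({ x = 12 })
print(result.is_large)  -- prints "true"
```

An empty environment only works when the script needs no standard library functions, as is the case for this stand-in; a script that uses them would need those functions copied into `env` first.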

How do I build it?

Pamplemousse uses CMake and depends on tinyxml2 and Lua.

Make sure these dependencies are installed, check out Pamplemousse from here and you are ready to go!

$ git clone https://github.com/ThreatMetrix/Pamplemousse.git
$ mkdir Pamplemousse_build
$ cd Pamplemousse_build
$ cmake ../Pamplemousse
$ make

How do I use it?

Start off by trying the included conversion application.

You can see a help message by running pamplemousse without parameters.

Alternatively, if your requirements become more complex, you may use Pamplemousse as a library and implement your own input/output logic for the model.

How much of PMML does it support?

Can I see it in action?

Sure, take the following PMML for example:

<?xml version="1.0" encoding="UTF-8"?>
<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-3_2 http://www.dmg.org/v3-2/pmml-3-2.xsd">
 <Header copyright="Copyright (c) 2012 DMG" description="RPart Decision Tree Model">
  <Extension name="user" value="DMG" extender="Rattle/PMML"/>
  <Application name="Rattle/PMML" version="1.2.29"/>
  <Timestamp>2012-09-27 12:46:08</Timestamp>
 </Header>
 <DataDictionary numberOfFields="5">
  <DataField name="class" optype="categorical" dataType="string">
   <Value value="Iris-setosa"/>
   <Value value="Iris-versicolor"/>
   <Value value="Iris-virginica"/>
  </DataField>
  <DataField name="sepal_length" optype="continuous" dataType="double">
   <Interval closure="closedClosed" leftMargin="4.3" rightMargin="7.9"/>
  </DataField>
  <DataField name="sepal_width" optype="continuous" dataType="double">
   <Interval closure="closedClosed" leftMargin="2" rightMargin="4.4"/>
  </DataField>
  <DataField name="petal_length" optype="continuous" dataType="double">
   <Interval closure="closedClosed" leftMargin="1" rightMargin="6.9"/>
  </DataField>
  <DataField name="petal_width" optype="continuous" dataType="double">
   <Interval closure="closedClosed" leftMargin="0.1" rightMargin="2.5"/>
  </DataField>
 </DataDictionary>
 <TreeModel modelName="RPart_Model" functionName="classification" algorithmName="rpart" splitCharacteristic="binarySplit" missingValueStrategy="defaultChild">
  <MiningSchema>
   <MiningField name="class" usageType="predicted"/>
   <MiningField name="sepal_length" usageType="supplementary"/>
   <MiningField name="sepal_width" usageType="supplementary"/>
   <MiningField name="petal_length" usageType="active"/>
   <MiningField name="petal_width" usageType="supplementary"/>
  </MiningSchema>
  <Output>
   <OutputField name="class" optype="categorical" dataType="string" feature="predictedValue"/>
   <OutputField name="Probability_Iris-setosa" optype="continuous" dataType="double" feature="probability" value="Iris-setosa"/>
   <OutputField name="Probability_Iris-versicolor" optype="continuous" dataType="double" feature="probability" value="Iris-versicolor"/>
   <OutputField name="Probability_Iris-virginica" optype="continuous" dataType="double" feature="probability" value="Iris-virginica"/>
  </Output>
  <Node id="1" score="Iris-virginica" recordCount="105" defaultChild="3">
   <True/>
   <ScoreDistribution value="Iris-setosa" recordCount="33" confidence="0.314285714285714"/>
   <ScoreDistribution value="Iris-versicolor" recordCount="35" confidence="0.333333333333333"/>
   <ScoreDistribution value="Iris-virginica" recordCount="37" confidence="0.352380952380952"/>
   <Node id="2" score="Iris-setosa" recordCount="33">
    <SimplePredicate field="petal_length" operator="lessThan" value="2.6"/>
    <ScoreDistribution value="Iris-setosa" recordCount="33" confidence="1"/>
    <ScoreDistribution value="Iris-versicolor" recordCount="0" confidence="0"/>
    <ScoreDistribution value="Iris-virginica" recordCount="0" confidence="0"/>
   </Node>
   <Node id="3" score="Iris-virginica" recordCount="72" defaultChild="7">
    <SimplePredicate field="petal_length" operator="greaterOrEqual" value="2.6"/>
    <ScoreDistribution value="Iris-setosa" recordCount="0" confidence="0"/>
    <ScoreDistribution value="Iris-versicolor" recordCount="35" confidence="0.486111111111111"/>
    <ScoreDistribution value="Iris-virginica" recordCount="37" confidence="0.513888888888889"/>
    <Node id="6" score="Iris-versicolor" recordCount="37">
     <SimplePredicate field="petal_length" operator="lessThan" value="4.85"/>
     <ScoreDistribution value="Iris-setosa" recordCount="0" confidence="0"/>
     <ScoreDistribution value="Iris-versicolor" recordCount="34" confidence="0.918918918918919"/>
     <ScoreDistribution value="Iris-virginica" recordCount="3" confidence="0.0810810810810811"/>
    </Node>
    <Node id="7" score="Iris-virginica" recordCount="35">
     <SimplePredicate field="petal_length" operator="greaterOrEqual" value="4.85"/>
     <ScoreDistribution value="Iris-setosa" recordCount="0" confidence="0"/>
     <ScoreDistribution value="Iris-versicolor" recordCount="1" confidence="0.0285714285714286"/>
     <ScoreDistribution value="Iris-virginica" recordCount="34" confidence="0.971428571428571"/>
    </Node>
   </Node>
  </Node>
 </TreeModel>
</PMML>

Running Pamplemousse without specifying any explicit outputs:

$ ./pamplemousse --convert --input_table --output_table example.xml

will give something like the following:

function func ( input )
  local petal_length = input and input ["petal_length"]
  local class_1 = nil
  local probabilities_Iris_setosa = nil
  local probabilities_Iris_versicolor = nil
  local probabilities_Iris_virginica = nil
  if petal_length and petal_length < 2.6 then
    class_1 = "Iris-setosa"
    probabilities_Iris_setosa = 1
    probabilities_Iris_versicolor = 0
    probabilities_Iris_virginica = 0
  elseif (petal_length == nil or petal_length >= 2.6) then
    if petal_length and petal_length < 4.85 then
      class_1 = "Iris-versicolor"
      probabilities_Iris_setosa = 0
      probabilities_Iris_versicolor = 0.91891891891891897
      probabilities_Iris_virginica = 0.081081081081081086
    elseif (petal_length == nil or petal_length >= 4.85) then
      class_1 = "Iris-virginica"
      probabilities_Iris_setosa = 0
      probabilities_Iris_versicolor = 0.028571428571428571
      probabilities_Iris_virginica = 0.97142857142857142
    end
  end
  petal_length = {}
  petal_length ["Probability_Iris-virginica"] = probabilities_Iris_virginica or 0
  petal_length ["Probability_Iris-versicolor"] = probabilities_Iris_versicolor or 0
  petal_length ["Probability_Iris-setosa"] = probabilities_Iris_setosa or 0
  petal_length ["class"] = class_1
  return petal_length
end

Alternatively, by specifying what you actually want, it will produce something much more concise and efficient:

$ ./pamplemousse --convert --input_table --output_table --prediction class example.xml

function func ( input )
  local petal_length = input and input ["petal_length"]
  local class_1 = nil
  if petal_length and petal_length < 2.6 then
    class_1 = "Iris-setosa"
  elseif (petal_length == nil or petal_length >= 2.6) then
    if petal_length and petal_length < 4.85 then
      class_1 = "Iris-versicolor"
    elseif (petal_length == nil or petal_length >= 4.85) then
      class_1 = "Iris-virginica"
    end
  end
  petal_length = {}
  petal_length ["class"] = class_1
  return petal_length
end
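
Because the result is plain Lua, it can be exercised directly before deployment. The following is a minimal sketch of such a check, not part of Pamplemousse itself: it repeats the concise function from above and runs it against a few hand-picked inputs. The `cases` table and its expected values are assumptions read off the example tree, including the `defaultChild` path that missing values follow (node 1 to node 3 to node 7).

```lua
-- Concise function copied from the example output above.
function func ( input )
  local petal_length = input and input ["petal_length"]
  local class_1 = nil
  if petal_length and petal_length < 2.6 then
    class_1 = "Iris-setosa"
  elseif (petal_length == nil or petal_length >= 2.6) then
    if petal_length and petal_length < 4.85 then
      class_1 = "Iris-versicolor"
    elseif (petal_length == nil or petal_length >= 4.85) then
      class_1 = "Iris-virginica"
    end
  end
  petal_length = {}
  petal_length ["class"] = class_1
  return petal_length
end

-- Hand-picked inputs (hypothetical test data) with the class the tree should give.
local cases = {
  { input = { petal_length = 1.4 },  expected = "Iris-setosa" },
  { input = { petal_length = 2.6 },  expected = "Iris-versicolor" }, -- lower split boundary
  { input = { petal_length = 4.0 },  expected = "Iris-versicolor" },
  { input = { petal_length = 4.85 }, expected = "Iris-virginica" },  -- upper split boundary
  { input = {},                      expected = "Iris-virginica" },  -- missing field
  { input = nil,                     expected = "Iris-virginica" },  -- no input table at all
}

for _, case in ipairs(cases) do
  local got = func(case.input).class
  local shown = tostring(case.input and case.input.petal_length)
  assert(got == case.expected,
    string.format("petal_length=%s: expected %s, got %s", shown, case.expected, tostring(got)))
end
print("all cases passed")
```

The last two cases show that a missing `petal_length` does not raise an error: the generated code treats `nil` as a signal to follow the default child at each level, ending at Iris-virginica.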