Markus Cozowicz edited this page Nov 20, 2017 · 17 revisions

The native C++ codebase ingests JSON input when VW is run with the --json flag.

The following JSON format can be ingested into VW:

  • Top-level properties are considered features for the default namespace.
  • Top-level properties of type object or array are considered namespaces.
  • Features can be JSON strings, integers, floats, booleans, or arrays of integers and/or floats.
  • Top-level properties starting with _ are ignored unless they are one of the special properties (e.g. "_label", "_multi", "_text").
  • Labels are passed using the top-level "_label" property. This also works for multiline examples, but there the label must be placed inside one of the entries of "_multi".
  • If the "_label" value is a string, integer, or float, it is converted to a string and passed directly to the VW label parser.
  • If the "_label" value is an object, its first property must match one of the JSON properties of SimpleLabel or ContextualBanditLabel.
  • Properties named "_text" get special handling: their value is split into individual words (string splitting) instead of being escaped into a single feature (see the example below).
  • Multiline examples, as used by contextual bandits, are specified with the "_multi" property. Each entry of "_multi" is itself an example as described above and can optionally contain a label. The top-level properties form the optional shared example.
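
The mapping rules above can be sketched in a few lines of Python. This is a minimal illustration written against the examples on this page, not VW's actual parser, and the function names (example_to_vw, to_vw_lines) are hypothetical. Assumptions: a string feature is named key + value with spaces replaced by underscores, following the "Source":"TV" to SourceTV example; a false boolean emits no feature; arrays, nested objects, ContextualBanditLabel objects, and escaping of special characters such as ":" or "|" are not handled.

```python
import json


def _label_to_str(label):
    """Render a "_label" value as VW label text (sketch)."""
    # Strings and numbers are passed through as text (bool excluded: it is an int in Python).
    if isinstance(label, (str, int, float)) and not isinstance(label, bool):
        return str(label)
    # Only the SimpleLabel object shape {"Label": ..., "Weight": ...} is handled here.
    if isinstance(label, dict) and "Label" in label:
        parts = [str(label["Label"])]
        if "Weight" in label:
            parts.append(str(label["Weight"]))
        return " ".join(parts)
    raise ValueError(f"unsupported label in this sketch: {label!r}")


def _features(obj):
    """Render the scalar properties of one JSON object as VW features."""
    feats = []
    for key, value in obj.items():
        if key == "_text":
            # "_text" is split into words instead of becoming one escaped feature.
            feats.extend(str(value).split())
        elif key.startswith("_"):
            continue  # "_label", "_multi" and auxiliary "_..." properties are not features
        elif isinstance(value, bool):
            # Checked before int because True/False are ints in Python.
            if value:
                feats.append(key)  # assumption: false produces no feature
        elif isinstance(value, (int, float)):
            feats.append(f"{key}:{value}")
        elif isinstance(value, str):
            # Assumption: key + value, spaces replaced ("Source":"TV" -> SourceTV).
            feats.append(key + value.replace(" ", "_"))
        # Objects become namespaces in example_to_vw; arrays are skipped in this sketch.
    return feats


def example_to_vw(ex):
    """Convert one JSON example (without "_multi") into one VW text line."""
    parts = []
    default = _features(ex)
    if default:
        parts.append(" ".join(["|"] + default))
    for key, value in ex.items():
        # Top-level objects become named namespaces.
        if isinstance(value, dict) and not key.startswith("_"):
            parts.append(" ".join([f"|{key}"] + _features(value)))
    line = " ".join(parts)
    if "_label" in ex:
        line = f"{_label_to_str(ex['_label'])} {line}"
    return line


def to_vw_lines(ex):
    """Convert a JSON example to VW lines; "_multi" yields a shared line plus one per entry."""
    if "_multi" not in ex:
        return [example_to_vw(ex)]
    lines = []
    shared = example_to_vw(ex)  # top-level properties form the optional shared example
    if shared:
        lines.append(f"shared {shared}")
    lines.extend(example_to_vw(entry) for entry in ex["_multi"])
    return lines


if __name__ == "__main__":
    raw = ('{"UserAge":15,"_multi":[{"_text":"elections maine","Source":"TV"},'
           '{"Source":"www","topic":4,"_label":"2:3:.3"}]}')
    for line in to_vw_lines(json.loads(raw)):
        print(line)
```

Running it prints the three VW lines of the "_multi" example in the Examples section. Booleans are tested before numbers on purpose, since Python treats True and False as integers.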

The C# layer can ingest:

  • JSON strings
  • JSON.NET's JsonReader
  • C# objects that serialize to the JSON format above under JSON.NET's serialization rules, so JsonProperty annotations and similar attributes are respected. This is particularly useful when an object is scored first and later serialized to JSON for training: scoring works directly from the object, so no JSON deserialization is needed for that step.

Examples

Each JSON example below is followed by its equivalent VW string.
 
{
 "f1":25,"f2":true,
 "_aux":"some ignored info"
} 
 | f1:25 f2
 
{
 "ns1":{"location":"New York"},
 "f2":[1,0.2,3]
} 
 |ns1 New_York | :1 :.2 :3

{
 "ns1":{"location":"New York"},
 "ns2":{"f2":3.4},"_label":1
} 
1 |ns1 New_York |ns2 f2:3.4
 
{
 "ns1":{"location":"New York", "f2":3.4},
 "_label":{"Label":2,"Weight":0.3}
} 
2 0.3 |ns1 New_York f2:3.4
 
{
 "x":2,
 "_text":"elections US iowa"
} 
| x:2 elections US iowa
 
{
 "UserAge":15,
 "_multi":[
   {"_text":"elections maine", "Source":"TV"},
   {"Source":"www", "topic":4, "_label":"2:3:.3"}
 ]
} 
shared | UserAge:15
| elections maine SourceTV
2:3:.3 | Sourcewww topic:4
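
The examples above are pretty-printed over several lines for readability. The sketch below assumes that, as with VW's text format, --json reads one example per line, so each example (including a whole "_multi" example) is written as one compact JSON object per line. The file name train.json and the function name to_json_lines are hypothetical.

```python
import json

# Two examples from the list above, as Python dicts.
examples = [
    {"f1": 25, "f2": True, "_aux": "some ignored info"},
    {
        "UserAge": 15,
        "_multi": [
            {"_text": "elections maine", "Source": "TV"},
            {"Source": "www", "topic": 4, "_label": "2:3:.3"},
        ],
    },
]


def to_json_lines(examples):
    """Serialize each example as one compact JSON object per line."""
    # json.dumps escapes newlines inside strings, so every example stays on one line;
    # separators=(",", ":") only drops optional whitespace.
    return "".join(json.dumps(ex, separators=(",", ":")) + "\n" for ex in examples)


if __name__ == "__main__":
    with open("train.json", "w", encoding="utf-8") as f:
        f.write(to_json_lines(examples))
    # Then, for example: vw --json -d train.json
```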