RuleWritable
The RuleWritable is probably the most important datatype in Thrax. It represents a single SCFG rule and has these fields:
- lhs, a Text (a Hadoop datatype for fast-comparison Strings) representing the left-hand-side nonterminal of the rule
- source, a Text representation of the source side of the rule
- target, a Text representation of the target side of the rule
- e2f and f2e, two AlignmentArrays giving the target-to-source and source-to-target alignments, respectively
- features, a MapWritable
Here are some notes:
The AlignmentArray is a two-dimensional array of Text with one row per terminal symbol on a given side of the rule. The first item of each row is that terminal symbol; the remaining items are the terminals on the other side that it has been aligned to, or "/UNALIGNED/" if the word is unaligned. For example, let's say we have a rule
[X] ||| foo [X] bar baz ||| a b [X] c |||
where foo is aligned to a and b, baz is aligned to c and bar is unaligned. Then the AlignmentArrays would look like this:
e2f: [ a | foo ] [ b | foo ] [ c | baz ]
f2e: [ foo | a | b ] [ bar | /UNALIGNED/ ] [ baz | c ]
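The layout above can be reproduced with a small sketch. This is not Thrax's own code: it uses plain String arrays as a stand-in for Hadoop's Text, and the class name, the build and flip helpers, and the index-pair encoding of the alignment links are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the AlignmentArray layout; String stands in for Hadoop's Text.
class AlignmentArraySketch {
    static final String UNALIGNED = "/UNALIGNED/";

    // rowSide:   terminals of the side that indexes the rows (nonterminals excluded).
    // otherSide: terminals of the opposite side.
    // links:     pairs {rowIndex, otherIndex} meaning rowSide[rowIndex] is aligned
    //            to otherSide[otherIndex] (a hypothetical encoding for this sketch).
    static String[][] build(String[] rowSide, String[] otherSide, int[][] links) {
        String[][] result = new String[rowSide.length][];
        for (int i = 0; i < rowSide.length; i++) {
            List<String> row = new ArrayList<>();
            row.add(rowSide[i]);                 // first item: the terminal itself
            for (int[] link : links) {
                if (link[0] == i) {
                    row.add(otherSide[link[1]]); // then everything it is aligned to
                }
            }
            if (row.size() == 1) {
                row.add(UNALIGNED);              // no links found for this terminal
            }
            result[i] = row.toArray(new String[0]);
        }
        return result;
    }

    // Swap each pair so the same links can be read from the other side.
    static int[][] flip(int[][] links) {
        int[][] flipped = new int[links.length][];
        for (int i = 0; i < links.length; i++) {
            flipped[i] = new int[] { links[i][1], links[i][0] };
        }
        return flipped;
    }

    public static void main(String[] args) {
        // Terminals of: [X] ||| foo [X] bar baz ||| a b [X] c |||
        String[] source = { "foo", "bar", "baz" };
        String[] target = { "a", "b", "c" };
        // foo-a, foo-b, baz-c; bar has no link.
        int[][] links = { { 0, 0 }, { 0, 1 }, { 2, 2 } };

        System.out.println("f2e: " + Arrays.deepToString(build(source, target, links)));
        System.out.println("e2f: " + Arrays.deepToString(build(target, source, flip(links))));
    }
}
```

Note that the nonterminal [X] gets no row on either side, since only terminal symbols are counted.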
features is a MapWritable. This is what you will want to modify to add new feature values to a rule. Once you calculate a feature value, you can simply put it into the map. Easy.
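As a rough illustration of that compute-then-put pattern, here is a sketch in which a java.util.Map stands in for the rule's MapWritable so that it runs without Hadoop on the classpath. The feature name SourceTerminalCount and the helper method are hypothetical, not part of Thrax. In the real class the keys and values would be Hadoop Writable objects (for instance a Text name and a DoubleWritable value); which Writable types Thrax expects is an assumption to check against its source.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch: a java.util.Map plays the role of the rule's features MapWritable.
class FeatureMapSketch {
    // Hypothetical feature: the number of terminal symbols on the source side.
    static double countSourceTerminals(String source) {
        int count = 0;
        for (String token : source.trim().split("\\s+")) {
            // Nonterminals are written in square brackets, e.g. [X]; skip them.
            if (!token.isEmpty() && !token.startsWith("[")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Double> features = new HashMap<>();

        // Step 1: calculate the feature value from the rule.
        double value = countSourceTerminals("foo [X] bar baz");

        // Step 2: put it into the map under the feature's name.
        features.put("SourceTerminalCount", value);

        System.out.println(features);
    }
}
```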