MaxEntClassificationEDA

Under development

MaxEntClassificationEDA is an Entailment Decision Algorithm (EDA) based on a prototype system called TIE (Textual Inference Engine), which is developed and maintained by Rui Wang and his colleagues in the Language Technology (LT) lab of DFKI GmbH.

Note that running MaxEntClassificationEDA requires no installation or build steps beyond setting up the EOP. Among the knowledge resources that users must install manually (explained here in the manual), we highly recommend installing TreeTagger, because it is needed to use most of the components described below. Other knowledge resources required by each configuration are described below.

Here are the configurations you can set up for MaxEntClassificationEDA.

Table of Contents: Configuration File; Common settings; Specific settings for English; Specific settings for German

Configuration File

Pre-defined configuration files can be found at /config and in the eop-resources archive at eop-resources/configuration-file/. Most values in the configuration file can stay exactly as provided. The details below cover the values you may wish (or need) to change.

Common settings

Section	Property	Value	Requirement
`PlatformConfiguration`	`activatedEDA`	The common setting that selects the EDA. The default value here is `eu.excitementproject.eop.core.MaxEntClassificationEDA`.	N/A
`PlatformConfiguration`	`language`	For the moment, `MaxEntClassificationEDA` supports English (`EN`), German (`DE`), and Italian (`IT`). In principle, the EDA is language-independent. The default value is `EN`.	N/A
`PlatformConfiguration`	`activatedLAP`	The linguistic analysis pipeline (LAP) needed for the EDA. The default value is `eu.excitementproject.eop.lap.dkpro.MaltParserEN`. Note that `EN` is the language flag.	N/A
`eu.excitementproject.eop.core.MaxEntClassificationEDA`	`modelFile`	The location where the trained model is stored. The default location is under `./src/test/resources/model/`. By convention, a model name consists of the EDA name, the settings, and the language flag. For instance, `MaxEntClassificationEDAModel_Base+TS_DE` denotes a German model using the bag-of-words similarity, the bag-of-lemmas similarity, and the tree skeleton similarity. The default value is usually the same as the configuration file name.	For training, the model file must NOT exist; for testing, the path must point to the trained model file.
`eu.excitementproject.eop.core.MaxEntClassificationEDA`	`trainDir`	The directory containing the training data. The data should be (linguistically) preprocessed and serialized into `xmi` files. The default value is `./target/EN/dev/`. Note that `EN` is the language flag.	The directory should exist.
`eu.excitementproject.eop.core.MaxEntClassificationEDA`	`testDir`	The directory containing the testing data. The data should be (linguistically) preprocessed and serialized into `xmi` files. The default value is `./target/EN/test/`. Note that `EN` is the language flag.	The directory should exist.
`eu.excitementproject.eop.core.MaxEntClassificationEDA`	`classifier`	The settings for the maximum entropy classifier. Two parameters are currently supported, the maximum number of iterations and the cutoff threshold, separated by a comma. The default value is `10000,1`.	N/A
`eu.excitementproject.eop.core.MaxEntClassificationEDA`	`Components`	The comma-separated list of components used in the EDA. Each component needs its own section in the configuration file; otherwise, a `ConfigurationException` occurs.	N/A
`BagOfWordsScoring`	N/A	The bag-of-words scoring component. No further settings are supported.	The LAP should include a tokenizer, e.g., `OpenNLPTaggerEN`.
`BagOfLemmasScoring`	N/A	The bag-of-lemmas scoring component. No further settings are supported.	The LAP should include a tokenizer and a lemmatizer, e.g., `TreeTaggerEN`.
`BagOfDepsScoring`	N/A	The bag-of-dependencies (without POS tags) scoring component. No further settings are supported.	The LAP should include syntactic analysis, e.g., `MaltParserEN`.
`BagOfDepsPosScoring`	N/A	The bag-of-dependencies (with POS tags) scoring component. No further settings are supported.	The LAP should include syntactic analysis, e.g., `MaltParserEN`.
`TreeSkeletonScoring`	N/A	The tree skeleton scoring component. No further settings are supported.	The LAP should include syntactic analysis, e.g., `MaltParserEN`.
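
Several rules in the table above can be checked before the EDA is launched: every entry in `Components` needs its own section, `classifier` holds two comma-separated values, and `modelFile` must be absent for training but present for testing. The following is a minimal Python sketch of such a check and is not part of the EOP. It assumes that the configuration file is XML built from `<section name="...">` elements that contain `<property name="...">` elements; verify this against the files in /config. It also assumes that both `classifier` values are integers, as in the default. All helper names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

EDA_SECTION = "eu.excitementproject.eop.core.MaxEntClassificationEDA"

# ASSUMPTION: sections and properties are written as
# <section name="..."> and <property name="...">value</property>.
SAMPLE_CONFIG = """
<configuration>
  <section name="PlatformConfiguration">
    <property name="activatedEDA">eu.excitementproject.eop.core.MaxEntClassificationEDA</property>
    <property name="language">EN</property>
    <property name="activatedLAP">eu.excitementproject.eop.lap.dkpro.MaltParserEN</property>
  </section>
  <section name="eu.excitementproject.eop.core.MaxEntClassificationEDA">
    <property name="classifier">10000,1</property>
    <property name="Components">BagOfWordsScoring,BagOfLemmasScoring,TreeSkeletonScoring</property>
  </section>
  <section name="BagOfWordsScoring"/>
  <section name="BagOfLemmasScoring"/>
</configuration>
"""


def load_sections(xml_text):
    """Return {section name: {property name: value}} for the assumed layout."""
    root = ET.fromstring(xml_text)
    sections = {}
    for section in root.iter("section"):
        sections[section.get("name")] = {
            prop.get("name"): (prop.text or "").strip()
            for prop in section.iter("property")
        }
    return sections


def split_list(value):
    """Split a comma-separated value and drop surrounding whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]


def missing_component_sections(sections):
    """List the components named in `Components` that have no section."""
    components = split_list(sections[EDA_SECTION].get("Components", ""))
    return [name for name in components if name not in sections]


def parse_classifier(value):
    """Return (maximum iterations, cutoff threshold), both read as integers."""
    max_iterations, cutoff = split_list(value)
    return int(max_iterations), int(cutoff)


def model_file_problem(model_file, training):
    """Return a message if the modelFile rule is violated, else None."""
    exists = os.path.exists(model_file)
    if training and exists:
        return "model file already exists; remove it before training"
    if not training and not exists:
        return "model file not found; fix modelFile before testing"
    return None


sections = load_sections(SAMPLE_CONFIG)
print(missing_component_sections(sections))                   # ['TreeSkeletonScoring']
print(parse_classifier(sections[EDA_SECTION]["classifier"]))  # (10000, 1)
```

In the sample, `TreeSkeletonScoring` is listed in `Components` but has no section of its own, which is the situation the table describes as leading to a `ConfigurationException`.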

Specific settings for English

Note that the English lexical resources, WordNet and VerbOcean, must be properly installed to run the corresponding configurations below.

Section	Property	Value	Requirement
`BagOfLexesScoring`	`WordnetLexicalResource`	Enables the use of WordNet. The value lists the relations to use, separated by commas. The default value is the set of entailment-related relations, i.e., `HYPERNYM, SYNONYM, PART_HOLONYM`. Further settings go in a separate section.	N/A
`WordnetLexicalResource`	`wordNetFilesPath`	The path to the location of WordNet. The default value is `/ontologies/EnglishWordNet-dict/`.	The path needs to be updated.
`WordnetLexicalResource`	`isCollapsed`	Whether to query WordNet with all the selected relations together or separately. The default value is `true`.	N/A
`WordnetLexicalResource`	`useFirstSenseOnlyLeft`	Whether to query WordNet with only the first sense on the left-hand side of the relation. The default value is `false`.	N/A
`WordnetLexicalResource`	`useFirstSenseOnlyRight`	Whether to query WordNet with only the first sense on the right-hand side of the relation. The default value is `false`.	N/A
`BagOfLexesScoring`	`VerbOceanLexicalResource`	Enables the use of VerbOcean. The value lists the relations to use, separated by commas. The default value is the set of entailment-related relations, i.e., `StrongerThan, CanResultIn, Similar`. Further settings go in a separate section.	N/A
`VerbOceanLexicalResource`	`verbOceanFilePath`	The path to the location of VerbOcean. The default value is `/VerbOcean/verbocean.unrefined.2004-05-20.txt`.	The path needs to be updated.
`VerbOceanLexicalResource`	`isCollapsed`	Whether to query VerbOcean with all the selected relations together or separately. The default value is `true`.	N/A
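
Both resource paths in the table above need to be updated for the local installation, so it can help to check them before a run. The following Python sketch is an illustration only and is not part of the EOP. It assumes the settings have already been read into a dictionary of sections, that `wordNetFilesPath` points to a directory, and that `verbOceanFilePath` points to a single file. The function names are hypothetical.

```python
from pathlib import Path


def split_relations(value):
    """Split a comma-separated relation list and drop surrounding spaces."""
    return [relation.strip() for relation in value.split(",") if relation.strip()]


def unresolved_paths(settings):
    """Return (section, property) pairs whose configured path is not on disk."""
    problems = []
    wordnet = settings.get("WordnetLexicalResource", {}).get("wordNetFilesPath")
    # ASSUMPTION: the WordNet location is a directory.
    if wordnet is not None and not Path(wordnet).is_dir():
        problems.append(("WordnetLexicalResource", "wordNetFilesPath"))
    verbocean = settings.get("VerbOceanLexicalResource", {}).get("verbOceanFilePath")
    # ASSUMPTION: the VerbOcean location is a single text file.
    if verbocean is not None and not Path(verbocean).is_file():
        problems.append(("VerbOceanLexicalResource", "verbOceanFilePath"))
    return problems


# Default values from the table, held in a plain dictionary for illustration.
settings = {
    "BagOfLexesScoring": {
        "WordnetLexicalResource": "HYPERNYM, SYNONYM, PART_HOLONYM",
        "VerbOceanLexicalResource": "StrongerThan, CanResultIn, Similar",
    },
    "WordnetLexicalResource": {
        "wordNetFilesPath": "/ontologies/EnglishWordNet-dict/",
    },
    "VerbOceanLexicalResource": {
        "verbOceanFilePath": "/VerbOcean/verbocean.unrefined.2004-05-20.txt",
    },
}

print(split_relations(settings["BagOfLexesScoring"]["WordnetLexicalResource"]))
for section, prop in unresolved_paths(settings):
    print(f"update {prop} in section {section}")
```

On a machine where the default paths do not exist, the loop reports both properties, which matches the requirement that the paths be updated.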

Specific settings for German

Note that the German lexical resources, GermaNet, DistSim, and DerivBase, must be properly installed to run the corresponding configurations below. In particular, GermaNet is not delivered with the EOP resources package. Further settings of the lexical resources can be found here.

Section	Property	Value	Requirement
`BagOfLexesScoring`	`withPOS`	Whether the bag-of-lexes scoring component will include POS in the queries to the lexical resources. The default value is `false`.	N/A
`BagOfLexesScoring`	`GermanDistSim`	Enables the use of the German distributional similarity resource. Further settings can be found here.	N/A
`BagOfLexesScoring`	`GermaNetWrapper`	Enables the use of GermaNet. The value lists the relations to use, separated by commas. The default value is the set of entailment-related relations, i.e., `Causes, Entails, Has_Hypernym, Has_Synonym`. Further settings can be found here.	GermaNet should be properly installed and the path should be correctly specified.
`BagOfLexesScoring`	`DerivBaseResource`	Enables the use of the German derivational resource. Further settings can be found here.	It is only triggered when `withPOS` is turned on.
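
The dependency between `DerivBaseResource` and `withPOS` in the table above is easy to overlook. The following Python sketch illustrates it and is not EOP code. It assumes that a resource is requested simply by listing its property in the `BagOfLexesScoring` section and that `withPOS` is written as `true` or `false`. The function name is hypothetical.

```python
GERMAN_RESOURCES = ("GermanDistSim", "GermaNetWrapper", "DerivBaseResource")


def effective_german_resources(bag_of_lexes):
    """Return the listed resources that would actually be queried.

    `bag_of_lexes` maps property names of the BagOfLexesScoring section
    to their string values.
    """
    # The documented default for withPOS is false.
    with_pos = bag_of_lexes.get("withPOS", "false").strip().lower() == "true"
    effective = []
    for name in GERMAN_RESOURCES:
        if name not in bag_of_lexes:
            continue
        # DerivBaseResource is only triggered when withPOS is turned on.
        if name == "DerivBaseResource" and not with_pos:
            continue
        effective.append(name)
    return effective


section = {
    "GermaNetWrapper": "Causes, Entails, Has_Hypernym, Has_Synonym",
    "DerivBaseResource": "",
}
print(effective_german_resources(section))                         # ['GermaNetWrapper']
print(effective_german_resources({**section, "withPOS": "true"}))  # ['GermaNetWrapper', 'DerivBaseResource']
```

With the default `withPOS` value of `false`, listing `DerivBaseResource` has no effect in this sketch, mirroring the requirement stated in the table.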
