MaxEntClassificationEDA
Under development
MaxEntClassificationEDA is an Entailment Decision Algorithm (EDA) based on a prototype system called TIE (Textual Inference Engine), which is developed and maintained by Rui Wang and his colleagues in the Language Technology (LT) lab of DFKI GmbH.
Running MaxEntClassificationEDA requires no installation or build steps beyond setting up the EOP. Among the knowledge resources that users must install manually (explained in the manual), we highly recommend installing TreeTagger, since most of the components described below depend on it. Other knowledge resources required by each configuration are described below.
The following configurations can be set up for MaxEntClassificationEDA.
A list of pre-defined configuration files can be found at /config and in the eop-resources archive at eop-resources/configuration-file/. Most values in a configuration file can stay exactly as provided. The tables below detail the values you may wish (or need) to change.
Section | Property | Value | Requirement |
---|---|---|---|
PlatformConfiguration | activatedEDA | The common setting for selecting the EDA. The default value is eu.excitementproject.eop.core.MaxEntClassificationEDA. | N/A |
PlatformConfiguration | language | MaxEntClassificationEDA currently supports English (EN), German (DE), and Italian (IT). In principle, the EDA is language-independent. The default value is EN. | N/A |
PlatformConfiguration | activatedLAP | The linguistic analysis pipeline (LAP) needed for the EDA. The default value is eu.excitementproject.eop.lap.dkpro.MaltParserEN. The EN suffix is the language flag. | N/A |
eu.excitementproject.eop.core.MaxEntClassificationEDA | modelFile | The location where the trained model is stored. The default location is under ./src/test/resources/model/. By convention, a model name consists of the EDA name, the settings, and the language flag. For instance, MaxEntClassificationEDAModel_Base+TS_DE denotes a German model using the bag-of-words similarity, the bag-of-lemmas similarity, and the tree skeleton similarity. The default value is usually the same as the configuration file name. | For training, the model file should NOT exist; for testing, the path must point to an existing model file. |
eu.excitementproject.eop.core.MaxEntClassificationEDA | trainDir | The directory containing the training data. The data should be linguistically preprocessed and serialized into XMI files. The default value is ./target/EN/dev/, where EN is the language flag. | The directory should exist. |
eu.excitementproject.eop.core.MaxEntClassificationEDA | testDir | The directory containing the testing data. The data should be linguistically preprocessed and serialized into XMI files. The default value is ./target/EN/test/, where EN is the language flag. | The directory should exist. |
eu.excitementproject.eop.core.MaxEntClassificationEDA | classifier | The setting for the maximum entropy classifier. Two parameters are currently supported, the maximum number of iterations and the cutoff threshold, separated by a comma. The default value is 10000,1. | N/A |
eu.excitementproject.eop.core.MaxEntClassificationEDA | Components | The comma-separated list of components used in the EDA. Each component needs its own section in the configuration file; otherwise, a ConfigurationException is raised. | N/A |
BagOfWordsScoring | N/A | The bag-of-words scoring component. No further settings are supported. | The LAP should include a tokenizer, e.g., OpenNLPTaggerEN. |
BagOfLemmasScoring | N/A | The bag-of-lemmas scoring component. No further settings are supported. | The LAP should include a tokenizer and a lemmatizer, e.g., TreeTaggerEN. |
BagOfDepsScoring | N/A | The bag-of-dependencies (without POS tags) scoring component. No further settings are supported. | The LAP should include syntactic analysis, e.g., MaltParserEN. |
BagOfDepsPosScoring | N/A | The bag-of-dependencies (with POS tags) scoring component. No further settings are supported. | The LAP should include syntactic analysis, e.g., MaltParserEN. |
TreeSkeletonScoring | N/A | The tree skeleton scoring component. No further settings are supported. | The LAP should include syntactic analysis, e.g., MaltParserEN. |
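
The requirements in the table above can be checked before launching the EDA. The following is a minimal Python sketch, not part of the EOP. It assumes the configuration has already been loaded into a plain dictionary that maps section names to property dictionaries, and that both classifier parameters are integers. The helper names (parse_classifier, missing_component_sections, check_model_file) and the example dictionary are hypothetical.

```python
import os

EDA_SECTION = "eu.excitementproject.eop.core.MaxEntClassificationEDA"


def parse_classifier(value):
    """Split a classifier setting such as "10000,1" into two integers:
    (maximum number of iterations, cutoff threshold)."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2:
        raise ValueError("expected 'maxIterations,cutoff', got %r" % value)
    return int(parts[0]), int(parts[1])


def missing_component_sections(config):
    """Return the components listed under Components that have no section
    of their own (the case the table says ends in a ConfigurationException)."""
    listed = config[EDA_SECTION]["Components"].split(",")
    return [name.strip() for name in listed
            if name.strip() and name.strip() not in config]


def check_model_file(path, training):
    """Training expects the model file to be absent; testing expects it to exist."""
    exists = os.path.exists(path)
    return (not exists) if training else exists


# Hypothetical in-memory configuration, used only for illustration.
example_config = {
    EDA_SECTION: {
        "classifier": "10000,1",
        "Components": "BagOfWordsScoring,BagOfLemmasScoring,TreeSkeletonScoring",
    },
    "BagOfWordsScoring": {},
    "BagOfLemmasScoring": {},
}

print(parse_classifier(example_config[EDA_SECTION]["classifier"]))  # (10000, 1)
print(missing_component_sections(example_config))  # ['TreeSkeletonScoring']
```

In this example TreeSkeletonScoring is listed as a component but has no section, so it is reported as missing.
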
The English lexical resources WordNet and VerbOcean must be properly installed in order to run the corresponding configurations below.
Section | Property | Value | Requirement |
---|---|---|---|
BagOfLexesScoring | WordnetLexicalResource | Enables the use of WordNet. The value is the comma-separated list of relations to use. The default value is the relations related to entailment, i.e., HYPERNYM, SYNONYM, PART_HOLONYM. There is a separate section for further settings. | N/A |
WordnetLexicalResource | wordNetFilesPath | The path to the location of WordNet. The default value is /ontologies/EnglishWordNet-dict/. | The path needs to be updated. |
WordnetLexicalResource | isCollapsed | Whether to query WordNet with all the selected relations together or separately. The default value is true. | N/A |
WordnetLexicalResource | useFirstSenseOnlyLeft | Whether to query WordNet with only the first sense on the left-hand side of the relation. The default value is false. | N/A |
WordnetLexicalResource | useFirstSenseOnlyRight | Whether to query WordNet with only the first sense on the right-hand side of the relation. The default value is false. | N/A |
BagOfLexesScoring | VerbOceanLexicalResource | Enables the use of VerbOcean. The value is the comma-separated list of relations to use. The default value is the relations related to entailment, i.e., StrongerThan, CanResultIn, Similar. There is a separate section for further settings. | N/A |
VerbOceanLexicalResource | verbOceanFilePath | The path to the location of VerbOcean. The default value is /VerbOcean/verbocean.unrefined.2004-05-20.txt. | The path needs to be updated. |
VerbOceanLexicalResource | isCollapsed | Whether to query VerbOcean with all the selected relations together or separately. The default value is true. | N/A |
The German lexical resources GermaNet, DistSim, and DerivBase must be properly installed in order to run the corresponding configurations below. In particular, GermaNet is not delivered with the EOP resources package. Further settings of the lexical resources can be found here.
Section | Property | Value | Requirement |
---|---|---|---|
BagOfLexesScoring | withPOS | Whether the bag-of-lexes scoring component includes POS in its queries to the lexical resources. The default value is false. | N/A |
BagOfLexesScoring | GermanDistSim | Enables the use of the German distributional similarity resource. Further settings can be found here. | N/A |
BagOfLexesScoring | GermaNetWrapper | Enables the use of GermaNet. The value is the comma-separated list of relations to use. The default value is the relations related to entailment, i.e., Causes, Entails, Has_Hypernym, Has_Synonym. Further settings can be found here. | GermaNet should be properly installed and the path should be correctly specified. |
BagOfLexesScoring | DerivBaseResource | Enables the use of the German derivational resource. Further settings can be found here. | It is only triggered when withPOS is turned on. |
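
The dependency between withPOS and DerivBaseResource in the last row can be sketched as follows. This is a hypothetical Python illustration, not EOP code. It assumes the BagOfLexesScoring section is available as a dictionary and that a resource counts as requested when its property is present. The helper name active_german_resources and the example values are invented for the illustration.

```python
GERMAN_RESOURCES = ("GermanDistSim", "GermaNetWrapper", "DerivBaseResource")


def active_german_resources(section):
    """List the German lexical resources that would actually be used.

    Assumption: a resource is requested when its property is present in the
    section. DerivBaseResource is skipped unless withPOS is "true", following
    the requirement stated in the table.
    """
    with_pos = str(section.get("withPOS", "false")).strip().lower() == "true"
    active = []
    for name in GERMAN_RESOURCES:
        if name not in section:
            continue
        if name == "DerivBaseResource" and not with_pos:
            continue
        active.append(name)
    return active


# Hypothetical section contents; withPOS is left at its default (false).
section = {
    "withPOS": "false",
    "GermanDistSim": "",
    "GermaNetWrapper": "Causes,Entails,Has_Hypernym,Has_Synonym",
    "DerivBaseResource": "",
}
print(active_german_resources(section))  # ['GermanDistSim', 'GermaNetWrapper']
```
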