rzanoli edited this page Feb 13, 2015 · 12 revisions
Introduction

AdArte (A Transformation-Driven Approach for Recognizing Textual Entailment) models entailment recognition as a classification problem. Each T-H pair (a text T and a hypothesis H) is first represented by the sequence of edit operations (deleting, replacing, and inserting pieces of text), called transformations, needed to transform T into H. These transformations are then used as features for a supervised learning classifier, which labels each pair as a positive or negative example.
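As a rough illustration of this representation, the sketch below shows one way a T-H pair could be reduced to a set of feature names derived from its transformations. The `Transformation` class, the `feature_name` format, and the example operations are all hypothetical and chosen for illustration; they are not taken from the AdArte code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transformation:
    """One edit operation on the way from T to H (hypothetical structure)."""
    op: str                        # "insert", "delete" or "replace"
    source: Optional[str] = None   # piece of T affected (None for an insert)
    target: Optional[str] = None   # piece of H affected (None for a delete)

    def feature_name(self) -> str:
        # Assumed naming scheme: operation plus the text pieces involved.
        return f"{self.op}:{self.source or '-'}->{self.target or '-'}"


def pair_features(transformations):
    """Return the sorted, de-duplicated feature names for one T-H pair."""
    return sorted({t.feature_name() for t in transformations})


# Made-up transformations for a single pair.
example = [
    Transformation("replace", "girl", "young_woman"),
    Transformation("delete", source="quickly"),
]
features = pair_features(example)
```

Each distinct feature name would then become one column in the input of the classifier, with the pair's label as the target.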

Transformations

The transformations are computed by applying tree edit distance (Tai, 1979) to the dependency trees of T and H. Background knowledge resources such as WordNet, VerbOcean, and Catvar are used to recognize cases where T and H use different textual expressions (e.g., girl vs. young_woman, spray vs. spraying) that still preserve entailment.
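To make the idea concrete, here is a minimal sketch of an ordered tree edit distance written as a plain memoized recursion over forests. It is not Tai's original algorithm and not necessarily the implementation AdArte uses. The unit costs, the toy trees, and the `EQUIVALENT` table (a stand-in for a lookup in resources such as WordNet) are assumptions made for illustration, including the choice to give knowledge-matched replacements a cost of zero.

```python
from functools import lru_cache

# Hypothetical stand-in for background knowledge: pairs of labels that are
# treated as interchangeable. A real system would query a lexical resource.
EQUIVALENT = frozenset({("girl", "young_woman"), ("spray", "spraying")})


def tree(label, *children):
    """Build an ordered tree as a hashable (label, children) tuple."""
    return (label, tuple(children))


def rename_cost(a, b):
    """Assumed cost scheme: 0 for equal or equivalent labels, 1 otherwise."""
    if a == b or (a, b) in EQUIVALENT or (b, a) in EQUIVALENT:
        return 0
    return 1


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not g:  # only deletions remain
        _, children = f[-1]
        return forest_distance(f[:-1] + children, g) + 1
    if not f:  # only insertions remain
        _, children = g[-1]
        return forest_distance(f, g[:-1] + children) + 1
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Map the two rightmost roots onto each other.
    replace = (forest_distance(v_children, w_children)
               + forest_distance(f[:-1], g[:-1])
               + rename_cost(v_label, w_label))
    return min(delete, insert, replace)


def tree_edit_distance(t, h):
    return forest_distance((t,), (h,))


# Toy dependency trees (made up): the only differences are covered by EQUIVALENT.
t = tree("spray", tree("girl"), tree("water"))
h = tree("spraying", tree("young_woman"), tree("water"))
distance = tree_edit_distance(t, h)  # 0 under the assumed equivalence table
```

This recursion only returns the cost. Recovering the actual sequence of transformations would additionally require recording which of the three options was chosen at each step.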

Classification

The transformations are used as features for classifying the T-H pairs. For this step we adopt Weka (Hall et al., 2009), a collection of machine learning algorithms that makes it easy to try different classifiers, such as Random Forest and Support Vector Machines (SVM).
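Since Weka reads data in its ARFF text format, one way to picture the classifier input is as a table with one binary attribute per transformation and a class attribute for the label. The sketch below writes such a file from Python. The `to_arff` helper, the feature names, and the `positive`/`negative` label strings are hypothetical, and it assumes feature names contain no spaces, commas, or quotes; the encoding AdArte itself produces may differ.

```python
def to_arff(relation, pairs):
    """Encode (feature_set, label) pairs as ARFF text with binary attributes."""
    vocabulary = sorted({name for feats, _ in pairs for name in feats})
    labels = sorted({label for _, label in pairs})

    lines = [f"@relation {relation}", ""]
    for name in vocabulary:
        lines.append(f"@attribute {name} {{0,1}}")
    lines.append(f"@attribute class {{{','.join(labels)}}}")
    lines += ["", "@data"]

    for feats, label in pairs:
        # 1 if the transformation occurs in this pair, 0 otherwise.
        row = ["1" if name in feats else "0" for name in vocabulary]
        lines.append(",".join(row + [label]))
    return "\n".join(lines) + "\n"


# Two made-up pairs with hypothetical transformation features.
pairs = [
    ({"rep_girl_young_woman"}, "positive"),
    ({"del_not"}, "negative"),
]
arff = to_arff("adarte_sketch", pairs)
```

The resulting text can be saved to a `.arff` file and loaded into Weka, where different classifiers can be tried on the same features.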

Evaluation

AdArte has been evaluated on two data sets. The first is the SICK data set (Marelli et al., 2014b), which was used at SemEval-2014 Task 1. The second, EXCITEMENT Entailment Graph, is a new data set developed within EXCITEMENT that contains email feedback sent by customers of a railway company. A first comparison of the implemented approach with existing methods shows state-of-the-art performance.

Future Work

The current implementation of AdArte has some limitations, which will be addressed in future work. In particular, on data sets such as RTE-3, where the number of labelled examples is limited (a few hundred pairs) and the number of distinct transformations produced can exceed the number of examples, the predictive power of the learned model can be considerably reduced.

Related Work

AdArte differs from other edit-distance approaches (e.g., EDITS), which compute the threshold values that best separate positive from negative examples, and from approaches that apply transformations derived from knowledge and linguistic resources such as WordNet and Wikipedia (e.g., BIUTEE).

References

Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: An update. SIGKDD Explor Newsl 11(1):10–18, DOI 10.1145/1656274.1656278

Marelli M, Menini S, Baroni M, Bentivogli L, Bernardi R, Zamparelli R (2014b) A SICK cure for the evaluation of compositional distributional semantic models. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland, May 26-31, 2014, pp 216–223

Tai K (1979) The tree to tree correction problem. J ACM 26(3):422–433
