Skip to content

Global graph optimizer user guide

adlerm edited this page Jan 6, 2014 · 17 revisions

Under construction

##Introduction

Textual entailment is inherently a transitive relation, that is, the rules 'x -> y' and 'y -> z' imply the rule 'x -> z'. The transitivity constraint can improve the quality of a given set of weighted entailment rules, by filtering rules which contradict the constraint.

Berant et al. (2012) formulated the problem of learning entailment rules as a graph optimization problem, where nodes are text fragments, and the weighted edges represent entailment rules that respect transitivity. The finding of the optimal set of edges respecting transitivity is NP-hard. In the next sections, we overview two efficiently approximate methods for learning the edges of entailment graphs, and describe the new EOP globalgraphoptimizer tool, which defines a general infrastructure for global optimization algorithms, provided with a set of implemented optimization algorithms.

Algorithms

The global optimization algorithms optimize a given weighted entailment graph, by taking into account the transitivity constraint of the entailment relation. In the following sections we describe two approximation algorithms for global optimization.

Tree node fix

The tree node fix algorithm is based on the assumption about the tree-like structure of the directed acyclic entailment graphs.

Conversion to Forest-reducible graph (FRG)

For a directed edge i  j in a directed acyclic graphs (DAG), we term the node i a child of nodej, and j a parent of i. A directed forest is a DAG where all nodes have no more than one parent. According to Berant et al., most of the entailments graphs can be converted to directed forests by applying the following procedure:

  • Convert the given entailment graph into a Strongly-Connected-Component (SCC) graph in the following way: every strongly connected component (a set of semantically-equivalent predicates, in our graphs) is contracted into a single node, and an edge is added from SCC S1 to SCC S2 if there is an edge in G from some node in S1 to some node in S2. The SCC graph is always a DAG, and since the given entailment graph is transitive the SCC graph is also transitive.
  • Apply transitive reduction on the SCC graph by removing all edges whose absence does not affect its transitive closure (i.e., edges which can be inferred by the transitivity: there should be an edge from node i to node j if there is a path from i to j). In DAGs, the result of transitive reduction is unique. The resulted graph is a forest-reducible graph (FRG) if all nodes in its reduced form have no more than one parent. As mentioned above, Berant et al. shows, empirically, that most of the entailment graphs can be converted by this procedure to FRG.

The Tree-Node-Fix procedure

Given a FRG, converted from an entailment graph, the tree-node-fix procedure reattaches, at each iteration, a single node v to the FRG in a way that improves the objective function, until the value of the objective function cannot be improved anymore by any re-attachment.

Re-attaching a node v is performed by removing v from the graph, and connecting it back with a better set of edges, while maintaining the constraint that it is an FRG. This is done by considering all possible edges from/to the other graph nodes and choosing the optimal subset, while the rest of the graph remains fixed. Berant et al. show that the re-attachment of each node is linear (thus, if we bound the number of iterations over all nodes, the overall complexity is quadratic), by analyzing the three possible re-attachment cases: (1) insertion of node v into a component; (2) insertion of node v as a child of some component; (3) insertion of node v as a new root in the graph. For more details see Section 4.1 in the paper.

The resulted graph of this procedure induces an approximation of the optimized global graph. Berant et al. demonstrate empirically that the method is, by orders of magnitude, faster than the state-of-the-art exact algorithm, but still produces an output that is almost as good as the optimal solution.

High to Low

The high-to-low procedure is quite simple, but effective (according to unreported test, on the dataset that was used for evaluation by Bernant et al., [2012]). Given a weighted entailment graph:

  • Construct a list of all edges, sorted by their weights from high to low.
  • Remove all edges from the graph
  • At each iteration
  • Select the edge with the highest score which are not yet presented in the graph
  • Construct the transitive closure for the selected edge (a set of edges, composed of edges from nodes that reach directly the source node of the selected edge, to nodes that are reached directly from the target node of the selected edge).
  • In case the overall weight of the selected edge and the edges of its transitive closure (defined by the sum of their weights) is positive, add each of the edges to the graph (if is not yet presented there).

The global graph optimizer tool

The globalgraphoptimizer project, of the Excitement Open Platform, is an infrastructure for defining global optimization algorithms, with a set of implemented methods.

The input of the tool is composed of:

  • A weighted directed acyclic graph.
  • A scoring model, which defines the weight of each possible edge
  • An edge learner, which optimizes the given graph, according to the given scoring model and the transitivity constraint, based on some global optimization algorithm. The output of the tool is an optimized graph.

The tool is provided with a set of implementations for graph representations, scoring models, and edge learners. In particular, the edge learner package contains a set of implementations for the methods described in the previous section, with some variations:

  • High to Low: a set of implementations of the high-to-low algorithm.
  • Tree node Fix: a set of implementations of the tree-node-fix algorithms.
  • Combined: a combination of the high-to-low and the tree-node-fix algorithms, where the graph is first constructed according the high-to-low method, and then optimized by applying the tree-node-fix algorithm.
Clone this wiki locally