Skip to content

Global graph optimizer user guide

adlerm edited this page Jan 7, 2014 · 17 revisions

Under construction

##Introduction

Textual entailment is inherently a transitive relation, that is, the rules 'x -> y' and 'y -> z' imply the rule 'x -> z'. The transitivity constraint can improve the quality of a given set of weighted entailment rules, by filtering rules which contradict the constraint.

Berant et al. (2012) formulated the problem of learning entailment rules as a graph optimization problem, where nodes are text fragments, and the weighted edges represent entailment rules that respect transitivity. The finding of the optimal set of edges respecting transitivity is NP-hard. In the next sections, we overview two efficiently approximate methods for learning the edges of entailment graphs, and describe the new EOP globalgraphoptimizer tool, which defines a general infrastructure for global optimization algorithms, provided with a set of implemented optimization algorithms.

Algorithms

The global optimization algorithms optimize a given weighted entailment graph, by taking into account the transitivity constraint of the entailment relation. In the following sections we describe two approximation algorithms for global optimization.

Tree Node Fix

The tree node fix algorithm is based on the assumption about the tree-like structure of the directed acyclic entailment graphs.

Conversion to Forest-reducible graph (FRG)

For a directed edge i  j in a directed acyclic graphs (DAG), we term the node i a child of nodej, and j a parent of i. A directed forest is a DAG where all nodes have no more than one parent. According to Berant et al., most of the entailments graphs can be converted to directed forests by applying the following procedure:

  • Convert the given entailment graph into a Strongly-Connected-Component (SCC) graph in the following way: every strongly connected component (a set of semantically-equivalent predicates, in our graphs) is contracted into a single node, and an edge is added from SCC S1 to SCC S2 if there is an edge in G from some node in S1 to some node in S2. The SCC graph is always a DAG, and since the given entailment graph is transitive the SCC graph is also transitive.
  • Apply transitive reduction on the SCC graph by removing all edges whose absence does not affect its transitive closure (i.e., edges which can be inferred by the transitivity: there should be an edge from node i to node j if there is a path from i to j). In DAGs, the result of transitive reduction is unique. The resulted graph is a forest-reducible graph (FRG) if all nodes in its reduced form have no more than one parent. As mentioned above, Berant et al. shows, empirically, that most of the entailment graphs can be converted by this procedure to FRG.

The Tree-Node-Fix procedure

Given a FRG, converted from an entailment graph, the tree-node-fix procedure reattaches, at each iteration, a single node v to the FRG in a way that improves the objective function, until the value of the objective function cannot be improved anymore by any re-attachment.

Re-attaching a node v is performed by removing v from the graph, and connecting it back with a better set of edges, while maintaining the constraint that it is an FRG. This is done by considering all possible edges from/to the other graph nodes and choosing the optimal subset, while the rest of the graph remains fixed. Berant et al. show that the re-attachment of each node is linear (thus, if we bound the number of iterations over all nodes, the overall complexity is quadratic), by analyzing the three possible re-attachment cases: (1) insertion of node v into a component; (2) insertion of node v as a child of some component; (3) insertion of node v as a new root in the graph. For more details see Section 4.1 in the paper.

The resulted graph of this procedure induces an approximation of the optimized global graph. Berant et al. demonstrate empirically that the method is, by orders of magnitude, faster than the state-of-the-art exact algorithm, but still produces an output that is almost as good as the optimal solution.

High to Low

The high-to-low procedure is quite simple, but effective (according to unreported test, on the dataset that was used for evaluation by Bernant et al., [2012]). Given a weighted entailment graph:

  • Construct a list of all edges, sorted by their weights from high to low.
  • Remove all edges from the graph
  • At each iteration
  • Select the edge with the highest score which are not yet presented in the graph
  • Construct the transitive closure for the selected edge (a set of edges, composed of edges from nodes that reach directly the source node of the selected edge, to nodes that are reached directly from the target node of the selected edge).
  • In case the overall weight of the selected edge and the edges of its transitive closure (defined by the sum of their weights) is positive, add each of the edges to the graph (if is not yet presented there).

The global graph optimizer tool

The globalgraphoptimizer project, of the Excitement Open Platform, is an infrastructure for defining global optimization algorithms, with a set of implemented methods.

The input of the tool is composed of:

  • A weighted directed acyclic graph.
  • A scoring model, which defines the weight of each possible edge
  • An edge learner, which optimizes the given graph, according to the given scoring model and the transitivity constraint, based on some global optimization algorithm. The output of the tool is an optimized graph.

The tool is provided with a set of implementations for graph representations, scoring models, and edge learners. In particular, the edge learner package contains a set of implementations for the methods described in the previous section, with some variations:

  • High to Low: a set of implementations of the high-to-low algorithm.
  • Tree node Fix: a set of implementations of the tree-node-fix algorithms.
  • Combined: a combination of the high-to-low and the tree-node-fix algorithms, where the graph is first constructed according the high-to-low method, and then optimized by applying the tree-node-fix algorithm.

Packages

The project is composed of the following packages

eu.excitementproject.eop.globalgraphoptimizer.defs

General definitions of constants and enums.

eu.excitementproject.eop.globalgraphoptimizer.graph

Implementation of various graph types.

eu.excitementproject.eop.globalgraphoptimizer.score

Definitions of various scoring models, assign initial scores for graph edges.

######Main Interface

public interface ScoreModel {
	/**
	 * @param fromNode a source node
	 * @param toNode a target node
	 * @return an entailemnt score for the given entailment nodes
	 * @throws Exception
	 */
	double getEntailmentScore(RelationNode fromNode, RelationNode toNode) throws Exception;
	
	/**
	 * @param desc1 a description of a source node
	 * @param desc2 a description of a target node
	 * @return an entailemnt score for the given entailment node descriptions
	 * @throws Exception
	 */
	double getEntailmentScore(String desc1, String desc2) throws Exception;
	
	/**
	 * Gets a set of entailning nodes
	 * 
	 * @return a set of entailment pairs
	 */
	public Set<Pair<String,String>> getEntailing();
	
	/**
	 * Gets a set of non-entailning nodes
	 * 
	 * @return a set of non-entailing pairs
	 */
	public Set<Pair<String,String>> getNonEntailing();
	
}

######Main implementations

  • MapLocalScorer basic implementation, based on a given scoring table, and entailing/non-entailing lists
  • ContractedGraphLocalScoreModel a scorer for a graph that has been contracted - built from the original graph and its scorer
eu.excitementproject.eop.globalgraphoptimizer.edgelearners

Implementation of general optimization methods, which maximize a given graph and a score model, based on the transitivity constraint.

######Main Interface

public interface EdgeLearner {
	/**
	 * Initializes a given graph and scoring model according to some initialization method
	 * 
	 * @param ioGraph A given graph to be optimized
	 * @param iLocalModel A scoring model, which assign score to each possible edge
	 * @throws Exception
	 */
	void init(NodeGraph ioGraph,MapLocalScorer iLocalModel) throws Exception;
	
	/**
	 * Optimize the initial graph according to some method
	 * 
	 * @throws Exception
	 */
	void learn() throws Exception;
	
	
	/**
	 * Returns the current value of the objective function of the learning process
	 * @return the current value of the objective function of the learning process
	 */
	double getObjectiveFunctionValue();	
}

######Main Implementations

  • LocalLearner learns a graph composed of edges with scores, according to the given score model, greater than a given minimal score.
  • HighToLowTreeLearner implements the high-to-low algorithm, described above
  • EfficientlyCorrectEmptyTreeLearner implements the tree-node-fix algorithm, described above
  • EfficientlyCorrectHtlLearner implements the combined high-to-low and tree-node-fix algorithm, described above.
eu.excitementproject.eop.globalgraphoptimizer.alg

Implementations of various graph algorithms:

  • StrongConnectivityComponents extraction of strong connectivity components
  • TransitiveClosureReduction Transitive closure reduction
  • TopologicalSorter Topological sort
eu.excitementproject.eop.globalgraphoptimizer.edgelearners

The API is basically the UntypedPredicateGraphLearner which defines the whole learning process, for a given edge learner and graph.

Usage

In order to use the tool:

  1. Define your graph as a DirectedOntologyGraph

  2. Choose an edge learner

  3. Provide known entailing and non-entailing edges(optional).

  4. Define score for unknown edges (default, 0).

  5. Create an UntypedPredicateGraphLearner instance

  6. Apply the 'learn' method, in order to get a set of optimized components

The following program apply the EfficientlyCorrectHtlLearner on a given EntailmentGraphRaw of the transduction layer:

	public EntailmentGraphCollapsed optimizeGraph(EntailmentGraphRaw workGraph, Double confidenceThreshold) throws GraphOptimizerException{
		DirectedOntologyGraph graph = new DirectedOneMappingOntologyGraph("work graph");

		/* Commented out: 
		 * Here we only add nodes connected by edges in the graph, and we consider all existing edges to denote entailment. 
		 * We should 1) add "orphan" nodes, not connected with any other node.
		 *           2) check whether edges hold ENTAILMENT relation or not  
		 *           3) consider adding edges NOT present in the graph as non-entailment edges with some confidence.
		 * In addition, we should consider all the edges with confidence < confidenceThreshold as non-entailment    
		 */

		HashMap<String, Integer> nodeIndex = new HashMap<String, Integer>(); 
		int i = 1;
		for (EntailmentUnit node : workGraph.vertexSet()) {
			nodeIndex.put(node.getText(), i);
			RelationNode rNode = new RelationNode(i++,node.getText());
			graph.addNode(rNode);
		}
		for (EntailmentRelation edge : workGraph.edgeSet()) {
			Double confidence = detectConfidence(workGraph, edge.getSource(), edge.getTarget(), confidenceThreshold);
			if (confidence == TEDecision.CONFIDENCE_NOT_AVAILABLE)
				throw new GraphOptimizerException("Unavaliable score was detected.");
			RelationNode sourceNode = new RelationNode(nodeIndex.get(edge.getSource().getText()),edge.getSource().getText());
			RelationNode targetNode = new RelationNode(nodeIndex.get(edge.getTarget().getText()),edge.getTarget().getText());
			try {
				graph.addEdge(new RuleEdge(sourceNode, targetNode,confidence));
			} catch (Exception e) {
				throw new GraphOptimizerException("Problem when adding edge "+edge.getSource().getText()+" -> "+edge.getTarget().getText()+" with confidence = "+confidence+".\n"+e);
			}
		}
		
		Set<AbstractOntologyGraph> componnetGraphs;
		try {
			componnetGraphs = graphLearner.learn(graph);
		} catch (Exception e) {
			throw new GraphOptimizerException("Problem with global optimization.\n"+ExceptionUtils.getFullStackTrace(e));
		}
		EntailmentGraphCollapsed ret = new EntailmentGraphCollapsed();

		/* Commented out: 
		 * Here we only add nodes connected by edges in the optimized graph, while we need all the nodes from original graph to be covered in the output. 
		 * We should 1) add "orphan" nodes, not connected with any other node.
		 *           2) collapse paraphrasing nodes into equivalence classes
		 */
		EntailmentGraphRaw tmpRawGraph = new EntailmentGraphRaw();
		for (EntailmentUnit node : workGraph.vertexSet()){
			tmpRawGraph.addVertex(node);
		}
		for (AbstractOntologyGraph componnetGraph : componnetGraphs) {
			for (AbstractRuleEdge componentEdge : componnetGraph.getEdges()) {
				if (componentEdge.score() >= confidenceThreshold) { //TODO: do we need the threshold here? Should it be confidenceThreshold or just 0 (to retain only entailment edges)? 				
					EntailmentUnit source = workGraph.getVertex(componentEdge.from().description());
					EntailmentUnit target=  workGraph.getVertex(componentEdge.to().description());
					EntailmentRelation edge = new EntailmentRelation(source, target, new TEDecisionWithConfidence(componentEdge.score(), DecisionLabel.Entailment));
					tmpRawGraph.addEdge(source, target, edge);
				}
			}
		}
		
		ret = new SimpleGraphOptimizer().optimizeGraph(tmpRawGraph, 0.0); //collapse paraphrasing nodes
		return ret;
	}
Clone this wiki locally