Skip to content

Commit

Permalink
Feature/enable lucene query parsing (#6799)
Browse files Browse the repository at this point in the history
  • Loading branch information
DominikVoigt authored Aug 31, 2020
1 parent c1f2d3f commit dfefd4d
Show file tree
Hide file tree
Showing 14 changed files with 319 additions and 41 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@ Note that this project **does not** adhere to [Semantic Versioning](http://semve

### Added

- We added a query parser and mapping layer to enable conversion of queries formulated in simplified lucene syntax by the user into api queries. [#6799](https://github.com/JabRef/jabref/pull/6799)

### Changed

### Fixed
Expand Down
4 changes: 4 additions & 0 deletions build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,10 @@ dependencies {
antlr4 'org.antlr:antlr4:4.8-1'
implementation 'org.antlr:antlr4-runtime:4.8-1'

implementation (group: 'org.apache.lucene', name: 'lucene-queryparser', version: '8.6.1') {
exclude group: 'org.apache.lucene', module: 'lucene-sandbox'
}

implementation group: 'org.mariadb.jdbc', name: 'mariadb-java-client', version: '2.6.2'

implementation 'org.postgresql:postgresql:42.2.16'
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# Query syntax design

## Context and Problem Statement

All libraries use their own query syntax for advanced search options. To increase usability, users should be able to formulate their (abstract) search queries in a query syntax that can be mapped to the library specific search queries. To achieve this, the query has to be parsed into an AST.

Which query syntax should be used for the abstract queries?
Which features should the syntax support?

## Considered Options

* Use a simplified syntax that is derived of the [lucene](https://lucene.apache.org/core/8_6_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html) query syntax
* Formulate a own query syntax

## Decision Outcome

Chosen option: "Use a syntax that is derived of the lucene query syntax", because only option that is already known, and easy to implemenent.
Furthermore parsers for lucene already exist and are tested.
For simplicitly, and lack of universal capabilities across fetchers, only basic query features and therefor syntax is supported:

* All terms in the query are whitespace separated and will be ANDed
* Default and certain fielded terms are supported
* Fielded Terms:
* `author`
* `title`
* `journal`
* `year` (for single year)
* `year-range` (for range e.g. `year-range:2012-2015`)
* The `journal`, `year`, and `year-range` fields should only be populated once in each query
* Example:
* `author:"Igor Steinmacher" author:"Christoph Treude" year:2017` will be converted to
* `author:"Igor Steinmacher" AND author:"Christoph Treude" AND year:2017`

### Positive Consequences

* Already tested
* Well known
* Easy to implement
* Can use an existing parser

## Pros and Cons of the Options

### Use a syntax that is derived of the lucene query syntax

* Good, because already exists
* Good, because already well known
* Good, because there already exists a [parser for lucene syntax](https://lucene.apache.org/core/8_0_0/queryparser/org/apache/lucene/queryparser/flexible/standard/StandardQueryParser.html)
* Good, because capabilities of query conversion can easily be extended using the [flexible lucene framework](https://lucene.apache.org/core/8_0_0/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html)

### Formulate a own query syntax

* Good, because allows for flexibility
* Bad, because needs a new parser (has to be decided whether to use [ANTLR](https://www.antlr.org/), [JavaCC](https://javacc.github.io/javacc/), or [LogicNG](https://github.com/logic-ng/LogicNG))
* Bad, because has to be tested
* Bad, because syntax is not well known
* Bad, because the design should be easily extensible, requires an appropriate design (high effort)
2 changes: 2 additions & 0 deletions src/main/java/module-info.java
Original file line number Diff line number Diff line change
Expand Up @@ -89,4 +89,6 @@
requires flexmark.util.ast;
requires flexmark.util.data;
requires com.h2database.mvstore;
requires lucene.queryparser;
requires lucene.core;
}
53 changes: 53 additions & 0 deletions src/main/java/org/jabref/logic/importer/QueryParser.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
package org.jabref.logic.importer;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

import org.jabref.logic.importer.fetcher.ComplexSearchQuery;

import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;

/**
* This class converts a query string written in lucene syntax into a complex search query.
*
* For simplicity this is limited to fielded data and the boolean AND operator.
*/
public class QueryParser {

/**
* Parses the given query string into a complex query using lucene.
* Note: For unique fields, the alphabetically first instance in the query string is used in the complex query.
*
* @param queryString The given query string
* @return A complex query containing all fields of the query string
* @throws QueryNodeException Error during parsing
*/
public Optional<ComplexSearchQuery> parseQueryStringIntoComplexQuery(String queryString) {
try {
ComplexSearchQuery.ComplexSearchQueryBuilder builder = ComplexSearchQuery.builder();

StandardQueryParser parser = new StandardQueryParser();
Query luceneQuery = parser.parse(queryString, "default");
Set<Term> terms = new HashSet<>();
// This implementation collects all terms from the leaves of the query tree independent of the internal boolean structure
// If further capabilities are required in the future the visitor and ComplexSearchQuery has to be adapted accordingly.
QueryVisitor visitor = QueryVisitor.termCollector(terms);
luceneQuery.visit(visitor);

List<Term> sortedTerms = new ArrayList<>(terms);
sortedTerms.sort(Comparator.comparing(Term::text));
builder.terms(sortedTerms);
return Optional.of(builder.build());
} catch (QueryNodeException | IllegalStateException ex) {
return Optional.empty();
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,8 @@ public interface SearchBasedFetcher extends WebFetcher {
* @return a list of {@link BibEntry}, which are matched by the query (may be empty)
*/
default List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery) throws FetcherException {
// Default Implementation behaves like perform search using the default field as query
return performSearch(complexSearchQuery.getDefaultField().orElse(""));
// Default implementation behaves as performSearch using the default field phrases as query
List<String> defaultPhrases = complexSearchQuery.getDefaultFieldPhrases();
return performSearch(String.join(" ", defaultPhrases));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -83,8 +83,9 @@ private List<BibEntry> getBibEntries(URL urlForQuery) throws FetcherException {
}

default URL getComplexQueryURL(ComplexSearchQuery complexSearchQuery) throws URISyntaxException, MalformedURLException, FetcherException {
// Default Implementation behaves like getURLForQuery using the default field as query
return this.getURLForQuery(complexSearchQuery.getDefaultField().orElse(""));
// Default implementation behaves as getURLForQuery using the default field phrases as query
List<String> defaultPhrases = complexSearchQuery.getDefaultFieldPhrases();
return this.getURLForQuery(String.join(" ", defaultPhrases));
}

/**
Expand Down
6 changes: 3 additions & 3 deletions src/main/java/org/jabref/logic/importer/fetcher/ArXiv.java
Original file line number Diff line number Diff line change
Expand Up @@ -264,12 +264,12 @@ public List<BibEntry> performSearch(String query) throws FetcherException {
@Override
public List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery) throws FetcherException {
List<String> searchTerms = new ArrayList<>();
complexSearchQuery.getAuthors().ifPresent(authors -> authors.forEach(author -> searchTerms.add("au:" + author)));
complexSearchQuery.getTitlePhrases().ifPresent(title -> searchTerms.add("ti:" + title));
complexSearchQuery.getAuthors().forEach(author -> searchTerms.add("au:" + author));
complexSearchQuery.getTitlePhrases().forEach(title -> searchTerms.add("ti:" + title));
complexSearchQuery.getJournal().ifPresent(journal -> searchTerms.add("jr:" + journal));
// Since ArXiv API does not support year search, we ignore the year related terms
complexSearchQuery.getToYear().ifPresent(year -> searchTerms.add(year.toString()));
complexSearchQuery.getDefaultField().ifPresent(defaultField -> searchTerms.add(defaultField));
searchTerms.addAll(complexSearchQuery.getDefaultFieldPhrases());
String complexQueryString = String.join(" AND ", searchTerms);
return performSearch(complexQueryString);
}
Expand Down
139 changes: 116 additions & 23 deletions src/main/java/org/jabref/logic/importer/fetcher/ComplexSearchQuery.java
Original file line number Diff line number Diff line change
@@ -1,23 +1,26 @@
package org.jabref.logic.importer.fetcher;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

import org.jabref.model.strings.StringUtil;

import org.apache.lucene.index.Term;

public class ComplexSearchQuery {
// Field for non-fielded search
private final String defaultField;
private final List<String> defaultField;
private final List<String> authors;
private final List<String> titlePhrases;
private final Integer fromYear;
private final Integer toYear;
private final Integer singleYear;
private final String journal;

private ComplexSearchQuery(String defaultField, List<String> authors, List<String> titlePhrases, Integer fromYear, Integer toYear, Integer singleYear, String journal) {
private ComplexSearchQuery(List<String> defaultField, List<String> authors, List<String> titlePhrases, Integer fromYear, Integer toYear, Integer singleYear, String journal) {
this.defaultField = defaultField;
this.authors = authors;
this.titlePhrases = titlePhrases;
Expand All @@ -28,16 +31,32 @@ private ComplexSearchQuery(String defaultField, List<String> authors, List<Strin
this.singleYear = singleYear;
}

public Optional<String> getDefaultField() {
return Optional.ofNullable(defaultField);
public static ComplexSearchQuery fromTerms(Collection<Term> terms) {
ComplexSearchQueryBuilder builder = ComplexSearchQuery.builder();
terms.forEach(term -> {
String termText = term.text();
switch (term.field().toLowerCase()) {
case "author" -> builder.author(termText);
case "title" -> builder.titlePhrase(termText);
case "journal" -> builder.journal(termText);
case "year" -> builder.singleYear(Integer.valueOf(termText));
case "year-range" -> builder.parseYearRange(termText);
case "default" -> builder.defaultFieldPhrase(termText);
}
});
return builder.build();
}

public List<String> getDefaultFieldPhrases() {
return defaultField;
}

public Optional<List<String>> getAuthors() {
return Optional.ofNullable(authors);
public List<String> getAuthors() {
return authors;
}

public Optional<List<String>> getTitlePhrases() {
return Optional.ofNullable(titlePhrases);
public List<String> getTitlePhrases() {
return titlePhrases;
}

public Optional<Integer> getFromYear() {
Expand All @@ -60,23 +79,69 @@ public static ComplexSearchQueryBuilder builder() {
return new ComplexSearchQueryBuilder();
}

@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}

ComplexSearchQuery that = (ComplexSearchQuery) o;

// Just check for set equality, order does not matter
if (!(getDefaultFieldPhrases().containsAll(that.getDefaultFieldPhrases()) && that.getDefaultFieldPhrases().containsAll(getDefaultFieldPhrases()))) {
return false;
}
if (!(getAuthors().containsAll(that.getAuthors()) && that.getAuthors().containsAll(getAuthors()))) {
return false;
}
if (!(getTitlePhrases().containsAll(that.getTitlePhrases()) && that.getTitlePhrases().containsAll(getTitlePhrases()))) {
return false;
}
if (getFromYear().isPresent() ? !getFromYear().equals(that.getFromYear()) : that.getFromYear().isPresent()) {
return false;
}
if (getToYear().isPresent() ? !getToYear().equals(that.getToYear()) : that.getToYear().isPresent()) {
return false;
}
if (getSingleYear().isPresent() ? !getSingleYear().equals(that.getSingleYear()) : that.getSingleYear().isPresent()) {
return false;
}
return getJournal().isPresent() ? getJournal().equals(that.getJournal()) : !that.getJournal().isPresent();
}

@Override
public int hashCode() {
int result = defaultField != null ? defaultField.hashCode() : 0;
result = 31 * result + (getAuthors() != null ? getAuthors().hashCode() : 0);
result = 31 * result + (getTitlePhrases() != null ? getTitlePhrases().hashCode() : 0);
result = 31 * result + (getFromYear().isPresent() ? getFromYear().hashCode() : 0);
result = 31 * result + (getToYear().isPresent() ? getToYear().hashCode() : 0);
result = 31 * result + (getSingleYear().isPresent() ? getSingleYear().hashCode() : 0);
result = 31 * result + (getJournal().isPresent() ? getJournal().hashCode() : 0);
return result;
}

public static class ComplexSearchQueryBuilder {
private String defaultField;
private List<String> authors;
private List<String> titlePhrases;
private List<String> defaultFieldPhrases = new ArrayList<>();
private List<String> authors = new ArrayList<>();
private List<String> titlePhrases = new ArrayList<>();
private String journal;
private Integer fromYear;
private Integer toYear;
private Integer singleYear;

public ComplexSearchQueryBuilder() {
private ComplexSearchQueryBuilder() {
}

public ComplexSearchQueryBuilder defaultField(String defaultField) {
if (Objects.requireNonNull(defaultField).isBlank()) {
public ComplexSearchQueryBuilder defaultFieldPhrase(String defaultFieldPhrase) {
if (Objects.requireNonNull(defaultFieldPhrase).isBlank()) {
throw new IllegalArgumentException("Parameter must not be blank");
}
this.defaultField = defaultField;
// Strip all quotes before wrapping
this.defaultFieldPhrases.add(String.format("\"%s\"", defaultFieldPhrase.replace("\"", "")));
return this;
}

Expand All @@ -87,9 +152,6 @@ public ComplexSearchQueryBuilder author(String author) {
if (Objects.requireNonNull(author).isBlank()) {
throw new IllegalArgumentException("Parameter must not be blank");
}
if (Objects.isNull(authors)) {
this.authors = new ArrayList<>();
}
// Strip all quotes before wrapping
this.authors.add(String.format("\"%s\"", author.replace("\"", "")));
return this;
Expand All @@ -102,9 +164,6 @@ public ComplexSearchQueryBuilder titlePhrase(String titlePhrase) {
if (Objects.requireNonNull(titlePhrase).isBlank()) {
throw new IllegalArgumentException("Parameter must not be blank");
}
if (Objects.isNull(titlePhrases)) {
this.titlePhrases = new ArrayList<>();
}
// Strip all quotes before wrapping
this.titlePhrases.add(String.format("\"%s\"", titlePhrase.replace("\"", "")));
return this;
Expand Down Expand Up @@ -135,6 +194,21 @@ public ComplexSearchQueryBuilder journal(String journal) {
return this;
}

public ComplexSearchQueryBuilder terms(Collection<Term> terms) {
terms.forEach(term -> {
String termText = term.text();
switch (term.field().toLowerCase()) {
case "author" -> this.author(termText);
case "title" -> this.titlePhrase(termText);
case "journal" -> this.journal(termText);
case "year" -> this.singleYear(Integer.valueOf(termText));
case "year-range" -> this.parseYearRange(termText);
case "default" -> this.defaultFieldPhrase(termText);
}
});
return this;
}

/**
* Instantiates the AdvancesSearchConfig from the provided Builder parameters
* If all text fields are empty an empty optional is returned
Expand All @@ -147,11 +221,30 @@ public ComplexSearchQuery build() throws IllegalStateException {
if (textSearchFieldsAndYearFieldsAreEmpty()) {
throw new IllegalStateException("At least one text field has to be set");
}
return new ComplexSearchQuery(defaultField, authors, titlePhrases, fromYear, toYear, singleYear, journal);
return new ComplexSearchQuery(defaultFieldPhrases, authors, titlePhrases, fromYear, toYear, singleYear, journal);
}

void parseYearRange(String termText) {
String[] split = termText.split("-");
int fromYear = 0;
int toYear = 9999;
try {
fromYear = Integer.parseInt(split[0]);
} catch (NumberFormatException e) {
// default value already set
}
if (split.length > 1) {
try {
toYear = Integer.parseInt(split[1]);
} catch (NumberFormatException e) {
// default value already set
}
}
this.fromYearAndToYear(fromYear, toYear);
}

private boolean textSearchFieldsAndYearFieldsAreEmpty() {
return StringUtil.isBlank(defaultField) && this.stringListIsBlank(titlePhrases) &&
return this.stringListIsBlank(defaultFieldPhrases) && this.stringListIsBlank(titlePhrases) &&
this.stringListIsBlank(authors) && StringUtil.isBlank(journal) && yearFieldsAreEmpty();
}

Expand Down
Loading

0 comments on commit dfefd4d

Please sign in to comment.