Welcome to Quaerite

NOTE

This is the active fork from the original project started at MITRE.

Background and Goals

This project includes tools to help evaluate relevance ranking. This code has been tested with Solr 4.x, 7.x and 8.x, and with Elasticsearch (ES) 6.x and 7.x.

This project is not intended to compete with existing relevance evaluation tools, such as Splainer, Quepid, Rated Ranking Evaluator, or Luigi's Box. Rather, this project was developed for use cases not currently covered by open source software packages. The author encourages collaboration among these projects.

NOTE: This project is under construction and is quite dynamic.
There will be breaking changes before the first major release.

While the name of this project may change in the future, we selected quaerite -- Latin imperative "seek", root of English "query" -- to allude not only to the challenges of creating queries, but also to the challenges of tuning search engines. One may spend a not insignificant amount of time tuning countless parameters. In the end, we hope that invenietis ("you will find") with slightly less effort than you would without this project. For the pronunciation, see this link.

Similarities and Differences between the Genetic Algorithm (GA) in Quaerite and Learning to Rank

In the research literature, applying a GA or Genetic Programming (GP) is one method for learning to rank (see, e.g., Andrew Trotman's work on GP).

However, for integrators and developers who work in the Lucene ecosystem, "Learning to Rank" (LTR) connotes a specific methodology and module initially added to Apache Solr by Bloomberg and then offered as a plugin for Elasticsearch by Doug Turnbull and colleagues at OpenSource Connections, Wikimedia Foundation and Snagajob Engineering. In the following, I use LTR to refer to this Lucene-ecosystem-specific module and methodology.

In no way do I see this implementation of GA as a competitor to LTR; rather, it is another tool that might help complement LTR and/or other tuning methodologies.

Similarities

All of the basic requirements for quality search must be met first -- analyzer chains must be well designed for the data, and the underlying data in the index should be accurate, well organized and well curated.
There must be sufficient, high-quality, accurate and representative ground-truth judgments for training and testing.
Machine learning can only do so much -- further tuning and/or new methods of enrichment may be required.
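Ground-truth judgments are typically turned into a single number per experiment with a ranking metric. As an illustration only, here is a minimal sketch of mean NDCG@k over a set of judged queries. It assumes a hypothetical dict-based judgment format and uses linear gains with a log2 rank discount (one common NDCG variant); Quaerite's own scorers and file formats may differ.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain: the gain at 0-based rank r is divided by log2(r + 2)."""
    return sum(g / math.log2(r + 2) for r, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_ids, judgments, k=10):
    """NDCG@k for one query.

    ranked_ids: document ids in the order the engine returned them.
    judgments:  {doc_id: graded relevance}; unjudged documents count as 0.
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_dcg = dcg_at_k(sorted(judgments.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def mean_ndcg(run, qrels, k=10):
    """Average NDCG@k over every judged query (hypothetical data layout).

    run:   {query_id: [doc ids in ranked order]}
    qrels: {query_id: {doc_id: graded relevance}}
    """
    scores = [ndcg_at_k(run.get(qid, []), judged, k) for qid, judged in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0

# Usage: one query whose two relevant documents come back in swapped order.
qrels = {"q1": {"d1": 3, "d2": 2, "d3": 0}}
run = {"q1": ["d2", "d1", "d3"]}
print(round(mean_ndcg(run, qrels), 3))  # → 0.913
```

Keeping separate query sets for training and testing lets the same metric be reported on queries that the tuning process never saw.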

Differences

In practice, LTR is designed as a re-ranking step: after the search engine has returned the top n documents, LTR applies more costly calculations to this smaller subset and re-ranks it using models built offline. The goal of this implementation of GA (and of the other tools in Quaerite) is instead to help tune the parameters used in the search engine's initial ranking, NOT to act as a secondary re-ranking step.
Bloomberg, OpenSource Connections, Wikimedia and Snagajob have spent considerable time and effort developing and integrating these modules to make them easy to use. This toolkit was developed with far fewer resources, initially for use by a single relevance engineer, so there is still plenty of room for improvement.
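To make "tuning the initial ranking's parameters" concrete, here is a minimal, hypothetical sketch of a GA searching over per-field qf-style boosts. It is not Quaerite's implementation: the field names, boost range, operators (tournament selection, uniform crossover, Gaussian mutation, elitism) and the toy fitness function are all assumptions. In a real experiment, the fitness would be a relevance metric such as mean NDCG over the training queries, computed by running each query against the engine with the candidate parameters.

```python
import random

# Hypothetical search space: per-field boosts for a qf-style parameter.
FIELDS = ["title", "body", "keywords"]
BOOST_RANGE = (0.0, 10.0)

def random_individual(rng):
    return {f: rng.uniform(*BOOST_RANGE) for f in FIELDS}

def crossover(a, b, rng):
    # Uniform crossover: each field's boost is copied from one parent or the other.
    return {f: (a[f] if rng.random() < 0.5 else b[f]) for f in FIELDS}

def mutate(ind, rng, rate=0.2, sigma=1.0):
    # With probability `rate`, nudge a boost by Gaussian noise, clamped to the range.
    lo, hi = BOOST_RANGE
    return {f: (min(hi, max(lo, v + rng.gauss(0, sigma))) if rng.random() < rate else v)
            for f, v in ind.items()}

def tournament(pop, scores, rng, size=3):
    # Pick `size` random individuals and return the fittest of them.
    contenders = rng.sample(range(len(pop)), size)
    return pop[max(contenders, key=lambda i: scores[i])]

def evolve(fitness, generations=30, pop_size=20, elite=2, seed=42):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        next_pop = [pop[i] for i in ranked[:elite]]  # elitism: keep the best unchanged
        while len(next_pop) < pop_size:
            child = crossover(tournament(pop, scores, rng), tournament(pop, scores, rng), rng)
            next_pop.append(mutate(child, rng))
        pop = next_pop
    scores = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]

# Toy stand-in for "mean NDCG on training queries": closeness to a hidden target.
TARGET = {"title": 5.0, "body": 1.0, "keywords": 3.0}

def toy_fitness(ind):
    return -sum((ind[f] - TARGET[f]) ** 2 for f in FIELDS)

best, score = evolve(toy_fitness)
print({f: round(v, 2) for f, v in best.items()}, round(score, 3))
```

Because the elite individuals are carried over unchanged, the best fitness never decreases from one generation to the next; the evaluation cost is dominated by the fitness calls, each of which would be a full batch of queries against the engine.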

Current Status

As of this writing, Quaerite supports experimentation with the following parameters: bf, bq, qf, pf, pf2, pf3, ps, ps2, ps3, q.op (and mm), tie, the Solr URL (so that you can run experiments against different cores and/or different versions of Solr) and customHandler (so that you can compare different custom handlers). For ES specifically, parameters include boost, fuzziness and multi_match_type (e.g. best_fields, most_fields, cross_fields and phrase).
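As a rough illustration of what one candidate configuration might map to at request time, here is a sketch that turns a hypothetical config dict into Solr edismax request parameters and an ES multi_match query body. The parameter names (qf, pf, ps, mm, q.op, tie, bq, bf; multi_match type, fields with ^ boosts, tie_breaker, fuzziness) are standard Solr/ES ones, but the config layout and helper names are assumptions, not Quaerite's experiment format. The sketch only forwards fuzziness for best_fields and most_fields, on the assumption that other multi_match types such as phrase and cross_fields do not accept it.

```python
from urllib.parse import urlencode

def solr_edismax_params(q, config):
    """Hypothetical helper: one candidate config -> Solr edismax request parameters."""
    params = {"q": q, "defType": "edismax"}
    # Field-boost maps such as {"title": 3.0, "body": 1.0} become "title^3.0 body^1.0".
    for key in ("qf", "pf", "pf2", "pf3"):
        if key in config:
            params[key] = " ".join(f"{f}^{b}" for f, b in config[key].items())
    # Scalar parameters are passed through as strings.
    for key in ("ps", "ps2", "ps3", "mm", "q.op", "tie", "bq", "bf"):
        if key in config:
            params[key] = str(config[key])
    return params

def es_multi_match_body(q, config):
    """Hypothetical helper: one candidate config -> ES multi_match query body."""
    mm = {
        "query": q,
        "fields": [f"{f}^{b}" for f, b in config["fields"].items()],
        "type": config.get("multi_match_type", "best_fields"),
    }
    if "tie" in config:
        mm["tie_breaker"] = config["tie"]
    # Assumption: only field-centric types accept fuzziness, so drop it for the others.
    if "fuzziness" in config and mm["type"] in ("best_fields", "most_fields"):
        mm["fuzziness"] = config["fuzziness"]
    return {"query": {"multi_match": mm}}

config = {"qf": {"title": 3.0, "body": 1.0}, "pf": {"title": 5.0}, "ps": 2,
          "mm": "2<75%", "tie": 0.1}
print(urlencode(solr_edismax_params("genetic algorithm", config)))
print(es_multi_match_body("genetic algorithm",
                          {"fields": {"title": 3.0, "body": 1.0},
                           "multi_match_type": "most_fields", "fuzziness": "AUTO"}))
```

Representing each experiment as plain data like this makes it easy for a search procedure (grid search, random search or a GA) to generate, compare and log candidate configurations.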

Getting Started

See the quaerite-examples module and its README.

Releases

1.0.0-ALPHA March 22, 2019
1.0.0-ALPHA2 April 30, 2020

Road Map

High priorities

Add other features (e.g. bq, bf) as needed
See the issues on the issue tracker

Planned Releases

1.0.0-beta1 July, 2020

Related Open Source Projects

License (see also LICENSE.txt)

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0
