REVEAL-plus

Overview

High-Recall Information Retrieval (HRIR), the identification of nearly all documents relevant to a search query within a collection, is a pivotal task in a wide range of applications [1], such as electronic discovery and systematic review. Relevance is judged by the user conducting the search, so reaching high recall can demand excessive effort from the user in classifying documents.

In this context, several studies have focused on reducing this effort while still providing high recall. As a result, a considerable number of techniques have emerged. One worth mentioning is REVEAL (RelEVant rulE-based Active Learning; see [1] for details), which uses active learning and association rules to improve HRIR performance.

This project intends to improve REVEAL by providing:

An enhanced starting point for the method; and
A clear and definite stopping criterion.
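
For reference, recall, the quantity that HRIR methods aim to keep high, is the fraction of all relevant documents that have been found. The snippet below is a minimal sketch of that standard definition, not code from this repository; the function name and the document identifiers are illustrative assumptions.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(relevant & set(retrieved)) / len(relevant)


# Hypothetical document identifiers, for illustration only.
relevant_docs = {"d1", "d2", "d3", "d4"}
reviewed_docs = ["d1", "d2", "d3", "d7", "d9"]
print(recall(reviewed_docs, relevant_docs))  # 3 of 4 relevant found -> 0.75
```

In practice the full set of relevant documents is unknown during a review, which is why a stopping criterion has to decide when enough has been found.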

Installation

Clone this repository and enter it: git clone https://github.com/jeanchilger/scal.git && cd scal
Inside the repository, build the kissdb indexer and other necessary binaries: make

You are ready to go.

Usage

The bash script named main is the entry point for the system. Run it as ./main <option>. The available options are listed below.

-s <samples>, --samples=<samples>
      Sets <samples> as the number of executions to run.
      Afterwards, the mean and standard deviation over the samples are computed.
      The default value is 1.

-c <corpus>, --corpus=<corpus>
      Specifies the name of the corpus to be used.

-t <topic-list>, --topics=<topic-list>
      Specifies which topics will be computed by the method.
      <topic-list> must be a space separated string, containing
      one or more topics.

-v, --verbose
      If specified, verbose messages will be shown during execution.

-o, --off-colors
      Turns off colors of terminal outputs.

-h, --help
      Show a message like this.
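
As an illustration of how the options above combine, the following sketch assembles a ./main command line from Python. It uses only the flags documented above; the helper name build_main_command, the corpus name, and the topic identifiers are placeholders assumed for the example, not values shipped with this repository.

```python
import shlex


def build_main_command(corpus, topics, samples=1, verbose=False, colors=True):
    """Assemble the argument list for ./main from the documented options."""
    command = [
        "./main",
        f"--corpus={corpus}",
        # The topic list is passed as one space-separated string.
        f"--topics={' '.join(topics)}",
        f"--samples={samples}",
    ]
    if verbose:
        command.append("--verbose")
    if not colors:
        command.append("--off-colors")
    return command


# "my-corpus", "topic-1" and "topic-2" are placeholders, not real names.
command = build_main_command("my-corpus", ["topic-1", "topic-2"], samples=5, verbose=True)
print(shlex.join(command))
```

The printed line can be pasted into a shell from the repository root, or the list can be handed to subprocess.run(command, check=True). Keeping the topics in a single argument preserves the space-separated form that --topics expects.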

Datasets

Some of the datasets we have adapted and used for assessment are listed below.

CLEF 2017 Dataset

CLEF 2017 development set, which was created based on the Diagnostic Test Accuracy (DTA) systematic reviews conducted by the Cochrane Library.

See [3].

CORD-19

COVID-19 Open Research Dataset.

See [4].

Team

Coordinating professor

Guilherme Dal Bianco
- GitHub
- Lattes (pt-br Curriculum)

Academics participating

Emili Willinghoefer (early work)
- GitHub
Jean Carlo Hilger
- GitHub
Matheus Vinícius Todescato
- GitHub

Credits

The code is based on HTAustin's CAL repository. We adapted his code from CAL to SCAL, and later to the REVEAL method.

References

Core articles that guided this project.

[1] Guilherme Dal Bianco. Reveal-hire - a new active framework for the high recall task. In Proceedings of ACM Conference (Conference’17), 2018.
[2] Gordon V. Cormack and Maura R. Grossman. Scalability of continuous active learning for reliable high-recall text classification. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 2016.
[3] Evangelos Kanoulas, Dan Li, Leif Azzopardi, and Rene Spijker. CLEF 2017 technologically assisted reviews in empirical medicine overview. CEUR Workshop Proceedings, 2017.
[4] Ellen M. Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R. Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang. Trec-covid: Constructing a pandemic information retrieval test collection. ArXiv, 2020.
