This project provides (a) a minimal Lucene index builder and (b) a minimal SQLite database builder for Medline (abstract collection for PubMed articles).
The Lucene index has three fields: pmid, abstractText, and articleTitle, of which the latter two are searchable. The SQLite database has two fields, pmid and abstract, with an index built on pmid.
This project is used in the preparation of the OAQA BioASQ System.
Two use cases have been configured as exec:exec goals in the pom.xml file.
Build Lucene index
mvn -Ddocs.dir=DOCS_DIR -Dindex.dir=INDEX_DIR exec:exec@index
where DOCS_DIR is the input directory that contains the downloaded .xml.gz or .xml files and INDEX_DIR is the output directory for the Lucene index.
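To make the expected input concrete, here is a minimal Python sketch (not the project's Java code) of how a directory of .xml.gz or .xml files could be read into the three values the index stores. It assumes the standard Medline/PubMed element names PubmedArticle, PMID, ArticleTitle, and AbstractText; the function names open_medline, text_of, and iter_records are hypothetical and chosen only for illustration.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path


def open_medline(path):
    """Open a .xml.gz or plain .xml file as a binary stream."""
    path = Path(path)
    if path.name.endswith(".xml.gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def text_of(parent, xpath):
    """Join the text of every element matching xpath, including nested markup."""
    parts = ("".join(e.itertext()).strip() for e in parent.findall(xpath))
    return " ".join(p for p in parts if p)


def iter_records(docs_dir):
    """Yield (pmid, title, abstract) for each PubmedArticle under docs_dir."""
    for path in sorted(Path(docs_dir).iterdir()):
        if not path.name.endswith((".xml", ".xml.gz")):
            continue  # skip anything that is not a Medline XML file
        with open_medline(path) as stream:
            # Stream the file so large baseline files are not loaded whole.
            for _, elem in ET.iterparse(stream, events=("end",)):
                if elem.tag != "PubmedArticle":
                    continue
                pmid = elem.findtext("MedlineCitation/PMID", default="")
                title = text_of(elem, "MedlineCitation/Article/ArticleTitle")
                abstract = text_of(
                    elem, "MedlineCitation/Article/Abstract/AbstractText"
                )
                yield pmid, title, abstract
                elem.clear()  # release the parsed article
```

Calling iter_records(DOCS_DIR) yields one tuple per article. Structured abstracts with several AbstractText sections are joined with a space here, which is a choice made for this sketch and may differ from what the project does.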
Build SQLite database
mvn -Ddocs.dir=DOCS_DIR -Ddb.path=DB_PATH exec:exec@store
where DOCS_DIR is the input directory that contains the downloaded .xml.gz or .xml files and DB_PATH is the output SQLite database file path.
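As a rough illustration of the resulting database layout, the Python sketch below creates a table with the two fields pmid and abstract, adds an index on pmid, and looks up an abstract by its identifier. The table name medline, the index name medline_pmid_idx, the TEXT column types, and the helper names build_db and lookup are assumptions made for this example; check the actual schema of the generated file before querying it.

```python
import sqlite3


def build_db(db_path, records):
    """Create the (assumed) schema and insert (pmid, abstract) pairs."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS medline (pmid TEXT, abstract TEXT)"
            )
            conn.executemany(
                "INSERT INTO medline (pmid, abstract) VALUES (?, ?)", records
            )
            # Build the index after the bulk insert so inserts stay cheap.
            conn.execute(
                "CREATE INDEX IF NOT EXISTS medline_pmid_idx ON medline (pmid)"
            )
    finally:
        conn.close()


def lookup(db_path, pmid):
    """Return the abstract stored for pmid, or None if it is absent."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT abstract FROM medline WHERE pmid = ?", (pmid,)
        ).fetchone()
    finally:
        conn.close()
    return row[0] if row else None
```

The index on pmid is what keeps the lookup fast on a collection the size of Medline; without it, each query would scan the whole table.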