CoLRev-Environment/search-query

Welcome to search-query

Search-query is a Python package for parsing, validating, simplifying, and serializing literature search queries. It currently supports PubMed and Web of Science, and can be extended to support other databases. By default, it relies on the JSON schema proposed by an expert panel (Haddaway et al., 2022). The package can be used programmatically or through the command line, has zero dependencies, and can therefore be integrated into a variety of environments. The heuristics, parsers, and linters are battle-tested on over 500 peer-reviewed queries registered at searchRxiv.

Installation

To install search-query, run:

pip install search-query

Programmatic use

To create a query programmatically, run:

from search_query import OrQuery, AndQuery

# Typical building-blocks approach
digital_synonyms = OrQuery(["digital", "virtual", "online"], search_field="Abstract")
work_synonyms = OrQuery(["work", "labor", "service"], search_field="Abstract")
query = AndQuery([digital_synonyms, work_synonyms], search_field="Author Keywords")

Parameters:

  • list of strings or queries: the terms (strings) or nested queries to include in the search query,
  • search field: search field to which the query should be applied (available options: TODO: GIVE EXAMPLES AND LINK TO DOCS)
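
To make the building-blocks idea concrete, the following self-contained sketch models how nested OR and AND groups compose into a single Boolean string. `ToyQuery` and its `render` method are hypothetical names made up for this illustration; they are not the package's classes, and search fields are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ToyQuery:
    """Hypothetical stand-in for OrQuery/AndQuery; not the package's class."""

    operator: str  # "OR" or "AND"
    children: List[Union[str, "ToyQuery"]]

    def render(self) -> str:
        # Nested queries are rendered recursively and wrapped in parentheses;
        # plain strings are used as-is.
        parts = [
            f"({child.render()})" if isinstance(child, ToyQuery) else child
            for child in self.children
        ]
        return f" {self.operator} ".join(parts)


digital = ToyQuery("OR", ["digital", "virtual", "online"])
work = ToyQuery("OR", ["work", "labor", "service"])
combined = ToyQuery("AND", [digital, work])

print(combined.render())
# (digital OR virtual OR online) AND (work OR labor OR service)
```

Each synonym group is built once and can be reused in several combined queries, which is the main benefit of composing queries from blocks.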

TODO : implement a user-friendly version of OrQuery / AndQuery, which accepts lists of strings/queries and search_fields as strings

To load a JSON query file, run the parser:

from search_query.search_file import SearchFile
from search_query.parser import parse

search = SearchFile("search-file.json")
query = parse(search.search_string, syntax=search.platform)

Available platform identifiers are listed here.
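
For trying out the parser snippet, a small search file can be written with the standard library alone. This is a minimal sketch: the key names `search_string` and `platform` are an assumption inferred from the attributes accessed on `SearchFile` above, the query text is a placeholder, and the actual schema (Haddaway et al., 2022) may require further fields.

```python
import json
import tempfile
from pathlib import Path

# Assumed minimal content; key names mirror the SearchFile attributes used above.
search_record = {
    "search_string": "digital AND work",  # placeholder query text
    "platform": "wos",
}

# Write to a temporary directory so that no existing file is overwritten.
path = Path(tempfile.mkdtemp()) / "search-file.json"
path.write_text(json.dumps(search_record, indent=2), encoding="utf-8")

# Read the file back to confirm that it is valid JSON.
loaded = json.loads(path.read_text(encoding="utf-8"))
print(loaded["platform"])
```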

To validate a JSON query file, run the linter:

from search_query.linter import run_linter

messages = run_linter(search.search_string, syntax=search.platform)
print(messages)

Linter messages are documented and explained here.

To simplify and format a query, run:

query.format(*tbd: how to select/exclude rules?*)

To translate a query to a particular database syntax and print it, run:

query.to_string(syntax="ebsco")
query.to_string(syntax="pubmed")
query.to_string(syntax="wos")

To write a query to a JSON file, run the serializer:

from search_query import save_file

save_file(
    filename="search-file.json",
    query_str=query.to_string(syntax="wos"),
    syntax="wos",
    authors=[{"name": "Tom Brady"}],
    record_info={},
    date={}
)

CLI use

Linters can be run on the CLI:

search-query lint search-file.json
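
To lint several search files in one go, the command above can be wrapped in a short script. The following is a sketch under stated assumptions: `build_lint_command` and `lint_directory` are hypothetical helper names, and the meaning of the CLI's exit codes is not documented here, so the script only records them.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List


def build_lint_command(search_file: Path) -> List[str]:
    """Build the argument list for the lint command shown above."""
    return ["search-query", "lint", str(search_file)]


def lint_directory(directory: Path) -> Dict[str, int]:
    """Run the linter on each *.json file and map file name to exit code."""
    if shutil.which("search-query") is None:
        # The CLI is not installed in this environment; nothing to run.
        return {}
    results: Dict[str, int] = {}
    for search_file in sorted(directory.glob("*.json")):
        completed = subprocess.run(build_lint_command(search_file), check=False)
        results[search_file.name] = completed.returncode
    return results


if __name__ == "__main__":
    for name, code in lint_directory(Path(".")).items():
        print(f"{name}: exit code {code}")
```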

Pre-commit hooks

Linters can be included as pre-commit hooks by adding the following to .pre-commit-config.yaml:

repos:
  - repo: local
    hooks:
      - id: search-file-lint
        name: Search-file linter
        entry: search-file-lint
        language: python
        files: \.json$

To activate and run:

pre-commit install
pre-commit run --all-files

Documentation

docs

Demo

Binder

How to cite

TODO: main citation

The package was developed as part of Bachelor's theses:

  • Ernst, K. (2024). Towards more efficient literature search: Design of an open source query translator. Otto-Friedrich-University of Bamberg.

Not what you are looking for?

This Python package was developed with the purpose of integrating it into other literature management tools. If that isn't your use case, it might be useful to look at these related tools:

License

This project is distributed under the MIT License.