PubMed/Medline Query Transpiler
The goal of transmute is to transform PubMed/Medline search strategies from systematic reviews into queries suitable for other search engines. The result of the transformation is an intermediate representation, which can be analysed with greater ease or transformed again to run on other search engines. This is why transmute is described as a transpiler. The intermediate representation makes it trivial to produce boolean queries accepted by search engines such as Elasticsearch.
Examples of a Medline query and a PubMed query, in that order:
1. MMSE*.ti,ab.
2. sMMSE.ti,ab.
3. Folstein*.ti,ab.
4. MiniMental.ti,ab.
5. "mini mental stat*".ti,ab.
6. or/1-5
("Contraceptive Agents, Female"[Mesh] OR "Contraceptive Devices, Female"[Mesh] OR contracept*[tiab]) AND ("Body Weight"[Mesh] OR weight[tiab] OR "Body Mass Index"[Mesh]) NOT (cancer*[ti] OR polycystic [ti] OR exercise [ti] OR physical activity[ti] OR postmenopaus*[ti])
Both are valid search strategies reported in real systematic reviews, and transmute can currently transform both Medline and PubMed queries. The next section shows example API usage: constructing a pipeline and executing it.
Here we construct a pipeline in Go:
query := `1. MMSE*.ti,ab.
2. sMMSE.ti,ab.
3. Folstein*.ti,ab.
4. MiniMental.ti,ab.
5. "mini mental stat*".ti,ab.
6. or/1-5`
// Assumes the pipeline, parser, and backend packages are imported from github.com/hscells/transmute.
p := pipeline.NewPipeline(parser.NewMedlineParser(),
backend.NewElasticsearchCompiler(),
pipeline.TransmutePipelineOptions{RequiresLexing: true})
dsl, err := p.Execute(query)
if err != nil {
panic(err)
}
println(dsl.StringPretty())
Which results in:
{
"query": {
"bool": {
"disable_coord": true,
"should": [
{
"bool": {
"should": [
{
"wildcard": {
"title": "MMSE*"
}
},
{
"wildcard": {
"abstract": "MMSE*"
}
}
]
}
},
{
"multi_match": {
"fields": [
"title",
"abstract"
],
"query": "sMMSE"
}
},
{
"bool": {
"should": [
{
"wildcard": {
"title": "Folstein*"
}
},
{
"wildcard": {
"abstract": "Folstein*"
}
}
]
}
},
{
"multi_match": {
"fields": [
"title",
"abstract"
],
"query": "MiniMental"
}
},
{
"bool": {
"should": [
{
"wildcard": {
"title": "\"mini mental stat*\""
}
},
{
"wildcard": {
"abstract": "\"mini mental stat*\""
}
}
]
}
}
]
}
}
}
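The output above suggests a simple mapping: a keyword containing a wildcard becomes one `wildcard` clause per field, combined under `bool`/`should`, while a plain keyword becomes a single `multi_match` over all of its fields. The following is a minimal sketch of that mapping only. It is not transmute's actual implementation, and `clauseFor` is a hypothetical helper introduced for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// clauseFor builds an Elasticsearch query clause for one keyword searched
// across the given fields. Hypothetical helper, not part of transmute.
func clauseFor(term string, fields []string) map[string]interface{} {
	if strings.ContainsAny(term, "*?") {
		// A wildcard query targets a single field, so emit one per field
		// and combine them with bool/should (logical OR).
		should := make([]interface{}, 0, len(fields))
		for _, f := range fields {
			should = append(should, map[string]interface{}{
				"wildcard": map[string]interface{}{f: term},
			})
		}
		return map[string]interface{}{
			"bool": map[string]interface{}{"should": should},
		}
	}
	// A plain term can be matched across several fields at once.
	return map[string]interface{}{
		"multi_match": map[string]interface{}{
			"fields": fields,
			"query":  term,
		},
	}
}

func main() {
	fields := []string{"title", "abstract"}
	for _, term := range []string{"MMSE*", "sMMSE"} {
		out, err := json.MarshalIndent(clauseFor(term, fields), "", "  ")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Running the sketch prints one clause per keyword, in the same shape as the corresponding entries in the output above.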
As well as being a well-documented library, transmute can also be used on the command line. Since it is still in development, it must currently be built from source with the Go tools:
go get -u github.com/hscells/transmute/cmd/transmute
transmute --help
transmute --input mmse.query --parser medline --backend elasticsearch
The command-line tool pretty-prints the same output as shown above.
transmute parses and transforms PubMed/Medline queries into queries suitable for other search engines. However, the project makes some assumptions about the query:
- The parser does not attempt to simplify boolean expressions, so badly written queries will remain inefficient.
- A query cannot be compiled to Elasticsearch when it contains an adjacency operator applied to more than one field. This is due to a limitation of Elasticsearch.
If you would like to extend transmute and create a new backend for it, have a read of the documentation, which should lead you in the right direction. Writing a new backend requires transforming the intermediate representation into the target query language.
If you use this work for scientific publication, please reference:
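To illustrate what such a transformation involves, here is a minimal sketch of a backend that walks a query tree and emits a generic `field:term` boolean query string. The `Node` type and `compile` function are hypothetical stand-ins introduced for this example. transmute's real intermediate representation and backend interfaces are described in its documentation and may look quite different.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical stand-in for an intermediate-representation node:
// either a keyword (Term set) or a boolean operator over Children.
type Node struct {
	Operator string // "AND", "OR" or "NOT"; empty for keywords
	Children []Node
	Term     string
	Fields   []string
}

// compile renders a Node as a generic field:term boolean query string.
func compile(n Node) string {
	if n.Operator == "" {
		if len(n.Fields) == 0 {
			return n.Term
		}
		// A keyword over several fields becomes an OR over those fields.
		parts := make([]string, 0, len(n.Fields))
		for _, f := range n.Fields {
			parts = append(parts, f+":"+n.Term)
		}
		if len(parts) == 1 {
			return parts[0]
		}
		return "(" + strings.Join(parts, " OR ") + ")"
	}
	// An operator node compiles its children and joins them with the operator.
	parts := make([]string, 0, len(n.Children))
	for _, c := range n.Children {
		parts = append(parts, compile(c))
	}
	return "(" + strings.Join(parts, " "+n.Operator+" ") + ")"
}

func main() {
	ir := Node{Operator: "OR", Children: []Node{
		{Term: "MMSE*", Fields: []string{"title", "abstract"}},
		{Term: "sMMSE", Fields: []string{"title"}},
	}}
	fmt.Println(compile(ir))
	// prints: ((title:MMSE* OR abstract:MMSE*) OR title:sMMSE)
}
```

The recursive walk is the essential part: each kind of node in the tree needs a rule for producing the equivalent construct in the target query language.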
@inproceedings{scells2018framework,
author = {Scells, Harrisen and Locke, Daniel and Zuccon, Guido},
title = {An Information Retrieval Experiment Framework for Domain Specific Applications},
booktitle = {The 41st International ACM SIGIR Conference on Research \& Development in Information Retrieval},
series = {SIGIR '18},
year = {2018},
}
The Go gopher was created by Renee French and is licensed under the Creative Commons Attribution 3.0 license.