MQuery-SRU

MQuery-SRU is an easy-to-set-up endpoint for CLARIN FCS 2.0 (Federated Content Search). It is based on the Manatee-open corpus search engine and is developed and maintained by the Institute of the Czech National Corpus.

Features

Full support for the FCS-QL query language
- definable mapping between FCS-QL layers and Manatee-open positional attributes
Level 1 support for basic search via CQL (Contextual Query Language)
simultaneous search in multiple defined corpora
(optional) backlinks to respective concordances in KonText

Requirements

a working Linux server with installed Manatee-open library
Redis database
Go language compiler and tools
(optional) an HTTP proxy server (Nginx, Apache, ...)

How to install

1. Install the Go language environment, either via a package manager or manually from the Go download page
   - make sure /usr/local/go/bin and ~/go/bin are in your $PATH so you can run any installed Go tools without specifying a full path
2. Install Manatee-open from the download page. No specific language bindings are required.
   - ./configure --with-pcre --disable-python && make && sudo make install && sudo ldconfig
3. Get MQuery-SRU sources (git clone --depth 1 https://github.com/czcorpus/mquery-sru.git)
4. Run ./configure
5. Run make
6. Run make install
   - the application will be installed in /opt/mquery-sru
   - for data and registry, /var/opt/corpora/data and /var/opt/corpora/registry directories will be created
   - systemd services mquery-sru-server.service and mquery-sru-worker-all.target will be created
7. Copy at least one corpus and its configuration (registry) into the respective directories (/var/opt/corpora/data, /var/opt/corpora/registry)
8. Update the corpora entries in the /opt/mquery-sru/conf.json file to match your installed corpora
9. Start the service:
   - systemctl start mquery-sru-server
   - systemctl start mquery-sru-worker-all.target

HTTP access

In most cases, it is not recommended to expose the server directly to the Internet. It is therefore advisable to put the service behind an HTTP proxy. E.g. in Nginx, the configuration may look like this:

location /mquery-fcs/ {
    proxy_pass http://127.0.0.1:8080/;
    proxy_set_header Host $http_host;
    proxy_redirect off;
    proxy_read_timeout 30;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header X-Forwarded-Proto $scheme;    
}
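
To check that the proxied endpoint answers, you can send it an SRU request. The following is a minimal sketch in Go that only builds such a request URL. It assumes the standard FCS 2.0 request parameters (`operation`, `queryType`, `query`); the host `example.org` and the helper `buildSearchURL` are hypothetical and not part of MQuery-SRU.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildSearchURL composes an SRU searchRetrieve URL carrying an FCS-QL query.
// base is the public URL of the proxied endpoint (a placeholder in this sketch).
func buildSearchURL(base, query string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := url.Values{}
	q.Set("operation", "searchRetrieve")
	q.Set("queryType", "fcs") // "fcs" selects FCS-QL; "cql" would select basic search
	q.Set("query", query)
	u.RawQuery = q.Encode() // Encode escapes values and sorts parameters by key
	return u.String(), nil
}

func main() {
	// example.org is a placeholder host, not a real deployment.
	s, err := buildSearchURL("https://example.org/mquery-fcs/", `[lemma="walk"]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

The printed URL can be opened in a browser or passed to any HTTP client; the path `/mquery-fcs/` matches the `location` used in the Nginx example above.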

Worker considerations

Endpoints with low traffic can still benefit from multiple workers. If an endpoint is configured to search across multiple corpora, MQuery-SRU can use the workers to search those corpora in parallel. Querying all configured corpora simultaneously can significantly reduce the response time, even under minimal load.
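
The effect of parallel search can be illustrated with a small Go sketch. It is not MQuery-SRU's actual implementation (the real service passes queries to separate worker processes via Redis); `searchCorpus` and `searchAll` are hypothetical stand-ins, and the sleep merely simulates the latency of one corpus query.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// searchCorpus stands in for a single-corpus search; the sleep simulates
// query latency. It is a placeholder, not MQuery-SRU code.
func searchCorpus(name string, latencyMs int) string {
	time.Sleep(time.Duration(latencyMs) * time.Millisecond)
	return "hits from " + name
}

// searchAll queries every corpus concurrently and returns the results
// in the same order as the input names.
func searchAll(names []string, latencyMs int) []string {
	results := make([]string, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			// Each goroutine writes to its own slot, so no locking is needed.
			results[i] = searchCorpus(name, latencyMs)
		}(i, name)
	}
	wg.Wait()
	return results
}

func main() {
	corpora := []string{"corpus_a", "corpus_b", "corpus_c"}
	start := time.Now()
	res := searchAll(corpora, 100)
	// Total time stays close to one query's latency rather than three times that.
	fmt.Println(res, time.Since(start))
}
```

With a single worker, the three simulated searches would run one after another; with one worker per corpus, the total time approaches that of the slowest single search.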

Configuration

To run the endpoint, you need to configure at least:

the listening address and port
the path to your Manatee corpora registry (= configuration) files
the corpora, along with:
- positional attributes to be exposed and the layer names they belong to
- mapping of FCS-QL's within structures (s, sentence, p etc.) to your specific corpora structures
the address of your Redis service plus the number of the database to be used for passing queries and results around

See configuration reference and/or conf.sample.json for detailed info.
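
As a rough checklist, the minimum requirements above can be expressed as a Go sketch. All type and field names here (`Config`, `Corpus`, `LayerAttrs`, `WithinStructs` and so on) are hypothetical and chosen for illustration only; they are not the actual keys of conf.json, which are described in the configuration reference. The attribute names `word`, `lemma` and `tag` are likewise just assumed examples of Manatee positional attributes.

```go
package main

import (
	"errors"
	"fmt"
)

// Corpus uses hypothetical field names; see config-reference.md for the real keys.
type Corpus struct {
	ID string
	// LayerAttrs maps an FCS-QL layer to a Manatee positional attribute.
	LayerAttrs map[string]string
	// WithinStructs maps FCS-QL "within" names to structures of the corpus.
	WithinStructs map[string]string
}

// Config collects the minimum settings listed above (hypothetical names).
type Config struct {
	ListenAddress string
	ListenPort    int
	RegistryDir   string
	RedisAddr     string
	RedisDB       int
	Corpora       []Corpus
}

// validate reports the first missing minimum setting, or nil if all are present.
func validate(c Config) error {
	switch {
	case c.ListenAddress == "" || c.ListenPort <= 0:
		return errors.New("listening address and port must be set")
	case c.RegistryDir == "":
		return errors.New("registry path must be set")
	case c.RedisAddr == "" || c.RedisDB < 0:
		return errors.New("address and database number of Redis must be set")
	case len(c.Corpora) == 0:
		return errors.New("at least one corpus must be defined")
	}
	for _, corp := range c.Corpora {
		if len(corp.LayerAttrs) == 0 {
			return fmt.Errorf("corpus %q exposes no positional attributes", corp.ID)
		}
	}
	return nil
}

func main() {
	cfg := Config{
		ListenAddress: "127.0.0.1",
		ListenPort:    8080,
		RegistryDir:   "/var/opt/corpora/registry",
		RedisAddr:     "127.0.0.1:6379",
		RedisDB:       0,
		Corpora: []Corpus{{
			ID:            "my_corpus", // placeholder corpus name
			LayerAttrs:    map[string]string{"text": "word", "lemma": "lemma", "pos": "tag"},
			WithinStructs: map[string]string{"sentence": "s", "paragraph": "p"},
		}},
	}
	fmt.Println(validate(cfg))
}
```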

OS integration (systemd)

This applies in case make install is not used.

(Here we assume the service will run as the user www-data.)

Create a directory for logging (e.g. /var/log/mquery-sru) and set proper permissions for www-data to be able to write there.

You can use predefined systemd files from /scripts/systemd/*. Copy (or link) them to /etc/systemd/system and then run:

systemctl enable mquery-sru-server.service
systemctl enable mquery-sru-worker-all.target

Now you can try to run the service:

systemctl start mquery-sru-server
systemctl start mquery-sru-worker-all.target

See MQuery-SRU in action

A CNC instance of MQuery-SRU is running as one of the endpoints for the CLARIN Content Search page.
