Tomas Machalek edited this page Dec 6, 2022 · 26 revisions

KonText HTTP API

Introduction

This text documents both the general API available in any KonText installation and the specific installation run by the Institute of the Czech National Corpus (CNC). The CNC installation differs in a few ways:

  • all the API actions/endpoints require the client to be registered and logged in
  • there is no need to add the URL query parameter format=json to get a JSON response from actions that also provide HTML output
  • CNC KonText archives all queries, so it is always possible to return to an old result set (for older queries, KonText must recalculate the data first, so access to the result is not instantaneous in such a case)

Creating a concordance query

While it is possible to use both simple and advanced query types, it is strongly advised to use the advanced variant when working with the API, because the query is much easier to encode. The simple variant is designed to be convenient for web interface users, at the cost of a more complex internal query structure and evaluation.

  • URL: /query_submit?format=json
  • HTTP Method: POST
  • content type: application/json

Request body:

{
  "type": "concQueryArgs",
  "maincorp": "syn2020",
  "usesubcorp": null,
  "viewmode": "kwic",
  "pagesize": 40,
  "attrs": ["word","tag"],
  "attr_vmode": "visible-kwic",
  "base_viewattr": "word",
  "ctxattrs": [],
  "structs": ["text","p","g"],
  "refs": [],
  "fromp": 0,
  "shuffle": 0,
  "queries": [
    {
      "qtype": "advanced",
      "corpname": "syn2020",
      "query": "[word=\"celou\"] [lemma=\"pravda\"]",
      "pcq_pos_neg": "pos",
      "include_empty": false,
      "default_attr":"word"
    }
  ],
  "text_types": {},
  "context":
  {
    "fc_lemword_wsize": [-5, 5],
    "fc_lemword": "",
    "fc_lemword_type": "all",
    "fc_pos_wsize": [-5, 5],
    "fc_pos": [],
    "fc_pos_type": "all"
  },
  "async": false
}

Parameters

name description
type this is always a constant concQueryArgs
usesubcorp null or name of user's subcorpus
viewmode kwic|sen|align (align works only for parallel corpora)
pagesize a positive number specifying size of the resulting page
attrs a list of KWIC's positional attributes we want to retrieve
ctxattrs a list of non-KWIC positional attributes we want to retrieve
attr_vmode visible-all|visible-kwic|visible-multiline|mouseover - this is useful mostly for GUI clients
base_viewattr the main attribute the flow of text will be based on
structs a list (possibly empty) of structural attributes to be shown
refs a list (possibly empty) of additional metadata attached to each row
fromp a number specifying a starting page
shuffle 0|1, if 1 then the lines will be shuffled (this negatively affects performance)
queries a list of objects, one per active corpus (normally 1 item; more than 1 for aligned corpora)
queries[].qtype advanced (strongly advised, see introduction of the section)
queries[].corpname a corpus identifier
queries[].query A JSON-encoded CQL query (e.g. [word=\"their\"] [lemma=\"truth\"])
queries[].pcq_pos_neg applies for aligned corpora queries
queries[].include_empty true|false
queries[].default_attr a positional attribute applied to simplified CQL expressions (e.g. with the default attribute word, one can write "foo" instead of [word="foo"])
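
The request body described above can also be assembled programmatically. The following is a minimal Python sketch using only the standard library. The helper names build_conc_query and build_submit_request and the base URL are illustrative assumptions, not part of KonText; the field values mirror the example request. Authentication, which the CNC installation requires, is not covered here.

```python
import json
import urllib.request


def build_conc_query(corpname, cql, attrs=("word", "tag"), pagesize=40):
    """Return a /query_submit body with a single advanced (CQL) query."""
    return {
        "type": "concQueryArgs",  # always this constant
        "maincorp": corpname,
        "usesubcorp": None,
        "viewmode": "kwic",
        "pagesize": pagesize,
        "attrs": list(attrs),
        "attr_vmode": "visible-kwic",
        "base_viewattr": "word",
        "ctxattrs": [],
        "structs": [],
        "refs": [],
        "fromp": 0,
        "shuffle": 0,
        "queries": [
            {
                "qtype": "advanced",
                "corpname": corpname,
                "query": cql,
                "pcq_pos_neg": "pos",
                "include_empty": False,
                "default_attr": "word",
            }
        ],
        "text_types": {},
        "context": {
            "fc_lemword_wsize": [-5, 5],
            "fc_lemword": "",
            "fc_lemword_type": "all",
            "fc_pos_wsize": [-5, 5],
            "fc_pos": [],
            "fc_pos_type": "all",
        },
        "async": False,
    }


def build_submit_request(base_url, body):
    """Wrap the body in a POST request; json.dumps escapes the quotes in the CQL."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query_submit?format=json",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL; replace it with the address of an actual installation.
body = build_conc_query("syn2020", '[word="celou"] [lemma="pravda"]')
req = build_submit_request("https://kontext.example.org", body)
# Sending would be: urllib.request.urlopen(req)  (not executed in this sketch)
```

Writing the CQL as a plain Python string and leaving the escaping to json.dumps avoids hand-encoding the inner double quotes.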

Response:

  • HTTP status: 201 Created (if without errors)
  • content type: application/json
{
  "size": 110,
  "finished": true,
  "conc_args": {
    "maincorp": "syn2020",
    "viewmode": "kwic",
    "pagesize": 40,
    "attrs": "word,tag",
    "attr_vmode": "visible-kwic",
    "base_viewattr": "word",
    "structs": "text,p,g"
  },
  "query_overview": {
  },
  "Q": [ "~gUgICee6K2ka" ],
  "conc_persistence_op_id": "gUgICee6K2ka"
}

Parameters

name description
size size of the resulting concordance (in tokens)
finished true once the whole concordance has been calculated; if async is set to false then this is always true
conc_persistence_op_id a public ID of the resulting concordance
conc_args additional parameters affecting how the concordance is displayed

Displaying a concordance

To view a concordance, one must have a concordance ID (see the conc_persistence_op_id entry of the response in the previous section).

  • URL: /view
  • HTTP Method: GET

Parameters (in URL)

name description
q concordance persistence ID; the value must have a ~ prefix to distinguish fully stored queries from legacy/NoSkE ones
format for API use, json is required (without it, an HTML page is returned)
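
As an illustration of how the concordance ID feeds into this endpoint, the sketch below builds a /view URL from a parsed /query_submit response. The function name view_url and the base URL are assumptions made for the example; only the q and format parameters come from the table above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def view_url(base_url, submit_response):
    """Build a /view URL from a parsed /query_submit response."""
    conc_id = submit_response["conc_persistence_op_id"]
    # The q value needs the ~ prefix, which marks a fully stored query.
    params = {"q": "~" + conc_id, "format": "json"}
    return base_url.rstrip("/") + "/view?" + urlencode(params)


# Abbreviated response based on the example in the previous section.
response = {"size": 110, "finished": True, "conc_persistence_op_id": "gUgICee6K2ka"}

# Hypothetical base URL; replace it with the address of an actual installation.
url = view_url("https://kontext.example.org", response)

# Decoding the query string again shows the parameters the server would receive.
decoded = parse_qs(urlsplit(url).query)
```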

Response

(only a subset of the most important entries is shown below)

{
  "kwiclen": 2,
  "Lines": [
    {
      "Left": [],  // see the following section for the description
      "Kwic": [],  // ditto
      "Right": []  // ditto
    },
    {
      "Left": ["..."],
      "Kwic": ["..."],
      "Right": ["..."]
    }
  ],
  "conc_persistence_op_id": "RSiw4GIgW08s",
  "concsize": 115,
  "result_arf": 51.31,
  "result_relative_freq": 0.94
}

The format of Left, Kwic, Right entries is as follows:

[
  {
    "str": "setměním", 
    "class": "", 
    "tail_posattrs": ["setmění", "NNNS7-----A----"]
  }
  // other items/positions
]
attribute description
str value of the token (or structure - e.g. <p>)
class type of the value - empty string (normal token), col0 coll (for KWIC), strc (structure)
tail_posattrs additional positional attributes for the position (e.g. tag, lemma,...) - based on attrs, structattrs and attr_vmode
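
To show how these entries fit together, here is a small Python sketch that renders one item of Lines as plain text. The function names and the sample line are made up for illustration; only the keys Left, Kwic, Right, str, class and tail_posattrs come from the format described above.

```python
def tokens_to_text(tokens):
    """Join the str values of a Left, Kwic or Right list with spaces."""
    return " ".join(tok["str"] for tok in tokens)


def format_line(line):
    """Render one item of Lines as 'left <kwic> right'."""
    return "{} <{}> {}".format(
        tokens_to_text(line["Left"]),
        tokens_to_text(line["Kwic"]),
        tokens_to_text(line["Right"]),
    )


# Made-up sample line; a real response carries many more tokens per side.
line = {
    "Left": [{"str": "před", "class": "", "tail_posattrs": []}],
    "Kwic": [
        {
            "str": "setměním",
            "class": "col0 coll",
            "tail_posattrs": ["setmění", "NNNS7-----A----"],
        }
    ],
    "Right": [{"str": ".", "class": "", "tail_posattrs": []}],
}

text = format_line(line)
print(text)  # prints: před <setměním> .
```

Structure entries (class strc) are joined like ordinary tokens here; a client that needs to treat them differently can filter on the class value first.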

Frequency distribution for text types

  • URL: /freqtt
  • HTTP Method: GET

Parameters (in URL)

name description
q concordance persistence ID; the value must have a ~ prefix to distinguish fully stored queries from legacy/NoSkE ones
fttattr dot-separated structure and structural attribute (e.g. doc.first_published)
ftt_include_empty 0,1 - if 1 then also values with no occurrences will be returned
flimit 0,1,...,N - a minimum absolute frequency of a searched phenomenon
format json (otherwise, an HTML page is returned)

Response:

  • HTTP status: 200 OK (if without errors)
  • content type: application/json

Important entries:

name (path) description
Blocks Frequency results (array)
Blocks[i] Frequency results entry
Blocks[i].Head Array of respective columns (typically - 1) value of a respective structural attr., 2) absolute frequency, 3) ipm
Blocks[i].Items Array of individual lines
Blocks[i].Items[i].Word[0].n Value of a respective structural attribute
Blocks[i].Items[i].freq Absolute frequency
Blocks[i].Items[i].rel Instances per million (ipm)

Please note that the Blocks array contains only a single result set (=table of items). The array type is kept for backward compatibility.
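
The entries listed above can be flattened into simple rows as follows. This is a sketch under assumptions: the function name freq_rows and the sample response are invented for the example, and the Head entry is left empty because its exact structure is not described here.

```python
def freq_rows(response):
    """Return (value, absolute frequency, ipm) tuples from a frequency response."""
    # Blocks is an array for backward compatibility; it holds one result set.
    block = response["Blocks"][0]
    return [
        (item["Word"][0]["n"], item["freq"], item["rel"])
        for item in block["Items"]
    ]


# Made-up sample response containing only the entries used above.
sample = {
    "Blocks": [
        {
            "Head": [],
            "Items": [
                {"Word": [{"n": "2018"}], "freq": 42, "rel": 1.5},
                {"Word": [{"n": "2019"}], "freq": 17, "rel": 0.6},
            ],
        }
    ]
}

rows = freq_rows(sample)
for value, freq, rel in rows:
    print(value, freq, rel)
```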

Simple frequency distribution for positional attributes

  • URL: /freqs
  • HTTP Method: GET

URL Query arguments

name description
q concordance persistence ID (starts with ~)
fcrit freq. criterion (e.g. lemma/e 0~0>0; see the Sketch Engine documentation: https://www.sketchengine.eu/documentation/methods-documentation/)
freq_type tokens
format json
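
Because a criterion such as lemma/e 0~0>0 contains spaces and reserved characters, it has to be URL-encoded before it is sent. The sketch below leaves that to the Python standard library; the function name freqs_url, the base URL and the concordance ID are assumptions made for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def freqs_url(base_url, q, fcrit):
    """Build a /freqs URL; urlencode escapes the spaces, '/' and '>' in fcrit."""
    params = {"q": q, "fcrit": fcrit, "freq_type": "tokens", "format": "json"}
    return base_url.rstrip("/") + "/freqs?" + urlencode(params)


# Hypothetical base URL and concordance ID, for illustration only.
url = freqs_url("https://kontext.example.org", "~gUgICee6K2ka", "lemma/e 0~0>0")

# Decoding the query string again recovers the original criterion.
decoded = parse_qs(urlsplit(url).query)
```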

Response:

  • HTTP status: 200 OK (if without errors)
  • content type: application/json

Important entries:

name (path) description
Blocks Frequency results (array)
Blocks[i] Frequency results entry
Blocks[i].Head Array of respective columns (typically - 1) value of a respective positional attr., 2) absolute frequency, 3) ipm
Blocks[i].Items Array of individual lines
Blocks[i].Items[i].Word[0].n Value of a respective positional attribute
Blocks[i].Items[i].freq Absolute frequency
Blocks[i].Items[i].rel Instances per million (ipm)

Please note that the Blocks array contains only a single result set (=table of items). The array type is kept for backward compatibility.

Two-dimensional frequency distribution

  • URL: /freqct
  • HTTP Method: GET

URL Query arguments

name description
q concordance persistence ID; the value must have a ~ prefix to distinguish fully stored queries from legacy/NoSkE ones
ctattr1 attribute applied for the 1st dimension (both positional and structural attributes are supported)
ctattr2 the same as ctattr1 but for the 2nd dimension
ctfcrit1 the 1st dimension criterion
ctfcrit2 the 2nd dimension criterion
ctminfreq a minimum frequency of included entries; the units are defined by the ctminfreq_type parameter
ctminfreq_type abs - absolute freq., pabs - percentile of abs. freq., ipm - instances per million, pipm - percentile of ipm

Response:

  • HTTP status: 200 OK (if without errors)
  • content type: application/json
name (path) description
freq_type "2-attribute" (this is mostly used by the client application)
attr1 matches the ctattr1 given in the request
attr2 matches the ctattr2 given in the request
data.data[i][0] matching 1st dimension value
data.data[i][1] matching 2nd dimension value
data.data[i][2] absolute frequency
data.data[i][3] base set size for i.p.m. calculation *️⃣

*️⃣ More information about base set size:

  1. in case of a relationship between two structural attributes, the value is always 1000000 which should be interpreted as "not applicable"
  2. in case of two positional attributes, the base set size equals the size of a respective concordance
  3. in case of one positional and one structural attribute, the base set size is a number of tokens in a subcorpus specified by a respective structural attribute value (i.e. not affected by a respective concordance)
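
The three cases above can be handled in client code roughly as follows. This sketch assumes the usual definition ipm = absolute frequency / base set size * 1,000,000, which this page does not spell out. The function names and the sample rows are invented for the example.

```python
def ipm(abs_freq, base_size):
    """Instances per million, assuming ipm = abs_freq / base_size * 1,000,000."""
    # Multiply first so that integer inputs stay exact until the final division.
    return abs_freq * 1_000_000 / base_size


def rows_with_ipm(rows, both_structural=False):
    """Convert data.data rows [value1, value2, abs_freq, base_size] to dicts.

    When both attributes are structural, the reported base size (1000000)
    means "not applicable", so ipm is set to None instead of being computed.
    """
    result = []
    for value1, value2, abs_freq, base_size in rows:
        result.append(
            {
                "value1": value1,
                "value2": value2,
                "abs": abs_freq,
                "ipm": None if both_structural else ipm(abs_freq, base_size),
            }
        )
    return result


# Made-up rows for illustration only.
rows = [
    ["pravda", "fiction", 120, 2_000_000],
    ["pravda", "news", 30, 1_500_000],
]
table = rows_with_ipm(rows)
```

The caller has to state whether both attributes are structural, because the value 1000000 alone cannot be told apart from a genuine base set size.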

Example (you must be logged in to KonText):

https://www.korpus.cz/kontext/freqct?q=~vMSCwEgqqSOu&ctfcrit1=0<0&ctfcrit2=0&ctattr1=lemma_lc&ctattr2=doc.txtype_group&ctminfreq=80&ctminfreq_type=pabs&format=json
