HTTP API
- Introduction
- Creating a concordance query
- Displaying a concordance
- Frequency distribution for text types
- Frequency distribution for positional attributes
- Two-dimensional frequency distribution
This text contains both general API documentation for any KonText installation and notes specific to the installation run by the Institute of the Czech National Corpus (CNC). The CNC installation has a few differences:
- all the API actions/endpoints require the client to be registered and logged in
- there is no need to add the URL query parameter `format=json` to get a JSON response from actions that also provide an HTML output
- CNC KonText archives all the queries, so it is always possible to return to an old result set (for older queries, KonText must recalculate the data first, so access to the result is not instantaneous in such a case)
Creating a concordance query
While it is possible to use both simple and advanced query types, it is strongly advised to use the advanced query variant when dealing with the API, as the query is much easier to encode. The simple variant is meant to be simple for web interface users, which comes at the cost of the query's complex internal structure and evaluation.
- URL: `/query_submit?format=json`
- HTTP Method: `POST`
- content type: `application/json`
Request body:
```json
{
  "type": "concQueryArgs",
  "maincorp": "syn2020",
  "usesubcorp": null,
  "viewmode": "kwic",
  "pagesize": 40,
  "attrs": ["word", "tag"],
  "attr_vmode": "visible-kwic",
  "base_viewattr": "word",
  "ctxattrs": [],
  "structs": ["text", "p", "g"],
  "refs": [],
  "fromp": 0,
  "shuffle": 0,
  "queries": [
    {
      "qtype": "advanced",
      "corpname": "syn2020",
      "query": "[word=\"celou\"] [lemma=\"pravda\"]",
      "pcq_pos_neg": "pos",
      "include_empty": false,
      "default_attr": "word"
    }
  ],
  "text_types": {},
  "context": {
    "fc_lemword_wsize": [-5, 5],
    "fc_lemword": "",
    "fc_lemword_type": "all",
    "fc_pos_wsize": [-5, 5],
    "fc_pos": [],
    "fc_pos_type": "all"
  },
  "async": false
}
```
name | description |
---|---|
type | this is always the constant `concQueryArgs` |
usesubcorp | `null` or the name of the user's subcorpus |
viewmode | `kwic`, `sen` or `align` (`align` works only for parallel corpora) |
pagesize | a positive number specifying the size of the resulting page |
attrs | a list of the KWIC's positional attributes we want to retrieve |
ctxattrs | a list of non-KWIC positional attributes we want to retrieve |
attr_vmode | `visible-all`, `visible-kwic`, `visible-multiline` or `mouseover`; this is useful mostly for GUI clients |
base_viewattr | the main attribute the flow of text will be based on |
structs | a list (possibly empty) of structural attributes to be shown |
refs | a list (possibly empty) of additional metadata attached to each row |
fromp | a number specifying the starting page |
shuffle | `0` or `1`; if `1`, the lines will be shuffled (this negatively affects performance) |
queries | a list of objects, one for each active corpus (normally 1 item; more than 1 for aligned corpora) |
queries[].qtype | `advanced` (strongly advised, see the introduction of this section) |
queries[].corpname | a corpus identifier |
queries[].query | a JSON-encoded CQL query (e.g. `[word=\"their\"] [lemma=\"truth\"]`) |
queries[].pcq_pos_neg | applies to aligned corpora queries |
queries[].include_empty | `true` or `false` |
queries[].default_attr | a positional attribute applied to simplified CQL expressions (e.g. with the default attribute `word` one can write `"foo"` instead of `[word="foo"]`) |
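As an illustration of the request described above, the following Python sketch builds the request body for a single advanced query and sends it with the standard library only. The base URL `https://kontext.example.org` and the function names `build_conc_query` and `submit_query` are assumptions made for this example, not part of KonText. Authentication, which the CNC installation requires, is not handled here.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with the address of a real KonText installation.
BASE_URL = "https://kontext.example.org"


def build_conc_query(corpname, cql, pagesize=40, attrs=("word", "tag")):
    """Return a /query_submit request body containing one advanced query."""
    return {
        "type": "concQueryArgs",
        "maincorp": corpname,
        "usesubcorp": None,
        "viewmode": "kwic",
        "pagesize": pagesize,
        "attrs": list(attrs),
        "attr_vmode": "visible-kwic",
        "base_viewattr": "word",
        "ctxattrs": [],
        "structs": ["text", "p", "g"],
        "refs": [],
        "fromp": 0,
        "shuffle": 0,
        "queries": [
            {
                "qtype": "advanced",
                "corpname": corpname,
                "query": cql,  # plain CQL; json.dumps escapes the quotes
                "pcq_pos_neg": "pos",
                "include_empty": False,
                "default_attr": "word",
            }
        ],
        "text_types": {},
        "context": {
            "fc_lemword_wsize": [-5, 5],
            "fc_lemword": "",
            "fc_lemword_type": "all",
            "fc_pos_wsize": [-5, 5],
            "fc_pos": [],
            "fc_pos_type": "all",
        },
        "async": False,
    }


def submit_query(body, base_url=BASE_URL):
    """POST the body to /query_submit and return the decoded JSON response."""
    req = urllib.request.Request(
        base_url + "/query_submit?format=json",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Writing the CQL as an ordinary Python string and leaving the escaping to `json.dumps` avoids hand-written `\"` sequences, e.g. `build_conc_query("syn2020", '[word="celou"] [lemma="pravda"]')`.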
Response:
- HTTP status: `201 Created` (if without errors)
- content type: `application/json`
```json
{
  "size": 110,
  "finished": true,
  "conc_args": {
    "maincorp": "syn2020",
    "viewmode": "kwic",
    "pagesize": 40,
    "attrs": "word,tag",
    "attr_vmode": "visible-kwic",
    "base_viewattr": "word",
    "structs": "text,p,g"
  },
  "query_overview": {
  },
  "Q": [ "~gUgICee6K2ka" ],
  "conc_persistence_op_id": "gUgICee6K2ka"
}
```
name | description |
---|---|
size | size of the resulting concordance (in tokens) |
finished | if `async` is set to `false` then this is always `true` |
conc_persistence_op_id | a public ID of the resulting concordance |
conc_args | additional parameters affecting how the concordance is displayed |
Displaying a concordance
To view a concordance, one must have a concordance ID (see the `conc_persistence_op_id` entry of the response in the previous section).
- URL: `/view`
- HTTP Method: `GET`
name | description |
---|---|
q | concordance persistence ID; the value must have a `~` prefix to distinguish fully stored queries from legacy/NoSkE ones |
format | for API use, `json` is required (without it, an HTML page is returned) |
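The link between the two actions can be sketched as follows: the `conc_persistence_op_id` returned by `/query_submit` gets a `~` prefix and becomes the `q` parameter of `/view`. The base URL and the function name `view_url_from_submit` are hypothetical and serve only this example.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace it with the address of a real KonText installation.
BASE_URL = "https://kontext.example.org"


def view_url_from_submit(submit_response, base_url=BASE_URL):
    """Build a /view URL from a decoded /query_submit response."""
    # The stored concordance ID must carry the "~" prefix.
    q = "~" + submit_response["conc_persistence_op_id"]
    # format=json asks for JSON instead of an HTML page.
    return base_url + "/view?" + urlencode({"q": q, "format": "json"})
```

`urlencode` takes care of percent-encoding, so the ID can be passed through unchanged.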
(only a subset of the most important entries is shown below)
```json
{
  "kwiclen": 2,
  "Lines": [
    {
      "Left": [],  // see the following section for the description
      "Kwic": [],  // ditto
      "Right": []  // ditto
    },
    {
      "Left": ["..."],
      "Kwic": ["..."],
      "Right": ["..."]
    }
  ],
  "conc_persistence_op_id": "RSiw4GIgW08s",
  "concsize": 115,
  "result_arf": 51.31,
  "result_relative_freq": 0.94
}
```
The format of the `Left`, `Kwic` and `Right` entries is as follows:
```json
[
  {
    "str": "setměním",
    "class": "",
    "tail_posattrs": ["setmění", "NNNS7-----A----"]
  }
  // other items/positions
]
```
attribute | description |
---|---|
str | value of the token (or structure - e.g. <p> ) |
class | type of the value - empty string (normal token), col0 coll (for KWIC), strc (structure) |
tail_posattrs | additional positional attributes for the position (e.g. tag, lemma, ...), based on `attrs`, `structattrs` and `attr_vmode` |
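A client will often want to turn one entry of `Lines` into plain text. The sketch below does this under the assumption that structures (class `strc`) should be left out of the output; the function names `plain_tokens` and `format_line` are made up for this example.

```python
def plain_tokens(items, skip_structures=True):
    """Return the `str` values of a Left/Kwic/Right list.

    Items with class "strc" (structures such as <p>) are dropped
    when skip_structures is True.
    """
    return [
        item["str"]
        for item in items
        if not (skip_structures and item.get("class") == "strc")
    ]


def format_line(line):
    """Render one concordance line as 'left [kwic] right'."""
    left = " ".join(plain_tokens(line["Left"]))
    kwic = " ".join(plain_tokens(line["Kwic"]))
    right = " ".join(plain_tokens(line["Right"]))
    return f"{left} [{kwic}] {right}".strip()
```

The additional attributes in `tail_posattrs` are ignored here; a client that needs tags or lemmas can read them from the same items.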
Frequency distribution for text types
- URL: `/freqtt`
- HTTP Method: `GET`
name | description |
---|---|
q | concordance persistence ID; the value must have a `~` prefix to distinguish fully stored queries from legacy/NoSkE ones |
fttattr | dot-separated structure and structural attribute (e.g. `doc.first_published`) |
ftt_include_empty | `0` or `1`; if `1`, values with no occurrences will be returned, too |
flimit | `0`, `1`, ..., `N`; a minimum absolute frequency of a searched phenomenon |
format | `json` (otherwise, an HTML page is returned) |
- HTTP status: `200 OK` (if without errors)
- content type: `application/json`
Important entries:
name (path) | description |
---|---|
Blocks | Frequency results (array) |
Blocks[i] | Frequency results entry |
Blocks[i].Head | Array of respective columns (typically: 1) value of a respective structural attr., 2) absolute frequency, 3) ipm) |
Blocks[i].Items | Array of individual lines |
Blocks[i].Items[j].Word[0].n | Value of a respective structural attribute |
Blocks[i].Items[j].freq | Absolute frequency |
Blocks[i].Items[j].rel | Instances per million (ipm) |
Please note that the `Blocks` array contains only a single result set (= table of items). The array type is kept for backward compatibility.
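Reading the result table out of such a response can be sketched like this. The function name `freq_rows` is hypothetical, and the sample values in the usage note are invented purely to show the shape of the data.

```python
def freq_rows(response):
    """Return (value, absolute frequency, ipm) tuples from a frequency response."""
    # Blocks holds a single result set, so only the first entry is read.
    block = response["Blocks"][0]
    return [
        (item["Word"][0]["n"], item["freq"], item["rel"])
        for item in block["Items"]
    ]
```

For example, a response whose only item is `{"Word": [{"n": "2015"}], "freq": 12, "rel": 3.5}` yields a list with the single tuple `("2015", 12, 3.5)`.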
Frequency distribution for positional attributes
- URL: `/freqs`
- HTTP Method: `GET`
name | description |
---|---|
q | concordance persistence ID (starts with `~`) |
fcrit | frequency criterion (e.g. `lemma/e 0~0>0`, see the [Sketch Engine documentation](https://www.sketchengine.eu/documentation/methods-documentation/)) |
freq_type | tokens |
format | json |
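Because a frequency criterion such as `lemma/e 0~0>0` contains spaces and other reserved characters, it has to be percent-encoded before it is placed in the URL. The following sketch leaves that to `urllib.parse.urlencode`; the base URL and the function name `freqs_url` are assumptions made for this example.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace it with the address of a real KonText installation.
BASE_URL = "https://kontext.example.org"


def freqs_url(conc_id, fcrit, base_url=BASE_URL):
    """Build a /freqs URL for a stored concordance and a frequency criterion."""
    # Add the "~" prefix only if the caller has not already done so.
    q = conc_id if conc_id.startswith("~") else "~" + conc_id
    params = {
        "q": q,
        "fcrit": fcrit,  # e.g. "lemma/e 0~0>0"; urlencode escapes it
        "freq_type": "tokens",
        "format": "json",
    }
    return base_url + "/freqs?" + urlencode(params)
```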
- HTTP status: `200 OK` (if without errors)
- content type: `application/json`
Important entries:
name (path) | description |
---|---|
Blocks | Frequency results (array) |
Blocks[i] | Frequency results entry |
Blocks[i].Head | Array of respective columns (typically: 1) value of the respective attribute, 2) absolute frequency, 3) ipm) |
Blocks[i].Items | Array of individual lines |
Blocks[i].Items[j].Word[0].n | Value of the respective attribute |
Blocks[i].Items[j].freq | Absolute frequency |
Blocks[i].Items[j].rel | Instances per million (ipm) |
Please note that the `Blocks` array contains only a single result set (= table of items). The array type is kept for backward compatibility.
Two-dimensional frequency distribution
- URL: `/freqct`
- HTTP Method: `GET`
name | description |
---|---|
q | concordance persistence ID; the value must have a `~` prefix to distinguish fully stored queries from legacy/NoSkE ones |
ctattr1 | attribute applied for the 1st dimension (both positional and structural attributes are supported) |
ctattr2 | the same as `ctattr1` but for the 2nd dimension |
ctfcrit1 | the 1st dimension criterion |
ctfcrit2 | the 2nd dimension criterion |
ctminfreq | a minimum frequency of included entries; the units are defined by the `ctminfreq_type` parameter |
ctminfreq_type | `abs` - absolute freq., `pabs` - percentile of abs. freq., `ipm` - instances per million, `pipm` - percentile of ipm |
- HTTP status: `200 OK` (if without errors)
- content type: `application/json`
name (path) | description |
---|---|
freq_type | `"2-attribute"` (this is mostly used by the client application) |
attr1 | matches the `ctattr1` given in the request |
attr2 | matches the `ctattr2` given in the request |
data.data[i][0] | matching 1st dimension value |
data.data[i][1] | matching 2nd dimension value |
data.data[i][2] | absolute frequency |
data.data[i][3] | base set size for i.p.m. calculation *️⃣ |
*️⃣ More information about base set size:
- in case of a relationship between two structural attributes, the value is always `1000000`, which should be interpreted as "not applicable"
- in case of two positional attributes, the base set size equals the size of a respective concordance
- in case of one positional and one structural attribute, the base set size is a number of tokens in a subcorpus specified by a respective structural attribute value (i.e. not affected by a respective concordance)
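The base set size can be combined with the absolute frequency to obtain a relative frequency. The sketch below assumes the usual definition of i.p.m. (absolute frequency divided by the base set size, multiplied by one million) and expects the caller to say whether both attributes are structural, in which case no value is computed. The function name `rows_with_ipm` is hypothetical.

```python
def rows_with_ipm(data_rows, both_structural=False):
    """Attach an i.p.m. value to each row of data.data.

    Each input row is [value1, value2, absolute frequency, base set size].
    For two structural attributes the base set size is "not applicable",
    so None is returned instead of a number.
    """
    result = []
    for row in data_rows:
        abs_freq, base_size = row[2], row[3]
        if both_structural or base_size == 0:
            ipm = None
        else:
            # Assumed definition: occurrences per one million tokens of the base set.
            ipm = abs_freq * 1_000_000 / base_size
        result.append((row[0], row[1], abs_freq, ipm))
    return result
```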
Example (you must be logged-in to KonText):