HTTP API
Tomas Machalek edited this page Sep 23, 2022 · 26 revisions
- Creating a concordance query
- Displaying a concordance
- Frequency distribution for text types
- Frequency distribution for positional attributes
- Two-dimensional frequency distribution
Creating a concordance query
While it is possible to use both simple and advanced query types, it is strongly advised to use the advanced variant when dealing with the API, as the query encoding is much easier in that case.
- URL: /query_submit?format=json
- HTTP Method: POST
- content type: application/json
Request body:
{
"type": "concQueryArgs",
"maincorp": "syn2020",
"usesubcorp": null,
"viewmode": "kwic",
"pagesize": 40,
"attrs": ["word","tag"],
"attr_vmode": "visible-kwic",
"base_viewattr": "word",
"ctxattrs": [],
"structs": ["text","p","g"],
"refs": [],
"fromp": 0,
"shuffle": 0,
"queries": [
{
"qtype": "advanced",
"corpname": "syn2020",
"query": "[word=\"celou\"] [lemma=\"pravda\"]",
"pcq_pos_neg": "pos",
"include_empty": false,
"default_attr":"word"
}
],
"text_types": {},
"context":
{
"fc_lemword_wsize": [-5, 5],
"fc_lemword": "",
"fc_lemword_type": "all",
"fc_pos_wsize": [-5, 5],
"fc_pos": [],
"fc_pos_type": "all"
},
"async": false
}
name | description |
---|---|
type | this is always the constant concQueryArgs |
usesubcorp | null or the name of the user's subcorpus |
viewmode | kwic, sen or align (align works only for parallel corpora) |
pagesize | a positive number specifying size of the resulting page |
attrs | a list of positional attributes we want to retrieve |
attr_vmode | visible-all, visible-kwic, visible-multiline or mouseover |
base_viewattr | the main attribute the flow of text will be based on |
ctxattrs | TODO |
structs | a list (possibly empty) of structural attributes to be shown |
refs | a list (possibly empty) of additional metadata attached to each row |
fromp | a number specifying a starting page |
shuffle | 0 or 1 |
queries | a list of objects, one for each active corpus (normally 1 item; more than 1 for aligned corpora) |
queries[].qtype | advanced |
queries[].corpname | name of the corpus the query applies to (e.g. syn2020) |
queries[].query | a CQL query (e.g. [word=\"their\"] [lemma=\"truth\"]) |
queries[].pcq_pos_neg | applies to aligned corpora queries |
queries[].include_empty | true or false |
queries[].default_attr | a positional attribute applied to simplified CQL expressions |
Response:
- HTTP status: 201 Created (if without errors)
- content type: application/json
{
"size": 110,
"finished": true,
"conc_args": {
"maincorp": "syn2020",
"viewmode": "kwic",
"pagesize": 40,
"attrs": "word,tag",
"attr_vmode": "visible-kwic",
"base_viewattr": "word",
"structs": "text,p,g"
},
"query_overview": {
},
"Q": [ "~gUgICee6K2ka" ],
"conc_persistence_op_id": "gUgICee6K2ka"
}
name | description |
---|---|
size | size of the resulting concordance (in tokens) |
finished | if async is set to false then this is always true |
conc_persistence_op_id | a public ID of the resulting concordance |
conc_args | additional parameters affecting how the concordance is displayed |
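The request and response described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the base URL `https://kontext.example.org` and the helper names (`build_query_body`, `build_submit_request`, `concordance_id`) are hypothetical, and authentication is left out because it depends on the installation.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the address of your KonText installation.
BASE_URL = "https://kontext.example.org"


def build_query_body(corpname, cql, attrs=("word", "tag"), pagesize=40):
    """Return a request body mirroring the example shown above."""
    return {
        "type": "concQueryArgs",
        "maincorp": corpname,
        "usesubcorp": None,
        "viewmode": "kwic",
        "pagesize": pagesize,
        "attrs": list(attrs),
        "attr_vmode": "visible-kwic",
        "base_viewattr": "word",
        "ctxattrs": [],
        "structs": ["text", "p", "g"],
        "refs": [],
        "fromp": 0,
        "shuffle": 0,
        "queries": [{
            "qtype": "advanced",
            "corpname": corpname,
            "query": cql,
            "pcq_pos_neg": "pos",
            "include_empty": False,
            "default_attr": "word",
        }],
        "text_types": {},
        "context": {
            "fc_lemword_wsize": [-5, 5],
            "fc_lemword": "",
            "fc_lemword_type": "all",
            "fc_pos_wsize": [-5, 5],
            "fc_pos": [],
            "fc_pos_type": "all",
        },
        "async": False,
    }


def build_submit_request(body, base_url=BASE_URL):
    """Prepare (but do not send) the POST request for /query_submit."""
    return urllib.request.Request(
        base_url + "/query_submit?format=json",
        data=json.dumps(body).encode("utf-8"),  # json.dumps escapes the CQL quotes
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def concordance_id(response_json):
    """Extract the persistence ID needed by the follow-up calls."""
    return response_json["conc_persistence_op_id"]


body = build_query_body("syn2020", '[word="celou"] [lemma="pravda"]')
req = build_submit_request(body)
# Sending the request needs a reachable server and a valid session:
# with urllib.request.urlopen(req) as resp:
#     conc_id = concordance_id(json.load(resp))
```

Writing the CQL expression as a plain Python string and letting `json.dumps` do the escaping avoids hand-written `\"` sequences in the request body.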
Displaying a concordance
- URL: /view
- HTTP Method: GET
name | description |
---|---|
q | concordance persistence ID; the value must have a ~ prefix to distinguish fully stored queries from legacy/NoSkE ones |
format | for API use, json is required (without it, an HTML page is returned) |
(only a subset of the most important entries is shown below)
{
"kwiclen": 2,
"Lines": [
{
"Left": [], // see the following section for the description
"Kwic": [], // ditto
"Right": [] // ditto
},
{
"Left": ["..."],
"Kwic": ["..."],
"Right": ["..."]
}
],
"conc_persistence_op_id": "RSiw4GIgW08s",
"concsize": 115,
"result_arf": 51.31,
"result_relative_freq": 0.94
}
The format of the Left, Kwic and Right entries is as follows:
[
{
"str": "setměním",
"class": "",
"tail_posattrs": ["setmění", "NNNS7-----A----"]
}
// other items/positions
]
attribute | description |
---|---|
str | value of the token (or structure - e.g. <p> ) |
class | type of the value - empty string (normal token), col0 coll (for KWIC), strc (structure) |
tail_posattrs | additional positional attributes for the position (e.g. tag, lemma,...) - based on attrs and attr_vmode |
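As an illustration of the /view call and the line format described above, the following Python sketch builds the request URL and turns one entry of Lines into plain text. The base URL, the helper names and the sample line are invented for the example; only the parameter names (q, format) and the entry keys (Left, Kwic, Right, str, class) come from the description above.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with the address of your KonText installation.
BASE_URL = "https://kontext.example.org"


def view_url(conc_id, base_url=BASE_URL):
    """Build a /view URL; the persistence ID is sent with the ~ prefix."""
    params = {"q": "~" + conc_id, "format": "json"}
    # safe="~" keeps the prefix unescaped regardless of the Python version
    return base_url + "/view?" + urlencode(params, safe="~")


def tokens_to_text(items):
    """Join token values, skipping structures (class "strc") such as <p>."""
    return " ".join(it["str"] for it in items if it.get("class") != "strc")


def render_line(line):
    """Render one entry of Lines as: left [kwic] right."""
    left = tokens_to_text(line["Left"])
    kwic = tokens_to_text(line["Kwic"])
    right = tokens_to_text(line["Right"])
    return f"{left} [{kwic}] {right}"


# Invented sample line, shaped like the entries described above.
sample_line = {
    "Left": [
        {"str": "<p>", "class": "strc"},
        {"str": "říct", "class": ""},
    ],
    "Kwic": [
        {"str": "celou", "class": "col0 coll"},
        {"str": "pravdu", "class": "col0 coll"},
    ],
    "Right": [{"str": ".", "class": ""}],
}

print(view_url("RSiw4GIgW08s"))
print(render_line(sample_line))
```

Filtering on the class value is one simple way to separate tokens from structural markup; a client that needs the structures can drop that condition.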
Frequency distribution for text types
- URL: /freqtt
- HTTP Method: GET
name | description |
---|---|
q | concordance persistence ID; the value must have a ~ prefix to distinguish fully stored queries from legacy/NoSkE ones |
fttattr | dot-separated structure and structural attribute (e.g. doc.first_published) |
ftt_include_empty | 0 or 1; if 1, values with no occurrences are returned as well |
flimit | 0, 1, ..., N; a minimum absolute frequency of a searched phenomenon |
format | json (otherwise, an HTML page is returned) |
- HTTP status: 200 OK (if without errors)
- content type: application/json
Important entries:
name (path) | description |
---|---|
Blocks | Frequency results (array) |
Blocks[i] | Frequency results entry |
Blocks[i].Head | Array of the respective columns (typically: 1) value of the respective structural attribute, 2) absolute frequency, 3) ipm) |
Blocks[i].Items | Array of individual lines |
Blocks[i].Items[j].Word[0].n | Value of the respective structural attribute |
Blocks[i].Items[j].freq | Absolute frequency |
Blocks[i].Items[j].rel | Instances per million (ipm) |
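The following Python sketch shows how the /freqtt parameters and the response entries listed above fit together. The base URL, the helper names and the sample response are invented for illustration; the sample only mimics the documented paths (Blocks, Items, Word[0].n, freq, rel) and does not reproduce real data.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with the address of your KonText installation.
BASE_URL = "https://kontext.example.org"


def freqtt_url(conc_id, fttattr, flimit=1, include_empty=False, base_url=BASE_URL):
    """Build a /freqtt URL from the parameters documented above."""
    params = {
        "q": "~" + conc_id,
        "fttattr": fttattr,
        "ftt_include_empty": int(include_empty),  # the API expects 0 or 1
        "flimit": flimit,
        "format": "json",
    }
    return base_url + "/freqtt?" + urlencode(params, safe="~")


def freq_rows(response_json):
    """Flatten all Blocks into (value, absolute frequency, ipm) tuples."""
    rows = []
    for block in response_json["Blocks"]:
        for item in block["Items"]:
            rows.append((item["Word"][0]["n"], item["freq"], item["rel"]))
    return rows


# Invented response fragment containing only the documented entries.
sample_response = {
    "Blocks": [
        {
            "Items": [
                {"Word": [{"n": "2015"}], "freq": 12, "rel": 0.8},
                {"Word": [{"n": "2016"}], "freq": 7, "rel": 0.5},
            ]
        }
    ]
}

print(freqtt_url("RSiw4GIgW08s", "doc.first_published"))
for value, freq, ipm in freq_rows(sample_response):
    print(value, freq, ipm)
```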
Frequency distribution for positional attributes
🚧
Two-dimensional frequency distribution
- URL: /freqct
- HTTP Method: GET
name | description |
---|---|
q | concordance persistence ID; the value must have a ~ prefix to distinguish fully stored queries from legacy/NoSkE ones |
ctattr1 | attribute applied for the 1st dimension (both positional and structural attributes are supported) |
ctattr2 | the same as ctattr1 but for the 2nd dimension |
ctfcrit1 | the 1st dimension criterion |
ctfcrit2 | the 2nd dimension criterion |
ctminfreq | a minimum frequency of included entries; the units are defined by the ctminfreq_type parameter |
ctminfreq_type | abs - absolute freq., pabs - percentile of abs. freq., ipm - instances per million, pipm - percentile of ipm |
- HTTP status: 200 OK (if without errors)
- content type: application/json
name (path) | description |
---|---|
freq_type | "2-attribute" (this is mostly used by the client application) |
attr1 | matches the ctattr1 given in the request |
attr2 | matches the ctattr2 given in the request |
data.data[i][0] | matching 1st dimension value |
data.data[i][1] | matching 2nd dimension value |
data.data[i][2] | absolute frequency |
data.data[i][3] | base set size for ipm calculation *️⃣ |
*️⃣ More information about base set size:
- in case of a relationship between two structural attributes, the value is always 1000000, which should be interpreted as "not applicable"
- in case of two positional attributes, the base set size equals the size of the respective concordance
- in case of one positional and one structural attribute, the base set size is the number of tokens in a subcorpus specified by the respective structural attribute value (i.e. not affected by the respective concordance)
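The base set size rules above can be applied on the client side as in the following Python sketch. It rests on two assumptions that are not stated in this page: that ipm is derived as absolute frequency × 1000000 / base set size (the usual definition of instances per million), and that the caller knows whether both attributes are structural, which is passed in through the hypothetical `both_structural` flag. The sample data is invented.

```python
def crosstab_rows(response_json, both_structural=False):
    """Turn data.data rows into dicts with a client-side ipm value.

    Assumption: ipm = absolute frequency * 1,000,000 / base set size.
    For two structural attributes the base (always 1000000) means
    "not applicable", so ipm is reported as None.
    """
    rows = []
    for value1, value2, freq, base in response_json["data"]["data"]:
        if both_structural or base == 0:
            ipm = None
        else:
            ipm = freq * 1_000_000 / base
        rows.append({
            "attr1_value": value1,
            "attr2_value": value2,
            "abs": freq,
            "ipm": ipm,
        })
    return rows


# Invented response fragment: one positional and one structural attribute.
sample_response = {
    "freq_type": "2-attribute",
    "attr1": "lemma",
    "attr2": "doc.first_published",
    "data": {
        "data": [
            ["pravda", "2015", 5, 2_000_000],
            ["pravda", "2016", 3, 1_500_000],
        ]
    },
}

for row in crosstab_rows(sample_response):
    print(row)
```

Passing the flag explicitly avoids guessing from the value 1000000 alone, since a real base set could happen to have that size.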
Example (you must be logged-in to KonText):