jq Language Description
- Purpose of this Page
- Notation
- The jq Language
- jq Program Structure and Basic Syntax
- Data Types
- Array and Object Accessors and Iterators
- Lexical Symbol Bindings: Function Definitions and Data Symbol Bindings
- Data Flow
- Generators and Backtracking
- Reductions
- Path Expressions
- Assignments
- Built-in Functions
- Special Forms
- List of Built-in Functions
- Side-Effects
- Keywords
The jq documentation is written in a style that hides a lot of important detail because the hope is that the language feels intuitive. Some users need documentation that includes such details and more — this page is for them. Such users should also read the jq Advanced Topics wiki page.
This page, too, can hopefully form the basis for a formal specification of the jq language.
In addition to jq syntax itself, this page sometimes denotes a function together with its number of arguments as symbol/N: foo/0 (function named “foo” with no arguments), bar/2 (function named “bar” with two arguments), and so on. E.g., foo(a; b) is a call to foo/2, but only the former is syntactically a jq expression, while the latter is used only in documentation.
When we refer to “jq programs” we mean programs written in the jq language.
When we refer to the jq(1) command-line executable, we call it the “jq command-line processor” or “jq(1)”; the “(1)” refers to the operating system manual section for commands. The jq command-line processor compiles and executes jq programs, but the way the processor and the jq program interact with the world depends on which command-line options are used. Those options are not covered here; see the jq documentation for details.
jq is a dynamically-typed functional programming language with second-class higher-order functions of dynamic extent, pervasive backtracking, generalized assignments, and pervasive immutability.
All values are immutable (that is, they are copy-on-write), but the language makes it seem as though they are mutable.
Every expression is a closure, or “thunk”, if you wish, that gets applied to a singular input value.
Every expression can be a generator that can produce zero, one, or more outputs. Generator expressions are allowed in every context, and every context consumes all generator outputs with few exceptions (e.g., first(generator) will only produce the first output of the generator expression).
Generators can run out of outputs, in which case program flow will backtrack to the nearest preceding generator that is still active, resuming it to produce the next result and resume forward program flow. This is what we mean by "pervasive backtracking" -- it's everywhere. Literal values are expressions that produce exactly one value (except for strings which interpolate expressions that generate more than one output).
Functions can be defined by users. Functions also get applied to a singular input value, but they may also get additional argument expressions. A function's argument expressions are closures — or thunks if you wish. Functions, incidentally, like expressions, are all closures in jq, as they close over their lexically-visible environments.
There are no dynamic "variables" or bindings in jq, not for values nor for functions.
The output(s) of an expression can be passed to another using the | operator. expressionA | expressionB applies expressionA to some input, then expressionB to each of the outputs of expressionA. Note that this yields a kind of cartesian product of the two expressions, that is: the outputs of expressionA | expressionB will be all the outputs generated by expressionB as applied to every output generated by expressionA.
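One way to picture the | operator is with Python generators, where every jq expression is modelled as a function from one input value to a stream of outputs. This is only a sketch of the semantics described above, not how jq is implemented; the names pipe, expression_a, and expression_b are made up for illustration.

```python
def pipe(expr_a, expr_b):
    """Model of `expr_a | expr_b`: apply expr_b to every output of expr_a."""
    def piped(value):
        for a_out in expr_a(value):      # each output of the left side...
            yield from expr_b(a_out)     # ...becomes the input of the right side
    return piped


# Hypothetical expressions: each takes one input and yields zero or more outputs.
def expression_a(value):                 # like `., . + 10`
    yield value
    yield value + 10


def expression_b(value):                 # like `. * 2, . * 3`
    yield value * 2
    yield value * 3


outputs = list(pipe(expression_a, expression_b)(1))
print(outputs)  # [2, 3, 22, 33]
```

Every output of the right-hand side appears once per output of the left-hand side, which is the "product" behaviour noted above.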
As far as the jq language is concerned, a complete jq program is applied only to one singular input value. The jq command-line processor by default applies the jq program to each input to the command-line processor (however many there may be) by restarting the jq program after each input. No state is kept by the jq processor between successive applications of a jq program to different inputs. This behavior of the processor can be controlled by command-line options like -n and -s; these are not covered here, but in the manual.
The input to an expression can be referred to explicitly as . (this is just an identity operator). This is useful because jq lacks automatic currying, thus an expression that adds 1 to its input reads like so: . + 1 (or 1 + .). The + operator is syntactic sugar, and . + 1 desugars to _plus(.; 1) where _plus/2 is a special built-in function; similarly for other infix operators and the prefix unary operator -.
The only data types available to jq programs are JSON’s types: scalars (null, boolean true and false, numbers, and strings) and non-scalars (objects and arrays). jq functions have no type information beyond their arity, and are not first-class values.
Expressions (including functions) are not a value type, even though expressions can be passed to functions as arguments. A function’s expression arguments themselves can never be saved as values in arrays, or objects, or as scalars, and thus they cannot be output. The outputs of expressions, on the other hand, can only be values. Thus in def muladd(m; a): (. * m) + a; the m and a symbols are as-if functions that are applied, so muladd(5; 1) does the obvious thing, but muladd(.+1; ./2) might be less obvious: it is akin to writing (. * (. + 1)) + (./2). The fact that function arguments are thunks, plus jq’s generator/backtracking semantics, recursion, tail-call optimization, path expressions, and reduction operators, allows jq functions to implement powerful abstractions and flow control constructs almost as if jq functions were macros.
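The muladd example can be mimicked in Python by passing functions of the current input instead of values. This is a rough model only: it assumes each argument thunk produces exactly one output, which real jq does not require.

```python
def muladd(m, a):
    """Model of `def muladd(m; a): (. * m) + a;`.

    m and a are thunks (functions of the current input), not values.
    Simplifying assumption: each thunk produces exactly one output.
    """
    def body(value):
        return (value * m(value)) + a(value)
    return body


# muladd(5; 1): literals behave like thunks that ignore their input.
constant_args = muladd(lambda _: 5, lambda _: 1)
# muladd(. + 1; . / 2): both arguments are evaluated against the same input.
input_args = muladd(lambda x: x + 1, lambda x: x / 2)

print(constant_args(4))  # 21
print(input_args(4))     # 22.0
```

With input 4, the second call computes (4 * (4 + 1)) + (4 / 2), matching the expansion given above.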
The jq language is a “Lisp-2”, in that it has separate symbol namespaces for function and data. Its closures/thunks are of dynamic extent, thus allocated on a stack and deallocated automatically when their defining scopes are exited — this is one reason that jq cannot have closure/thunk/function values, as it would then be difficult or impossible to prevent their use after being deallocated.
The Icon programming language, for example, also has semantics that can be and are implemented using closures of dynamic extent. Dynamic-extent closures are sufficient for implementing depth-first backtracking, at the cost of needing co-routines for breadth-first searches. jq does not yet have co-routines, unlike Icon, which has had them for decades.
Another reason that jq cannot have first-class function values is that jq deals in JSON texts as inputs and outputs, or else raw text, and there is no JSON representation of jq functions, and there really is no standard for representing code in raw text. One can imagine a variant of jq that has first-class function values, and a first-class function type or types, with closures having indefinite extent, but still, something would have to be done about output.
Still, because jq allows local functions in most expressions, and because of its lexical scoping rules, the fact that its functions are not first-class values of first-class types is not so restrictive.
Every jq program consists of exactly one expression. This expression can include any number of module imports/includes and function definitions. Comments are introduced by a # character and run through the end of the line.
# Module imports, includes, and function definitions:
import "a" as foo;
include "b";
def some_function: body_here;
# ...
#
# Finally, the main program, really, a singular expression:
some_expression | some_other_expression # and so on
# But note that you can have `def ...` in any expression.
Expressions can be pipelined, where the output(s) of each pipeline stage are the inputs to the next:
some_expression | some_other_expression
There are a number of special forms, such as constant literals, array/object accessors and iterators, “variable” bindings and destructuring, conditionals, and so on.
Every expression has a singular input value and zero, one, or more output values.
A function foo is called by just writing its name: now | foo applies foo to the result of now (which is a function that returns the current time). An expression like (1, 2) | foo means calling foo twice, first applying it to 1, then again to 2: when foo as applied to 1 completes, jq backtracks to produce a new value (here, 2) to apply foo to.
Functions can have arguments. Again, function arguments are not value arguments but thunk arguments. Calling a function with arguments looks like this: bar(some_expression; another_expression). Pay close attention to the use of ; for separating argument expressions, and do not confuse it with the , operator: , joins the outputs of the expressions on its left and right, while ; is only a syntactic separator that separates function arguments and terminates function bodies.
Semi-colons are required for terminating import, include, and def function bodies, as well as for separating expression arguments to functions of more than one expression argument.
Whitespace is not significant.
jq supports only JSON’s data types: null, boolean (true and false), strings, numbers, arrays, and objects. Arrays are zero-based.
There is no way to declare any new data types, but objects and arrays can be used to represent complex data types.
jq is a dynamically-typed language.
The expression expr[] outputs all the values in the array or object output by expr. E.g., [0, 3][] outputs 0, then 3. .[] outputs the values in ., so [range(3)] | .[] outputs 0, 1, and 2.
The expression expr[N] outputs the element at index N of the array output by expr. Thus [range(10)][2] outputs 2, and [range(10)][-1] outputs 9. Arrays are zero-based, with negative indices referring to values from the right end.
The expression expr.ident outputs the value of the key named "ident" in the object output by expr. So {a:0}.a outputs 0.
The expression expr["some key string"] outputs the value of the key named "some key string" in the object output by expr.
The expression expr."some key string" outputs the value of the key named "some key string" in the object output by expr.
These things can chain. Thus .a["b"].c[].d outputs the value of the key named "d" in the objects output by .a.b.c[], which are all the values in the array at .a.b.c, which in turn is the value of the key named "c" in .a.b, and so on.
In jq there are two types of symbols: function symbols, and value symbols.
Function symbols are any ident-like symbols, while value symbols are any ident-like symbols prefixed with a $.
Ident-like means: starts with a letter or underscore and consists only of letters, digits, and underscores. foo+1 parses as foo + 1.
Thus $foo is a symbol that evaluates to a value, while foo is a symbol that evaluates to a function (or closure/thunk). Though even a data symbol is an expression, and thus a thunk: one that ignores its input and always outputs the value bound to that data symbol. There is no relation between data and function symbols of the same name.
A symbol (in any context other than where it gets defined) always effectively applies the function named. $foo is a function that ignores its input and produces the value that $foo is bound to. foo is some function that gets applied to its input. foo(expr) is a function that gets applied to its input; what it does with expr is up to the foo/1 function’s body.
Functions are defined with def IDENT: BODY; or def IDENT(arg0; arg1; ..; argN): BODY;. Any reference to the function’s name in the body is bound to the function itself, which allows recursion. The arguments are themselves also functions, bound to the expressions passed in where the defined function is applied. Function definitions can be included just about everywhere (e.g., ... | def foo: ...; ...).
E.g., in the body of a function defined as def cond(c; t; f): if c then t else f end; the function symbols c, t, and f are bound to the first, second, and third argument expressions, respectively, and the name of the function, cond with arity 3, is made visible to all jq code that follows its definition.
Note well that foo, foo(expr), foo(expr0; expr1), and so on, are all different functions. The number of arguments passed determines which foo is applied. We can and do refer to the first as foo/0, the next as foo/1, and so on.
Data symbol bindings are introduced with expr as $NAME | .... The | is required. The binding is visible to all expressions to the right of the |.
Lexical bindings shadow earlier bindings of the same names. For example:
def foo:
  def foo:
    def foo: .+1;  # Just .+1
    foo*3;         # Same as (.+1)*3
  foo+5;           # Same as ((.+1)*3)+5
In this example, the foo in the outermost function body normally would have been bound to the function itself (being named foo), thus causing infinite recursion in this case, but because foo is immediately shadowed by a local function foo, the foo in the body is bound to that local function.
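Python's nested function definitions happen to shadow in the same way, so the example can be replayed there to check the arithmetic. This is an analogy for the scoping rule only; the explicit parameter x stands in for jq's implicit input.

```python
def foo(x):
    def foo(x):
        def foo(x):
            return x + 1      # innermost: .+1
        return foo(x) * 3     # refers to the innermost foo: (.+1)*3
    return foo(x) + 5         # refers to the middle foo: ((.+1)*3)+5


print(foo(2))  # 14
```

The call in each body resolves to the nearest enclosing local definition rather than to the function being defined, so no recursion occurs.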
Function symbol bindings are introduced only by defs, defs in modules imported or included, or by jq itself in the case of built-in function symbols. The names of the argument thunks are lexical bindings available to the body of the function, and to the functions defined inside that function.
Recursion is possible because a function’s name is visible to its body:
def fact:
if . == 0 then 1
elif . > 0 then .*(.-1|fact)
else "fact not defined for negative numbers"|error
end;
A tail-recursive version of fact:
def fact:
# Helper that keeps state as an array of [$n, $result]:
def fact:
if .[0] == 0 then .
else .[0] as $n |
(.[1] *= $n) |
(.[0] -= 1) | fact
end;
select(. >= 0) | [., 1] | fact | .[1];
or
def fact:
if . == 0 then 1
elif . < 0 then empty
else reduce (range(.) + 1) as $n (1; . * $n)
end;
Note the two kinds of scopes:
- function scopes are introduced by def, and function symbols are visible (assuming no shadowing) to all expressions in the defs that introduce them
  - function symbols are also visible to all subsequent defs at the same level
- value scopes are introduced by ... as $name | ... and are visible (assuming no shadowing) to all expressions to the right of the |
Value scopes are also introduced by destructuring forms, which are a generalization of ... as $name | ....
Recall that every expression gets a singular input value, and that expressions can be chained with |. The outputs of each expression are then passed as input to the expression to the right (if any). The jq command-line processor prints the outputs of the right-most expression.
This means that values flow from left to right. Each expression in a pipeline can “transform”/replace its input value with zero, one, or more values. When an expression produces no more values (possibly none at all), the expression on the left is resumed to see if it can produce another value, in which case the expression on the right is applied de novo to the new value.
Every expression can output zero, one, or more values.
The primitive expression that outputs zero values is empty, and it causes backtracking / pruning.
When an expression produces a value, the evaluation state of that expression is “suspended” while the output is processed by applying the expression to the right to that value.
When an expression in a pipeline produces no further values, then control returns to the expression to the left of it in the pipeline.
E.g., in range(5) | if .%2==1 then ., .*2 else empty end, the conditional expression is applied to each output of range(5), but for some such values (even numbers) it will “output” empty, which is to say, nothing, and it backtracks, while for other input values (odd numbers) it will output two numbers and then backtrack. Each time the if expression in that example backtracks, the range(5) to its left will resume and output the next value, until it runs out, in which case it will backtrack too, and being the first expression in the jq program, its backtracking will cause the program to terminate. Note that if this program is invoked via the jq command-line processor (as opposed to the C API for invoking jq programs), then the command-line processor may read another value from stdin and apply the jq program to it all over again.
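The control flow in this example can be traced with Python generators, where "backtracking" corresponds to an inner loop finishing and the outer generator being resumed. This is a sketch of the behaviour described above, with hypothetical helper names; it is not jq's actual evaluation strategy.

```python
def jq_range(n):
    """Model of `range(n)`: ignores its input and yields 0 .. n-1."""
    def expr(_value):
        yield from range(n)
    return expr


def odd_and_double(value):
    """Model of `if .%2==1 then ., .*2 else empty end`."""
    if value % 2 == 1:
        yield value
        yield value * 2
    # Even input: nothing is yielded, which models `empty`.


def run(program_input):
    trace = []
    # The outer generator is resumed each time the inner one is exhausted.
    for n in jq_range(5)(program_input):
        for out in odd_and_double(n):
            trace.append(out)
    return trace


print(run(None))  # [1, 2, 3, 6]
```

When the outer loop itself runs out of values, run returns, which corresponds to the program terminating.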
The array/object value iterator expression, .[], outputs all the values in the array/object.
The comma operator (,) outputs the value(s) of the expression on the left, then the values of the expression on the right, but both expressions will be applied to the same input. For example, range(3;6) | (., . * 2) outputs 3, 6, 4, 8, 5, 10.
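In generator terms the comma operator is a concatenation of two streams computed from the same input. The following Python sketch models that, under the same "expression = function from input to stream" assumption; comma, identity, and double are illustrative names.

```python
from itertools import chain


def comma(left, right):
    """Model of `left, right`: both sides see the same input."""
    def expr(value):
        return chain(left(value), right(value))
    return expr


def identity(value):      # `.`
    yield value


def double(value):        # `. * 2`
    yield value * 2


outputs = []
for n in range(3, 6):     # `range(3;6)`
    outputs.extend(comma(identity, double)(n))
print(outputs)  # [3, 6, 4, 8, 5, 10]
```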
The inputs builtin outputs all the remaining inputs read by the jq command-line processor (from stdin, for example).
The range() builtin outputs a sequence of numbers. E.g., range(5) outputs the numbers 0 through 4, inclusive.
jq does not have lazy evaluation as such. But because all function arguments are thunks that may or may not get evaluated (depending on what the called function chooses to do), and because function argument thunks can output multiple values, jq effectively has lazy evaluation after all.
Consider the limit/2 builtin function: it outputs the first $n values of its second argument thunk:
$ time jq -cn '[limit(5; range(1000000))]'
[0,1,2,3,4]
real 0m0.02s
user 0m0.02s
sys 0m0.00s
$
In fact, the limit/2 builtin function really does limit how many values its second argument produces. No matter how many values its second argument wants to produce, once the $n-th value is reached, evaluation stops.
Here’s a definition of limit/2:
def limit($n; exp):
if $n < 0 then exp
else label $out | foreach exp as $item ($n; .-1; $item, if . <= 0 then break $out else empty end)
end;
Incidentally, function arguments named $name are just a small amount of syntactic sugar. The following definition of limit/2 is equivalent to the above:
def limit(n; exp):
n as $n |
if $n < 0 then exp
else label $out | foreach exp as $item ($n; .-1; $item, if . <= 0 then break $out else empty end)
end;
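The claim that evaluation really stops can be checked in a Python model by counting how many values the source generator is asked for. This sketch uses itertools.islice in place of the label/foreach/break construction, and models only the intended "first n values" behaviour for positive n; counting_range and pulled are illustrative names.

```python
from itertools import islice

pulled = []  # records every value the source stream actually produced


def counting_range(n):
    """Like range(n), but notes each value as it is produced."""
    for i in range(n):
        pulled.append(i)
        yield i


def limit(n, stream):
    """Model of `limit(n; exp)`: a negative n passes everything through."""
    if n < 0:
        yield from stream
    else:
        yield from islice(stream, n)


result = list(limit(5, counting_range(1_000_000)))
print(result)       # [0, 1, 2, 3, 4]
print(len(pulled))  # 5
```

Only five values are ever produced by the million-element source, which is why the jq invocation above finishes almost instantly.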
Arrays are not lazy in jq, therefore they always have a definite size, and they take up O(N) space.
jq expressions and functions can output zero, one, or more values. A jq expression that outputs one billion values takes up O(1) memory, not O(N). Therefore “streaming”, i.e., generating many values, is cheaper than collecting those values into an array.
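Consider the map function, and a variant that streams:
def map(f): [.[] | f];
def map_values(f): .[] | f;
The first, map/1, is the standard “map” function one finds in most functional programming languages. The second, map_values/1 as defined here, is a streaming version of map/1. (Note that the map_values/1 that ships with jq is defined differently, as .[] |= f, so the definition above is for illustration only.)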
Whenever possible, jq programmers should prefer to stream values.
jq has a couple of reduction primitives:
reduce stream_expression as $name (initial_value; update_expression)
and
foreach stream_expression as $name (initial_value; update_expression; extract_expression)
These allow the programmer to apply an update expression successively to its own outputs, but with a lexical binding for each of the stream_expression’s outputs.
E.g., reduce range(5) as $n (0; .+$n) adds the numbers from 0 to 4, inclusive. In this example $n in the update expression is bound to each successive output of the stream expression (which here is range(5)), and the expression .+$n is applied to the reduction’s state value, and the output of .+$n becomes the next reduction state value. When the stream expression runs out of outputs, the final reduction state value is output.
The foreach reduction operator can output intermediate state values, and will do so whenever the third expression, the extraction expression (optional and by default equal to .), outputs a value (if it outputs no values, then foreach will update the reduction state with the next output of the stream expression). (Note: it probably would have been best to not introduce a new syntactic construct for this, just add an expression to the existing reduce construct.)
Note that though a reduction like reduce range(5) as $n (0; .+$n) is equivalent to 0 + 0 | . + 1 | . + 2 | . + 3 | . + 4, jq uses much less state to implement the reduction.
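For readers who know Python, reduce corresponds closely to functools.reduce and foreach (with the default extraction expression) to itertools.accumulate. The sketch below is an analogy only and assumes the update expression produces exactly one output per step.

```python
from functools import reduce
from itertools import accumulate

stream = range(5)  # `range(5)`

# reduce range(5) as $n (0; . + $n): only the final state is output.
total = reduce(lambda state, n: state + n, stream, 0)
print(total)  # 10

# foreach range(5) as $n (0; . + $n): every intermediate state is output.
# accumulate() also yields the initial value first, which foreach does not,
# so the first element is dropped.
running = list(accumulate(stream, lambda state, n: state + n, initial=0))[1:]
print(running)  # [0, 1, 3, 6, 10]
```

In both cases only one state value is alive at a time, which is the point made above about the reduction needing little state.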
Note that while the state update expression is running, jq does not retain any additional references to that expression’s input value. This means that from the second update forward, the reduction state value never has more than one reference. This is critical because when values have just one reference, then “mutation” operations that normally copy-then-write, just mutate in-place. See more about this below.
A path expression is any expression which, when given to path(EXPR), does not yield an error. This is a terrible description. Let us try again.
A path expression is any expression which is composed entirely of:
- .,
- array/object traversal operators (index computation operations, however, need not be path expressions):
  - the object/array iterator .[]
  - .[KEY_EXPR] in all its variants, but note that KEY_EXPR itself need not be a path expression:
    - .ident
    - ."string key"
    - .["string key"]
    - .[INTEGER]
  - the .[start_index:end_index] array slice operator
- if ... where the then/else branches are path expressions (the condition expression, however, need not be a path expression),
- path expressions chained with |,
- $binding assignments (e.g., (.+.) as $x | ...),
- multiple path expressions joined with ,,
- empty, error, and break $label, and last but not least:
- function calls where their bodies all consist of path expressions (even recursive functions).
Every kind of expression not listed above is not a path expression and is an error to include in the left-hand side of an assignment, including:
- expressions using any of the binary operators +, -, *, /, %,
- expressions using the unary operator -,
- any reference to $bindings,
- any literals,
- any other expressions not listed here that are also not listed above as being path expressions.
Not every expression is a path expression. For example, .a.b is a path expression, but .a + .b is not! foo is a path expression if and only if the body of the function named foo is a path expression.
The purpose of defining "path expression" is to enable assignment forms. The left-hand side expression of every assignment is always internally passed to path/1.
The path/1 builtin (path(path_expression)) outputs arrays of strings and numbers representing the paths through the input value matched by the given path_expression. path/1 is, essentially, a pattern-matching primitive. Thus null | path(.a[0].b) outputs ["a",0,"b"]. The assignment operators are syntactic sugar: they use path(left_hand_side) to compute paths, and then call setpath/2 with each of those paths to set new values at them in the input to the assignment.
As we’ll see in Assignments, the path/1 built-in is essential to the construction of assignment operators.
Because in jq arguments to functions are thunks, it is not possible from local syntactic analysis to tell whether an expression must be a path expression: a function’s body might or might not pass a thunk to path/1. One must either inspect the function’s documentation or its body. (It should be possible for the jq compiler to determine if some expression is a path expression, and also to determine if a function argument must be a path expression, thus being able to report path expression errors at compile-time. However, the jq compiler is not that smart at this point.)
Passing a non-path expression to path/1 will yield a run-time error, so it is important to know which expressions must be path expressions. As we’ll see in Assignments, the left-hand side expressions of assignment forms must be path expressions.
Given a datum like {"a":{"b":[{"c":0},{"d":1}]}} we can have path expressions like:
- .. => matches all paths in the input
- .a.b[0].c => matches the path to the value 0
- .a.b[1].d => matches the path to the value 1
- .a[][][] => matches all the leaf paths in this input
- .a.b|.. => matches all paths below .a.b
and so on.
Examples:
$ printf '%s\n' '{"a":{"b":[{"c":0},{"d":1}]}}' | jq -c 'path(..)'
[]
["a"]
["a","b"]
["a","b",0]
["a","b",0,"c"]
["a","b",1]
["a","b",1,"d"]
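The listing above can be reproduced with a small recursive walk. The Python function below is a sketch of what path(..) computes for plain JSON data; the name paths is illustrative, and the key order simply follows the Python dict.

```python
import json


def paths(value, prefix=()):
    """Yield every path in value in pre-order, modelled on `path(..)`."""
    yield list(prefix)
    if isinstance(value, dict):
        for key, child in value.items():
            yield from paths(child, prefix + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from paths(child, prefix + (index,))


datum = json.loads('{"a":{"b":[{"c":0},{"d":1}]}}')
for p in paths(datum):
    # Compact separators mimic the output of `jq -c`.
    print(json.dumps(p, separators=(",", ":")))
```

Each path is an array of object keys (strings) and array indices (numbers) leading from the root to one value, with the empty array denoting the root itself.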
Here we’ll expose some of jq’s internals for the purpose of listing all of the sorts of sub-expressions of path expressions that are exempted from having to contribute to path-building. The reader can gloss over the internals details if they wish and focus only on the list of exemptions below. (XXX Perhaps we should remove all internals details?)
The jq VM interpreter has four special opcodes for dealing with path expressions:
- PATH_BEGIN and PATH_END, which bracket calls to the path expression argument to path/1, and
- SUBEXP_BEGIN and SUBEXP_END, which bracket calls to sub-expressions which are not intended to contribute to path building.
For example, conditional expressions in if forms are bracketed with SUBEXP_BEGIN and SUBEXP_END opcodes.
Thus we can look at all the forms where bytecode is generated via gen_subexp() to see what sorts of expressions are exempted from having to contribute to path-building:
- evaluation of index expressions such as index_expr in .[index_expr] (see gen_index())
- evaluation of array slice start/end expressions such as start_exp and end_exp in .[start_exp:end_exp] (see gen_slice_index())
- evaluation of empty object construction, {} (see '{' MkDict '}' case of Term in src/parser.y)
- evaluation of object key and value expressions in object construction syntax (see gen_dictpair())
- evaluation of conditional expressions (see gen_cond())
- evaluation of value expressions in data symbol binding forms (see gen_var_binding()) (i.e., in path(5 as $five | ...), the 5 does not contribute to path building, whereas path(5 | ...) would yield a run-time error)
- evaluation of path expressions in destructuring, which is a generalized form of data symbol binding (see gen_array_matcher() and gen_object_matcher())
- evaluation of argument expressions in calls to C-coded built-in jq functions (see expand_call_arglist())
We have had bugs in the past relating to incorrect or missing uses of gen_subexp(), and bugs related to insufficient or excessive run-time sanity checking of path-building. See path_intact() and path_append() in src/execute.c.
Note too that path-building context can nest. That is, one can have path expressions with path expressions inside them. This is done by making path building context part of expression evaluation stack frames (jq has a stack, naturally). For example, foo = 1 where foo/0 has a body that itself uses path/1.
jq has assignment operators. But jq values are immutable. So how can jq possibly have assignments?!
Well, assignments in jq desugar into reductions over the paths matched by the path expressions on the left-hand side (LHS) modifying the values at those paths (in the input value) according to the right-hand side (RHS) expression. Modifications are copy-on-write modifications (and, when there is just one reference to a value, the modifications are in-place as an optimization).
The use of path expressions can make jq assignments resemble Lisp generalized variables (setf macros), or Icon place references. For example, here we see a function foo functioning a lot like a Lisp generalized variable (Lisp setf macros):
$ jq -cn 'def foo: .a.b; {a:{b:{c:0}}}|(foo.c += 1)'
{"a":{"b":{"c":1}}}
Note that foo here is a function whose body is a path expression, and that one would normally use such a function to extract sub-values of ., but here jq is able to let assignments work with this function foo as if foo itself were a path expression! This is what is termed "generalized variables": the ability to assign values through arbitrarily complex code (including function calls) that is otherwise only meant for reading.
The += assignment operator desugars, roughly, to lhs |= . + rhs (with the caveat that rhs is evaluated against the input of the assignment rather than against the value being updated), and |= desugars into _modify(lhs; rhs), and _modify is defined as (simplified):
def _modify(paths; update):
reduce path(paths) as $p (.; setpath($p; getpath($p) | update));
Note that the lhs in assignments ultimately gets passed to path/1, thus making the LHS of assignments… path expressions!
What does _modify/2 do? It:
- produces all the paths in the input value that are matched by paths, as arrays of path component numbers and/or strings (path(paths))
- reduces these with the original input value as the initial reduction state
- for each path it gets the value in the input at that path (getpath($p))
- evaluates update on that value (getpath($p) | update)
- and finally “mutates” the reduction state value (.) by setting the new value at the same path (setpath($p; ...))
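The steps above can be sketched in Python with copying versions of getpath and setpath. This models the simplified _modify only: the paths are passed in precomputed rather than derived from a path expression, missing keys are not handled, and update is assumed to return exactly one value.

```python
import copy


def getpath(value, path):
    """Model of `getpath(path)`: follow keys/indices from the root."""
    for key in path:
        value = value[key]
    return value


def setpath(value, path, new):
    """Model of `setpath(path; new)`: return a modified copy of value."""
    if not path:
        return new
    result = copy.copy(value)  # shallow copy of this level only
    result[path[0]] = setpath(value[path[0]], path[1:], new)
    return result


def modify(value, paths, update):
    """Model of the simplified `_modify(paths; update)` reduction."""
    state = value              # initial reduction state is the input
    for p in paths:            # `path(paths) as $p`
        state = setpath(state, p, update(getpath(state, p)))
    return state


original = {"a": {"b": {"c": 0}}}
updated = modify(original, [["a", "b", "c"]], lambda v: v + 1)  # like `.a.b.c += 1`
print(updated)   # {'a': {'b': {'c': 1}}}
print(original)  # {'a': {'b': {'c': 0}}}
```

The original value is left untouched because setpath copies each level along the path before changing it.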
It’s important to note that values are immutable, which means that all mutation operations return a new copy of their input modified according to the desired mutation. Thus setpath(...; ...) doesn’t modify its input, but it produces a new value as its output that is a copy of the input modified according to setpath()’s arguments.
It’s also important to note that whenever there is a single reference to a value, internally jq will in fact mutate it rather than copy it; this is safe, because nothing else can observe the change, and fast.
All the assignment operators except = work this way. Those that combine operators like +, -, and so on, with assignment, desugar into _modify(lhs; . OPERATOR rhs), while |= desugars into _modify(lhs; rhs).
The = operator evaluates the RHS against the same input as the LHS, and desugars into _assign(paths; value). _assign() is defined as:
def _assign(paths; value):
value as $v | reduce path(paths) as $p (.; setpath($p; $v));
Note that _assign() applies value (the RHS) to its input once at the beginning, creates a lexical binding for that value ($v), and then sets all the paths to that value $v. Thus .[] = range(5) will produce five outputs, each with all the value slots in the . array or object set to 0, then all set to 1, and so on. This can be surprising.
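The surprising behaviour is easier to see when the RHS generator is written as an outer loop. The Python sketch below models .[] = rhs on an array only; assign_all is an illustrative name, not part of jq.

```python
def assign_all(container, rhs_outputs):
    """Model of `.[] = rhs` on an array: one whole result per RHS output."""
    for v in rhs_outputs:                # `value as $v |` runs once per output
        yield [v for _ in container]     # every slot is set to the same $v


results = list(assign_all(["x", "y"], range(5)))  # like `["x","y"] | .[] = range(5)`
print(results)
# [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]]
```

Because the binding of $v sits outside the reduction over paths, each RHS output yields a complete copy of the container rather than being distributed over its slots.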
In modify-assignments (|=, +=, etc.), it makes no sense to have more than one output in the value update expression. The actual _modify looks like this:
def _modify(paths; update):
reduce path(paths) as $p (
.;
label $out | (setpath($p; getpath($p) | update) | ., break $out),
delpaths([$p]));
which means that when the value update expression outputs more than one value, only the first is used, and when it outputs no values, then the path is deleted. I.e., .a |= select(.%2 == 1) + 1 deletes .a from . if the value at .a is an even number, else it adds one to it:
$ jq -cn '{a:0,b:true}|.a |= select(.%2==1) + 1'
{"b":true}
$ jq -cn '{a:1,b:true}|.a |= select(.%2==1) + 1'
{"a":2,"b":true}
while .a |= range(5) sets .a to 0:
$ jq -cn '{a:1,b:true}|.a |= range(5)'
{"a":0,"b":true}
There are three types of built-in functions:
- jq-coded functions

  These are functions defined in `src/builtin.jq`, and they are compiled like any user-defined function.
- bytecoded functions

  These are functions defined in `src/builtin.c`, and they consist of hand-crafted `block` representations of jq programs. (A `block` is an AST-ish output of the jq program parser, which straightforwardly gets compiled to bytecode.)

  For example, the `empty` built-in function has a one-opcode body, and that opcode is `BACKTRACK`.

  The full list of bytecoded built-in functions is very short, at this time being just:
  - `empty/0`
  - `not/0`
  - `path/1`
  - `range/2`

  `range/1` and `range/3` are jq-coded, not bytecoded, and are made possible by tail recursion optimization.
- C-coded jq functions

  These functions are defined in `src/builtin.c`. These functions do not actually accept thunks as arguments, only values, therefore the jq compiler wraps invocations of C-coded functions with a bytecoded wrapper that applies any argument thunks to `.`, roughly like so: `def _jq_call_c_coded_foo(a; b): a as $a | b as $b | _call_c_coded_foo($a; $b);`.

  C-coded functions have C prototypes of this form: `jv name(jv input)` for zero-expression-argument functions, `jv name(jv input, jv a)` for one-expression-argument functions, `jv name(jv input, jv a, jv b)` for two-expression-argument functions, and so on up to six arguments. jq-coded functions have no such limit on the number of expression arguments they accept, but they are limited to however many arguments they can address, given that compiled jq-coded function bodies are limited to 2^16 opcodes per function body.
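A few observable consequences of the above, as a sketch. It assumes that `has/1` is one of the C-coded built-ins, so a generator passed as its argument is bound value by value and the function is called once per value.

```shell
# `empty` backtracks immediately, so it contributes nothing to the array.
jq -cn '[1, empty, 2]'
# [1,2]

# The argument expression (0, 5) is applied to `.` and bound value by value,
# so has/1 runs once per argument value.
jq -cn '[10, 20] | has(0, 5)'
# true
# false

# range/1 produces the same outputs as range/2 starting from 0.
jq -cn '[range(3)] == [range(0; 3)]'
# true
```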
With the exception of if-then-else constructs, `.[]`, and a few other such constructs, everything in jq involves applying functions.
`[ expr ]` is a special form that collects the outputs of `expr` into an array. It desugars into something like `reduce expr as $value ([]; setpath([length]; $value))`. The object constructor, `{ ... }`, is similar.
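The equivalence can be checked directly. This is a minimal sketch, assuming the path argument to `setpath` is written as an array (`[length]`), which is the form `setpath/2` expects:

```shell
# Inside the reduction, `.` is the array built so far, so `length` is the next free index.
jq -cn '[range(3)] == reduce range(3) as $value ([]; setpath([length]; $value))'
# true
```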
If-then-else constructs are a special form.
There are a number of others; they are all defined in `src/parser.y` and are described in the manual.
A partial list of special forms follows:
- `import "name" as prefix;` – imports the module named `"name"` and makes its symbols available as `prefix::name`
- `include "name";` – imports the module named `"name"` and makes its symbols available as if the module had been included verbatim
- `.` – the current input value
- literal values, i.e., numbers, `"strings"`, `true`, `false`, and `null`
- `"this \(expr) interpolates the outputs of expr into this string"` – string interpolation
- binary infix operators
  - comparison operators: `==`, `!=`, `<`, `>`, `<=`, `>=`
  - arithmetic operators: `+`, `-`, `*`, `/`, `%`
- unary prefix negation operator `-`
- `[ expr ]` – collect `expr`’s outputs into an array
- object construction syntax (see manual)
- `term[index_expr]` – output the value at `index_expr` in `term`
- `term . ident` – same as `term["ident"]`
- `term . "name"` – same as `term["name"]`
- `..` – produce all the values in `.` recursively, in pre-order
- `term[start_expr : end_expr]` – array slice operator
- `expr ?` – suppress errors from `expr`
- `label $name | ... | break $name` – fancy `empty` that unwinds all of `...`
- assignment operator: `=`
- modify-assignment operators: `|=`, `+=`, `-=`, `*=`, `/=`, `%=`
- logical operators: `not`, `and`, `or`
- `$__loc__` – evaluates to the `{file: FILENAME, line: LINENO}` where `$__loc__` occurs
- `$ident` – value binding’s value
- `ident` – applies function `ident` to `.`
- `ident(expr)` – applies `ident` called with `expr` to `.`
- `ident(expr0; expr1)` – applies…
- comma operator `,` – outputs the values of the expression to the left, then those of the expression to the right, both expressions applied to the same input value
- `if cond_expr0 then true_expr0 elif cond_expr1 then true_expr1 ... else false_expr end`
- `try expr catch handler_expr` – invokes `handler_expr` on the error raised by `expr`, if any
- reduction syntax (see above)
- function definition (see elsewhere here)
- data symbol binding (`expr as $name | ...`) and destructuring syntax (see manual)
  - `expr as $name | ...`
  - `expr as [$name, $other_name] | ...`
  - `expr as {$name, $other_name} | ...`
  - `expr as {$name:[$thing1, $thing2], $other_name} | ...`
- `@sh`, `@json`, `@csv`, `@tsv`, `@html`, `@uri`, `@base64`, `@base64d` – format / escape string forms
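A few of these forms in action, as a rough sketch; the input values are made up for illustration.

```shell
# String interpolation yields one string per output of the interpolated expression.
jq -cn '"x=\(1, 2)"'
# "x=1"
# "x=2"

# `..` visits values in pre-order.
jq -cn '[1, [2]] | [..]'
# [[1,[2]],1,[2],2]

# Array slice.
jq -cn '[0, 1, 2, 3, 4] | .[1:3]'
# [1,2]

# `?` suppresses the error raised for the non-numeric string.
jq -cn '[1, "a", 2] | [.[] | tonumber?]'
# [1,2]

# `break $out` stops the generator after its first two outputs.
jq -cn 'label $out | 1, 2, break $out, 3'
# 1
# 2

# `try ... catch` passes the error message to the handler as its input.
jq -cn 'try error("boom") catch .'
# "boom"

# Array destructuring.
jq -cn '[1, [2, 3]] as [$a, [$b]] | $a + $b'
# 3

# Format string (raw output, hence -r).
jq -rn '[1, "a b"] | @csv'
# 1,"a b"
```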
Note that `path(expr)`, though very special, is not a special form. `path(expr)` is a bytecoded function whose body invokes its argument expression thunk bracketed with opcodes that cause the paths in `.` traversed by that expression to be recorded and output one by one.
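As an illustration (the input object is invented for the example), `path/1` outputs one path array per value that its argument expression reaches:

```shell
jq -cn '{a: {b: 1}, c: [2, 3]} | path(.a.b, .c[])'
# ["a","b"]
# ["c",0]
# ["c",1]
```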
Order: from highest precedence to lowest.
Operator | Associativity | Description |
---|---|---|
`?//` | nonassoc | destructuring alternative operator |
`?` | none | postfix operator, coerces errors to `empty` |
`-` | none | prefix negation |
`* / %` | left | polymorphic multiply and divide; modulo |
`+ -` | left | polymorphic plus and minus |
`== != < > <= >=` | nonassoc | equivalence and precedence tests |
`and` | left | boolean “and” |
`or` | left | boolean “or” |
`= \|= += -= *= /= %= //=` | nonassoc | assign; update |
`//` | right | coerces `null`, `false` and `empty` to an alternative value |
`,` | left | concatenate/alternate two filters |
`\|` | right | compose/sequence two filters |
`(...)` | | scope delimiter and grouping operator |
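A few one-liners that can be used to check entries of the table. They are illustrative and not taken from the jq test suite.

```shell
# `*` binds tighter than `+`.
jq -cn '1 + 2 * 3'
# 7

# `and` binds tighter than `or`: parsed as true or (true and false).
jq -cn 'true or true and false'
# true

# `//` binds tighter than `,`: parsed as (1 // 2), 3.
jq -cn '1 // 2, 3'
# 1
# 3

# `,` binds tighter than `|`: parsed as (1, 2) | (. * 10).
jq -cn '1, 2 | . * 10'
# 10
# 20
```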
Use `jq -nr 'builtins[]'` to list all the built-in functions.
At this time that list includes:
IN/1
IN/2
INDEX/1
INDEX/2
IN_INDEX/2
JOIN/2
JOIN/3
JOIN/4
LOOKUP/2
UNIQUE_INDEX/2
acos/0
acosh/0
add/0
all/0
all/1
all/2
any/0
any/1
any/2
arrays/0
ascii_downcase/0
ascii_upcase/0
asin/0
asinh/0
atan/0
atan2/2
atanh/0
booleans/0
bsearch/1
builtins/0
capture/1
capture/2
cbrt/0
ceil/0
combinations/0
combinations/1
contains/1
copysign/2
cos/0
cosh/0
debug/0
del/1
delpaths/1
drem/2
empty/0
endswith/1
env/0
erf/0
erfc/0
error/0
error/1
exp/0
exp10/0
exp2/0
explode/0
expm1/0
fabs/0
fdim/2
finites/0
first/0
first/1
flatten/0
flatten/1
floor/0
fma/3
fmax/2
fmin/2
fmod/2
format/1
frexp/0
from_entries/0
fromdate/0
fromdateiso8601/0
fromjson/0
fromstream/1
gamma/0
get_jq_origin/0
get_prog_origin/0
get_search_list/0
getpath/1
gmtime/0
group_by/1
gsub/2
gsub/3
halt/0
halt_error/0
halt_error/1
has/1
hypot/2
implode/0
in/1
index/1
indices/1
infinite/0
input/0
input_filename/0
input_line_number/0
inputs/0
inside/1
isempty/1
isfinite/0
isinfinite/0
isnan/0
isnormal/0
iterables/0
j0/0
j1/0
jn/2
join/1
keys/0
keys_unsorted/0
last/0
last/1
ldexp/2
leaf_paths/0
length/0
lgamma/0
lgamma_r/0
limit/2
localtime/0
log/0
log10/0
log1p/0
log2/0
logb/0
ltrimstr/1
map/1
map_values/1
match/1
match/2
max/0
max_by/1
min/0
min_by/1
mktime/0
modf/0
modulemeta/0
nan/0
nearbyint/0
nextafter/2
nexttoward/2
normals/0
not/0
now/0
nth/1
nth/2
nulls/0
numbers/0
objects/0
path/1
paths/0
paths/1
pow/2
pow10/0
range/1
range/2
range/3
recurse/0
recurse/1
recurse/2
recurse_down/0
remainder/2
repeat/1
reverse/0
rindex/1
rint/0
round/0
rtrimstr/1
scalars/0
scalars_or_empty/0
scalb/2
scalbln/2
scan/1
select/1
setpath/2
significand/0
sin/0
sinh/0
sort/0
sort_by/1
split/1
split/2
splits/1
splits/2
sqrt/0
startswith/1
stderr/0
strflocaltime/1
strftime/1
strings/0
strptime/1
sub/2
sub/3
tan/0
tanh/0
test/1
test/2
tgamma/0
to_entries/0
todate/0
todateiso8601/0
tojson/0
tonumber/0
tostream/0
tostring/0
transpose/0
trunc/0
truncate_stream/1
type/0
unique/0
unique_by/1
until/2
utf8bytelength/0
values/0
walk/1
while/2
with_entries/1
y0/0
y1/0
yn/2
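Since `builtins` outputs an array of `name/arity` strings, the list can be filtered like any other value. For example, a sketch that shows only the `range` variants:

```shell
# Keep the entries whose name is "range", then sort them for stable output.
jq -nr '[builtins[] | select(startswith("range/"))] | sort | .[]'
# range/1
# range/2
# range/3
```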
Most jq built-in functions are pure, but over time we have added a few impure functions:
- `input` – read one input from the standard input (or whatever the jq command-line processor wants to read from)
- `inputs` – read as many inputs from the standard input as possible (or whatever the jq command-line processor wants to read from)
- `input_filename` – name of the file whose input is currently being filtered
- `debug` – output its input to the standard error output
- `halt/0`, `halt_error/0`, and `halt_error/1` – stop execution (with an error)
- `now` – current time
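A small sketch of the input-reading functions, using `printf` to supply the inputs. With `-n` the program starts with `null` as its input, and nothing is consumed until `input` or `inputs` is called.

```shell
# `input` reads exactly one value.
printf '1\n2\n3\n' | jq -cn 'input'
# 1

# `inputs` reads all remaining values.
printf '1\n2\n3\n' | jq -cn '[inputs]'
# [1,2,3]

# Mixing the two: `input` takes the first value, `inputs` the rest.
printf '1\n2\n3\n' | jq -cn 'input as $first | [$first, [inputs]]'
# [1,[2,3]]
```

Because these functions consume input as a side effect, the result depends on evaluation order, which is what makes them impure.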
We’d like to add:
- random numbers
- file I/O
- external command I/O
- SQLite3 access
- ...
The jq language has relatively few keywords. These cannot be used for function or data symbols (Note: we could allow keywords in data symbols, but not in function symbols), but they can be used in object construction syntax as keys.
If we update jq to allow keywords as data symbols, we will also allow keywords in destructuring syntax.
Keywords:
$__loc__
and
as
break
catch
def
elif
else
end
foreach
if
import
include
label
module
or
reduce
then
try
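For instance, keywords are accepted as bare object keys. The key names below are arbitrary keywords picked from the list:

```shell
jq -cn '{if: 1, reduce: 2, end: 3}'
# {"if":1,"reduce":2,"end":3}
```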