jq Language Description

Purpose of this Page
Notation
The jq Language
jq Program Structure and Basic Syntax
Data Types
Array and Object Accessors and Iterators
Lexical Symbol Bindings: Function Definitions and Data Symbol Bindings
Data Flow
Generators and Backtracking
1. Lazy Evaluation
2. Streaming vs Arrays
Reductions
Path Expressions
1. Sub-expressions that are not Path Expressions
Assignments
Built-in Functions
Special Forms
1. Operators Priority
List of Built-in Functions
Side-Effects
1. Side-Effects Wish-list
Keywords

Purpose of this Page

The jq documentation is written in a style that hides a lot of important detail because the hope is that the language feels intuitive. Some users need documentation that includes such details and more — this page is for them. Such users should also read the jq Advanced Topics wiki page.

Hopefully, this page can also form the basis for a formal specification of the jq language.

Notation

Besides using the jq language itself, this page sometimes denotes a function together with its number of arguments as symbol/N: foo/0 (the function named “foo” with no arguments), bar/2 (the function named “bar” with two arguments), and so on. E.g., a call such as foo(a; b) refers to foo/2, but only the former is syntactically a jq expression, while the latter is used only in documentation.

When we refer to “jq programs” we mean programs written in the jq language.

When we refer to the jq(1) command-line executable, we refer to it as the “jq command-line processor” or “jq(1)” — the “(1)” in “jq(1)” refers to the operating system manual section for commands. The jq command-line processor compiles and executes jq programs, but the way the jq command-line processor and the jq program interact with the world depends on what command-line options are used — those are not covered here. See the jq documentation for details.

The jq Language

jq is a dynamically-typed functional programming language with second-class higher-order functions of dynamic extent, pervasive backtracking, generalized assignments, and pervasive immutability.

All values are immutable (that is, they are copy-on-write), but the language makes it seem as though they are mutable.

Every expression is a closure, or “thunk”, if you wish, that gets applied to a singular input value.

Every expression can be a generator that can produce zero, one, or more outputs. Generator expressions are allowed in every context, and every context consumes all generator outputs with few exceptions (e.g., first(generator) will only produce the first output of the generator expression).

Generators can run out of outputs, in which case program flow will backtrack to the nearest preceding generator that is still active, resuming it to produce the next result and resume forward program flow. This is what we mean by "pervasive backtracking" -- it's everywhere. Literal values are expressions that produce exactly one value (except for strings which interpolate expressions that generate more than one output).

Functions can be defined by users. Functions also get applied to a singular input value, but they may also get additional argument expressions. A function's argument expressions are closures — or thunks if you wish. Functions, incidentally, like expressions, are all closures in jq, as they close over their lexically-visible environments.

There are no dynamic "variables" or bindings in jq, not for values nor for functions.

The output(s) of an expression can be passed to another using the | operator. expressionA | expressionB applies expressionA to some input, then expressionB to all the outputs of expressionA. Note that this yields the cartesian product of the two expressions, that is: the outputs of expressionA | expressionB will be all the outputs generated by expressionB as applied to every output generated by expressionA.
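As a minimal sketch of the behaviour just described, assuming a jq 1.5 or later binary is available as jq on the PATH (the literal values are purely illustrative):

```shell
# Each output of the left side (1, then 2) becomes the input of the right side,
# and the right side itself produces two outputs per input.
jq -cn '(1, 2) | (., . * 10)'
# prints 1, 10, 2, 20 (one value per line)
```

The right-hand expression is restarted from scratch for every value the left-hand expression produces.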

As far as the jq language is concerned, a complete jq program is applied only to one singular input value. The jq command-line processor by default applies the jq program to each of its inputs (however many there may be) by restarting the jq program after each input. No state is kept by the jq processor between successive applications of a jq program to different inputs. This behavior of the processor can be controlled by command-line options like -n and -s; these are not covered here, but in the manual.

The input to an expression can be referred to explicitly as . -- this is just an identity operator. This is useful because jq lacks automatic currying, thus an expression that adds 1 to its input reads like so: . + 1 (or 1 + .). The + operator is syntactic sugar, and . + 1 desugars to _plus(.; 1) where _plus/2 is a special built-in function; similarly for other infix operators and the prefix unary operator -.

The only data types available to jq programs are JSON’s types: scalars (null, boolean true and false, numbers, and strings) and non-scalars (objects and arrays). jq functions have no type information beyond their arity, and are not first-class values.

Expressions (including functions) are not a value type, even though expressions can be passed to functions as arguments. A function’s expression arguments themselves can never be saved as values in arrays, or objects, or as scalars, and thus they cannot be output. The outputs of expressions, on the other hand, can only be values. Thus in def muladd(m; a): (. * m) + a; the m and a symbols are as-if functions that are applied, so muladd(5; 1) does the obvious thing, but muladd(.+1; ./2) might be less obvious: it is akin to writing (. * (. + 1)) + (./2). The fact that function arguments are thunks, plus jq’s generator/backtracking semantics, recursion, tail-call optimization, path expressions, and reduction operators, allows jq functions to implement powerful abstractions and flow control constructs almost as if jq functions were macros.
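The muladd example above can be tried directly. This is only a sketch, assuming jq 1.5 or later on the PATH, with 4 chosen arbitrarily as the input:

```shell
# m and a are thunks: each one is applied to the function's own input (here 4).
jq -cn 'def muladd(m; a): (. * m) + a;
        4 | muladd(5; 1), muladd(. + 1; . / 2)'
# prints 21, then 22: (4 * 5) + 1 and (4 * (4 + 1)) + (4 / 2)
```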

The jq language is a “Lisp-2”, in that it has separate symbol namespaces for functions and data. Its closures/thunks are of dynamic extent, thus allocated on a stack and deallocated automatically when their defining scopes are exited. This is one reason that jq cannot have closure/thunk/function values, as it would then be difficult or impossible to prevent their use after being deallocated.

The Icon programming language, for example, also has semantics that can be and are implemented using closures of dynamic extent. Dynamic-extent closures are sufficient for implementing depth-first backtracking, at the cost of needing co-routines for breadth-first searches. jq does not yet have co-routines, unlike Icon, which has had them for decades.

Another reason that jq cannot have first-class function values is that jq deals in JSON texts as inputs and outputs, or else raw text, and there is no JSON representation of jq functions, and there really is no standard for representing code in raw text. One can imagine a variant of jq that has first-class function values, and a first-class function type or types, with closures having indefinite extent, but still, something would have to be done about output.

Still, because jq allows local functions in most expressions, and because of its lexical scoping rules, the fact that its functions are not first-class values of first-class types is not so restrictive.

jq Program Structure and Basic Syntax

Every jq program consists of exactly one expression. This expression can include any number of module imports/includes and function definitions. Comments are introduced by a # character and run through the end of the line.

# Module imports, includes, and function definitions:
import "a" as foo;
include "b";
def some_function: body_here;
# ...
#
# Finally, the main program, really, a singular expression:
some_expression | some_other_expression # and so on

# But note that you can have `def ...` in any expression.

Expressions can be pipelined, where the output(s) of each pipeline stage are the inputs to the next:

some_expression | some_other_expression

There are a number of special forms, such as constant literals, array/object accessors and iterators, “variable” bindings and destructuring, conditionals, and so on.

Every expression has a singular input value and zero, one, or more output values.

A function foo is called by just writing its name: now | foo applies foo to the result of now (which is a function that returns the current time). An expression like (1, 2) | foo means calling foo twice, first applying it to 1, then again to 2: when foo as applied to 1 completes, jq backtracks to produce a new value (here, 2) to apply foo to.

Functions can have arguments. Again, function arguments are not value arguments but thunk arguments. Calling a function with arguments looks like this: bar(some_expression; another_expression). Pay close attention to the use of ; for separating argument expressions, and do not confuse it with , — , is an operator that joins the outputs of the expressions on its left and right, while ; is only a syntactic separator that separates function arguments and terminates function bodies.
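To make the difference between , and ; concrete, here is a small sketch. The helper names gather/1 and left/2 are hypothetical, made up for this illustration, and jq 1.5 or later is assumed:

```shell
# One argument expression that generates two outputs (the , operator):
jq -cn 'def gather(f): [f]; gather(1, 2)'
# prints [1,2]

# Two separate argument expressions (the ; separator); left/2 only applies its first:
jq -cn 'def left(a; b): a; left(1; 2)'
# prints 1
```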

Semi-colons are required for terminating import, include, and def function bodies, as well as for separating expression arguments to functions of more than one expression argument.

Whitespace is not significant.

Data Types

jq supports only JSON’s data types: null, boolean (true and false), strings, numbers, arrays, and objects. Arrays are zero-based.

There is no way to declare any new data types, but objects and arrays can be used to represent complex data types.

jq is a dynamically-typed language.

Array and Object Accessors and Iterators

The expression expr[] outputs all the values in the array or object output by expr. E.g., [0, 3][] outputs 0, then 3. .[] outputs the values in ., so [range(3)] | .[] outputs 0, 1, and 2.

The expression expr[N] outputs the Nth element of the array output by expr. Thus [range(10)][2] outputs 2, and [range(10)][-1] outputs 9. Arrays are zero based, with negative indices referring to values from the right end.

The expression expr.ident outputs the value of the key named "ident" in the object output by expr. So {a:0}.a outputs 0.

The expression expr["some key string"] outputs the value of the key named "some key string" in the object output by expr.

The expression expr."some key string" outputs the value of the key named "some key string" in the object output by expr.

These things can chain. Thus .a["b"].c[].d outputs the value of the key named "d" in the objects output by .a.b.c[], which are all the values in the array at .a.b.c, which in turn is the value of the key named "c" in .a.b, and so on.
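A quick way to check these accessors is to run them against a literal value. The sample data below is invented for the illustration, and jq 1.5 or later is assumed:

```shell
# Chained accessors: key, quoted key, key, iterator, key.
jq -cn '{a: {b: {c: [{d: 1}, {d: 2}]}}} | .a["b"].c[].d'
# prints 1, then 2

# Zero-based and negative array indices.
jq -cn '[range(10)] | .[2], .[-1]'
# prints 2, then 9
```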

Lexical Symbol Bindings: Function Definitions and Data Symbol Bindings

In jq there are two types of symbols: function symbols, and value symbols.

Function symbols are any ident-like symbols, while value symbols are any ident-like symbols prefixed with a $.

Ident-like means: starts with a letter or underscore and consists only of letters, digits, and underscores. foo+1 parses as foo + 1.

Thus $foo is a symbol that evaluates to a value, while foo is a symbol that evaluates to a function (or closure/thunk). Though even a data symbol is an expression, and thus a thunk — one that ignores its input and always outputs the value bound to that data symbol. There is no relation between data and function symbols of the same name.

A symbol (in any context other than where it gets defined) always effectively applies the function named. $foo is a function that ignores its input and produces the value that $foo is bound to. foo is some function that gets applied to its input. foo(expr) is a function that gets applied to its input — what it does with expr is up to the foo/1 function’s body.

Functions are defined with def IDENT: BODY; or def IDENT(arg0; arg1; ..; argN): BODY;. Any reference to the function’s name in the body is bound to the same, which then allows recursion. The arguments are themselves also functions bound to the expressions passed in at where the defined function is applied. Function definitions can be included just about everywhere (e.g., ... | def foo: ...; ...).

E.g., in the body of a function defined as def cond(c; t; f): if c then t else f end; the function symbols c, t, and f are bound to the first, second, and third argument expressions, respectively, and the name of the function, cond with arity 3, is made visible to all jq code that follows its definition.

Note well that foo, foo(expr), foo(expr0; expr1), and so on, are all different functions. The number of arguments passed determines which foo is applied. We can and do refer to the first as foo/0, the next as foo/1, and so on.
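The following sketch shows three unrelated functions sharing one name. The bodies are arbitrary strings chosen only so that the output reveals which definition was applied (jq 1.5 or later assumed):

```shell
# Three distinct functions: foo/0, foo/1 and foo/2.
jq -rn 'def foo: "foo/0";
        def foo(a): "foo/1";
        def foo(a; b): "foo/2";
        foo, foo(1), foo(1; 2)'
# prints foo/0, foo/1, foo/2 (one per line)
```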

Data symbol bindings are introduced with expr as $NAME | .... The | is required. The binding is visible to all expressions to the right of the |.

Lexical bindings shadow earlier bindings of the same names. For example:

def foo:
  def foo:
    def foo: .+1; # Just     .+1
    foo*3;        # Same as (.+1)*3
  foo+5;          # Same as ((.+1)*3)+5

In this example, the foo in the outermost function body normally would have been bound to the function itself (being named foo), thus causing infinite recursion in this case, but because foo is immediately shadowed by a local function foo, the foo in the body is bound to that local function.

Function symbol bindings are introduced only by defs, defs in modules imported or included, or by jq itself in the case of built-in function symbols. The names of the argument thunks are lexical bindings available to the body of the function — and to the functions defined inside that function.

Recursion is possible because a function’s name is visible to its body:

def fact:
    if . == 0 then 1
    elif . > 0 then .*(.-1|fact)
    else "fact not defined for negative numbers"|error
    end;

A tail-recursive version of fact:

def fact:
  # Helper that keeps state as an array of [$n, $result]:
  def fact:
    if .[0] == 0 then .
    else .[0] as $n |
         (.[1] *= $n) |
         (.[0] -= 1) | fact
    end;
  select(. >= 0) | [., 1] | fact | .[1];

or

def fact:
  if . == 0 then 1
  elif . < 0 then empty
  else reduce (range(.) + 1) as $n (1; . * $n)
  end;

Note the two kinds of scopes:

function scopes are introduced by def, and function symbols are visible (assuming no shadowing) to all expressions in the defs that introduce them
- function symbols are also visible to all subsequent defs at the same level
value scopes are introduced by ... as $name | ... and are visible (assuming no shadowing) to all expressions to the right of the |

Value scopes are also introduced by destructuring forms, which are a generalization of ... as $name | ....
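As a sketch of value scopes, the commands below show a destructuring binding and the shadowing of one data symbol by another. The values are illustrative, and jq 1.5 or later is assumed since destructuring is not available before that:

```shell
# Destructuring binds several data symbols at once.
jq -cn '{a: 1, b: [2, 3]} as {a: $x, b: [$y, $z]} | [$x, $y, $z]'
# prints [1,2,3]

# The inner $x shadows the outer one only inside the parentheses.
jq -cn '1 as $x | (2 as $x | $x), $x'
# prints 2, then 1
```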

Data Flow

Recall that every expression gets a singular input value, and that expressions can be chained with |. The outputs of each expression are then passed as input to the expression to the right (if any). The jq command-line processor prints the outputs of the right-most expression.

This means that values flow from left to right. Each expression in a pipeline can “transform”/replace its input value with zero, one, or more values. When an expression produces no more values (possibly none at all), the expression on the left is resumed to see if it can produce another value, in which case the expression on the right is applied de novo to the new value.

Generators and Backtracking

Every expression can output zero, one, or more values.

The primitive expression that outputs zero values is empty, and it causes backtracking / pruning.

When an expression produces a value, the evaluation state of that expression is “suspended” while the output is processed by applying the expression to the right to that value.

When an expression in a pipeline produces no further values, then control returns to the expression to the left of it in the pipeline.

E.g., in range(5) | if .%2==1 then ., .*2 else empty end, the conditional expression is applied to each output of range(5), but for some such values (even numbers) it will “output” empty, which is to say, nothing, and it backtracks, while for other input values (odd numbers) it will output two numbers then backtrack. Each time the if expression in that example backtracks, the range(5) to its left will resume and output the next value, until it runs out, in which case it will backtrack, and being the first expression in the jq program, its backtracking will cause the program to terminate. Note that if this program is invoked via the jq command-line processor (as opposed to the C API for invoking jq programs), then the command-line processor may read another value from stdin and apply the jq program to it all over again.
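Running that example shows which inputs survive. This is a sketch assuming jq 1.5 or later, with -n used so that no input is read from stdin:

```shell
# Even inputs (0, 2, 4) yield nothing; odd inputs (1, 3) yield themselves and their double.
jq -cn 'range(5) | if . % 2 == 1 then ., . * 2 else empty end'
# prints 1, 2, 3, 6 (one value per line)
```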

The array/object value iterator expression, .[], outputs all the values in the array/object.

The comma operator, , outputs the value(s) of the expression on the left, then the values of the expression on the right, but both expressions will be applied to the same input. For example, range(3;6) | (., . * 2) outputs 3, 6, 4, 8, 5, 10.

The inputs builtin outputs all the remaining inputs available to the jq command-line processor (read from its input files, or from stdin when no files are given).

The range() builtin outputs a sequence of numbers. E.g., range(5) outputs the numbers 0 through 4, inclusive.

Lazy Evaluation

jq does not have lazy evaluation as such. But because all function arguments are thunks that may or may not get evaluated (depending on what the called function chooses to do), and because function argument thunks can output multiple values, jq effectively has lazy evaluation after all.
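A small sketch of this effect follows. The function ignore/1 is a hypothetical helper written for this illustration, first/1 is the builtin mentioned earlier, and jq 1.5 or later is assumed:

```shell
# The argument thunk is never applied, so the error it would raise never happens.
jq -rn 'def ignore(f): "f was never evaluated"; ignore(error("boom"))'
# prints: f was never evaluated

# first/1 stops its argument after one output, so the large range is never run to completion.
jq -n 'first(range(1000000000))'
# prints 0
```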

Consider the limit/2 builtin function: it outputs the first $n values of its second argument thunk:

$ time jq -cn '[limit(5; range(1000000))]'
[0,1,2,3,4]

real    0m0.02s
user    0m0.02s
sys     0m0.00s
$

In fact, the limit/2 builtin function really does limit how many values its second argument produces. No matter how many values its second argument wants to produce, once the $nth value is reached, evaluation stops.

Here’s a definition of limit/2:

def limit($n; exp):
  if $n < 0 then exp        # negative limit: pass every output through
  elif $n == 0 then empty   # zero limit: output nothing
  else label $out | foreach exp as $item ($n; .-1; $item, if . <= 0 then break $out else empty end)
  end;

Incidentally, function arguments named $name are just a small amount of syntactic sugar. The following definition of limit/2 is equivalent to the above:

def limit(n; exp):
  n as $n |
  if $n < 0 then exp
  elif $n == 0 then empty
  else label $out | foreach exp as $item ($n; .-1; $item, if . <= 0 then break $out else empty end)
  end;

Streaming vs Arrays

Arrays are not lazy in jq, therefore they always have a definite size, and they take up O(N) space.

jq expressions and functions can output zero, one, or more values. A jq expression that outputs one billion values takes up O(1) memory, not O(N). Therefore “streaming”, i.e., generating many values, is cheaper than collecting those values into an array.

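Consider the map function, and a variant that streams:

def map(f): [.[] | f];
def map_stream(f): .[] | f;

The first, map/1, is the standard “map” function one finds in most functional programming languages. The second, here called map_stream/1 (a name chosen for illustration), is a streaming version of map/1 that outputs each result as soon as it is produced. Note that jq’s actual map_values/1 builtin is not a streaming variant: it is defined as .[] |= f and outputs the updated array or object.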

Whenever possible, jq programmers should prefer to stream values.
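As a rough illustration of the difference, both commands below count 100000 values, but only the second one materializes them as an array first. The count is arbitrary, and jq 1.5 or later is assumed:

```shell
# Streaming: the values are consumed one at a time by the reduction.
jq -n 'reduce range(100000) as $i (0; . + 1)'
# prints 100000

# Collecting: a 100000-element array is built before its length is taken.
jq -n '[range(100000)] | length'
# prints 100000
```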

Reductions

jq has a couple of reduction primitives:

reduce stream_expression as $name (initial_value; update_expression)

and

foreach stream_expression as $name (initial_value; update_expression; extract_expression)

These allow the programmer to apply an update expression successively to its own outputs, but with a lexical binding for each of the stream_expression’s outputs.

E.g., reduce range(5) as $n (0; .+$n) adds the numbers from 0 through 4, inclusive. In this example $n in the update expression is bound to each successive output of the stream expression (which here is range(5)), and the expression .+$n is applied to the reduction’s state value, and the output of .+$n becomes the next reduction state value. When the stream expression runs out of outputs, the final reduction state value is output.

The foreach reduction operator can output intermediate state values, and will do so whenever the third expression, the extraction expression (optional and by default equal to .), outputs a value (if it outputs no values, then foreach will update the reduction state with the next input). (Note: it probably would have been best to not introduce a new syntactic construct for this, just add an expression to the existing reduce construct.)

Note that though a reduction like reduce range(5) as $n (0; .+$n) is equivalent to 0 | . + 0 | . + 1 | . + 2 | . + 3 | . + 4, jq uses much less state to implement the reduction.

Note that while the state update expression is running, jq does not retain any additional references to that expression’s input value. This means that from the second update forward, the reduction state value never has more than one reference. This is critical because when values have just one reference, then “mutation” operations that normally copy-then-write, just mutate in-place. See more about this below.
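The two reduction forms can be compared side by side. This sketch assumes jq 1.5 or later, and wraps the foreach calls in [ ... ] only to print their outputs on one line:

```shell
# reduce outputs only the final state.
jq -cn 'reduce range(5) as $n (0; . + $n)'
# prints 10

# foreach with the default extraction expression outputs every intermediate state.
jq -cn '[foreach range(5) as $n (0; . + $n)]'
# prints [0,1,3,6,10]

# An extraction expression can filter the states that are output.
jq -cn '[foreach range(5) as $n (0; . + $n; select(. % 2 == 0))]'
# prints [0,6,10]
```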

Path Expressions

A path expression is any expression which when given to path(EXPR), does not yield an error. This is a terrible description. Let us try again.

A path expression is any expression which is composed entirely of:

.
array/object traversal operators (index computation operations, however, need not be path expressions),
- the object/array iterator .[]
- .[KEY_EXPR] in all its variants, but note that KEY_EXPR itself need not be a path expression:
  - .ident
  - ."string key"
  - .["string key"]
  - .[INTEGER]
- .[start_index:end_index] array slice operator
if ... where the then/else branches are path expressions (the condition expression, however, need not be a path expression),
path expressions chained with |,
$binding forms whose body is a path expression (e.g., (.+.) as $x | ...; the expression being bound need not itself be a path expression),
multiple path expressions joined with ,,
empty, error, and break $label, and last but not least:
function calls where their bodies all consist of path expressions (even recursive functions).

Every kind of expression not listed above is not a path expression and is an error to include in the left-hand side of an assignment, including:

expressions using any of the binary operators +, -, *, /, %,
expressions using the unary operator -,
any reference to $bindings,
any literals,
any other expressions not listed here that are also not listed above as being path expressions.

Not every expression is a path expression. For example, .a.b is a path expression, but .a + .b is not! foo is a path expression if and only if the body of function named foo is a path expression.

The purpose of defining "path expression" is to enable assignment forms. The left-hand side expression of every assignment is always internally passed to path/1.

The path/1 builtin (path(path_expression)) outputs arrays of strings and numbers representing the paths through the input value matched by the given path_expression. path/1 is, essentially, a pattern-matching primitive. Thus null | path(.a[0].b) outputs ["a",0,"b"]. The assignment operators are syntactic sugar that use path(left_hand_side) to compute paths to then call setpath/2 with to set new values at those paths in the input to the assignment.

As we’ll see in Assignments, the path/1 built-in is essential to the construction of assignment operators.

Because in jq arguments to functions are thunks, it is not possible from local syntactic analysis to tell whether an expression must be a path expression — a function’s body might or might not pass a thunk to path/1. One must either inspect the function’s documentation or its body. (It should be possible for the jq compiler to determine if some expression is a path expression, and also to determine if a function argument must be a path expression, thus being able to report path expression errors at compile-time. However, the jq compiler is not that smart at this point.)

Passing a non-path expression to path/1 will yield a run-time error, so it is important to know which expressions must be path expressions. As we’ll see in Assignments, the left-hand side expressions of assignment forms must be path expressions.
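The following sketch shows path/1 accepting a path expression and rejecting a non-path expression. The sample objects and the catch message are made up for the illustration, and jq 1.5 or later is assumed:

```shell
# A path expression: one path is output per matched location.
jq -cn '{a: [{b: 1}, {b: 2}]} | path(.a[].b)'
# prints ["a",0,"b"] then ["a",1,"b"]

# .a + .b is not a path expression, so path/1 raises a run-time error, caught here.
jq -cn '{a: 1, b: 2} | try path(.a + .b) catch "not a path expression"'
# prints "not a path expression"
```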

Given a datum like {"a":{"b":[{"c":0},{"d":1}]}} we can have path expressions like:

.. => matches all paths in the input
.a.b[0].c => matches the path to the value 0
.a.b[1].d => matches the path to the value 1
.a[][][] => matches all the leaf paths in this input
.a.b|.. => matches all paths below .a.b

and so on.

Examples:

$ printf '%s\n' '{"a":{"b":[{"c":0},{"d":1}]}}' | jq -c 'path(..)'
[]
["a"]
["a","b"]
["a","b",0]
["a","b",0,"c"]
["a","b",1]
["a","b",1,"d"]

Sub-expressions that are not Path Expressions

Here we’ll expose some of jq’s internals for the purpose of listing all of the sorts of sub-expressions of path expressions that are exempted from having to contribute to path-building. The reader can gloss over the internal details if they wish and focus only on the list of exemptions below. (XXX Perhaps we should remove all internal details?)

The jq VM interpreter has four special opcodes for dealing with path expressions:

PATH_BEGIN and PATH_END, which bracket calls to the path expression argument to path/1, and
SUBEXP_BEGIN and SUBEXP_END, which bracket calls to sub-expressions which are not intended to contribute to path building.

For example, conditional expressions in if forms are bracketed with SUBEXP_BEGIN and SUBEXP_END opcodes.

Thus we can look at all the forms where bytecode is generated via gen_subexp() to see what sorts of expressions are exempted from having to contribute to path-building:

evaluation of index expressions such as index_expr in .[index_expr] (see gen_index())
evaluation of array slice start/end expressions such as start_exp and end_exp in .[start_exp:end_exp] (see gen_slice_index())
evaluation of empty object construction, {} (see '{' MkDict '}' case of Term in src/parser.y)
evaluation of object key and value expressions in object construction syntax (see gen_dictpair())
evaluation of conditional expressions (see gen_cond())
evaluation of value expressions in data symbol binding forms (see gen_var_binding()) (i.e., in path(5 as $five | ...), the 5 does not contribute to path building, whereas path(5 | ...) would yield a run-time error)
evaluation of path expressions in destructuring, which is a generalized form of data symbol binding (see gen_array_matcher() and gen_object_matcher())
evaluation of argument expressions in calls to C-coded built-in jq functions (see expand_call_arglist())

We have had bugs in the past relating to incorrect or missing uses of gen_subexp(), and bugs related to insufficient or excessive run-time sanity checking of path-building. See path_intact() and path_append() in src/execute.c.

Note too that path-building context can nest. That is, one can have path expressions with path expressions inside them. This is done by making path building context part of expression evaluation stack frames (jq has a stack, naturally). An example is foo = 1 where foo/0 has a body that itself uses path/1.

Assignments

jq has assignment operators. But jq values are immutable. So how can jq possibly have assignments?!

Well, assignments in jq desugar into reductions over the paths matched by the path expressions on the left-hand side (LHS) modifying the values at those paths (in the input value) according to the right-hand side (RHS) expression. Modifications are copy-on-write modifications (and, when there is just one reference to a value, the modifications are in-place as an optimization).

The use of path expressions can make jq assignments resemble Lisp generalized variables (setf macros), or Icon place references. For example, here we see a function foo functioning a lot like a Lisp generalized variable (Lisp setf macros):

$ jq -cn 'def foo: .a.b; {a:{b:{c:0}}}|(foo.c += 1)'
{"a":{"b":{"c":1}}}

Note that foo here is a function whose body is a path expression, and that one would normally use such a function to extract sub-values of ., but here jq is able to let assignments work with this function foo as if foo itself were a path expression! This is what is termed "generalized variables": the ability to assign values through arbitrarily complex code (including function calls) that is otherwise only meant for reading.

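The += assignment operator desugars roughly to rhs as $v | lhs |= . + $v (the RHS is evaluated against the input of the assignment, not against the value at each path), and |= desugars into _modify(lhs; rhs), and _modify is defined as (simplified):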

def _modify(paths; update):
    reduce path(paths) as $p (.; setpath($p; getpath($p) | update));

Note that the lhs in assignments ultimately gets passed to path/1, thus making the LHS of assignments… path expressions!

What does _modify/2 do? It:

produces all the paths in the input value that are matched by paths, as arrays of path component numbers and/or strings (path(paths))
reduces these with the original input value as the initial reduction state
for each path it gets the value in the input at that path (getpath($p))
evaluates update on that value (getpath($p) | update)
and finally “mutates” the reduction state value (.) by setting the new value at the same path (setpath($p; ...))

It’s important to note that values are immutable, which means that all mutation operations return a new copy of their input modified according to the desired mutation. Thus setpath(...; ...) doesn’t modify its input, but it produces a new value as its output that is a copy of the input modified according to setpath()’s arguments.
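A short sketch makes this visible by keeping a binding to the original value and comparing it with the output of setpath/2. The object is illustrative, and jq 1.5 or later is assumed:

```shell
# $orig is untouched; setpath/2 outputs a modified copy.
jq -cn '{a: 1} as $orig | ($orig | setpath(["a"]; 2)) as $new | [$orig, $new]'
# prints [{"a":1},{"a":2}]
```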

It’s also important to note that whenever there is a single reference to a value, internally jq will in fact mutate it rather than copy it, and this is obviously correct and performant.

All the assignment operators except = work this way. Those that combine operators like +, -, and so on, with assignment, desugar roughly into _modify(lhs; . OPERATOR $v), with $v bound to the RHS as evaluated against the input of the assignment, while |= desugars into _modify(lhs; rhs).

The = operator evaluates the RHS against the same input value that the LHS receives, and desugars into _assign(paths; value). _assign() is defined as:

def _assign(paths; value):
    value as $v | reduce path(paths) as $p (.; setpath($p; $v));

Note that _assign() applies value (the RHS) to its input once at the beginning, creates a lexical binding for that value ($v), and then sets all the paths to that value $v. Thus .[] = range(5) will produce five outputs, each with all the value slots in the . array or object set to 0, then all set to 1, and so on. This can be surprising.
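Both effects can be seen with small inputs. This is a sketch with illustrative values, assuming jq 1.5 or later:

```shell
# A generator on the RHS of = yields one whole output per generated value.
jq -cn '[1, 2] | .[] = range(3)'
# prints [0,0], then [1,1], then [2,2]

# = evaluates the RHS against the assignment's input; |= evaluates it against the value at the path.
jq -cn '{a: {b: 10}, b: 20} | (.a = .b), (.a |= .b)'
# prints {"a":20,"b":20} then {"a":10,"b":20}
```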

In modify-assignments (|=, +=, etc.), it makes no sense to have more than one output in the value update expression. The actual _modify looks like this:

def _modify(paths; update):
    reduce path(paths) as $p (
        .;
        label $out | (setpath($p; getpath($p) | update) | ., break $out),
                      delpaths([$p]));

which means that when the value update expression outputs more than one value, only the first is used, and when it outputs no values, then the path is deleted. E.g., .a |= select(.%2 == 1) + 1 deletes .a from . if the value at .a is an even number, else it adds one to it:

$ jq -cn '{a:0,b:true}|.a |= select(.%2==1) + 1'
{"b":true}
$ jq -cn '{a:1,b:true}|.a |= select(.%2==1) + 1'
{"a":2,"b":true}

while .a |= range(5) sets .a to 0:

$ jq -cn '{a:1,b:true}|.a |= range(5)'
{"a":0,"b":true}

Built-in Functions

There are three types of built-in functions:

jq-coded functions

These are functions defined in src/builtin.jq, and they are compiled like any user-defined functions.

bytecoded functions

These are functions defined in src/builtin.c, and they consist of hand-crafted block representations of jq programs. (A block is an AST-ish output of the jq program parser, which straightforwardly gets compiled to bytecode.)

For example, the empty built-in function has a one-opcode body, and that opcode is BACKTRACK.

The full list of bytecoded built-in functions is very short, at this time being just:
- empty/0
- not/0
- path/1
- range/1

range/2 and range/3 are jq-coded, not bytecoded, and are made possible by tail recursion optimization.

C-coded jq functions

These functions are defined in src/builtin.c. These functions do not actually accept thunks as arguments, only values, therefore the jq compiler wraps invocations of C-coded functions with a bytecoded wrapper that applies any argument thunks to ., roughly like so: def _jq_call_c_coded_foo(a; b): a as $a | b as $b | _call_c_coded_foo($a; $b);.
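The effect of that wrapper is visible from the outside: argument expressions are evaluated against ., and an argument that generates several values results in one call per value. The sketch below assumes that has/1 and startswith/1 are C-coded, which appears to be the case in src/builtin.c at the time of writing; the inputs are made up.

```shell
# The argument expression .k is evaluated against the input, yielding "a".
jq -cn '{k:"a", a:1} | has(.k)'
# prints: true

# A generator argument produces one call per generated value.
jq -cn '"abc" | startswith("a", "b")'
# prints: true
#         false
```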

C-coded functions have C prototypes of the form jv name(jv input) for functions with no expression arguments, jv name(jv input, jv a) for functions with one, jv name(jv input, jv a, jv b) for functions with two, and so on up to six arguments. jq-coded functions have no such fixed limit on the number of expression arguments they accept, but they are limited to however many arguments they can address, given that compiled jq-coded function bodies are limited to 2^16 opcodes each.

With the exception of if-then-else constructs, .[], and a few other such constructs, everything in jq involves applying functions.

Special Forms

[ expr ] is a special form that collects the outputs of expr into an array. It desugars into something like reduce expr as $value ([]; setpath([length]; $value)). The object constructor, { ... }, is similar.
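The rough equivalence between the array constructor and a reduction can be checked directly. This is only an illustration of the idea, not the actual desugaring; note that setpath takes a path array, hence [length].

```shell
# Collecting with the array constructor...
jq -cn '[range(3)]'
# prints: [0,1,2]

# ...and an approximately equivalent reduction that appends each output.
# Inside the update, length is applied to the array built so far.
jq -cn 'reduce range(3) as $value ([]; setpath([length]; $value))'
# prints: [0,1,2]
```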

If-then-else constructs are a special form.

There are a number of others, and these are all defined in src/parser.y, and are described in the manual.

A partial list of special forms follows:

import "name" as prefix; – imports the module named "name" and makes its symbols available as prefix::name
include "name"; – imports the module named "name" and makes its symbols available as if the module had been included verbatim
. – the current input value
literal values, i.e., numbers, "strings", true, false, and null
"this \(expr) interpolates the outputs of expr into this string" – string interpolation
binary infix operators
- comparison operators: ==, !=, <, >, <=, >=
- arithmetic operators: +, -, *, /, %
unary prefix negation operator -
[ expr ] – collect expr’s outputs into an array
object construction syntax (see manual)
term[index_expr] – output the value at index_expr in term
term . ident – same as term["ident"]
term . "name" – same as term["name"]
.. – produce all the values in . recursively, in pre-order
term[start_expr : end_expr] – array slice operator
expr ? – suppress errors from expr
label $name | ... | break $name – fancy empty that unwinds all of ...
assignment operator: =
modify-assignment operators: |=, +=, -=, *=, /=, %=
logical operators: not, and, or
$__loc__ – evaluates to the {file: FILENAME, line: LINENO} where $__loc__ occurs
$ident – value binding’s value
ident – applies function ident to .
ident(expr) – applies ident called with expr to .
ident(expr0; expr1) – applies…
comma operator , – outputs the values of the expression to the left, then those of the expression to the right, both expressions applied to the same input value
if cond_expr0 then true_expr0 elif cond_expr1 then true_expr1 ... else false_expr end
try expr catch handler_expr – invokes handler_expr on the error raised by expr, if any
reduction syntax (see above)
function definition (see elsewhere here)
data symbol binding (expr as $name | ...) and destructuring syntax (see manual)
- expr as $name | ...
- expr as [$name, $other_name] | ...
- expr as {$name, $other_name} | ...
- expr as {$name:[$thing1, $thing2], $other_name} | ...
@sh, @json, @csv, @tsv, @html, @uri, @base64, @base64d – format / escape string forms
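A few of the forms listed above are easier to grasp from a run than from a description. The commands below are a small sampler, assuming a jq binary is on the PATH; the values are arbitrary.

```shell
# label/break: stop a generator early.
jq -cn '[label $out | range(10) | if . == 3 then break $out else . end]'
# prints: [0,1,2]

# try/catch: the handler receives the error value as its input.
jq -cn 'try error("boom") catch .'
# prints: "boom"

# Postfix ?: iterating over null is an error, which ? turns into no output.
jq -cn '[.[]?]'
# prints: []

# Destructuring a nested array into data symbols.
jq -cn '[1,[2,3]] as [$a,[$b,$c]] | $a + $b + $c'
# prints: 6

# String interpolation of a generator yields one string per output.
jq -cn '"n=\(1,2)"'
# prints: "n=1"
#         "n=2"
```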

Note that path(expr), though very special, is not a special form. path(expr) is a bytecoded function whose body invokes its argument expression thunk bracketed with opcodes that cause the paths in . traversed by that expression to be recorded and output one by one.
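To make the recording of paths concrete, here is a short sketch (assuming a jq binary is on the PATH) that shows path outputting the path itself rather than the value found there.

```shell
# path() outputs the path an expression traverses; the input need not contain it.
jq -cn 'path(.a[0].b)'
# prints: ["a",0,"b"]

# With a generator, one path is output per value the expression visits.
jq -cn '{a:{b:1},c:[2]} | [path(..)]'
# prints: [[],["a"],["a","b"],["c"],["c",0]]
```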

Operators Priority

Order: from highest precedence to lowest.

Operator	Associativity	Description
`?//`	nonassoc	destructuring alternative operator
`?`	none	postfix operator, coerces errors to `empty`
`-`	none	prefix negation
`*` `/` `%`	left	polymorphic multiply and divide; modulo
`+` `-`	left	polymorphic plus and minus
`==` `!=` `<` `>` `<=` `>=`	nonassoc	equality and ordering comparisons
`and`	left	boolean “and”
`or`	left	boolean “or”
`=` `|=` `+=` `-=` `*=` `/=` `%=` `//=`	nonassoc	assign; update
`//`	right	coerces `null`, `false` and `empty` to an alternative value
`,`	left	concatenate/alternate two filters
`|`	right	compose/sequence two filters
`(...)`		scope delimiter and grouping operator
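A few of these relationships can be confirmed with expressions whose result would differ under the opposite grouping. This is an illustrative sketch, assuming a jq binary is on the PATH.

```shell
# * binds tighter than +: parsed as 1 + (2 * 3).
jq -cn '1 + 2 * 3'
# prints: 7

# and binds tighter than or: parsed as true or (true and false).
jq -cn 'true or true and false'
# prints: true

# , binds tighter than |: parsed as (1, 2) | (. * 10).
jq -cn '[1, 2 | . * 10]'
# prints: [10,20]

# // binds tighter than , : parsed as 1, (null // 2).
jq -cn '[1, null // 2]'
# prints: [1,2]
```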

List of Built-in Functions

Use jq -nr 'builtins[]' to list all the built-in functions.
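Since builtins outputs plain name/arity strings, the list can be filtered like any other data. For instance, the following sketch lists only the arities of range; the exact set of built-ins depends on the jq version in use.

```shell
# List the built-ins named range, one name/arity entry per line.
jq -nr 'builtins[] | select(startswith("range/"))' | sort

# Count all built-ins known to this jq binary.
jq -n 'builtins | length'
```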

At this time that list includes:

IN/1
IN/2
INDEX/1
INDEX/2
IN_INDEX/2
JOIN/2
JOIN/3
JOIN/4
LOOKUP/2
UNIQUE_INDEX/2
acos/0
acosh/0
add/0
all/0
all/1
all/2
any/0
any/1
any/2
arrays/0
ascii_downcase/0
ascii_upcase/0
asin/0
asinh/0
atan/0
atan2/2
atanh/0
booleans/0
bsearch/1
builtins/0
capture/1
capture/2
cbrt/0
ceil/0
combinations/0
combinations/1
contains/1
copysign/2
cos/0
cosh/0
debug/0
del/1
delpaths/1
drem/2
empty/0
endswith/1
env/0
erf/0
erfc/0
error/0
error/1
exp/0
exp10/0
exp2/0
explode/0
expm1/0
fabs/0
fdim/2
finites/0
first/0
first/1
flatten/0
flatten/1
floor/0
fma/3
fmax/2
fmin/2
fmod/2
format/1
frexp/0
from_entries/0
fromdate/0
fromdateiso8601/0
fromjson/0
fromstream/1
gamma/0
get_jq_origin/0
get_prog_origin/0
get_search_list/0
getpath/1
gmtime/0
group_by/1
gsub/2
gsub/3
halt/0
halt_error/0
halt_error/1
has/1
hypot/2
implode/0
in/1
index/1
indices/1
infinite/0
input/0
input_filename/0
input_line_number/0
inputs/0
inside/1
isempty/1
isfinite/0
isinfinite/0
isnan/0
isnormal/0
iterables/0
j0/0
j1/0
jn/2
join/1
keys/0
keys_unsorted/0
last/0
last/1
ldexp/2
leaf_paths/0
length/0
lgamma/0
lgamma_r/0
limit/2
localtime/0
log/0
log10/0
log1p/0
log2/0
logb/0
ltrimstr/1
map/1
map_values/1
match/1
match/2
max/0
max_by/1
min/0
min_by/1
mktime/0
modf/0
modulemeta/0
nan/0
nearbyint/0
nextafter/2
nexttoward/2
normals/0
not/0
now/0
nth/1
nth/2
nulls/0
numbers/0
objects/0
path/1
paths/0
paths/1
pow/2
pow10/0
range/1
range/2
range/3
recurse/0
recurse/1
recurse/2
recurse_down/0
remainder/2
repeat/1
reverse/0
rindex/1
rint/0
round/0
rtrimstr/1
scalars/0
scalars_or_empty/0
scalb/2
scalbln/2
scan/1
select/1
setpath/2
significand/0
sin/0
sinh/0
sort/0
sort_by/1
split/1
split/2
splits/1
splits/2
sqrt/0
startswith/1
stderr/0
strflocaltime/1
strftime/1
strings/0
strptime/1
sub/2
sub/3
tan/0
tanh/0
test/1
test/2
tgamma/0
to_entries/0
todate/0
todateiso8601/0
tojson/0
tonumber/0
tostream/0
tostring/0
transpose/0
trunc/0
truncate_stream/1
type/0
unique/0
unique_by/1
until/2
utf8bytelength/0
values/0
walk/1
while/2
with_entries/1
y0/0
y1/0
yn/2

Side-Effects

Most jq built-in functions are pure, but over time we have added a few impure functions:

input – read one input from the standard input (or whatever the jq command-line processor wants to read from)
inputs – read as many inputs from the standard input as possible (or whatever the jq command-line processor wants to read from)
input_filename – name of the file whose input is currently being filtered
debug – output its input to the standard error output
halt/0, halt_error/0, and halt_error/1 – stop execution (halt_error/0 and halt_error/1 with an error)
now – current time
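The input-reading functions are the ones most often used in practice. A brief sketch of their behavior, assuming a jq binary is on the PATH and using -n so that the jq command-line processor does not consume the first input itself:

```shell
# inputs drains all remaining inputs.
printf '1 2 3' | jq -cn '[inputs]'
# prints: [1,2,3]

# input reads just the next one.
printf '1 2 3' | jq -cn 'input'
# prints: 1

# debug passes its input through on stdout and writes a message to stderr.
jq -cn '1 | debug' 2>/dev/null
# prints: 1
```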

Side-Effects Wish-list

We’d like to add:

random numbers
file I/O
external command I/O
SQLite3 access
...

Keywords

The jq language has relatively few keywords. These cannot be used as function or data symbols, but they can be used as keys in object construction syntax. (Note: we could allow keywords as data symbols, but not as function symbols.)

If we update jq to allow keywords as data symbols, we will also allow keywords in destructuring syntax.

Keywords:

$__loc__
and
as
break
catch
def
elif
else
end
foreach
if
import
include
label
module
or
reduce
then
try
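As noted above, keywords remain usable as bare keys in object construction. A one-line check, assuming a jq binary is on the PATH:

```shell
# Keywords are accepted as bare keys in object construction.
jq -cn '{if: 1, then: 2}'
# prints: {"if":1,"then":2}
```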
