-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Advanced Topics
- Scope of Symbols
- Closures
- Local functions
- Generators
- Streams
- Recursion and Tail Recursion Optimization
jq is scoped strictly lexically. Symbols are only visible to entities in their implementation or following them. In this example:
def foo(x): x * 2;
def bar(y): foo(y);
bar(.) # main
foo
is visible to bar
and the main program, bar
is visible to the main program, x
is visible only within the implementation of foo
, and y
is visible only in the implementation of bar
.
This is true of all symbols, including $variable
s. It is even true of function's names within their own bodies (otherwise jq wouldn't support recursion). E.g., def rec: .*2 | rec;
is recursive, and it can recurse only because rec
binds to itself with within rec
's body.
The only exception is that functions defined with def
are self-bound, to enable recursion. That is, def foo: ...;
has foo
bound to itself in its body so that it may recurse by calling itself.
Variables defined with the --arg
and --argfile
command-line arguments are global, visible to all functions and the main program as long as they are not shadowed.
There are no "dynamic" variables. There is no assignment to variables, but variable and function declarations shadow preceding ones that have the same name.
In LISP terms, jq is a LISP2 because it has separate namespaces for functions and values, but there is no need for FUNCALL
/APPLY
because there are no function values.
All functions in jq are closures. Top-level functions capture their lexical namespace, which consists of prevously defined functions only. Local functions capture their lexical namespace, which consists of all "visible" bindings (e.g, $foo
) and all "visible" function names.
All function arguments in jq are actually functions whose bodies are the actual argument expressions. I.e., function arguments are closures; referring to a function argument calls the corresponding closure.
There are no closure values: closures can only be passed to functions.
Function arguments are not values.
Closures capture the full power of jq expressions, and may generate multiple results. For example:
def foo(x): .a as $a | x.b * 2;
{"a":{"a":3, "b":4}} | .a as $a | foo($a)
This should output 8.
Here x
is a closure, and in the one invocation of foo(x)
the body of x
is $a
, which is the $a
that was visible at the call site of foo
. The $a
binding made within foo
is not visible in the body of the argument closure x
.
jq supports local functions -- functions defined within functions and scoped to them.
Local functions are often appropriate when a function requires a private helper function.
Currently only functions with no arguments are amenable to tail-call optimization (TCO), which makes local functions useful for taking advantage of TCO. See recursion examples below.
Here is an example from jq's builtins:
def range(init; upto; by):
def _range:
if (by > 0 and . < upto) or (by < 0 and . > upto) then ., ((.+by)|_range)
else .
end;
if by == 0 then init
else init|_range
end
| select((by > 0 and . < upto) or (by < 0 and . > upto)) ;
The local helper _range
is needed to take advantage of TCO.
- The local function definition(s) must be placed ahead of the body of the function (otherwise they wouldn't be visible to the function's body, given jq's strict lexical scoping).
- Local functions may themselves contain local functions.
- The function arguments of a function are visible to its local functions.
- Shadowing of symbols is permitted.
All jq expressions consume an input value and may produce zero, one, or more output values. empty
produces no values, for example.
Expressions such as range(0;10) generate multiple outputs for each input. Each output is passed along to the next pipeline stage as its input, or if the generating expression is the last expression, then it is printed on the standard output stream.
When an expression in a pipeline completes, it backtracks to the previous expression in the pipeline.
Primitive generator expressions:
- The comma operator:
1,2
produces two values,1
then2
. - The
.[]
operator produces all the values in.
(which can be an array or object). - The
range
builtin. - The
path
builtin.
Some noteworthy points about jq generators:
- Generators generate all their values to completion. Generally speaking, generators can only be interrupted by throwing an error with
error
or bybreak
ing. But see below. - If f is a jq function and if g is a generator expression, then the expression f(g) is evaluated without first generating all the values of g.
- It's very easy to get cartesian product behavior if one isn't careful.
Consider:
range(0; 2)
0
1
{"a": 1} | has("a")
true
{"a": 1} | has("a", "a")
true
true
In the second invocation of has
the argument is a closure with "a", "a"
as its body, which produces "a" twice, so has()
outputs twice as to whether the given string is a key in the input value.
Expressions are evaluated lazily in jq, but in jq 1.4 and earlier, all outputs of an expression must be generated (even if they are subsequently filtered out).
In jq1.5rc1+ there are builtins and new syntax (foreach
, break
) that allow one to interrupt generators. For example, the limit/2
builtin outputs only the first N outputs of a given expression:
limit(2; range(0;10))
0
1
limit/2 achieves this power courtesy of the special construct foreach
. The range(0;10)
expression outputs only twice before its evaluation is terminated. That is, jq expressions are evaluated lazily.
In jq 1.5rc1+, there are other constructs useful for harnessing generators, notably foreach
and while
.
jq and jq filters are designed to process streams of JSON entities. A stream, however, is itself not a JSON entity, nor a jq value type.
If the input to a jq filter is a stream of entities, then each is presented to the filter separately. What a filter does with each item will depend on the filter. For example, empty
never produces any output, whereas range(m,n) produces a stream of n-m integers for each input item.
The "," operator simply concatenates streams. For example, the expression
[ range(0;10), range (10; 20) ]
produces the same array as to [ range(0;20) ].
Suppose f/1 is a jq filter (i.e. a filter with arity 1), and that S is a stream. If (null | f(S))
emits the same stream as S | f(.)
, then we say that "f is regular at its first argument". Since regular functions have this regularity property, they have a certain predictability, and most jq functions are regular, but in special cases, the irregular behavior is precisely what is intended.
In jq1.4 and earlier one can define such functions like this:
def f(g): g as $g | ...; # only referring to $g in ...
In jq1.5rc1 a "regular" function can be defined with convenient new syntax:
def f($g): ...; # only referring to $g in ...
Examples:
-
if S then T else U end
is regular at every position. - Short-circuit semantics:
and
is regular in the first input but irregular in the second.- If S is a stream with n items, then
S | (false and .)
is a stream of n values, but (null | (false and S)) simply yields false.
- If S is a stream with n items, then
The previous example shows how a stream can be converted to an array. Arrays can be converted into streams using the postfix []
operator, or | .[]
, e.g. [1,2][]
yields 1
, then 2
.
Some filters have a kind of "Cartesian product" behavior: for example, suppose f has this definition:
def f(x;y): [., x, y];
If S, Sx and Sy are streams of respectively Sn items, Sxn and Syn items, then the expression: S | f(Sx;Sy)
will produce a stream of Sn * Sxn * Syn arrays.
This Cartesian product behavior is, in a sense, the norm, so there can be surprises. For example, the expression "\( (true, false) ) and \( (true, false) )"
produces 4 strings, whereas the expression (true, false) and (true, false)
produces only three boolean values, because 'false and (true, false)' evaluates simply to false
as the generator is never given the chance to start.
In general, the use of recursion is limited by stack depth, and deep recursion is slow. But in jq 1.5rc1+ an important class of tail-recursive functions are automatically optimized to avoid this. In particular, arity-0 tail-recursive functions are automatically optimized.
It's often possible to rewrite a recursive function of any arity into a tail-recursive 0-arity function using a local 0-arity function.
Consider, for example, the standard recursive definition of the factorial function:
def fact(n):
if n <= 1 then n
else n * fact(n-1)
end;
This is nearly tail-recursive, but not quite. To make it tail-recursive, one can introduce an accumulator. To do this and to avoid the recursion-depth penalty, we will introduce a 0-arity tail-recursive local function:
def fact(n):
def _fact:
# Input: [accumulator, counter]
if .[1] <= 1 then .
else [.[0] * .[1], .[1] - 1]| _fact
end;
# Extract the accumulated value from the output of _fact:
[1, n] | _fact | .[0] ;
The trick is to use an local function, and to set things up so that all the information it needs at each step is available either from its inputs or from lexical environment captured by the local 0-arity function.
As a footnote, it should be emphasized that because jq has generators, an efficient way to compute factorials is to use reduce
:
def fact: reduce range(1; .+1) as $i (1; . * $i);
- Home
- FAQ
- jq Language Description
- Cookbook
- Modules
- Parsing Expression Grammars
- Docs for Oniguruma Regular Expressions (RE.txt)
- Advanced Topics
- Guide for Contributors
- How To
- C API
- jq Internals
- Tips
- Development