-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Advanced Topics
TOC
- Scope of Symbols
- Closures
- Inner functions
- Generators
- Streams
- Recursion and Tail Recursion Optimization
jq is scoped strictly lexically. Symbols are only visible to entities in their implementation or following them. In this example:
def foo(x): x * 2;
def bar(y): foo(y);
bar(.) # main
foo
is visible to bar
and the main program, bar
is visible to the main program, x
is visible only within the implementation of foo
, and y
is visible only in the implementation of bar
.
This is true of all symbols, including $variable
s. It is even true of function's names within their own bodies (otherwise jq wouldn't support recursion). E.g., def rec: .*2 | rec;
is recursive, and it can recurse only because rec
binds to itself with within rec
's body.
Variables defined with the --arg
and --argfile
command-line arguments are global, visible to all functions and the main program as long as they are not shadowed.
There are no "dynamic" variables. There is no assignment to variables, but variable declarations shadow preceding ones on conflict.
All function arguments in jq are closures. There are no closure values: closures can only be passed to functions.
Closures capture the full power of jq expressions, and may generate multiple results. For example:
def times(x; y): x as $x | y as $y | $x * $y;
times(.[]; .[])
here x
and y
are closures. As called these closures are both .[]
, which can generate many results. More on this below.
jq supports local functions -- functions defined within functions and scoped to them.
Local functions are often appropriate when a function requires a specialized helper function that makes specific assumptions about its input or arguments.
Generally only functions with no arguments are amenable to tail-call optimization (TCO), which makes local functions useful for taking advantage of TCO. See recursion examples below.
####Example#### Here is an example from jq's builtins:
def range(init; upto; by):
def _range:
if (by > 0 and . < upto) or (by < 0 and . > upto) then ., ((.+by)|_range)
else .
end;
if by == 0 then init
else init|_range
end
| select((by > 0 and . < upto) or (by < 0 and . > upto)) ;
- The inner function definition(s) must be placed ahead of the body of the function (otherwise they wouldn't be visible to the function's body, given jq's strict lexical scoping).
- Inner functions may themselves contain inner functions.
- The outer function parameters are visible to the inner functions.
- Shadowing of variables and parameters is permitted.
Expressions such as range(0;10) generate values. These values can, for example, be passed along individually in a pipeline to the next function, or collected into an array using "[ ... ]". When a sequence of generated values is finally emitted, each value is followed by a newline. empty
backtracks, which allows extant generators in an expression to produce potentially more results.
Primitive generator expressions:
- The comma operator:
1,2
produces two values,1
then2
. - The
.[]
operator. - The
range
builtin. - The
path
builtin.
There are at least three noteworthy points to be made about jq generators:
- Unless specifically harnessed (as discussed in the next section), generators generate all their values to completion.
- Functions that have at least one argument and that have been written as though at least one of the arguments will be a single value will act as generators if a generator is specified instead.
- If f is a jq function and if g is a generator expression, then the expression f(g) is evaluated without first generating all the values of g.
The following examples using the builtin has/1
illustrate the first two points, and the third point is the topic of the following section. For brevity, only the expression and its value (or values) are shown:
range(0; 2)
0
1
{"a": 1} | has("a")
true
{"a": 1} | has("a", "a")
true
true
In jq 1.4+ (that is, in sufficiently recent versions AFTER 1.4), generators can be "reined in" or "harnessed". For example, limit(n; g)
will terminate the generator, g, after n values have been generated:
limit(2; range(0,10))
0
1
limit/2 achieves this power courtesy of the special construct named foreach
, but the important point here is that any jq function can be written to harness generators. That is, as mentioned above, jq does not compute f(g) by first generating the full sequence of values generated by g. It is as though g is passed, as an expression, into the body of the function and only probed for a value "on demand".
In jq 1.4+, there are other constructs useful for harnessing generators, notably foreach
and while
.
jq and jq filters are designed to process streams of JSON entities. A stream, however, is itself not a JSON entity.
If the input to a jq filter is a stream of entities, then each is presented to the filter separately. What a filter does with each item will depend on the filter. For example, empty
never produces any output, whereas range(m,n) produces a stream of n-m integers for each input item.
The "," operator simply concatenates streams. For example, the expression
[ range(0;10), range (10; 20) ]
produces the same array as to [ range(0;20) ].
Suppose f/1 is a jq filter (i.e. a filter with arity 1), and that S is a stream. If (null | f(S))
emits the same stream as S | f(.)
, then we say that "f is regular at its first argument". Since regular functions have this regularity property, they have a certain predictability, and most jq functions are regular, but in special cases, the irregular behavior is precisely what is intended.
Examples:
-
if S then T else U end
is regular at every position. - Short-circuit semantics:
and
is regular in the first input but irregular in the second.- If S is a stream with n items, then
S | (false and .)
is a stream of n values, but (null | (false and S)) simply yields false.
- If S is a stream with n items, then
#####Conversion between streams and arrays
The previous example shows how a stream can be converted to an array. Arrays can be converted into streams using the postfix []
operator, e.g. [1,2][]
yields range(1;3).
Some filters have a kind of "Cartesian product" behavior: for example, suppose f has this definition:
def f(x;y): [., x, y];
If S, Sx and Sy are streams of respectively Sn items, Sxn and Syn items, then the expression: S | f(Sx;Sy)
will produce a stream of Sn * Sxn * Syn arrays.
This Cartesian product behavior is, in a sense, the norm, so there can be surprises. For example, the expression "\( (true, false) ) and \( (true, false) )"
produces 4 strings, whereas the expression (true, false) and (true, false)
produces only three boolean values, because 'false and (true, false)' evaluates simply to false
as the generator is never given the chance to start.
In general, the use of recursion comes with a penalty proportional to the depth of recursion, but in jq 1.4+ (that is, in sufficiently recent versions of jq), an important class of tail-recursive functions are automatically optimized to avoid such a penalty. In particular, since around July 1, 2014, arity-0 tail-recursive functions are automatically optimized.
This optimization is often applicable when it is possible to transform a recursive function into a tail recursive function, since the latter can then be rewritten so that the only recursion occurs in a 0-arity tail-recursive inner function.
Consider, for example, the standard recursive definition of the factorial function:
def fact(n):
if n <= 1 then n
else n * fact(n-1)
end;
This is nearly tail-recursive, but not quite. To make it tail-recursive, one can introduce an accumulator. To do this and to avoid the recursion-depth penalty, we will introduce a 0-arity tail-recursive inner function:
def fact(n):
def _fact:
# Input: [accumulator, counter]
if .[1] <= 1 then .
else [.[0] * .[1], .[1] - 1]| _fact
end;
# Extract the accumulated value from the output of _fact:
[1, n] | _fact | .[0] ;
In summary, the trick is to use an inner function, and to set things up so that all the information it needs at each step is available either from its inputs or from the parameters of the outer function.
As a footnote, it should be emphasized that because jq has generators, an efficient way to compute factorials is to use reduce
:
def fact: reduce range(1; .+1) as $i (1; . * $i);
- Home
- FAQ
- jq Language Description
- Cookbook
- Modules
- Parsing Expression Grammars
- Docs for Oniguruma Regular Expressions (RE.txt)
- Advanced Topics
- Guide for Contributors
- How To
- C API
- jq Internals
- Tips
- Development