Search Keywords

William W. Kimball, Jr., MBA, MSIS edited this page Nov 14, 2022 · 18 revisions
  1. Introduction
  2. Available Keywords
  3. Descendant Searches

Introduction

In addition to Search Expressions, YAML Path has provided search keywords since version 3.5.0. Like programming-language keywords, these are reserved words written in a specific notation that keeps them from interfering with other YAML Path expressions and segments.

The general form of a keyword-based search is [KEYWORD(PARAMETERS)] with optional white-space, where:

  • The outermost [] pair is mandatory.
  • KEYWORD -- lower-case, case-sensitive -- is one of the reserved Available Keywords.
  • The following () pair is mandatory, even when no parameters are passed.
  • PARAMETERS is a comma-delimited set of parameters passed to the search keyword's function. In documentation, optional parameters are denoted by a [] pair around each parameter that can be omitted when not needed. Individual parameters can be demarcated using quotation (") or apostrophe (') marks to preserve white-space, and any special symbol can be escaped using a back-slash (\).

Some keywords support inversion, where nodes not matching the search are matched instead of those that do. Inversion is denoted by adding a ! prefix before the keyword itself, like [!KEYWORD(PARAMETERS)].
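The notation rules above can be illustrated with a small parser. This is only a sketch under stated assumptions, not YAML Path's own parser: `parse_keyword_segment` and `split_parameters` are hypothetical names, and the choice to discard unquoted white-space inside parameters is an assumption drawn from the rule that quoting preserves white-space.

```python
import re

# Hypothetical helpers; these are NOT part of the yamlpath library.
# Shape: [ optional "!" , lower-case KEYWORD , mandatory "(...)" ]
_SEGMENT = re.compile(r"^\[\s*(!?)\s*([a-z_]+)\s*\((.*)\)\s*\]$", re.DOTALL)


def split_parameters(raw):
    """Split a comma-delimited parameter string, honoring quotes and escapes."""
    params, current = [], []
    quote = None          # the quote mark currently open, if any
    escaped = False       # True right after a back-slash
    has_content = False   # True once any non-space character is seen
    for char in raw:
        if not char.isspace():
            has_content = True
        if escaped:
            current.append(char)      # escaped symbols are taken literally
            escaped = False
        elif char == "\\":
            escaped = True
        elif quote:
            if char == quote:
                quote = None          # closing quote mark
            else:
                current.append(char)  # white-space is preserved in quotes
        elif char in "\"'":
            quote = char
        elif char == ",":
            params.append("".join(current))
            current = []
        elif char.isspace():
            continue                  # assumption: unquoted white-space is ignored
        else:
            current.append(char)
    if has_content or params:
        params.append("".join(current))
    return params


def parse_keyword_segment(segment):
    """Return the inversion flag, keyword, and parameters of one segment."""
    match = _SEGMENT.match(segment)
    if match is None:
        raise ValueError(f"not a search keyword segment: {segment!r}")
    bang, keyword, raw = match.groups()
    return {
        "inverted": bang == "!",
        "keyword": keyword,
        "parameters": split_parameters(raw),
    }


print(parse_keyword_segment("[!has_child(name)]"))
```

In this sketch an upper-case keyword such as `[MAX()]` is rejected because keywords are lower-case and case-sensitive, and `[name()]` yields an empty parameter list because the `()` pair is mandatory even when nothing is passed.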

Available Keywords

The available Search Keywords include:

  • distinct([NAME]): [New in version 3.7.0] Match exactly one of every value within collections, discarding duplicates; e.g., [1, 2, 2, 3] has the distinct values [1, 2, 3]. This cannot be inverted.
  • has_child(NAME): Match nodes having a named, immediate child. This can be inverted.
  • max([NAME]): Match nodes having the highest value in the named child field. More than one node is returned when several nodes share that highest value. This can be inverted.
  • min([NAME]): Match nodes having the lowest value in the named child field. More than one node is returned when several nodes share that lowest value. This can be inverted.
  • name(): Match only the name of the present key, discarding any child nodes. This cannot be inverted.
  • parent([STEPS]): Step up to the present node's immediate parent, or take multiple STEPS. This cannot be inverted.
  • unique([NAME]): [New in version 3.7.0] Match only values which have no duplicates within collections; e.g., [1, 2, 2, 3] has the unique values [1, 3]. This can be inverted.
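To make the difference between some of these keywords concrete, the following plain-Python sketch mimics the described behavior of distinct, unique, max, and an inverted has_child on ordinary lists and dictionaries. It does not call YAML Path at all; the function names and the sample records are hypothetical, and skipping records that lack the named field in `max_by` is an assumption of this sketch.

```python
from collections import Counter


def distinct(values):
    """Keep exactly one of every value, in first-seen order."""
    return list(dict.fromkeys(values))


def unique(values):
    """Keep only the values that have no duplicates."""
    counts = Counter(values)
    return [value for value in values if counts[value] == 1]


def max_by(nodes, name):
    """Keep every node whose named child holds the highest value."""
    candidates = [node for node in nodes if name in node]  # assumption: skip nodes without the field
    highest = max(node[name] for node in candidates)
    return [node for node in candidates if node[name] == highest]


def has_child(nodes, name, inverted=False):
    """Keep nodes having the named immediate child, or lacking it when inverted."""
    return [node for node in nodes if (name in node) != inverted]


print(distinct([1, 2, 2, 3]))  # [1, 2, 3]
print(unique([1, 2, 2, 3]))    # [1, 3]

# Hypothetical records; two of them tie for the highest weight.
records = [
    {"name": "a", "weight": 4},
    {"name": "b", "weight": 10},
    {"name": "c", "weight": 10},
    {"name": "d"},
]
print([node["name"] for node in max_by(records, "weight")])           # ['b', 'c']
print([node["name"] for node in has_child(records, "weight", True)])  # ['d']
```

Note how the tie between records b and c returns both of them, matching the statement that more than one node can be returned by max and min.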

If you'd like to see more search keyword capabilities added to YAML Path, please open a Feature Request Issue with a full description of the capability you seek. Note, however, that where YAML Path already supports the described capability through other means, the request may be rejected after that existing capability has been discussed, unless it is agreed that the other means is onerous.

Descendant Searches

Search Keywords evaluate the data of the present node -- real or virtual Collector results -- or its immediate children; unlike Search Expressions, they do not support descendant searches. Instead, Search Keywords give you a different way of evaluating descendant node data, with the added flexibility of moving up or down the data hierarchy as needed.

Take for example the following data and subsequent query:

products_array:
  - product: doodad
    availability:
      start:
        date: 2020-10-10
        time: 08:00
      stop:
        date: 2020-10-29
        time: 17:00
    dimensions:
      width: 5
      height: 5
      depth: 5
      weight: 10
  - product: doohickey
    availability:
      start:
        date: 2020-08-01
        time: 10:00
      stop:
        date: 2020-09-25
        time: 10:00
    dimensions:
      width: 1
      height: 2
      depth: 3
      weight: 4
  - product: widget
    availability:
      start:
        date: 2020-01-01
        time: 12:00
      stop:
        date: 2020-01-01
        time: 16:00
    dimensions:
      width: 9
      height: 10
      depth: 1
      weight: 4

What if you wanted the name of whichever product has the maximum weight dimension? The answer comes from a single query, (products_array.*.dimensions.weight)[max()][parent(2)].product. Let's take a look at how this query is constructed, one step at a time.

Start by querying the weight of all products:

$ yaml-get --query='products_array.*.dimensions.weight' test_get_nodes.yaml
10
4
4

This reduces the data to only the values we need to compare. Expand the query to collect those weights and use the [max()] Search Keyword to select the maximum among them:

$ yaml-get --query='(products_array.*.dimensions.weight)[max()]' test_get_nodes.yaml
10

That is the maximum value but not the name of the product that has it. We need data from an ancestor of the result, which we can reach. Add the [parent(2)] Search Keyword to walk up two levels of the result's node ancestry, from weight to dimensions to the product record:

$ yaml-get --query='(products_array.*.dimensions.weight)[max()][parent(2)]' test_get_nodes.yaml
{"product": "doodad", "availability": {"start": {"date": "2020-10-10", "time": "08:00"}, "stop": {"date": "2020-10-29", "time": "17:00"}}, "dimensions": {"width": 5, "height": 5, "depth": 5, "weight": 10}}

And finally, expand the query one last time to get the product name -- what our original question asks for -- by selecting the product key:

$ yaml-get --query='(products_array.*.dimensions.weight)[max()][parent(2)].product' test_get_nodes.yaml
doodad

The answer to our original question is: doodad. We found this without needing a descendant search because we were able to perform the comparison right at the level where it was needed and then walk back up the node hierarchy to get whichever data element(s) we really wanted from the result.
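For readers who think more readily in code, the same four steps can be mimicked in plain Python. This is a conceptual sketch only, not how YAML Path is implemented: the helper names are hypothetical, the sample data is trimmed to the fields the query touches, and each match carries a list of its ancestors so that a later step can walk back up.

```python
# Trimmed copy of the sample data: only the fields the query touches.
products_array = [
    {"product": "doodad", "dimensions": {"weight": 10}},
    {"product": "doohickey", "dimensions": {"weight": 4}},
    {"product": "widget", "dimensions": {"weight": 4}},
]


def collect_weights(products):
    """Mimic (products_array.*.dimensions.weight), remembering ancestors."""
    matches = []
    for record in products:
        dimensions = record["dimensions"]
        # Ancestors are listed nearest first: [dimensions, product record].
        matches.append((dimensions["weight"], [dimensions, record]))
    return matches


def keep_max(matches):
    """Mimic [max()]: keep every match holding the highest value."""
    highest = max(value for value, _ in matches)
    return [match for match in matches if match[0] == highest]


def parent(matches, steps=1):
    """Mimic [parent(STEPS)]: replace each match with an ancestor."""
    return [ancestors[steps - 1] for _, ancestors in matches]


weights = collect_weights(products_array)  # step 1: 10, 4, 4
heaviest = keep_max(weights)               # step 2: 10
records = parent(heaviest, steps=2)        # step 3: the doodad record
names = [record["product"] for record in records]  # step 4: .product
print(names)  # ['doodad']
```

The comparison happens on the collected weight values themselves, and only afterward does the sketch step back up two levels, which mirrors why no descendant search is required.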
