Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add PPL JSON extended functions support #3028

Open
YANG-DB opened this issue Sep 16, 2024 · 3 comments
Open

[RFC] Add PPL JSON extended functions support #3028

YANG-DB opened this issue Sep 16, 2024 · 3 comments
Labels
enhancement New feature or request PPL Piped processing language

Comments

@YANG-DB
Copy link
Member

YANG-DB commented Sep 16, 2024

Is your feature request related to a problem?
We need to enhance OpenSearch PPL (Piped Processing Language) by adding support for JSON-based operations. This extension will provide users with powerful tools to manipulate and analyze JSON data within PPL queries.

Proposed New Functions

  1. json_object(): Creates a JSON object from existing key/value pairs.
   ... | eval person = json_object("name", "John", "age", 30)
  1. json(): Evaluates whether a value can be parsed as JSON. Returns the value if valid, null otherwise.
   ... | eval result = json('{"name":"John","age":30}')
  1. json_append(): Appends elements to the contents of a valid JSON object.
   ... | eval person = json_object("name", "John"), extended_person = json_append(person, "city", "New York")
  1. json_array(): Creates a JSON array using a list of values.
   ... | eval json_list = json_array("apple", "banana", "cherry")
  1. json_array_to_mv(): Maps the elements of a JSON array to a multivalued field.
   ... | eval json_list = json_array("apple", "banana", "cherry"), mv_field = json_array_to_mv(json_list)
  1. json_delete(): Removes one or more keys and their corresponding values from the specified JSON object.
   ... | eval person = json_object("name", "John", "age", 30, "city", "New York"), person_no_age = json_delete(person, "age")
  1. json_extend(): Extends the contents of a valid JSON object with the values of an array.
   ... | eval person = json_object("name", "John", "age", 30), extended_person = json_extend(person, json_array("city", "New York"))
  1. json_extract(): Returns either a JSON array or a native type value from a field and zero or more paths.
   ... | eval data = json('{"person": {"name": "John", "age": 30}}'), name = json_extract(data, "person.name")
  1. json_keys(): Returns the keys from the key-value pairs in a JSON object as a JSON array.
   ... | eval person = json_object("name", "John", "age", 30), keys = json_keys(person)
  1. json_set(): Inserts or overwrites values for a JSON node with provided values and returns an updated JSON object.
   ... | eval person = json_object("name", "John"), updated_person = json_set(person, "age", 30)
  1. json_valid(): Evaluates whether a JSON object uses valid JSON syntax and returns TRUE or FALSE.
   ... | eval is_valid = json_valid('{"name":"John","age":30}')
  1. all(): Iterates over JSON array values and returns true if every value matches the provided predicate.
   ... | eval json_list = json_array(1, 2, 3), all_match = all(json_list, val -> val != 0)
  1. any(): Iterates over JSON array values and returns true if any value matches the provided predicate.
   ... | eval json_list = json_array(1, 2, 3), any_match = any(json_list, val -> val == 0)
  1. filter(): Iterates over JSON array values and performs an operation on each value.
   ... | eval json_list = json_array(1, 2, 3, 4), filtered_list = filter(json_list, val -> val % 2 == 0)
  1. map(): Iterates over JSON array values and performs an operation on each value.
   ... | eval json_list = json_array(1, 2, 3), mapped_list = map(json_list, val -> val * 2)
  1. reduce(): Iterates over a JSON array in a field or a literal array and performs an accumulation operation.
   ... | eval json_list = json_array(1, 2, 3), sum = reduce(json_list, 0, (acc, val) -> acc + val)

Technical Considerations

  • Consider leveraging existing JSON libraries for efficient parsing and manipulation.
  • Ensure that these functions can handle large JSON objects without significant performance degradation.
@YANG-DB YANG-DB added enhancement New feature or request untriaged PPL Piped processing language labels Sep 16, 2024
@YANG-DB YANG-DB moved this to Todo in PPL Commands Sep 16, 2024
@YANG-DB YANG-DB removed the untriaged label Sep 16, 2024
@YANG-DB YANG-DB moved this from Todo to Design in PPL Commands Dec 4, 2024
@acarbonetto
Copy link
Collaborator

acarbonetto commented Dec 17, 2024

Proposal Part I: Implement JSON validation and JSON objects:

Proposal Part II: Implement JSON read-only functions:

Proposal Part III: Add JSON_ARRAY support:

Proposal Part IV: Add JSON_ARRAY function support:

  • json_all_match
  • json_any_match
  • json_filter
  • json_array_map
  • json_array_reduce

Proposed Part I doc:

### `JSON_VALID`

**Description**

`json_valid(jsonStr)` Evaluates whether a JSON string uses valid JSON syntax and returns TRUE or FALSE.

**Argument type:** STRING

**Return type:** BOOLEAN

Example:

    os> source=people | eval `valid_json` = json_valid('[1,2,3,4]'), `invalid_json` = json_valid('{"invalid": "json"') | fields `valid_json`, `invalid_json`
    fetched rows / total rows = 1/1
    +--------------+----------------+
    | valid_json   | invalid_json   |
    +--------------+----------------+
    | True         | False          |
    +--------------+----------------+

    os> source=accounts | where json_valid('[1,2,3,4]') and isnull(email) | fields account_number, email
    fetched rows / total rows = 1/1
    +------------------+---------+
    | account_number   | email   |
    |------------------+---------|
    | 13               | null    |
    +------------------+---------+

### `JSON`

**Description**

`json(value)` Evaluates whether a string can be parsed and casted to a JSON object. Returns the JSON object if valid, null otherwise.

**Argument type:** STRING

**Return type:** JSON OBJECT/NULL

Example:

    os> source=people | eval `valid_json()` = json('[1,2,3,{"f1":1,"f2":[5,6]},4]') | fields valid_json
    fetched rows / total rows = 1/1
    +---------------------------------+
    | valid_json                      |
    +---------------------------------+
    | [1,2,3,{"f1":1,"f2":[5,6]},4]   |
    +---------------------------------+

    os> source=people | eval `invalid_json()` = json('{"invalid": "json"') | fields invalid_json
    fetched rows / total rows = 1/1
    +----------------+
    | invalid_json   |
    +----------------+
    | null           |
    +----------------+

### `JSON_OBJECT`

**Description**

`json_object(<key>, <value>[, <key>, <value>]...)` returns a JSON object from members of key-value pairs.

**Argument type:**
- A \<key\> must be STRING.
- A \<value\> can be a scalar, json object, or array type.

**Return type:** STRING (json-encoded document)

Note:  JSON_DOCUMENT may be more appropriate for this return type, but then all results would have to be enclosed by to_json_string()

Example:

    os> source=people | eval result = json_object('key', 123.45) | fields result
    fetched rows / total rows = 1/1
    +------------------+
    | result           |
    +------------------+
    | {"key":123.45}   |
    +------------------+

    os> source=people | eval result = json_object('outer', json_object('inner', 123.45)) | fields result
    fetched rows / total rows = 1/1
    +------------------------------+
    | result                       |
    +------------------------------+
    | {"outer":{"inner":123.45}}   |
    +------------------------------+

    os> source=people | eval result = json_object('array_doc', json_array(123.45, "string", true, null)) | fields result
    fetched rows / total rows = 1/1
    +------------------------------+
    | result                       |
    +------------------------------+
    | {"array_doc":[123.45, "string", true, null]}   |
    +------------------------------+

### `TO_JSON_STRING`

**Description**

`to_json_string(doc)` Returns a JSON string representing the given json_document object.

**Argument type:** JSON_OBJECT

**Return type:** STRING

Example:

    os> source=people | eval `json_string` = to_json_string(json_array(1, 2, 0, -1, 1.1, -0.11)) | fields json_string
    fetched rows / total rows = 1/1
    +--------------------------------+
    | json_string                    |
    +--------------------------------+
    | [1.0,2.0,0.0,-1.0,1.1,-0.11]   |
    +--------------------------------+

    os> source=people | eval `json_string` = to_json_string(json_object('key', 123.45)) | fields json_string
    fetched rows / total rows = 1/1
    +-----------------+
    | json_string     |
    +-----------------+
    | {'key', 123.45} |
    +-----------------+

@acarbonetto
Copy link
Collaborator

acarbonetto commented Dec 17, 2024

Proposed Part II Documentation

### `JSON_EXTRACT`

**Description**

`json_extract(jsonStr, path)` Extracts a json object from a json document based on the path specified. Return null if path is not valid.

`json_extract(jsonStr, path [, path]...)` Extracts json objects from a json document based on the paths specified. Returns an array of objects or null if the path is not valid. 

**Argument type:** STRING, STRING

**Return type:** STRING | ARRAY

A STRING expression of a valid JSON object format.

`NULL` is returned in case of an invalid JSON.

Example:

    os> source=people | eval `json_extract('{"a":"b"}', '$.a')` = json_extract('{"a":"b"}', '$a')
    fetched rows / total rows = 1/1
    +----------------------------------+
    | json_extract('{"a":"b"}', 'a')   |
    +----------------------------------+
    | b                                |
    +----------------------------------+

    os> source=people | eval `json_extract('{"a":[{"b":1},{"b":2}]}', '$.a[1].b')` = json_extract('{"a":[{"b":1},{"b":2}]}', '$.a[1].b')
    fetched rows / total rows = 1/1
    +-----------------------------------------------------------+
    | json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[1].b')   |
    +-----------------------------------------------------------+
    | 2.0                                                       |
    +-----------------------------------------------------------+

    os> source=people | eval `json_extract('{"a":[{"b":1},{"b":2}]}', '$.a[*].b')` = json_extract('{"a":[{"b":1},{"b":2}]}', '$.a[*].b')
    fetched rows / total rows = 1/1
    +-----------------------------------------------------------+
    | json_extract('{"a":[{"b":1.0},{"b":2.0}]}', '$.a[*].b')   |
    +-----------------------------------------------------------+
    | [1.0,2.0]                                                 |
    +-----------------------------------------------------------+

    os> source=people | eval `invalid_json` = json_extract('{"invalid": "json"')
    fetched rows / total rows = 1/1
    +----------------+
    | invalid_json   |
    +----------------+
    | null           |
    +----------------+


### `JSON_KEYS`

**Description**

`json_keys(jsonStr)` Returns an array of all keys of the outermost JSON document.

`json_keys(jsonStr, path)` Returns an array of all keys of the object at path in the JSON document.

**Argument type:** STRING

A STRING expression of a valid JSON document.

**Return type:** ARRAY[STRING]

`NULL` is returned in case of any other valid JSON string, or an empty string, or an invalid JSON.

Example:

    os> source=people | eval `keys` = json_keys('{"f1":"abc","f2":{"f3":"a","f4":"b"}}')
    fetched rows / total rows = 1/1
    +------------+
    | keys       |
    +------------+
    | [f1, f2]   |
    +------------+

    os> source=people | eval `keys` = json_keys('[1,2,3,{"f1":1,"f2":[5,6]},4]')
    fetched rows / total rows = 1/1
    +--------+
    | keys   |
    +--------+
    | null   |
    +--------+

### `JSON_DELETE`

**Description**

`json_delete(jsonStr, path [, path]...)` Deletes json elements from a json document at the specified paths. Return the updated document.

**Arguments type:** JSON_STRING, ARRAY<STRING>

**Return type:** JSON_STRING

A JSON object format.

Example:

    os> source=people | eval deleted = json_delete('{"account_number":1,"balance":39225,"age":32,"gender":"M"}', array('age','gender')) | head 1 | fields deleted 
    fetched rows / total rows = 1/1
    +------------------------------------------+
    | deleted                                  |
    +-----------------------------------------+
    |{"account_number":1,"balance":39225}     |
    +-----------------------------------------+

    os> source=people | eval deleted = json_delete('{"f1":"abc","f2":{"f3":"a","f4":"b"}}', array('f2.f3')) | head 1 | fields deleted
    fetched rows / total rows = 1/1
    +-----------------------------------------------------------+
    | deleted                                                   |
    +-----------------------------------------------------------+
    | {"f1":"abc","f2":{"f4":"b"}}                              |
    +-----------------------------------------------------------+

    os> source=people | eval deleted =  json_delete('{"teacher":"Alice","student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]}',array('teacher', 'student.rank')) | head 1 | fields deleted
    fetched rows / total rows = 1/1
    +--------------------------------------------------+
    | deleted                                          |
    +--------------------------------------------------+
    |{"student":[{"name":"Bob"},{"name":"Charlie"}]}   |
    +--------------------------------------------------+

### `JSON_SET`

**Description**

`json_set(jsonStr, path, value[, path, value]...)` Updates or Adds json elements to a json document at the specified keys. Returns the updated document.

**Arguments type:** 
- The jsonStr is a STRING
- A \<key\> is a 
- A \<value\> can be a scalar, json object, or array type.

**Return type:** JSON_STRING

A JSON object format.

Example:

    os> source=people | eval updated = json_set('{"a":[{"b":1},{"b":2}]}', '$.a[*].b', 3) | head 1 | fields updated 
    fetched rows / total rows = 1/1
    +-------------------------+
    | updated                 |
    +-------------------------+
    | {"a":[{"b":3},{"b":3}]} |
    +-------------------------+

@YANG-DB
Copy link
Member Author

YANG-DB commented Dec 18, 2024

Thanks @acarbonetto this looks great!
Can you please:

  • rename this feature into an RFC
  • add an issue for each json function
  • add the list of issues in the main RFC.

we can start discussing each function syntax and details within its own issue
Thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request PPL Piped processing language
Projects
Status: Design
Development

No branches or pull requests

2 participants