docs: refine — commit 93181a2 (parent 104d5cf), CookiePieWw, Sep 18, 2024
1 changed file: docs/rfcs/2024-08-06-json-datatype.md (56 additions, 43 deletions)

Are both valid.
The dataflow of the insertion process is as follows:
```
Insert JSON strings directly through client:

                            Parse                     Insert
String(Serialized JSON) ┌──────────┐ Arrow Binary(JSONB) ┌──────┐ Arrow Binary(JSONB)
Client ---------------->│  Server  │-------------------->│ Mito │-------------------> Storage
                        └──────────┘                     └──────┘
(Server identifies JSON type and performs auto-conversion)

Insert JSON strings through parse_json function:

                                                      Parse                   Insert
String(Serialized JSON) ┌──────────┐ String(Serialized JSON) ┌─────┐ Arrow Binary(JSONB) ┌──────┐ Arrow Binary(JSONB)
Client ---------------->│  Server  │------------------------>│ UDF │-------------------->│ Mito │-------------------> Storage
                        └──────────┘                         └─────┘                     └──────┘
(Conversion is performed by UDF inside Query Engine)
```

Servers identify JSON columns through the column schema and perform auto-conversions. But when using prepared statements and binding parameters, the cached plans generated by DataFusion for prepared statements cannot identify JSON columns. In this circumstance, the servers identify JSON columns through the given parameters and perform auto-conversions.

The following is an example of inserting JSON data through prepared statements:
```Rust
// sqlx first prepares a statement and then executes it.
sqlx::query("create table test(ts timestamp time index, j json)")
    .execute(&pool)
    .await
    .unwrap();

sqlx::query("insert into test values(?, ?)")
    .bind(0)
    .bind(r#"{"name": "jHl2oDDnPc1i2OzlP5Y", "timestamp": "2024-07-25T04:33:11.369386Z", "attributes": { "event_attributes": 48.28667 }}"#)
    .execute(&pool)
    .await
    .unwrap();
```

## Query

Specifically, to perform auto-conversions, we attach a message to JSON data in the column metadata.
The dataflow of the query process is as follows:
```
Query directly through client:

                            Decode                     Scan
String(Serialized JSON) ┌──────────┐ Arrow Binary(JSONB) ┌──────────────┐ Arrow Binary(JSONB)
Client <----------------│  Server  │<--------------------│ Query Engine │<------------------- Storage
                        └──────────┘                     └──────────────┘
(Server identifies JSON type and performs auto-conversion based on column metadata)

Query through json_to_string function:

                                        Scan & Decode
String(Serialized JSON) ┌──────────┐ String(Serialized JSON) ┌──────────────┐ Arrow Binary(JSONB)
Client <----------------│  Server  │<------------------------│ Query Engine │<------------------- Storage
                        └──────────┘                         └──────────────┘
(Conversion is performed by UDF inside Query Engine)
```

However, if a function uses JSON as its return type, the metadata method mentioned above is not applicable. Thus, functions operating on JSON should specify their return types explicitly instead of returning a JSON type; for example, `json_get_int` and `json_get_float` return data of `INT` and `FLOAT` type respectively.