Fix CAST(JSON as ROW(ARRAY)) #9447

Closed

mbasmanova
Contributor

Summary:
CAST(JSON as ROW(ARRAY)) used to fail with

OUT_OF_ORDER_ITERATION: Objects and arrays can only be iterated when they are first encountered.

According to the simdjson documentation, https://github.com/simdjson/simdjson/blob/master/doc/basics.md, object and array values cannot be stored for later processing. Each value must be consumed or copied before iteration proceeds.

Also fixed the behavior when a JSON object contains duplicate keys: Presto throws, but the previous implementation allowed duplicates.

Also fixed the test so that it actually verifies JSON objects with mixed-case keys.

Differential Revision: D56013293

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 11, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D56013293

netlify bot commented Apr 11, 2024

Deploy Preview for meta-velox canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | efeb0de |
| 🔍 Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/66183050e5e7a800082b3a5d |

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Apr 11, 2024

@Yuhta Yuhta left a comment

LGTM

// Mapping from lower-case field names of the target RowType to their
// indices.
folly::F14FastMap<std::string, column_index_t> fieldIndices;
for (auto i = 0; i < rowType.size(); ++i) {

nit: RowType::size is virtual and may not be inlined here, so it is better to assign it to a variable before the loop.
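
A minimal sketch of what this nit asks for, under stated assumptions: `fieldNames` is a hypothetical stand-in for the names of the target RowType, `std::unordered_map` replaces `folly::F14FastMap`, and `toLowerAscii` and `buildFieldIndices` are illustrative helpers, not functions from the PR. The size is read once into a local before the loop instead of being re-evaluated on every iteration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// Lower-cases an ASCII string. Hypothetical helper; the PR may normalize
// names differently.
std::string toLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Builds a map from lower-cased field name to field position.
// `fieldNames` stands in for the field names of the target row type.
std::unordered_map<std::string, int> buildFieldIndices(
    const std::vector<std::string>& fieldNames) {
  std::unordered_map<std::string, int> fieldIndices;
  // Read the size once, before the loop, as the review comment suggests.
  const int size = static_cast<int>(fieldNames.size());
  fieldIndices.reserve(size);
  for (int i = 0; i < size; ++i) {
    fieldIndices[toLowerAscii(fieldNames[i])] = i;
  }
  return fieldIndices;
}
```

With `std::vector::size()` the compiler would likely hoist the call on its own; the local matters when the call is virtual, as the comment says of `RowType::size`.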

fieldIndices[key] = i;
}

std::vector<bool> foundIndices(rowType.size(), false);

We could initialize the fieldIndices value to -1; then we don't need this second allocation.

Contributor Author

Sounds good. Updated.
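
One possible reading of the suggestion in this thread, sketched under assumptions rather than taken from the merged change: once a key has been matched, its entry in the name-to-index map is overwritten with -1, so the map itself records which fields were seen and the separate `std::vector<bool>` is not needed. `MatchResult`, `matchKeys`, and `kSeen` are hypothetical names, `std::unordered_map` stands in for `folly::F14FastMap`, and keys are assumed to be lower-cased already.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Sentinel stored in the map once a field has been matched by a JSON key.
constexpr int kSeen = -1;

struct MatchResult {
  std::vector<int> matched; // Field indices, in the order their keys appeared.
  std::vector<int> missing; // Field indices no key referred to, ascending.
};

// Matches JSON keys against target fields. The map is taken by value because
// it is mutated: a matched entry is overwritten with kSeen.
MatchResult matchKeys(
    std::unordered_map<std::string, int> fieldIndices,
    const std::vector<std::string>& jsonKeys) {
  MatchResult result;
  for (const auto& key : jsonKeys) {
    auto it = fieldIndices.find(key);
    if (it == fieldIndices.end()) {
      continue; // Key does not correspond to any target field.
    }
    if (it->second == kSeen) {
      // Same field matched twice: report it instead of silently overwriting.
      throw std::invalid_argument("Duplicate key: " + key);
    }
    result.matched.push_back(it->second);
    it->second = kSeen;
  }
  // Entries that still hold a real index were never matched.
  for (const auto& entry : fieldIndices) {
    if (entry.second != kSeen) {
      result.missing.push_back(entry.second);
    }
  }
  // Map iteration order is unspecified, so sort for a stable result.
  std::sort(result.missing.begin(), result.missing.end());
  return result;
}
```

The trade-off is that the map is modified while a single object is processed, so it has to be rebuilt or reset before the next one; the sketch sidesteps this by copying the map on each call.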

@mbasmanova
Contributor Author

@Yuhta Jimmy, thank you for the review. Updated to address your comments.

@xiaoxmeng xiaoxmeng left a comment

@mbasmanova thanks!

@kgpai
Contributor

kgpai commented Apr 11, 2024

Please rebase to main after #9451 lands.

@facebook-github-bot
Contributor

This pull request has been merged in 707dbfd.

yanngyoung pushed a commit to yanngyoung/velox that referenced this pull request Apr 12, 2024
Summary:
Pull Request resolved: facebookincubator#9447

Reviewed By: xiaoxmeng, Yuhta

Differential Revision: D56013293

fbshipit-source-id: 28337280e1943bca4d6b46ff6f1f62341656bdda
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged