
feat: data source from JSON array data #2306

Closed
wants to merge 12 commits

Conversation

@Lilit0x (Contributor) commented Dec 26, 2023

Closes: #2218

How to Test

SELECT * FROM read_arjson('path/to/file/or/urls', max_size, <creds>)

@tychoish (Contributor) left a comment

I'm super excited about this!

I wonder if implementing the object store's file type ends up having you write more boilerplate than is needed.

I had great luck recently using the StreamingTable infrastructure in the bson functionality, though I don't know if it makes a lot of sense here (also, your code supports gzipped inputs and the BSON code does not).

@universalmind303 self-requested a review on December 27, 2023
@Lilit0x (Contributor, Author) commented Dec 27, 2023

I wonder if implementing the object store's file type ends up having you write more boilerplate than is needed.

You mean the FileFormat? Yes, it did. Especially in the create_physical_plan function, because of the execution. I tried to look for a way around that, but I couldn't, since the default JsonFormat is newline-delimited, which lends more credibility to @universalmind303's suggestion of making the change upstream. For now, though, I think it is okay like this, albeit I'll still have to write more for create_writer_physical_plan.
The other option I had was to create a different type similar to FileType and then implement all the traits that datafusion's FileType did, but that would have been far longer and probably unnecessary.
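For context, the distinction driving this discussion is framing: DataFusion's default JsonFormat expects newline-delimited JSON (one object per line), while this PR's read_arjson targets a single top-level JSON array. A minimal sketch of the two framings (illustrative only, in Python rather than the project's Rust; the data is hypothetical):

```python
import io
import json

# Newline-delimited JSON (NDJSON): each line is an independent JSON object,
# so the file can be split and parsed line by line.
ndjson = '{"a": 1}\n{"a": 2}\n'
nd_rows = [json.loads(line) for line in ndjson.splitlines()]

# JSON array data: the whole document is one JSON value, so line-based
# splitting breaks mid-value and the array must be decoded as a unit.
array_json = '[\n  {"a": 1},\n  {"a": 2}\n]'
arr_rows = json.load(io.StringIO(array_json))

# Both framings carry the same logical rows.
assert nd_rows == arr_rows == [{"a": 1}, {"a": 2}]
```

This is why the newline-delimited assumption baked into the default format can't simply be reused for array input: the record boundaries are no longer line boundaries.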

I had great luck recently using the StreamingTable infrastructure in the bson functionality, though I don't know if it makes a lot of sense here (also, your code supports gzipped inputs and the BSON code does not).

Hmm, I looked through it, but I can't see where or how it would fit this use case. Maybe I am missing something.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Lilit0x
❌ Ubuntu


Ubuntu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@tychoish (Contributor) commented

I think given that df419aa has landed, we should let this one drop.

I think there's a lot of merit to this approach, and I'm super thankful that you did this work for us to learn from, and sorry that we didn't use the code directly.

@tychoish closed this Jan 30, 2024
Successfully merging this pull request may close: data source from JSON array data.
4 participants