bug: various issues with read_json #2708

Closed
talagluck opened this issue Feb 27, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@talagluck
Contributor

talagluck commented Feb 27, 2024

Description

I'm having various issues testing out read_json.

The query below produces ValueError: Found non-unique column index:

select * from '../glaredb/glaredb/testdata/csv/userdata1.csv' c
join '../glaredb/glaredb/testdata/json/userdata1.json' j
on c.email = j.email
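
One possible reading of the non-unique column error, offered as an assumption rather than something confirmed in this thread: `select *` over a join returns every column from both inputs, so a shared name such as `email` appears twice, and a client-side conversion to a dataframe may reject that. The sketch below uses Python's built-in `sqlite3` as a stand-in engine (not GlareDB), with hypothetical two-column tables `c` and `j`, to show the duplicate names and how explicit aliases avoid them.

```python
import sqlite3

# Stand-in engine: sqlite3, NOT GlareDB. Tables c and j are hypothetical
# and only mimic the shape of the CSV and JSON inputs in the report.
conn = sqlite3.connect(":memory:")
conn.execute("create table c (email text, first_name text)")
conn.execute("create table j (email text, last_name text)")
conn.execute("insert into c values ('a@example.com', 'Ada')")
conn.execute("insert into j values ('a@example.com', 'Lovelace')")

# select * keeps both email columns, so the result has a repeated name.
cur = conn.execute("select * from c join j on c.email = j.email")
star_columns = [d[0] for d in cur.description]

# Naming and aliasing each column gives a result with unique names.
cur = conn.execute(
    "select c.email as email, c.first_name as first_name, "
    "j.last_name as last_name "
    "from c join j on c.email = j.email"
)
explicit_columns = [d[0] for d in cur.description]

print(star_columns)
print(explicit_columns)
```

If this is the cause, listing the columns explicitly (or aliasing the shared ones) in the original query may be a workaround until the behavior is clarified.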

When I use the explicit read functions below, I get ExecutionException: External error: trailing characters at line 2 column 1

select * from read_csv('../glaredb/glaredb/testdata/csv/userdata1.csv') c
join read_json('../glaredb/glaredb/testdata/json/userdata1.json') j
on c.email = j.email

(The same error occurs when I use csv_scan instead of read_csv.)

All data used above is the test data in this repo.

This also produced the trailing characters error:

create table my_user_data as select * from read_json('../glaredb/glaredb/testdata/json/userdata1.json')

However, both of these queries work:

create table my_user_data as select * from read_csv('../glaredb/glaredb/testdata/csv/userdata1.csv')
create table my_user_data as select * from '../glaredb/glaredb/testdata/json/userdata1.json'

Adding a WHERE clause also seems to trigger the trailing characters error:

select * from read_json('../glaredb/glaredb/testdata/json/userdata1.json') 
WHERE comments is not null

talagluck added the bug (Something isn't working) label Feb 27, 2024
talagluck changed the title from "bug: unable to join the results of read_csv and read_json" to "bug: various issues with read_json" Feb 27, 2024
@tychoish
Contributor

We should probably clarify this, and I'm working (actually today) on making this support more flexible. For now, read_json (as opposed to read_ndjson) only reads the first object from a data source (often an array, but it could be a single document that yields one row), whereas read_ndjson reads line-separated JSON.

The ndjson support allows for some optimization, while the read_json function makes it possible to read some data dumps and perhaps data from public APIs.
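
To illustrate the distinction described above, here is a minimal sketch in plain Python using the standard `json` module. It is not GlareDB's implementation. The helper names `read_single_document` and `read_line_delimited` and the sample data are hypothetical, and the sketch assumes the test file is line-delimited, which the thread does not state outright. A parser that expects one JSON value fails as soon as it reaches the second line, which lines up with the position reported in the error (line 2 column 1).

```python
import json

# Hypothetical sample: two JSON objects, one per line (NDJSON).
ndjson_text = (
    '{"id": 1, "email": "a@example.com"}\n'
    '{"id": 2, "email": "b@example.com"}\n'
)


def read_single_document(text):
    """Parse the text as exactly one JSON value (an array or one object)."""
    value = json.loads(text)
    return value if isinstance(value, list) else [value]


def read_line_delimited(text):
    """Parse each non-empty line as its own JSON value."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# A single-document parser rejects anything after the first object.
try:
    read_single_document(ndjson_text)
    single_error = None
except json.JSONDecodeError as exc:
    single_error = exc

# A line-delimited parser yields one row per line.
rows = read_line_delimited(ndjson_text)

print(single_error)
print(rows)
```

Under that assumption, pointing read_ndjson at the same file would be the thing to try for the failing queries.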

tychoish self-assigned this Feb 27, 2024
@tychoish
Contributor

tychoish commented Mar 6, 2024

I believe that #2729 should address most of the concerns of this issue.

The big remaining read_json issue, I think, is reading from URLs that are not object-store compatible (e.g. URLs with query strings).
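
For readers unfamiliar with the case mentioned above, this short sketch shows what a URL with a query string looks like once it is split into parts. The URL is a made-up example, and the code says nothing about how GlareDB resolves sources. It only shows that the query string is a separate component from the path that names the file.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical API endpoint; not a real service.
url = "https://api.example.com/v1/users.json?page=2&limit=50"

parts = urlsplit(url)
path = parts.path                 # the part that looks like a file path
query = parse_qs(parts.query)     # the parameters after the "?"

print(path)
print(query)
```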

tychoish closed this as completed Mar 6, 2024