Unlike with the https connection variant, when making an S3 connection to a CSV file served by a MinIO-backed S3 endpoint, the first row (the column headers) is not parsed as a header. Instead, the headers appear as row 1 of the data.
Create an S3 connection to a CSV file on a MinIO server. This cannot be done through the web UI or the rill CLI, so edit the sources/my_source.yaml file to add the "endpoint" setting, which is required for non-AWS S3 connections:
@mskyttner This looks like a limitation of header detection in CSV parsing. If all column values in the CSV are text, DuckDB cannot tell whether the first row is a header or not. You can override this behaviour by setting the following in your my_source.yaml file:
Thanks for the tip. Specifying the setting explicitly for the S3 connection takes care of the issue, in the sense that the header row does not appear as row 1 in the table.
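To illustrate why an all-text CSV makes header detection ambiguous, here is a minimal sketch of a type-based heuristic. It is not DuckDB's actual algorithm; the function names `is_number` and `guess_has_header` and the sample data are hypothetical and exist only for this illustration.

```python
import csv
import io


def is_number(text):
    """Return True if the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def guess_has_header(csv_text):
    """Guess whether the first row is a header (illustrative heuristic only).

    Returns True when some column has a non-numeric first cell but numeric
    cells in every later row. Returns None when no such evidence exists,
    which is always the case when every column is text.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:
        return None
    first, rest = rows[0], rows[1:]
    for col, cell in enumerate(first):
        later_numeric = all(col < len(r) and is_number(r[col]) for r in rest)
        if not is_number(cell) and later_numeric:
            return True
    return None


# Hypothetical sample data: one file with a numeric column, one all text.
mixed = "name,age\nalice,30\nbob,25\n"
all_text = "name,city\nalice,Oslo\nbob,Lund\n"

print(guess_has_header(mixed))     # True
print(guess_has_header(all_text))  # None
```

In the all-text case the first row looks no different from the data rows, so a heuristic of this kind has nothing to go on, which is why setting the header option explicitly helps.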
It looks like my issue is with DuckDB's read_csv_auto type detection, which treats all my columns as varchar. This differs from readr, which picks up a couple of date columns and numeric columns in the same CSV.
I think I will switch to Parquet instead to work around the type detection issue for the CSVs I use, while waiting for future improvements in that area.
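As a rough illustration of how column type detection can fall back to varchar, the sketch below tries progressively wider types on a column's values. This is a simplified assumption about how sniffing generally works, not the algorithm used by DuckDB or readr, and it may not be the cause of the behaviour reported here. The function name `infer_type` and the sample values are hypothetical.

```python
from datetime import date


def infer_type(values):
    """Return the narrowest type name that fits every value in the column."""

    def fits(parse):
        # A type fits only if every single value parses without error.
        try:
            for v in values:
                parse(v)
            return True
        except ValueError:
            return False

    if fits(int):
        return "BIGINT"
    if fits(float):
        return "DOUBLE"
    if fits(date.fromisoformat):
        return "DATE"
    return "VARCHAR"


print(infer_type(["1", "2", "3"]))                # BIGINT
print(infer_type(["1.5", "2"]))                   # DOUBLE
print(infer_type(["2023-01-15", "2023-02-01"]))   # DATE
print(infer_type(["2023-01-15", "n/a"]))          # VARCHAR
print(infer_type(["15/01/2023", "16/01/2023"]))   # VARCHAR
```

Under this scheme, a single unparseable value or a date format the sniffer does not recognise is enough to turn the whole column into text, so two tools with different format lists can disagree on the same file.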
Also ensure environment variables are set, at least these:
Expected behavior
Column headers should appear "as usual", as they do when using http(s) to access the file.
Screenshots
Desktop (please complete the following information):
Additional context
Using the MinIO client ("mc", the equivalent of the AWS CLI) to view the first row of the file returns the header columns:
Possibly related issue
#1967