This repository has been archived by the owner on May 17, 2024. It is now read-only.

Add support for DuckDB #176

Closed
extrobe opened this issue Jul 25, 2022 · 4 comments
Labels
enhancement New feature or request new-db-driver Request to add a new database driver

Comments

@extrobe

extrobe commented Jul 25, 2022

DuckDB is an in-process database. You typically create a database for the duration of a session, then discard it once you're done (though that's not the only way to use it).

https://duckdb.org

It's awesome for a few reasons that apply to data-diff. Namely, you can directly query raw csv/txt/parquet files as though they were tables (e.g. select posting_date, count(*) as r_count from '/Users/me/data.csv' group by posting_date).
We use this ability to load PROD vs. UAT files from our system and compare their output. Being able to pass this across to data-diff would be incredible.

Whilst just being able to reference csv files in data-diff might be another option, doing this via DuckDB would allow you to perform some basic transformations on the way, such as renaming fields, selecting a reduced range, etc.

@erezsh erezsh added enhancement New feature or request new-db-driver Request to add a new database driver labels Jul 25, 2022
@danthelion
Contributor

I actually started working on a DuckDB driver not long ago and might have something ready next week, but the second part of this

> Whilst just being able to reference csv files in data-diff might be another option, doing this via DuckDB would allow you to perform some basic transformations on the way, such as renaming fields, selecting a reduced range, etc.

might deserve a separate issue as it could be generalized for all drivers, no?

@extrobe
Author

extrobe commented Jul 30, 2022

> I actually started working on a DuckDB driver not long ago and might have something ready next week, but the second part of this

Nice! I'd love to give it a go when you have something (though I should point out I'm a data-diff newbie, so I'm not across every aspect of it).

> might deserve a separate issue as it could be generalized for all drivers, no?

I absolutely agree... though I think an aspect of this is captured in #79

erezsh added a commit that referenced this issue Nov 15, 2022
@erezsh
Contributor

erezsh commented Nov 16, 2022

DuckDB is now supported! It's already available in master, and will be included in the upcoming release.

@erezsh erezsh closed this as completed Nov 16, 2022
@extrobe
Author

extrobe commented Nov 19, 2022

Awesome! Looking forward to trying it out!
