I am running an ETL benchmark that reads CSV files with a variable number of columns, applies some transformations, and writes the result back as Delta. I am testing it with 7 Python engines; unfortunately, DataFusion only supports CSV files with a fixed schema.
This is a very good addition to work on, but I suspect we will need to do it upstream in the datafusion core repo and then expose the options in this repo.
FWIW, the notebook with a reproducible data source is here: https://github.com/djouallah/Fabric_Notebooks_Demo/blob/main/ETL/Light_ETL_Python_Notebook.ipynb
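Until something like this exists upstream, one possible stopgap is to pad the ragged rows to a fixed width before handing the file to the engine. The sketch below uses only the Python standard library and does not call DataFusion at all; the function name `normalize_ragged_csv` and the generated `column_N` header names are made up for illustration, and it assumes the input has no header row and fits in memory.

```python
import csv
import io


def normalize_ragged_csv(text: str, fill: str = "") -> str:
    """Pad every row to the width of the widest row.

    Assumption: the input has no header row, so a generated one
    (column_1 ... column_N) is written first.
    """
    rows = list(csv.reader(io.StringIO(text)))
    width = max((len(row) for row in rows), default=0)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Hypothetical header names; rename to whatever the pipeline expects.
    writer.writerow([f"column_{i + 1}" for i in range(width)])
    for row in rows:
        # Short rows get trailing fill values so every row has `width` fields.
        writer.writerow(row + [fill] * (width - len(row)))
    return out.getvalue()


if __name__ == "__main__":
    ragged = "a,b\nc,d,e\nf\n"
    print(normalize_ragged_csv(ragged))
```

The padded output has the same number of fields on every line, so it can be read by a fixed-schema CSV reader. This costs an extra pass over the data, so it would skew benchmark timings and is not a substitute for native support.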