feat: Add ManagedTableDataset for managed Delta Lake tables in Databricks #127

Closed
dannyrfar wants to merge 78 commits

dannyrfar
Contributor

@dannyrfar dannyrfar commented Mar 14, 2023

Description

This is the first of a few PRs that add Databricks functionality to Kedro datasets. It adds the ManagedTableDataset, which lets users interface with managed Delta tables in Databricks or locally in PySpark.

Development notes

The changes add one new dataset, databricks.ManagedTableDataSet, which allows users to interface with managed Delta tables inside Databricks.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

@dannyrfar
Contributor Author

Hey @noklam, I just saw your comment in the other PR. I did see those two datasets; this one is more focused on Databricks Unity Catalog tables. The SparkDataSet and DeltaTableDataSet are for interfacing with files directly. Both approaches can be used on Databricks, but they are intended for different purposes.
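
To illustrate the distinction drawn in the comment above, here is a minimal sketch (not code from this PR). It assumes a PySpark `SparkSession` is passed in as `spark`; the helper names `qualified_table_name`, `load_managed_table`, and `load_delta_files` are hypothetical. A managed table is addressed by name through the catalog, whereas the file-based datasets point at a storage path.

```python
def qualified_table_name(table, database="default", catalog=None):
    """Build `catalog.database.table`, or `database.table` when no catalog is given."""
    parts = [catalog, database, table] if catalog else [database, table]
    return ".".join(parts)


def load_managed_table(spark, table, database="default", catalog=None):
    # Managed tables are looked up by name; the catalog decides where the data lives.
    return spark.table(qualified_table_name(table, database, catalog))


def load_delta_files(spark, path):
    # File-based access reads Delta files directly from an explicit storage location.
    return spark.read.format("delta").load(path)
```

On Databricks, `spark` would typically be the session the runtime already provides; locally it would be a session you create yourself with Delta Lake support configured.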

dannyrfar and others added 26 commits March 21, 2023 15:16
…atasets allows users to interface with Unity catalog tables in Databricks to both read and write.

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
…org#99)

* Add non-spark related test changes
Replace kedro.pipeline.Pipeline with
kedro.pipeline.modular_pipeline.pipeline factory.
This is for symmetry with changes made to the main kedro library.

Signed-off-by: Adam Farley <adamfrly@gmail.com>

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* fix links

* fix dill links

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* Fix docs formatting and phrasing for some datasets

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Manually fix files not resolved with patch command

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Apply fix from kedro-org#98

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* bump version and update release notes

* fix pylint errors

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* Prefix Docker plugin name with "Kedro-" in usage message

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
…o-org#56)

* Keep Kedro-Docker plugin docstring from appearing in `kedro -h`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: wmoreiraa <walber3@gmail.com>

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
…dro-org#54)

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
…ro-org#118)

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* [kedro-docker] Layers size optimization (kedro-org#92)

* [kedro-docker] Layers size optimization

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Adjust test requirements

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Skip coverage check on tests dir (some do not execute on Windows)

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Update .coveragerc with the setup

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Fix bandit so it does not scan kedro-datasets

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Fixed existence test

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Check why dir is not created

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Kedro starters are fixed now

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Increased no-output-timeout for long spark image build

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Spark image optimized

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Linting

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Switch to slim image always

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Trigger build

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Use textwrap.dedent for nicer indentation

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Revert "Use textwrap.dedent for nicer indentation"

This reverts commit 3a1e3f8.

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Revert "Revert "Use textwrap.dedent for nicer indentation""

This reverts commit d322d35.

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Make tests read more lines (to skip all deprecation warnings)

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>
Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Release Kedro-Docker 0.3.1 (kedro-org#94)

* Add release notes for kedro-docker 0.3.1

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update version in kedro_docker module

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Bump version and update release notes (kedro-org#96)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Make the SQLQueryDataSet compatible with mssql.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add one test + update RELEASE.md.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add missing pyodbc for tests.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Mock connection as well.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add more dates parsing for mssql backend (thanks to fgaudindelrieu@idmog.com)

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Fix an error in docstring of MetricsDataSet (kedro-org#98)

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Bump relax pyarrow version to work the same way as Pandas (kedro-org#100)

* Bump relax pyarrow version to work the same way as Pandas

We only use PyArrow for `pandas.ParquetDataSet` as such I suggest we keep our versions pinned to the same range as [Pandas does](https://github.com/pandas-dev/pandas/blob/96fc51f5ec678394373e2c779ccff37ddb966e75/pyproject.toml#L100) for the same reason.

As such I suggest we remove the upper bound as we have users requesting later versions in [support channels](https://kedro-org.slack.com/archives/C03RKP2LW64/p1674040509133529)

* Updated release notes

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add missing type in catalog example.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add one more unit tests for adapt_mssql.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* [FIX] Add missing mocker from date test.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* [TEST] Add a wrong input test.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add pyodbc dependency.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* [FIX] Remove dict() in tests.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Change check to check on plugin name (kedro-org#103)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Set coverage in pyproject.toml (kedro-org#105)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Move coverage settings to pyproject.toml (kedro-org#106)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Replace kedro.pipeline with modular_pipeline.pipeline factory (kedro-org#99)

* Add non-spark related test changes
Replace kedro.pipeline.Pipeline with
kedro.pipeline.modular_pipeline.pipeline factory.
This is for symmetry with changes made to the main kedro library.

Signed-off-by: Adam Farley <adamfrly@gmail.com>

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Fix outdated links in Kedro Datasets (kedro-org#111)

* fix links

* fix dill links

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Fix docs formatting and phrasing for some datasets (kedro-org#107)

* Fix docs formatting and phrasing for some datasets

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Manually fix files not resolved with patch command

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Apply fix from kedro-org#98

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Release `kedro-datasets` `version 1.0.2` (kedro-org#112)

* bump version and update release notes

* fix pylint errors

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Bump pytest to 7.2 (kedro-org#113)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Prefix Docker plugin name with "Kedro-" in usage message (kedro-org#57)

* Prefix Docker plugin name with "Kedro-" in usage message

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Keep Kedro-Docker plugin docstring from appearing in `kedro -h` (kedro-org#56)

* Keep Kedro-Docker plugin docstring from appearing in `kedro -h`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* [kedro-datasets ] Add `Polars.CSVDataSet` (kedro-org#95)

Signed-off-by: wmoreiraa <walber3@gmail.com>

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Remove deprecated `test_requires` from `setup.py` in Kedro-Docker (kedro-org#54)

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* [FIX] Fix ds to data_set.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

---------

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>
Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Mariusz Strzelecki <szczeles@gmail.com>
Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Co-authored-by: OKA Naoya <pn11@users.noreply.github.com>
Co-authored-by: Joel <35801847+datajoely@users.noreply.github.com>
Co-authored-by: adamfrly <45516720+adamfrly@users.noreply.github.com>
Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Walber Moreira <58264877+wmoreiraa@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
… file path (kedro-org#114)

* Add databricks deployment check and automatic DBFS path addition

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add newline at end of file

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Remove spurious 'not'

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Move dbfs utility functions from SparkDataSet

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add edge case logic to _build_dbfs_path

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add test for dbfs path construction

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Linting

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Remove spurious print statement :)

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add pylint disable too-many-public-methods

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Move tests into single method to appease linter

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Modify prefix check to /dbfs/

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Modify prefix check to /dbfs/

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Make warning message clearer

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add release note

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Fix linting

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update warning message

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Modify log warning level to error

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Modify message back to warning, refer to undefined behaviour

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Modify required prefix to /dbfs/

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Modify doc string

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Modify warning message

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Split tests and add filepath to warning

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Modify f string in logging call

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Fix tests

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Lint

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

---------

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* Add Snowpark datasets

Signed-off-by: Vladimir Filimonov <vladimir_filimonov@mckinsey.com>
Signed-off-by: heber-urdaneta <heber_urdaneta@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* bump version and update release notes

* fix pylint errors

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
astrojuanlu and others added 12 commits May 3, 2023 12:50
* Migrate kedro-airflow to static metadata

See kedro-org/kedro#2334.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add explicit PEP 518 build requirements for kedro-datasets

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Typos

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Remove dangling reference to requirements.txt

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add release notes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* Migrate kedro-telemetry to static metadata

See kedro-org/kedro#2334.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add release notes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* Add unit test + lint test on GA

* trigger GA - will revert

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Fix lint

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add end to end tests

* Add cache key

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add cache action

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Rename workflow files

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Lint + add comment + default bash

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add windows test

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Update workflow name + revert changes to READMEs

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add kedro-telemetry/RELEASE.md to trufflehog ignore

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add pytables to test_requirements remove from workflow

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Revert "Add pytables to test_requirements remove from workflow"

This reverts commit 8203daa.

* Separate pip freeze step

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* Migrate kedro-docker to static metadata

See kedro-org/kedro#2334.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Address packaging warning

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Fix tests

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Actually install current plugin with dependencies

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add release notes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Currently, opening Gitpod installs Python 3.11, which breaks everything because we don't support it yet. This PR introduces a simple .gitpod.yml to get started.

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* Update APIDataSet

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync ParquetDataSet

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync Test

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Linting

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Revert Unnecessary ParquetDataSet Changes

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync release notes

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

---------

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
…edro-org#182)

* bump tables version and remove step in workflow

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* revert version for linux

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* change version to 3.7

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* remove extra line

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
* Create validate-pr-title.yaml

* ci: add `ready_for_review` to the PR type triggers

* Update validate-pr-title.yaml

* revert: drop the `ready_for_review` type from list

* ci: restrict the set of scopes to the plugin names

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
)

* refactor TensorFlowModelDataset to Set

matching consistency of all other kedro-datasets, DataSet should be camelcase. will be reverted in 0.19.0

Signed-off-by: BrianCechmanek <brian@hazy.com>

* Introducing .gitpod.yml to kedro-plugins (kedro-org#185)

Currently, opening Gitpod installs Python 3.11, which breaks everything because we don't support it yet. This PR introduces a simple .gitpod.yml to get started.

Signed-off-by: BrianCechmanek <brian@hazy.com>

* sync APIDataSet  from kedro's `develop` (kedro-org#184)

* Update APIDataSet

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync ParquetDataSet

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync Test

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Linting

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Revert Unnecessary ParquetDataSet Changes

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync release notes

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

---------

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: BrianCechmanek <brian@hazy.com>

* [kedro-datasets] Bump version of `tables` in `test_requirements.txt`  (kedro-org#182)

* bump tables version and remove step in workflow

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* revert version for linux

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* change version to 3.7

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* remove extra line

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: BrianCechmanek <brian@hazy.com>

* refactor tensorflowModelDataset casing in datasets setup.py

Signed-off-by: BrianCechmanek <brian@hazy.com>

* add tensorflowmodeldataset bugfix to release.md

Signed-off-by: BrianCechmanek <brian@hazy.com>

* Update all the doc reference with TensorFlowModelDataSet

Signed-off-by: Nok <nok.lam.chan@quantumblack.com>

---------

Signed-off-by: BrianCechmanek <brian@hazy.com>
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
Co-authored-by: Nok Lam Chan <mediumnok@gmail.com>
Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Co-authored-by: Nok <nok.lam.chan@quantumblack.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
@jmholzer jmholzer changed the title First release of databricks.ManagedTableDataset Add ManagedTableDataset for managed Delta Lake tables in Databricks May 4, 2023
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
@jmholzer jmholzer changed the title Add ManagedTableDataset for managed Delta Lake tables in Databricks feat: Add ManagedTableDataset for managed Delta Lake tables in Databricks May 4, 2023
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
@jmholzer
Contributor

jmholzer commented May 5, 2023

I made a few changes:

  • I added tests to reach 100% coverage in managed_table_dataset.
  • I removed the functions for automatically getting and updating the version cache (and the version cache itself) as all of these were unused (and untested). They are not necessary for the dataset to function, and in other datasets we do not use this approach. They also introduced unnecessary dependencies. If we decide we want the functionality they intended (?) to offer, we can always add this in a future PR.
  • I removed a walrus operator, as we need to support Python 3.7 :).
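
For context on the last point: the assignment expression (`:=`) was added in Python 3.8, so it is a syntax error on Python 3.7. The code that was changed in this PR is not shown here; the sketch below uses a hypothetical helper, `parse_version`, only to show the general shape of such a rewrite.

```python
import re

_VERSION = re.compile(r"v(\d+)")


def parse_version(text):
    """Return the integer following a leading 'v', or None if there is no match."""
    # Python 3.8+ only (shown as a comment so this file still parses on 3.7):
    #     if (match := _VERSION.match(text)) is not None:
    #         return int(match.group(1))
    # Python 3.7-compatible equivalent: assign first, then test.
    match = _VERSION.match(text)
    if match is not None:
        return int(match.group(1))
    return None
```

The two forms behave the same; the only cost of the compatible version is one extra line.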

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
@jmholzer
Contributor

Closing this in favour of #206, which has a clean commit history, has signed commits and is based on the latest commit in main.
