Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft PR for issue 1946 - Snowpark dataset #2032

Closed
wants to merge 8 commits into from

Conversation

heber-urdaneta
Copy link

@heber-urdaneta heber-urdaneta commented Nov 16, 2022

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

Addresing comments from previous draft PR

Development notes

Draft implementation of SnowParkDataSet kedro/extras/datasets/snowflake/snowpark_dataset.py

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

@datajoely
Copy link
Contributor

Also as the PR description at the top says, we'll actually have to move all of this to the new kedro-dataset standalone package.

Copy link
Contributor

@datajoely datajoely left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general we shouldn't be doing Pandas stuff in here, it should be a Snowpark dataset first and foremost the same was as we do with Spark - the user can transcode to Pandas later if they need to.


missing_module = res[0]

if KNOWN_PIP_INSTALL.get(missing_module):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the existing way to do this is to edit setup.py and provide optional dependencies.

return DataSetError(f"{DRIVER_ERROR_MESSAGE}{missing_module_instruction}")


class SnowflakeTableDataSet(AbstractDataSet[pd.DataFrame, pd.DataFrame]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class SnowflakeTableDataSet(AbstractDataSet[pd.DataFrame, pd.DataFrame]):
class SnowparkTableDataSet(AbstractDataSet[pd.DataFrame, pd.DataFrame]):

@sfc-gh-mgorkow
Copy link

sfc-gh-mgorkow commented Nov 17, 2022

Hi, I am a Field CTO for Datascience at Snowflake and really like the idea of having an integration with Snowpark. You did an amazing job so far and I am happy to support where I can.

@datajoely
Copy link
Contributor

@sfc-gh-mgorkow delighted to see you here! Keen to make some noise about this once it launches :)

@heber-urdaneta
Copy link
Author

Development of snowpark integration moved to this PR: kedro-org/kedro-plugins#78

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants