longhowlam/snowflake_cars_prediction

Demo machine learning with Snowflake Snowpark

Introduction

This repo contains:

  • instructions for setting up Snowflake for machine learning,
  • an example CSV data set of car prices,
  • a notebook demonstrating the use of Snowpark ML.

We'll train a machine learning model (a regression model in this case) using a Snowpark ML pipeline and register that pipeline in the Snowpark model registry. The registered model can then be used to score new data. See the carprice_prediction_snowflake_notebook.ipynb notebook for more details.

Set up your Snowflake account

If you don't have a Snowflake account yet, it is easy and fast (within 4 minutes, without a credit card) to create a trial account. You should have 120 days and 400 credits to do some experimentation in Snowflake.

Create database objects

In your Snowflake environment it is a good idea to keep this machine learning 'experiment' in a separate database. So let's create a new database in Snowflake called 'CARS_DATA'.

  1. In the Snowsight web interface, click on the '+ CREATE' button.
  2. Select SQL worksheet.
  3. In that new worksheet, type in and run:
CREATE DATABASE CARS_DATA;

Get the data and the notebook into Snowflake

There are two ways to get the CSV data set and the notebook in this repo into your Snowflake environment: 1. manually upload the data and the notebook, or 2. use git integration.

1. Manual upload

Upload Data

Now that we have a database, we can populate it with tables; one way is to upload data. Download the car_prices.csv data set from this repository, then upload it using the Snowflake web interface:

  1. Navigate to the Data > Databases section.
  2. Select the CARS_DATA database that we just created.
  3. Select the PUBLIC schema.
  4. Click on the Create button and select Table from file.
  5. Use the "Load Data" option to upload the car_prices.csv file.

Here are some screenshots of the upload process:

[Screenshots: Upload Data (three images)]

When the data is uploaded, you can view the car_prices data in the Snowflake interface.

[Screenshot: car price data]
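A quick way to confirm the upload is to compare the number of data rows in the local file with the row count of the new table. The sketch below is only an illustration: it assumes a Snowpark `session` object is available (for example from `get_active_session()` inside a Snowflake notebook) and that the table was named `CAR_PRICES`, which is an assumption; use whatever name you chose in the upload dialog. The helper names are hypothetical.

```python
import csv


def count_csv_rows(csv_path):
    """Count non-empty data rows in a CSV file, excluding the header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return sum(1 for row in reader if row)


def upload_is_complete(session, csv_path, table_name="CAR_PRICES"):
    """Return True when the table holds as many rows as the CSV file.

    `session` is expected to be a Snowpark Session. `table_name` is an
    assumption and should match the name chosen during the upload.
    """
    table_rows = session.table(table_name).count()
    return table_rows == count_csv_rows(csv_path)
```

If the counts differ, the "Load Data" dialog may have rejected some rows, so it is worth re-checking the file format options used during the upload.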

Upload Notebook

We can now upload the carprice_prediction_snowflake_notebook.ipynb notebook from this repository to Snowflake. From the Snowflake homepage:

  1. Click + Create
  2. Select Notebook > Import *.ipynb file

[Screenshots: create notebook (two images)]

2. Via Git integration in Snowflake

Snowflake allows you to clone git repositories from GitHub or GitLab. To do that, we first need to create an API integration; using that API integration, we can then clone a repository (into a database).

  1. Click on the '+ CREATE' button.
  2. Select SQL worksheet.
  3. Select the database CARS_DATA and the PUBLIC schema.
  4. In that new worksheet, type in and run:
CREATE or REPLACE api integration git_api_integration
    api_provider = git_https_api
    api_allowed_prefixes = ('https://github.com/longhowlam/')
    enabled = true
    allowed_authentication_secrets = all
;

Once the above command has created the API integration, we can clone this repo:

CREATE OR REPLACE GIT REPOSITORY cars_prediction
    API_INTEGRATION = git_api_integration
    ORIGIN = 'https://github.com/longhowlam/snowflake_cars_prediction.git'
;

Now you should see the repo in your Snowflake account, as in the screenshot below.

[Screenshot: git repo]
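Besides the Snowsight view, the clone can also be inspected from code. The following is a minimal sketch, assuming a Snowpark `session` object (for example from `get_active_session()` in a Snowflake notebook). The function name is hypothetical, and the branch name `main` is an assumption; running `SHOW GIT BRANCHES IN cars_prediction` lists the branches the clone actually has.

```python
def refresh_and_list_repo(session, repo_name="cars_prediction", branch="main"):
    """Fetch the latest commits and list the files on one branch.

    `session` is expected to be a Snowpark Session. `branch="main"` is an
    assumption; check SHOW GIT BRANCHES IN <repo_name> for the real names.
    """
    # Pull the newest state of the remote repository into the Snowflake clone.
    session.sql(f"ALTER GIT REPOSITORY {repo_name} FETCH").collect()
    # The clone can be addressed like a stage, so LS lists its files.
    return session.sql(f"LS @{repo_name}/branches/{branch}/").collect()
```

The returned rows should include the notebook and the car_prices.csv file if the clone succeeded.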

From there you can create the notebook in your Snowflake account by clicking on the three dots.

[Screenshot: create notebook]

A dialog will appear where you can set the notebook runtime. After that, the notebook appears as a native Snowflake notebook inside your Snowflake environment. The notebook environment includes the car_prices.csv data set, which can be read with pandas in the notebook and then written to a Snowflake database with Snowpark statements. See the notebook for details.
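As a rough illustration of that last step, the sketch below writes a pandas DataFrame to a Snowflake table with Snowpark's `Session.write_pandas`. It is not the notebook's own code: the function name and the table name `CAR_PRICES` are assumptions made for this example.

```python
def save_to_table(session, pandas_df, table_name="CAR_PRICES"):
    """Write a pandas DataFrame to a Snowflake table.

    `session` is expected to be a Snowpark Session. `table_name` is an
    assumption chosen for this example.
    """
    return session.write_pandas(
        pandas_df,
        table_name,
        auto_create_table=True,  # create the table from the DataFrame's columns
        overwrite=True,  # replace earlier contents when the cell is re-run
    )
```

Inside the notebook this could be called as `save_to_table(get_active_session(), pd.read_csv("car_prices.csv"))`, with `get_active_session` imported from `snowflake.snowpark.context`. The relative path `car_prices.csv` is also an assumption about where the file sits in the notebook environment.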

Packages used by the notebook

Notebooks in the Snowflake web interface rely on packages: some are preloaded, others need to be added. The notebook in this repo uses the plotly and snowflake-ml-python packages, so add both to the notebook's package list before running it.

[Screenshot: notebook package list]
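To check from a notebook cell whether both packages are available, something like the following sketch could be used. It relies only on the standard library's `importlib.metadata`; the function name is hypothetical, and it assumes the packages are registered under the distribution names `plotly` and `snowflake-ml-python`.

```python
from importlib import metadata


def missing_packages(names=("plotly", "snowflake-ml-python")):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

An empty list from `missing_packages()` suggests the package list is complete; any names it returns still need to be added.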

Now you can run the cells in the notebook and get an idea of Snowpark ML.
