This repository contains examples and related resources showing you how to preprocess, train, and serve your models using Amazon SageMaker with data fetched from Delta Lake.
The repository contains the following resources:
- scikit-learn resources:
  - Delta Lake scikit-learn Script Mode Training and Serving: This example shows how to train a scikit-learn model on the boston-housing dataset fetched from Delta Lake, and then serve the model with scikit-learn and SageMaker script mode.
  - Delta Lake Bring Your Own Container Processing Job: This example provides a detailed walk-through of how to package a scikit-learn Docker image for a processing job that fetches data from a Delta Lake table and aggregates the total COVID-19 cases per country.
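The per-country aggregation performed by the processing job can be sketched roughly as follows. This is a minimal illustration only, not the code from the notebook: the `country` and `cases` field names, the `total_cases_per_country` helper, and the sample rows are all hypothetical, and the step that reads rows from the Delta Lake table is deliberately left out.

```python
from collections import defaultdict


def total_cases_per_country(rows):
    """Sum the case counts for each country.

    `rows` is any iterable of dicts with the (hypothetical) keys
    `country` and `cases`. In the real job these rows would come
    from the Delta Lake table rather than an in-memory list.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row["country"]] += row["cases"]
    return dict(totals)


# Made-up placeholder rows, used only to exercise the function.
sample_rows = [
    {"country": "A", "cases": 10},
    {"country": "B", "cases": 5},
    {"country": "A", "cases": 7},
]

totals = total_cases_per_country(sample_rows)
print(totals)
```

In a processing job, the same function would be applied to the rows loaded from the table, with the result written to the job's output location.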
These notebooks were tested on SageMaker Studio with the Python 3 (Data Science) kernel.
Disclaimer: The examples in this repository are for demo purposes only and are not meant to be used in production:
- The solution is missing appropriate authorization/authentication tokens.
- Transfer of data over the cloud will be a challenge for large datasets, mainly from a cost perspective.
For questions, please contact @e_sela or raise an issue on this repo.
This library is licensed under the MIT-0 License. See the LICENSE file.