A time series is a sequence of data points ordered in time, with time typically serving as the independent variable. Time-series data is valuable in many applications, such as:
- Tracking weather data at different intervals
- Monitoring application performance changes
- Visualizing real-time vital statistics in medical devices
The Auto-Regressive Integrated Moving Average (ARIMA) model is a widely used statistical method for analyzing and forecasting time-series data. It is suitable for metrics recorded at regular intervals, from fractions of a second to daily, weekly, or monthly periods. ARIMAX and SARIMAX extend ARIMA with exogenous regressors and, in the case of SARIMAX, seasonal terms.
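As a quick illustration of how an ARIMA model is fit and used for forecasting, the sketch below runs statsmodels on a toy monthly series; the series and the (p, d, q) order are placeholders, not the project's actual configuration.

```python
# A minimal ARIMA illustration with statsmodels; the series and order are placeholders.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# A toy monthly series: an upward trend with some noise
rng = np.random.default_rng(0)
y = pd.Series(
    np.arange(36, dtype=float) + rng.normal(0, 2, 36),
    index=pd.date_range("2020-01-01", periods=36, freq="MS"),
)

model = ARIMA(y, order=(1, 1, 1)).fit()
print(model.forecast(steps=6))  # forecast the next six months
```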
MLOps is a set of practices that ensure reliable and efficient deployment and maintenance of machine learning models in production. Many organizations are adopting machine learning and artificial intelligence to gain a competitive edge, automate processes, and make data-driven decisions. MLOps advocates for automation and monitoring at all stages of machine learning system development, including integration, testing, releasing, deployment, and infrastructure management.
The dataset contains call-center data recorded at the monthly level. Because the call center operates across several domains, calls are categorized by domain. The dataset also includes external regressors, such as the number of phone lines and the number of channels, which reflect expected traffic and resource availability. It has 132 rows and 8 columns:
- Month, healthcare, telecom, banking, technology, insurance, number of phone lines, and number of channels.
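A minimal loading sketch is shown below; the path follows the repository's Input folder and the Month column listed above, but the exact file location and column labels should be adjusted to the actual data.

```python
# Load the monthly dataset; path and column names are assumptions based on
# the folder structure and column list described above.
import pandas as pd

df = pd.read_csv("Input/Data-chillers.csv", parse_dates=["Month"], index_col="Month")
df = df.asfreq("MS")   # enforce a month-start frequency on the index
print(df.shape)        # expected (132, 7) once Month becomes the index
print(df.head())
```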
- Build an ARIMA model using the provided dataset.
- Create an end-to-end machine learning development process on Google Cloud Platform (GCP) to design, build, and manage reproducible, testable, and evolvable machine learning models.
- Language: Python
- Libraries: pandas, numpy, matplotlib, seaborn, statsmodels, scipy
- Cloud Services: Google Cloud Platform (GCP)
- Web Technologies: uWSGI, Flask
- Containerization: Docker
- Container Orchestration: Kubernetes
- Import the necessary libraries and load the dataset.
- Perform descriptive analysis.
- Explore the data through Exploratory Data Analysis (EDA) and data visualization.
- Check for white noise and random walk in the time series.
- Conduct stationarity tests, including the Augmented Dickey-Fuller (ADF) and KPSS tests (see the stationarity sketch after this list).
- Create a seasonal decomposition plot.
- Apply Holt-Winters Exponential Smoothing.
- Develop an ARIMA model (see the model-comparison sketch after this list):
  - Experiment with different lag values.
  - Compare models using log-likelihood and AIC values.
  - Conduct the LLR test.
  - Analyze ACF plots of residuals.
- Implement an ARIMAX model:
  - Analyze ACF plots of residuals.
- Explore SARIMAX modeling (see the exogenous-regressor sketch after this list):
  - Analyze ACF plots of residuals.
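For the stationarity and decomposition steps, a rough sketch (assuming the DataFrame `df` from the loading sketch above, with the healthcare column used as an example series) might look like this:

```python
# Stationarity checks and seasonal decomposition; "healthcare" is one example
# column from the dataset described earlier.
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.seasonal import seasonal_decompose

def stationarity_report(series):
    """Print ADF and KPSS statistics; the two tests have opposite null hypotheses."""
    adf_stat, adf_p, *_ = adfuller(series.dropna())
    kpss_stat, kpss_p, *_ = kpss(series.dropna(), regression="c", nlags="auto")
    print(f"ADF : stat={adf_stat:.3f}, p={adf_p:.3f}  (p < 0.05 suggests stationarity)")
    print(f"KPSS: stat={kpss_stat:.3f}, p={kpss_p:.3f}  (p < 0.05 suggests non-stationarity)")

stationarity_report(df["healthcare"])

# Decompose the series into trend, seasonal, and residual components (monthly data)
seasonal_decompose(df["healthcare"], model="additive", period=12).plot()
```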
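The ARIMA comparison step can be sketched as follows; the orders shown are illustrative nested candidates, not the orders ultimately chosen in the project.

```python
# Fit nested ARIMA candidates, compare log-likelihood/AIC, and run an LLR test.
from scipy.stats import chi2
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf

def llr_test(restricted, unrestricted, df_diff=1):
    """p-value of the likelihood-ratio test between two nested fitted models."""
    lr = 2 * (unrestricted.llf - restricted.llf)
    return chi2.sf(lr, df_diff)

series = df["healthcare"]
arima_111 = ARIMA(series, order=(1, 1, 1)).fit()
arima_211 = ARIMA(series, order=(2, 1, 1)).fit()

for name, m in [("ARIMA(1,1,1)", arima_111), ("ARIMA(2,1,1)", arima_211)]:
    print(f"{name}: log-likelihood={m.llf:.2f}, AIC={m.aic:.2f}")

print("LLR p-value:", llr_test(arima_111, arima_211))

# Residuals of an adequate model should show no significant autocorrelation
plot_acf(arima_211.resid.dropna(), lags=24)
```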
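For ARIMAX and SARIMAX, the external regressors enter through the `exog` argument; the column names and the seasonal order below are assumptions.

```python
# ARIMAX / SARIMAX with exogenous regressors; column names are assumptions
# based on the dataset description (number of phone lines, number of channels).
from statsmodels.tsa.statespace.sarimax import SARIMAX

exog = df[["no_of_phone_lines", "no_of_channels"]]

# ARIMAX: ARIMA plus exogenous regressors, no seasonal terms
arimax = SARIMAX(df["healthcare"], exog=exog, order=(1, 1, 1)).fit(disp=False)

# SARIMAX: adds a seasonal (P, D, Q, s) term; s=12 for monthly data
sarimax = SARIMAX(
    df["healthcare"],
    exog=exog,
    order=(1, 1, 1),
    seasonal_order=(1, 0, 1, 12),
).fit(disp=False)

print(arimax.summary())
print(sarimax.summary())
```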
- Create a new Cloud Build trigger in the GCP console.
- Link the trigger to your source repository.
- Launch a Kubernetes cluster from the console.
- Connect to the cluster and create the following two files: `deployment.yaml` and `service.yaml`.
- Copy the code from the "Kubernetes Files" folder in the cloned repository.
- Execute `kubectl apply -f deployment.yaml` followed by `kubectl apply -f service.yaml`.
- Obtain the deployment name with `kubectl get deployments`.
- Create a Pub/Sub topic named `cloud-build`.
- Set up a subscription to trigger a cloud function.
- Launch the cloud function from the Pub/Sub console.
- Configure environment variables through the GUI console, including `PROJECT`, `ZONE`, `CLUSTER`, and `DEPLOYMENT`.
- Copy the code and `requirements.txt` files for the cloud function from the `cloud-function-trigger` folder.
- Set the Entrypoint for the cloud function as `onNewImage`.
- Deploy the function (an illustrative sketch of such a function follows this list).
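The repository's `cloud-function-trigger` folder holds the actual function. As a rough, illustrative sketch only, an `onNewImage` entrypoint for a Pub/Sub-triggered Python function might decode the Cloud Build message and patch the Kubernetes deployment's image along the lines below. It assumes the `google-cloud-container` and `kubernetes` packages appear in `requirements.txt`, that the container inside the deployment shares the deployment's name, and it skips TLS verification for brevity.

```python
# Illustrative sketch only -- not the repository's cloud function. Decodes the
# Cloud Build message from Pub/Sub and patches the GKE deployment's image.
import base64
import json
import os

import google.auth
import google.auth.transport.requests
from google.cloud import container_v1
from kubernetes import client

def onNewImage(event, context):
    """Entry point for the Pub/Sub-triggered Cloud Function."""
    build = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    if build.get("status") != "SUCCESS" or not build.get("images"):
        return  # only act on successful builds that produced an image
    image = build["images"][0]

    project, zone, cluster_id, deployment = (
        os.environ[k] for k in ("PROJECT", "ZONE", "CLUSTER", "DEPLOYMENT")
    )

    # Look up the GKE cluster endpoint and obtain an access token
    gke = container_v1.ClusterManagerClient()
    cluster = gke.get_cluster(
        name=f"projects/{project}/locations/{zone}/clusters/{cluster_id}"
    )
    creds, _ = google.auth.default()
    creds.refresh(google.auth.transport.requests.Request())

    # Configure the Kubernetes client against the cluster endpoint
    conf = client.Configuration()
    conf.host = f"https://{cluster.endpoint}"
    conf.verify_ssl = False  # for brevity; use the cluster CA certificate in practice
    conf.api_key = {"authorization": f"Bearer {creds.token}"}

    # Patch the deployment so the pods roll over to the newly built image
    apps = client.AppsV1Api(client.ApiClient(conf))
    patch = {"spec": {"template": {"spec": {"containers": [
        {"name": deployment, "image": image}  # assumes container name == deployment name
    ]}}}}
    apps.patch_namespaced_deployment(name=deployment, namespace="default", body=patch)
```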
After successful deployment, commit changes to the source repository, triggering the following sequence:
- Cloud Build sends a message to Pub/Sub upon successful build.
- Pub/Sub triggers the cloud function.
- The cloud function deploys the new image on Kubernetes.
- To test the deployment, check the logs on the Kubernetes cluster using `kubectl get pods` followed by `kubectl logs <pod name>`.
The deployment status is reflected in the logs and endpoints.
- Input: Contains the "Data-chillers.csv" dataset.
- Kubernetes files: Includes files to trigger Kubernetes.
- MLPipeline: Contains functions in different Python files.
- Notebook: The time series ARIMA notebook.
- Output: Stores the model in (.pkl) format.
- `__init__.py`: An empty file, required for the package structure.
- `Dockerfile`: The Docker image configuration.
- Engine: Where the MLPipeline files are called.
- `Main.py`: Hosts the Flask API (a sketch follows this list).
- `Readme`: Provides an explanation of the entire approach and steps.
- `Requirements.txt`: Lists all required libraries.
- `Uwsgi.ini`: The uWSGI configuration file.
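As context for how the serving pieces fit together, a minimal sketch of what `Main.py` might contain is shown below: a Flask app that loads the pickled model from the Output folder and exposes a forecast endpoint, served by uWSGI per `Uwsgi.ini`. The route name, model path, and request format are assumptions, not the repository's actual code.

```python
# Minimal sketch of a Flask forecasting API; paths and routes are assumptions.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the fitted time-series model saved by the pipeline (assumed path)
with open("Output/model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    """Return a forecast for the requested number of future periods."""
    steps = int(request.get_json().get("steps", 12))
    forecast = model.forecast(steps=steps)
    return jsonify(forecast.tolist())

if __name__ == "__main__":
    # uWSGI serves "app" in production; this run is only for local debugging
    app.run(host="0.0.0.0", port=5000)
```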