Skip to content

This is a data pipeline for predicting stock prices using Apache Spark, Apache Cassandra, and machine learning techniques. It collects and preprocesses stock data from Alpha Vantage API, engineers features, trains models, and performs data analysis and predictions.

Notifications You must be signed in to change notification settings

Sabareh/stock-price-prediction-spark-cassandra

Repository files navigation

Stock Prediction with Apache Spark and Apache Cassandra

Overview

This project aims to predict stock prices using machine learning techniques, Apache Spark, and Apache Cassandra. The goal is to build a data pipeline that collects stock data, preprocesses it, engineers features, and trains a machine learning model to predict future prices.

Features

  • Collect stock data from multiple sources using the Alpha Vantage API.
  • Store stock data in Apache Cassandra for efficient and scalable data management.
  • Preprocess data by cleaning, handling missing values, and normalizing data.
  • Engineer features by calculating technical indicators, adding sentiment analysis, and including macroeconomic indicators.
  • Train and evaluate machine learning models using Apache Spark.
  • Perform data analysis and predictions using Spark and Cassandra.

Technologies Used

  • Apache Spark
  • Apache Cassandra
  • Alpha Vantage API
  • Python
  • Scikit-learn
  • Pandas

Installation

To install and run this project, follow these steps:

  • Clone the repository to your local machine.
  • Install Apache Cassandra and Spark on your local machine or cluster.
  • Install the required Python libraries by running pip install -r requirements.txt.
  • Set up the Alpha Vantage API and obtain an API key.
  • Configure the Cassandra keyspace and table by running python cassandra_config.py.
  • Collect stock data by running python data_collection.py.
  • Preprocess the data by running python data_preprocessing.py.
  • Engineer features by running python feature_engineering.py.
  • Train and evaluate the machine learning model by running python model_building.py.
  • Perform data analysis and predictions by running python spark_integration.py.

Usage

The project can be used to predict stock prices by running the scripts provided in the repository. The data can be customized by modifying the parameters in the scripts or by using a different data source. The machine learning model can be modified by changing the algorithm or the hyperparameters.

Data Sources

The data used in this project is collected from the Alpha Vantage API, which provides real-time and historical stock data in various formats.

Contributors

@sabareh

License

This project is released under the MIT License.

Acknowledgements

Alpha Vantage API for providing the stock data.

About

This is a data pipeline for predicting stock prices using Apache Spark, Apache Cassandra, and machine learning techniques. It collects and preprocesses stock data from Alpha Vantage API, engineers features, trains models, and performs data analysis and predictions.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages