Ecommerce data into mobile search index (Data pipeline) using Python, Algolia, and Google Cloud for scalability and efficiency

faisal-fida/Ecommerce-ETL-Pipeline

Ecommerce Mobile Search Pipeline

In this project, I built a system that scrapes product data from the ecommerce website Thredup, cleans and validates it, and loads it into a search index used by a mobile application. It uses Python for web scraping, Pandas for data preprocessing, Algolia for indexing and search, Google Cloud Firestore for data storage, and Google Cloud Run with Docker for deployment.
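To make the processing step concrete, here is a minimal sketch of how scraped rows might be cleaned and shaped into search records. It is not the repository's code: it uses plain Python as a stand-in for the Pandas step, and the field names (`id`, `title`, `price`), the function names, and the batch size are all hypothetical. The only fixed point is that Algolia identifies each record by an `objectID` attribute.

```python
from collections.abc import Iterator

# Hypothetical schema: the real scraper's column names may differ.
REQUIRED_FIELDS = ("id", "title", "price")


def clean_products(rows: list[dict]) -> list[dict]:
    """Drop incomplete, unparseable, and duplicate rows; return search records."""
    seen: set[str] = set()
    records: list[dict] = []
    for row in rows:
        # Skip rows where any required field is missing or empty.
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Normalise prices such as "$1,200.50" to a float; skip unparseable ones.
        try:
            price = float(str(row["price"]).replace("$", "").replace(",", ""))
        except ValueError:
            continue
        product_id = str(row["id"])
        if product_id in seen:
            continue  # keep only the first occurrence of each product
        seen.add(product_id)
        records.append(
            {
                "objectID": product_id,  # unique key expected by the search index
                "title": str(row["title"]).strip(),
                "price": price,
            }
        )
    return records


def batched(records: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` records for bulk upload."""
    for start in range(0, len(records), size):
        yield records[start:start + size]
```

Each batch could then be handed to whichever client writes to the index or to Firestore; that upload call is deliberately left out because it depends on the client version the project uses.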

Architecture 🚀

Part 1: Web Scraping Architecture

[Figure 1: web scraping architecture diagram]

Part 2: Data Processing and Validation Workflow

[Figure 2: data processing and validation workflow diagram]

Part 3: Data Pipeline Workflow

[Figure 3: data pipeline workflow diagram]

Project Structure

The project repository contains the following directories and files:

  • data_processing/: Contains scripts related to data processing and cleaning.
  • handle_database/: Includes code for handling the product database and storage.
  • output/: Stores output files generated during the data processing pipeline.
  • Dockerfile: Defines the instructions to build a Docker image for this project.
  • Initial_Products_Scraper.ipynb: Jupyter Notebook file containing the initial product scraping code.
  • run_image.py: Script to run the Docker image on GCP.
  • scraping_list_product_modules.py: Contains modules for scraping product listings.

Installation

To run this project locally, follow these steps:

  1. Clone the repository:
git clone https://github.com/faisal-fida/Ecommerce-ETL-Pipeline
  2. Install the required Python dependencies using Pipenv:
pipenv install
  3. Set up any necessary configurations and environment variables.
  4. Run the main script:
python main.py