In this project, I developed a robust system to scrape data from the e-commerce website ThredUp, process it for quality, and integrate it into a mobile application for efficient searching. The pipeline uses Python for web scraping, Pandas for data preprocessing, Algolia for indexing and search, Google Cloud Firestore for data storage, and Google Cloud Run with Docker for deployment.
Part 1: Web Scraping Architecture
Part 2: Data Processing and Validation Workflow
Part 3: Data Pipeline Workflow
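As a rough illustration of the data processing and validation stage (Parts 2–3), a cleaning pass with Pandas might look like the sketch below. The column names (`product_id`, `title`, `price`) and the validation rules are assumptions for illustration, not the repository's actual schema:

```python
import pandas as pd

def clean_products(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate, validate, and normalize scraped product rows (hypothetical schema)."""
    # Drop duplicate listings scraped more than once
    df = df.drop_duplicates(subset="product_id")
    # Discard rows missing required fields
    df = df.dropna(subset=["product_id", "title", "price"])
    # Normalize price strings like "$12.99" to floats
    df["price"] = (
        df["price"].astype(str).str.replace(r"[^0-9.]", "", regex=True).astype(float)
    )
    # Keep only rows with a positive price
    return df[df["price"] > 0].reset_index(drop=True)

raw = pd.DataFrame(
    {
        "product_id": [1, 1, 2, 3],
        "title": ["Dress", "Dress", "Jacket", None],
        "price": ["$12.99", "$12.99", "8.50", "$5.00"],
    }
)
cleaned = clean_products(raw)
```

In the real pipeline, the cleaned frame would then be pushed to Algolia for indexing and to Firestore for storage.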
The project repository contains the following directories and files:

- `data_processing/`: Scripts related to data processing and cleaning.
- `handle_database/`: Code for handling the product database and storage.
- `output/`: Output files generated during the data processing pipeline.
- `Dockerfile`: Instructions to build a Docker image for this project.
- `Initial_Products_Scraper.ipynb`: Jupyter Notebook containing the initial product scraping code.
- `run_image.py`: Script to run the Docker image on GCP.
- `scraping_list_product_modules.py`: Modules for scraping product listings.
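To give a flavor of what the scraping modules do, here is a minimal, self-contained sketch that extracts product records from listing markup using only the standard library. The tag names and CSS classes below are hypothetical — ThredUp's real markup differs, and the repository's modules presumably fetch live pages over HTTP rather than parsing a literal string:

```python
from html.parser import HTMLParser

class ProductListingParser(HTMLParser):
    """Collects {title, price} records from hypothetical <div class="product"> blocks."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        if "product" in classes:
            self.products.append({})   # start a new product record
        elif "title" in classes:
            self._field = "title"
        elif "price" in classes:
            self._field = "price"

    def handle_data(self, data):
        if self._field and self.products:
            self.products[-1][self._field] = data.strip()
            self._field = None

html = """
<div class="product"><span class="title">Denim Jacket</span>
<span class="price">$18.00</span></div>
<div class="product"><span class="title">Silk Scarf</span>
<span class="price">$6.50</span></div>
"""
parser = ProductListingParser()
parser.feed(html)
```

The parsed records would then feed into the cleaning and indexing stages described above.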
To run this project locally, follow these steps:

- Clone the repository:
  `git clone https://github.com/faisal-fida/Ecommerce-ETL-Pipeline`
- Install the required Python dependencies using Pipenv:
  `pipenv install`
- Set up any necessary configurations and environment variables.
- Run the main script:
  `python main.py`
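For containerized deployment to Cloud Run, the repository's `Dockerfile` defines the image build. A minimal Dockerfile for a Pipenv-based project like this one might look like the following sketch (the base image, file names, and entry point are assumptions — consult the actual `Dockerfile` in the repository):

```dockerfile
FROM python:3.10-slim
WORKDIR /app
# Install dependencies from the Pipenv lockfile into the system interpreter
COPY Pipfile Pipfile.lock ./
RUN pip install pipenv && pipenv install --system --deploy
COPY . .
CMD ["python", "main.py"]
```

The built image can then be pushed to a container registry and deployed via `run_image.py` or the Cloud Run console.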