EC528-Fall-2024/orchestrating-ai-app-cloud
Orchestrating AI Applications Deployments on the Cloud


Team

| Name | GitHub Handle | Email |
| --- | --- | --- |
| Ryan Darrow | darrowball13 | darrowry@bu.edu |
| Peter Gu | peterguzw0927 | petergu@bu.edu |
| Harlan Jones | harlanljones | hljones@bu.edu |
| Kris Patel | Kris7180 | krispat@bu.edu |
| Jimmy Sui | gimmymansui | suijs@bu.edu |
| Thai Nguyen | ThaiNguyen03 | quocthai@bu.edu |

Mentorship provided by Shripad Nadgowda at Intel


Sprint Presentations:

Sprint Demo Videos:

Sprint Slides:


Project Dependencies and Installation:

In order to run Cynthus, Docker must be installed on the device and available on the system PATH. Docker can be downloaded from the official Docker website.

In order to use the Cynthus CLI, clone this repository, then open a terminal in the cynthus_cli directory and run the following command:

pip install -e .

Once this installation is complete, commands are run using the following format:

cynthus [command]

More detailed explanations of the available CLI commands can be found in the section below. This information can also be viewed by running cynthus --help, cynthus -h, or cynthus with no arguments.


CLI Command Explanations:

The Cynthus CLI has the following commands:

  • signup: Creates a new account. The user is prompted for an email address to associate with the account and asked to create a password. Note that running this command while another user is logged in will log that user out.
  • login: Logs in to an existing Cynthus account.
  • init [project name]: Creates a project directory named [project name] within the directory where this command is run. (Optional)
  • prepare --src_path [src_path] --data_path [data_path]: Given the directories containing the source code and data, pushes their contents to the Google Cloud Platform to prepare the project. If no data_path is provided, the user is prompted for a link to external data (currently only Kaggle and HuggingFace are supported).
  • update-data: Pushes new data to the user's data bucket. The user is prompted as to whether the new data is local or external.
  • update-src: Given a new source code path, pushes the new source code to the Artifact Registry.
  • update: After new source code or data has been prepared and pushed, sends a request to GCP to update the VM instance accordingly.
  • run: Starts the VM instance containing the user's data and source code. This command should only be run after prepare has completed.
  • destroy: Deletes the resources currently created for this account on GCP. Use this once a project is finished and the VM no longer needs to run, or when a full reset is needed.
  • pull: Once the project has finished running, pulls the output from the user's output bucket into the directory where this command is run.

NOTE: Docker Desktop must be open in order to run the prepare and update-src commands.
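The commands above compose into a short end-to-end session. Below is a minimal sketch of scripting such a session from Python, assuming the CLI is installed as described earlier; the paths and the dry_run helper are illustrative, not part of the real CLI:

```python
# Hedged sketch: drive the cynthus CLI as a subprocess.
# The dry_run flag and the example paths are illustrative only.
import subprocess

def cynthus(*args, dry_run=False):
    """Invoke the cynthus CLI; with dry_run=True, just return the argv list."""
    argv = ["cynthus", *args]
    if dry_run:
        return argv
    return subprocess.run(argv, check=True)

# Typical end-to-end flow once Docker Desktop is running:
steps = [
    cynthus("prepare", "--src_path", "./src", "--data_path", "./data", dry_run=True),
    cynthus("run", dry_run=True),
    cynthus("pull", dry_run=True),
]
print(steps[1])  # ['cynthus', 'run']
```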


Other requirements/restrictions

  • Running any Cynthus command for the first time automatically prompts the user to either log in to an account or create one.
  • Upon logging in or signing up, an authentication token file (auth_token.json) is created in the directory where the command that triggered the log in/sign up was run. The token provides authentication for 1 hour; any command run after it expires prompts the user to log back in.
  • Cynthus commands must be run from the same directory as the authentication token file; otherwise the log in/sign up flow is triggered again.
  • Due to how the CLI constructs the Docker image, the file in the source directory containing the code to run upon resource provisioning on GCP must be named main.py.
  • When the data and source code are uploaded to the VM instance, the data directory and main.py are placed in the same directory. main.py should therefore look for its data in a data directory located alongside itself.
  • After cynthus prepare has run, serverless functions spin up the VM instance and mount everything. To ensure that there are no issues with running the VM, wait 5-10 minutes before using cynthus run.
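The main.py layout rule above can be illustrated with a minimal sketch; only the directory convention comes from this README, and the function names below are hypothetical:

```python
# Sketch of a main.py laid out the way Cynthus expects: the data
# directory sits next to main.py, so inputs are resolved relative to
# this file rather than the current working directory.
from pathlib import Path

def default_data_dir():
    # The data directory is mounted alongside main.py itself.
    return Path(__file__).resolve().parent / "data"

def list_inputs(data_dir=None):
    """Return the names of the input files the workload will process."""
    data_dir = Path(data_dir) if data_dir else default_data_dir()
    return sorted(p.name for p in data_dir.iterdir() if p.is_file())

def main():
    for name in list_inputs():
        print(f"processing {name}")  # placeholder for the real workload

# The is_dir() guard lets the sketch be imported without a data directory.
if __name__ == "__main__" and default_data_dir().is_dir():
    main()
```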

1. Vision and Goals Of The Project:

Cynthus aims to simplify the deployment of AI applications on cloud platforms. While initially designed for Intel Developer Cloud (IDC), the project currently operates on Google Cloud Platform (GCP) due to accessibility considerations. The platform addresses the challenges developers face when deploying AI workloads by providing automated solutions for resource management, dependency handling, and deployment orchestration. Key goals of the project include:

  • Creating a simplified command-line interface for end-to-end AI application deployment
  • Automating resource allocation and dependency management through Terraform and Ansible
  • Providing seamless integration with public datasets and models from sources like HuggingFace and Kaggle
  • Implementing secure containerized deployments using Docker
  • Managing cloud infrastructure through automated scripts and serverless functions
  • Supporting scalable and maintainable AI workload deployments

2. Users/Personas Of The Project:

The platform serves various users in the AI development ecosystem:

  • AI developers who need an efficient way to deploy models without managing complex infrastructure
  • Engineers requiring specific hardware configurations for AI model deployment
  • Newcomers to cloud computing who want to explore AI capabilities without deep cloud expertise
  • Teams needing secure and scalable infrastructure for AI workloads
  • Developers working with custom models who need flexible deployment options
  • Organizations requiring automated resource management and cost optimization

3. Scope and Features Of The Project:

The AI Deployment Platform provides:

  • Command-line interface with:

    • User authentication via Firebase
    • Project initialization and configuration
    • Automated deployment to cloud storage
    • Resource management and monitoring
  • Cloud Infrastructure:

    • Serverless functions for VM provisioning
    • MySQL database for logging and state management
    • Cloud Storage buckets for project data and source code
    • Docker containerization for application deployment
  • Integration Features:

    • Support for HuggingFace and Kaggle datasets
    • Automated dependency management
    • Version control for containers and deployments
  • Security Features:

    • Firebase authentication
    • Resource tagging for access control
    • Secure secret management
    • Service account management

4. Solution Concept

Architecture

The solution architecture consists of several key components working together to provide end-to-end AI application deployment:

Client Layer

  • Command Line Interface (CLI)
    • Primary user interaction point
    • Handles authentication through Firebase
    • Manages project initialization and configuration
    • Builds and uploads Docker containers
    • Monitors deployment status and results
    • Downloads results

Data Management Layer

  • Dataset Downloader

    • Integrates with Kaggle and HuggingFace
    • Manages dataset versioning and storage
    • Handles data preprocessing requirements
  • Bucket Builder

    • Creates and manages GCP storage buckets
    • Generates requirements.txt automatically
    • Handles input/output storage configuration

Storage Layer

  • Input Object Storage (GCP Bucket)

    • Stores user data, requirements.txt, and source code
    • Triggers deployment workflows
    • Manages access control through Firebase authentication
  • Output Object Storage (GCP Bucket)

    • Stores computation results
    • Maintains execution logs
    • Provides secure access to processed data

Processing Layer

  • Cloud Run Functions
    • Handles VM provisioning and configuration
    • Manages container deployment
    • Coordinates with orchestrator for deployment status
    • Processes authentication and authorization

Management Layer

  • SQL Database

    • Tracks deployment metadata:
      • Run ID and User ID
      • Resource paths and states
      • Deployment configurations
    • Maintains system state information
  • Orchestrator Server

    • Monitors VM health through heartbeats
    • Manages container lifecycle
    • Handles failure recovery
    • Updates deployment states
    • Coordinates between components
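A hypothetical sketch of the deployment-metadata table, using SQLite in place of the project's MySQL database for illustration; the column names are assumptions based on the fields listed above:

```python
# Illustrative schema only: the real system uses MySQL and its actual
# schema is not shown in this README.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE deployments (
        run_id    TEXT PRIMARY KEY,
        user_id   TEXT NOT NULL,
        src_path  TEXT,   -- e.g. Artifact Registry image path
        data_path TEXT,   -- e.g. input bucket path
        state     TEXT    -- e.g. PROVISIONING, RUNNING, DONE, FAILED
    )
""")
conn.execute(
    "INSERT INTO deployments VALUES (?, ?, ?, ?, ?)",
    ("run-001", "user-42", "registry/img:v1", "gs://bucket/data", "RUNNING"),
)
state, = conn.execute(
    "SELECT state FROM deployments WHERE run_id = ?", ("run-001",)
).fetchone()
print(state)  # RUNNING
```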

Container Registry Layer

  • Artifacts Registry
    • Stores Docker container images
    • Manages image versions
    • Provides secure container distribution
    • Integrates with VM deployment

Compute Layer

  • VM Bare Metal
    • Executes containerized AI workloads
    • Reports health status to orchestrator
    • Manages data processing
    • Handles output generation

Key Workflows:

  1. Authentication Flow:

    • User authenticates via Firebase
    • Access tokens manage resource permissions
    • Secure communication between components
  2. Deployment Flow:

    • Container image built and pushed to Artifacts Registry
    • Cloud Run Functions provision VM resources
    • Orchestrator manages deployment lifecycle
    • System state tracked in SQL database
  3. Data Management Flow:

    • Dataset Downloader fetches external data
    • Bucket Builder creates storage infrastructure
    • Input/Output buckets manage data lifecycle
  4. Execution Flow:

    • VM pulls container from Artifacts Registry
    • Workload processes data
    • Results stored in output bucket
    • Status updates maintained in database
  5. Monitoring Flow:

    • Orchestrator tracks VM health
    • System handles failure recovery
    • Metrics and logs collected
    • State management maintained
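The monitoring flow's health check can be sketched as a simple timeout rule: a VM is unhealthy once its last heartbeat is older than some threshold. The function name and the 60-second value below are assumptions for illustration, not the orchestrator's actual logic:

```python
# Illustrative heartbeat check; timeout value is an assumption.
import time

HEARTBEAT_TIMEOUT = 60.0  # seconds

def is_healthy(last_heartbeat, now=None, timeout=HEARTBEAT_TIMEOUT):
    """Return True if the VM's last heartbeat is recent enough."""
    now = time.time() if now is None else now
    return (now - last_heartbeat) <= timeout

# A VM that reported 30 s ago is healthy; one silent for 5 min is not.
assert is_healthy(last_heartbeat=100.0, now=130.0)
assert not is_healthy(last_heartbeat=100.0, now=400.0)
```

On a failed check, the orchestrator would move the run into a failure-recovery path and update its state in the database.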

5. Acceptance criteria

Minimum acceptance criteria include:

  • Functional CLI for end-to-end deployment:

    • User authentication and project management
    • Automated resource provisioning
    • Container deployment and monitoring
  • Cloud Infrastructure Setup:

    • Successful VM provisioning with Terraform
    • Automated configuration with Ansible
    • Docker container deployment
  • External Integrations:

    • Working connections to HuggingFace and Kaggle
    • Successful data and model management
  • Security Implementation:

    • User authentication
    • Resource access control
    • Secure deployment pipeline

6. Challenges:

  • Accessing the Intel Developer Cloud
  • Architecture redesign
  • Microservice integration
  • Familiarization with technologies
  • Security considerations
  • Time management
  • Accessing GPUs and GKE on GCP

7. Release Planning:

Current Progress:

  • All functional requirements met
  • Functional CLI for end-to-end deployment
  • Security implemented with user authentication and resource access control
  • Cloud infrastructure set up
  • Persisted through a cloud platform migration and a major architecture redesign

Stretch Goals:

  • Queueing system for async operations
  • Enhanced secret management
  • Multi-tenancy support
  • Distributed ML training/inference
  • GPU support
  • CLI refinements
  • Billing management
  • GUI tracking

A previous team's work with our mentor: https://github.com/BU-CLOUD-F20/Securing_MS_Integrity
