
kjrstory/awesome-cloud-hpc

Awesome Cloud HPC

A curated list of cloud HPC solutions and resources.

Contents

Solution

Management Tool

  • Alibaba E-HPC - Alibaba Cloud's computing service for resource management, job submission, performance analysis, and VNC access in the E-HPC console. alibaba cloud

  • AWS ParallelCluster - Open source cluster management tool for deploying and managing HPC clusters (Repository). aws

  • AWS ParallelCluster UI - Front-end for AWS ParallelCluster. aws

  • Azure CycleCloud - Secure and flexible cloud HPC and Big Compute environments. azure

  • Azure HPC OnDemand Platform - Azure-based HPC cluster solution with features like Terraform, Ansible, Packer integration, job scheduling, autoscaling, and monitoring (Repository, Marketplace). azure

  • CloudyCluster - Turnkey cloud HPC elastic orchestration with a familiar HPC look and feel. aws gcp

  • Cluster in the Cloud - Multi-cloud solution that uses Terraform for infrastructure setup, Ansible for software configuration, and Slurm with custom Python scripts for dynamic node management in a cloud-based HPC environment. aws azure gcp oci

  • Cluster Toolkit - Google Cloud's open-source software for deploying AI/ML and high-performance computing environments on GCP, featuring customizable Terraform modules and Packer integration (Repository). gcp

  • Flight Environment - The Flight User Suite for improved HPC access through CLI tools, the Flight Web Suite as a web interface for HPC end-users, and the Flight Admin Tools for administrative HPC environment configuration. aws openstack

  • HPC-NOW - The platform aims to simplify the process of starting and managing HPC workloads in the cloud. alibaba cloud tencent cloud aws HUAWEI CLOUD Baidu Cloud azure gcp

  • JedAI Cloud - Optimized HPC stacks enable easy cluster management and on-demand HPC through pre-integrated solutions, delivering bare metal infrastructure, virtualized services, and containerized apps via a single management interface by Define Tech.

  • KT Cloud HPC - KT Cloud's HPC management product integrating Altair's solutions. kt cloud

  • Magic Castle - Multi-cloud HPC cluster solution that leverages Terraform and Puppet for deployment, featuring job scheduling with Slurm and over 3000 research software applications. aws azure gcp openstack OVH

  • Microsoft HPC Pack - Tool for creating and managing HPC clusters, supporting Windows or Linux nodes on-premises and cloud resources in Azure. azure

  • OCI HPC Cluster - Automated HPC cluster deployment on OCI. oci

  • OCI HPC File System (HFS) - Solution for deploying various HPC file servers on OCI. oci

  • SCP HPC Cluster - HPC cluster environment on SCP. scp

  • Scyld Cloud Manager - Comprehensive management platform to cloud-enable Enterprise HPC.

  • TrinityX - Next-gen open-source HPC, AI, and cloud platform offering customizable installations with efficient provisioning, SLURM/OpenPBS, OpenHPC, and more for modern cluster management. aws azure gcp

IaaS-Server

  • Amazon EC2 Hpc7g - HPC-optimized instances powered by AWS Graviton3E processors. aws

  • Amazon EC2 Hpc7a - HPC-optimized instances powered by 4th Generation AMD EPYC processors. aws

  • Amazon EC2 Hpc6id - HPC-optimized instances powered by 3rd Generation Intel Xeon Scalable processors. aws

  • Amazon EC2 P5 - GPU instances powered by NVIDIA H100 GPUs. aws

  • Amazon EC2 P4 - GPU instances powered by NVIDIA A100 (80 GB, 40 GB) GPUs. aws

  • Amazon EC2 P3 - GPU instances powered by NVIDIA V100 GPUs. aws

  • Amazon EC2 G5 - GPU instances powered by NVIDIA A10G GPUs and 2nd Gen AMD EPYC processors. aws

  • Azure HBv4-series - HPC-optimized instances powered by 4th Generation AMD EPYC processors. azure

  • Azure HBv3-series - HPC-optimized instances powered by 3rd Generation AMD EPYC processors. azure

  • Azure HBv2-series - HPC-optimized instances powered by 2nd Generation AMD EPYC processors. azure

  • Azure HB-series - HPC-optimized instances powered by 1st Generation AMD EPYC processors. azure

  • Azure HC-series - HPC-optimized instances powered by 1st Generation Intel Xeon Scalable processors. azure

  • Azure HX-series - Instances optimized for workloads that require significant memory capacity, with twice the memory capacity of HBv4. azure

  • Azure NDm H100 v5-series - GPU instances powered by NVIDIA H100 GPUs. azure

  • Azure NDm A100 v4-series - GPU instances powered by NVIDIA A100 (80 GB) GPUs and 3rd Generation AMD EPYC processors. azure

  • Azure NC A100 v4-series - GPU instances powered by NVIDIA A100 (40 GB) GPUs and 3rd Generation AMD EPYC processors. azure

  • Azure NCv3-series - GPU instances powered by NVIDIA V100 GPUs. azure

  • Azure NCasT4_v3-series - GPU instances powered by NVIDIA T4 GPUs and 2nd Gen AMD EPYC CPUs. azure

  • GCP H3 machine-series - CPU instances powered by 4th Generation Intel Xeon Scalable processors. gcp

  • GCP C2D machine-series - CPU instances powered by 3rd Generation AMD EPYC processors. gcp

  • GCP C2 machine-series - CPU instances powered by 2nd Generation Intel Xeon Scalable processors. gcp

  • GCP A3 machine-series - GPU instances powered by NVIDIA H100 GPUs. gcp

  • GCP A2 machine-series - GPU instances powered by NVIDIA A100 (80 GB, 40 GB) GPUs. gcp

  • GCP G2 machine-series - GPU instances powered by NVIDIA L4 GPUs. gcp

  • Super Computing Cluster - Based on ECS Bare Metal Instance powered by Alibaba Cloud, utilizes high-speed RDMA-based connections to enhance network performance and acceleration ratio in large-scale clusters, providing high-bandwidth and low-latency networks. alibaba cloud

IaaS-Network

  • Azure InfiniBand - RDMA-capable HB-series and N-series VMs communicate over the InfiniBand network. azure

  • Compute Clusters (Cluster Networks with Instance Pools) - Group of high performance computing (HPC), GPU, or optimized instances that are connected with a high-bandwidth, ultra low-latency network.

  • Elastic Fabric Adapter - Network interface for Amazon EC2 instances that enables customers to run applications requiring high levels of inter-node communications at scale. aws

IaaS-Storage

  • Amazon FSx for Lustre - Fully managed shared storage with the scalability and performance of the popular Lustre file system. aws
  • Amazon FSx for OpenZFS - Fully managed shared storage built on the popular OpenZFS file system. aws
  • Azure HPC Cache - File caching for HPC on Azure. azure
  • Azure Managed Lustre - Managed, pay-as-you-go file system for high-performance computing (HPC) and AI workloads. azure
  • Azure NetApp Files - Enterprise-grade Azure file shares, powered by NetApp. azure
  • GCP File Store - High-performance, fully managed file storage. gcp
  • GCP Parallel Store - Based on Intel DAOS and delivers up to 6.3x greater read throughput performance compared to competitive Lustre scratch offerings. gcp

IaaS-Image

PaaS

  • AWS Batch - Fully managed batch computing service. aws

  • AWS Parallel Computing Service - Managed service for HPC cluster deployment and scaling on AWS using Slurm. aws

  • Batch(Azure) - Cloud-scale job scheduling and compute management. azure

  • Batch(GCP) - Fully managed batch service to schedule, queue, and execute batch jobs on Google's infrastructure. gcp

  • Batch Compute - Cloud service for massive simultaneous batch processing on Alibaba Cloud. alibaba cloud

  • Covalent - Pythonic workflow orchestration platform for scaling workloads from your laptop to any compute backend (Repository). aws azure gcp oci

  • Amazon DCV - High-performance remote display protocol that provides customers with a secure way to deliver remote desktops and application streaming. aws

  • NI SP EF Portal - Unified interface to submit jobs for both on-premises and cloud workflows. aws

  • Research and Engineering Studio - Open source, easy-to-use web-based portal for administrators to create and manage secure cloud-based research and engineering environments on AWS. aws

  • Rntier Cloud - R&D cloud platform enabling easy and quick access to complex HPC simulations, vGPU-based remote 3D design, and multi-GPU deep learning environments via a web browser. aws ncp

  • Scyld Cloud Central™ - Fully managed, cloud-based, end-to-end solution for high performance computing that makes it easier and faster for end-users, developers, and data scientists to deploy pure HPC, pure AI, and converged HPC/AI workloads on high-performance clusters. aws azure gcp oci

  • Scyld ClusterWare - Intelligent suite of management functionality, including node provisioning, image customization, and cluster monitoring, while serving as a platform for additional software and schedulers.

  • Scyld Cloud Workstation - Unparalleled performance and a breadth of features that allow it to stand out as a solution for remote access.

  • Skypilot - Framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution (Repository). aws azure gcp oci ibm cloud scp

SaaS

  • CloudHPC - On-demand cloud computing for CAE engineering simulations powered by CFD FEA SERVICE.

  • dicehub - Real-time collaborative CFD (Computational Fluid Dynamics) simulations platform which simplifies your engineering workflow, offers massive parallel scaling and runs in web browser. aws gcp

  • EPIC - Primarily for CFD applications, available on the web and created by Zenotech, which also includes Zenotech's ZCFD. aws oci

  • Kaleidosim - Enables browser-based access to HPC software through advanced cloud orchestration technology.

  • Luminary Cloud - A cloud-based, pay-per-use SaaS simulation platform with a fast, GPU-powered, cloud-native CFD solver and comprehensive high-fidelity capabilities.

  • Nimbix - A comprehensive cloud computing solution powered by Atos, offering access to the HyperHub Application Marketplace with over 1,000 high-performance applications and workflows for diverse industries (Repository). aws azure gcp oci

  • OnScale Solve - The cloud engineering simulation platform built by engineers for engineers.

  • Rescale - Hybrid-cloud platform offering turnkey HPC with extensive (1000+) ecosystem integrations and API connections to major PLM, SPDM, and data storage systems (Repository). aws azure gcp oci ncp

  • Sabalcore - User-friendly, pay-as-you-go high performance computing cloud service with a full-featured, light-weight client that doesn't require a browser.

  • Scala Computing - Optimized, automated cloud-based HPC resource management platform with integrated network simulation and EDA tools, offering flexible, on-demand computing, secure workflows, and global infrastructure access. aws

  • Simscale - Cloud-based computer-aided engineering (CAE) software for computational fluid dynamics, finite element analysis, and thermal simulations, using open source codes in its backend (Repository(SDK)).

  • SyncHPC - Powerful and flexible hybrid HPC and VDI management platform that provides a comprehensive solution for managing high-performance computing (HPC) and Virtual Desktop Infrastructure (VDI) resources. aws azure gcp oci ibm cloud

  • TAESUNG Cloud - Offering Ansys applications as a service in a cloud-based SaaS. aws

CAE and EDA ISV

  • 3DEXPERIENCE platform on the cloud - Complete suite of industry-leading apps and software (CATIA, SIMULIA, DELMIA, 3DEXCITE, etc.) powered by Dassault Systèmes.

  • Altair One - Cloud Gateway offering dynamic and collaborative access to simulation and data analytics technology, along with scalable HPC and cloud resources.

  • Altair Unlimited - A turnkey, state-of-the-art private appliance available in both on-premises and cloud-based formats, offering unlimited access to a wide range of Altair HyperWorks solver software.

  • Ansys Access on Microsoft Azure - Cloud-based simulation solution available on the Azure Marketplace, offering fast, scalable access to Ansys applications (Marketplace). azure

  • Ansys Cloud Direct - Cloud-based interactive workstations and HPC clusters, with flexible licensing that can be accessed from desktop. azure

  • Ansys Gateway by AWS - Cloud-based solution for managing Ansys Simulation & CAD/CAE developments via a web browser. aws

  • Cadence OnCloud Platform - SaaS software platform for all your system design and simulation needs that can operate on any hardware, removing the requirement to run and maintain expensive infrastructure hardware.

  • Cloud Passport - Cloud-ready tools powered by Cadence that have been optimized for use in customers' own cloud environment. aws azure gcp

  • Managed Cloud Service - EDA-optimized platform powered by Cadence that provides a fully integrated and proven cloud environment to jump-start product design, verification, and implementation. aws azure

  • Palladium and Protium Cloud - Emulation and prototyping offering provides pre-silicon hardware system verification and debug powered by Cadence.

  • Simcenter Cloud HPC - Part of the Xcelerator as a Service (XaaS) offering powered by Siemens, providing increased flexibility and scalability for CFD simulations with no additional setup needed. aws

  • Synopsys Cloud - Platform that enables delivery of EDA tools, IP and infrastructure for end-to-end chip design through a browser. aws azure gcp

Job Scheduler

  • Altair Access - HPC Job Submission Portal for Researchers and Engineers.

  • Altair Control - HPC Administrator's Control Center for Managing, Optimizing, and Forecasting Resources with seamless cloud bursting capabilities.

  • Altair Grid Engine - Distributed Resource Management and Optimization.

  • Altair HPCWorks - High-Performance Computing (HPC) and Cloud Platform by Altair.

  • Altair NavOps - Cloud Migration, Automation, and Spend Management for HPC.

  • Altair PBS-Professional - Industry-leading Workload Manager and Job Scheduler for HPC and High-throughput Computing. aws azure gcp oci Open Telekom Cloud HUAWEI CLOUD openstack

  • IBM Spectrum LSF Suites - Workload management platform and job scheduler for HPC with dynamic HPC cloud support for all major cloud providers (Repository). aws azure gcp ibm cloud

  • Slurm on Google Cloud Platform - Open-source software solution that enables setting up Slurm clusters on Google Cloud Platform with ease. gcp

  • Slurm Power Saving Guide - Guide to suspending and resuming nodes as needed, with support for cloud integration with providers like AWS, GCP, and Azure for workload management and cloud bursting. aws azure gcp

Resource

Recipes

  • Azure HPC - Easy automation scripts for building an HPC environment in Azure. azure

  • Cloud MPI - Collection of scripts for optimizing MPI performance in tightly coupled HPC workloads on GCP Compute Engines. gcp

  • Dynamic EC2 budget control - Dynamic EC2 core allocation limit for each business unit (BU), automatically adapted according to spending over a past time frame (e.g. one week) on AWS ParallelCluster. aws

  • HPC Recipes for AWS - Example recipes that demonstrate how to build HPC systems using AWS ParallelCluster, Research and Engineering Studio on AWS, and other AWS products. aws

Azure CycleCloud
