Difference between revisions of "Using HPC on AWS and Azure"

From DeepSense Docs
Jump to: navigation, search
(Add benefits and restructure)
(Add benefits and restructure)
 
(No difference)

Latest revision as of 21:22, 24 April 2024

To empower demanding workloads and specialized use cases with tailored compute resources, we leverage both managed HPC platforms on AWS and Azure as well as customizable virtual machines. This approach offers the best of both worlds.

What is HPC?

High-Performance Computing (HPC) harnesses the collective power of multiple connected computers (called a cluster) to tackle complex problems far beyond the capabilities of a single machine. HPC is essential for Large-scale simulations such as modeling weather patterns, fluid dynamics, or complex engineering systems. For massive data analysis such as finding patterns in vast datasets, accelerating scientific discovery, or driving AI-powered insights. And also training complex AI/ML models such as developing models that require immense computational resources for tasks like natural language processing or computer vision.

Benefits of HPC Platforms (AWS ParallelCluster, Azure CycleCloud)

  • Scalability: Easily scale your compute power up or down based on workload requirements.
  • Specialized Hardware: Access high-performance CPU and GPU instances, often with advanced networking, specifically for computationally intensive tasks.
  • Workflow Management: Utilize queuing systems to optimize job scheduling and resource usage.
  • AI/ML Optimization: Configure clusters with popular Deep Learning frameworks (TensorFlow, PyTorch, etc.) pre-installed and optimized.

DeepSense HPC

Tools for efficient HPC are AWS ParallelCluster and Azure CycleCloud to streamline the management and deployment of HPC solutions.

AWS ParallelCluster

  • Simplified HPC Setup: AWS ParallelCluster handles much of the complexity involved in setting up and managing an HPC cluster on AWS.
  • Flexible Configurations: Choose from a variety of EC2 instance types (CPU, GPU, high-memory, etc.), network options, and storage solutions.
  • Integration: Easily connect your cluster with other AWS services for data processing, storage, and visualization.

Azure CycleCloud

  • Orchestration & Automation: Azure CycleCloud simplifies the creation and management of HPC clusters on Azure, automating scaling and provisioning.
  • Scheduler Support: Select your preferred HPC scheduler (Slurm, PBS Pro, GridEngine, etc.) for efficient job management.
  • Azure Ecosystem: Seamlessly integrate with other Azure services like Azure Blob Storage and Azure Kubernetes Service.

Accessing Your HPC Cluster

AWS & Azure Administrator Setup

  • Account Creation: Your dedicated AWS or Azure administrator will provision the necessary accounts and permissions for our team to set up your HPC environment.
  • Access & MFA: You'll receive secure credentials and instructions for setting up Multi-Factor Authentication (MFA) to enhance security.

Getting Started with HPC

  • Cluster Launch: Our team will configure and launch your cluster using either AWS ParallelCluster or Azure CycleCloud.
  • Connecting: We'll provide detailed instructions on how to connect to your cluster's head node via secure methods like SSH.
  • Job Submission: You'll utilize your chosen scheduler's interface (e.g., Slurm commands) to submit your computational jobs and scripts to the cluster's queue.

PyJobHPC.png

  • Monitoring: Tools within AWS ParallelCluster or Azure CycleCloud dashboards will allow you to track job progress, resource usage, and costs.

Remember: It's crucial to stop or scale down your cluster when not actively in use to avoid unnecessary charges.

Our team has extensive experience with HPC on AWS and Azure. Contact us to explore how these powerful platforms can accelerate your computational workloads!