Difference between revisions of "DeepSense Documentation"

From DeepSense Docs
(73 intermediate revisions by 6 users not shown)
Latest revision as of 18:41, 12 August 2024

Welcome to the DeepSense technical cloud documentation wiki. This is the primary source for users with questions about DeepSense cloud equipment and services. DeepSense uses cloud services from several vendors to develop AI projects. All of our content is now available from the sidebar. The tables below list the available cloud services, along with information about any planned outages.

Because cloud computing offers effectively "unlimited resources", DeepSense does not restrict which cloud services a project may use. The services listed in the tables below are the resources we are currently running; this does not mean other cloud resources are unavailable. DeepSense users are encouraged to contact us to request the cloud computing services they need.

We continuously explore the best cloud solutions for AI projects and do not lock our solutions into any specific cloud vendor. The tables below show the cloud services we have tested and are developing projects on. More cloud services will be coming soon.

We routinely change and update the content. If you see anything missing or have suggestions for content, we would appreciate hearing from you: email us at info@deepsense.ca. Click "Resources" in the navigation panel to find the technical details of the virtual machines and serverless computing services.

DeepSense Cloud Computing Services

Amazon Web Services (AWS)

{|class="wikitable" style="text-align: center; color: black"
|'''Availability'''
|style="width:20%" | '''Service'''
|style="width:70%" | '''Usage'''
|-
|style="color:green" | Available
| S3 (Simple Storage Service)
| Amazon S3 can be used to store and retrieve any amount of data. Use it mainly for long-term data storage or for backing up your data.
|-
|style="color:green" | Available
| EC2 (Elastic Compute Cloud)
| Amazon EC2 can be used to create virtual machines for training models in a manually configured environment.
|-
|style="color:green" | Available
| SageMaker Notebook
| An Amazon SageMaker notebook instance is a machine learning (ML) compute instance that runs the Jupyter Notebook App.
|-
|style="color:green" | Available
| SageMaker Studio - AutoML
| Amazon SageMaker Studio provides a single web-based visual interface where you can access purpose-built tools to perform all machine learning (ML) development steps.
|-
|style="color:green" | Available
| SageMaker Endpoint
| Amazon SageMaker Inference Endpoints let you deploy your machine learning models in the cloud and make predictions on new data.
|}
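As an illustration of the SageMaker Endpoint row above, the following is a minimal sketch of requesting a prediction from a deployed endpoint with boto3's `sagemaker-runtime` client. The endpoint name, the `{"instances": [...]}` request layout, and the helper names `build_request`, `parse_response`, and `predict` are assumptions made for this example; the payload format in practice depends on the model container behind your endpoint.

```python
import json


def build_request(features):
    """Serialize one feature vector as a JSON body (assumed payload layout)."""
    return json.dumps({"instances": [list(features)]})


def parse_response(raw_body):
    """Decode the JSON bytes returned by the endpoint."""
    return json.loads(raw_body.decode("utf-8"))


def predict(endpoint_name, features, client=None):
    """Send one feature vector to an endpoint and return the decoded reply.

    `client` can be any object with an `invoke_endpoint` method; when it is
    omitted, a real boto3 SageMaker runtime client is created.
    """
    if client is None:
        import boto3  # imported lazily so the helpers work without AWS set up

        client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_request(features),
    )
    # The reply body is a stream; read it fully before decoding.
    return parse_response(response["Body"].read())


# Hypothetical usage, assuming an endpoint called "my-project-endpoint" exists:
# result = predict("my-project-endpoint", [5.1, 3.5, 1.4, 0.2])
```

Passing the client as a parameter keeps the function easy to exercise without AWS credentials, since a stand-in object can be supplied in its place.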

Microsoft Azure

{|class="wikitable" style="text-align: center; color: black"
|'''Availability'''
|style="width:20%" | '''Service'''
|style="width:70%" | '''Usage'''
|-
|style="color:green" | Available
| Blob Storage
| Azure Blob Storage is an object store capable of holding large amounts of unstructured data. It can be used for long-term storage or for backing up your data.
|-
|style="color:green" | Available
| Virtual Machine
| Azure Virtual Machines are image service instances that provide on-demand, scalable computing resources for training models in a manually configured environment.
|-
|style="color:green" | Available
| Machine Learning Workspace
| Azure Machine Learning is a cloud service for accelerating and managing the machine learning (ML) project lifecycle.
|}
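The backup use of Blob Storage described above could look roughly like the sketch below, which uploads every file under a local directory through a container client from the `azure-storage-blob` package. The container name, the directory, the `run-01/` prefix, and the helper names `plan_uploads` and `backup_directory` are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def plan_uploads(local_dir, prefix=""):
    """Return (local path, blob name) pairs for every file under local_dir."""
    root = Path(local_dir)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Blob names use forward slashes regardless of the local OS.
            relative = path.relative_to(root).as_posix()
            pairs.append((path, f"{prefix}{relative}"))
    return pairs


def backup_directory(container_client, local_dir, prefix=""):
    """Upload each planned file and return the blob names that were written."""
    uploaded = []
    for path, blob_name in plan_uploads(local_dir, prefix):
        with path.open("rb") as handle:
            container_client.upload_blob(name=blob_name, data=handle, overwrite=True)
        uploaded.append(blob_name)
    return uploaded


# Hypothetical usage, assuming a connection string is set in the environment
# and a container named "project-backups" already exists:
#
#   import os
#   from azure.storage.blob import BlobServiceClient
#   service = BlobServiceClient.from_connection_string(
#       os.environ["AZURE_STORAGE_CONNECTION_STRING"])
#   container = service.get_container_client("project-backups")
#   backup_directory(container, "./results", prefix="run-01/")
```

Separating the planning step from the upload step makes it possible to review which blob names will be written before any data leaves the machine.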

Google Cloud Platform (GCP)

{|class="wikitable" style="text-align: center; color: black"
|'''Availability'''
|style="width:20%" | '''Service'''
|style="width:70%" | '''Usage'''
|-
|style="color:green" | Available
| Cloud Storage
| Cloud Storage is a service for storing your objects in Google Cloud. Use it mainly for long-term storage or for backing up your data.
|-
|style="color:green" | Available
| Compute Engine
| Compute Engine is a customizable compute service that lets you create and run virtual machines for training models in a manually configured environment.
|-
|style="color:orange" | Available Soon
| Vertex AI Notebooks
| Vertex AI Workbench managed notebooks instances are Google-managed, end-to-end, Jupyter notebook-based environments.
|}

HPC on AWS and Azure

{|class="wikitable" style="text-align: center; color: black"
|'''Availability'''
|style="width:20%" | '''Service'''
|style="width:70%" | '''Usage'''
|-
|style="color:green" | Available
| AWS ParallelCluster
| AWS ParallelCluster is an AWS-supported open source cluster management tool that helps you deploy and manage high performance computing (HPC) clusters in the AWS Cloud.
|-
|style="color:green" | Available
| Azure CycleCloud
| Azure CycleCloud is designed to enable enterprise IT organizations to provide secure and flexible cloud HPC and Big Compute environments to their end users.
|}