From DeepSense Docs
Latest revision as of 17:50, 1 December 2020

Getting Started with DeepSense 

== Note ==

On June 26 we will update the GPU compute nodes to a new version of IBM Watson Machine Learning Accelerator. This will change the way you access deep learning packages like TensorFlow and PyTorch. Instead of "activating" these packages, you will be able to install new versions directly in your Anaconda environment.

We are actively updating the wiki documentation to explain the new method of accessing deep learning packages. Please bear with us during these updates, as some pages may still refer to the old method of "activating" deep learning packages.

== 1. Logging on ==

DeepSense has two login nodes, login1.deepsense.ca and login2.deepsense.ca. You can access these through SSH with your username and password from any computer on campus.

For example, if your userid is <code>user1</code>, you can connect to DeepSense by typing <code>ssh user1@login1.deepsense.ca</code>, just like logging on to any other networked computer.

'''Note''': The login nodes are intended for testing and compiling code. Please don’t run long or intensive computation on these nodes. Keep reading for instructions on how to submit compute jobs to dedicated compute nodes.
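If you log on frequently, an entry in the <code>~/.ssh/config</code> file on your own computer can shorten the command. This is a standard OpenSSH convenience, not something DeepSense provides; the alias name <code>deepsense</code> below is illustrative:

```
# ~/.ssh/config (on your local machine)
Host deepsense
    HostName login1.deepsense.ca
    User user1
```

With this in place, <code>ssh deepsense</code> is equivalent to <code>ssh user1@login1.deepsense.ca</code>.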

=== 1.1 VPN ===

To connect to the DeepSense platform from outside the Dalhousie campus, you'll need to use a VPN. If you are a student, staff, or faculty member, you can use the Dalhousie VPN (https://wireless.dal.ca/vpnsoftware.php).

If you are not Dalhousie staff, student, or faculty but require offsite access and cannot use the Dalhousie VPN, then contact your project leader or [mailto:support@deepsense.ca support@deepsense.ca] to make alternative arrangements.

For more info, see [[VPN Setup]].

== 2. Transfer data ==

For more information, see [[Transferring Data]].

DeepSense has two protocol nodes, protocol1.deepsense.ca and protocol2.deepsense.ca. You can connect to these using the SMB (Samba) protocol, e.g. <code>smb://protocol1.deepsense.ca</code>, with your username and password. Please contact your project leader or support@deepsense.ca if you need help transferring large amounts of data.

Data transferred through the protocol nodes will be located in the shared /data directory.
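From a Linux or macOS command line, the <code>smbclient</code> tool is one way to reach a protocol node. A sketch, assuming your userid is <code>user1</code> and the share is exported as <code>data</code> (check the share name with your project leader):

```shell
# connect interactively to the protocol node (prompts for your password)
smbclient //protocol1.deepsense.ca/data -U user1

# or upload a single file non-interactively
smbclient //protocol1.deepsense.ca/data -U user1 -c "put results.csv"
```

On Windows or macOS you can instead paste <code>smb://protocol1.deepsense.ca</code> into the file manager's "connect to server" dialog.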

See [[Storage policies]] for more information about the available shared file systems, storage policies, and backup policies.

== 3. Configure your environment ==

DeepSense compute and management nodes are IBM POWER8 computers (ppc64le) running Red Hat Enterprise Linux. See [[Resources]] for more details on the available nodes.

=== 3.1 Loading a python environment ===

You have two options for using python on DeepSense. The first is the systemwide python install, managed by DeepSense administrators; this is recommended for users new to Linux. You will need to contact DeepSense support to have additional software packages installed in the systemwide python.

Alternatively, you can install an Anaconda python environment or other software in your home directory. This allows you to install or update packages or software without requesting and waiting for DeepSense staff.

==== Systemwide python (managed by DeepSense) ====

DeepSense nodes have anaconda2 python installed in <code>/opt/anaconda2</code>. To use this systemwide python, add the following line to your .bashrc file in your home directory:

echo ". /opt/anaconda2/etc/profile.d/conda.sh" >> ~/.bashrc

Then source your .bashrc file: <code>source ~/.bashrc</code>

To load the python2 environment, run <code>conda activate</code>

To use python3, activate the py36 environment: <code>conda activate py36</code>

You can add either line to your .bashrc file to automatically load the desired environment when you log in.
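After sourcing your .bashrc and activating an environment, a small script like the following (illustrative, not part of DeepSense) can confirm which interpreter your shell actually picked up:

```python
# which_python.py - report the active Python interpreter and conda environment.
# Handy after editing ~/.bashrc to verify the right environment loaded.
import os
import platform
import sys


def environment_report():
    """Collect basic facts about the running Python interpreter."""
    return {
        "executable": sys.executable,          # e.g. /opt/anaconda2/bin/python
        "version": platform.python_version(),  # e.g. 2.7.x or 3.6.x for py36
        "machine": platform.machine(),         # ppc64le on DeepSense POWER8 nodes
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(none active)"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("%s: %s" % (key, value))
```

If <code>executable</code> does not point where you expect, check the order of lines in your .bashrc.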

==== Local python install (managed by individual user) ====

See [[Installing local software]] for more information.

== 4. Running compute jobs ==

DeepSense has two different methods of submitting compute jobs.

=== 4.1 Load Sharing Facility (LSF) ===

LSF is a set of command line tools for submitting compute jobs. You may be familiar with other similar software such as Sun Grid Engine or SLURM.

LSF jobs are submitted using the <code>bsub</code> command.

You can examine the progress of your currently running jobs with the <code>bjobs</code> command.

You can examine the available compute nodes and their available resources with the <code>bhosts</code> command.

For more information about using LSF, see [[LSF]].
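As a sketch, a batch submission script might look like the following. The job name, core count, and file names are placeholders; see [[LSF]] for the queues and limits actually configured on DeepSense:

```shell
#!/bin/bash
# myjob.sh -- example LSF batch script (names and resources are placeholders)
#BSUB -J myjob            # job name
#BSUB -n 4                # number of cores requested
#BSUB -o myjob.%J.out     # stdout file (%J expands to the job ID)
#BSUB -e myjob.%J.err     # stderr file

python my_script.py
```

Submit it with <code>bsub < myjob.sh</code>, then watch its progress with <code>bjobs</code>.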

=== 4.2 Conductor with Spark (CWS) ===

CWS is an IBM web-based graphical interface for creating and running Apache Spark compute jobs.

To use CWS, connect to the IBM Spectrum Computing Cluster Management Console at https://ds-mgm-02.deepsense.cs.dal.ca:8443. Log in with your username and password.

Note that you currently need to accept a self-signed web certificate; this will be fixed in the future.

For more information about using CWS, see [[CWS]].

== 5. Deep Learning packages and other available software ==

DeepSense has a variety of Deep Learning packages available as part of IBM Watson Machine Learning Accelerator, including TensorFlow, Caffe, and PyTorch. These packages can be installed from the Anaconda repository at https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
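A sketch of installing from that repository into your own conda environment; the environment name and package choice below are illustrative, and the exact package names available are listed in the IBM repository itself:

```shell
# register the IBM repository as a conda channel
conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

# create and activate a fresh environment (name is your choice)
conda create -n wmlce python=3.6
conda activate wmlce

# install a deep learning package from the channel
conda install pytorch
```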

These packages were formerly installed in /opt/DL/ on each compute node and needed to be activated before use, e.g. <code>source /opt/DL/tensorflow/bin/tensorflow-activate</code>.

Deep Learning packages are typically used on the GPU nodes, but some can also be used on the login nodes and CPU-only nodes. This can be useful for testing your code or running CPU-bound workloads. Note that some deep learning packages may fail if run without a GPU; e.g. Caffe currently requires a GPU.
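Because some packages fail without a GPU, it can save a wasted job to check availability before launching real work. A minimal sketch, using PyTorch as the example package:

```python
# Defensive check before launching GPU work: report whether PyTorch
# (if installed in the current environment) can actually see a GPU.
def gpu_status():
    """Return a short string describing GPU availability for PyTorch."""
    try:
        import torch  # only present once installed in your conda environment
    except ImportError:
        return "pytorch not installed"
    return "cuda available" if torch.cuda.is_available() else "cpu only"


if __name__ == "__main__":
    print(gpu_status())
```

Run this on the target node type (login, CPU-only, or GPU) to see what your code will find there.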

For a brief tutorial, including running Caffe and TensorFlow in a Jupyter notebook, see [[Getting started with Deep Learning]].

See [[Available software]] for the current list of installed software. If you require additional software, you are welcome to install it locally in your home directory or contact DeepSense support.

== 6. Technical and research support ==

DeepSense has a dedicated support team of research scientists ready to help you with technical questions, software installation, or even research questions.

If you can't find the answer to your question on this wiki or need more extensive help, then send an email to support@deepsense.ca.

See [[Technical support]] for more information about the support available.