LSF
IBM Spectrum LSF is the command-line system for submitting batch and interactive jobs on DeepSense computing hardware.
Test code and short computation
DeepSense has two login nodes, login1.deepsense.ca and login2.deepsense.ca. You can access these through SSH with your username and password from any computer on campus. From off campus you’ll need to use the Dalhousie VPN.
The login nodes are intended for testing and compiling code. Please don’t run long or intensive computation on these nodes.
Job Submission
When you have a small example working with your code and are ready to run a real workload, use the LSF queue to submit your jobs to the cluster (https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_users_guide/batch_jobs_about.html). If you’ve used other queuing systems like Slurm or Sun Grid Engine before, LSF will seem very familiar.
To submit a job, use the bsub command (https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_command_ref/bsub.man_top.1.html).
For example, to submit a shared memory job using 20 processors and 256GB of memory you would run:
bsub -oo <output_file> -n 20 -M 256000 -R "span[hosts=1]" <executable> [options]
For OpenMP jobs, please make sure that you use OMP_NUM_THREADS to limit the number of threads your program uses, and that you set this variable in the script or code that runs on the server. LSF sets a variable, $LSB_DJOB_NUMPROC, that you can use if you don’t want to hardcode OMP_NUM_THREADS or set it with your own variable.
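One way to follow this advice is a small wrapper script that sets the thread count before starting the program. The sketch below is an assumption about how you might arrange it, not a DeepSense-provided script: the file name run_openmp.sh and the program name in the usage line are hypothetical, and falling back to a single thread when $LSB_DJOB_NUMPROC is unset is our own choice so the script also behaves sensibly on a login node.

```shell
#!/bin/bash
# run_openmp.sh: hypothetical wrapper, not part of LSF or DeepSense.

# Use the processor count that LSF allocated to this job; fall back to
# 1 thread when the variable is unset (e.g. when testing on a login node).
export OMP_NUM_THREADS="${LSB_DJOB_NUMPROC:-1}"
echo "Running with OMP_NUM_THREADS=${OMP_NUM_THREADS}"

# Run the command given as arguments, if any, with that setting.
if [ "$#" -gt 0 ]; then
    "$@"
fi
```

It could then be submitted along the lines of: bsub -oo omp.log -n 20 -M 256000 -R "span[hosts=1]" ./run_openmp.sh ./my_openmp_program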
CPU Limit
The number of requested processors is specified with the option -n.
The resource request -R "span[hosts=1]"
requires that all processors are on the same compute host, i.e. a shared memory job.
LSF can also be used to run compute jobs across multiple hosts such as MPI jobs. Examples will be included here at a later date.
Memory Limit
The memory limit -M is specified in MB by default. You can also specify units, e.g. -M 256GB.
GPU Computation
To request access to a GPU, use the -gpu - option.
Note the trailing dash, which specifies the default GPU arguments, "num=1:mode=shared:mps=no:j_exclusive=no". The following options can be used in place of that dash:
num=num_gpus
is the number of requested GPUs on each host.
mode=shared | exclusive_process
specifies the GPU mode.
mps=yes | no
specifies whether to use the NVIDIA Multi-Process Service (MPS). MPS enables better sharing of GPU resources. If mode=exclusive_process then mps should be set to yes.
j_exclusive=yes | no
specifies whether the GPU is exclusive to this job and cannot be used by other jobs.
By default, the -gpu - option requests one nonexclusive GPU. Please limit your usage of GPU resources to a reasonable number of concurrently used GPUs and use shared GPUs when possible. We may enact limits on GPU use in the future if necessary.
See the bsub.gpu documentation for more information on submitting GPU jobs.
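As an illustration of replacing the trailing dash with explicit arguments, the sketch below builds a -gpu option string from the fields described above and prints the resulting bsub command. The log file name, the program name ./train_model, the processor count, and the DRY_RUN switch are all hypothetical choices made for this example; only the option names come from the list above.

```shell
#!/bin/bash
# Hypothetical example; the log file and program names are placeholders.

# Set DRY_RUN=0 to really submit; by default the command is only printed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Two shared, nonexclusive GPUs on the host, without MPS.
num_gpus=2
gpu_opts="num=${num_gpus}:mode=shared:mps=no:j_exclusive=no"

run bsub -oo gpu_job.log -n 4 -gpu "$gpu_opts" ./train_model
```

Printing the command first makes it easy to check the option string before any GPUs are actually requested.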
Input and Output files
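If you do not specify an output file with -o (append) or -oo (overwrite) then the output will be lost. Note that LSF will prepend submission information to this file. You can also use typical Linux redirection like > output_file2 as part of the job command, in which case the file specified with -oo will just contain any errors and submission information. Quote the job command when you do this; otherwise your login shell applies the redirection to bsub itself rather than to your program.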
You can specify an input file with the -i option or with the typical Linux redirection < <input_file> inside the quoted job command.
Advanced Job Submission
Array Jobs
To run the same program multiple times with different input and output files you can use LSF Array Jobs.
An example command in the LSF documentation is given as:
bsub -J "myArray[1-1000]" -i "input.%I" -o "output.%I" myJob
This command uses only one line to submit 1000 jobs running the script myJob with the input files input.1, input.2, ... input.1000 and with the output of each job placed in the files output.1, output.2, ... output.1000.
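Inside an array job, LSF also exposes the element's index to the job through the LSB_JOBINDEX environment variable, which is useful when the program needs the index for more than file names. The sketch below shows a job script selecting its own files that way. The script name process_one.sh and the commented-out call to programA are hypothetical, and the default of 1 is our own addition so the script can be tried outside of LSF.

```shell
#!/bin/bash
# process_one.sh: hypothetical job script for one array element.

# LSF sets LSB_JOBINDEX to the element's index; default to 1 so the
# script can also be tried by hand outside of LSF.
index="${LSB_JOBINDEX:-1}"
input_file="input.${index}"
output_file="output.${index}"

echo "Array element ${index}: ${input_file} -> ${output_file}"

# Hypothetical program call, left commented out in this sketch:
# ./programA < "$input_file" > "$output_file"
```

Such a script could be submitted with something like bsub -J "myArray[1-1000]" -oo "log.%I" ./process_one.sh. LSF also accepts a slot limit after the index list, for example "myArray[1-1000]%10", to cap how many elements run at the same time.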
Complicated Jobs
To run the same program with multiple files, possibly with different options, you can create a job submission script that iterates over the files and submits the jobs.
For example, suppose you have programA
and want to process input.1, input.2, ... input.N
with output in output.1, output.2, ... output.N
, as in the array example.
Create a bash script do_submit_programA.bash
that looks something like:
n=<N>
arguments=<nodes, memory, time constraints, etc>
for ((i=1; i<=$n; i++)); do
    # Quote the job command so the redirections apply to programA, not to bsub
    bsub -oo log.$i $arguments "programA < input.$i > output.$i"
done
Note that everything in angle brackets here is not real code. For example, N might be read from a command line argument or hardcoded as, say, 10. The arguments will be something like -n 1 -M 100MB and any other desired options. You can run multiple types of jobs with complex arguments.
You may wish to create separate directories for the log files, input files, and output files if there are more than a handful of jobs.
If each job requires nontrivial processing (e.g. changing into different directories for each job) then you may want to create a second script that generates the job files and then use a similar kind of submit script.
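A generator script of that kind might look like the following sketch. Everything specific in it is an assumption made for illustration: the N_JOBS and JOB_DIR variables, the jobfiles directory, the run_<i> directory layout, and programA itself. It writes one small executable job file per run and does not submit anything.

```shell
#!/bin/bash
# Hypothetical generator: writes one job file per run, submits nothing.
n="${N_JOBS:-3}"
job_dir="${JOB_DIR:-jobfiles}"
mkdir -p "$job_dir"

i=1
while [ "$i" -le "$n" ]; do
    job_file="${job_dir}/job_${i}.sh"
    # Unquoted heredoc delimiter: ${i} is expanded now, at generation time.
    cat > "$job_file" <<EOF
#!/bin/bash
# Generated job file for run ${i}
cd "run_${i}" || exit 1
programA < input.${i} > output.${i}
EOF
    chmod +x "$job_file"
    i=$((i + 1))
done

echo "Wrote ${n} job files to ${job_dir}/"
```

The submit script then only needs to loop over the generated files, for example with bsub -oo log.$i $arguments ./jobfiles/job_$i.sh, since the redirections and directory changes now live inside each job file.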
Job Information
Running Jobs
To examine currently running jobs, use the bjobs command (https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_command_ref/bjobs.man_top.1.html).
bjobs -l or bjobs -l <jobid> shows additional job information, including job status and resource usage.
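The job status reported by bjobs is a short code such as PEND or RUN. As a reading aid, the sketch below maps the common LSF status codes to plain descriptions. The function name describe_stat is hypothetical and the wording of the descriptions is ours, not text from the LSF documentation.

```shell
#!/bin/bash
# describe_stat: hypothetical helper that explains an LSF job status code.
describe_stat() {
    case "$1" in
        PEND) echo "waiting in the queue" ;;
        RUN) echo "running" ;;
        DONE) echo "finished with exit status 0" ;;
        EXIT) echo "terminated abnormally" ;;
        PSUSP|USUSP|SSUSP) echo "suspended" ;;
        *) echo "other state: $1" ;;
    esac
}

describe_stat PEND
describe_stat RUN
```

On the cluster, the status of a single job could be fed into it with something like describe_stat "$(bjobs -noheader -o stat <jobid>)", assuming the installed bjobs supports the -o and -noheader options.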
Past Jobs
To examine current and past jobs use the bhist
command (https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_command_ref/bhist.1.html).
The following options will show jobs with the specified status:
-a all
-d finished
-e exited
-p pending
-r running
-s suspended
You can use the options -S start_time,end_time and -C start_time,end_time to find jobs that were submitted or completed, respectively, within the specified time interval. These options require using the -a option.
As with bjobs, you can use the -l option for additional information, and you can give a known jobid as the last command argument.
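To make the time-interval options concrete, the sketch below assembles two bhist commands for the same window, one filtering by submission time and one by completion time. The dates are placeholders, and the year/month/day form is assumed to be accepted by the installed bhist; check the command reference for the exact time format before relying on it.

```shell
#!/bin/bash
# Placeholder dates, assumed to be accepted in year/month/day form.
start="2024/01/01"
end="2024/01/31"

# -a is included because -S and -C require it; add -l for more detail.
submitted_cmd="bhist -a -S ${start},${end}"
completed_cmd="bhist -a -C ${start},${end}"

echo "$submitted_cmd"
echo "$completed_cmd"
```

The script only prints the two commands, so they can be reviewed and then run by hand on a login node.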
Available Hosts
To see the available hosts and how busy they are, use the bhosts command (https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_command_ref/bhosts.1.html).
LSF Command Reference
The complete list of LSF commands with description is available here.