Latest revision as of 18:13, 10 June 2022
IBM Spectrum LSF is the command line job submission system for submitting batch and interactive jobs on DeepSense computing hardware.
Test code and short computation
DeepSense has two login nodes, login1.deepsense.ca and login2.deepsense.ca. You can access these through SSH with your username and password from any computer on campus. From off campus you’ll need to use the Dalhousie VPN.
The login nodes are intended for testing and compiling code. Please don’t run long or intensive computation on these nodes.
Job Submission
When you have a small example working with your code and are ready to run a real workload, use the LSF queue to submit your jobs to the cluster (https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_users_guide/batch_jobs_about.html). If you’ve used other queuing systems like Slurm or Sun Grid Engine before, then LSF will seem very familiar.
To submit a job you use the bsub command (https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_command_ref/bsub.man_top.1.html).
For example, to submit a shared memory job using 20 processors and 256GB of memory for at most 24 hours you would run:
bsub -oo <output_file> -n 20 -M 256000 -W 24:0 -R "span[hosts=1] rusage[mem=256000]" <executable> [options]
For openMP jobs, please make sure that you use OMP_NUM_THREADS to limit the number of threads your program uses and that you set this variable in the code that will run on the server. LSF sets a variable $LSB_DJOB_NUMPROC that you can use if you don’t want to hardcode OMP_NUM_THREADS or set it with your own variable.
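As a sketch of this approach (the executable name is a placeholder), a job script can derive the thread count from LSF's allocation instead of hardcoding it:

```shell
#!/bin/bash
# Derive the OpenMP thread count from the processors LSF allocated to
# this job. LSB_DJOB_NUMPROC is only set inside an LSF job, so fall
# back to 1 when the script is run by hand.
export OMP_NUM_THREADS=${LSB_DJOB_NUMPROC:-1}
echo "Using $OMP_NUM_THREADS OpenMP threads"
# Replace this comment with your own OpenMP executable, e.g.:
# ./my_openmp_program
```

Submitting this script with, say, bsub -n 20 then makes the program use the same 20 processors the scheduler reserved.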
Shell Scripts for Batch Jobs
To submit a batch job you only need to run a single command; the job scheduler takes care of everything, and you only need to check your output and/or errors afterwards. You do not need to stay logged in while your jobs are running. An example job submission command is shown above.
However, if your script needs an environment that is not set up by default in your .bashrc file, you can write a simple shell script to set up the environment. For example, if your Python script needs a specific Conda environment and/or Python version, you would wrap it in a shell script. Suppose a Python script named "myPython.py" needs the anaconda3 and py36_tensorflow environments. Create a shell script, say "myShellScript.sh", with the following contents:
#!/bin/bash
source ~/anaconda3/etc/profile.d/conda.sh
conda activate py36_tensorflow
python myPython.py
Then, save your edit and run the following command to make your shell script executable:
chmod +x myShellScript.sh
Then, submit your job:
bsub -gpu - /path/to/myShellScript.sh
Check that your job was submitted successfully.
Check Job Progress of Batch Jobs
While your batch jobs are still running, you can check their progress with the 'bpeek' command. It displays the output a running job has produced so far, which is helpful for monitoring a job and deciding whether to terminate it early to save time.
You can always run 'bpeek jobid' to check the progress, and 'man bpeek' for more detailed usage of the command.
Be aware that this command doesn't work for interactive jobs.
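For example (the job ID here is a placeholder), bpeek also accepts a -f flag that keeps streaming new output, similar to tail -f:

```shell
# Show the output job 12345 has produced so far.
bpeek 12345

# Keep displaying new output as the job produces it (Ctrl-C to stop).
bpeek -f 12345
```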
CPU Limit
The number of requested processors is specified with the option -n.
The resource request -R "span[hosts=1]" requires that all processors are on the same compute host, i.e. a shared memory job.
LSF can also be used to run compute jobs across multiple hosts such as MPI jobs. Examples will be included here at a later date.
Memory Limit
LSF has two different types of memory limits.
The scheduler memory limit -R "rusage[mem=<memlimit>]" requests <memlimit> amount of memory. Your job will not start until a compute node is available with that amount of memory, and you are guaranteed to have this amount of memory available. If you exceed the requested amount then your job may be killed, but only if other jobs need that memory.
The job memory limit -M <memlimit> will kill your job if it exceeds the given memory limit. Note that this option does not guarantee that you will have that amount of memory available.
The memory limits are specified in MB by default. You can also specify units, e.g. -M 256GB and -R "rusage[mem=256GB]".
If you are using more than a few GB of memory then you must specify the -R "rusage[mem=<memlimit>]" option or your job may be terminated. You may additionally want to use the -M <memlimit> option to be sure you aren't using more memory than intended.
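For example, the two limits are often used together; the following is an illustrative sketch (the output file and program name are placeholders):

```shell
# LSF interprets bare memory numbers as MB; 8 GB is 8192 MB. This pairs
# the scheduler reservation (rusage, guaranteed) with the kill limit
# (-M) at the same size, so the job is terminated if it exceeds what
# it reserved.
memlimit_mb=$((8 * 1024))
bsub -oo mem_job.out -n 1 -M "$memlimit_mb" -R "rusage[mem=${memlimit_mb}]" my_program
```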
Time Limit
The runtime limit -W hours:minutes specifies the maximum length of time your job is allowed to run. For example, -W 24:0 requests 24 hours of running time.
Your job will be terminated when the runtime limit is exceeded.
If you do not specify a runtime limit then the default runtime limit of 168 hours (7 days) will be used. The maximum possible runtime limit is currently 30 days and may vary by queue in the future.
If there is a scheduled maintenance window announced then any job with a run time limit that could extend into the maintenance period will be listed as pending and will not run until the maintenance has concluded. Use a shorter run time limit that ends before the maintenance period to avoid this.
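For example (the output file and program name are placeholders), to give a job a 36-hour limit instead of the 7-day default:

```shell
# -W takes hours:minutes, so 36:0 is 36 hours and 36:30 would be
# 36.5 hours. The job is killed when this limit is exceeded.
bsub -oo long_job.out -W 36:0 my_program
```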
GPU Computation
To request access to a GPU use the -gpu - option.
Note the trailing dash, which specifies the default GPU arguments. The following options can be used in place of that dash.
The default GPU arguments are "num=1:mode=exclusive_process:mps=yes:j_exclusive=yes"
num=num_gpus is the number of requested GPUs on each host.
mode=shared | exclusive_process specifies the GPU mode. Your jobs run in exclusive_process mode by default, which means that no other jobs will share the GPUs your jobs are using.
mps=yes | no specifies whether to use the Nvidia Multi-Process Server (MPS). MPS enables better sharing of GPU resources.
j_exclusive=yes | no specifies whether the GPU is exclusive to this job and prevented from being used by other jobs. By default this is yes.
By default the -gpu - option will request one exclusive GPU. Please limit your usage of GPU resources to a reasonable number of concurrently used GPUs. We may enact limits on GPU use in the future if necessary.
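As an illustration (the output file and program name are placeholders), the defaults can be overridden; for instance, to request two GPUs that other jobs are allowed to share:

```shell
# Two GPUs per host, in shared mode and not exclusive to this job,
# overriding the exclusive_process/j_exclusive=yes defaults.
bsub -oo gpu_job.out -gpu "num=2:mode=shared:j_exclusive=no" my_gpu_program
```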
See the bsub.gpu documentation for more information on submitting GPU jobs.
Input and Output files
If you do not specify an output file with -o (append) or -oo (overwrite) then the output will be lost. Note that LSF will prepend submission information to this file. You can use typical Linux redirection like > output_file2, in which case the file specified with -oo will just contain any errors and submission information.
You can specify an input file with the -i option or the typical Linux redirection < <input_file>.
Note that output may not be written to the specified file immediately. You can use the bpeek <jobid> command to view the output of a currently running job.
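Putting these options together (the file and program names are hypothetical), a submission might look like:

```shell
# stdin comes from data.in; run.log (overwritten on each submission)
# receives LSF's submission information and any errors, while the
# program's normal output is redirected to data.out. The command is
# quoted so the redirection happens on the compute node, not here.
bsub -oo run.log -i data.in "my_program > data.out"
```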
Advanced Job Submission
Array Jobs
To run the same program multiple times with different input and output files you can use LSF Array Jobs.
An example command in the LSF documentation is given as:
bsub -J "myArray[1-1000]" -i "input.%I" -o "output.%I" myJob
This command uses only one line to submit 1000 jobs running the script myJob with the input files input.1, input.2, ... input.1000 and the output of each job placed in the files output.1, output.2, ... output.1000.
Complicated Jobs
To run the same program with multiple files, possibly with different options, you can create a job submission script that iterates over the files and submits the jobs.
For example, suppose you have programA and want to process input.1, input.2, ... input.N with output in output.1, output.2, ... output.N, as in the array example. Create a bash script do_submit_programA.bash that looks something like:
n=<N>
arguments=<nodes, memory, time constraints, etc>
for ((i=1; i<=$n; i++)); do
    bsub -oo log.$i $arguments programA < input.$i > output.$i
done
Note that everything in angle brackets here is not real code. For example, N might be read from a command line argument or hardcoded as, say, 10. The arguments will be something like -n 1 -M 100MB -R "rusage[mem=100MB]" and any other desired options. You can run multiple types of jobs with complex arguments.
You may wish to create separate directories for the log files, input files, and output files if there are more than a handful of jobs.
If each job requires nontrivial processing (e.g. changing into different directories for each job) then you may want to create a second script that generates the jobfiles and then use a similar kind of submit script.
Interactive Jobs
Some jobs may require user input, such as testing code on a GPU system or an interactive analytics program.
bsub -I requests an interactive job that will print its output to your terminal.
bsub -Ip requests an interactive job with a pseudo terminal. For example, this can be used to schedule a console program that takes user input and output.
bsub -Is requests an interactive job with a shell. This can be used to test code on one of the GPU nodes or for more resource intensive development than is allowed on the login nodes.
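For example (the time limit and shell are illustrative), to get an interactive shell on a compute node with one default GPU:

```shell
# Interactive shell with one exclusive GPU (-gpu -) and a 2-hour
# runtime limit; exit the shell to end the job and free the resources.
bsub -Is -gpu - -W 2:0 /bin/bash
```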
Note that interactive jobs are subject to the same time and memory constraints as typical batch jobs. Please be careful not to interfere with other jobs running on a node, and make sure that your interactive job does not attempt to use more resources than you have requested. Please do not leave interactive jobs running for long periods and do not leave interactive jobs idle when you are not using them.
We do not currently treat interactive jobs differently than any other jobs. As DeepSense becomes more heavily utilized we may need to limit the number of interactive jobs run by a user, by a project, or on a given node. We may also need to limit the time or other resources used by interactive jobs.
Job Information
Running Jobs
To examine currently running jobs you use the bjobs command (https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_command_ref/bjobs.man_top.1.html).
bjobs -l or bjobs -l <jobid> shows additional job information including job status and resource usage.
Past Jobs
To examine current and past jobs use the bhist command (https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_command_ref/bhist.1.html).
The following options will show jobs with the specified status:
-a  all
-d  finished
-e  exited
-p  pending
-r  running
-s  suspended
You can use options like -S start_time,end_time and -C start_time,end_time to find jobs that were submitted or completed within the specified time intervals. These options require using the -a option.
As with bjobs, you can use the -l option for additional information and can also specify a specific known jobid as the last command argument.
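For example (the dates are made up; see the bhist man page for the exact time format), to list jobs of any status submitted within a given window:

```shell
# All jobs (-a) submitted (-S) between the two times; bhist accepts
# times of the form yyyy/mm/dd[/HH:MM].
bhist -a -S 2022/06/01,2022/06/10
```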
Available Hosts
To see the available hosts and how busy they are you use the bhosts command (https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_command_ref/bhosts.1.html).
LSF Command Reference
The complete list of LSF commands with description is available here.