LSF Jobs

From DeepSense Docs

What to do when you get "Out of Memory" errors

Users may see "out of memory" errors when they run their jobs. In most cases, this indicates that a job is trying to use more memory than DeepSense systems can provide. It can also happen when a user runs too many scripts on a single machine and together they use up all the available memory. This second case can be avoided by submitting new LSF jobs.
Here is a typical scenario in which the error occurs. A user submits an interactive LSF job and is assigned a session on a compute node. An interactive job gives the user more direct control over the compute node than a batch job does, because the user can manually start many scripts in the session. If too many scripts run in the same session, they can use up the available memory. In that situation, the user should stop starting additional scripts in that session and submit new jobs instead. Each new job opens a new session with newly assigned resources. This scenario mostly happens with interactive LSF jobs.
If you experience memory issues when running batch jobs, please contact DeepSense support. We will investigate what is happening and help you resolve it.
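The advice above (submit new jobs rather than starting more scripts in one interactive session) can be sketched as a small shell helper. This is a minimal illustration, not a DeepSense-provided tool: the script names preprocess.sh and train.sh, the helper submit_job, and the DRY_RUN variable are all hypothetical. It relies only on the standard bsub -o option, where %J in the file name is replaced by the job ID.

```shell
#!/bin/sh
# Sketch: submit each script as its own LSF job so that each one gets
# its own newly assigned resources, instead of running them all in a
# single interactive session.
#
# DRY_RUN=1 (the default in this sketch) only prints the bsub command.
# Set DRY_RUN=0 on a system where LSF is available to actually submit.
DRY_RUN="${DRY_RUN:-1}"

submit_job() {
    script="$1"
    # %J is expanded by LSF to the job ID, giving one output file per job.
    out="${script%.sh}.%J.out"
    if [ "$DRY_RUN" = "1" ]; then
        echo "bsub -o $out sh $script"
    else
        bsub -o "$out" sh "$script"
    fi
}

# Hypothetical scripts that would otherwise share one session's memory.
for script in preprocess.sh train.sh; do
    submit_job "$script"
done
```

Any memory or queue options your jobs need are site-specific and are deliberately left out of this sketch.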

Where is my job output file?

Output may not be written to the specified file immediately when using the -o <filename> or -oo <filename> options. There are two workarounds for this problem:

  1. You can use the bpeek <jobid> command to view the output of a currently running job.
  2. You can send your output to a file using standard Unix output redirection, such as > <filename>, when you run your programs, or by specifying output files in programs that support such options.
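As a rough illustration of the second workaround, the job script below redirects a program's output to its own log file instead of relying on -o or -oo. The file name my_job.log and the function my_program are placeholders standing in for your real program. For the first workaround, bpeek <jobid> is run from a login shell while the job is running.

```shell
#!/bin/sh
# Sketch of a job script that writes its own log file through shell
# redirection, so output appears as the program produces it.
LOG_FILE="${LOG_FILE:-my_job.log}"   # hypothetical file name

# Placeholder for a real program: writes to stdout and to stderr.
my_program() {
    echo "processing started"
    echo "a warning" >&2
}

# ">" creates or truncates the file and captures stdout;
# "2>&1" sends stderr to the same file.
my_program > "$LOG_FILE" 2>&1

# ">>" appends to the file without truncating it.
echo "processing finished" >> "$LOG_FILE"
```

While such a job is running, the log can be inspected with ordinary tools such as tail, provided the file is on a file system visible from where you are logged in.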