You can minimize the time a general queue job waits before it starts running by estimating its maximum wall-clock time. Instead of explicitly selecting a general queue (short, medium, long, xlong, xxl), supply the '-W' RUNLIMIT argument, in minutes, to the bsub command. LSF will terminate the job if it exceeds the run limit. The automatic queue selection feature will place your job in the most appropriate general queue, eliminating any guesswork. Some examples:

yemi@rhel6-64g $ bsub -W 10 echo "hello world"
Job <97451> is submitted to default queue <short>.

yemi@rhel6-64g $ bsub -W 60 echo "hello world"
Job <98011> is submitted to default queue <medium>.

yemi@rhel6-64g $ bsub -W 300 echo "hello world"
Job <98365> is submitted to default queue <long>.

If you use a dedicated (non-general) queue in your production environment, continue to specify the queue in your bsub command and add the '-W' option alongside it.
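For a dedicated queue, the two options are simply combined on one command line. The sketch below only builds and prints the command, so it can be tried without access to a cluster; the queue name "prod_queue" and the script "./my_job.sh" are hypothetical placeholders, not names from this site.

```shell
# Dry run: build and print the bsub command instead of submitting it.
# "prod_queue" and "./my_job.sh" are hypothetical placeholders.
QUEUE="prod_queue"
RUNLIMIT_MIN=90          # estimated maximum wall-clock time, in minutes

CMD="bsub -q $QUEUE -W $RUNLIMIT_MIN ./my_job.sh"
echo "$CMD"
```

Once the printed command looks right, run it directly to submit the job.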

By supplying a RUNLIMIT, your jobs can start faster because they stand a better chance of using a feature called "backfill". An increasing number of users are now running large parallel jobs across multiple cores/slots. These parallel jobs can take a considerable amount of time to reserve all of the cores they require to start. While those cores are held in reserve, the scheduler will attempt to run smaller jobs on them, as long as the estimated start time of the bigger parallel job is not affected.
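The backfill decision can be pictured as a simple comparison between a job's run limit and the idle window on the reserved cores. This is a deliberately simplified sketch, not LSF's actual scheduling algorithm; the function name fits_backfill and the 120-minute window are assumptions made for illustration.

```shell
# fits_backfill WINDOW_MIN RUNLIMIT_MIN
# Succeeds (exit status 0) when a job with the given run limit would finish
# inside the idle window before the reserved parallel job is due to start.
# Simplified model for illustration only, not LSF's real logic.
fits_backfill() {
    window_min=$1
    runlimit_min=$2
    [ "$runlimit_min" -le "$window_min" ]
}

# Assumed scenario: reserved cores will sit idle for the next 120 minutes.
if fits_backfill 120 60; then
    echo "60-minute job: eligible for backfill"
fi
if ! fits_backfill 120 4320; then
    echo "72-hour job: not eligible for backfill"
fi
```

A job submitted with a short, explicit run limit passes this kind of check; a job that falls back to a long queue default does not.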

Providing a RUNLIMIT lets the scheduler know the time window your job requires. Without an explicit RUNLIMIT, the scheduler can only assume your job will run as long as the default RUNLIMIT for the queue, and this default is often far greater than many jobs need. For example, the xlong queue currently has a default RUNLIMIT of 72 hours, but queue statistics show that the average runtime for this queue is currently ~2 hours.
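Because LSF terminates a job that exceeds its run limit, one reasonable approach is to pad your runtime estimate with a safety margin before passing it to '-W'. The sketch below assumes a 25% margin and a hypothetical helper named runlimit_with_margin; both are illustrative choices rather than site policy, and "./my_job.sh" is a placeholder.

```shell
# runlimit_with_margin ESTIMATE_MIN [MARGIN_PCT]
# Prints the estimate plus a safety margin, in whole minutes (margin rounded up).
# The 25% default margin is an assumption for illustration.
runlimit_with_margin() {
    est_min=$1
    margin_pct=${2:-25}
    echo $(( est_min + (est_min * margin_pct + 99) / 100 ))
}

# A job expected to take about 2 hours (120 minutes).
RUNLIMIT=$(runlimit_with_margin 120)

# Dry run: print the command rather than submitting it.
echo "bsub -W $RUNLIMIT ./my_job.sh"
```

For a 120-minute estimate this yields a 150-minute run limit, which is still far below a 72-hour queue default.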
