...

We want to improve the robustness and reliability of the batch compute environment by applying tighter resource controls. The goal is to isolate jobs from each other and prevent them from consuming all the resources on a machine. LSF version 9.1.2 uses Linux Control Groups (cgroups) to limit the CPU cores and memory that a job can use. These cgroup-based restrictions are not currently part of our production LSF configuration. We want to understand the potential impact on users and get feedback from stakeholders. I have outlined some examples below using our test cluster.
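To see whether a job has been placed in cgroups, you can look at /proc/self/cgroup from inside the job. The Python sketch below parses that file, where each line has the form hierarchy-ID:controller-list:cgroup-path. It is an illustrative helper, not part of LSF, and the function name parse_proc_cgroup is hypothetical. The exact cgroup paths LSF creates are not covered here; the assumption is only that, on a cgroup-enabled host, a job's entries for controllers such as cpuset and memory would differ from those of an ordinary login shell.

```python
"""Sketch: list the cgroups the current process belongs to (Linux only)."""
from pathlib import Path


def parse_proc_cgroup(text: str) -> dict[str, str]:
    """Map each controller name to its cgroup path.

    Each line of /proc/<pid>/cgroup has the form
    hierarchy-ID:controller-list:cgroup-path. A line with an empty
    controller list (the cgroup v2 unified hierarchy) is stored under "".
    """
    result: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _hier_id, controllers, path = line.split(":", 2)
        # "cpu,cpuacct" lists several controllers sharing one hierarchy
        names = controllers.split(",") if controllers else [""]
        for name in names:
            result[name] = path
    return result


if __name__ == "__main__":
    cgroup_file = Path("/proc/self/cgroup")
    if cgroup_file.exists():
        entries = parse_proc_cgroup(cgroup_file.read_text())
        for controller, path in sorted(entries.items()):
            print(f"{controller or '(unified)'}: {path}")
    else:
        print("/proc/self/cgroup not found; is this a Linux host?")
```

Running it once in a login shell and once inside a job on a cgroup-enabled host makes the difference in cgroup membership easy to compare.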

...

Wait for the job to finish, then resubmit it, this time to a host that has cgroups enabled. This time we also request CPU affinity in the job submission command: "bsub -q mpitest -m bullet0019 -R 'affinity[core:membind=localprefer]' ./mploadtest.csh". Observe this job again using the per-core load view in top. This time you should see that all of the load is on a single core. The number of assigned cores matches the number of job slots, so submitting the job with "-n 3" will result in the job using 3 CPU cores.
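As a complement to watching top, a job can report which CPUs it is allowed to run on. The Python sketch below reads the process's affinity mask with os.sched_getaffinity and compares its size with the requested slot count. It assumes the core binding is reflected in that mask (as it is for a cpuset restriction), and that each core appears as one logical CPU; on hosts with hyperthreading, set the hypothetical threads_per_core parameter accordingly. The script name check_affinity.py and the function binding_matches are illustrative, not part of LSF.

```python
"""Sketch: check that a job sees only as many CPU cores as it requested."""
import os
import sys


def binding_matches(allowed_cpus: set[int], requested_slots: int,
                    threads_per_core: int = 1) -> bool:
    """True if the affinity mask holds one core (threads_per_core CPUs) per slot."""
    return len(allowed_cpus) == requested_slots * threads_per_core


if __name__ == "__main__":
    # Requested slot count is passed on the command line, e.g. 3 for "bsub -n 3".
    requested = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    if not hasattr(os, "sched_getaffinity"):
        sys.exit("os.sched_getaffinity is unavailable (Linux only)")
    allowed = os.sched_getaffinity(0)  # 0 means the calling process
    print(f"allowed CPUs: {sorted(allowed)}")
    status = "matches" if binding_matches(allowed, requested) else "does NOT match"
    print(f"{len(allowed)} CPU(s) {status} the {requested} requested slot(s)")
```

For example, submitting "bsub -q mpitest -m bullet0019 -n 3 -R 'affinity[core:membind=localprefer]' python3 check_affinity.py 3" would be expected to report three allowed CPUs on a cgroup-enabled host without hyperthreading, whereas the same script run without the affinity request would typically list every CPU on the machine.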

...