Introduction

We want to improve the robustness and reliability of the batch compute environment by applying more rigid resource controls. By running jobs in a "sandbox", they are protected from each other and cannot consume all of the resources on a machine. LSF version 9.1.2 makes use of linux Control Groups (AKA cgroups) to limit the CPU cores and memory that a job can use. The LSF cgroup-based resource restrictions are currently not in our production configuration. We want to understand the potential impact to users and get feedback from stakeholders. I have outlined some examples below using our test cluster.

 

not been deployed in productionWe are not enabled these features in the production environment