Introduction
We want to improve the robustness and reliability of the batch compute environment by applying stricter resource controls. By running jobs in a "sandbox", we protect them from one another and prevent any single job from consuming all of the resources on a machine. LSF version 9.1.2 makes use of Linux control groups (cgroups) to limit the CPU cores and memory that a job can use. These cgroup resource restrictions are not currently part of our production LSF configuration. We want to understand the potential impact on users and get feedback from stakeholders, so I have outlined some examples below using our test cluster.
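For reference, cgroup enforcement in LSF 9.1.2 is controlled from lsf.conf. The sketch below shows the kind of setting involved; the parameter name (LSB_RESOURCE_ENFORCE) is taken from the LSF documentation rather than from our current configuration, so it should be confirmed before anything is changed in production:
----- example lsf.conf setting (not in our production config) -----------
# Ask LSF to enforce per-job CPU and memory limits using Linux cgroups
LSB_RESOURCE_ENFORCE="cpu memory"
----- end of example lsf.conf setting -----------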
CPU core affinity
Take a look at this simple script named mploadtest.csh. It explicitly forks several CPU-bound tasks. In practice, many users submit "single-slot" jobs that behave in a similar manner or call API functions that spawn multiple child threads or processes:
-----start of mploadtest.csh -----------
#!/bin/csh
# Spawn three CPU-bound dd/bzip2 pipelines in the background
dd if=/dev/urandom bs=1M count=80 | bzip2 -9 >> /dev/null &
dd if=/dev/urandom bs=1M count=80 | bzip2 -9 >> /dev/null &
dd if=/dev/urandom bs=1M count=80 | bzip2 -9 >> /dev/null &
# Run a slightly larger fourth pipeline in the foreground
dd if=/dev/urandom bs=1M count=100 | bzip2 -9 >> /dev/null
# Wait for the background pipelines in case the foreground one finishes first
wait
-----end of mploadtest.csh -----------
Run the script as a single-slot job on a specific idle host, for example: "bsub -q mpitest -m bullet0019 ./mploadtest.csh". Open a terminal session on the chosen host and observe the load across the CPU cores. You can do this by running "top" in interactive mode and pressing "1" for the per-core load view. You'll notice the child processes associated with your running job are distributed across several cores, even though the job is "single-slot".
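If you prefer a non-interactive check, standard ps options will also show which core each process last ran on (the PSR column); this is plain Linux tooling, not an LSF feature:
----- checking CPU placement with ps -----------
# List the job's bzip2 processes and the core (PSR) each one is currently on
ps -eo pid,psr,pcpu,comm | grep bzip2
----- end of ps example -----------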
Wait for the job to finish, then resubmit to the same host, this time requesting CPU affinity: "bsub -q mpitest -m bullet0019 -R 'affinity[core:membind=localprefer]' ./mploadtest.csh". Observe this job again using the per-core load view in top. This time you should see that all of the load is associated with a single core. The number of assigned cores matches the number of job slots, so submitting the job with "-n 3" will result in the job using 3 CPU cores.
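To confirm the binding directly rather than inferring it from the load display, you can inspect the allowed-CPU mask of one of the job's processes. The <pid> below is a placeholder for whichever bzip2 process belongs to your job:
----- checking the CPU binding of a job process -----------
# Show the list of cores the process is allowed to run on
taskset -cp <pid>
grep Cpus_allowed_list /proc/<pid>/status
----- end of binding check -----------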
Memory limit enforcement
This C program, named memeater, consumes physical memory over a fairly short time period. It allocates and references 100 MB every 2 seconds for 10 iterations, for a total of roughly 1 GB:
------ start of memeater.c -------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* allocate memory in 100MB chunks */
#define HUNDRED_MB 104857600

int main(void)
{
    int x;
    for (x = 1; x <= 10; x++) {
        char *myptr = malloc(HUNDRED_MB);
        int y;
        if (myptr == NULL) {
            fprintf(stderr, "malloc failed at iteration %d\n", x);
            return 1;
        }
        /* Dereference every byte so the pages are backed by physical RAM */
        for (y = 0; y < HUNDRED_MB; y++) {
            myptr[y] = 0;
        }
        /* The memory is intentionally never freed, so resident size keeps growing */
        printf("%d MBs allocated\n", x * HUNDRED_MB / 1048576);
        sleep(2);
    }
    return 0;
}
------ end of memeater.c -------------
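One way to exercise memory limit enforcement on the test cluster is to compile memeater and submit it with a memory limit below what it will allocate. The sketch below assumes bsub -M takes the limit in KB (the default unit unless LSF_UNIT_FOR_LIMITS is set), so 512000 is intended to mean roughly 500 MB; with cgroup memory enforcement enabled, the expectation is that the job is terminated once it exceeds the limit instead of pushing the host into swap:
------ compiling and submitting memeater with a memory limit -------------
gcc -o memeater memeater.c
# Request a ~500MB limit; memeater will try to touch 1GB and should be killed
bsub -q mpitest -m bullet0019 -M 512000 ./memeater
------ end of memory limit example -------------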