
Introduction

We want to improve the robustness and reliability of the batch compute environment by applying stricter resource controls. By running jobs in a "sandbox", they are protected from each other and cannot consume all of the resources on a machine. LSF version 9.1.2 uses Linux control groups (cgroups) to limit the CPU cores and memory a job can use. These cgroup resource restrictions are not currently in our production LSF configuration. We want to understand the potential impact on users and get feedback from stakeholders. I have outlined some examples below using our test cluster.

CPU core affinity

Take a look at this simple script, mploadtest.csh. It explicitly forks several CPU-bound tasks. In practice, many users submit "single-slot" jobs that behave in a similar manner or call API functions that spawn multiple child threads or processes:

-----start of mploadtest.csh -----------
#!/bin/csh
# Fork three CPU-bound compression pipelines in the background,
# then run a fourth in the foreground so the script waits on it.
dd if=/dev/urandom bs=1M count=80 | bzip2 -9 > /dev/null &
dd if=/dev/urandom bs=1M count=80 | bzip2 -9 > /dev/null &
dd if=/dev/urandom bs=1M count=80 | bzip2 -9 > /dev/null &
dd if=/dev/urandom bs=1M count=100 | bzip2 -9 > /dev/null
-----end of mploadtest.csh -----------

Run the script as a single-slot job on a specific idle host, for example: "bsub -q mpitest -m bullet0019 ./mploadtest.csh". Open a terminal session on the chosen host and observe the load across the CPU cores. You can do this by running "top" in interactive mode and pressing "1" for the per-core load view. You'll notice the child processes associated with your running job are distributed across several cores, even though the job is "single-slot".

Wait for the job to finish, then resubmit to the same host, this time requesting CPU affinity: "bsub -q mpitest -m bullet0019 -R 'affinity[core:membind=localprefer]' ./mploadtest.csh". Observe the job again with top's per-core load view. This time you should see all of the load on a single core. The number of assigned cores matches the number of job slots, so submitting the job with "-n 3" will result in the job using 3 CPU cores.

Memory limit enforcement

The C program below, memeater, consumes physical memory over a fairly short period: it allocates and touches 100MB every 2 seconds for 10 iterations, 1GB in total:
------ start of memeater.c -------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* for sleep() */

/* allocate memory in 100MB chunks */
#define HUNDRED_MB 104857600

int main(void){
    int x;
    for( x = 1; x <= 10; x++){
        char *myptr = malloc(HUNDRED_MB);
        if( myptr == NULL ){
            perror("malloc");
            return 1;
        }
        int y;
        /* Touch every byte so the pages are backed by physical RAM */
        for(y = 0; y < HUNDRED_MB; y++){
            myptr[y] = 0;
        }
        /* Intentionally never freed: we want resident memory to grow */
        printf("%d MBs allocated\n", x * HUNDRED_MB / 1048576);
        sleep(2);
    }
    return 0;
}
------ end of memeater.c -------------
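To see the limit enforced, compile memeater and submit it with a memory limit below the 1GB it will try to use. The submission below is a sketch against the same test queue and host as above; note that the unit of "bsub -M" depends on the cluster's LSF_UNIT_FOR_LIMITS setting (the traditional default is KB), so check your configuration before relying on the value:

```shell
# Compile the test program
gcc -o memeater memeater.c

# Submit with a 500MB memory limit (assumes LSF_UNIT_FOR_LIMITS=MB;
# if your cluster uses the KB default, use -M 500000 instead)
bsub -q mpitest -m bullet0019 -M 500 ./memeater
```

With cgroup memory enforcement enabled, the job's output should stop partway through the "MBs allocated" messages and the job should be killed once it exceeds the limit, rather than pushing the host into swap.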
