System Information

Compiling GPAW on RHEL5 x86_64 for the Intel Xeon 5650, using the Intel compilers and mkl. At SLAC this improves the 8-core performance benchmark by 12% compared to the opencc/ACML approach.

Software versions:

python: 2.4
gpaw: 0.8.0.7419 
ase: 3.5.0.1919
numpy: 1.4.1
openmpi: 1.4.3
intel compilers: 11.1 (includes mkl 10.2, I believe)

openmpi

./configure --prefix=/nfs/slac/g/suncatfs/sw/gpawv15/install CC=icc CXX=icpc F77=ifort FC=ifort
make
make install

numpy, ase

Build in the usual fashion. At the moment we use the default GNU compilers for numpy, because the gpaw performance benchmark drops by 3% when numpy is built with icc/mkl/dotblas, for reasons that are not understood. Some gpaw self-tests also start to fail with that build.

gpaw

Set the following "unusual" environment variables. I believe the first one may not be necessary, since we link to libmkl_sequential, but I'm not certain.

setenv OMP_NUM_THREADS 1
setenv LD_PRELOAD libmkl_core.so:libmkl_sequential.so

Relevant lines from customize.py:

scalapack = False
compiler = 'icc'
libraries = ['mkl_intel_lp64', 'mkl_sequential', 'mkl_cdft_core', 'mkl_core', 'pthread', 'm']
library_dirs = ['/nfs/slac/g/suncatfs/sw/external/intel11.1/openmpi/1.4.3/install/lib','/afs/slac/package/intel_tools/compiler11.1/mkl/lib/em64t/']

include_dirs += ['/nfs/slac/g/suncatfs/sw/external/numpy/1.4.1/install/lib64/python2.4/site-packages/numpy/core/include']
extra_link_args += ['-fPIC']

extra_compile_args = ['-I/afs/slac/package/intel_tools/compiler11.1/mkl/include','-xHOST','-O1','-ipo','-no-prec-div','-static','-std=c99','-fPIC']

define_macros = [('GPAW_NO_UNDERSCORE_CBLACS', '1'), ('GPAW_NO_UNDERSCORE_CSCALAPACK', '1')]

mpicompiler = 'mpicc'
mpilinker = mpicompiler

Notes on customize.py:

  • -O1 is used above because -O2 and -O3 appear to break a number of gpaw self-tests. The lower optimization level didn't appear to have any impact on performance.
  • scalapack is off above. It may work with mkl scalapack; I just haven't tried it.
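
Before building, it can help to confirm that the names in libraries actually resolve to files in library_dirs. The following is a minimal sketch, not part of GPAW or distutils: find_missing_libraries is a hypothetical helper, and it is written for a current Python rather than the 2.4 interpreter listed above.

```python
import os


def find_missing_libraries(libraries, library_dirs, suffixes=(".so", ".a")):
    """Return names in `libraries` with no lib<name><suffix> file in `library_dirs`.

    Hypothetical helper: it only checks for file existence and does not
    try to reproduce the linker's full search rules.
    """
    missing = []
    for name in libraries:
        candidates = [
            os.path.join(directory, "lib" + name + suffix)
            for directory in library_dirs
            for suffix in suffixes
        ]
        if not any(os.path.exists(path) for path in candidates):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Values copied from the customize.py excerpt above.
    libraries = ['mkl_intel_lp64', 'mkl_sequential', 'mkl_cdft_core',
                 'mkl_core', 'pthread', 'm']
    library_dirs = [
        '/nfs/slac/g/suncatfs/sw/external/intel11.1/openmpi/1.4.3/install/lib',
        '/afs/slac/package/intel_tools/compiler11.1/mkl/lib/em64t/',
    ]
    print(find_missing_libraries(libraries, library_dirs))
```

System libraries such as pthread and m live in the default linker directories, so they will be reported as missing unless those directories are added to the list passed in.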

Build command:

python setup.py install --home=/nfs/slac/g/suncatfs/sw/gpawv15/install --remove-default-flags

All gpaw self-tests pass except relax.py, which failed with what appears to be a precision problem:

relax.py                        12.717  FAILED!
#############################################################################
Traceback (most recent call last):
  File "/nfs/slac/g/suncatfs/sw/gpawv17/install/lib64/python/gpaw/test/__init__.py", line 358, in run_one
    execfile(filename, loc)
  File "/nfs/slac/g/suncatfs/sw/gpawv17/install/lib64/python/gpaw/test/relax.py", line 102, in ?
    equal(e2, -6.290744, energy_tolerance)
  File "/nfs/slac/g/suncatfs/sw/gpawv17/install/lib64/python/gpaw/test/__init__.py", line 25, in equal
    raise AssertionError(msg)
AssertionError: -6.29069568 != -6.290744 (error: |4.83220516e-05| > 7e-06)
#############################################################################
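
To put the size of this failure in context, here is a minimal sketch of the kind of tolerance check the traceback shows. equal_sketch is a hypothetical stand-in written from the error message alone, not the actual gpaw.test.equal source; the numbers are the ones printed above.

```python
def equal_sketch(value, reference, tolerance):
    """Raise AssertionError if |value - reference| exceeds tolerance.

    Hypothetical stand-in modelled on the message in the traceback above.
    Returns the absolute error when the check passes.
    """
    error = abs(value - reference)
    if error > tolerance:
        raise AssertionError(
            "%.9g != %.9g (error: |%.9g| > %.9g)"
            % (value, reference, error, tolerance)
        )
    return error


if __name__ == "__main__":
    try:
        # Energy from this build, reference energy, and tolerance,
        # all taken from the failure message above.
        equal_sketch(-6.29069568, -6.290744, 7e-06)
    except AssertionError as exc:
        print("FAILED: %s" % exc)
```

The discrepancy is about 4.8e-05, roughly seven times the allowed 7e-06, so the result is close to the reference but clearly outside the test's tolerance.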

mkl LD_PRELOAD Hack Discussion

We use the fairly horrific LD_PRELOAD hack to avoid run-time errors like this:

*** libmkl_mc3.so *** failed with error : /afs/slac/package/intel_tools/compiler11.1/mkl/lib/em64t/libmkl_mc3.so: undefined symbol: mkl_dft_commit_descriptor_s_c2c_md_omp
*** libmkl_def.so *** failed with error : /afs/slac/package/intel_tools/compiler11.1/mkl/lib/em64t/libmkl_def.so: undefined symbol: mkl_dft_commit_descriptor_s_c2c_md_omp
MKL FATAL ERROR: Cannot load neither libmkl_mc3.so nor libmkl_def.so

I believe it "works" for one of two reasons (I'm not certain which is relevant in our case):

  1. python uses "dlopen", which perhaps has trouble with the circular dependencies present in the mkl libraries (discussed here).
  2. to resolve the circular dependencies, mkl normally wants "-Wl,--start-group" and "-Wl,--end-group" in the link line, which isn't trivial to add with python distutils. My understanding is that these flags cause the linker to iterate through the enclosed libraries until all possible references have been satisfied. Since we don't have those flags, we get undefined symbols.

The LD_PRELOAD hack loads the relevant libraries earlier (I believe when the executable starts) so we don't get the undefined symbols.
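
One way to see which mkl libraries have actually been mapped into a running process is to read /proc/self/maps from inside Python. This is a minimal, Linux-only sketch under that assumption; mapped_libraries is a hypothetical helper and is not part of GPAW.

```python
import os


def mapped_libraries(maps_text, name_fragment):
    """Return sorted unique file paths from /proc/<pid>/maps text
    whose path contains `name_fragment`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and name_fragment in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


if __name__ == "__main__":
    maps_path = "/proc/self/maps"
    if os.path.exists(maps_path):
        with open(maps_path) as maps_file:
            for path in mapped_libraries(maps_file.read(), "libmkl"):
                print(path)
```

Run under the interpreter with LD_PRELOAD set, this should list the preloaded mkl libraries; without LD_PRELOAD, they would only be expected to appear after the gpaw extension has been imported.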

There are significant problems with the LD_PRELOAD hack: other unrelated executables will also try to preload these libraries. In particular, 32-bit (non-GPAW!) executables will be unable to load the 64-bit libraries and will complain:

ERROR: ld.so: object 'libmkl_core.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libmkl_sequential.so' from LD_PRELOAD cannot be preloaded: ignored.

A workaround ("hacking the hack") to avoid these errors is to also include the path to the 32-bit mkl libraries in LD_LIBRARY_PATH. However, Jun Yan has seen a case where setting LD_PRELOAD caused private GPAW versions to link incorrectly. This illustrates that LD_PRELOAD is fundamentally a bad idea. I think the best fix is to move to mkl 10.3, where I believe this problem has been addressed (discussed here).
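
Short of upgrading mkl, one way to keep LD_PRELOAD away from unrelated executables might be to set it only in the environment of the GPAW process instead of in the login shell. The sketch below is an untested assumption, not what we do at SLAC: preload_env and run_with_preload are hypothetical helpers, and the code targets a current Python rather than 2.4.

```python
import os
import subprocess
import sys


def preload_env(base_env, preload_libs, omp_threads=1):
    """Return a copy of `base_env` with LD_PRELOAD and OMP_NUM_THREADS set.

    The original mapping is left untouched, so the calling shell's
    environment is not affected.
    """
    env = dict(base_env)
    env["LD_PRELOAD"] = ":".join(preload_libs)
    env["OMP_NUM_THREADS"] = str(omp_threads)
    return env


def run_with_preload(command, preload_libs):
    """Run `command` (a list of arguments) with the preload environment."""
    return subprocess.call(command, env=preload_env(os.environ, preload_libs))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Same libraries as in the setenv lines earlier on this page.
        sys.exit(run_with_preload(
            sys.argv[1:], ["libmkl_core.so", "libmkl_sequential.so"]))
```

Saved under a hypothetical name such as run_gpaw.py, it would be invoked as "python run_gpaw.py gpaw-python script.py". Any process started by that command still inherits LD_PRELOAD, so this narrows the problem rather than removing it, and it would not help with the private-version linking issue.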

mkl 10.3 Update

I believe mkl 10.3 does indeed "solve" the LD_PRELOAD problem by providing a single runtime library. We also seem to get another 1% performance improvement.
