
Building libxc 1.2.0 with additional GPU-compatible functionals

  • install CUDA, if necessary (we have only tested with CUDA 4 and libxc 1.2.0)
  • get a fresh libxc-1.2.0 (e.g. tar xfz /nfs/slac/g/suncatfs/sw/package/libxc-1.2.0.tar.gz)
  • cd libxc-1.2.0
  • ./configure --prefix=`pwd`/install LIBS="-L/opt/CUDA/CUDA41/cuda/lib64 -lcudart"
  • get libxc-gpu:
    • for SLAC users: svn co svn+ssh://username@suncatls1.slac.stanford.edu/libxc-gpu/tags/1.0 libxc-gpu
    • for offsite users (readonly): svn co svn://suncatls1.slac.stanford.edu/libxc-gpu/tags/1.0 libxc-gpu
  • cd libxc-gpu
  • modify the top two directory variables (CUDADIR, XCDIR) in the Makefile
  • make setup
  • cd libxc-1.2.0/src/
  • make xc_funcs.h libxc.la
  • cd libxc-gpu
  • make
  • (optional) run xc_gpu_timing executable to compare CPU/GPU timing for several functionals
  • cd libxc-1.2.0
  • make install

For Developers: Moving libxc Functional to GPU

This requires that the functional use the common "work.c" mechanism. Some functionals (e.g. tpss_c) do not seem to do this yet.

  • if one doesn't exist, create an "init" function
  • add copy_gga (or copy_lda/copy_mgga as appropriate) to the init function
  • declare "static XC(mgga_type) p_d" and "xc_mgga_type copy_mgga(xc_mgga_type* h, size_t sizeofparams)" at the top of the .cu file
  • change "static FLOAT" to "static const __device__ FLOAT"
  • change the equation routines from "static void" to "static inline __device__ void"
  • add "extern "C" const" in front of XC(func_info_type) declaration
  • add "_gpu" to the name of the XC(func_info_type) declaration
  • change the functional number (first parameter in XC(func_info_type)); we typically append _GPU to the end of the macro name
  • change the #define for the functional number at the top of the file. We typically add 1000
  • save the functional as a .cu file
  • change "#include "work_mgga_x.c"" to "#include "work_mgga_x.cuh""
  • some functionals call external functions on the GPU, such as XC(rho2dzeta); you may need to cobble together declaration header files for these
  • if a functional has a "params" struct, you must add an "int" as the first member of that struct and set it to sizeof(struct); this is used to copy the params struct to the GPU

For Developers: Moving General "Work" routine to GPU

This needs to be done once for each of the common "work.c" files, i.e. LDA/GGA/MGGA exchange and correlation (six in total).

  • copy the "static void work_mgga_x" declaration (including arguments) to the bottom of the file; this will become the CPU wrapper for the GPU kernel
  • in this new "wrapper" routine
    • change "static void" declaration to "extern "C" static void" declaration
    • define block_size and n_blocks (based on number of points)
    • call the gpuwork kernel
    • pass p_d as the first argument
  • in the "kernel" routine
    • change the kernel declaration from "static void work_mgga_x" to "static __global__ void gpuwork"
    • delete "ip" loop over number of points
    • add "int idx = blockIdx.x*blockDim.x + threadIdx.x"
    • add "if (idx<np)"
    • pointers are normally incremented at the bottom of the loop; move everything after "end_ip_loop:" to immediately after the "if" statement above
    • add "*idx" at the end of each pointer assignment (pointer arithmetic)
  • save work as new copy with .cuh extension
  • make sure the block size is large enough that the number of blocks does not exceed 65535 (the GPU's one-dimensional grid limit)

libxc on GPUs: Issues

  • we really need to move from CUDA 4 to CUDA 5 so we have a linker; right now all the source for a functional has to appear in one file, so we have to use ugly include files
  • One of the tricky things was copying the structs xc_lda_type, xc_gga_type, and xc_mgga_type from CPU to GPU. A picture of all the structs involved is here. We currently leak the memory for this struct (it is never deleted)
  • mgga_c in libxc 1.2.0 doesn't follow the right pattern in its "work.c" file to be ported to the GPU; we believe this is fixed in 2.0.0
  • we use nvcc for all mixed host/gpu code
  • nvcc does C++ name mangling, so extern "C" is needed in some cases
  • not sure if we can make kinetic functionals and hybrid functionals (like PBE0) work
  • using libxc gets tricky with spin-polarized calculations if spin-indices are not dense in memory (true on CPUs as well)
  • libxc is currently unable to add separate X/C contributions "in place" without using extra arrays (true on CPUs as well)

CUDA Problems

  • Unique-symbol problem: including the same .cuh in multiple files means we can't set breakpoints. Need CUDA 5 for a linker.
  • Slow startup (a ~30 second wait), fixed by enabling persistence mode with "nvidia-smi -pm 1"
  • sometimes we can't step into a function (possibly when it is defined in a file included from a ".cuh", i.e. a multi-layer include):

(cuda-gdb)
200	/tmp/tmpxft_00001dc8_00000000-7_gga_c_pbe.cpp3.i: No such file or directory.
	in /tmp/tmpxft_00001dc8_00000000-7_gga_c_pbe.cpp3.i

(cuda-gdb)
func (xs=0xdddddddddddddddd, p=warning: Variable is not live at this point. Value is undetermined.
0x0, order=warning: Variable is not live at this point. Value is undetermined.
0, rs=warning: Variable is not live at this point. Value is undetermined.
0, zeta=-1.4568159901474629e+144, xt=warning: Variable is not live at this point. Value is undetermined.
0, f=warning: Variable is not live at this point. Value is undetermined.
0x0, dfdrs=0xdddddddddddddddd, dfdz=0xdddddddddddddddd, dfdxt=0xdddddddddddddddd, dfdxs=warning: Variable is not live at this point. Value is undetermined.
0x0, d2fdrs2=0xdddddddddddddddd, d2fdrsz=0xdddddddddddddddd, d2fdrsxt=0xdddddddddddddddd, d2fdrsxs=0xdddddddddddddddd, d2fdz2=0xdddddddddddddddd, d2fdzxt=0xdddddddddddddddd, d2fdzxs=0xdddddddddddddddd, d2fdxt2=0xdddddddddddddddd, d2fdxtxs=0xdddddddddddddddd, d2fdxs2=0xdddddddd) at gga_c_pbe.cu:271
271	}