You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 38 Next »

Recipe for Building libxc.a with GPU-compatible functionals

  • intall CUDA, if necessary
  • get a fresh libxc-1.2.0 (e.g. tar xfz /nfs/slac/g/suncatfs/sw/package/libxc-1.2.0.tar.gz)
  • cd libxc-1.2.0
  • ./configure --prefix=`pwd`/install LIBS="-L/opt/CUDA/CUDA41/cuda/lib64 -lcudart"
  • get libxc-gpu:
    • for SLAC users: svn co svn+ssh://username@suncatls1.slac.stanford.edu/libxc-gpu/trunk libxc-gpu
    • for offsite users (readonly): svn co svn://suncatls1.slac.stanford.edu/libxc-gpu/trunk libxc-gpu
  • cd libxc-gpu
  • modify top two directory names (CUDADIR, XCDIR) in Makefile
  • make setup
  • cd libxc-1.2.0/src/
  • make xc_funcs.h libxc.la
  • cd libxc-gpu
  • make
  • cd libxc-1.2.0
  • make install

Recipe for Moving libxc Functional to GPU

This requires that the functional use the common "work.c" mechanism. Some functionals (e.g. tpss_c seem to not do this yet).

  • if one doesn't exist, create an "init" function
  • add copy_gga (or copy_lda/copy_mgga as appropriate) to the init function
  • declare "static XC(mgga_type) p_d" and "xc_mgga_type copy_mgga(xc_mggatype* h, size_t sizeofparams)" at the top of the .cu file
  • change "static FLOAT" to "static const _device_ FLOAT"
  • change routines for equation routines from "static void" to "static inline _device_ void"
  • add "extern "C" const" in front of XC(func_info_type) declaration
  • add "_gpu" to the name of the XC(func_info_type) declaration
  • change the functional number (first parameter in XC(func_info_type)). We typically add _GPU to the end
  • change the #define for the functional number at the top of the file. We typically add 1000
  • save the functional as a .cu file
  • changed "#include "work_mgga_x.c"" to "#include "work_mgga_x.cuh""
  • some functionals will call external functions on the GPU like XC(rho2dzeta). May need to cobble together declaration header files for these.
  • if a functional has a "params" struct, you must add an "int" first member of that struct, which must be the sizeof(struct). this is used to copy the params struct to the GPU.

Recipe for Moving General "Work" routine to GPU

This need to be done once for LDA/GGA/MGGA x/c for the common "work.c" file (total of 6 of these)

  • copy "static void work_mgga_x" declaration (including arguments) to the bottom. this will become cpu wrapper for the gpu kernel.
  • in this new "wrapper" routine
    • change "static void" declaration to "extern "C" static void" declaration
    • define block_size and n_blocks (based on number of points)
    • call the gpuwork kernel
    • pass p_d as the first argument
  • in the "kernel" routine
    • change kernel "static void work_mgga_x" declaration to "static void _global_ gpuwork"
    • delete "ip" loop over number of points
    • add "int idx = blockIdx.x*blockDim.x + threadIdx.x"
    • add "if (idx<np)"
    • pointers are normally incremented at the bottom. move all the stuff after "end_ip_loop:" immediately after the above "if" statement
    • add "*idx" at the end of each pointer assignment (pointer arithmetic)
  • save work as new copy with .cuh extension
  • make sure to set block size large enough so that number of blocks does not exceed 65536 (GPU limit)

Porting libxc: Lessons Learned

  • we really need to move from CUDA4 to CUDA5 so we have a linker. right now all the source for a functional has to show up in one file, so we have to use ugly include files.
  • One of the tricky things was to copy the "p" struct (like xc_gga_type) from CPU to GPU. A picture of all the structs involved is here.
  • use nvcc for all mixed host/gpu code
  • nvcc does C++ mangling. need extern "C" in some cases.

Things that need to be dealt with:

  • can't call external _device_ function with nvcc-compiled code: no CUDA4 linker (workaround: use .cuh files with statically declared functions)
  • what to do about k functionals? (multiple includes of work_gga_x.c)
  • kludged local "static/global" variables (statics inside kernel don't work?)
  • stride problem for spin indices
  • memset in gga.c etc.
  • work_mgga_c.c sets policy for particular func_aux? also calls func_c_parallel which is part of mgga_xc_vsxc.c
  • figure out how to get nested param-size (will change "p" struct for this, in general it would be a function to deep-copy params)
  • figure out how to get p_d into the functional (should probably change "p" struct for this)
  • kinetic functionals
  • understand PBE instruction replays and constants-memory
  • think about deletion of p memory (we leak that now)

CUDA Problems

  • Unique symbol problem, including the same .cuh in multiple files can't set breakpoint. Need cuda5 for linkers.
  • Slow startup, 30 sec wait, fixed with "nvidia-smi -pm 1"
  • sometimes, we can't step into a function (if it is a file included in a ".cuh"? (multi-layer include))
(cuda-gdb)
200	/tmp/tmpxft_00001dc8_00000000-7_gga_c_pbe.cpp3.i: No such file or directory.
	in /tmp/tmpxft_00001dc8_00000000-7_gga_c_pbe.cpp3.i



(cuda-gdb)
func (xs=0xdddddddddddddddd, p=warning: Variable is not live at this point. Value is undetermined.
0x0, order=warning: Variable is not live at this point. Value is undetermined.
0, rs=warning: Variable is not live at this point. Value is undetermined.
0, zeta=-1.4568159901474629e+144, xt=warning: Variable is not live at this point. Value is undetermined.
0, f=warning: Variable is not live at this point. Value is undetermined.
0x0, dfdrs=0xdddddddddddddddd, dfdz=0xdddddddddddddddd, dfdxt=0xdddddddddddddddd, dfdxs=warning: Variable is not live at this point. Value is undetermined.
0x0, d2fdrs2=0xdddddddddddddddd, d2fdrsz=0xdddddddddddddddd, d2fdrsxt=0xdddddddddddddddd, d2fdrsxs=0xdddddddddddddddd, d2fdz2=0xdddddddddddddddd, d2fdzxt=0xdddddddddddddddd, d2fdzxs=0xdddddddddddddddd, d2fdxt2=0xdddddddddddddddd, d2fdxtxs=0xdddddddddddddddd, d2fdxs2=0xdddddddd) at gga_c_pbe.cu:271
271	}
  • No labels