big steps:

  1. standalone main calls the "work" kernel with GPU pointers (completed)
  2. standalone main calls the exc_vxc interface (RPBE only) with GPU pointers
  3. GPAW CPU version uses exc_vxc on the GPU (we give exc_vxc GPU pointers)
  4. GPAW GPU version uses exc_vxc on the GPU

plan for step(2):

  • starting to work in libxc source
  • start with work_gga_x.c: make work_gga_x a "shell" that calls the kernel
  • try using nvcc for everything

questions:

  • will we run out of memory as we put more data on the GPU?
  • can gga.c call a "kernel pointer" or does work_gga_x become a "shell" that calls kernel?

to make an XC(gga_type) pointer "p" on the device:

  • need the size of params
  • swap out the info/params pointers for device pointers
  • GPU initialization of p happens at func_init time
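
The three bullets above can be sketched as a "staged" deep copy: upload what the pointers refer to, swap the pointers in a host-side copy of the struct, then upload that copy. This is only a sketch under assumptions. demo_info, demo_gga, the params_size field and the dev_* helpers are hypothetical stand-ins, not the real libxc types. The file compiles as plain C (where "device" memory is just malloc) and under nvcc (where the helpers call cudaMalloc/cudaMemcpy).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

/* Hypothetical stand-ins for the libxc structs (the real ones have more fields). */
typedef struct { int number; int family; } demo_info;
typedef struct {
  const demo_info *info;  /* host pointer: must be swapped for a device pointer */
  void *params;           /* functional-specific block, opaque to generic code */
  size_t params_size;     /* assumed extra field: byte size of *params */
} demo_gga;

/* Allocate n bytes of "device" memory (plain malloc when not built by nvcc). */
static void *dev_alloc(size_t n) {
#ifdef __CUDACC__
  void *d = NULL;
  return cudaMalloc(&d, n) == cudaSuccess ? d : NULL;
#else
  return malloc(n);
#endif
}

/* Copy n bytes host -> "device"; returns 0 on success. */
static int dev_upload(void *dst, const void *src, size_t n) {
#ifdef __CUDACC__
  return cudaMemcpy(dst, src, n, cudaMemcpyHostToDevice) == cudaSuccess ? 0 : -1;
#else
  memcpy(dst, src, n);
  return 0;
#endif
}

/* Build a device-side copy of p; returns NULL on failure (cleanup omitted). */
demo_gga *demo_gga_to_device(const demo_gga *p) {
  assert(p != NULL && p->info != NULL);
  demo_gga staged = *p;  /* host staging copy whose pointers get swapped */

  /* 1. upload the info record and point the staged copy at it */
  demo_info *d_info = (demo_info *)dev_alloc(sizeof *d_info);
  if (d_info == NULL || dev_upload(d_info, p->info, sizeof *d_info) != 0) return NULL;
  staged.info = d_info;

  /* 2. upload params; this is where the size of params is needed */
  staged.params = NULL;
  if (p->params != NULL && p->params_size > 0) {
    void *d_params = dev_alloc(p->params_size);
    if (d_params == NULL || dev_upload(d_params, p->params, p->params_size) != 0) return NULL;
    staged.params = d_params;
  }

  /* 3. upload the struct itself, now holding only device pointers */
  demo_gga *d_p = (demo_gga *)dev_alloc(sizeof *d_p);
  if (d_p == NULL || dev_upload(d_p, &staged, sizeof staged) != 0) return NULL;
  return d_p;
}
```

A call such as demo_gga_to_device(&p) would sit in the func_init path. Under nvcc the returned pointer is a device address, so it can be passed to a kernel but not dereferenced on the host.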

to make "work" functions into a kernel:

  • need a __global__ qualifier on the work function
  • need a __device__ qualifier on the rpbe functions it calls
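
A minimal sketch of where the two qualifiers go, assuming hypothetical names (XC_DEVICE, XC_GLOBAL, demo_enhance, demo_work). The macros expand to nothing in a plain C build, so the same file still compiles with gcc. The per-point body uses the PBE-style rational form 1 + kappa - kappa/(1 + mu s^2/kappa) purely as a stand-in to keep the sketch free of libm; it is not the RPBE expression, which uses an exponential.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __CUDACC__
#define XC_DEVICE __device__   /* callable from kernels only */
#define XC_GLOBAL __global__   /* kernel entry point, launched from the host */
#else
#define XC_DEVICE
#define XC_GLOBAL
#endif

/* Per-point functional code: this is what gets __device__.
   Stand-in formula (PBE-style), not the RPBE expression. */
XC_DEVICE double demo_enhance(double s) {
  const double kappa = 0.8040, mu = 0.2195149727645171;
  return 1.0 + kappa - kappa / (1.0 + mu * s * s / kappa);
}

/* The "work" routine: this is what gets __global__.
   One grid point per thread under nvcc, a serial loop otherwise. */
XC_GLOBAL void demo_work(int np, const double *s, double *f) {
#ifdef __CUDACC__
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < np) f[i] = demo_enhance(s[i]);
#else
  assert(s != NULL && f != NULL);
  for (int i = 0; i < np; i++) f[i] = demo_enhance(s[i]);
#endif
}
```

In the plain C build demo_work(np, s, f) is an ordinary call. Under nvcc it would be launched as demo_work<<<blocks, threads>>>(np, d_s, d_f) with device pointers, which is the role the "shell" plays.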

Porting libxc: Lessons Learned

  • use nvcc for all mixed host/gpu code
  • originally had problems linking with nvcc (possibly caused by "-x cu"?), so we linked with gcc instead. later nvcc linked OK for the top-level executable.
  • nvcc compiles .cu files as C++, so symbol names get mangled. need extern "C" wherever C code has to link against them.
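
A sketch of the usual extern "C" guard, with hypothetical names (demo_func_info, demo_info_rpbe, demo_number_from_name). The guard is visible only to C++ compilers, so the file stays valid C. One thing that may matter for the "info" structs: in C++ a const object at file scope has internal linkage unless it is declared extern, so a table defined in a .cu file can disappear from the link without such a declaration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature of a functional "info" record. */
typedef struct { int number; const char *name; } demo_func_info;

#ifdef __cplusplus
extern "C" {   /* seen only by C++ compilers, e.g. nvcc on .cu files */
#endif

extern const demo_func_info demo_info_rpbe;    /* unmangled, external linkage */
int demo_number_from_name(const char *name);   /* unmangled function symbol */

#ifdef __cplusplus
}
#endif

/* The definitions pick up C linkage from the declarations above. */
const demo_func_info demo_info_rpbe = { 1, "demo_rpbe" };

int demo_number_from_name(const char *name) {
  assert(name != NULL);
  return strcmp(name, demo_info_rpbe.name) == 0 ? demo_info_rpbe.number : -1;
}
```

In a real tree the guarded declarations would live in a shared header, so both the gcc-compiled and the nvcc-compiled files agree on the symbol names.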

Things we need to deal with:

  • can't call an external __device__ function (one defined in another file) from nvcc-compiled code?
  • what to do about k functionals? (multiple includes of work_gga_x.c)
  • kludged local "static/global" variables (static variables inside a kernel don't work?)
  • make the copying of "p" beautiful (size of params problem)
  • GPU calling back to host code (and potentially vice versa)
  • p sometimes needs to contain GPU function pointers and perhaps associated data (e.g. gga calling lda)
  • stride problem for spin indices

Process for RPBE:

  • used nvcc for everything (./configure CC=nvcc CFLAGS="-arch=sm_20")
  • renamed gga_x_rpbe.c to gga_x_rpbe.cu, also in src/Makefile
  • added __device__ in gga_x_rpbe.cu, and extern "C" to the "info" struct
  • included work_gga_x.cu in gga_x_rpbe.cu, with __global__ on the work function
  • removed the memset in gga.c