big steps:
- standalone main calls the "work" kernel with GPU pointers (completed)
- standalone main calls exc_vxc (RPBE only) interface with GPU pointers
- gpaw CPU version uses exc_vxc on GPU (we give exc_vxc GPU pointers)
- gpaw GPU version uses exc_vxc on GPU
plan for step 2 (exc_vxc interface):
- start working in the libxc source
- start with work_gga_x.c. make work_gga_x a "shell".
- try using nvcc for everything
questions:
- will we run out of GPU memory as we put more data on the GPU?
- can gga.c call a "kernel pointer", or does work_gga_x become a "shell" that calls the kernel?
to make an XC(gga_type) pointer "p" on the device:
- need the size of params
- swap out the info/params pointers for device pointers
- GPU initialization of p happens at func_init time
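A minimal sketch of the pointer swap, assuming simplified stand-in structs. toy_info_type, toy_gga_type and all toy_* functions are hypothetical, not libxc names, and the real structs have more fields. The caller has to pass params_size because the struct itself does not record it. Error paths leak memory; this is a sketch only.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libxc structs. */
typedef struct { int number; } toy_info_type;
typedef struct {
  const toy_info_type *info;  /* host pointer in the host copy */
  void *params;               /* functional-specific; size is not stored here */
  int nspin;
} toy_gga_type;

/* Return a device copy of *p_host whose info/params point to device memory.
   Returns NULL on any CUDA error. */
toy_gga_type *toy_gga_to_device(const toy_gga_type *p_host, size_t params_size)
{
  toy_gga_type staged = *p_host;  /* host staging copy; its pointers get swapped */
  toy_info_type *d_info = NULL;
  void *d_params = NULL;
  toy_gga_type *d_p = NULL;

  if (cudaMalloc((void **)&d_info, sizeof *d_info) != cudaSuccess) return NULL;
  if (cudaMemcpy(d_info, p_host->info, sizeof *d_info,
                 cudaMemcpyHostToDevice) != cudaSuccess) return NULL;
  staged.info = d_info;

  if (p_host->params != NULL && params_size > 0) {
    if (cudaMalloc(&d_params, params_size) != cudaSuccess) return NULL;
    if (cudaMemcpy(d_params, p_host->params, params_size,
                   cudaMemcpyHostToDevice) != cudaSuccess) return NULL;
  }
  staged.params = d_params;

  /* Copy the struct itself last, once it holds only device pointers. */
  if (cudaMalloc((void **)&d_p, sizeof *d_p) != cudaSuccess) return NULL;
  if (cudaMemcpy(d_p, &staged, sizeof staged,
                 cudaMemcpyHostToDevice) != cudaSuccess) return NULL;
  return d_p;
}

/* Kernel used only to check that the device copy can be dereferenced on the GPU.
   Assumes params holds at least one double. */
__global__ void toy_gga_peek(const toy_gga_type *p, double *out)
{
  out[0] = (double)p->info->number;
  out[1] = ((const double *)p->params)[0];
}

/* Host helper: copy p to the device and read two fields back via the kernel. */
int toy_gga_roundtrip(const toy_gga_type *p_host, size_t params_size, double out[2])
{
  toy_gga_type *d_p = toy_gga_to_device(p_host, params_size);
  double *d_out = NULL;
  if (d_p == NULL) return -1;
  if (cudaMalloc((void **)&d_out, 2 * sizeof(double)) != cudaSuccess) return -1;
  toy_gga_peek<<<1, 1>>>(d_p, d_out);
  if (cudaMemcpy(out, d_out, 2 * sizeof(double),
                 cudaMemcpyDeviceToHost) != cudaSuccess) return -1;
  cudaFree(d_out);
  return 0;
}
```

The struct is copied last so that the device never sees a copy that still holds host pointers.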
to make "work" functions into a kernel:
- need a __global__ qualifier on the work function
- need a __device__ qualifier on the rpbe function
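A sketch of how the two qualifiers fit together, assuming a toy per-point function in place of the real rpbe code. All toy_* names are hypothetical, and the constants and functional form are assumed for illustration only. The host "shell" takes device pointers and does nothing but launch the kernel.

```cuda
#include <cuda_runtime.h>
#include <math.h>

/* Toy per-point function standing in for the rpbe routine.
   __device__: callable only from code running on the GPU. */
__device__ double toy_enhance(double s)
{
  const double kappa = 0.804, mu = 0.2195149727645171;  /* assumed values */
  return 1.0 + kappa * (1.0 - exp(-mu * s * s / kappa));
}

/* "work" kernel: one thread per grid point replaces the host for-loop.
   __global__: launched from the host, runs on the GPU. */
__global__ void toy_work(int np, const double *s, double *f)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < np) f[i] = toy_enhance(s[i]);
}

/* Host "shell": receives device pointers and only launches the kernel. */
int toy_work_shell(int np, const double *d_s, double *d_f)
{
  const int threads = 128;
  const int blocks = (np + threads - 1) / threads;
  toy_work<<<blocks, threads>>>(np, d_s, d_f);
  return cudaDeviceSynchronize() == cudaSuccess ? 0 : -1;
}

/* Convenience wrapper for host arrays, used only to try the sketch out. */
int toy_work_host(int np, const double *s, double *f)
{
  double *d_s = NULL, *d_f = NULL;
  size_t bytes = (size_t)np * sizeof(double);
  int rc = -1;
  if (cudaMalloc((void **)&d_s, bytes) == cudaSuccess &&
      cudaMalloc((void **)&d_f, bytes) == cudaSuccess &&
      cudaMemcpy(d_s, s, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
      toy_work_shell(np, d_s, d_f) == 0 &&
      cudaMemcpy(f, d_f, bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
    rc = 0;
  cudaFree(d_s);
  cudaFree(d_f);
  return rc;
}
```

Here the __device__ function sits in the same source file as the kernel that calls it.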
Porting libxc: Lessons Learned
- use nvcc for all mixed host/gpu code
- need to link using gcc (nvcc can't seem to link)
- nvcc compiles .cu files as C++, so symbol names get mangled. need extern "C" in some cases.
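A sketch of the extern "C" case, assuming a hypothetical file toy_add.cu whose function is called from plain C code. The file name, the function names, and the library path in the build comment are assumptions.

```cuda
/* toy_add.cu: compiled by nvcc, which treats .cu files as C++. */
#include <cuda_runtime.h>

__global__ void toy_add_kernel(int a, int b, int *out)
{
  *out = a + b;
}

/* Without extern "C" this symbol would get a mangled C++ name, and a C caller
   declaring `int toy_add_on_gpu(int, int);` would fail to link against it.
   Returns 0 if the device allocation fails (sketch-level error handling). */
extern "C" int toy_add_on_gpu(int a, int b)
{
  int result = 0;
  int *d_out = NULL;
  if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess) return 0;
  toy_add_kernel<<<1, 1>>>(a, b, d_out);
  cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  return result;
}

/* Possible build, assuming a main.c that calls toy_add_on_gpu and a CUDA
   install under $CUDA_HOME (the path is an assumption):
     nvcc -c toy_add.cu -o toy_add.o
     gcc main.c toy_add.o -L$CUDA_HOME/lib64 -lcudart -lstdc++ -o toy_main
*/
```

When gcc does the final link, the CUDA runtime library has to be named explicitly, and the C++ runtime may be needed as well.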
Things we need to deal with:
- can't call an external __device__ function from nvcc-compiled code?
- what to do about k functionals? (multiple includes of work_gga_x.c)
- kludged local "static/global" variables
- make the copying of "p" beautiful (size of params problem)
- stride problem for spin indices
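One possible reading of the stride item, as a sketch: if spin components are interleaved per grid point and the CPU loop steps a running pointer from point to point, each GPU thread instead has to compute its own offset from its point index. The layout rho[i*nspin + is] and all toy_* names are assumptions for illustration.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

/* Sum the spin components at each grid point.
   Assumed layout: rho[i*nspin + is] is spin channel `is` at point i. */
__global__ void toy_total_density(int np, int nspin, const double *rho, double *total)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= np) return;
  const double *rho_i = rho + (size_t)i * nspin;  /* offset from index, no running pointer */
  double sum = 0.0;
  for (int is = 0; is < nspin; ++is) sum += rho_i[is];
  total[i] = sum;
}

/* Convenience wrapper for host arrays, used only to try the sketch out. */
int toy_total_density_host(int np, int nspin, const double *rho, double *total)
{
  double *d_rho = NULL, *d_total = NULL;
  size_t in_bytes = (size_t)np * nspin * sizeof(double);
  size_t out_bytes = (size_t)np * sizeof(double);
  int rc = -1;
  if (cudaMalloc((void **)&d_rho, in_bytes) == cudaSuccess &&
      cudaMalloc((void **)&d_total, out_bytes) == cudaSuccess &&
      cudaMemcpy(d_rho, rho, in_bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
    toy_total_density<<<(np + 127) / 128, 128>>>(np, nspin, d_rho, d_total);
    if (cudaMemcpy(total, d_total, out_bytes,
                   cudaMemcpyDeviceToHost) == cudaSuccess)
      rc = 0;
  }
  cudaFree(d_rho);
  cudaFree(d_total);
  return rc;
}
```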
Process for RPBE:
- use nvcc for everything (./configure CC=nvcc CFLAGS="-arch=sm_20")
- rename gga_x_rpbe.c to .cu, also in src/Makefile
- added __device__ to gga_x_rpbe.cu, and extern "C" to the "info" struct
- included work_gga_x.cu in gga_x_rpbe.cu with __global__
- removed the memset in gga.c