The work below describes a process that allows libxc to run on GPUs by exploiting the natural parallelization over gridpoints. It was first done with libxc 1.2.0, and we later repeated it (with some relatively minor modifications) for libxc 2.0.0. We obtained speedups of 30x-100x (CPU+GPU vs. CPU).
We did this as an introductory project to learn how to port software to GPUs. The documentation below is "crude" because there is currently not much demand for libxc on GPUs, but it may prove useful to others in the future.
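To illustrate why the gridpoints parallelize naturally, here is a minimal sketch in plain Python (not the actual CUDA port). It uses the textbook LDA exchange energy per particle as a stand-in for a functional; the function names `lda_exchange`, `evaluate` and `evaluate_in_chunks` are hypothetical and do not come from libxc. The point is only that each gridpoint depends on its own local density, so the grid can be split into arbitrary chunks (as a GPU splits work into thread blocks) without changing the result.

```python
import math

# LDA exchange energy per particle in Hartree atomic units:
#   eps_x(n) = -(3/4) * (3/pi)**(1/3) * n**(1/3)
C_X = -0.75 * (3.0 / math.pi) ** (1.0 / 3.0)


def lda_exchange(n):
    """Energy per particle at one gridpoint; uses only the local density n."""
    return C_X * n ** (1.0 / 3.0) if n > 0.0 else 0.0


def evaluate(density):
    """Serial reference: one independent evaluation per gridpoint."""
    return [lda_exchange(n) for n in density]


def evaluate_in_chunks(density, chunk_size):
    """Evaluate the grid chunk by chunk (hypothetical stand-in for GPU blocks).

    No chunk needs data from any other chunk, which is what makes the
    problem easy to parallelize.
    """
    out = []
    for start in range(0, len(density), chunk_size):
        out.extend(evaluate(density[start:start + chunk_size]))
    return out


density = [0.0, 0.01, 0.1, 0.5, 1.0, 2.0, 5.0]
serial = evaluate(density)
chunked = evaluate_in_chunks(density, chunk_size=3)
```

Here `serial` and `chunked` hold the same values regardless of `chunk_size`. GGA and MGGA functionals take additional local inputs (density gradients, kinetic energy density), but the per-point independence is the same.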
gpaw-setup -f GGA_X_RPBE_GPU+GGA_C_PBE_GPU H
#!/usr/bin/env python
from ase import *
from gpaw import GPAW

# Single spin-polarized hydrogen atom centered in a non-periodic cubic cell
a = 5.0
H = Atoms([Atom('H', (a/2, a/2, a/2), magmom=1)],
          pbc=False, cell=(a, a, a))

# Select the GPU variants of RPBE exchange and PBE correlation via xc
H.set_calculator(GPAW(nbands=1, h=0.2, convergence={'eigenstates': 1e-3},
                      txt='H.txt', xc='GGA_X_RPBE_GPU+GGA_C_PBE_GPU'))
e = H.get_potential_energy()
This requires that the functional use the common "work.c" mechanism. Some functionals (e.g. tpss_c) do not appear to do this yet.
This needs to be done once for each common "work.c" file, i.e. one per LDA/GGA/MGGA exchange/correlation combination (six in total).
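The reason porting a shared "work.c" file pays off can be sketched as follows. This is a simplified Python illustration under the assumption that the shared file holds the loop over gridpoints and calls a functional-specific per-point routine (it appears as `func` in the debugger output on this page). The names `work`, `lda_x_point` and `toy_point` are hypothetical, and `toy_point` is a placeholder, not a real functional.

```python
import math


def lda_x_point(n):
    """Per-point routine: textbook LDA exchange energy per particle."""
    return -0.75 * (3.0 / math.pi) ** (1.0 / 3.0) * n ** (1.0 / 3.0)


def toy_point(n):
    """Placeholder per-point routine for illustration; not a real functional."""
    return -n


def work(func, density):
    """Hypothetical shared driver: owns the grid loop and the output array.

    Only this routine needs to know how the grid is traversed, so moving
    the loop to the GPU here covers every functional that plugs into it.
    """
    energy = [0.0] * len(density)
    for i, n in enumerate(density):
        if n <= 0.0:
            continue  # skip empty gridpoints (illustrative choice)
        energy[i] = func(n)
    return energy


grid = [0.0, 1.0, 2.0]
toy_result = work(toy_point, grid)
lda_result = work(lda_x_point, grid)
```

In this picture, a functional that implements its own grid loop instead of going through the shared driver would need to be ported separately.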
(cuda-gdb)
200     /tmp/tmpxft_00001dc8_00000000-7_gga_c_pbe.cpp3.i: No such file or directory.
        in /tmp/tmpxft_00001dc8_00000000-7_gga_c_pbe.cpp3.i
(cuda-gdb)
func (xs=0xdddddddddddddddd, p=warning: Variable is not live at this point. Value is undetermined.
0x0, order=warning: Variable is not live at this point. Value is undetermined.
0, rs=warning: Variable is not live at this point. Value is undetermined.
0, zeta=-1.4568159901474629e+144, xt=warning: Variable is not live at this point. Value is undetermined.
0, f=warning: Variable is not live at this point. Value is undetermined.
0x0, dfdrs=0xdddddddddddddddd, dfdz=0xdddddddddddddddd, dfdxt=0xdddddddddddddddd, dfdxs=warning: Variable is not live at this point. Value is undetermined.
0x0, d2fdrs2=0xdddddddddddddddd, d2fdrsz=0xdddddddddddddddd, d2fdrsxt=0xdddddddddddddddd, d2fdrsxs=0xdddddddddddddddd, d2fdz2=0xdddddddddddddddd, d2fdzxt=0xdddddddddddddddd, d2fdzxs=0xdddddddddddddddd, d2fdxt2=0xdddddddddddddddd, d2fdxtxs=0xdddddddddddddddd, d2fdxs2=0xdddddddd) at gga_c_pbe.cu:271
271     }