...
- constant memory
- texture memory
- optimization tricks: pre-fetch etc.
- what does a queued warp do? (does it pre-fetch memory?)
- reducing number of registers in kernel (does compiler typically do this optimally?)
- how to tell with nvvp whether we're memory- or flops-limited
- understanding the nvvp columns
- best way to associate right GPU with right core (e.g. "taskset", "numactl")
- ask about zher speedup numbers: for 4k x 4k, why does gemm improve by 30x but zher by 6x?
- using automake with CUDA and C in one library?
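
Possible angle on the gemm vs. zher question above (a rough sketch, not a measured result): compare the arithmetic intensity of the two operations, i.e. flops per byte of minimum memory traffic. The assumptions here are not from these notes: double-complex data (16 bytes per element), 8 real flops per complex multiply-add, ideal data reuse for gemm, and zher touching only one triangle of the matrix. The function names are made up for illustration.

```python
# Rough arithmetic-intensity estimate: flops per byte of minimum memory traffic.
# Assumptions (not from the notes): double-complex elements, 8 real flops per
# complex multiply-add, and perfect on-chip reuse for gemm.

BYTES_PER_ELEMENT = 16   # one double-complex number
FLOPS_PER_CMA = 8        # complex multiply (6 flops) + complex add (2 flops)


def gemm_intensity(n):
    """C := A*B + C for n x n matrices: n^3 complex multiply-adds."""
    flops = FLOPS_PER_CMA * n**3
    bytes_moved = BYTES_PER_ELEMENT * 4 * n**2  # read A, B, C; write C
    return flops / bytes_moved


def her_intensity(n):
    """A := alpha*x*x^H + A on one triangle: n(n+1)/2 complex multiply-adds."""
    tri = n * (n + 1) // 2
    flops = FLOPS_PER_CMA * tri
    bytes_moved = BYTES_PER_ELEMENT * (2 * tri + n)  # read+write triangle, read x
    return flops / bytes_moved


if __name__ == "__main__":
    n = 4096
    print("gemm flops/byte:", gemm_intensity(n))
    print("zher flops/byte:", her_intensity(n))
```

Under these assumptions gemm intensity grows linearly with n while zher stays below a constant, so zher would be limited by memory bandwidth rather than flops. Whether that accounts for the 30x vs. 6x numbers still needs to be checked against the actual bandwidth ratio of the CPU and GPU.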
...
1/8/2013
- libxc on gpu (lin)
- work on automake stuff
- get the cleaned-up ifdef version from Miguel
- digest RPA timing measurements (lin)
- AJ and cpo start meeting once per week (friday) to work/strategize on convergence
- paper (jun)
- redo timing measurements (jun/lin)
- understand new GPU box memory slowness (cpo)
12/18/2012
- libxc on gpu (lin)
- use common work file for CPU/GPU
- digest RPA timing measurements (lin)
- paper (jun)
- redo timing measurements (jun)
- understand timing measurements more fully (jun)
- dacapo density mixing vs. GPAW (cpo)
...