To-Do List
Questions for Nvidia
- how to use constant memory
- how to use texture memory
- optimization tricks: pre-fetch etc.
- we get 85 GB/s out of a 150 GB/s peak on the 2075. Should we use cudaDMA?
- what does a queued warp do? (does it pre-fetch memory?)
  - yes
- reducing the number of registers per kernel (does the compiler typically do this optimally?)
  - register usage can be controlled with __launch_bounds__
- how to tell with nvvp whether we're memory-limited or FLOP-limited
  - Philippe just counts instructions and measures MB/s by running the code (no nvvp). He has special code that counts instructions for him in complicated cases.
- understanding the nvvp columns
- best way to pair each GPU with the right CPU core (e.g. "taskset", "numactl")
- ask about zher speedup numbers: for 4k x 4k, why does gemm improve by 30x but zher only by 6x?
  - gemm at large sizes is compute-limited, which the GPU handles well; zher is memory-limited.
- using automake to build CUDA and C sources into one library?
- swapping out priority: free up memory?
- proxy GPU allocation only works on K20?
...
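For the constant-memory question, a minimal sketch, assuming a small read-only coefficient table that every thread reads at the same index (the broadcast pattern that constant memory's cache serves best). The kernel eval_poly, the table c_coeffs and the helper eval_poly_host are hypothetical names; __constant__ and cudaMemcpyToSymbol are the standard CUDA runtime mechanisms.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NCOEFFS 4

// Read-only coefficients in constant memory (64 KB total available).
// Fastest when all threads of a warp read the same element in the same instruction.
__constant__ float c_coeffs[NCOEFFS];

// Hypothetical kernel: evaluates c0 + c1*x + c2*x^2 + c3*x^3 with Horner's rule.
__global__ void eval_poly(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float acc = c_coeffs[NCOEFFS - 1];
        for (int k = NCOEFFS - 2; k >= 0; --k)
            acc = acc * x[i] + c_coeffs[k];  // all threads read the same c_coeffs[k]: a broadcast
        y[i] = acc;
    }
}

// Hypothetical host reference with the same Horner evaluation, for checking GPU results.
static float eval_poly_host(const float* c, float x) {
    float acc = c[NCOEFFS - 1];
    for (int k = NCOEFFS - 2; k >= 0; --k) acc = acc * x + c[k];
    return acc;
}

int main() {
    const int n = 1 << 20;
    const float h_coeffs[NCOEFFS] = {1.0f, 2.0f, 0.5f, 0.25f};
    // Copy the host array into the __constant__ symbol.
    cudaMemcpyToSymbol(c_coeffs, h_coeffs, sizeof(h_coeffs));

    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));  // all-zero bytes, so every x[i] == 0.0f

    eval_poly<<<(n + 255) / 256, 256>>>(d_x, d_y, n);

    float y0 = 0.0f;
    cudaMemcpy(&y0, d_y, sizeof(float), cudaMemcpyDeviceToHost);
    printf("GPU y[0] = %f, host reference = %f\n", y0, eval_poly_host(h_coeffs, 0.0f));

    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}
```

Constant memory only pays off for this uniform-index pattern; if threads in a warp read different elements, the accesses are serialized and ordinary global or shared memory is usually the better choice.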
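For the register-usage answer, a sketch of __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor), assuming a kernel that is never launched with more than 256 threads per block and that should keep at least 4 blocks resident per SM. The kernel saxpy_bounded, the helper blocks_for and the values 256 and 4 are illustrative only. The compiler uses the bounds to cap registers per thread, possibly spilling to local memory, so the result is worth checking: cudaFuncGetAttributes reports what was actually compiled, as does `nvcc -Xptxas -v` at build time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define MAX_THREADS 256   // hypothetical upper bound on blockDim.x
#define MIN_BLOCKS  4     // hypothetical occupancy target: resident blocks per SM

// Promise the compiler this kernel never runs with more than MAX_THREADS threads
// per block, and ask it to fit MIN_BLOCKS such blocks on one SM. The compiler
// limits registers per thread to match, spilling to local memory if it must.
__global__ void __launch_bounds__(MAX_THREADS, MIN_BLOCKS)
saxpy_bounded(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: number of blocks needed to cover n elements.
static int blocks_for(int n, int threads) { return (n + threads - 1) / threads; }

int main() {
    // Inspect what was compiled: registers per thread and local (spill) memory.
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, saxpy_bounded);
    printf("registers/thread: %d, local bytes/thread: %zu, max threads/block: %d\n",
           attr.numRegs, attr.localSizeBytes, attr.maxThreadsPerBlock);

    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    // Launching with more than MAX_THREADS threads per block would fail.
    saxpy_bounded<<<blocks_for(n, MAX_THREADS), MAX_THREADS>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();
    printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

A coarser alternative is the nvcc flag -maxrregcount, which caps registers for every kernel in the compilation unit; __launch_bounds__ applies per kernel.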
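Philippe's approach of measuring MB/s by running the code can be sketched with CUDA events, assuming a plain streaming copy as the workload. Effective bandwidth is (bytes read + bytes written) / elapsed time, and comparing it with the card's peak (about 150 GB/s on the 2075, per the bandwidth question) shows how close a kernel is to the memory limit. The kernel copy_kernel and helper effective_gbps are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) from bytes moved and elapsed milliseconds.
static double effective_gbps(double bytes_read, double bytes_written, double ms) {
    return (bytes_read + bytes_written) / (ms * 1.0e-3) / 1.0e9;
}

// Hypothetical streaming workload: one 4-byte read and one 4-byte write per element.
__global__ void copy_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main() {
    const int n = 1 << 24;                    // 16M floats = 64 MB per array
    const size_t bytes = n * sizeof(float);
    float *in, *out;
    cudaMalloc(&in, bytes);
    cudaMalloc(&out, bytes);
    cudaMemset(in, 0, bytes);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copy_kernel<<<blocks, threads>>>(in, out, n);  // warm-up launch, not timed

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    const int reps = 20;
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        copy_kernel<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // total time for all reps, in ms
    printf("effective bandwidth: %.1f GB/s\n",
           effective_gbps((double)bytes * reps, (double)bytes * reps, ms));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Timing many repetitions between two events amortizes launch overhead; the same byte-counting applied to a real kernel gives the "MB/s by running code" number without nvvp.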
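The gemm/zher answer is a roofline argument. A rough worked estimate, assuming the usual LAPACK flop counts for complex routines (ZGEMM: 8n^3 real flops; ZHER: about 4n^2), 16 bytes per double-complex element, and that each matrix element moves between DRAM and the GPU only once (ZGEMM reads A, B, C and writes C; ZHER reads and writes one triangle of A):

```latex
\begin{aligned}
\text{ZGEMM } (C \leftarrow \alpha AB + \beta C):\quad
  I &\approx \frac{8n^3\ \text{flop}}{16 \cdot 4n^2\ \text{B}} = \frac{n}{8}\ \text{flop/B}
  \;\xrightarrow{\;n = 4096\;}\; 512\ \text{flop/B} \\
\text{ZHER } (A \leftarrow \alpha x x^{H} + A):\quad
  I &\approx \frac{4n^2\ \text{flop}}{16 \cdot 2 \cdot \tfrac{n^2}{2}\ \text{B}} = 0.25\ \text{flop/B} \\
\text{attainable rate} &\approx \min\!\left(P_{\text{peak}},\; I \cdot B_{\text{mem}}\right),
  \qquad 0.25\ \text{flop/B} \times 85\ \text{GB/s} \approx 21\ \text{GFLOP/s (ZHER)}
\end{aligned}
```

Under these assumptions ZHER is capped by memory bandwidth no matter how fast the arithmetic units are, while ZGEMM's intensity keeps it compute-bound, which is consistent with the smaller zher speedup; the exact 30x and 6x figures also depend on the CPU baseline being compared against.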