
...

how to use constant memory
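A minimal sketch of one common way to use constant memory, assuming a small read-only table that every thread reads at the same index. The names `d_coeffs`, `evalPoly` and `evalPolyOnGpu` are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical example: polynomial coefficients kept in constant memory.
constexpr int kNumCoeffs = 4;
__constant__ float d_coeffs[kNumCoeffs];

// All threads of a warp read the same d_coeffs[k] in the same iteration,
// which is the access pattern the constant cache can broadcast.
__global__ void evalPoly(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int k = kNumCoeffs - 1; k >= 0; --k) {
        acc = acc * x[i] + d_coeffs[k];  // Horner's rule
    }
    y[i] = acc;
}

// Host helper: uploads kNumCoeffs coefficients to constant memory,
// evaluates the polynomial at every x on the GPU and returns the results.
// CUDA return codes are not checked here (an omission for brevity).
std::vector<float> evalPolyOnGpu(const float* coeffs, const std::vector<float>& x) {
    int n = static_cast<int>(x.size());
    std::vector<float> y(n);
    if (n == 0) return y;

    // Constant memory is written from the host with cudaMemcpyToSymbol.
    cudaMemcpyToSymbol(d_coeffs, coeffs, kNumCoeffs * sizeof(float));

    float* dx = nullptr;
    float* dy = nullptr;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    evalPoly<<<blocks, threads>>>(dx, dy, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(y.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
    return y;
}
```

Calling `evalPolyOnGpu` with coefficients {1, 2, 3, 4} evaluates 1 + 2x + 3x^2 + 4x^3. Constant memory tends to pay off only when threads in a warp read the same address; divergent indices get serialized.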
how to use texture memory
what does the 150GB/s mem bandwidth number mean?
- it is sum of read/write bandwidth (each is 75GB/s)
optimization tricks: pre-fetch etc.
- we get 85GB/s out of 150GB/s on 2075. use cudaDMA?
- Philippe measures 84% memory bandwidth (154GB/s) on K20
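A rough sketch of how an achieved-bandwidth number like the ones above could be measured, assuming a plain copy kernel timed with CUDA events. It counts both the bytes read and the bytes written, matching the "sum of read/write" reading of the peak figure. `copyKernel`, `copyTrafficBytes`, `gbPerSec` and `measureCopyBandwidth` are illustrative names, not code from these notes, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Copy kernel: each thread reads one float and writes one float.
__global__ void copyKernel(const float* __restrict__ in,
                           float* __restrict__ out, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Traffic of one copy pass: n floats read plus n floats written.
inline double copyTrafficBytes(size_t n) {
    return 2.0 * static_cast<double>(n) * sizeof(float);
}

// Converts bytes moved and elapsed milliseconds to GB/s (1 GB = 1e9 bytes).
inline double gbPerSec(double bytes, double ms) {
    return bytes / (ms * 1.0e-3) / 1.0e9;
}

// Runs the copy kernel `reps` times on n floats and returns achieved GB/s.
double measureCopyBandwidth(size_t n, int reps) {
    float* in = nullptr;
    float* out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    int threads = 256;
    unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    copyKernel<<<blocks, threads>>>(in, out, n);  // warm-up, not timed

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) {
        copyKernel<<<blocks, threads>>>(in, out, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return gbPerSec(reps * copyTrafficBytes(n), ms);
}
```

Dividing the result by the quoted peak gives the efficiency, e.g. 85 out of 150 GB/s is about 57%. Whether to count read and write traffic together or separately is a convention, so it should match however the peak number is defined.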
what does a queued warp do? (does it pre-fetch the memory)
- yes, but can do better (e.g. cudaDMA)
reducing number of registers in kernel (does compiler typically do this optimally?)
- can control register usage using launch bounds
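A small sketch of the launch-bounds approach, assuming a block size of 256 threads and a target of 4 resident blocks per SM (both numbers are placeholders, not values from these notes). `scaleKernel` and `scaleKernelRegisters` are hypothetical names. The `__launch_bounds__` qualifier tells the compiler the largest block size the kernel will be launched with, and optionally a minimum number of blocks per multiprocessor, so it can cap registers per thread accordingly.

```cuda
#include <cuda_runtime.h>

constexpr int kMaxThreadsPerBlock = 256;  // assumed launch configuration
constexpr int kMinBlocksPerSM = 4;        // assumed occupancy target

// The compiler limits register use so that kMinBlocksPerSM blocks of
// kMaxThreadsPerBlock threads can be resident on one multiprocessor.
__global__ void __launch_bounds__(kMaxThreadsPerBlock, kMinBlocksPerSM)
scaleKernel(float* data, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= alpha;
}

// Returns the number of registers per thread the compiler actually
// assigned to scaleKernel, or -1 if the query fails.
int scaleKernelRegisters() {
    cudaFuncAttributes attr = {};
    if (cudaFuncGetAttributes(&attr, scaleKernel) != cudaSuccess) return -1;
    return attr.numRegs;
}
```

The same register count is printed at compile time with `nvcc -Xptxas -v`, and `-maxrregcount` sets a file-wide cap instead of a per-kernel one. Forcing the count too low can cause spills to local memory, so it is worth checking the timing before and after.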
how to tell with nvvp whether we're memory or flops limited
- Philippe just counts instructions and measures MB/s by running the code (no nvvp). He has some special code that counts instructions for him in complicated cases.
understanding the nvvp columns
best way to associate right GPU with right core (e.g. "taskset", "numactl")
ask about zher speedup numbers: for 4kx4k why does gemm improve by x30 but zher improves by x6?
- gemm with large sizes is compute limited, which GPU does well. zher is memory limited.
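The compute-limited versus memory-limited argument can be made concrete with a back-of-the-envelope flops-per-byte estimate. The sketch below is host-only code under stated assumptions that are not from these notes: double-complex data (16 bytes per element), 8 real flops per complex multiply-add, and every matrix element moved to or from device memory exactly once. The function names are made up for illustration.

```cuda
// Assumptions (illustrative, not measured): double-complex elements and
// 8 real flops per complex multiply-add.
constexpr double kBytesPerElem = 16.0;
constexpr double kFlopsPerComplexFma = 8.0;

// gemm, C = A*B + C on n x n matrices: n^3 complex multiply-adds;
// ideal traffic is reading A, B, C and writing C (4 n^2 elements).
inline double gemmFlopsPerByte(double n) {
    double flops = kFlopsPerComplexFma * n * n * n;
    double bytes = 4.0 * n * n * kBytesPerElem;
    return flops / bytes;
}

// zher, A = alpha*x*x^H + A: about one complex multiply-add per updated
// element of A, and each such element is read once and written once.
// Traffic for the vector x is negligible next to the matrix and is ignored.
inline double zherFlopsPerByte() {
    return kFlopsPerComplexFma / (2.0 * kBytesPerElem);
}
```

Under these assumptions `gemmFlopsPerByte(4096)` is 512 flops per byte while `zherFlopsPerByte()` stays at 0.25 regardless of size. So gemm has far more arithmetic per byte to hide memory traffic behind, whereas zher is bounded by how fast the matrix can be streamed. The gemm figure is an ideal upper bound that assumes perfect reuse of data on chip.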
using automake with cuda and c in one library?
swapping out priority: free up memory?
proxy gpu allocation only works on K20?

...
