To decide whether we should invest in GPUs, we ran some evaluations.
Prediction
I'm confident that GPUs will be very helpful for training large models, but it is less clear whether we need them for prediction. In the test below, we did much better scaling out to 16 cores using MPI, with each rank making one prediction at a time on its core, which indicates that GPUs are not needed for prediction.
I also compared exclusive use of a node against using the GPU, and saw no difference.
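A minimal sketch of the one-prediction-per-worker layout described above (Python threads and a dummy `predict` stand in for the MPI ranks and the real network; this is an illustration, not the original code):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

# dummy stand-in for one forward pass of the network on one image
def predict(img):
    return float(img.mean())

images = [np.ones((32, 32), dtype=np.float32) for _ in range(16)]

# 16 workers, each handling one prediction at a time, mirroring the
# one-prediction-per-rank layout of the real 16-core MPI runs
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(predict, images))
```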
Issues:
- Testing on the machine with the GPU competes with other users on the interactive nodes.
- We have not studied the best way to optimize, e.g., using a larger batch for each prediction rather than one image per prediction.
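The batching idea could be sketched as follows; `predict` here is a hypothetical stand-in for the network (it only mimics the shapes of the first conv layer described under Details), not the real model:

```python
import numpy as np

# hypothetical stand-in for the network: maps (n, h, w, 1) images to
# (n, h, w, 4) feature maps, shape-wise like the first conv layer
def predict(batch):
    return np.repeat(batch, 4, axis=-1)

images = [np.zeros((64, 64, 1), dtype=np.float32) for _ in range(8)]

# current approach: one framework call per image
singles = [predict(img[None])[0] for img in images]

# untested optimization: one call over a stacked batch, which can
# amortize per-call overhead and keep the hardware busier
batched = predict(np.stack(images))
```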
Details
For different sizes of a square image, from 1900 x 1900 down to 1000 x 1000, we measured prediction throughput with the following setup:
- A 6-layer neural network; the first 3 layers are convolutional
- We predict one image at a time (working with a batch could improve performance)
- The first layer, typically the bottleneck, produces a map of four feature vectors (output is 4x the size of the input)
- "psnehq exclusive" means exclusive use of a psnehq machine (psana1502), no GPU
- "GPU" means running on psanagpu102, alongside whatever other processes are present
- "psnehq MPI" means an MPI job using 16 cores, no GPU
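To see why the first layer tends to be the bottleneck, here is the data-volume arithmetic for the largest image (float32 storage and same-size output maps are assumptions, not stated above):

```python
# a 1900 x 1900 single-channel input expands to a (1900, 1900, 4)
# feature map after the first conv layer: 4x the data volume
h = w = 1900
bytes_per_float32 = 4
input_bytes = h * w * 1 * bytes_per_float32
output_bytes = h * w * 4 * bytes_per_float32
assert output_bytes == 4 * input_bytes  # 57,760,000 bytes, roughly 58 MB
```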
image size | psana1502, exclusive, no GPU | psanagpu102, GPU (Tesla K40c) | psana1502, MPI 16 ranks, no GPU |
1900 x 1900 | 3.8 Hz | 3.3 Hz | 2.6 Hz/rank = 41 Hz/node |
1700 x 1700 | 4.2 Hz | 4.4 Hz | 2.5 Hz/rank = 40 Hz/node |
1500 x 1500 | 5.3 Hz | 5.5 Hz | 2.8 Hz/rank = 45 Hz/node |
1300 x 1300 | 6.5 Hz | 6.2 Hz | 3.7 Hz/rank = 59 Hz/node |
1000 x 1000 | 10.8 Hz | 11.5 Hz | 6.4 Hz/rank = 100 Hz/node |
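As a sanity check on the MPI column, the node rate should be roughly the per-rank rate times 16 ranks (the table's node numbers look independently rounded or measured, so the product does not match exactly in every row):

```python
# per-rank rates from the MPI column of the table above
ranks = 16
per_rank_hz = {1900: 2.6, 1700: 2.5, 1500: 2.8, 1300: 3.7, 1000: 6.4}
node_hz = {size: rate * ranks for size, rate in per_rank_hz.items()}
# e.g. 2.5 Hz/rank * 16 ranks = 40 Hz/node for 1700 x 1700 images
assert node_hz[1700] == 40.0
```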
The run on psanagpu102 was done between 2016/07/11 11:54:26 and 12:02; one could look at Ganglia to see the load on the machine during that window.
The 16-core MPI run used tensorflow r9.