To decide whether we should invest in GPUs, we ran some evaluations.
I'm confident that GPUs will be very helpful for training large models, but it is less clear whether we need them for prediction. In the test below, I got much better throughput by scaling out to 16 cores with MPI, having each rank make one prediction at a time on its own core, which indicates that GPUs are not needed for prediction.
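The one-prediction-per-rank scheme can be sketched as a round-robin split of the image list. This is a minimal simulation of the rank logic in plain Python (the function name and the 100-image workload are illustrative, not from the actual run); under a real MPI launch, `rank` and `size` would come from `mpi4py`'s `comm.Get_rank()` and `comm.Get_size()`:

```python
def images_for_rank(n_images, rank, size):
    """Round-robin assignment: rank r handles images r, r+size, r+2*size, ..."""
    return list(range(rank, n_images, size))

# Simulate 16 ranks splitting 100 predictions, one image per prediction.
size = 16
n_images = 100
assignments = [images_for_rank(n_images, r, size) for r in range(size)]

# Every image is handled exactly once across the ranks.
covered = sorted(i for a in assignments for i in a)
assert covered == list(range(n_images))

# In the real job, each rank would loop over its assignment and call the
# model's predict function on one image at a time.
```

Because the ranks share nothing during prediction, the per-node rate is simply the per-rank rate times the number of ranks, which is what the table below reports.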
I also compared exclusive use of a node against using the GPU, and I saw no difference.
Issues: testing on the machine with the GPU competes with other users on the interactive nodes.
I have not studied the best way to optimize, e.g., using a larger batch for each prediction rather than one image per prediction.
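The batching idea is that one call over N images amortizes the per-call overhead of N single-image calls while producing the same results. A minimal NumPy sketch, where the "model" is just a matmul standing in for the real network (all names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1000, 10))  # stand-in for the network weights

def predict(batch):
    """Dummy 'model': one matmul per call. In a real framework the fixed
    per-call overhead is what batching amortizes."""
    return batch @ W

images = rng.standard_normal((32, 1000))

# One image per prediction: 32 separate calls.
singles = np.vstack([predict(img[None, :]) for img in images])

# One batched prediction: a single call over all 32 images.
batched = predict(images)

assert np.allclose(singles, batched)  # same results, far fewer calls
```

Whether batching helps on the GPU more than on the CPU is exactly the kind of thing the follow-up study would need to measure.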
For square images of different sizes, from 1900 x 1900 down to 1000 x 1000, we measured the following throughputs.
The model is a 6-layer neural network; the first 3 layers are convolutional.
image size | psnehq on psana1502, no GPU, exclusive use of node | psanagpu102, GPU (Tesla K40c) | psnehq MPI on psana1502, no GPU, 16 ranks
1900 | 3.8 Hz | 3.3 Hz | 2.6 Hz/rank = 41 Hz/node
1700 | 4.2 Hz | 4.4 Hz | 2.5 Hz/rank = 40 Hz/node
1500 | 5.3 Hz | 5.5 Hz | 2.8 Hz/rank = 45 Hz/node
1300 | 6.5 Hz | 6.2 Hz | 3.7 Hz/rank = 59 Hz/node
1000 | 10.8 Hz | 11.5 Hz | 6.4 Hz/rank = 100 Hz/node
The run on psanagpu102 was done between 2016/07/11 11:54:26 and 12:02; one could look at Ganglia to see the load on the machine.
The 16-rank MPI runs used TensorFlow r0.9.