ref: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf

This is something we are very interested in; however, we tend not to pursue it along these directions, since it is for the most part treated as a supervised learning problem: someone annotates the training images, drawing boxes or polygons around the objects that are meant to be found. We assume this is an expensive operation, and maybe it is, but some points:

Annotating

- maybe not as expensive as one thinks; there are options

Good Results with Little Training Data

Here is what it looks like to work in labelme - note that labelme only accepts jpg, so the image quality is poorer:

Here is a visualization of the 250 training images. All 250 images are summed, and all the localization boxes are plotted. This gives a sense of how the boxes are distributed, so we can gauge how hard the regression problem of predicting boxes is. These lasing images are plotted "right side up", that is, with the fingers going down: e1, the earlier time arrival with higher energy, is the white boxes at the top of the image (they were at the bottom of the labelme images), and e2, the later time arrival with lower energy, is the green boxes at the bottom of the image.
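The summed-image part of this visualization can be sketched as below. The names `images` and `boxes` are stand-ins; in the real run they would come from the data files and the labelme annotations.

```python
import numpy as np

# Stand-in data: 250 grayscale training frames and one box per frame,
# shaped like the real arrays but filled with random values.
rng = np.random.default_rng(0)
images = rng.random((250, 64, 48))          # 250 images, H=64, W=48
boxes = rng.integers(0, 20, size=(250, 4))  # (xmin, ymin, xmax, ymax) each

# Sum all training images into one composite array for display.
composite = images.sum(axis=0)
print(composite.shape)
```

With matplotlib, `plt.imshow(composite)` plus one `matplotlib.patches.Rectangle` per row of `boxes` reproduces the plot; indexing with `composite[::-1]` flips it to the "right side up" orientation described above.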

The localization proceeded as follows
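The general recipe from the referenced CS231n lecture is localization as regression: predict the four box coordinates and train under an L2 loss. As a minimal sketch only - the data, feature dimensions, and the linear model standing in for a convnet's regression head are all assumptions, not the actual setup:

```python
import numpy as np

# Stand-in data: 250 training images (as in the notes) reduced to
# feature vectors, each paired with a normalized (xmin, ymin, xmax, ymax).
rng = np.random.default_rng(1)
n_images, n_features = 250, 128
X = rng.standard_normal((n_images, n_features))   # image features
Y = rng.uniform(0.0, 1.0, (n_images, 4))          # normalized box targets

# Minimizing the mean L2 loss over a linear map is ordinary least squares,
# so we can solve it in closed form instead of running gradient descent.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = X @ W
l2_loss = float(np.mean((pred - Y) ** 2))
print(pred.shape, l2_loss >= 0.0)
```

In the lecture's actual formulation the features come from a convnet and the fit is done by backpropagation, but the loss being minimized is the same.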

Below we plot the results. Overall, the predicted boxes, always drawn in green, look surprisingly good. I found only one 'bad' one for each of e1 and e2, which I plot below.
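"Good" and "bad" here are judged by eye; the standard way to quantify box quality is intersection-over-union (IoU) between predicted and annotated boxes. A small sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Overlap rectangle; width/height clamp to 0 when boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = ((ax1 - ax0) * (ay1 - ay0)
             + (bx1 - bx0) * (by1 - by0) - inter)
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # partial overlap -> 25/175
```

A common convention is to call a prediction correct when IoU exceeds 0.5, which would turn the eyeball judgment above into a number per test image.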

e1 images

This is the one bad one: the 10th of about 13 test images.

e2 images
