ref: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf
This is something we are very interested in. However, we tend not to pursue it along these directions, since it is for the most part treated as a supervised learning problem: someone annotates the training images by drawing boxes or polygons around the objects that are meant to be found. We assume this is an expensive operation, and maybe it is, but some points:
- you don't need as much labeled localization data as classification data
- first build a classifier on lots of training data, or use transfer learning
- then train a regressor to localize
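A minimal sketch of the last step, under loud assumptions: the "features" below are random stand-ins for what a frozen, pretrained classifier would produce, the boxes are synthetic (x, y, w, h) targets, and ridge regression is just one simple choice of regressor, not necessarily what the lecture uses. The names `fit_box_regressor` and `predict_boxes` are made up for illustration.

```python
import numpy as np

def add_bias(features):
    # Append a constant column so the regressor can learn an offset.
    return np.hstack([features, np.ones((features.shape[0], 1))])

def fit_box_regressor(features, boxes, l2=1e-3):
    """Fit a ridge regressor mapping feature vectors to (x, y, w, h) boxes."""
    X = add_bias(features)
    # Closed-form ridge solution: W = (X^T X + l2 * I)^-1 X^T Y
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ boxes)

def predict_boxes(W, features):
    return add_bias(features) @ W

rng = np.random.default_rng(0)

# Stand-in for features from a frozen classifier (synthetic, not real data).
n_labeled, n_features = 250, 32
features = rng.normal(size=(n_labeled, n_features))

# Synthetic box targets that depend linearly on the features, plus noise.
true_W = rng.normal(size=(n_features, 4))
boxes = features @ true_W + 0.01 * rng.normal(size=(n_labeled, 4))

# Train on 200 labeled examples, check on the remaining 50.
W = fit_box_regressor(features[:200], boxes[:200])
pred = predict_boxes(W, features[200:])
mean_abs_err = np.abs(pred - boxes[200:]).mean()
print(f"held-out mean absolute box error: {mean_abs_err:.4f}")
```

The point of the sketch is only that the localization step touches a small number of parameters (here 33 x 4), which is why a few hundred labeled boxes may be enough once the classifier features exist.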
Annotating
- maybe not as expensive as one thinks; some options:
- labelme: http://labelme.csail.mit.edu/
  Open-source tool for labeling. I labeled 250 xtcav images with it. You can only upload 20 images at a time, which is tedious. You have to give each box a name, which takes unnecessary time for our problem. I'd estimate 8 seconds per box; some images have two boxes, some have one.
- mechanical turk: the labelme website talks about using this: http://labelme2.csail.mit.edu/Release3.0/browserTools/php/mechanical_turk.php
  In particular, they seem to report a price of only 1 penny per image?
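A rough back-of-the-envelope for the numbers above, as a sketch. It assumes the 8 seconds per box estimate holds, that every image has either one or two boxes (giving a low and a high bound), and that the 1 penny per image Mechanical Turk price is read correctly, which is not confirmed. The function name `labeling_estimate` is made up for illustration.

```python
import math

def labeling_estimate(n_images, sec_per_box, batch_size, cents_per_image):
    """Estimate labeling effort from the figures quoted in these notes."""
    return {
        # Number of separate uploads needed at the 20-image upload limit.
        "batches": math.ceil(n_images / batch_size),
        # Lower bound: one box per image. Upper bound: two boxes per image.
        "minutes_low": n_images * 1 * sec_per_box / 60,
        "minutes_high": n_images * 2 * sec_per_box / 60,
        # Mechanical Turk cost, kept in whole cents to avoid float rounding.
        "turk_cents": n_images * cents_per_image,
    }

est = labeling_estimate(n_images=250, sec_per_box=8, batch_size=20, cents_per_image=1)
print(f"uploads needed: {est['batches']}")
print(f"labeling time: {est['minutes_low']:.0f} to {est['minutes_high']:.0f} minutes")
print(f"mechanical turk cost: ${est['turk_cents'] / 100:.2f}")
```

So labeling the 250 images by hand is on the order of an hour of box drawing (not counting upload overhead), and the quoted Turk price would come to a few dollars if it is accurate.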