...
- use Python 3 semantics, e.g. print(x) instead of print x
- will use cross-entropy loss, ref: the TensorFlow MNIST tutorial, and in particular colah's post on Visual Information Theory
- predict a probability distribution for each sample
- ground truth will be one-hot: [0.0, 1.0] or [1.0, 0.0]
- cross entropy loss and softmax - good for classification
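A minimal NumPy sketch of the softmax + cross-entropy combination described above; the function names `softmax` and `cross_entropy` are my own for illustration, not from a specific library:

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, one_hot_labels):
    # mean negative log-likelihood of the true class
    eps = 1e-12  # avoid log(0)
    return -np.mean(np.sum(one_hot_labels * np.log(probs + eps), axis=1))

# two samples, two classes; labels in the [0,1]/[1,0] one-hot form above
logits = np.array([[2.0, 0.5], [0.1, 3.0]])
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = cross_entropy(softmax(logits), labels)
```

In practice TensorFlow fuses these two steps (e.g. a softmax-cross-entropy-with-logits op) for numerical stability, but the math is the same.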
- utility function takes a 1D vector of labels and returns a one-hot encoding - a 2D array with 2 columns
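That utility could look like the following sketch (the name `to_one_hot` is assumed, not from the original code):

```python
import numpy as np

def to_one_hot(labels, num_classes=2):
    # labels: 1D array of integer class ids in [0, num_classes)
    one_hot = np.zeros((len(labels), num_classes))
    one_hot[np.arange(len(labels)), labels] = 1.0
    return one_hot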
- model - as in code
- convolution is a sum over all channels of input, over kernel rows/columns
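A naive loop version makes that summation explicit - one output value is the sum of input * kernel over all kernel rows, kernel columns, and input channels (valid padding, stride 1; `conv2d_single` is an illustrative name, and real frameworks use much faster implementations):

```python
import numpy as np

def conv2d_single(x, kernel):
    # x: (H, W, C) input; kernel: (kh, kw, C); one output channel
    H, W, C = x.shape
    kh, kw, _ = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sum over kernel rows, kernel columns, and all input channels
            out[i, j] = np.sum(x[i:i + kh, j:j + kw, :] * kernel)
    return out
```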
- good to shuffle data between each epoch
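One way to sketch per-epoch shuffling (the generator `shuffled_epochs` is a hypothetical helper, not from the original code) - the key point is to permute features and labels with the same index so pairs stay aligned:

```python
import numpy as np

def shuffled_epochs(X, y, num_epochs, seed=0):
    # yield a freshly shuffled view of (X, y) for each epoch
    rng = np.random.default_rng(seed)
    for _ in range(num_epochs):
        idx = rng.permutation(len(X))
        yield X[idx], y[idx]
```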
- the Keras tutorial uses fit() for in-memory data; that doesn't scale well to datasets that don't fit in memory
...