Data

The page "Accelerator Beam finding - Internal Notes" describes the data and code; that page has restricted access.

Below is an example of the problem we are trying to solve:

This is a vcc screen; the location of the beam has been labeled with the white box.
We want to use machine learning to predict these box locations.
The dataset contains vcc screens and YAG screens. We treat these as two different datasets and are currently looking at training two different models - a YAG predictor and a VCC predictor.

Description

There are 3 files, called 1, 2 and 4.

Files 1 and 2 have 142 samples. With file 4, the total number of samples is 239.
Each sample has a yag image, a vcc image, and a box for each. There are also backgrounds to subtract for file 4, and an oval-like image of the entire beam region, which one can use to narrow the search so as not to predict a box in a corner of the yag or vcc screen.
vcc values are in [0,255], and the boxed beam can get quite bright.
yag values go over 1000, I think, but the boxed value is always dim, like up to 14.

First Pass - just files 1 and 2

Given the apparent success of using transfer learning to do Spatial Localization to find the 'fingers' in XTCAV data, we will try the same thing with the accelerator data.

We have to fit the 480 x 640 vcc images and the 1040 x 1392 yag images into the 224 x 224 x 3 RGB images that the vgg16 convolutional neural network expects.

I thresholded yag at 255, then made grayscale images for each, using a scipy imresize option.
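As a rough sketch of the thresholding and grayscale-to-RGB step, the snippet below clips an image at 255 and copies the grayscale plane into three identical channels. It assumes the image has already been resized to 224 x 224 (the resize itself is left out), and the name `to_vgg16_input` is made up for illustration - it is not from the original code.

```python
import numpy as np

def to_vgg16_input(img, clip_max=255):
    """Clip a 2-D grayscale image and replicate it into 3 identical channels.

    Assumes img is already resized to (224, 224); clip_max=255 mirrors the
    yag threshold mentioned in the text.
    """
    clipped = np.clip(img, 0, clip_max).astype(np.float32)
    # Stack the same plane three times so the shape becomes (H, W, 3).
    return np.stack([clipped, clipped, clipped], axis=-1)

# Synthetic stand-in for a resized yag image with values over 1000.
yag_small = np.full((224, 224), 1000, dtype=np.uint16)
rgb = to_vgg16_input(yag_small)
print(rgb.shape, rgb.max())
```

Copying one plane into all three channels is only one way to feed grayscale data to a network trained on color images; it keeps the input shape right without inventing color information.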

I generated codewords for the yag and vcc. The vcc, which has bright beam, shows a lot of structure:

These are plotted with a very large aspect ratio; the bottom part shows the 'nobeam' images.

However, with the yag images, there is very little difference between nobeam and beam:

There a

I suspect we will not be able to do much with these codewords without more preprocessing of the yag images - I think they are too faint for what vgg16 expects - it was trained on the imagenet color images.

Second pass

This problem seems harder than the localization for lasing fingers in amo86815. There is more variety in the signal we are trying to find, which leads to different kinds of signal processing pre-filtering of the images. Even then, the vgg16 codewords sometimes don't seem that homogeneous - suggesting.

Of the 239 samples, 163 of the vcc have a labeled box. Below is a plot where we grab what is inside each box and plot it all in a grid - this is with the background subtraction for file 4. The plot on the left is before, and the plot on the right is after, reducing the 480 x 640 vcc images to (224,224) for vgg16. We used scipy imresize with 'lanczos' to reduce (this calls PIL).

Here are the 159 samples of the yag with a box - here we are using 'lanczos' to reduce from the much larger size of 1040 x 1392 to (224,224). It is interesting to note how the colorbar changes - the range no longer goes up to 320. I think the 320 values were isolated pixels that get washed out? Or else there is something else I don't understand - we are doing nothing more than scipy.misc.imresize(img,(224,224), interp='lanczos',mode='F'), but img is np.uint16 after careful background subtraction (going through float32, thresholding at 0 before converting back).
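The background subtraction described above (go through float32, threshold at 0, convert back to uint16) could look roughly like the sketch below. The function name `subtract_background` and the tiny arrays are made up for illustration. The last few lines use a plain block average, not the Lanczos filter, only to show how a single isolated bright pixel gets diluted when many pixels are merged into one - which may or may not be the whole explanation for the colorbar change.

```python
import numpy as np

def subtract_background(img, bkg):
    """Subtract bkg from img in float32, clamp negatives to 0, return uint16."""
    diff = img.astype(np.float32) - bkg.astype(np.float32)
    # Threshold at 0 so the conversion back to uint16 cannot wrap around.
    diff = np.maximum(diff, 0.0)
    return diff.astype(np.uint16)

img = np.array([[10, 5], [320, 0]], dtype=np.uint16)
bkg = np.array([[3, 8], [20, 1]], dtype=np.uint16)
clean = subtract_background(img, bkg)
print(clean)

# One isolated bright pixel in an otherwise dark 4 x 4 patch: averaging the
# patch down to a single pixel dilutes the peak (block mean, not Lanczos).
patch = np.zeros((4, 4), dtype=np.float32)
patch[1, 2] = 320.0
print(patch.mean())  # 20.0
```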

Pipeline

Regression/box prediction

The processing pipeline for the regression

choose a preprocessing algorithm
create vgg16 codeword1 and 2 (8192 numbers, last two layers)
separately for 'yag' and 'vcc'
1. for each of the 163 (or 159) samples,
  1. train a linear regression model on the remaining 162 (or 158) samples
    1. map from 8192 variables to 4
  2. use it to predict a box for the omitted sample
  3. optionally - limit input features - reject features with variance less than threshold
    1. (maybe they are noisy and throwing off the regression)
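The leave-one-out loop above can be sketched as follows. This is a minimal version under assumptions, not the original code: it uses plain least squares from numpy (which returns the minimum-norm solution when there are more features than samples, as with 8192 features and about 160 samples), the name `leave_one_out_boxes` is made up, and the variance filter is computed over all samples at once.

```python
import numpy as np

def leave_one_out_boxes(features, boxes, var_thresh=None):
    """Predict each sample's box from a linear fit on all the other samples.

    features: (n_samples, n_features) codewords; boxes: (n_samples, 4).
    var_thresh: if given, drop features whose variance is <= this value.
    """
    if var_thresh is not None:
        features = features[:, features.var(axis=0) > var_thresh]
    # Append a constant column so the fit includes an intercept.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    preds = np.empty_like(boxes, dtype=np.float64)
    for i in range(X.shape[0]):
        keep = np.arange(X.shape[0]) != i
        # Least-squares solve; minimum-norm solution when underdetermined.
        W, *_ = np.linalg.lstsq(X[keep], boxes[keep], rcond=None)
        preds[i] = X[i] @ W
    return preds

# Synthetic check: boxes that are an exact linear function of the features
# should be recovered for every held-out sample.
rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 5))
true_boxes = feats @ rng.normal(size=(5, 4)) + 1.0
preds = leave_one_out_boxes(feats, true_boxes, var_thresh=0.01)
print(np.allclose(preds, true_boxes))
```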

Measure accuracy

For localization - on a shot by shot basis, where we are comparing boxA to boxB, one typically calculates the ratio of the area of the intersection to the area of the union. For imagenet competitions, a shot/image counts as a success when inter/union >= 0.5; those predictions look quite good! One can then come up with an overall accuracy based on the inter/union threshold. Below, we report accuracy for different thresholds, .5, .2 and .01 - the latter is to see how accurate we are at getting any overlap.

To visualize the results, we make a plot similar to the one above, but plot the truth box in white and the predicted box in red.
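For concreteness, here is a small sketch of the inter/union measure and the thresholded accuracy. The box layout (xmin, ymin, xmax, ymax) is an assumption - the notes don't say how the boxes are stored - and the function names and example boxes are made up.

```python
def box_iou(box_a, box_b):
    """Intersection over union for boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def accuracy_at(truth, predicted, threshold):
    """Fraction of samples whose inter/union is at least the threshold."""
    hits = sum(box_iou(t, p) >= threshold for t, p in zip(truth, predicted))
    return hits / len(truth)

truth = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
pred = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30)]
for th in (0.5, 0.2, 0.01):
    print(th, accuracy_at(truth, pred, th))
```

In this toy example the three inter/union values are 1, 1/3 and 0, so one of three predictions passes at 0.5 and two of three pass at 0.2 and at 0.01.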

Results

Pre processing=None, files=1,2,4

For the vcc, the accuracy at a 1% overlap threshold is 55%:

For the yag, the accuracy at a 1% overlap threshold is 86%:

All Accuracies

36 different runs were carried out, varying each of the following:

Pre-processing algorithm, one of
- none
  - just 'lanczos' reduction
- denoise-log
  - 3 pt median filter
  - log(1+img)
  - 'lanczos' reduction
  - multiply by scale factor
- denoise-max-log
  - 3 pt median filter
  - 3 x 3 sum
  - 3 pt median filter
  - 'max_reduce' (save largest pixel value over square)
  - 'lanczos' reduction (to get final (224,224) size)
  - log(1+img)
  - scale up
Files, one of
- just 1,2
- 1,2,4
Background for file 4, one of
- subtracted
- not subtracted
Feature filtering before regression, one of
- drop those of the 8192 features with variance <= 0.01
- keep all features
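The individual pre-processing steps could be sketched in numpy as below. Several things here are assumptions: '3 pt median filter' is read as a 3 x 3 median with replicated edges, the block size for 'max_reduce' is a free parameter, the scale factor is left at 1, and the 'lanczos' reduction is omitted. The function names are made up.

```python
import numpy as np

def median3x3(img):
    """3 x 3 median filter with edge replication (one reading of '3 pt')."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Gather the 9 shifted views of the image and take the per-pixel median.
    shifts = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(shifts), axis=0)

def max_reduce(img, block):
    """Keep the largest pixel in each block x block square (crops remainders)."""
    h = (img.shape[0] // block) * block
    w = (img.shape[1] // block) * block
    tiles = img[:h, :w].reshape(h // block, block, w // block, block)
    return tiles.max(axis=(1, 3))

def denoise_log(img, scale=1.0):
    """Median filter, then log(1 + img), then a scale factor (resize omitted)."""
    return scale * np.log1p(median3x3(img.astype(np.float32)))

# A single hot pixel is removed by the median filter, so the output is all 0.
noisy = np.zeros((6, 6), dtype=np.uint16)
noisy[2, 3] = 100
print(denoise_log(noisy).max())

# Each 2 x 2 square of a 4 x 4 ramp is replaced by its largest value.
print(max_reduce(np.arange(16).reshape(4, 4), 2))
```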

Below is a table of all these results

nm=yag eb_alg_none_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.41 th=0.20 acc=0.78 th=0.01 acc=0.86
nm=vcc eb_alg_none_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.14 th=0.20 acc=0.38 th=0.01 acc=0.65
nm=yag eb_alg_denoise-log_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.66 th=0.20 acc=0.87 th=0.01 acc=0.90
nm=vcc eb_alg_denoise-log_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.38 th=0.20 acc=0.61 th=0.01 acc=0.72
nm=yag eb_alg_denoise-max-log_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.46 th=0.20 acc=0.77 th=0.01 acc=0.88
nm=vcc eb_alg_denoise-max-log_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.41 th=0.20 acc=0.60 th=0.01 acc=0.76
nm=yag eb_alg_none_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.34 th=0.20 acc=0.68 th=0.01 acc=0.82
nm=vcc eb_alg_none_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.11 th=0.20 acc=0.34 th=0.01 acc=0.61
nm=yag eb_alg_denoise-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.47 th=0.20 acc=0.75 th=0.01 acc=0.89
nm=vcc eb_alg_denoise-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.26 th=0.20 acc=0.42 th=0.01 acc=0.56
nm=yag eb_alg_denoise-max-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.34 th=0.20 acc=0.62 th=0.01 acc=0.81
nm=vcc eb_alg_denoise-max-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.28 th=0.20 acc=0.39 th=0.01 acc=0.48
nm=yag eb_subbkg_alg_none_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.38 th=0.20 acc=0.73 th=0.01 acc=0.86
nm=vcc eb_subbkg_alg_none_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.08 th=0.20 acc=0.28 th=0.01 acc=0.55
nm=yag eb_subbkg_alg_denoise-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.56 th=0.20 acc=0.81 th=0.01 acc=0.92
nm=vcc eb_subbkg_alg_denoise-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.23 th=0.20 acc=0.46 th=0.01 acc=0.65
nm=yag eb_subbkg_alg_denoise-max-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.27 th=0.20 acc=0.64 th=0.01 acc=0.84
nm=vcc eb_subbkg_alg_denoise-max-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.26 th=0.20 acc=0.48 th=0.01 acc=0.63
nm=yag eb_varthresh_alg_none_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.41 th=0.20 acc=0.79 th=0.01 acc=0.86
nm=vcc eb_varthresh_alg_none_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.10 th=0.20 acc=0.38 th=0.01 acc=0.63
nm=yag eb_varthresh_alg_denoise-log_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.67 th=0.20 acc=0.87 th=0.01 acc=0.90
nm=vcc eb_varthresh_alg_denoise-log_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.37 th=0.20 acc=0.58 th=0.01 acc=0.71
nm=yag eb_varthresh_alg_denoise-max-log_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.45 th=0.20 acc=0.77 th=0.01 acc=0.88
nm=vcc eb_varthresh_alg_denoise-max-log_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.40 th=0.20 acc=0.59 th=0.01 acc=0.76
nm=yag eb_varthresh_alg_none_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.30 th=0.20 acc=0.67 th=0.01 acc=0.82
nm=vcc eb_varthresh_alg_none_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.10 th=0.20 acc=0.33 th=0.01 acc=0.60
nm=yag eb_varthresh_alg_denoise-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.47 th=0.20 acc=0.75 th=0.01 acc=0.89
nm=vcc eb_varthresh_alg_denoise-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.24 th=0.20 acc=0.42 th=0.01 acc=0.57
nm=yag eb_varthresh_alg_denoise-max-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.34 th=0.20 acc=0.62 th=0.01 acc=0.81
nm=vcc eb_varthresh_alg_denoise-max-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.28 th=0.20 acc=0.39 th=0.01 acc=0.48
nm=yag eb_varthresh_subbkg_alg_none_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.35 th=0.20 acc=0.72 th=0.01 acc=0.86
nm=vcc eb_varthresh_subbkg_alg_none_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.07 th=0.20 acc=0.25 th=0.01 acc=0.53
nm=yag eb_varthresh_subbkg_alg_denoise-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.54 th=0.20 acc=0.79 th=0.01 acc=0.92
nm=vcc eb_varthresh_subbkg_alg_denoise-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.22 th=0.20 acc=0.44 th=0.01 acc=0.64
nm=yag eb_varthresh_subbkg_alg_denoise-max-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.27 th=0.20 acc=0.65 th=0.01 acc=0.84
nm=vcc eb_varthresh_subbkg_alg_denoise-max-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.26 th=0.20 acc=0.47 th=0.01 acc=0.62
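To pull the best run per detector out of lines like the ones above, something like this sketch would do. The regular expression is written against the line format shown in the table; the function name `best_runs` is made up, and only three of the lines are repeated here as sample input.

```python
import re

LINE_RE = re.compile(
    r"nm=(?P<nm>\w+) (?P<run>\S+)-regress\.h5 inter/union accuracies:\s+"
    r"th=0\.50 acc=(?P<a50>[\d.]+) th=0\.20 acc=(?P<a20>[\d.]+) "
    r"th=0\.01 acc=(?P<a01>[\d.]+)"
)

def best_runs(lines, key="a01"):
    """Return {detector: (accuracy, [run names])} for the highest accuracy."""
    best = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip anything that is not a result line
        acc, nm, run = float(m[key]), m["nm"], m["run"]
        top, runs = best.get(nm, (-1.0, []))
        if acc > top:
            best[nm] = (acc, [run])
        elif acc == top:
            runs.append(run)  # keep ties, e.g. with/without varthresh
    return best

lines = [
    "nm=yag eb_alg_none_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.41 th=0.20 acc=0.78 th=0.01 acc=0.86",
    "nm=vcc eb_alg_denoise-max-log_f1_f2-regress.h5 inter/union accuracies:  th=0.50 acc=0.41 th=0.20 acc=0.60 th=0.01 acc=0.76",
    "nm=yag eb_subbkg_alg_denoise-log_f1_f2_f4-regress.h5 inter/union accuracies:  th=0.50 acc=0.56 th=0.20 acc=0.81 th=0.01 acc=0.92",
]
best = best_runs(lines)
print(best)
```

Passing key="a50" or key="a20" ranks the runs by the stricter thresholds instead.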

Best Result - YAG

The best 1% overlap accuracy for the YAG is 92%.

It is over files 1,2,4, uses denoise-log, and subtracts the background.
The result is the same with and without the variance feature selection.
Not subtracting the background reduced the accuracy to 89%.
Not using file 4 reduced accuracy to 90%.
Using the denoise-max-log led to 84% accuracy (worse than no preprocessing = 86%).

Best Result - VCC

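The best 1% overlap accuracy for the VCC is 76%.

It is over files 1,2 and uses denoise-max-log (the table above gives 72% for denoise-log on the same files).
Adding file 4 with background subtraction reduced the accuracy to 63%.
Adding file 4 without background subtraction reduced the accuracy to 48%.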
