Island Separation Algorithm Details

(descriptions courtesy of Alvaro Sanchez-Gonzalez)

scipyLabel

The image is taken right after filtering, and an island separation algorithm is used. This algorithm relies on having a strong filter beforehand: the parameter 'snrfilter' must be set to a rather high value so that the bunches come out separated. This is a bad approach, since 'snrfilter' is designed to get rid of background noise and should be set to values near 3 or so; it essentially filters based on the standard deviation of the noise, removing anything below 'snrfilter' times that standard deviation.
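For reference, this is roughly what that kind of labeling amounts to: a minimal sketch using scipy.ndimage.label on the already-filtered image (the function and the toy image below are illustrative, not the actual XTCAV code):

```python
import numpy as np
from scipy import ndimage

def scipy_label_islands(filtered_image):
    """Label connected regions of non-zero signal in an already-filtered image.

    Assumes the SNR filter has already zeroed everything below
    snrfilter * sigma_noise, so the bunches only come out as separate
    islands if that filter was strong enough.
    """
    mask = filtered_image > 0                # boolean image: any surviving signal
    labels, n_islands = ndimage.label(mask)  # connected-component labeling
    return labels, n_islands

# Toy example: two blobs that survived the filter as separate islands
image = np.zeros((8, 8))
image[1:3, 1:3] = 5.0
image[5:7, 5:7] = 3.0
labels, n = scipy_label_islands(image)
print(n)  # 2
```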

autothreshold

This algorithm attempts to solve the previous problem by applying an extra layer of filtering before the island separation, with the sole purpose of separating the islands. The pseudocode goes as follows (a sketch of the search loop is given after the list):

  1. A range of thresholds (th_min, th_max) for the extra filter is initialized, with th_min equal to the noise threshold that was already applied and th_max equal to the largest value in the image.
  2. The threshold is set to the middle of the range, th_try=(th_min+th_max)/2, and applied to a copy of the image.
  3. An algorithm is applied to count the number of reasonably sized islands, by comparing the volume of the largest islands to that of the following ones, so that islands made of noise are filtered out.
  4. Based on the number of reasonably sized islands:
    1. If there are too many reasonably sized islands, we decrease the upper limit, because we clearly thresholded too much: th_max=th_try
    2. If we have the right number of reasonably sized islands, we also decrease the upper limit, because even though it worked, we might get away with thresholding a bit less: th_max=th_try
    3. If there are fewer reasonably sized islands than we expect, then we have to threshold more to separate them, so: th_min=th_try
  5. We repeat steps 2-4 24 times.
  6. We repeat steps 2-3 a 25th time, but using th_try=th_max, to make sure that if we were ever able to separate the image into the right number of islands, this is also achieved in the last iteration, which is the one used for the final separation of the islands.
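A minimal sketch of this bisection search, assuming a count_good_islands helper that stands in for step 3 (both function names are illustrative, not the actual XTCAV code):

```python
import numpy as np
from scipy import ndimage

def count_good_islands(image, volume_ratio=0.1):
    """Stand-in for step 3: label the islands and keep only those whose
    integrated signal (volume) is comparable to the largest island's."""
    labels, n = ndimage.label(image > 0)
    if n == 0:
        return 0
    volumes = ndimage.sum(image, labels, index=np.arange(1, n + 1))
    return int(np.sum(volumes >= volume_ratio * volumes.max()))

def autothreshold_split(image, n_expected, noise_threshold, n_iter=24):
    """Bisect on an extra threshold until n_expected good islands appear."""
    th_min, th_max = noise_threshold, image.max()
    for _ in range(n_iter):
        th_try = 0.5 * (th_min + th_max)
        n_good = count_good_islands(np.where(image > th_try, image, 0.0))
        if n_good >= n_expected:
            th_max = th_try   # worked (or over-split): try thresholding less
        else:
            th_min = th_try   # too few islands: threshold more
    # final pass with th_try = th_max, the value known to give enough islands
    final = np.where(image > th_max, image, 0.0)
    labels, n = ndimage.label(final > 0)
    return labels, n, th_max
```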

Since we are always dividing our interval by 2, this gives us an accuracy of 1/2**24 ~= 1/1.7e7 on the value of the threshold, for an image normalized to one. Only if this is not accurate enough will it return the wrong number of islands and discard the shot. In practice this never happens, I guess because the XTCAV images are stored with 2**16 precision.

As you can see, this algorithm solves the problem of finding the threshold that separates the signal precisely, and in logarithmic time.

contourLabel

This algorithm (as far as I understand it; I am not sure I understand all of it) essentially starts the same way as the previous one:

  1. Iteratively find a threshold that gives the right number of islands.
  2. Separate the islands.
  3. Based on the initial island separation, use gradient contours to assign the points that were thresholded away in steps 1-2 to one island or the other (a simplified stand-in for this step is sketched below).
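I have not looked at how the gradient-contour step is implemented, but the idea of step 3 can be illustrated with a much simpler stand-in that assigns each thresholded-away pixel to the nearest island (nearest-neighbour assignment instead of contour following):

```python
import numpy as np
from scipy import ndimage

def reassign_removed_pixels(image, island_labels):
    """Give every pixel with signal a label, even if it fell below the
    separation threshold, by copying the label of the nearest island pixel.
    (The real contourLabel step follows gradient contours instead.)"""
    # distance transform of the unlabeled region, returning for every pixel
    # the indices of the nearest labeled pixel
    _, (iy, ix) = ndimage.distance_transform_edt(island_labels == 0,
                                                 return_indices=True)
    full_labels = island_labels[iy, ix]
    full_labels[image <= 0] = 0  # keep true background unlabeled
    return full_labels
```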

So in principle it could give a result as good as autothreshold, while also including the gradient contours. However, there are a couple of things that can make it slow or unreliable:

  • The iterations to find the optimal threshold are just sequential, starting from a hardcoded value of th_try and performing th_try = th_try + 0.000002 on each iteration. This means it can take up to 1/0.000002 = 500,000 iterations to find the right threshold within the dynamic range of the image (0 to 1).
  • Instead of comparing the size of the islands using the volume, it uses the area. This is of course faster, but it may not be reliable when the different bunches can be lasing or non-lasing on different shots (the lasing process makes the area of the island larger but keeps the volume constant), and it also gives the same importance to pixels with a large signal value as to pixels that were just above the threshold and barely made it through the filter. A sketch contrasting the two size measures follows this list.
  • It is not implemented to work for more than two islands, although I do not think they are running a triple-bunch mode yet, so this is not really a problem.
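To make the area-versus-volume distinction concrete, the two size measures on a labeled image would look roughly like this (illustrative only):

```python
import numpy as np
from scipy import ndimage

def island_sizes(image, labels, n_islands):
    """Two ways of measuring island size on a labeled image."""
    index = np.arange(1, n_islands + 1)
    areas = ndimage.sum(labels > 0, labels, index)  # pixel count per island
    volumes = ndimage.sum(image, labels, index)     # integrated signal per island
    return areas, volumes
```

Lasing spreads the island out (larger area) while leaving the integrated signal roughly unchanged (same volume), which is why the volume is the more robust comparison.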

So right now the algorithm I would recommend is 'autothreshold', as it will pretty much always work by eating exactly the amount of signal it needs to eat to get the bunches separated. However, 'contourLabel' could have an advantage if the issues above were solved. That said, when the islands are so close that they cannot be separated, it may already be impossible to do the x-ray reconstruction even if we assigned every pixel to one bunch or the other, because following the contours will not give the signal that invaded the other bunch's territory back to the original bunch anyway.

So if we get to a point where that happens, we would probably have to think harder about how to split the value of each pixel between the two bunches rather than assigning each pixel to a single bunch. I did some tests on this using logistic regression, which gives each pixel a probability of being in bunch one or in bunch two (so this probability could be used to split the signal on each pixel), and managed to separate the islands in a smoother way; however, I do not think this would solve all of our problems either.
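I do not know exactly how those tests were set up; one plausible version, which uses the initial hard island labels as training targets and the pixel coordinates as features, would look roughly like this (sklearn-based sketch, not the actual test code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def soft_split_two_bunches(image, labels):
    """Split each pixel's signal between two bunches using a per-pixel
    probability from logistic regression on the pixel coordinates.

    'labels' is the hard island assignment (1 or 2 for assigned pixels,
    0 for background/unassigned)."""
    ys, xs = np.indices(image.shape)
    coords = np.column_stack([ys.ravel(), xs.ravel()])
    hard = labels.ravel()
    has_signal = image.ravel() > 0

    # train only on pixels that already have a confident hard label
    clf = LogisticRegression()
    clf.fit(coords[hard > 0], hard[hard > 0])

    # probability of belonging to bunch 1 (classes_ is sorted, so column 0)
    p1 = np.zeros(image.size)
    p1[has_signal] = clf.predict_proba(coords[has_signal])[:, 0]

    bunch1 = (image.ravel() * p1).reshape(image.shape)
    bunch2 = image - bunch1
    return bunch1, bunch2
```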

Timing Measurements

(courtesy of Alvaro Sanchez-Gonzalez)

I just timed the three algorithms in the function that is in charge of splitting the bunches. I got the following numbers, in seconds, for splitting a single image (based on the average of doing it 20 times; a sketch of the timing loop is given after the numbers):

  • scipyLabel: 0.057 s
  • autothreshold: 0.31 s (with 25 iterations)
  • autothreshold: 0.17 s (with 16 iterations, which should be enough)
  • contourLabel: 1.18 s
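The timing harness implied here (an average over 20 calls) can be as simple as the following; split_bunches is a placeholder for the actual splitting function:

```python
import time

def time_splitter(split_func, image, n_runs=20):
    """Average wall-clock time of one bunch-splitting call over n_runs."""
    start = time.time()
    for _ in range(n_runs):
        split_func(image)
    return (time.time() - start) / n_runs

# e.g. time_splitter(lambda img: split_bunches(img, method='autothreshold'), image)
```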

These tests have been done applying a snrfilter of 3. This is quite critical, as the time taken by the labeling function and by the island size check depends a lot on the number of islands, and the weaker the SNR filter, the more noise islands remain. Running the same test with snrfilter=5 gives the following times:

  • scipyLabel: 0.015 s
  • autothreshold: 0.24 s (with 25 iterations)
  • autothreshold: 0.17 s (with 16 iterations, which should be enough)
  • contourLabel: 0.54 s

contourLabel is the one that improves the most. This is because most of the iterations to split the islands are run at a low threshold that increases in small steps, so it benefits the most from not having islands made of noise. However, in this case you would be throwing away more of the image information from the start, which is kind of the opposite of what we wanted to achieve by following the contours.

Algorithm Steps

(courtesy of Mihir Mongia)

We believe XTCAVRetrieval.SetCurrentEvent(evt) calls the following three "steps":

Routine ProcessShotStep1:

  • overall goal: subtract dark, denoise, signal ROI, bunch split (a sketch of the whole step is given after this list)
  • subtract dark image
  • run denoising algorithm (median filter)
  • median filter (for smoothing):
    • looks at a pixel and its neighbors (the neighborhood size can be specified in some manner not yet understood)
    • takes the median of that set and sets the center pixel value to that median
  • look at noise region
  • subtract mean of noise from whole image
  • anything above 10 (can be overridden) standard deviations of the noise is kept; anything below is set to zero
  • normalize image so sum=1
  • assumes dark image is larger than the shot image ROI (xmin,xmax,ymin,ymax probably coming from EPICS)
  • looks at max value in normalized image
  • takes all pixels > 0.2*max as "staying" in the ROI; the software "draws a rectangle" around these pixels to keep the ROI rectangular
  • expands the rectangle dimensions by a factor of 2.5 from the center (user-settable) to bring, with high likelihood, all interesting pixels into the ROI
  • calls splitimage (says 'not done' in code?) to handle the bunches
    • this calls IslandSplitting, which calls scipy.ndimage.label on a boolean image, where the threshold for computing the boolean is zero (this is not really Otsu's method, we believe)
  • splitimage returns a 3D array where first dimension is the bunch (i.e. a set of rectangular ROIs)
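Putting those bullet points together, a rough sketch of what step 1 appears to do (the parameter names and the noise_region slice are illustrative, not the actual psana code):

```python
import numpy as np
from scipy import ndimage

def process_shot_step1(image, dark, noise_region, snrfilter=10,
                       roi_expand=2.5, median_size=3):
    """Sketch of step 1: dark subtraction, median filter, noise threshold,
    normalization and rectangular ROI selection."""
    img = image.astype(float) - dark                    # subtract dark image
    img = ndimage.median_filter(img, size=median_size)  # smooth with a median filter

    noise = img[noise_region]                 # region assumed to contain no signal
    img -= noise.mean()                       # remove the residual pedestal
    img[img < snrfilter * noise.std()] = 0.0  # keep only pixels above N sigma
    img /= img.sum()                          # normalize so the image sums to 1

    # rectangular ROI around pixels above 0.2 * max, expanded from its center
    ys, xs = np.nonzero(img > 0.2 * img.max())
    yc, xc = ys.mean(), xs.mean()
    hy = roi_expand * (ys.max() - ys.min()) / 2
    hx = roi_expand * (xs.max() - xs.min()) / 2
    ymin, ymax = int(max(yc - hy, 0)), int(min(yc + hy, img.shape[0] - 1))
    xmin, xmax = int(max(xc - hx, 0)), int(min(xc + hx, img.shape[1] - 1))
    return img[ymin:ymax + 1, xmin:xmax + 1]
```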

ProcessShotStep2

  • overall goal: convert x to time, and y to energy (calibration, probably use EPICS variables)
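As an illustration of that kind of calibration (the scale and offset arguments here are hypothetical stand-ins for whatever EPICS variables are actually used):

```python
import numpy as np

def calibrate_axes(roi_image, fs_per_pixel, mev_per_pixel, energy_offset_mev=0.0):
    """Map column index to time (fs) and row index to energy (MeV)
    using linear calibration constants."""
    n_rows, n_cols = roi_image.shape
    time_fs = np.arange(n_cols) * fs_per_pixel
    energy_mev = np.arange(n_rows) * mev_per_pixel + energy_offset_mev
    return time_fs, energy_mev
```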

ProcessShotStep3

  • overall goal: calculate power profile (power, time arrays)
  • calculate the center-of-mass vector (1 number per time bin) and ERMS (energy RMS). This may be related to the current projection.
  • take the lasing-on image, project it onto the time axis to get the current, and normalize
  • loop through the lasing-off images and do dot-products to find the most similar
  • subtract the lasing-on center-of-mass vector from lasing-off vector to get the power, and similarly for the sigma method (although for some reason the sign of the subtraction is opposite).
  • things not understood: bunch delay
  • calculate the power profile (delta and sigma methods; a schematic version is given after this list)
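A schematic version of those per-time-bin quantities and the two power estimates (this is just my reading of the bullets above, with the calibration constants and normalizations the real code applies left out):

```python
import numpy as np

def com_and_erms(image, energy_axis):
    """Per-time-bin center of mass and RMS of the energy distribution."""
    current = image.sum(axis=0)                       # projection onto the time axis
    current = np.where(current > 0, current, np.nan)  # avoid division by zero
    com = (image * energy_axis[:, None]).sum(axis=0) / current
    erms = np.sqrt((image * (energy_axis[:, None] - com) ** 2).sum(axis=0) / current)
    return com, erms

def power_profiles(com_on, erms_on, com_off, erms_off):
    """Delta (center-of-mass) and sigma (energy-spread) power estimates."""
    power_delta = com_off - com_on              # lasing shifts the centroid
    power_sigma = erms_on ** 2 - erms_off ** 2  # lasing increases the energy spread
    return power_delta, power_sigma
```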

Then Call XRayPower

  • averages the delta/sigma results (not weighted)
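i.e., something like the following unweighted combination (schematic):

```python
# plain average of the delta- and sigma-method power estimates
power = 0.5 * (power_delta + power_sigma)
```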