This page describes the failure of the HE channels of a single GCFE.  There are 3072 GCFEs in CAL, two for each of the 1536 xtals.

The problem occurred in the run starting at MET = 301753824, which is MJD 55402.52108796, or 2010 Jul 25 at 12:30:22 UTC, or 2010 day 206 at 12:30:22 UTC.
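The equivalence of these time stamps can be checked with a short script. This is a minimal sketch, not mission software: it assumes the usual Fermi convention that MET counts seconds from 2001-01-01 00:00:00 UTC, and it hard-codes only the two leap seconds inserted between that epoch and mid-2010. The function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: MET counts elapsed seconds from 2001-01-01 00:00:00 UTC.
MET_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)

# UTC instants immediately after the leap seconds at the end of 2005 and 2008.
# Later leap seconds are not listed, so this sketch is only valid for dates
# before 2012-07-01, and it is approximate within a second of each boundary.
LEAP_SECOND_BOUNDARIES = [
    datetime(2006, 1, 1, tzinfo=timezone.utc),
    datetime(2009, 1, 1, tzinfo=timezone.utc),
]


def met_to_utc(met_seconds):
    """Convert MET seconds to a UTC datetime, removing elapsed leap seconds."""
    uncorrected = MET_EPOCH + timedelta(seconds=met_seconds)
    n_leap = sum(1 for boundary in LEAP_SECOND_BOUNDARIES if uncorrected >= boundary)
    return uncorrected - timedelta(seconds=n_leap)


def utc_to_mjd(utc):
    """Convert a UTC datetime to a Modified Julian Date."""
    return (utc - MJD_EPOCH).total_seconds() / 86400.0


run_start = met_to_utc(301753824)
print(run_start.isoformat())              # 2010-07-25T12:30:22+00:00
print(run_start.timetuple().tm_yday)      # 206
print(round(utc_to_mjd(run_start), 8))
```

Without the two leap seconds the conversion would land on 12:30:24 rather than 12:30:22, which is why they are handled explicitly.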

Symptoms of the failure

On 27 Jul 2010 (Day 208), at 1:55 PM EDT, Anders Borgland wrote:

Starting with run 301753824 we have two problems:

1/
We do not see any signal in the high energy diode in tower 4, layer X1, column 4, + side. From Eric Siskind: "Whether the failure is within the diode itself or in the GCFE electronics chain (HE preamp or slow shaper) is currently unknown". You can read the whole thread here:

https://www-glast.stanford.edu/protected/mail/datamon/4835.html

2/ While problem 1/ affects all (high energy) events in that channel, GCR events (4-range and zero-suppressed events) tickle FSW bug 1156:

https://jira.slac.stanford.edu/browse/FSW-1156

This means that about 10 events per run will fail in the decompression. Because of the way the Halfpipe works we lose the complete datagram for each of these events. Since a datagram contains about 110 events we are currently losing about 1100 events per run. This corresponds to about 2.5 seconds of data for each 90 minute run.

The FSW group have a fix for bug 1156 and will upload a new build asap.

Note that currently there is no failure mode in CalRecon, so events from this channel are not treated in any special way. NRL is working on this.

It should also be noted that the problem was caught immediately by two separate parts of the Data monitoring. The automatic alarms caught both the missing datagrams and the missing signal from the diode. These runs are marked as 'GOOD' by the DQM shifter, but with a comment attached to them. Obviously we will have to live with the missing diode signal from now on.

Some of you will not have failed to notice the irony that it's GCR events tickling FSW bug 1156 (hint: SSC-258) (smile)

anders
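The loss estimate in the message above can be reproduced with a few lines of arithmetic. This is only a sketch using the approximate numbers quoted there. It assumes each failed event falls in a different datagram, and the event rate it prints is merely implied by those numbers, not a figure stated in the message.

```python
# Approximate figures quoted in the message above.
failed_events_per_run = 10       # events that fail decompression per run
events_per_datagram = 110        # events lost with each dropped datagram
lost_seconds_per_run = 2.5       # quoted equivalent data loss per run
run_length_s = 90 * 60           # one 90-minute run

# Assumption: every failed event sits in its own datagram.
lost_events = failed_events_per_run * events_per_datagram

# Derived quantities (not stated in the message).
implied_rate_hz = lost_events / lost_seconds_per_run
lost_fraction = lost_seconds_per_run / run_length_s

print(lost_events)                 # 1100
print(implied_rate_hz)             # 440.0
print(f"{lost_fraction:.2e}")
```

On these numbers the loss is a few parts in ten thousand of each run.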

What has failed?

As soon as we understand exactly what has failed in this GCFE, I'll type something here.

Consequences of this failure

On 27 Jul 2010 (Day 208), at 2:15 PM EDT, J. Eric Grove wrote:

Additional clarification:

We do not see any signal in the high energy diode in tower 4, layer X1, column 4, + side. From Eric Siskind: "Whether [...]"
Note that currently there is no failure mode in CalRecon, so events from this channel are not treated in any special way. NRL is working on this.

What this means is that (until xtal recon is fixed):

  1. Any photon that MISSES this one crystal is not affected. It is correctly and properly reconstructed.
  2. Any photon that deposits LESS THAN about 1 GeV in this one crystal is not affected. It is correctly and properly reconstructed.
  3. Any photon that deposits MORE THAN about 1 GeV in this one crystal has an incorrect energy and position measurement in this one crystal, and therefore has an incorrect reconstructed incident energy and direction. The level of error in reconstructed incident energy and direction is surely energy-dependent, and I don't have an estimate of the magnitude yet.

In the above sentences, "photon" means "any event that is not read out by Trigger Engine 4, i.e. any event that is not read out in 4-range, zero-suppressed mode". I used the word photon to focus the discussion.

The combination of cases (1) and (2) covers the overwhelming majority of photons in the LAT dataset, so most events are perfectly fine, but clearly we need to implement a fix for this particular failure in the code that reconstructs crystal energy and position.

Eric

Once CAL xtal recon has been modified with the changes outlined below, the performance of this xtal will be essentially nominal, and the performance of the LAT will be unaffected.

Changes necessary to recon

As of July 2010, CAL recon code contained no mitigation against failures, despite our having discussed adding such code for the last 10+ years.  This failure requires that we do something.  The intent here is to fix the reconstruction of events that hit this xtal at the level of xtal recon so that the failure and fix are transparent to downstream energy reconstruction and clustering.  

We're adding status bits to xtal recon to indicate that the energy and position information returned by a xtal with compromised readout has been modified.  Any reconstruction code that follows xtal recon could make use of those bits to accept/reject that xtal or modify higher level recon algorithms, if the authors and architects of that code wish.

On 4 Aug 2010 (Day 216), at 4:59 PM EDT, J. Eric Grove wrote:

...

Bit     Meaning
0       bad energy
1       bad longitudinal position
2       energy has been calculated by failure mitigation algorithm
3       energy has been calculated for corrected longitudinal position
4       position has been provided by external means
5       longitudinal position has been corrected for direct light
6       longitudinal position has been corrected for ambiguous ratio
7       unused

status of h/w
8       bad minus-face LEX8
9       bad minus-face LEX1
10      bad minus-face HEX8
11      bad minus-face HEX1
12      bad plus-face LEX8
13      bad plus-face LEX1
14      bad plus-face HEX8
15      bad plus-face HEX1
16-23   unused

...

24      minus-face LE autoranging disabled
25      minus-face HE autoranging disabled
26      plus-face LE autoranging disabled
27      plus-face HE autoranging disabled
28-31   unused
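For readers who want to experiment with the status word, the bit assignments listed above can be written as a flag enumeration. This is an illustrative sketch only: the class and member names are invented here and are not the identifiers used in CalXtalRecData, and parts of the original message are elided, so the list may be incomplete.

```python
from enum import IntFlag


class XtalStatus(IntFlag):
    """Hypothetical names for the status bits listed above."""
    BAD_ENERGY = 1 << 0
    BAD_LONG_POSITION = 1 << 1
    ENERGY_FROM_MITIGATION = 1 << 2
    ENERGY_FOR_CORRECTED_POSITION = 1 << 3
    POSITION_EXTERNAL = 1 << 4
    POSITION_CORRECTED_DIRECT_LIGHT = 1 << 5
    POSITION_CORRECTED_AMBIGUOUS_RATIO = 1 << 6
    # bit 7 unused
    BAD_MINUS_LEX8 = 1 << 8
    BAD_MINUS_LEX1 = 1 << 9
    BAD_MINUS_HEX8 = 1 << 10
    BAD_MINUS_HEX1 = 1 << 11
    BAD_PLUS_LEX8 = 1 << 12
    BAD_PLUS_LEX1 = 1 << 13
    BAD_PLUS_HEX8 = 1 << 14
    BAD_PLUS_HEX1 = 1 << 15
    # bits 16-23 unused
    MINUS_LE_AUTORANGE_DISABLED = 1 << 24
    MINUS_HE_AUTORANGE_DISABLED = 1 << 25
    PLUS_LE_AUTORANGE_DISABLED = 1 << 26
    PLUS_HE_AUTORANGE_DISABLED = 1 << 27
    # bits 28-31 unused


# Hardware status for the failure described on this page: both plus-face
# HE ranges are bad.
failed_he = XtalStatus.BAD_PLUS_HEX8 | XtalStatus.BAD_PLUS_HEX1
print(hex(int(failed_he)))                       # 0xc000
print(XtalStatus.BAD_PLUS_HEX8 in failed_he)     # True
```

Downstream code could test individual bits in this way to accept or reject a xtal, as described above.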

Algorithms for correcting current and not-unlikely future failures in FixXtalResp

On 5 Aug 2010 (Day 217), at 5:24 PM EDT, J. Eric Grove wrote:

Chul,
Here I've detailed the actions for FixXtalResp in the cases we currently understand. I've added a new CalXtalRecData status word definition [at the bottom of this email] adding Bill's corrected-energy bit and a bit to indicate that we've mitigated the failure.
Eric

case == bad plus-face HEX8 && bad plus-face HEX1 && ! plus-face HE autoranging disabled
/* this is the case for the current failure and current configuration */
/* if one of the HEX ranges is best range, fix it */
/* if plus-face first range == (HEX8 || HEX1),
	calculate E using the opposite face and the externally provided longitudinal position
	clear the bad energy bit
	set the externally provided longitudinal position bit
	(maybe set the failure-mitigation energy bit)
*/


case == bad plus-face HEX8 && bad plus-face HEX1 && plus-face HE autoranging disabled
/* this is the case for the current failure, but it requires a configuration we have not used */
/* if LEX1 is best range and saturated, fix it */
/* if plus-face first range == LEX1 && plus-face > 4050 (i.e. saturated),
	calculate E using the opposite face and the externally provided longitudinal position
	clear the bad energy bit
	set the externally provided longitudinal position bit
	(maybe set the failure-mitigation energy bit)
*/


case == bad plus-face LEX8 && bad plus-face LEX1 && bad plus-face HEX8 && bad plus-face HEX1
/* this is the case we may get to soon if this GCFE continues to degrade and the LE ranges fail */
/* in all cases,
	calculate E using the opposite face and the externally provided longitudinal position
	clear the bad energy bit
	set the externally provided longitudinal position bit
	(maybe set the failure-mitigation energy bit)
*/
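The three cases above can be sketched as a small decision function. This is a hedged illustration, not the FixXtalResp implementation. All names are hypothetical, the energy calculation is left to a caller-supplied function because the message does not spell it out, the all-ranges-bad case is checked first because the cases as written overlap, and the failure-mitigation energy bit is set even though the message only says "maybe".

```python
# Bit positions taken from the status word table; names are hypothetical.
BAD_ENERGY = 1 << 0
ENERGY_FROM_MITIGATION = 1 << 2
POSITION_EXTERNAL = 1 << 4
BAD_PLUS_LE = (1 << 12) | (1 << 13)   # bad plus-face LEX8 and LEX1
BAD_PLUS_HE = (1 << 14) | (1 << 15)   # bad plus-face HEX8 and HEX1
PLUS_HE_AUTORANGE_DISABLED = 1 << 27

LEX1_SATURATION_ADC = 4050            # "plus-face > 4050 (i.e. saturated)"


def needs_mitigation(status, plus_first_range, plus_adc):
    """Return True if the plus-face readout must be replaced for this event."""
    he_bad = (status & BAD_PLUS_HE) == BAD_PLUS_HE
    le_bad = (status & BAD_PLUS_LE) == BAD_PLUS_LE
    if not he_bad:
        return False
    if le_bad:
        # All four plus-face ranges bad: fix in all cases.
        return True
    if status & PLUS_HE_AUTORANGE_DISABLED:
        # HE autoranging disabled: fix only a saturated LEX1 best range.
        return plus_first_range == "LEX1" and plus_adc > LEX1_SATURATION_ADC
    # Current failure and configuration: fix if an HE range is best range.
    return plus_first_range in ("HEX8", "HEX1")


def fix_xtal_resp(status, plus_first_range, plus_adc, energy_from_minus_face):
    """Return (new_status, energy); energy is None when nothing is changed.

    energy_from_minus_face is a hypothetical callable that computes E from
    the opposite face and the externally provided longitudinal position.
    """
    if not needs_mitigation(status, plus_first_range, plus_adc):
        return status, None
    energy = energy_from_minus_face()
    new_status = (status & ~BAD_ENERGY) | POSITION_EXTERNAL | ENERGY_FROM_MITIGATION
    return new_status, energy


status = BAD_ENERGY | BAD_PLUS_HE
new_status, energy = fix_xtal_resp(status, "HEX8", 1000, lambda: 2.5)
print(bool(new_status & BAD_ENERGY), energy)     # False 2.5
```

Events whose best plus-face range is still healthy pass through unchanged, so the fix stays transparent to downstream reconstruction.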

Fix for Failed Electronics on Cal Crystal

Slides for fixing Cal Response for T4L2C4