Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Larry thinks that these are in the raw units read out from the device (mW) and says that to convert to dBm use the following formula: 10*log(10)(val/1mW).  For example, 0.6 corresponds to -2.2dBm.

Code Block
(ps-4.1.2) tmo-daq:scripts> pvget DAQ:NEH:XPM:0:SFPSTATUS
DAQ:NEH:XPM:0:SFPSTATUS 2021-01-13 14:36:15.450    
LossOfSignal ModuleAbsent TxPower RxPower
           0            0  6.5535  6.5535
           1            0  0.5701  0.0001
           0            0  0.5883  0.7572
           0            0  0.5746  0.5679
           0            0  0.8134   0.738
           0            0  0.6844    0.88
           0            0  0.5942  0.4925
           0            0  0.5218  0.7779
           1            0   0.608  0.0001
           0            0  0.5419  0.3033
           1            0  0.6652  0.0001
           0            0  0.5177  0.8751
           1            1       0       0
           0            0  0.7723   0.201

Incorrect Fiducial Rates

In Jan. 2023 Matt saw a failure mode where xpmpva showed 2kHz fiducial rate instead of the expected 930kHz.  This was traced to an upstream accelerator timing distribution module being uninitialized.

No RxRcv/RxErr Frames in xpmpva

If RxRcv/RxErr frames are stuck in xpmpva it may be that the network interface to the ATCA crate is not set up for jumbo frames.

Link Lock Issues

If XPM links don't lock, here are some past causes:

  • check that transceivers (especially QSFP, which can be difficult) are fully plugged in.
  • for opal detectors:
    • use devGui to toggle between xpmmini/LCLS2 timing (Matt has added this to the opal config script, but to the part that executes at startup time)
    • hit TxPhyReset in the devGui (this is now done in the opal drp executable)
  • for timing system detectors: run "kcuSim -s -d /dev/datadev_1", this should also be done when one runs a drp process on the drp node (to initialize the timing registers).  the drp executable in this case doesn't need any transitions.
  • hit Tx/Rx reset on xpmpva gui (AMC tabs).
  • use loopback fibers (or set a loopback in xpmpva) to determine which side has the problem
  • try swapping fibers in the BOS to see if the problem is on the xpm side or the kcu side
  • we saw once where we have to power cycle a camlink drp node to make the xpm timing link lock.  Matt suggests that perhaps hitting PLL resets in the rogue gui could be a more delicate way of doing this
  • (old information with the old/broken BOS) Valerio and Matt had noticed that the BOS sometimes lets its connections deteriorate. To fix:
    • ssh root@osw-daq-calients320
    • omm-ctrl --reset

Timing Frames Not Properly Received

  • do TXreset on appropriate port
  • toggling between xpmmini and lcls2 timing can fix (we have put this in the code now, previously was lcls1-to-lcls2 timing toggle in the code)
  • sometimes xpm's have become confused and think they are receiving 26MHz timing frames when they should be 0.9MHz (this can be seen in the upstream-timing tab of xpmpva ("UsTiming").  you can determine which xpm is responsible by putting each link in loopback mode: if it is working properly you should see 0.9MHz of rx frames in loopback mode (normally 20MHz of frames in normal mode).  Proceed upstream until you find a working xpm, then do tx resets (and rx?) downstream to fix them, 

Network Connection Difficulty

Saw this error on Nov. 2 2021 in lab3 over and over:

Code Block
WARNING:pyrogue.Device.UdpRssiPack.rudpReg:host=10.0.2.102, port=8193 -> Establishing link ...

Matt writes:

That error could mean that some other pyxpm process is connected to it.  Using ping should show if the device is really off the network, which seems to be the case.  You can also use "amcc_dump_bsi --all shm-tst-lab2-atca02" to see the status of the ATCA boards from the shelf manager's view.  (source /afs/slac/g/reseng/IPMC/env.sh[csh] or source /cds/sw/package/IPMC/env.sh[csh])  It looks like the boards in slots 2 and 4 had lost ethernet connectivity (with the ATCA switch) but should be good now.  None of the boards respond to ping, so I'm guessing its the ATCA switch that's failed.  The power on that board can also be cycled with "fru_deactivate, fru_activate".  I did that, and now they all respond to ping.

Switching Between Internal/External Timing

Do this if accelerator timing is gone.  Can see the timestamp is "wonky" to quote Dr. Matthew Weaver.

Only have to do this for the master xpm that is receiving the accelerator timing.  Run "pyxpm --ip 10.0.1.102 -I -P DAQ:NEH:XPM:0" (i.e. remove the database flags).  The "-I" flag does some initialization, in particular the CuInput flag (DAQ:NEH:XPM:0:XTPG:CuInput) is set to 1 (for internal timing) instead of 0 (external timing with first RTM SFP input, presumably labelled "EVR[0]" on the RTM, but we are not certain) or 3 (second RTM SFP timing input labelled "EVR[1]" on the RTM).

Matt says there are three types of XPM firmware: (1) and XTPG version which requires an RTM input (2) a standard XPM version which requires RTM input (3) a version which gets its timing input from AMC0 port 0.


This file puts xpm-0 in internal timing mode: https://github.com/slac-lcls/lcls2/blob/master/psdaq/psdaq/cnf/internal-neh-base.cnf.  Note that in internal timing mode the L0Delay (per-readout-group) seems to default to 90.  Fix it with pvput DAQ:NEH:XPM:0:PART:0:L0Delay 80".

One should switch back to external mode by setting CuInput to 0 in xpmpva CuTiming tab.  Still want to switch to external-timing cnf file after this is done.  Check that the FiducialErr box is not checked (try ClearErr to see if it fixes).  If this doesn't clear it can be a sign that ACR has put it "wrong divisor" on their end.

...