DAQ Troubleshooting - first level

Recovering from errors: restarting the DAQ

To (re)start the DAQ system, click the icon labeled "Restart DAQ" on the DAQ console.

In XPP or XCS, use the "restartdaq <-w>" command instead. If you only want to stop the DAQ, call "stopdaq".

Troubleshooting the DAQ

Ami does not work: the DAQ is fine after a restart, but I see no updates in ami

Make sure that all dss nodes are selected. If you need to take a node out because of problems, edit the <hutch>.cnf file.

The DAQ shows an error message, requesting a restart, indicating a given IP as culprit

XPP/XCS: use "serverStat <ip>" to check if both interfaces of the node in question are up. This script will also tell you which machine has the issue.

One or both of the pings fail: use "serverStat <ip/node name> cycle" to power cycle the machine. After the script returns, keep running "serverStat <ip/node name>" until both pings work. Once you can ssh into the node, you can restart the DAQ.
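The "run it again until both pings work" step can be sketched as a small polling loop. This is only a sketch: the function name wait_until_up and its parameters are hypothetical, and the check callable is left to the reader, because the output format of serverStat is not described on this page.

```python
import time


def wait_until_up(check, interval_s=10.0, timeout_s=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or timeout_s elapses.

    check is a hypothetical callable that returns True once the node is
    healthy (for example, once both pings succeed).
    Returns True if check() succeeded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        # Wait before asking again; a power cycle takes a few minutes.
        sleep(interval_s)
```

A real check would have to run "serverStat <ip/node name>" and decide from its result whether both interfaces answered. The sleep and clock parameters exist only so that the loop can be exercised without waiting.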

Otherwise: decide if you'd rather restart the DAQ and hope for the best. Power cycling a machine takes a few minutes.

The IP is a dss node: here you have an additional option. You can edit the <hutch>.cnf file to take the node out: look for "dss_nodes = [....]" and remove the problematic node from the list.
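For illustration, the edit to the dss_nodes line could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the entry is assumed to sit on a single line, to be a Python-style list of quoted host names, and to have no trailing comment. The function name remove_dss_node and the node names in the usage example are hypothetical.

```python
import ast
import re

# Assumed layout: a single-line "dss_nodes = [...]" assignment.
DSS_LINE = re.compile(r"^([ \t]*dss_nodes[ \t]*=[ \t]*)(\[.*\])[ \t]*$",
                      re.MULTILINE)


def remove_dss_node(cnf_text, node):
    """Return cnf_text with node removed from the dss_nodes list."""
    m = DSS_LINE.search(cnf_text)
    if m is None:
        raise ValueError("no single-line dss_nodes = [...] entry found")
    nodes = ast.literal_eval(m.group(2))  # parse the list literal safely
    if node not in nodes:
        raise ValueError(f"{node!r} is not in dss_nodes")
    remaining = [n for n in nodes if n != node]
    new_line = m.group(1) + repr(remaining)
    # Replace only the matched line; everything else stays untouched.
    return cnf_text[:m.start()] + new_line + cnf_text[m.end():]


# Usage with made-up node names:
example = "dss_nodes = ['dss01', 'dss02', 'dss03']\n"
print(remove_dss_node(example, "dss02"))
```

Keep a copy of the original file before changing it by hand or by script, so the node can be put back once it is healthy again.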

(XPP/XCS specific): If this is the first node in the list, notify the PCDS-POC, as this node runs a special process that will NOT stop when the DAQ stops. Depending on the data rate, you can run with 2 or 3 nodes (cspad + other detectors: 3 nodes; two EPIX: 2 nodes). Because all the data goes into a single ami session and the best mapping allows at most one ami node per dss node, you have less ami power if you have fewer dss nodes.

One of my DAQ devices has a problem (damage, ...):

Use "serverStat <DAQ device alias>" to check on the health of the node. Most likely it is prudent to power cycle this node. If this does not help, power cycle the detector/camera/device itself as well. If the problematic detector is a big CsPad, note that after you turn the detector off you also need to power cycle the concentrator, as it will otherwise not see all the quads when you power up again!

My ipimb has an issue:

Troubleshooting ipimbs is described on this page: IPIMB Troubleshooting for Controls IPIMBs

My data does not seem to be moving well:

Check the status of the data movers here: Data Mover Monitoring. This page can also be reached from the main "pswww" page. If the problem is limited to a single node, take that node out. If it is more widespread, it might warrant a call.

The dss nodes in use are listed in the dss_nodes line of the <hutch>.cnf file. For XPP, if you have to take out the first dss node in the list, you need to kill the source process running on that node; one way to do that is to use serverStat to reboot the node. This process remains after the DAQ has been stopped, and if you start a second one, weird things will happen.
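Before taking a node out, it can help to check whether it is the first entry in the list, since that case needs the extra cleanup. A minimal sketch, assuming a single-line, Python-style "dss_nodes = [...]" entry with quoted host names; the function name and the node names in the usage example are hypothetical.

```python
import ast
import re

# Assumed layout: a single-line "dss_nodes = [...]" assignment.
DSS_LIST = re.compile(r"^[ \t]*dss_nodes[ \t]*=[ \t]*(\[.*\])[ \t]*$",
                      re.MULTILINE)


def is_first_dss_node(cnf_text, node):
    """Return True if node is the first entry of the dss_nodes list."""
    m = DSS_LIST.search(cnf_text)
    if m is None:
        raise ValueError("no single-line dss_nodes = [...] entry found")
    nodes = ast.literal_eval(m.group(1))  # parse the list literal safely
    return bool(nodes) and nodes[0] == node


# Usage with made-up node names:
example = "dss_nodes = ['dss01', 'dss02']\n"
if is_first_dss_node(example, "dss01"):
    print("first dss node: the leftover source process must be killed too")
```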

Technical note:

"serverStat" currently lives in /reg/g/xpp/scripts, but should work in all hutches. A better place for common scripts will be created and populated soon.

Expert Troubleshooting (limited permissions)