Hardware

In ATCA parlance, what physicists have been calling a crate is called a shelf. Each shelf has a shelf manager, one or more power supplies, and a fan module. Each shelf has two or more slots, numbered from the bottom and starting with 1. A diagram on the shelf shows the slot numbering. The boards provided by SLAC are the Cluster on Board (COB) and the Rear Transition Module (RTM). In the picture here, a COB is on the left, and an RTM is on the right.

Each COB has five bays for COB Mezzanine Boards (CMBs). In the above picture, the lower-left bay is for the Data Transmission Module (DTM) CMB. The DTM has the primary responsibility of controlling the 10-Gbit switch to its right, as well as controlling networking for all CMBs on the COB. The other four bays are for Data Processing Module (DPM) CMBs. These bays are numbered from 0 to 3 clockwise, starting from the upper left of the COB. Each DPM can read and write data on two multi-gigabit small form-factor pluggable (SFP) transceivers mounted in the RTM. The numbering on the RTM corresponds to the bay number.

The RTM also contains SFP cages for Ethernet networking. The left-most cage (as you face the back of the COB) should contain a 1 Gbit copper CAT6 module. The cage immediately to its right can handle 10 Gbit fiber-optic transceivers. At the time of the BNL workshop (2012/06/25), only the left-most slot in each cage is active.

At the time of the BNL workshop (2012/06/25), the COB networking does not work when the COB is in slot 2.

Hot plugging

Plugging in a COB/RTM pair is fairly straightforward:

  1. Seat the board on the rails in the shelf.
  2. Press the red handle releases so that the handles are open.
  3. Slide the board in until the handles start to engage.
  4. Press firmly on the black portion of the handles until a click is heard.
    • Note: keep your fingers off the red handle releases, or the handles will not engage and the COB will not power on. If this happens, release the handles and re-seat them.
  5. On the COB, observe the blue light go out in a few seconds, followed by the DTM reporting DHCP then COB on the front panel. Shortly after this, the green health LED should show on the front panel. If this light does not appear, the yellow failure LED will show.

Removing a COB/RTM pair is similar:

  1. Press the red handle releases until the handles click and wait until the blue "ready to remove" LED lights.
    • This may not happen on the current RTM, as it may not be implemented.
  2. Lever the handles out until the board is released, and then remove the board.

Hardware in each shelf

  • Ritter shelf @ BNL:
    • COB serial number:
    • DTM serial number:
    • DPM serial number:
  • Darwin shelf @ Penn:
    • COB serial number:
    • DTM serial number:
    • DPM serial number:

Setup

All software from this trip is available in tarball form at http://www.slac.stanford.edu/~panetta/LSST_BNL

  1. BNL_120622.tgz — The release of i386-linux and ppc-rtems-rce405 compatible executables, libraries and include files.
  2. exampleSource.tgz — Example register and image client code (see below)
  3. workspace.tgz — An empty workspace for building code in the environment that created the release (see below)
  4. tools.tgz — Scripts for using ipmitool (see below)
  5. firmware.tgz — Firmware for DPM, DTM and IPMC (for emergencies)
  6. ipmitool-1.8.9-pps-10.tar.gz — PigeonPoint Software's optimizations to ipmitool. Needs to be compiled.
    • Note: ipmitool-pps is not required for using the SLAC COBs. The standard version of ipmitool supplied with Linux will suffice for most purposes, but will be slower for firmware loads.

The following is required for the system to function on Linux.

  • Linux must have 32-bit compatibility turned on if it is x86_64.
  • DHCP must be available for the COB Mezzanine Boards.
  • NFS should be available in order to replace software on the CMBs.
  • Software firewalls such as iptables must be configured so that UDP packets from the CMB are accepted.
    • In the case of Penn's system, this required adding the following rule to iptables:
      iptables -I INPUT 4 -t filter -p udp -s 128.91.41.132 -j ACCEPT
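
If several CMBs need to be let through the firewall, the rule above can be generated per address. The following is only a sketch: the helper name cmb_udp_rule is made up for illustration, and the default rule position of 4 is simply what Penn's system needed, so it will probably differ on other hosts. The function prints the command rather than running it, so it can be reviewed before being applied as root.

```shell
# Print (do not run) an iptables command that accepts UDP from one CMB address.
# cmb_udp_rule is a hypothetical helper, not part of the release.
# The position argument defaults to 4, the value used on Penn's system.
cmb_udp_rule() {
    cmb_ip=$1
    position=${2:-4}
    echo "iptables -I INPUT ${position} -t filter -p udp -s ${cmb_ip} -j ACCEPT"
}

# Example: print the rule for the address quoted above.
cmb_udp_rule 128.91.41.132
```

Check the existing chain with iptables -L INPUT --line-numbers before choosing a position.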

Unpack all tarballs to a directory readable by your group. Create a setup script to be run when using code in this environment with the following contents:

setup.sh
# setup.sh
# BASH setup script for usage of SLAC DAT release

# change the following to where you unpacked the tarballs
export LSST_ROOT=/installdir

# Define the location of the DAT release:
export DAT_RELEASE=${LSST_ROOT}/BNL_120622

# PATH
export PATH=${PATH}:${DAT_RELEASE}/bin/i386-linux-opt
export PATH=${PATH}:${LSST_ROOT}/tools/bin

# LD_LIBRARY_PATH and PYTHONPATH
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${DAT_RELEASE}/lib/i386-linux-opt
export PYTHONPATH=${PYTHONPATH}:${DAT_RELEASE}/lib/i386-linux-opt
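
After sourcing setup.sh, a quick check that the environment took effect can save some confusion later. The function below is a minimal sketch, not part of the release: the name check_dat_env is an assumption, and it only verifies that DAT_RELEASE points at a directory and that the three executables described in the next section can be found in PATH.

```shell
# Sanity-check the environment created by setup.sh.
# check_dat_env is a hypothetical helper; it returns 0 only if every check passes.
check_dat_env() {
    dat_env_status=0
    if [ -z "${DAT_RELEASE:-}" ] || [ ! -d "${DAT_RELEASE}" ]; then
        echo "DAT_RELEASE is unset or is not a directory" >&2
        dat_env_status=1
    fi
    for dat_tool in rri imageClient atca_dump; do
        if ! command -v "$dat_tool" >/dev/null 2>&1; then
            echo "$dat_tool not found in PATH" >&2
            dat_env_status=1
        fi
    done
    return $dat_env_status
}
```

A typical use would be to run check_dat_env right after sourcing setup.sh and to fix LSST_ROOT if it reports a problem.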


Pre-built executables

Several Linux executables are provided as part of the release and will be in your path:

  • rri — A simple client to read/write remote registers.
    • Usage: rri <rtmID> read <regAddress> [<nToRead>]
      rri <rtmID> write <regAddress> <value> [<value> ...]
  • imageClient — Sets up a UDP client for the image transfer protocol
    • Usage: imageClient <rtmID>
  • atca_dump — Queries the networking information from a COB Mezzanine Board
    • Usage: atca_dump <shelf>/<slot>/<CMB>/<element>
      atca_dump ritter/5/2/0

Coding using the blank_workspace environment

Full documentation on the coding environment is at DAT Build and Release system. What follows is a short quickstart.

  1. Set up your environment with the setup.sh script
  2. Untar the workspace.tgz file to create a copy of blank_workspace.
  3. Create a "project" under a copy of blank_workspace:
    • mkdir <prjname>
    • make/prjupdate.py --project <prjname>
  4. Create a "package" under your project:
    • make/pkgcreate.py --project <prjname> --package <pkgname>
  5. Copy make/share/packages.mk.template to <prjname> and rename to packages.mk
  6. Edit this file to add <pkgname> to the package list
  7. Copy make/sw/constituents.mk.template to <prjname>/<pkgname> and rename to constituents.mk
  8. Edit constituents.mk to add your code. Examples are in the file.
  9. make all
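
The quickstart steps can be written out as concrete commands for a given project and package name. The sketch below only prints them and does not execute anything. It assumes that all commands are run from the top of your blank_workspace copy and that the template paths are relative to that directory; the function name workspace_quickstart is made up for illustration. The two hand-editing steps appear as comments.

```shell
# Print the quickstart commands for one project and one package.
# workspace_quickstart is a hypothetical helper; nothing is executed.
# Assumes the current directory is the top of a blank_workspace copy.
workspace_quickstart() {
    prj=$1
    pkg=$2
    cat <<EOF
mkdir ${prj}
make/prjupdate.py --project ${prj}
make/pkgcreate.py --project ${prj} --package ${pkg}
cp make/share/packages.mk.template ${prj}/packages.mk
# edit ${prj}/packages.mk: add ${pkg} to the package list
cp make/sw/constituents.mk.template ${prj}/${pkg}/constituents.mk
# edit ${prj}/${pkg}/constituents.mk: add your code
make all
EOF
}

# Example with placeholder names.
workspace_quickstart myproject mypackage
```

Review the printed commands against the full DAT Build and Release documentation before running them.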

Remote Registers (rriClient.cc)

rriClient.cc illustrates how we in the DAT group construct a simple client for remote register access. The first item of note is the inclusion of datCode.hh on line 5. This include file exists both in the include/ directory of $DAT_RELEASE and in the workspace. It provides several #defines used to rationalize the #include syntax in the face of many different hardware architectures and operating systems. Perusing this file may be instructive.

Farther down we create an instance of lsst::slice::Client, passing the ID of the RCM to the constructor. This class creates and handles the connection to the CMB, which is not exposed to the user. The Client class exposes two methods:

      /** @brief Write a value to a specific register address
          @param regAddr
          @param value
          @note: synchronous function.  Will block until write succeeds or fails
          @throws WriteError
       */
      void write(uint32_t regAddr, uint32_t value);

      /** @brief Read a value from a register address
          @param regAddr
          @return uint32_t value
          @throws ReadError
       */
      uint32_t read(uint32_t regAddr);

These methods are both synchronous and block until they return. If an error occurs anywhere in the system, each throws an exception detailing the error: write throws lsst::slice::WriteError and read throws lsst::slice::ReadError. The most common error occurs when the register to be written or read does not exist in the RCM firmware. In this case the error text will mention an "internal PGP error". Attempting to write to a read-only register also generates an internal PGP error.

Images (imageClient.cc)

imageClient.cc is structured similarly to rriClient.cc. Here, we start by creating instances of lsst::cdi::Subscriber and lsst::cdi::Client, each referencing the RCM id. We then request an image from the Subscriber with the image() member function. This function camps on a socket until the server (on the DPM) posts an Image. The Client class then reads the image into a buffer and returns true/false depending on whether the operation was successful. We then store the image to disk.

The lsst::cdi::Image class is meant to be the holder of the Image metadata. At this time, we expose two member functions:

      /** @brief Return the unique 64-bit identifier of this image
       */
      uint64_t tag();

      /** @brief Return the length (in bytes) of this image
       */
      uint32_t length();

In the future, all image metadata will be exposed through the Image class.


Useful Scripts

The tools directory under LSST_ROOT contains several shell scripts created to make it easier to interact with the COBs.

ipmi_activate / ipmi_deactivate

ipmi_activate and ipmi_deactivate are used to activate and deactivate the COB payload power. This is used to force a hard reset of the COB and everything on it. In ATCA parlance, ipmi_activate initiates the M1->M2 state transition, and ipmi_deactivate initiates the M4->M5 transition.

Usage Example
  ipmi_activate ritter 5
  ipmi_deactivate ritter 5
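
A hard reset amounts to a deactivate followed by an activate. The wrapper below is a rough sketch of that sequence. The function name cob_power_cycle and the COB_SETTLE_SECONDS variable are made up for illustration, the 10-second default pause is a guess rather than a SLAC recommendation, and the early return assumes that ipmi_deactivate exits with a non-zero status on failure, which has not been verified.

```shell
# Power-cycle one COB: deactivate, wait, then activate again.
# cob_power_cycle and COB_SETTLE_SECONDS are hypothetical names.
# The 10 s default is a guess; adjust it to how long your shelf manager needs.
cob_power_cycle() {
    shelf=$1
    slot=$2
    # Assumes a non-zero exit status means the deactivate request failed.
    ipmi_deactivate "$shelf" "$slot" || return 1
    sleep "${COB_SETTLE_SECONDS:-10}"
    ipmi_activate "$shelf" "$slot"
}
```

With the tools directory in PATH, cob_power_cycle ritter 5 would cycle the COB in slot 5 of the ritter shelf.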

ipmi_boot / ipmi_bootstrap

ipmi_boot and ipmi_bootstrap control the booting behavior of a CMB in a COB. ipmi_boot causes an immediate boot of one or more CMBs, while ipmi_bootstrap sets or prints the "bootstrap word" in persistent memory on the CMB and does not immediately boot the board. For the systems shipped to BNL, the bootstrap word is 0x80000. The COB element number for DPMs is equal to (bay+1)<<1.

Usage Example
  ipmi_boot ritter 5 2 0x80000
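
The element-number formula is easy to get wrong by one bit, so here it is worked out for all four DPM bays. This is a small sketch; the function name dpm_element is an assumption and is not part of the tools tarball.

```shell
# Map a DPM bay number (0-3) to its COB element number, (bay+1)<<1.
# dpm_element is a hypothetical helper.
dpm_element() {
    dpm_bay=$1
    case "$dpm_bay" in
        0|1|2|3) echo $(( (dpm_bay + 1) << 1 )) ;;
        *) echo "bay must be 0-3, got: $dpm_bay" >&2; return 1 ;;
    esac
}

# Print the mapping for every DPM bay.
for b in 0 1 2 3; do
    echo "bay $b -> element $(dpm_element "$b")"
done
```

This gives elements 2, 4, 6 and 8 for bays 0 through 3. If the third argument in the usage example above is the element number, that command addresses the DPM in bay 0.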

ipmi_eeprom

ipmi_eeprom can dump the information contained in persistent memory on a CMB. Typically, the end user won't need to use this command except to provide information to SLAC for debugging purposes.

ipmi_id

ipmi_id dumps the information in a serial number PROM located on a CMB board.

ipmi_bsi

ipmi_bsi reads data from the BootStrap Interface memory on a CMB. Typically, the end user won't need to use this command except to provide information to SLAC for debugging purposes.

ipmi_write

ipmi_write writes data to the BootStrap Interface memory on a CMB. Typically, the end user won't need to use this command except when specifically required to by SLAC. Misusing this command will render the CMB inoperable.

reset_cable.sh

reset_cable.sh cleans up leftover cable locks on XILINX programming dongles. It typically needs to be run as root in /tmp to work correctly.

Attachments

  • RCM Firmware used for final testing at BNL (.bit) (.mcs) – From Stefano
  • Documentation for Firmware prepared by Stefano (pdf)