Introduction

Modern High Energy Physics and Photon Science experiments now require high-speed serial data links of 10 Gbps and higher for their data acquisition systems. These links must be lightweight, with low protocol overhead and small FPGA resource utilization. Often these links will multiplex data from several sources, so the link protocol should also support multiple “Virtual Channels” per physical link. Building on our experience developing the PGP2 protocol, which supports link rates of up to 6 Gbps using 8b/10b encoding, we have developed the PGP4 protocol to support link rates in excess of 10 Gbps using 64b/66b encoding.

Differences between PGP Version 3 and Version 4

After PGP Version 3 was developed and deployed, it was determined that there was no bit error checking on the K-code words. We discovered this issue in an application that had poor signal integrity on the high-speed PGPv3 routing on the circuit board.

PGP Version 4 is nearly identical to PGP Version 3. The primary difference between Version 3 and Version 4 is the addition of an 8-bit checksum within every 64-bit K-code word. Since we couldn't find a way to fit the new 8-bit checksum into the existing PGP Version 3 protocol to make it forward compatible, we decided to increment the version field and reorganize the K-code word formatting. The data transport and non-virtual channel metadata are the same in Version 3 and Version 4.

To support the 8-bit checksum, the RemoteLinkData field is reduced from 56 bits to 48 bits.

To support the 8-bit checksum on the SOF/SOC K-code, VC OVERFLOW bits are no longer part of the LINKINFO structure. The VC OVERFLOW data is sent only on IDLE K-codes and not on SOF/SOC K-codes as it is in PGP3.

Based on our experience implementing PGP Version 3, we determined that only one USER K-code is needed. In Version 4, we reduce the number of USER codes from 8 to 1, which helps reduce the gate logic resource usage. This also leaves 7 reserved BTF codes available for future use.

Differences between PGP4 and Pgp4Lite

PGP Version 4 Lite (Pgp4Lite) is nearly identical to PGP Version 4 (Pgp4), but it supports only a subset of the Version 4 features. Pgp4Lite differs from Pgp4 as follows:

  • no SOC (Start of Cell)
  • no EOC (End of Cell)
  • VC (Virtual Channel) is supported but interleaving is NOT supported
  • No limit on frame size, because there is no AxiStreamMux and no packetizer

Link Layer

PGP4 uses 64b/66b encoding to achieve DC balance of the serial data stream. Each 64-bit word is scrambled with a source synchronous scrambler with polynomial $G(x) = x^{58} + x^{29} + 1$. Two bits are then appended to each word: 0b01 to mark regular data, and 0b10 to mark control characters (K-Codes). This ensures that a transition between 0 and 1 occurs at least once every 66 bits. The two-bit header is also used for word alignment. These 66-bit words can then be serialized and deserialized using the high-speed transceivers found in modern FPGAs. The protocol does not specify any link rates, and any link speed may be targeted provided that the link medium and the FPGA on each side can support it.
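
The scrambling step can be illustrated with a small bit-level sketch. The text does not spell out the scrambler structure, so this example assumes the multiplicative (self-synchronizing) form commonly used with 64b/66b links, with taps at 29 and 58 taken from the polynomial quoted above. The LSB-first bit order, the all-zero initial state, and all function names are assumptions made for illustration only.

```python
def word_to_bits(word):
    """Split a 64-bit word into a list of bits, LSB first (assumed order)."""
    return [(word >> i) & 1 for i in range(64)]


def bits_to_word(bits):
    """Inverse of word_to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def scramble(bits, taps=(29, 58)):
    """out[n] = in[n] ^ out[n-29] ^ out[n-58] (state starts at zero)."""
    state = [0] * max(taps)  # state[0] is the most recent output bit
    out = []
    for b in bits:
        s = b
        for t in taps:
            s ^= state[t - 1]
        out.append(s)
        state = [s] + state[:-1]
    return out


def descramble(bits, taps=(29, 58)):
    """in[n] = out[n] ^ out[n-29] ^ out[n-58], using received bits as state."""
    state = [0] * max(taps)  # state[0] is the most recent received bit
    out = []
    for s in bits:
        b = s
        for t in taps:
            b ^= state[t - 1]
        out.append(b)
        state = [s] + state[:-1]
    return out
```

Because the descrambler state is built only from received bits, a receiver of this form recovers the original stream without any explicit state exchange once its shift register has filled.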

K-Codes

The protocol defines several K-Codes to indicate data framing, flow control, opcodes, and other metadata. For all K-Codes, the most significant 8 bits of the 64-bit word indicate which code it is. This is known as the Block Type Field (BTF). The next byte after the BTF is an 8-bit checksum of the entire 64-bit word (including the BTF). The lower 48 bits are then specified differently depending on the K-Code.

K-Code name             BTF
IDLE                    0x99
SOF (Start of Frame)    0xAA
EOF (End of Frame)      0x55
SOC (Start of Cell)     0xCC
EOC (End of Cell)       0x33
SKIP                    0x66
USER                    0x78
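
As a rough illustration of the common K-Code layout (BTF in bits 56-63, checksum in bits 48-55, code-specific payload in bits 0-47), the following sketch packs and unpacks a 64-bit K-Code word. The checksum algorithm is not defined in this section, so the checksum is passed in as an opaque value; the function names are hypothetical.

```python
# BTF values from the table above.
BTF = {
    "IDLE": 0x99, "SOF": 0xAA, "EOF": 0x55, "SOC": 0xCC,
    "EOC": 0x33, "SKIP": 0x66, "USER": 0x78,
}
BTF_NAMES = {value: name for name, value in BTF.items()}

PAYLOAD_MASK = (1 << 48) - 1


def pack_kcode(name, payload, checksum=0):
    """Build a 64-bit K-Code word; the checksum is supplied by the caller."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload must fit in 48 bits")
    if not 0 <= checksum <= 0xFF:
        raise ValueError("checksum must fit in 8 bits")
    return (BTF[name] << 56) | (checksum << 48) | payload


def unpack_kcode(word):
    """Return (name or None, checksum, payload) for a 64-bit K-Code word."""
    btf = (word >> 56) & 0xFF
    return BTF_NAMES.get(btf), (word >> 48) & 0xFF, word & PAYLOAD_MASK
```

An unrecognized BTF decodes to a name of None, which a receiver could treat as one of the reserved codes.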

LINKINFO Structure

Flow control is performed on a per-Virtual-Channel basis. Each received Virtual Channel is expected to be separately buffered in external logic, with PAUSE signals from the buffer fed back to the PGP4 block. The PAUSE signal indicates that the buffer has less than one Cell (128 words) of space remaining. The receive buffer status of all Virtual Channels is grouped into a 32-bit LINKINFO structure, which is included in each IDLE, SOF and SOC code that is transmitted. The maximum cell size of 128 words guarantees that any change in buffer fill status will be transmitted back upstream within at most 128 word-clock cycles.

Bit(s)    Name
0-7       PGP Version (always 0x4)
8         RXREADY
9-15      Reserved (zeros)
16-31     VC 0-15 PAUSE
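
A minimal sketch of how the LINKINFO fields could be packed and decoded is shown below. It assumes that bit 0 is the least significant bit and that the PAUSE flag for Virtual Channel n sits at bit 16+n; that per-channel ordering is not stated explicitly in the table and is an assumption, as are the function names.

```python
PGP_VERSION = 0x4


def pack_linkinfo(rx_ready, pause_mask):
    """Build the 32-bit LINKINFO value.

    pause_mask is a 16-bit value with bit n set when VC n is paused
    (assumed ordering).
    """
    if not 0 <= pause_mask <= 0xFFFF:
        raise ValueError("pause_mask must fit in 16 bits")
    return PGP_VERSION | (int(bool(rx_ready)) << 8) | (pause_mask << 16)


def unpack_linkinfo(value):
    """Decode a 32-bit LINKINFO value into its named fields."""
    return {
        "version": value & 0xFF,
        "rx_ready": bool((value >> 8) & 1),
        "pause": [bool((value >> (16 + vc)) & 1) for vc in range(16)],
    }
```

For example, a receiver that is locked and has VC 0 and VC 2 paused would report `pack_linkinfo(True, 0b101)`.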

K-Code: SKIP

SKIP codes are sent once every 5000 words. They are used to mitigate clock drift between the oscillators on either side of the link. SKIP characters are not written into the elastic buffer in the receive logic, allowing the buffer to avoid overflows when the transmit clock on one side of the link is slightly faster than the receive clock on the other. The lower 48-bit data field is called "RemoteLinkData" and is used to publish status data between the two endpoints, with high latency and no guarantee of transmission. RemoteLinkData is intended for high-level, slowly changing status information (for example, a board ID number).

Individual PGP4 TX implementations may choose to tweak the SKIP code period to suit the conditions of the system where they are deployed. For instance, systems that share a common clock on either side of the link may not need to send SKIP codes at all. The PGP4 RX logic will handle SKIP characters as it sees them, but does not expect them at any specific frequency.

Bit(s)    Name
0-47      RemoteLinkData
48-55     Checksum for K-Code
56-63     BTF = 0x66

K-Code: IDLE

IDLE codes are sent when the transmit logic has nothing else to send, or when flow control indicates that the downstream side is unable to receive data.

Bit(s)    Name
0-31      LINKINFO
32-47     VC 0-15 OVERFLOW Event
48-55     Checksum for K-Code
56-63     BTF = 0x99

K-Code: SOF/SOC

SOF (Start of Frame) or SOC (Start of Cell) codes are sent at the start of data payload transmission.

Bit(s)    Name
0-31      LINKINFO
32-35     Virtual Channel
36-47     Packet number
48-55     Checksum for K-Code
56-63     BTF: SOF=0xAA or SOC=0xCC
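
The 48-bit SOF/SOC payload can be sketched as a simple bit-field pack and unpack. Bit 0 is assumed to be the least significant bit, and the function names are hypothetical; the checksum and BTF bytes are left out because they are common to all K-Codes.

```python
def pack_sof_payload(linkinfo, vc, packet_number):
    """Pack the lower 48 bits of an SOF or SOC K-Code."""
    if not 0 <= linkinfo <= 0xFFFFFFFF:
        raise ValueError("LINKINFO must fit in 32 bits")
    if not 0 <= vc <= 0xF:
        raise ValueError("Virtual Channel must be 0-15")
    if not 0 <= packet_number <= 0xFFF:
        raise ValueError("packet number must fit in 12 bits")
    return linkinfo | (vc << 32) | (packet_number << 36)


def unpack_sof_payload(payload):
    """Return (linkinfo, vc, packet_number) from a 48-bit payload."""
    return payload & 0xFFFFFFFF, (payload >> 32) & 0xF, (payload >> 36) & 0xFFF
```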

K-Code: EOF/EOC

EOF (End of Frame) or EOC (End of Cell) codes are sent at the end of data payload transmission.

Bit(s)    Name
0-7       TLAST USER
8-11      Reserved (zeros)
12-15     Last byte count
16-47     32-bit CRC for data payload
48-55     Checksum for K-Code
56-63     BTF: EOF=0x55 or EOC=0x33
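
The EOF/EOC payload follows the same pattern. This sketch assumes bit 0 is the least significant bit and leaves the reserved bits at zero; the function names are hypothetical.

```python
def pack_eof_payload(tlast_user, last_byte_count, crc):
    """Pack the lower 48 bits of an EOF or EOC K-Code."""
    if not 0 <= tlast_user <= 0xFF:
        raise ValueError("TLAST USER must fit in 8 bits")
    if not 0 <= last_byte_count <= 0xF:
        raise ValueError("last byte count must fit in 4 bits")
    if not 0 <= crc <= 0xFFFFFFFF:
        raise ValueError("CRC must fit in 32 bits")
    # Bits 8-11 are reserved and stay zero.
    return tlast_user | (last_byte_count << 12) | (crc << 16)


def unpack_eof_payload(payload):
    """Return (tlast_user, last_byte_count, crc) from a 48-bit payload."""
    return payload & 0xFF, (payload >> 12) & 0xF, (payload >> 16) & 0xFFFFFFFF
```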

Data Cells

Data frames received on the user AXI-Stream interface are broken into Cells of at most 128 words each. The first cell of a frame is indicated by the SOF (Start of Frame) character. Subsequent cells belonging to the same frame begin with the SOC character to indicate that they are a continuation of frame data. SOF/SOC characters also contain the Virtual Channel number, a sequence field to check whether Cells have been dropped, and the LINKINFO data to indicate flow control status. Cells are terminated with an EOF character to indicate that a cell is the last of a frame, or an EOC character to indicate that more data is expected from the current frame. EOF/EOC characters also contain a 32-bit CRC that is computed over all the data in a cell (excluding the SOF/SOC), with the CRC from the previous cell of the frame used as the starting value for the new CRC calculation. The CRC polynomial is identical to that used for Ethernet and Aurora 64b/66b:

$G(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1$
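
The cell segmentation and chained CRC described above can be sketched as follows. The example uses Python's `zlib.crc32`, which implements the Ethernet CRC-32 polynomial with reflected bit order, an all-ones initial value, and a final inversion; whether PGP4 uses those same conventions is an assumption here, as are the function name and the representation of a cell as a tuple.

```python
import zlib

CELL_WORDS = 128   # maximum cell size in 64-bit words
WORD_BYTES = 8


def segment_frame(frame, cell_words=CELL_WORDS):
    """Split a frame into cells of (start_code, data, end_code, crc).

    The CRC of each cell continues from the CRC of the previous cell,
    so the value carried by the final EOF covers the whole frame.
    """
    cell_bytes = cell_words * WORD_BYTES
    chunks = [frame[i:i + cell_bytes]
              for i in range(0, len(frame), cell_bytes)] or [b""]
    cells = []
    crc = 0
    for index, chunk in enumerate(chunks):
        crc = zlib.crc32(chunk, crc)  # previous CRC is the starting value
        start = "SOF" if index == 0 else "SOC"
        end = "EOF" if index == len(chunks) - 1 else "EOC"
        cells.append((start, chunk, end, crc))
    return cells
```

With this chaining, a 2560-byte frame yields three cells (SOF...EOC, SOC...EOC, SOC...EOF), and the CRC in the closing EOF equals the CRC of the entire frame computed in one pass.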

User Opcodes

The Opcode interface allows for 48-bit user opcodes to be transmitted sideband of any Virtual Channel data frames. Opcode transmission takes priority over frame data transmission. Opcodes are contained in a single K-Code, and may therefore be placed in the middle of a cell sequence.

Bit(s)Name
0-47OP-Code Data
48-55checksum for k-code
56-63BTF = 0x78

Startup Sequence

The transmit logic begins by sending at least 1000 IDLE characters. It then continues to send only IDLE and SKIP codes until flow control indicates that the receive logic on the other end has locked on and aligned to the data stream (RXREADY). This assures that the receive logic will see a “10” sequence exactly every 66 bits for as long as necessary until it can establish proper word alignment. A word alignment state machine in the receive logic allows bits to slip in the deserializer until it sees 128 words in a row that begin with the “10” control sequence, at which point it is locked. The RXREADY flag is then asserted and sent in every LINKINFO message to the other side, enabling the other side to begin transmitting user data.
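
The word alignment behaviour described above can be sketched as a small state machine plus a software stand-in for the deserializer. This is only an illustration: the class and function names are hypothetical, the header is assumed to occupy the first two bits of each 66-bit word with the "1" transmitted first, and a bit slip is modelled by skipping one extra bit of the incoming stream.

```python
KCODE_HEADER = 0b10
WORD_BITS = 66


class WordAligner:
    """Counts consecutive K-Code headers and requests slips on mismatch."""

    def __init__(self, lock_count=128):
        self.lock_count = lock_count
        self.good = 0
        self.locked = False

    def step(self, header):
        """Process one candidate header; return True to request a bit slip."""
        if self.locked:
            return False
        if header == KCODE_HEADER:
            self.good += 1
            if self.good >= self.lock_count:
                self.locked = True
            return False
        self.good = 0
        return True


def find_alignment(bits, lock_count=128):
    """Run the aligner over a bit list; return (locked, number_of_slips)."""
    aligner = WordAligner(lock_count)
    pos = 0
    slips = 0
    while pos + 2 <= len(bits) and not aligner.locked:
        header = (bits[pos] << 1) | bits[pos + 1]
        if aligner.step(header):
            slips += 1
            pos += WORD_BITS + 1  # slip by one bit
        else:
            pos += WORD_BITS
    return aligner.locked, slips
```

In this model a stream that starts five bits out of phase locks after five slips followed by 128 consecutive good headers.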

Unidirectional Mode

The PGP4 protocol normally relies on a full-duplex link in order for flow control information to be passed from receiver back to sender. Some system architectures however simply cannot support bidirectional link lines, and so PGP4 includes a unidirectional mode. In this mode, the system essentially operates without flow control, with the transmit side assuming that the receive side always has buffer space available for new data. Care must be taken then that the receive side will always process incoming data fast enough to keep up with the maximum data transmission rate.

User Interface

PGP4 supports up to 16 Virtual Channels, each with its own discrete 64-bit wide bidirectional AXI-Stream interface. Through these channels, frame-based data is multiplexed through the link. Virtual Channels are selected for transmission in round-robin priority, so that each channel has equal access to the available link bandwidth. Interleaving is enabled by default, so that once a Cell of 128 words from a frame on one channel has been accepted, the next Cell will be taken from a different channel that is requesting transmission. This setting is highly recommended so that long frames on one Virtual Channel do not starve out all of the other channels. Data frames sent on a Virtual Channel may be any number of bytes. The interface supports the AXI-Stream TKEEP signal on the final transaction of a frame to indicate a number of bytes between 0 and 8.
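
The effect of round-robin selection with interleaving can be illustrated with a simplified software model. It assumes each Virtual Channel already holds a queue of ready cells and ignores flow control and opcode priority; the function name and data representation are hypothetical.

```python
from collections import deque


def interleave(cells_per_vc):
    """Return the transmit order as a list of (vc, cell) tuples.

    cells_per_vc maps a Virtual Channel number to its list of pending
    cells. One cell is taken from each requesting channel in turn.
    """
    queues = {vc: deque(cells) for vc, cells in sorted(cells_per_vc.items())}
    order = []
    while any(queues.values()):
        for vc, queue in queues.items():
            if queue:
                order.append((vc, queue.popleft()))
    return order
```

With a three-cell frame on VC 0, a single cell on VC 1, and two cells on VC 2, the short frames on VC 1 and VC 2 complete before the long frame on VC 0 finishes, which is the starvation-avoidance behaviour described above.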

Implementation

The PGP4 protocol is implemented in synthesizable VHDL. It is open source, released under a permissive modified BSD license. It is included in the SURF firmware library from SLAC, available on GitHub at github.com/slaclab/surf.

The number of Virtual Channels supported can be configured via VHDL generics, allowing for better resource utilization when fewer channels are needed. Total resource utilization depends on the number of Virtual Channels synthesized and the amount of buffering required per channel. 

PGP4Lite Implementation Discussion

For Pgp4TxLite, we don't instantiate the Packetizer, and instead have a customized Pgp4TxLiteProtocol block that is 90% similar to Pgp4TxProtocol, but includes the CRC logic normally handled by the Packetizer. This also removes a pipeline register stage, which is useful in ASICs to reduce area. In Pgp4RxLite, we've instead added a new Depacketizer mode that removes the packet sequence RAM. I imagine this removes a good chunk of the depacketizer logic, but the depacketizer register stage is still there. This is probably a decent tradeoff. I guess we just want to be aware that there is additional optimization to be had if needed in the future.

Going forward, it might make sense not to define "PGP4Lite" as such. Instead we'd just have a set of features which may or may not be supported.

  • Cell Size
  • Max Frame Size (set == Cell Size to disable SOC/EOC)
  • NUM VC
  • Interleaving
  • Elastic Buffer - (Not needed when source synchronous)

Then the logic gets optimized based on the settings. We'd have to change the current "TxLite" to be more like "RxLite", with an optimized packetizer. A lot of automatic logic optimization is possible around the settings above, except for the extra pipeline stage in the packetizer/depacketizer.

Contact

Ben Reese

bareese@slac.stanford.edu
