...

After PGP Version 3 was developed and deployed, it was determined that there was no bit error checking on the K-code words. We discovered this issue in an application that had poor signal integrity on the high-speed PGPv3 routing on the circuit board.

PGP Version 4 is nearly identical to PGP Version 3. The primary difference between Version 3 and Version 4 is the addition of an 8-bit CRC checksum within every 64-bit K-code word. Since we couldn't find a way to fit the new 8-bit CRC checksum into the existing PGP Version 3 protocol to make it forward compatible, we decided to increment the version field and reorganize the K-code word formatting. The data transport and non-virtual channel metadata are the same in Version 3 and Version 4.

To support the 8-bit checksum, the RemoteLinkData is reduced from 56 bits to 48 bits.

To support the 8-bit checksum on the SOF/SOC K-code, VC OVERFLOW bits are no longer part of the LINKINFO structure. The VC OVERFLOW data is sent only on IDLE K-codes and not on SOF/SOC K-codes as it is in PGP3.

Based on our experience implementing PGP Version 3, we determined that only one USER K-code is needed. In Version 4, we reduce the number of USER codes from 8 to 1, which helps reduce the gate logic resource usage. This also leaves 7 reserved BTF codes available for future use.

Difference between PGP4 and Pgp4Lite

PGP Version 4 Lite (Pgp4Lite) is nearly identical to PGP Version 4 (Pgp4), but Pgp4Lite supports only a subset of the Version 4 features. The following features are not supported in Pgp4Lite:

  • no SOC (Start of Cell)
  • no EOC (End of Cell)
  • VC (Virtual Channel) is supported but interleaving is NOT supported

Link Layer

PGP4 uses 64b/66b encoding to achieve DC balance of the serial data stream. Each 64-bit word is scrambled with a source synchronous scrambler with polynomial G(x) = x^58 + x^29 + 1. Two bits are then appended to each word: 0b01 to mark regular data, and 0b10 to mark control characters (K-Codes). This ensures a transition between 0 and 1 at least once every 66 bits. It is also used for word alignment. These 66-bit words can then be serialized and deserialized using the high-speed transceivers found in modern FPGAs. The protocol does not specify any link rates, and any link speed may be targeted provided that the link medium and FPGA on each side can support it.
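The scramble-then-tag step described above can be sketched in Python. This is only an illustration under several assumptions that the text does not confirm: the scrambler is modelled as the multiplicative (self-synchronizing) form commonly used with 64b/66b, bits are processed LSB first, and the 2-bit header is placed above bit 63. The tap exponents are taken from the polynomial as written above. All function names are hypothetical.

```python
TAPS = (58, 29)                      # exponents of G(x) as given in the text
STATE_BITS = max(TAPS)
STATE_MASK = (1 << STATE_BITS) - 1
MASK64 = (1 << 64) - 1
DATA_HEADER = 0b01                   # regular data word
KCODE_HEADER = 0b10                  # control word (K-Code)


def _feedback(state: int) -> int:
    """XOR of the state bits selected by the polynomial taps."""
    fb = 0
    for tap in TAPS:
        fb ^= (state >> (tap - 1)) & 1
    return fb


def scramble_word(word: int, state: int):
    """Scramble 64 bits, LSB first (assumed order). Returns (scrambled, new_state)."""
    out = 0
    for i in range(64):
        s = ((word >> i) & 1) ^ _feedback(state)
        out |= s << i
        state = ((state << 1) | s) & STATE_MASK   # shift in the transmitted bit
    return out, state


def descramble_word(word: int, state: int):
    """Inverse of scramble_word. Returns (descrambled, new_state)."""
    out = 0
    for i in range(64):
        s = (word >> i) & 1
        out |= (s ^ _feedback(state)) << i
        state = ((state << 1) | s) & STATE_MASK   # shift in the received bit
    return out, state


def frame66(word: int, is_kcode: bool, state: int):
    """Scramble a 64-bit word and attach the 2-bit header (position assumed)."""
    scrambled, state = scramble_word(word & MASK64, state)
    header = KCODE_HEADER if is_kcode else DATA_HEADER
    return (header << 64) | scrambled, state
```

In this model the descrambler state depends only on received bits, so a receiver that starts with the wrong state recovers after one full word has passed through it.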

...

The protocol defines several K-Codes to indicate data framing, flow control, opcodes, and other metadata. For all K-Codes, the most significant 8 bits of the 64-bit word indicate which code it is. This is known as the Block Type Field (BTF). The next byte after the BTF is an 8-bit checksum of the entire 64-bit word (including the BTF). This 8-bit CRC uses the polynomial G(x) = x^8 + x^2 + x + 1. The lower 48 bits are then specified differently depending on the K-Code.

K-Code name             BTF
IDLE                    0x99
SOF (Start of Frame)    0xAA
EOF (End of Frame)      0x55
SOC (Start of Cell)     0xCC
EOC (End of Cell)       0x33
SKIP                    0x66
USER                    0x78
USER1                   0x87
USER2                   0x2D
USER3                   0xD2
USER4                   0x1E
USER5                   0xE1
USER6                   0xB4
USER7                   0x4B
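As a minimal sketch of how the K-code checksum could be computed and checked, the Python below implements a bitwise CRC-8 with G(x) = x^8 + x^2 + x + 1 (0x07). The text does not state the initial value, the bit ordering, or how the checksum field itself is treated during the calculation, so the sketch assumes a zero initial value, MSB-first processing, no final XOR, and a checksum field that is zeroed while the CRC is computed. The function names are illustrative and are not taken from any PGP4 implementation.

```python
CRC8_POLY = 0x07          # x^8 + x^2 + x + 1 (the x^8 term is implicit)
MASK64 = (1 << 64) - 1
CHECKSUM_SHIFT = 48       # checksum occupies bits 48-55
PAYLOAD_MASK = (1 << 48) - 1


def crc8(data: bytes, init: int = 0x00) -> int:
    """Bitwise CRC-8, MSB first, no reflection, no final XOR (assumed)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ CRC8_POLY) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def kcode_checksum(word: int) -> int:
    """CRC-8 over the 64-bit word with the checksum field zeroed (assumed)."""
    zeroed = word & MASK64 & ~(0xFF << CHECKSUM_SHIFT)
    return crc8(zeroed.to_bytes(8, "big"))


def build_kcode(btf: int, payload48: int) -> int:
    """Assemble BTF (bits 56-63), checksum (48-55) and payload (0-47)."""
    word = ((btf & 0xFF) << 56) | (payload48 & PAYLOAD_MASK)
    return word | (kcode_checksum(word) << CHECKSUM_SHIFT)


def kcode_is_valid(word: int) -> bool:
    """True when the stored checksum matches the recomputed one."""
    return ((word >> CHECKSUM_SHIFT) & 0xFF) == kcode_checksum(word)
```

For example, `build_kcode(0x66, remote_link_data)` would produce a SKIP word under these assumptions, and `kcode_is_valid` rejects that word if any single bit is flipped in transit.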

LINKINFO Structure

Flow control is performed on a per virtual channel basis. Each received Virtual Channel is expected to be separately buffered in external logic, with PAUSE signals from the buffer fed back to the PGP4 block. The PAUSE signal indicates that the buffer has less than one Cell (128 words) of space remaining. The receive buffer status of all Virtual Channels is grouped into a 32-bit LINKINFO structure, which is included in each IDLE, SOF and SOC code that is transmitted. The maximum cell size of 128 words guarantees that any change in buffer fill status will be transmitted back upstream within at most 128 word-clock cycles.

...

SKIP codes are sent once every 5000 words. They are used to mitigate clock drift between the oscillators on either side of the link. SKIP characters are not written into the elastic buffer in the receive logic, allowing the buffer to avoid overflows when the transmit clock on one side of the link is slightly faster than the receive clock on the other. The lower 48-bit data field is called "RemoteLinkData" and is used to publish status data between the two endpoints, with high latency and no guarantee of transmission. RemoteLinkData is intended for high-level, slow-changing status bits (example: Board ID Number).

Individual PGP4 TX implementations may choose to tweak the SKIP code period to suit the conditions of the system where they are deployed. For instance, systems that share a common clock on either side of the link may not need to send SKIP codes at all. The PGP4 RX logic will handle SKIP characters as it sees them, but does not expect them at any specific frequency.

Bit(s)    Name
0-47      RemoteLinkData
48-55     checksum for k-code
56-63     BTF = 0x66

...

SOF (Start of Frame) or SOC (Start of Cell) codes are sent at the start of data payload transmission.

Bit(s)    Name
0-31      LINKINFO
32-35     Virtual Channel
36-47     Packet number
48-55     checksum for k-code
56-63     BTF: SOF=0xAA or SOC=0xCC
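The bit layout above can be expressed as a small pack/unpack helper. This is a sketch only: the function names are hypothetical, the LINKINFO value is treated as an opaque 32-bit number, and the checksum is passed in as a precomputed value rather than calculated here.

```python
BTF_SOF = 0xAA
BTF_SOC = 0xCC


def pack_start_kcode(btf: int, linkinfo: int, vc: int,
                     packet_number: int, checksum: int = 0) -> int:
    """Pack an SOF/SOC K-code word from its fields (layout from the table)."""
    if btf not in (BTF_SOF, BTF_SOC):
        raise ValueError("BTF must be SOF (0xAA) or SOC (0xCC)")
    if not 0 <= linkinfo < (1 << 32):
        raise ValueError("LINKINFO is a 32-bit field")
    if not 0 <= vc < (1 << 4):
        raise ValueError("Virtual Channel is a 4-bit field")
    if not 0 <= packet_number < (1 << 12):
        raise ValueError("Packet number is a 12-bit field")
    if not 0 <= checksum < (1 << 8):
        raise ValueError("checksum is an 8-bit field")
    return ((btf << 56) | (checksum << 48) | (packet_number << 36)
            | (vc << 32) | linkinfo)


def unpack_start_kcode(word: int) -> dict:
    """Split a 64-bit SOF/SOC K-code word back into its fields."""
    return {
        "linkinfo": word & 0xFFFFFFFF,          # bits 0-31
        "vc": (word >> 32) & 0xF,               # bits 32-35
        "packet_number": (word >> 36) & 0xFFF,  # bits 36-47
        "checksum": (word >> 48) & 0xFF,        # bits 48-55
        "btf": (word >> 56) & 0xFF,             # bits 56-63
    }
```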

...

EOF (End of Frame) or EOC (End of Cell) codes are sent at the end of data payload transmission.

Bit(s)    Name
0-7       TLAST USER
8-11      Reserved (zeros)
12-15     Last byte count
16-47     32-bit CRC for data payload
48-55     checksum for k-code
56-63     BTF: EOF=0x55 or EOC=0x33

...

The Opcode interface allows 48-bit user opcodes to be transmitted sideband of any Virtual Channel data frames. Opcode transmission takes priority over frame data transmission. Opcodes are contained in a single K-Code, and may therefore be placed in the middle of a cell sequence. Each opcode can also be assigned to one of 8 opcode channels, so that opcodes directed toward different logic units may be multiplexed together.

Bit(s)    Name
0-47      OP-Code Data
48-55     checksum for k-code
56-63     BTF = 0x78

Startup Sequence

The transmit logic begins by sending at least 1000 IDLE characters. It then continues to send only IDLE and SKIP codes until flow control indicates that the receive logic on the other end has locked on and aligned to the data stream (RXREADY). This assures that the receive logic will see a “10” sequence exactly every 66 bits for as long as necessary until it can establish proper word alignment. A word alignment state machine in the receive logic allows bits to slip in the deserializer until it sees 128 words in a row that begin with the “10” control sequence, at which point it is locked. The RXREADY flag is then asserted and sent in every LINKINFO message to the other side, enabling the other side to begin transmitting user data.
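The word alignment behaviour described above can be sketched as a small state machine. This is an illustrative Python model, not the actual receive logic: the class and function names are hypothetical, behaviour after lock (for example, loss of lock) is not described in the text and is not modelled, and the demonstration stream uses an all-zero payload purely to keep the result deterministic, whereas real payload bits are scrambled.

```python
class WordAligner:
    """Counts consecutive '10' headers and requests a bit slip on a mismatch."""

    LOCK_COUNT = 128  # words in a row required for lock (from the text)

    def __init__(self):
        self.good = 0
        self.slips = 0
        self.locked = False

    def push_header(self, header: int) -> bool:
        """Feed one 2-bit header; returns True when a bit slip is requested."""
        if self.locked:
            return False          # post-lock handling is out of scope here
        if header == 0b10:
            self.good += 1
            if self.good == self.LOCK_COUNT:
                self.locked = True
            return False
        self.good = 0             # any mismatch restarts the count
        self.slips += 1
        return True


def align_stream(bits: str):
    """Walk a string of '0'/'1' characters in 66-bit steps, slipping one bit
    after each bad header. Returns (locked, number_of_slips)."""
    aligner = WordAligner()
    pos = 0
    while pos + 66 <= len(bits) and not aligner.locked:
        header = int(bits[pos:pos + 2], 2)
        slip = aligner.push_header(header)
        pos += 66 + (1 if slip else 0)
    return aligner.locked, aligner.slips
```

With a stream that starts 5 bits before the first word boundary, such as `"0" * 5 + ("10" + "0" * 64) * 300`, this model slips five times and then locks after 128 consecutive good headers.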

...

PGP4 supports up to 16 Virtual Channels, each with its own discrete 64-bit wide bidirectional AXI-Stream interface. Through these channels, frame-based data is multiplexed through the link. Virtual channels are selected for transmission in round-robin priority, so that each channel has equal access to the available link bandwidth. Interleaving is enabled by default, so that once a Cell of 128 words from a frame on one channel has been accepted, the next Cell will be taken from a different channel that is requesting transmission. This setting is highly recommended so that long frames on one Virtual Channel do not starve out all of the other channels. Data frames sent on the Virtual Channel may be any number of bytes. The interface supports the AXI-Stream TKEEP signal on the final transaction of a frame to indicate a number of bytes between 0 and 8.
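The round-robin, cell-interleaved selection described above can be modelled in a few lines of Python. This is a simplified sketch with hypothetical names: frame lengths are given in whole 64-bit words and are assumed to be positive, flow control is ignored, and the SOF/SOC and EOC/EOF labels assume that SOF and EOF mark the first and last Cell of a frame while SOC and EOC mark the intermediate boundaries.

```python
from collections import deque

CELL_WORDS = 128  # maximum Cell size from the text


def schedule_cells(frames, num_vc=16):
    """Return the order in which Cells would be sent.

    frames maps a virtual channel number to a list of frame lengths in words.
    Each result entry is (vc, words_in_cell, start_code, end_code).
    """
    queues = {vc: deque(lengths) for vc, lengths in frames.items()}
    if any(not 0 <= vc < num_vc for vc in queues):
        raise ValueError("virtual channel out of range")

    mid_frame = set()   # channels whose current frame has already started
    cells = []
    last = num_vc - 1   # so that VC 0 is considered first
    while any(queues.values()):
        # Round robin: start with the channel after the one served last.
        for step in range(1, num_vc + 1):
            vc = (last + step) % num_vc
            if queues.get(vc):
                break
        queue = queues[vc]
        n = min(queue[0], CELL_WORDS)
        start = "SOC" if vc in mid_frame else "SOF"
        if queue[0] > n:
            queue[0] -= n          # frame continues in a later Cell
            mid_frame.add(vc)
            end = "EOC"
        else:
            queue.popleft()        # frame finished
            mid_frame.discard(vc)
            end = "EOF"
        cells.append((vc, n, start, end))
        last = vc
    return cells
```

For a 300-word frame on VC 0 and a 100-word frame on VC 1, the model sends one Cell from VC 0, then the whole VC 1 frame, then the remaining two Cells of VC 0, which shows how a short frame is not held up behind a long one.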

...

The number of Virtual Channels supported can be configured via VHDL generics, allowing for better resource utilization when fewer channels are needed. Total resource utilization depends on the number of Virtual Channels synthesized and the amount of buffering required per channel. The “core” of the PGP4 protocol with all 16 channels synthesized takes up the following resources on Xilinx FPGAs:

PGP4Lite Implementation Discussion

For Pgp4TxLite, we don't instantiate the Packetizer, and instead have a customized Pgp4TxLiteProtocol block that is 90% similar to Pgp4TxProtocol, but includes the CRC logic normally handled by the Packetizer. This also removes a pipeline register stage, which is useful in ASICs to reduce area. In Pgp4RxLite, we've instead added a new Depacketizer mode that removes the packet sequence RAM. I imagine this removes a good chunk of the depacketizer logic, but the depacketizer register stage is still there. This is probably a decent tradeoff. I guess we just want to be aware that there is additional optimization to be had if needed in the future.

Going forward, it might make sense not to define "PGP4Lite" as such. Instead we'd just have a set of features which may or may not be supported.

  • Cell Size
  • Max Frame Size (set == Cell Size to disable SOC/EOC)
  • NUM VC
  • Interleaving
  • Elastic Buffer - (Not needed when source synchronous)

Then the logic gets optimized based on the settings. We'd have to change the current "TxLite" to be more like "RxLite", with an optimized packetizer. A lot of automatic logic optimization is possible around the settings above, except for the extra pipeline stage in the packetizer/depacketizer.

Contact

Ben Reese

bareese@slac.stanford.edu