Publications
Requirements
- Support targeting "point-to-point" and "network" modes
- "point-to-point" by default (better logic optimization)
- 3 layers of Ethernet:
- Layer1: PHY
- Layer2: MAC/FEC
- Layer3: HTSP
Layer 3 Ethernet framing
- Wrapper on the Ethernet MAC
- Targeting 100G Hard IP MAC
- 512-bit AXI Stream interface
- IEEE 802.3bj Reed-Solomon Forward Error Correction (RS-FEC)
- Frame Check Sequence (FCS) checking, adding and deleting
- 512-bit header and 48-bit footer
- Header and payload always 512-bit wide
- Simplifies the logic for the 100GbE MAC interface (no AxiStreamShifting required because the width matches the MAC's 512-bit AXI stream)
- Chunk up the stream into bursts of up to MAX_SIZE (8196B default)
- 91.8% efficient = 1024B/(1024B + 4B FCS + 12B inter-frame gap + 8B preamble + 64B HTSP Ethernet Header + 4B HTSP Ethernet Footer)
- 95.7% efficient = 2048B/(2048B + 4B FCS + 12B inter-frame gap + 8B preamble + 64B HTSP Ethernet Header + 4B HTSP Ethernet Footer)
- 97.8% efficient = 4096B/(4096B + 4B FCS + 12B inter-frame gap + 8B preamble + 64B HTSP Ethernet Header + 4B HTSP Ethernet Footer)
- 98.9% efficient = 8196B/(8196B + 4B FCS + 12B inter-frame gap + 8B preamble + 64B HTSP Ethernet Header + 4B HTSP Ethernet Footer)
- A smaller MAX_SIZE lowers the latency for publishing the virtual channel "pause" status to the remote side but increases overhead
- Use the MAC's FCS for error checking
- Virtual channel flow control is done at the L3 layer
...
- (instead of ETH pause)
- Support up to 16 Virtual channels on the same link
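The efficiency figures and the MAX_SIZE latency trade-off listed above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the HTSP firmware: the function names are hypothetical, and the 100 Gb/s line rate is an assumed nominal value used only to estimate how long one burst occupies the wire.

```python
# Per-burst overhead in bytes, taken from the efficiency bullets above.
FCS = 4
INTER_FRAME_GAP = 12
PREAMBLE = 8
HTSP_HEADER = 64
HTSP_FOOTER = 4
OVERHEAD = FCS + INTER_FRAME_GAP + PREAMBLE + HTSP_HEADER + HTSP_FOOTER

LINE_RATE_BPS = 100e9  # assumption: nominal 100GbE line rate


def efficiency(max_size: int) -> float:
    """Fraction of wire bytes that carry payload for a full burst."""
    return max_size / (max_size + OVERHEAD)


def burst_wire_time_ns(max_size: int) -> float:
    """Time one full burst (payload plus overhead) occupies the wire, in ns."""
    return (max_size + OVERHEAD) * 8 / LINE_RATE_BPS * 1e9


for size in (1024, 2048, 4096, 8196):
    print(f"{size:5d}B  {efficiency(size) * 100:5.1f}%  "
          f"{burst_wire_time_ns(size):7.1f} ns")
```

Under these assumptions a 1024B burst occupies the wire for roughly 89 ns and an 8196B burst for roughly 663 ns, which gives a rough idea of how MAX_SIZE bounds the interval between "pause" status updates.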
Word# | Byte(s) in word | Type | Name | Description | Note |
---|---|---|---|---|---|
0 | [5:0] | Header | DestMac | Destination MAC | |
0 | [11:6] | Header | SrcMac | Source MAC | |
0 | [13:12] | Header | EtherType | TBD Value | |
0 | 14 | Header | Version | 0x1 | |
0 | 15 | Header | TID | Transaction ID | Increments once per packet; used for out-of-order frame reorganizing in network mode |
0 | [17:16] | Header | Pause | Virtual Channel Pause | 1 bit per Virtual Channel |
0 | 18 | Header | VC | Virtual Channel Index | BIT[3:0]: Virtual Channel Index; BIT[7:4]: Reserved |
0 | 19 | Header | tUserFirst | First 8 bits of tUser | |
0 | 20 | Header | OpCodeEn | OP-code Enable | |
0 | [29:20] | Header | Reserved | Reserved | |
0 | [31:30] | Header | HdrXsum | 16-bit Header Checksum | |
0 | [47:32] | Header | OpCodeData | 128-bit OP-code Data | Supports 64-bit timestamp + more information |
0 | [63:48] | Header | UserData | 128-bit User Data | Sampled every packet sent |
1 | [63:0] | Payload | AXIS Data | User Payload Data | |
(MAX_SIZE/64) | [63:0] | Payload | AXIS Data | User Payload Data | |
(MAX_SIZE/64)+1 | 0 | Footer | tKeepLast | TKEEP on last payload word | Converted from byte map to an integer |
(MAX_SIZE/64)+1 | 1 | Footer | EOF/EOFE | EOF and EOFE marker | |
(MAX_SIZE/64)+1 | [3:2] | Footer | Pause | Virtual Channel Pause | Pause bits are sampled and latched during the payload transport |
(MAX_SIZE/64)+1 | [5:4] | Footer | PayloadSize | Number of bytes in payload | |
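To make the byte layout in the table concrete, the sketch below packs a 64-byte header word and a 6-byte footer in software. It is an illustration only, not the firmware implementation. The function names are hypothetical, and several details the table does not specify are assumptions: EtherType is written big-endian as in standard Ethernet, the Pause and PayloadSize fields are written little-endian, EOF and EOFE are placed in bits 0 and 1 of footer byte 1, and tKeepLast is taken to be the count of valid bytes in the TKEEP byte map. The reserved bytes and HdrXsum are left as zero because the checksum algorithm is not given here.

```python
import struct

HEADER_BYTES = 64
VERSION = 0x1


def tkeep_to_count(tkeep: int) -> int:
    """Assumed conversion: number of set bits in the 64-bit TKEEP byte map."""
    return bin(tkeep & (2**64 - 1)).count("1")


def pack_header(dest_mac: bytes, src_mac: bytes, ether_type: int, tid: int,
                pause_mask: int, vc: int, tuser_first: int,
                opcode_en: bool = False, opcode_data: bytes = bytes(16),
                user_data: bytes = bytes(16)) -> bytes:
    """Build header word 0 using the byte offsets from the table."""
    if len(dest_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    if not 0 <= vc < 16:
        raise ValueError("virtual channel index must fit in BIT[3:0]")
    if len(opcode_data) != 16 or len(user_data) != 16:
        raise ValueError("OpCodeData and UserData must be 16 bytes")
    hdr = bytearray(HEADER_BYTES)
    hdr[0:6] = dest_mac
    hdr[6:12] = src_mac
    struct.pack_into(">H", hdr, 12, ether_type)           # assumed big-endian
    hdr[14] = VERSION
    hdr[15] = tid & 0xFF                                  # wraps every 256 packets
    struct.pack_into("<H", hdr, 16, pause_mask & 0xFFFF)  # assumed little-endian
    hdr[18] = vc & 0x0F                                   # BIT[7:4] reserved
    hdr[19] = tuser_first & 0xFF
    hdr[20] = 1 if opcode_en else 0
    # Reserved bytes and HdrXsum (bytes 30-31) stay zero in this sketch.
    hdr[32:48] = opcode_data
    hdr[48:64] = user_data
    return bytes(hdr)


def pack_footer(tkeep_last: int, eof: bool, eofe: bool,
                pause_mask: int, payload_size: int) -> bytes:
    """Build the 6 footer bytes; the EOF/EOFE bit positions are assumed."""
    flags = (1 if eof else 0) | ((1 if eofe else 0) << 1)
    return struct.pack("<BBHH", tkeep_last & 0xFF, flags,
                       pause_mask & 0xFFFF, payload_size & 0xFFFF)
```

For example, a burst on virtual channel 3 with channels 0 and 2 paused would use `pause_mask=0b101` and `vc=3`, and a final payload word with 10 valid bytes would use `tkeep_last=tkeep_to_count((1 << 10) - 1)`.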
Performance
In "point-to-point" mode we measured 97.3 Gb/s of bandwidth for 1MB frames using an 8196B HTSP burst size. This is below the 98.9 Gb/s theoretical limit (98.9% efficiency at a 100 Gb/s line rate), which we think is related to some unaccounted inefficiency in the CMAC4 hard IP core used in this testing. The DMA bandwidth limit in this test was 101 Gb/s, so the DMA was not back-pressuring the HTSP link. Auto-polling register access over the HTSP link, on a different virtual channel, was enabled during this testing.
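As a rough way to size the gap between the measured and theoretical figures, the sketch below expresses the shortfall as an equivalent number of extra overhead bytes per burst. It is arithmetic only and does not identify the cause. The 100 Gb/s line rate is an assumed nominal value, and attributing the whole shortfall to per-burst overhead is a simplifying assumption; the variable names are hypothetical.

```python
LINE_RATE_GBPS = 100.0   # assumption: nominal 100GbE line rate
BURST_BYTES = 8196       # HTSP burst size used in the measurement
ACCOUNTED_OVERHEAD = 92  # 4B FCS + 12B gap + 8B preamble + 64B header + 4B footer
MEASURED_GBPS = 97.3

# Theoretical payload bandwidth with only the accounted overhead.
theoretical_gbps = LINE_RATE_GBPS * BURST_BYTES / (BURST_BYTES + ACCOUNTED_OVERHEAD)
shortfall_gbps = theoretical_gbps - MEASURED_GBPS

# Per-burst overhead that would explain the measured rate on its own.
implied_overhead = BURST_BYTES * (LINE_RATE_GBPS / MEASURED_GBPS - 1)
unaccounted = implied_overhead - ACCOUNTED_OVERHEAD

print(f"theoretical: {theoretical_gbps:.2f} Gb/s")
print(f"shortfall:   {shortfall_gbps:.2f} Gb/s")
print(f"unaccounted: {unaccounted:.0f} bytes per burst (equivalent)")
```

Under these assumptions the shortfall is about 1.6 Gb/s, equivalent to roughly 135 additional bytes of overhead per 8196B burst.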