Concepts

The low-level interface to an RCE's protocol plug-ins uses abstractions called ports, frames, channels, lanes, pipes and factories.

A port is the RCE end of a communications link, similar in concept to a BSD socket. Ports are globally visible but not MT-safe; at any given time at most one task may be waiting for or receiving data from a given port. Ports deliver data as frames. The exact content of a frame depends on the transmission protocol being used, but the port recognizes a broad division into header and payload. One frame corresponds to one message on the I/O medium and is delivered in a single buffer. In other words, all ports implement datagram rather than byte-stream protocols. It's up to higher-level software such as a TCP stack to provide any operations that cross frame boundaries. Each port contains a queue of frames which have arrived and not yet been consumed by the application.

A channel is a hardware I/O engine capable of DMA to and from system RAM, i.e., a protocol plug-in. An RCE has at most eight channels. Most channels will make use of one or more of the Multi-Gigabit Transceiver modules (lanes) available in firmware, though some may simply offer access to on-board resources such as DSPs.

Each channel has its own port space where each port is identified by a 16-bit unsigned integer. Each port represents a different source of incoming data such as a UDP port or a Petacache flash lane. The actual number of ports available on a channel depends on the channel type and may be as low as one. With one exception there is a one-to-one correspondence between (channel, port no.) pairs and port objects. The exception is a "catch-all" port, which receives data that doesn't "belong" to any other port in the system. There is a catch-all pseudo-channel available with a limit of one port; creating a port on this channel will create a catch-all port. If no catch-all port exists then an orphan frame is dropped and an error message is placed in the system log.
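The delivery rule described above (look up the port by channel and port number, fall back to the catch-all port, otherwise drop the frame) can be sketched roughly as follows. This is only an illustration: `PortSketch`, `PortTableSketch` and `FrameBytes` are hypothetical names, not part of the RCE API, and a drop counter stands in for the system-log message.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not the RCE classes.
using FrameBytes = std::vector<std::uint8_t>;

struct PortSketch {
    std::deque<FrameBytes> pending;  // frames that have arrived but are not yet consumed
};

class PortTableSketch {
public:
    using Key = std::pair<unsigned, std::uint16_t>;  // (channel, port no.)

    PortSketch& create(unsigned channel, std::uint16_t portNo) {
        return ports_[Key{channel, portNo}];
    }

    // The catch-all pseudo-channel is limited to a single port.
    PortSketch& createCatchAll() {
        assert(!haveCatchAll_);
        haveCatchAll_ = true;
        return catchAll_;
    }

    // Returns false when the frame is an orphan and had to be dropped.
    bool deliver(unsigned channel, std::uint16_t portNo, FrameBytes frame) {
        auto it = ports_.find(Key{channel, portNo});
        if (it != ports_.end()) {
            it->second.pending.push_back(std::move(frame));
            return true;
        }
        if (haveCatchAll_) {
            catchAll_.pending.push_back(std::move(frame));
            return true;
        }
        ++dropped_;  // the real system would also write to the system log here
        return false;
    }

    unsigned dropped() const { return dropped_; }

private:
    std::map<Key, PortSketch> ports_;
    PortSketch catchAll_;
    bool haveCatchAll_ = false;
    unsigned dropped_ = 0;
};
```

The sketch ignores the "not MT-safe" constraint entirely; it assumes a single caller at a time.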
All the lanes (if any) of a given channel go to one of the outputs (pipes) of the rear transition module. This mapping is fixed by hardware.

Each type of channel has both an official (unsigned) number and an official short name such as "eth" or "config". Either may be used to look up the corresponding Channel object after system startup. Channels that differ only in the number of lanes they use, e.g., 10 Gb/s ethernet (4) and a slower ethernet (1), will have the same channel type, in this case "eth".

The factory code that creates the right kind of Channel and Frame objects for a given type of channel may already be part of the system core or it may be in a container in configuration flash. In the latter case the code must be loaded, relocated and bound to the system core before its first use. Each such loaded factory code module has an entry point which is called to register the module in a central table, using a function exported by the core for this purpose. Pre-loaded code must also be registered using this function. Factory code returns values of type Channel* or Frame*, though the actual objects pointed to are tailored for the specific channel type.

The information needed to initialize the channels is found in configuration flash and/or by probing the hardware.

Channel information in configuration flash

Data container zero in the configuration flash contains tables providing all the information needed to make all channels ready. This includes references to type-specific factory code but not the code itself; other containers will hold that. In addition the tables provide some extra information not actually needed for setup but required to print a summary of what the protocol plug-ins provide. The tables are called Channels, Channel Types, Factories, Data Paths, Buffers and Strings. Except for Strings, each table is an array of plain-old-data structs with the first field being a key value normally equal to the array index.
The keys are there to allow tables to refer to one another's entries. A key equal to 0xffffffff signifies the end of the table, in which case none of the other fields in the struct have any guaranteed values and should never be read. Since erased flash reads as all one bits, you'll generate end-of-table sentinels automatically when you erase a flash block before writing tables into it, provided you leave at least one 32-bit word of unwritten space at the end of each table. If you write all the tables in one go then you must provide the end-of-table markers explicitly.

The Strings table is like an ELF string table: NUL-terminated ASCII strings laid end to end. Certain standard strings such as the short names of channels are at the front of the table, with only one instance of each string present. The other tables refer to a string by giving the offset of its first character in the Strings table.

General layout of the container contents

The first words of the container are the 32-bit offsets in the container to the starts of the tables in the following order:
Each of the offsets should be divisible by four and, if the corresponding table is present, must be greater than zero. Only the Factories and Buffers tables are needed for Virtex-5,6 RCEs so the other offsets are zero. After the offsets come the tables themselves. No particular order is required, though since Strings table entries have variable length it's most convenient to place that table last. To make alignment easier we use 32-bit fields wherever possible, even for 16-bit quantities such as port numbers. All of the declarations for the configuration tables are in namespace RCE::config.
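As a rough illustration of the layout rules above, the sketch below reads a table offset from the header, walks a table of fixed-size records until it meets the 0xffffffff end-of-table key, and resolves a Strings table reference. All names here are hypothetical and are not the RCE::config declarations. The slot index of each table in the header and the record size are left as parameters because they are not assumed here, and the code assumes the container image is already in RAM in host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace sketch {

// End-of-table key; also what an erased, unwritten flash word reads as.
constexpr std::uint32_t kEndOfTable = 0xffffffffu;

// Read one 32-bit word at a byte offset within the container image.
inline std::uint32_t wordAt(const std::uint8_t* base, std::uint32_t byteOffset) {
    std::uint32_t w;
    std::memcpy(&w, base + byteOffset, sizeof w);
    return w;
}

// Offset of the table whose header slot is `slot`; zero means "table absent".
inline std::uint32_t tableOffset(const std::uint8_t* base, unsigned slot) {
    const std::uint32_t off = wordAt(base, slot * 4u);
    assert(off % 4u == 0);  // offsets should be divisible by four
    return off;
}

// Count the records in a table whose first 32-bit field is the key.
// Assumes the table really is terminated by an end-of-table key.
inline std::uint32_t countEntries(const std::uint8_t* base,
                                  std::uint32_t tableOff,
                                  std::uint32_t recordBytes) {
    std::uint32_t n = 0;
    while (wordAt(base, tableOff + n * recordBytes) != kEndOfTable) {
        ++n;  // fields other than the key are only valid for non-sentinel records
    }
    return n;
}

// Resolve a string reference: an offset of the first character in Strings.
inline const char* stringAt(const std::uint8_t* base,
                            std::uint32_t stringsOff,
                            std::uint32_t strOffset) {
    return reinterpret_cast<const char*>(base + stringsOff + strOffset);
}

}  // namespace sketch
```

A caller would first check that `tableOffset` returned a non-zero value before walking the table, since a zero offset marks a table that is not present.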
The Channels table (Virtex-4)
The Channel Types table (Virtex-4)
The Factories table (Virtex-4,5,6)

In the Factories table a container name of 0xffffffff marks pre-loaded factory code; otherwise the name is used to find the required container as specified in the RCE document.
The Data Paths table (Virtex-4)

The lanes (MGTs) allocated to a channel will all feed a particular output (pipe) on the backplane. Those channels that have no lanes will not appear in this table.
The Buffers table (Virtex-4,5,6)

Each channel comes with a recommendation for the number and size (in bytes) of buffers to be allocated in non-cached memory. Those recommendations are in this table. Note that these are only recommendations; the system initialization procedure needs to consider all the recommendations together along with the amount of memory actually available.
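How the initialization procedure weighs the recommendations against available memory is not specified here. Purely as an illustration, one simple policy is to grant every recommendation in full when the total fits and otherwise to scale all buffer counts down by the same factor. `BufferRequest` and `planBuffers` are hypothetical names, and the policy itself is an assumption rather than the documented behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of one Buffers table recommendation.
struct BufferRequest {
    std::uint32_t count;  // recommended number of buffers
    std::uint32_t size;   // recommended size of each buffer, in bytes
};

// Returns the number of buffers granted to each request, in the same order.
// Assumed policy: if everything fits, grant it all; otherwise scale every
// count by availableBytes / wantedBytes, rounding down, so that the total
// granted never exceeds availableBytes.
// Assumes availableBytes is below 4 GiB so the 64-bit products cannot overflow.
inline std::vector<std::uint32_t> planBuffers(const std::vector<BufferRequest>& recs,
                                              std::uint64_t availableBytes) {
    std::uint64_t wantedBytes = 0;
    for (const BufferRequest& r : recs) {
        wantedBytes += static_cast<std::uint64_t>(r.count) * r.size;
    }

    std::vector<std::uint32_t> granted;
    granted.reserve(recs.size());
    for (const BufferRequest& r : recs) {
        std::uint64_t n = r.count;
        if (wantedBytes > availableBytes) {
            n = n * availableBytes / wantedBytes;  // proportional cut
        }
        granted.push_back(static_cast<std::uint32_t>(n));
    }
    return granted;
}
```

A real procedure would probably also enforce a minimum number of buffers per channel, which this sketch does not attempt.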
Example configuration tables

Here's what the tables would look like for a Virtex-4 RCE that defines nothing but an ethernet LAN and configuration flash, which are standard for all RCEs. We assume that the ethernet uses lanes 0-3 and feeds pipe zero.

The Channels table:
Ethernet allows 65536 ports so as not to restrict the port space of UDP or TCP.

The Channel Types table:
The Factories table:
The Data Paths table:
The Buffers table:
Here we assume that the firmware delivers all ethernet framing bytes as well as the payload and that jumbo frames are not allowed. We assume that the firmware delivers configuration flash data in units of pages and that a page is 2K bytes long.

How the tables appear in RAM

We copy the tables without alteration and set the members of the following structure:
Use case: System startup for Virtex-4
Use case: Frame import

Prior to this the code wishing to use the port has associated a consumer task with it. That task is blocked or idling waiting for new frames.
Running the Channel, Frame and Port object code in a dispatcher task rather than the ISR keeps the latter simple; it doesn't have to know about system data structures, just hardware. Simplicity should translate to speed of response. The dispatcher task can be normal C++ code that runs with the MMU on, which an ISR by default doesn't.

Sub-case: Full cooperative multitasking

We assign the same priority to the I/O dispatcher task and all the consumer tasks. All of these tasks remain in the ready queue; each puts itself at the back of the ready queue (yields) when it finds that its input queue is empty or after it has completed a certain amount of work. The consumer tasks and the I/O dispatcher task don't need to synchronize their communication because none of them can preempt another. The communication between the ISR and the I/O dispatcher task needs some synchronization since the ISR can preempt the dispatcher task at any time. We need not resort to semaphores or other locks, though; there are relatively simple lock-free algorithms we can use. The dispatcher task, once it comes to the front of the ready queue, loops until it manages to read its input queue without detecting interference from the ISR, then yields, either immediately if the queue was empty or after performing one or more dispatches. As an alternative the dispatcher can disable interrupts for the short time it takes to check its queue.

Sub-case: Preemptive multitasking

If the I/O dispatcher is given a higher priority than the consumer tasks then it must synchronize its communication with them. It also won't be able to just yield when it has nothing to do, since it will go onto what amounts to a different ready queue from the consumer tasks, one that is examined first. The dispatcher task would always run, starving the consumers. Instead the dispatcher task would have to actually block itself and the ISR would have to unblock it.
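For the ISR-to-dispatcher hand-off in the cooperative sub-case, one well-known lock-free option is a single-producer/single-consumer ring buffer, sketched below. This is not necessarily the algorithm the design has in mind (the text describes a read-and-retry scheme); it is shown because it needs no locks and no retry loop when there is exactly one writer (the ISR) and one reader (the dispatcher). `SpscRing` is a hypothetical name, and the sketch assumes `std::atomic<std::size_t>` is lock-free on the target.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Single-producer/single-consumer ring buffer (illustrative sketch).
// push() may be called only by the producer (e.g. the ISR),
// pop() only by the consumer (e.g. the I/O dispatcher task).
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Returns false if the ring is full; the caller decides what to drop.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) {
            return false;
        }
        slots_[head & (N - 1)] = value;
        // Publish the slot only after its contents have been written.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Returns false if the ring is empty.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) {
            return false;
        }
        out = slots_[tail & (N - 1)];
        // Release the slot back to the producer only after reading it.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    T slots_[N];
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

With such a queue the dispatcher would call `pop` until it returns false and then yield. The element type would typically be something small, such as a buffer descriptor or pointer, so that the ISR does very little copying.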
If the consumer tasks don't all have the same priority then they too will have to block themselves and be unblocked by the dispatcher task. If we use time-slicing then we don't have to be so careful in deciding when each task should yield but then all inter-task communication will require synchronization. If priority assignment is non-uniform then we need to use explicit blocking and unblocking instead of yielding.

Use case: Frame export

Low-level API

Official RCE channel type numbers and names

These are given in a header file made available to both core and application code. The numbers are members of an enumeration and the names are given in a static array.
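The pairing of an enumeration with a static array of names might look roughly like the sketch below. Only the short names "eth" and "config" come from this document; the enumerator names, their numeric values and the two lookup helpers are assumptions made for illustration and are not the official header.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical channel type numbers; the official values are not assumed here.
enum ChannelTypeSketch : unsigned {
    kEth = 0,
    kConfig = 1,
    kNumChannelTypes
};

// Short names indexed by channel type number.
static const char* const kChannelTypeNames[kNumChannelTypes] = {
    "eth",
    "config",
};

// Number -> name; returns nullptr for an unknown type number.
inline const char* nameOf(unsigned type) {
    return type < kNumChannelTypes ? kChannelTypeNames[type] : nullptr;
}

// Name -> number; returns -1 if the name is not a known short name.
inline int typeOf(const char* name) {
    for (unsigned t = 0; t < kNumChannelTypes; ++t) {
        if (std::strcmp(kChannelTypeNames[t], name) == 0) {
            return static_cast<int>(t);
        }
    }
    return -1;
}
```

Keeping the array indexed by the enumeration value means either the number or the name can be used as the lookup key, which matches the statement that a Channel object can be found by either.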
Channel and Frame factories

These are the abstract base classes.
Channel type registry
An instance of this class will be exported by the core code using a design pattern such as Singleton or a functional equivalent.

Class Frame
Frame instances are created by Channel instances.

Channel
Frames