...

There can be up to four independent MMUs per CPU (though they may be implemented as a single block of silicon with multiple banks of control registers). Without the security or virtualization extensions there is just one MMU, used for both privileged and non-privileged accesses. The security extension adds a second MMU for secure code, again covering both privilege levels. The virtualization extension adds two more: one for the hypervisor and one for the second stage of translation applied to code running in a virtual machine. The first stage of translation in a virtual machine maps VM-virtual to VM-real addresses, while the second stage maps VM-real to actual hardware addresses. The hypervisor's MMU translates only once, from hypervisor-virtual to actual hardware addresses.

The Zynq CPUs have just the security extension, so each has two MMUs. All the MMUs present come up disabled after a reset, with the TLBs disabled and garbage in the TLB entries. If all the relevant MMUs for a particular CPU state are disabled the system is still operable. Data accesses are then treated as Strongly-ordered, so there is no prefetching or reordering; the data caches must be disabled or contain only invalid entries, since a cache hit in this state results in unpredictable behavior. Instruction fetches are treated as Normal memory, uncached but still speculative, so addresses up to 8 KB above the start of the current instruction may be accessed.
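
For reference, this reset state can be inspected through the ARMv7 System Control Register (SCTLR). The helpers below are a sketch of our own (the names are not from any existing API); the bit positions they test (M = bit 0 for the MMU, C = bit 2 for the data cache, I = bit 12 for the instruction cache) are the architectural ARMv7 assignments.

    #include <stdint.h>
    #include <stdbool.h>

    /* Sketch: read SCTLR (CP15 c1, c0, 0) and test the MMU and cache
     * enable bits.  Helper names are ours; the bit positions are the
     * architectural ARMv7 assignments (M = bit 0, C = bit 2, I = bit 12). */
    static inline uint32_t read_sctlr(void)
    {
        uint32_t sctlr;
        __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(sctlr));
        return sctlr;
    }

    static inline bool mmu_enabled(void)    { return (read_sctlr() & (1u << 0))  != 0; }
    static inline bool dcache_enabled(void) { return (read_sctlr() & (1u << 2))  != 0; }
    static inline bool icache_enabled(void) { return (read_sctlr() & (1u << 12)) != 0; }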

...

Automatic replacement of TLB entries normally uses a "pseudo-random" or "round robin" algorithm, not the "least recently used" algorithm implemented in the PowerPC. The only way to keep heavily used entries in the TLB indefinitely is to lock them in explicitly, which can be done for up to four entries. Locked entries occupy a dedicated part of the TLB, separate from the normal main TLB, so locking does not cost any ordinary entry slots.
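
The sketch below shows one way such a lock might be performed, using the classic preserve-bit sequence through the CP15 TLB Lockdown Register (c10, c0, 0). The function name is ours, the exact procedure and the position of the preserve bit are implementation defined, and both should be verified against the Cortex-A9 TRM before relying on this.

    #include <stdint.h>

    /* Hypothetical helper: pin the translation for 'vaddr' into one of the
     * lockable TLB entries using the preserve-bit method.  The CP15 c10
     * TLB Lockdown Register layout and the exact sequence are implementation
     * defined; check the Cortex-A9 TRM. */
    static void lock_tlb_entry(const volatile void *vaddr)
    {
        uint32_t va = (uint32_t)vaddr;

        /* 1. Set the preserve (P) bit so the next walk allocates a locked entry. */
        __asm__ volatile("mcr p15, 0, %0, c10, c0, 0" :: "r"(1u << 0));

        /* 2. Drop any existing entry for this address (TLBIMVA). */
        __asm__ volatile("mcr p15, 0, %0, c8, c7, 1" :: "r"(va & ~0xfffu));

        /* 3. Touch the address so a table walk loads it into the locked region. */
        (void)*(const volatile uint8_t *)vaddr;

        /* 4. Clear the preserve bit so later walks use the normal entries again. */
        __asm__ volatile("mcr p15, 0, %0, c10, c0, 0" :: "r"(0u));
        __asm__ volatile("isb");
    }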

...

When the MMU fetches translation table entries it ignores the L1 cache unless special bits in the Translation Table Base Register tell it that the table is write-back cacheable. Apparently write-through caching isn't good enough, but ignoring the L1 cache in that case is correct, if slow.
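
A sketch of what those bits might look like, assuming the ARMv7 TTBR0 layout with the Multiprocessing Extensions (IRGN split across bits 6 and 0, RGN in bits 4:3, S in bit 1). The macro names and the helper are ours, and the encodings should be checked against the ARM ARM for the core in use.

    #include <stdint.h>

    /* Sketch: program TTBR0 so hardware table walks treat the translation
     * table as inner/outer write-back write-allocate cacheable.  Bit
     * positions assume the ARMv7 TTBR0 format with the Multiprocessing
     * Extensions; verify them before use. */
    #define TTBR_IRGN_WBWA   (1u << 6)   /* inner write-back, write-allocate */
    #define TTBR_RGN_WBWA    (1u << 3)   /* outer write-back, write-allocate */
    #define TTBR_SHAREABLE   (1u << 1)   /* table walks are shareable        */

    static void set_ttbr0(const uint32_t *l1_table)
    {
        uint32_t ttbr0 = (uint32_t)l1_table      /* 16 KB aligned L1 table base */
                       | TTBR_IRGN_WBWA
                       | TTBR_RGN_WBWA
                       | TTBR_SHAREABLE;

        __asm__ volatile("mcr p15, 0, %0, c2, c0, 0" :: "r"(ttbr0));  /* write TTBR0 */
        __asm__ volatile("isb");
    }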

Proposed MMU translation tables for RTEMS

As far as possible we keep to a single Level 1 translation table, in which each entry describes a 1 MB "section" of address space; its 4096 entries cover the entire 4 GB address space. All entries specify access domain zero, which will be set up for Client access, meaning that the translation table entries themselves specify the access permissions. All entries will be global, meaning that they apply to all code regardless of threading; the table is never modified by context switches. The address mapping is the identity, i.e., virtual == real.
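
A minimal sketch of such a table, using the ARMv7 short-descriptor "section" format. The names and the particular attribute choices are illustrative, not the final RTEMS values.

    #include <stdint.h>

    /* 4096 section entries, identity mapped, domain 0, global (nG = 0).
     * Descriptor bits follow the ARMv7 short-descriptor "section" format. */
    #define L1_SECTION        0x2u               /* bits[1:0] = 0b10: section entry */
    #define L1_B              (1u << 2)          /* bufferable                      */
    #define L1_C              (1u << 3)          /* cacheable                       */
    #define L1_XN             (1u << 4)          /* execute never                   */
    #define L1_DOMAIN(d)      ((uint32_t)(d) << 5)
    #define L1_AP_RW          (0x3u << 10)       /* AP[1:0] = 0b11: full access     */
    #define L1_S              (1u << 16)         /* shareable                       */
    #define L1_NO_ACCESS      0x0u               /* bits[1:0] = 0b00: fault entry   */

    /* Normal, write-back cacheable, read/write, executable RAM. */
    #define L1_ATTR_RAM   (L1_SECTION | L1_C | L1_B | L1_AP_RW | L1_S | L1_DOMAIN(0))
    /* Shareable Device, read/write, never executable. */
    #define L1_ATTR_IO    (L1_SECTION | L1_B | L1_AP_RW | L1_XN | L1_DOMAIN(0))

    /* 4096 entries x 1 MB sections = 4 GB; the table must be 16 KB aligned. */
    static uint32_t zynq_mmu_l1_table[4096] __attribute__((aligned(16 * 1024)));

    static void map_sections(uint32_t start, uint32_t end, uint32_t attr)
    {
        /* Identity map: section index == physical megabyte. */
        for (uint32_t mb = start >> 20; mb <= (end >> 20); ++mb)
            zynq_mmu_l1_table[mb] = (mb << 20) | attr;
    }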

For a few 1 MB sections we require finer granularity and provide second-level tables of "small" pages of 4 KB each. The on-chip memory and the RCE protocol plugins will most likely receive this treatment; how much address space the latter require will be calculated at system startup.
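
Where that finer granularity is needed, one Level 1 entry becomes a pointer to a 256-entry second-level table (1 KB, 1 KB aligned) of small-page descriptors. A sketch, again with illustrative names and attributes drawn from the ARMv7 short-descriptor format:

    #include <stdint.h>

    #define L1_COARSE_TABLE   0x1u               /* L1 bits[1:0] = 0b01: page table ptr */
    #define L2_SMALL_PAGE     0x2u               /* L2 bit[1] = 1: small page           */
    #define L2_XN             (1u << 0)          /* execute never (small page)          */
    #define L2_B              (1u << 2)
    #define L2_C              (1u << 3)
    #define L2_AP_RW          (0x3u << 4)        /* AP[1:0] = 0b11: full access         */
    #define L2_S              (1u << 10)

    static uint32_t ocm_l2_table[256] __attribute__((aligned(1024)));

    /* Point one L1 entry at the second-level table and identity map its pages. */
    static void map_1mb_as_small_pages(uint32_t *l1_table, uint32_t section_base)
    {
        for (uint32_t i = 0; i < 256; ++i) {
            uint32_t page = section_base + (i << 12);
            ocm_l2_table[i] = page | L2_SMALL_PAGE | L2_C | L2_B | L2_AP_RW | L2_S;
        }
        l1_table[section_base >> 20] = (uint32_t)ocm_l2_table | L1_COARSE_TABLE;
    }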

The first-level translation table is part of the system image containing RTEMS. It has its own object code section named ".mmu_table" so that the linker script used to create the image can place it somewhere suitable, independently of the placement of other data; this includes giving it the proper alignment. If we keep to a small number of second-level tables, say ten or so, we can reserve space for them statically at the end of the .mmu_table section. Each second-level table occupies 1 KB.
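
In C the placement might look like the sketch below; the section name matches the text, while the variable names and the pool of ten second-level tables are illustrative. The linker script then only needs an output rule that collects .mmu_table at a suitably aligned address.

    #include <stdint.h>

    /* The Level 1 table plus a small static pool of second-level tables,
     * all carried in the dedicated ".mmu_table" section so the linker
     * script controls their placement and alignment. */
    static uint32_t zynq_mmu_l1_table[4096]
        __attribute__((section(".mmu_table"), aligned(16 * 1024)));

    static uint32_t zynq_mmu_l2_pool[10][256]   /* ten 1 KB second-level tables */
        __attribute__((section(".mmu_table"), aligned(1024)));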

The default Level 1 memory map

This default map is established at system startup. OCM and DDR have the attributes Normal, Read/Write, Executable and Cacheable. Memory-mapped I/O regions are Device, Read/Write, Non-executable and Outer Shareable. Regions reserved by Xilinx will be No-access, causing a permission fault on any attempted use.

Address range            Region size   Mapping type   Resources included
0x00000000-0x3fffffff    1 GB          RAM            DDR + low OCM
0x40000000-0x7fffffff    1 GB          I/O            PL AXI slave port 0
0x80000000-0xbfffffff    1 GB          I/O            PL AXI slave port 1
0xc0000000-0xdfffffff    512 MB        No access
0xe0000000-0xefffffff    256 MB        I/O            IOP devices
0xf0000000-0xf7ffffff    128 MB        No access
0xf8000000-0xf9ffffff    32 MB         I/O            Registers on AMBA APB bus
0xfa000000-0xfbffffff    32 MB         No access
0xfc000000-0xfdffffff    32 MB         I/O            Quad-SPI
0xfe000000-0xffffffff    32 MB         No access      Unsupported Quad-SPI + high OCM
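
A table-driven sketch of installing this map at startup, reusing the illustrative L1_ATTR_RAM / L1_ATTR_IO / L1_NO_ACCESS macros and the map_sections() helper from the section sketch above; a No-access attribute leaves the entries as fault descriptors.

    #include <stdint.h>

    struct mmu_region { uint32_t start, end, attr; };

    /* Region boundaries taken from the default map above. */
    static const struct mmu_region zynq_default_map[] = {
        { 0x00000000u, 0x3fffffffu, L1_ATTR_RAM   },  /* DDR + low OCM                   */
        { 0x40000000u, 0x7fffffffu, L1_ATTR_IO    },  /* PL AXI slave port 0             */
        { 0x80000000u, 0xbfffffffu, L1_ATTR_IO    },  /* PL AXI slave port 1             */
        { 0xc0000000u, 0xdfffffffu, L1_NO_ACCESS  },  /* reserved                        */
        { 0xe0000000u, 0xefffffffu, L1_ATTR_IO    },  /* IOP devices                     */
        { 0xf0000000u, 0xf7ffffffu, L1_NO_ACCESS  },  /* reserved                        */
        { 0xf8000000u, 0xf9ffffffu, L1_ATTR_IO    },  /* registers on AMBA APB bus       */
        { 0xfa000000u, 0xfbffffffu, L1_NO_ACCESS  },  /* reserved                        */
        { 0xfc000000u, 0xfdffffffu, L1_ATTR_IO    },  /* Quad-SPI                        */
        { 0xfe000000u, 0xffffffffu, L1_NO_ACCESS  },  /* unsupported Quad-SPI + high OCM */
    };

    static void install_default_map(void)
    {
        for (unsigned i = 0; i < sizeof(zynq_default_map) / sizeof(zynq_default_map[0]); ++i)
            map_sections(zynq_default_map[i].start,
                         zynq_default_map[i].end,
                         zynq_default_map[i].attr);
    }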

Caches

The Zynq-7000 system has both L1 and L2 caching. Each CPU has its own L1 instruction and data caches; all CPUs share a unified L2 cache. The L1 caches don't support entry locking, but the L2 cache does. The L2 cache can also operate in a so-called exclusive mode, which prevents a cache line from being present in an L1 data cache and in the L2 cache at the same time.
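
A sketch of selecting that exclusive configuration on the Zynq's L2 controller (an ARM L2C-310/PL310, whose register block we assume at 0xF8F02000): bit 12 of the auxiliary control register requests exclusive caching, and that register may only be changed while the controller is disabled, so this would normally run once during early startup before the L2 holds live data. The Cortex-A9 side has a matching exclusive-cache bit in its own Auxiliary Control Register that would also need setting; offsets and bits should be verified against the L2C-310 and Cortex-A9 TRMs.

    #include <stdint.h>

    #define L2C_BASE        0xF8F02000u                                 /* Zynq PL310 base */
    #define L2C_CTRL        (*(volatile uint32_t *)(L2C_BASE + 0x100))  /* bit 0: enable   */
    #define L2C_AUX_CTRL    (*(volatile uint32_t *)(L2C_BASE + 0x104))
    #define L2C_AUX_EXCL    (1u << 12)                                  /* exclusive mode  */

    static void l2c_enable_exclusive(void)
    {
        L2C_CTRL &= ~1u;                 /* disable the L2 before reconfiguring  */
        L2C_AUX_CTRL |= L2C_AUX_EXCL;    /* request exclusive L1-data/L2 caching */
        L2C_CTRL |= 1u;                  /* re-enable the L2                     */
    }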

...