Basic architecture

The architecture implemented is ARMv7-A. The "v7" is the architecture version, which determines the instruction set revision, whereas the "A" stands for the "Application" profile.

Instruction set

v7 actually comprises several different instruction sets:

The standard ARM instruction set. Each instruction is 32 bits long and aligned on a 32-bit boundary. The full set of general registers is available. Shift operations may be combined with arithmetic and logical operations. This is the instruction set we'll be using for our project. Oddly, the integer divide instructions are optional and the Zynq CPUs don't have them.
Thumb-2. Designed for greater code density. Contains a mix of 16-bit and 32-bit instructions. Most of the 16-bit instructions can access only general registers 0-7.
Jazelle. Direct execution of Java byte code.
ThumbEE. A sort of hybrid of Thumb and Jazelle, actually a CPU operation mode rather than a separate encoding. Intended for environments where code modification is frequent, such as ones having a JIT compiler.
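
Because the Zynq CPUs lack a hardware integer divide, division has to be done in software; toolchains normally link in a runtime helper for this (the ARM EABI names the unsigned one `__aeabi_uidiv`). The following is only a sketch of the classic shift-and-subtract method such a helper can use, not the code of any particular library. The function name `udiv32` is made up for this illustration.

```python
def udiv32(numerator, denominator):
    """Return (quotient, remainder) of an unsigned 32-bit division.

    Uses only shifts, compares and subtracts, producing one quotient
    bit per loop iteration, as a CPU without a divide instruction would.
    """
    numerator &= 0xFFFFFFFF
    denominator &= 0xFFFFFFFF
    if denominator == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Bring down the next numerator bit, most significant first.
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        if remainder >= denominator:
            remainder -= denominator
            quotient |= 1 << bit
    return quotient, remainder


print(udiv32(100, 7))
```

The loop always runs 32 iterations, which is why division by a constant is usually better rewritten as a multiply and shift on this CPU.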

Coprocessors

The ARM instruction set has a standard coprocessor interface which allows up to 16 distinct coprocessors.

Coprocessor 15, CP15, is a pseudo-coprocessor which performs cache and MMU control as well as other system control functions.

CPs 10 and 11 are reserved for floating point and vector hardware, which in this system are both part of the NEON extension. CP14 is used for debug, trace, Jazelle and ThumbEE configuration.
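
System control functions in CP15 are reached with the coprocessor register transfer instructions MRC (read) and MCR (write). As an illustration of how those instructions name a coprocessor and a register, here is a sketch that assembles the 32-bit ARM encoding by hand. The function name `encode_cp_transfer` is invented for this example, and the bit layout is as I understand it from the ARMv7-A manual, so check it there before relying on it.

```python
def encode_cp_transfer(read, coproc, opc1, rt, crn, crm, opc2, cond=0xE):
    """Encode 'MRC/MCR p<coproc>, <opc1>, r<rt>, c<crn>, c<crm>, <opc2>'.

    read=True gives MRC (coprocessor to ARM register), read=False gives MCR.
    cond=0xE is the "always" condition code.
    """
    fields = (("cond", cond, 15), ("coproc", coproc, 15), ("opc1", opc1, 7),
              ("rt", rt, 15), ("crn", crn, 15), ("crm", crm, 15),
              ("opc2", opc2, 7))
    for name, value, limit in fields:
        if not 0 <= value <= limit:
            raise ValueError(f"{name} out of range: {value}")
    return (cond << 28 | 0b1110 << 24 | opc1 << 21 | int(bool(read)) << 20
            | crn << 16 | rt << 12 | coproc << 8 | opc2 << 5 | 1 << 4 | crm)


# MRC p15, 0, r0, c1, c0, 0 : read the System Control Register into r0.
print(hex(encode_cp_transfer(True, 15, 0, 0, 1, 0, 0)))
# MCR p15, 0, r0, c8, c7, 0 : invalidate the entire unified TLB.
print(hex(encode_cp_transfer(False, 15, 0, 0, 8, 7, 0)))
```

The coprocessor number occupies a four-bit field, which is where the limit of 16 coprocessors comes from.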

Options and extensions

There are a number of options and extensions available for a Cortex-A CPU. The following table lists them and indicates whether they are available on the Zynq.

Name	On Zynq?	Description
ARM instruction set	Yes
Thumb-2	Yes
Jazelle
ThumbEE
Integer divide instructions	No
MMU	Yes
Fast multiply	Yes	Improved integer multiplication
VFP3-32	Yes	Vector floating point rev. 3 with 32 double-sized registers
VFP4-x		Vector floating point rev. 4
NEON	Yes	Vector integer and floating point
EVA		Extended virtual addresses (40 bits)
Timer
MPCore	Yes	Multiple CPU cores sharing memory

ARM CPU options used on the Zynq

zynq> cat /proc/cpuinfo
Processor       : ARMv7 Processor rev 0 (v7l)
processor       : 0
BogoMIPS        : 1594.16

processor       : 1
BogoMIPS        : 1594.16

Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc09
CPU revision    : 0
                                                                                
Hardware        : Xilinx Zynq Platform                                          
Revision        : 0000                                                          
Serial          : 0000000000000000
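
The ID fields in this output identify the core, and the Features line lists the options the kernel detected. As a rough sketch of how to read them, the code below parses a copy of the relevant lines. The function names and the lookup tables are made up for this example and cover only the values seen above (implementer 0x41 is ARM, part 0xc09 is the Cortex-A9).

```python
CPUINFO = """\
Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc09
CPU revision    : 0
"""

# Deliberately tiny tables: only the codes that appear in the output above.
IMPLEMENTERS = {0x41: "ARM"}
PARTS = {0xC09: "Cortex-A9"}


def parse_cpuinfo(text):
    """Split 'key : value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_cpu(fields):
    """Return e.g. 'ARM Cortex-A9 r3p0' (variant = major, revision = minor)."""
    implementer = IMPLEMENTERS.get(int(fields["CPU implementer"], 0), "unknown")
    part = PARTS.get(int(fields["CPU part"], 0), "unknown")
    variant = int(fields["CPU variant"], 0)
    revision = int(fields["CPU revision"], 0)
    return f"{implementer} {part} r{variant}p{revision}"


fields = parse_cpuinfo(CPUINFO)
features = set(fields["Features"].split())
print(describe_cpu(fields))
print("NEON:", "neon" in features, "VFPv3:", "vfpv3" in features)
```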

MMU

There appears to be no way to avoid having a translation table in
memory. You cannot insert a TLB entry into the MMU directly; you can
only invalidate existing entries. To install a new translation you must
first update the translation table and then invalidate the TLB entries
for the affected virtual addresses, so that the next access to them
makes the MMU read the table again. If you wanted to, you could make
sure that the translation table had only N valid entries, where N is the
TLB size, and make the rest of the table entries dummies that would not
be loaded by the MMU. This seems pointless, though, because the MMU
would still have to access memory, fetch an entry and then discard it if
it's a dummy. You can disable the reading of the translation table, but
then the TLB would never change except by losing the entries that are
invalidated.
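
The update-then-invalidate rule can be illustrated with a toy model. This is not how the hardware is programmed. The class `ToyMmu` and its methods are invented for the illustration, and it assumes 1 MB sections selected by the top 12 bits of the virtual address, as in the first-level table of the ARMv7 short-descriptor format.

```python
class ToyMmu:
    """Toy model: an in-memory table plus a TLB that caches table entries."""

    def __init__(self):
        self.table = {}   # section index -> physical section base (the "memory" copy)
        self.tlb = {}     # cached translations, filled only by table walks
        self.walks = 0    # number of table walks performed

    def translate(self, va):
        index = va >> 20                      # VA[31:20] selects the table entry
        if index not in self.tlb:
            self.walks += 1                   # TLB miss: walk the table
            if index not in self.table:
                raise LookupError("translation fault")
            self.tlb[index] = self.table[index]
        return self.tlb[index] | (va & 0xFFFFF)

    def invalidate(self, va):
        """Drop any cached translation for the section containing va."""
        self.tlb.pop(va >> 20, None)


mmu = ToyMmu()
mmu.table[0x001] = 0x10000000
first = mmu.translate(0x00100123)      # walk, then cached

mmu.table[0x001] = 0x20000000          # step 1: update the table in memory
stale = mmu.translate(0x00100123)      # TLB still holds the old mapping

mmu.invalidate(0x00100123)             # step 2: invalidate the affected entry
fresh = mmu.translate(0x00100123)      # next access walks the table again
print(hex(first), hex(stale), hex(fresh), mmu.walks)
```

The middle lookup shows why the invalidate step cannot be skipped: until it happens the old translation remains in use.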

Automatic replacement of TLB entries normally uses a "pseudo round
robin" algorithm, not the "least recently used" algorithm implemented in
the PowerPC. The only way to keep heavily used entries in the TLB
indefinitely is to explicitly lock them in, which you can do with up to
four entries. These locked entries occupy a special part of the TLB
which is separate from the normal main TLB, so you don't lose entry
slots if you use locking.
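
The difference between round-robin replacement and locking can be seen in a small simulation. This is a toy with made-up sizes and a plain (not "pseudo") round-robin victim counter. The class `ToyTlb` is hypothetical and does not describe the real TLB organisation.

```python
class ToyTlb:
    """Toy TLB: a round-robin main array plus a separate lockdown region."""

    def __init__(self, main_slots=4, lock_slots=4):
        self.main = [None] * main_slots   # (page, frame) pairs or None
        self.locked = {}                  # page -> frame, never replaced
        self.lock_slots = lock_slots
        self.victim = 0                   # next main slot to overwrite

    def lock(self, page, frame):
        if page not in self.locked and len(self.locked) >= self.lock_slots:
            raise RuntimeError("no free lockdown entries")
        self.locked[page] = frame

    def fill(self, page, frame):
        """Install a translation in the main TLB, evicting round-robin."""
        self.main[self.victim] = (page, frame)
        self.victim = (self.victim + 1) % len(self.main)

    def lookup(self, page):
        if page in self.locked:
            return self.locked[page]
        for entry in self.main:
            if entry is not None and entry[0] == page:
                return entry[1]
        return None                       # miss: a table walk would be needed


tlb = ToyTlb()
tlb.lock(0, 100)                 # heavily used page, locked in
for page in (1, 2, 3, 4):
    tlb.fill(page, 100 + page)   # main TLB is now full
tlb.lookup(1)                    # using page 1 does not protect it...
tlb.fill(5, 105)                 # ...it is the next round-robin victim
print(tlb.lookup(0), tlb.lookup(1), tlb.lookup(5))
```

Page 1 is evicted despite having just been used, whereas an LRU policy would have evicted page 2; the locked page is unaffected and all four main slots remain available.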

In a multi-core system like the Zynq all the CPUs can share the same
translation table if all of the following conditions are met:

(1) All CPUs are in SMP mode.

(2) All CPUs are in TLB maintenance broadcast mode.

(3) All the MMUs are given the same physical base address for
the translation table.

(4) The translation table is in memory marked Normal, Shareable
with write-back caching.

Under these conditions any CPU can change an address translation as if
it were alone and have the changes broadcast to the other CPUs.

When the MMU fetches translation table entries it bypasses the L1
cache unless you set the cacheability bits in the Translation Table Base
Register to tell it that the table is write-back cached. Apparently
write-through caching isn't good enough to make it use the L1 cache, but
bypassing the L1 cache in that case is still correct, if slow, because a
write-through cache also updates the next level of memory.
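
As a sketch of what setting those bits looks like, the code below builds a TTBR0 value that marks the table walks as Shareable and write-back, write-allocate cacheable. The bit positions are my reading of the TTBR0 layout with the multiprocessing extensions and should be checked against the architecture manual. The constant names and `make_ttbr0` are invented for this example, and the 16 KB alignment assumes the full-size first-level table (TTBCR.N = 0).

```python
# Assumed TTBR0 attribute bits (ARMv7-A with multiprocessing extensions).
TTBR_IRGN_WBWA = 1 << 6     # inner: IRGN[1:0] = 0b01; IRGN[0] is bit 6, IRGN[1] is bit 0
TTBR_S = 1 << 1             # table walks are to Shareable memory
TTBR_RGN_WBWA = 0b01 << 3   # outer: RGN = 0b01, write-back write-allocate

TABLE_ALIGN = 16 * 1024     # 4096 entries x 4 bytes


def make_ttbr0(table_base):
    """Combine a physical table base address with the cacheability bits."""
    if not 0 <= table_base <= 0xFFFFFFFF:
        raise ValueError("table base must be a 32-bit physical address")
    if table_base % TABLE_ALIGN:
        raise ValueError("table base must be 16 KB aligned")
    return table_base | TTBR_IRGN_WBWA | TTBR_RGN_WBWA | TTBR_S


print(hex(make_ttbr0(0x00100000)))
```

The resulting value would be written to CP15 register c2 on every CPU that shares the table, so that all of them walk it with the same attributes.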

Caches

SMP support

System state after a reset

MMU: disabled, with the TLB disabled. The contents of the TLB entries are random, so one must at least invalidate all TLB entries before enabling the MMU. With the MMU disabled all instruction fetches are assumed to be to Normal memory while data accesses are assumed to be to Strongly-ordered memory.
