Hardware Basics

A word on the ARM7TDMI is 4 bytes, making a halfword 2 bytes. All numeric values are stored in little-endian format, where the less significant bytes are stored at lower memory addresses. This is the same format used on PCs, so no explicit conversion is necessary, but it should be noted.
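
As a small illustration of the byte order described above, the following sketch assembles a word and a halfword from individual bytes. The function names read_le32 and read_le16 are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Assemble a 32-bit word from 4 bytes stored in little-endian order:
   the byte at the lowest address is the least significant. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Assemble a 16-bit halfword the same way. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Given the bytes 78 56 34 12 in ascending address order, read_le32 yields the word 0x12345678.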


The CPU

The CPU (or central processing unit) is an ARM7TDMI core clocked at 16.78 MHz. It is a very powerful 32-bit RISC CPU with a hardware multiply and a second, less extensive instruction set known as Thumb. The CPU can only execute one instruction set at a time, but the switching of CPU modes is absorbed into a branch, making the switch free. All ARM opcodes are contained in 4 bytes and MUST be word aligned, otherwise the CPU will enter an undefined state. Thumb opcodes are contained in a halfword and must likewise be halfword aligned. A nifty feature of the ARM instruction set is that all opcodes are conditionally executed, making branches much less necessary and allowing quite complex code to be written in a short space.


Memory Map

Description | Base | Size | Access Width | Wait States | Notes
System ROM | $00000000 | unknown | 8/16/32 | none | Cannot be accessed in user mode
External RAM | $02000000 | $40000 | 8/16/32 | +2N/+2S | -
Work RAM | $03000000 | $8000 | 8/16/32 | none | Typically variables and inner loop code are stored here
I/O Register Space | $04000000 | Not Contiguous | 8/16/32 | variable | All system interface is handled through here
Palette RAM | $05000000 | $400 | 16/32 | none | Background and sprite palette
Video RAM | $06000000 | $18000 | 16/32 | Accesses may stall during display | Contains all graphics and maps
Sprite RAM (OAM) | $07000000 | $400 | 16/32 | Cannot be accessed outside of V-Blank | Contains all sprite attributes
Game ROM | $08000000 | 0..32 MB | 8/16/32 | variable | All user code is initially held here
Game ROM Mirror | $0A000000 | 0..32 MB | 8/16/32 | variable | A mirror of the game ROM to allow use of multiple speed ROMs in one pack
Game ROM Mirror | $0C000000 | 0..32 MB | 8/16/32 | variable | A mirror of the game ROM to allow use of multiple speed ROMs in one pack
Cart RAM | $0E000000 | 0..64 KB | 8 | variable | Either SRAM or flash ROM

The system ROM contains a boot-up sequence and a number of system utilities collectively called the BIOS (Basic Input Output System). This memory is banked out when the CPU is in user mode and can only be read in SVC mode. A BIOS function is executed using a swi call with arguments in the low registers.

All memory contained within the game pack has variable wait states controlled by the wait state register, where different settings for ROM and RAM may be selected. A wait state is an additional delay incurred during a memory access because of slower components. There are two types of memory access cycle: an N cycle, which is a non-sequential access and is typically the longer of the two, and an S cycle, or sequential access. A sequential access is one that follows the previous memory access by 4 bytes or less. Both N and S accesses cost one cycle from fast, internal memory. The wait states are listed as +N/+S and indicate the additional cycle penalty for each type of access. Thus a read from external RAM actually costs 3 cycles, not 1.
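
The cycle arithmetic above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the burst model (one N access followed by S accesses) is an assumption about how a run of consecutive accesses would be costed.

```c
/* Cycles for a single memory access: one base cycle plus the
   wait states listed for that memory region. */
unsigned access_cycles(unsigned wait_states)
{
    return 1u + wait_states;
}

/* Assumed cost model for a run of consecutive accesses: the first is
   non-sequential (N), every following one is sequential (S). */
unsigned burst_cycles(unsigned count, unsigned wait_n, unsigned wait_s)
{
    if (count == 0u)
        return 0u;
    return access_cycles(wait_n) + (count - 1u) * access_cycles(wait_s);
}
```

With the +2N/+2S figures listed for external RAM, access_cycles(2) gives the 3 cycles quoted above.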


The Display

The display is 240x160 pixels in dimension, and is capable of displaying 15-bit color. Each pixel of the display is actually composed of 3 separate elements placed very close to each other, each generating a different portion of the spectrum. Most colors visible to the human eye can be reasonably simulated using a composition of red, green, and blue light, and almost all computer displays are based on this principle. The display is no exception, and it uses a 5 bit quantization in each component, giving a range of 32 shades of each, and a total of 32768 possible colors.
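
A 15-bit color can be packed into a halfword as sketched below. The text above does not state the order of the components within the halfword; this example assumes red occupies the low 5 bits, then green, then blue. The name rgb15 is hypothetical.

```c
#include <stdint.h>

/* Pack three 5-bit components (0..31 each) into a 15-bit color.
   Assumed layout: red in bits 0-4, green in bits 5-9, blue in bits 10-14. */
uint16_t rgb15(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)((r & 31u) | ((g & 31u) << 5) | ((b & 31u) << 10));
}
```

Under this assumed layout, full-intensity white is rgb15(31, 31, 31), or 0x7FFF, the largest of the 32768 possible values.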

Another concept that is useful to understand, but not essential, is persistence of vision, which is the fundamental basis of all video games, television, and films. Our eyes operate continuously, constantly adjusting to the influx of information we are presented with, but they have a latency in this adjustment. This depends on the individual, but for most sampled images, 30 different images per second are sufficient to give the illusion of motion, and 60-80 are more than adequate for a completely convincing illusion. It is for this reason that televisions operate at 50 to 60 Hz (frames per second), and the designers of the display saw fit to choose 59.727 Hz as our time base (16.78 MHz / 1232 cycles / 228 rasters). The display is automatically updated with this frequency, but for the illusion of motion rather than a still image to appear, we must update the display frequently. When we may do so is discussed in the following paragraphs.

A concept carried over from conventional consoles and computers is the blanking period, which occurs because the electron beam used to 'paint' an image on phosphors must travel back to the left or top side of the screen before it can continue painting the image. The time required for the beam to retrace its steps left and down is known as the horizontal blank and lasts for a relatively short period. Similarly, the vertical retrace is known as the vertical blank, and lasts for a considerable time (in computer terms anyway; it is still a minute fraction of a second).

An LCD screen has virtually no need for such a distinction between rendering phases, as it is entirely electronic, with no mechanical beam deflection, but the blanking periods are still simulated for the simple reason that they are convenient for programmers and make cheaper hardware usable. Typically, the screen contents cannot be modified during display, otherwise the screen would appear to 'tear' or shimmer, as part of the displayed image was drawn before the modification and is not synchronous with the later, modified image. If there were no blanking periods provided to modify the image, a double buffer would be necessary, which would instead be modified for as long as needed, and then instantly swapped for the currently displayed image. This would require more memory and extra hardware, and rule out the possibility of many raster effects.

The display we are concerned with has a visible area of 240x160 pixels, but to accommodate the blanking periods, the H/V clock (the master clock controlling rendering) operates on the larger domain of 308x228 pixels. The H/V clock runs such that one pixel is traversed every 4 CPU cycles, so each line of display, or scanline, takes 1232 cycles to traverse, of which 272 (the 68 pixels beyond the visible 240) are contained in the horizontal blank. Similarly, the vertical blank consists of 68 scanlines, or 83776 cycles, plenty of time for display update. A scanline, or 'raster' as it is also called, is a very important concept that will be used many times ahead.
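
The timing figures can be reproduced from the 308x228 domain and the 4 cycles per pixel given above. The sketch below assumes the exact CPU clock is 2^24 Hz (16777216 Hz), which the text rounds to 16.78 MHz; all names are hypothetical.

```c
#define CPU_HZ           16777216u /* assumed exact clock, 2^24 Hz */
#define CYCLES_PER_PIXEL 4u
#define TOTAL_WIDTH      308u      /* H/V clock domain, in pixels */
#define TOTAL_HEIGHT     228u      /* H/V clock domain, in scanlines */
#define VISIBLE_HEIGHT   160u

/* CPU cycles needed to traverse one full scanline. */
unsigned cycles_per_scanline(void)
{
    return TOTAL_WIDTH * CYCLES_PER_PIXEL;
}

/* CPU cycles available during the vertical blank. */
unsigned vblank_cycles(void)
{
    return (TOTAL_HEIGHT - VISIBLE_HEIGHT) * cycles_per_scanline();
}

/* CPU cycles in one complete frame, blanking included. */
unsigned cycles_per_frame(void)
{
    return TOTAL_HEIGHT * cycles_per_scanline();
}

/* Display refresh rate in Hz. */
double refresh_hz(void)
{
    return (double)CPU_HZ / (double)cycles_per_frame();
}
```

These functions give 1232 cycles per scanline and 83776 cycles of vertical blank, matching the numbers quoted in the text.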

[Figure: display timing diagram showing the Displaying, Horizontal Blanking, and Vertical Blanking regions]

Plane based graphics

The video system is based on the concepts of 'planes' or 'layers', which can be assigned relative priorities to each other and to sprites. There are 4 available layers, conventionally labeled BG0, BG1, BG2, and BG3. Each plane can be individually controlled in almost every aspect, except what sort of graphics it can display, which is controlled by the overall video mode. There is a fifth plane that is always behind all others and cannot be controlled directly, called the backdrop. This backdrop can only be seen through transparent holes in the layers above, and is a solid plane of the color specified by palette entry 0 in the background palette (more to come on palettes).

There are 6 video modes, each with varying restrictions on the type of planes it can display, as summarised in the table below.

Mode | BG0 | BG1 | BG2 | BG3
0 | Text | Text | Text | Text
1 | Text | Text | Rot/Scale | -
2 | - | - | Rot/Scale | Rot/Scale
3 | - | - | 15-bit Bitmap | -
4 | - | - | 8-bit Bitmap | -
5 | - | - | 15-bit Bitmap | -

Text Screen

A text screen is similar in effect to the text mode of computers, where there is an array of entries that specify which characters or 'tiles' are to be displayed at any given location. Most personal computers support multiple character or tile sizes, with the standard often being 8x12 or 9x16, however here they are always restricted to exactly 8x8 pixels.


A text screen can be individually panned in reference to the viewport, and can take on any of 4 different sizes: 256x256 pixels, 512x256 pixels, 256x512 pixels, and 512x512 pixels (or 32x32 to 64x64 tiles). There are a variety of I/O registers associated with each text screen, but the key register for each background is the control register, where the mapping and character base addresses and sizes can be specified.

Two types of tiles can be used: 16 color and 256 color (however, entry zero in each is always transparent). A 16 color tile takes up 32 bytes, and up to 1024 of these may be addressed in a tile map. However, only 512 full color tiles may be addressed at any time, using even tile indices (i.e. 0, 2, 4...). The text plane is confined to use only one type of tile throughout, as specified in the control register.
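
The tile sizes follow from the 8x8 tile dimensions. The sketch below assumes 4 bits per pixel for 16 color tiles and 8 bits per pixel for 256 color tiles, and assumes tile indices count in 32 byte units, which would explain why a full color tile (twice the size) lands on even indices. The function names are hypothetical.

```c
/* Bytes occupied by one 8x8 tile at the given color depth. */
unsigned tile_bytes(unsigned bits_per_pixel)
{
    return 8u * 8u * bits_per_pixel / 8u;
}

/* Assumed byte offset of a tile from the tile base address,
   with indices counted in units of one 16 color tile (32 bytes). */
unsigned tile_offset(unsigned index)
{
    return index * tile_bytes(4u);
}
```

Under these assumptions a 16 color tile is 32 bytes and a 256 color tile is 64 bytes, so consecutive full color tiles sit at indices 0, 2, 4 and so on.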

Each text screen map entry occupies a halfword and contains a 10 bit tile index and a few other salient attributes, such as horizontal and vertical flipping, and a palette index if the plane is in 16 color mode.
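
A map entry could be assembled as sketched below. The exact bit positions are not given in the text above; this example assumes the tile index sits in bits 0-9, horizontal flip in bit 10, vertical flip in bit 11, and the palette index in bits 12-15. The function names are hypothetical.

```c
#include <stdint.h>

/* Build a text screen map entry.
   Assumed layout: tile index in bits 0-9, horizontal flip in bit 10,
   vertical flip in bit 11, 16 color palette index in bits 12-15. */
uint16_t map_entry(unsigned tile, int hflip, int vflip, unsigned palette)
{
    return (uint16_t)((tile & 0x3FFu)
                    | (hflip ? (1u << 10) : 0u)
                    | (vflip ? (1u << 11) : 0u)
                    | ((palette & 0xFu) << 12));
}

/* Extract the 10 bit tile index from a map entry. */
unsigned map_entry_tile(uint16_t entry)
{
    return entry & 0x3FFu;
}
```

For example, tile 5 flipped horizontally with palette 2 would be map_entry(5, 1, 0, 2).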

Rotation and Scaling Screen

A rotation and scaling capable screen is similar to a text screen, in that it is composed of a tile map and tiles. However, many powerful effects can be realized in these planes, as there is a hardware texture shear matrix available. These planes are more restricted than their text brethren and may only address 256 full color tiles, and thus the tile map entries are 1 byte in size. The screen is also always square, but its size may be chosen from 128, 256, 512, or 1024 pixels on a side (or 16 to 128 tiles square).

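The texture (rot/scale plane) is addressed for each pixel on the screen via a 2x2 matrix and a reference vector. The formula for converting from screen space to texture space is thus:

\[
\begin{pmatrix} \mathrm{texture}_x \\ \mathrm{texture}_y \end{pmatrix}
=
\begin{pmatrix} A & B \\ C & D \end{pmatrix}
\begin{pmatrix} \mathrm{screen}_x - \mathrm{center}_x \\ \mathrm{screen}_y - \mathrm{center}_y \end{pmatrix}
+
\begin{pmatrix} \mathrm{center}_x \\ \mathrm{center}_y \end{pmatrix}
\]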
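
The screen to texture conversion can be sketched directly from the formula. This example uses doubles for readability rather than the hardware's register format, and the type and function names are hypothetical.

```c
/* A 2D point or vector. */
typedef struct { double x, y; } vec2;

/* Apply the 2x2 matrix (a b / c d) to the screen position relative
   to the center, then add the center back, as in the formula above. */
vec2 screen_to_texture(double a, double b, double c, double d,
                       vec2 center, vec2 screen)
{
    double dx = screen.x - center.x;
    double dy = screen.y - center.y;
    vec2 texture;
    texture.x = a * dx + b * dy + center.x;
    texture.y = c * dx + d * dy + center.y;
    return texture;
}
```

With the identity matrix (A = D = 1, B = C = 0) every screen pixel maps to the same texture coordinate, and with A = D = 0.5 the texture steps half a texel per screen pixel around the center.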

If a rotated texture is desired, the shear matrix becomes a standard rotation matrix, and scaling coefficients can be specified at the same time for added flexibility.

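\[
A = \frac{1}{a}\cos\theta \qquad
B = \frac{1}{a}\sin\theta \qquad
C = -\frac{1}{b}\sin\theta \qquad
D = \frac{1}{b}\cos\theta
\]

Here \(\theta\) is the rotation angle, and a and b are the scaling coefficients.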

The actual matrix coefficients are stored in I/O registers in a format called fixed point, which simulates a floating point number in a standard 2's complement format. The width of a fixed point number is quoted as its integer portion followed by its fractional portion, so an 8.8 fixed point number is stored in 16 bits, and the lower 8 bits are designated as a fraction, although the number itself carries no explicit information about this.

When multiplying or dividing two fixed point numbers, care must be taken to renormalize the resultant value. For instance, when multiplying two 8.8 numbers, you are actually multiplying (integer1*2^8 + fraction1) * (integer2*2^8 + fraction2), which gives (integer1*integer2*2^16 + fraction1*integer2*2^8 + fraction2*integer1*2^8 + fraction1*fraction2), which is now a 16.16 fixed point number. A shift right by 8 bits will renormalize this number back to 8 fractional bits.

When multiplying or dividing a fixed point number by a scalar, no renormalization is necessary, nor is it necessary when adding or subtracting two fixed point numbers, but you must convert a scalar number into a fixed point number before adding it to, or subtracting it from, a fixed point number.
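
The renormalization rules can be sketched as follows. This is a minimal illustration, not the hardware's behaviour: the type fx8 and the function names are hypothetical, values are held in 32 bits so intermediate results have room, and the right shift of a negative value is assumed to be arithmetic (which is implementation-defined in C).

```c
#include <stdint.h>

typedef int32_t fx8;   /* fixed point value with 8 fractional bits */

#define FX_ONE 256     /* the value 1.0 with 8 fractional bits */

/* Convert a scalar to fixed point before adding or subtracting. */
fx8 fx_from_int(int n)
{
    return (fx8)(n * FX_ONE);
}

/* Multiply two fixed point values: the raw product carries 16
   fractional bits, so shift right by 8 to renormalize. */
fx8 fx_mul(fx8 a, fx8 b)
{
    return (fx8)(((int64_t)a * (int64_t)b) >> 8);
}

/* Divide two fixed point values: pre-scale the dividend by 2^8 so the
   quotient keeps 8 fractional bits. The divisor must not be zero. */
fx8 fx_div(fx8 a, fx8 b)
{
    return (fx8)(((int64_t)a * FX_ONE) / b);
}
```

For example, 1.5 is 0x180 and 2.0 is 0x200, and fx_mul(0x180, 0x200) gives 0x300, which is 3.0.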

The effect commonly dubbed 'mode 7', which produces a 3D plane, is not directly supported in hardware, but can be simulated with clever programming. (The name is actually a misnomer, as mode 7 was just a rot/scale mode, where this effect had to be faked in the same manner as described here.) If the matrix registers are updated each scanline with a suitable formula, the 1/z perspective distortion can be achieved.

The math for doing so varies depending on the effect and flexibility required, but can be derived with some basic geometry on paper before coding. The CPU is not really fast enough to do all of the math during rendering, so a common technique is to precalculate the values in a table, which is copied to the I/O registers during each horizontal retrace either via the CPU or a repeating DMA channel.

Bitmap Screens

A bitmap screen is perhaps more familiar to those from a personal computer background. It simply consists of a 2 dimensional array of colors that specify what color each screen pixel should be. There are 3 different bitmap modes available, each with its own pros and cons. In all three modes, however, the portion of VRAM reserved for sprites is cut in half, but the numbering system is not changed (see the section on sprites for more information).

Mode 3 is a 240x160 screen, where each entry is 15 bits packed in a halfword, capable of any color from the full displayable range. However, this uses up the majority of VRAM and does not allow for a back or double buffer to access while the front is being displayed and thus is typically only used for static images, such as titles, cutscenes, and credits.

Mode 4 is a 240x160 paletted mode, where each entry is an 8-bit index into the background color palette. Since this uses half of the VRAM of mode 3, two pages of VRAM are provided, and which one is displayed at any given time can be selected via a bit in the global display control register. This mode allows for both double buffered rendering and palette effects, and is thus suitable for a wide range of applications.

Mode 5 is a 160x128 pixel screen with 15-bit pixels, and is a hybrid of modes 3 and 4. It provides two pages like mode 4, but the full color range of mode 3. The reduced resolution reserves this mode typically for cutscenes and FMV (full motion video) interludes.
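
The memory cost of each bitmap mode can be worked out from the dimensions and pixel sizes given above. The sketch below also assumes pixels are stored row by row starting at the top left, which the text does not state; all names are hypothetical.

```c
#define VRAM_BYTES 0x18000u /* total video RAM size from the memory map */

/* Bytes needed for one full frame of the given size and pixel width. */
unsigned frame_bytes(unsigned width, unsigned height, unsigned bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Assumed byte offset of pixel (x, y) in mode 3: 240 pixels per row,
   2 bytes per pixel, rows stored top to bottom. */
unsigned mode3_offset(unsigned x, unsigned y)
{
    return (y * 240u + x) * 2u;
}

/* Assumed byte offset of pixel (x, y) within a mode 4 page:
   240 pixels per row, 1 byte per pixel. */
unsigned mode4_offset(unsigned x, unsigned y)
{
    return y * 240u + x;
}
```

A mode 3 frame is 76800 bytes, so two of them would exceed VRAM_BYTES, while a mode 4 frame (38400 bytes) or a mode 5 frame (40960 bytes) leaves room for a second page.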


Video Registers

Offset | Name | Type | Fields, bit F down to bit 0
$000 | DISP_CR | R/W | Sprite Windows Enabled | WIN1 Enabled | WIN0 Enabled | Sprites Enabled | BG3 Enabled | BG2 Enabled | BG1 Enabled | BG0 Enabled | Forced Blank | 1D Sprite Mode | H-Blank OAM Access OK | Frame Buffer Select | CGB Mode | Video Mode
$004 | DISP_SR | R/W | Y-Line Trigger Value | - | Enable Y-Trigger IRQ | Enable H-Blank IRQ | Enable V-Blank IRQ | Y Triggered (R) | In H-Blank (R) | In V-Blank (R)
$006 | DISP_Y | R | - | Current Scanline
$008 | BG0_CR | R/W | Screen Size | - | Screen Base Addr | Unified Palette | Mosaic | 0 | 0 | Tile Base Addr | Priority
$00A | BG1_CR | R/W | Screen Size | - | Screen Base Addr | Unified Palette | Mosaic | 0 | 0 | Tile Base Addr | Priority
$00C | BG2_CR | R/W | Screen Size | Overflow Mode | Screen Base Addr | Unified Palette | Mosaic | 0 | 0 | Tile Base Addr | Priority
$00E | BG3_CR | R/W | Screen Size | Overflow Mode | Screen Base Addr | Unified Palette | Mosaic | 0 | 0 | Tile Base Addr | Priority
$010 | BG0_X | W | - | Horizontal Scroll
$012 | BG0_Y | W | - | Vertical Scroll
$014 | BG1_X | W | - | Horizontal Scroll
$016 | BG1_Y | W | - | Vertical Scroll
$018 | BG2_X | W | - | Horizontal Scroll
$01A | BG2_Y | W | - | Vertical Scroll
$01C | BG3_X | W | - | Horizontal Scroll
$01E | BG3_Y | W | - | Vertical Scroll
$020 | BG2_DX | W | 8.8 H-Step, X-Dir Texture Delta
$022 | BG2_VDX | W | 8.8 V-Step, X-Dir Texture Delta
$024 | BG2_DY | W | 8.8 H-Step, Y-Dir Texture Delta
$026 | BG2_VDY | W | 8.8 V-Step, Y-Dir Texture Delta
$028 | BG2_XL | W | Low 16 of 20.8 X Center
$02A | BG2_XH | W | - | High 12 of 20.8 X Center
$02C | BG2_YL | W | Low 16 of 20.8 Y Center
$02E | BG2_YH | W | - | High 12 of 20.8 Y Center
$030 | BG3_DX | W | 8.8 H-Step, X-Dir Texture Delta
$032 | BG3_VDX | W | 8.8 V-Step, X-Dir Texture Delta
$034 | BG3_DY | W | 8.8 H-Step, Y-Dir Texture Delta
$036 | BG3_VDY | W | 8.8 V-Step, Y-Dir Texture Delta
$038 | BG3_XL | W | Low 16 of 20.8 X Center
$03A | BG3_XH | W | - | High 12 of 20.8 X Center
$03C | BG3_YL | W | Low 16 of 20.8 Y Center
$03E | BG3_YH | W | - | High 12 of 20.8 Y Center
$040 | WIN0_H | W | Left (x0) | Right (x1)
$042 | WIN1_H | W | Left (x0) | Right (x1)
$044 | WIN0_V | W | Top (y0) | Bottom (y1)
$046 | WIN1_V | W | Top (y0) | Bottom (y1)
$048 | WIN_IN | R/W | - | Blends in Win1 | Sprites in Win1 | BG3 in Win1 | BG2 in Win1 | BG1 in Win1 | BG0 in Win1 | - | Blends in Win0 | Sprites in Win0 | BG3 in Win0 | BG2 in Win0 | BG1 in Win0 | BG0 in Win0
$04A | WIN_OUT | R/W | - | Blends in Sprite Win | Sprites in Sprite Win | BG3 in Sprite Win | BG2 in Sprite Win | BG1 in Sprite Win | BG0 in Sprite Win | - | Blends outside | Sprites outside | BG3 outside | BG2 outside | BG1 outside | BG0 outside
$04C | MOSAIC | W | Sprite Y Level | Sprite X Level | Background Y Level | Background X Level
$050 | BLEND_CR | R/W | - | - | reserved | Blend Sprites in alpha | Blend BG3 in alpha | Blend BG2 in alpha | Blend BG1 in alpha | Blend BG0 in alpha | Blend Mode | reserved | Blend Sprites | Blend BG3 | Blend BG2 | Blend BG1 | Blend BG0
$052 | BLEND_AB | W | - | Coefficient B | - | Coefficient A
$054 | BLEND_Y | W | - | Coefficient Y
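
As a sketch of how these registers might be addressed and filled in from C, the example below assumes the offsets in the table are relative to the I/O register base at $04000000 from the memory map, and assumes bit positions read off the DISP_CR row from bit F down to bit 0 (video mode in bits 0-2, BG0 to BG3 enables in bits 8-11, sprites in bit 12). All names are hypothetical.

```c
#include <stdint.h>

#define IO_BASE 0x04000000u /* I/O register space base from the memory map */

/* Assumed DISP_CR bit positions, taken from the order of the table row. */
enum {
    DISP_FORCED_BLANK = 1 << 7,
    DISP_BG0          = 1 << 8,
    DISP_BG1          = 1 << 9,
    DISP_BG2          = 1 << 10,
    DISP_BG3          = 1 << 11,
    DISP_SPRITES      = 1 << 12
};

/* Absolute address of an I/O register given its table offset. */
uint32_t io_address(uint32_t offset)
{
    return IO_BASE + offset;
}

/* Compose a DISP_CR value from a video mode (0-5) and enable bits. */
uint16_t disp_cr_value(unsigned mode, unsigned enable_bits)
{
    return (uint16_t)((mode & 7u) | enable_bits);
}
```

On real hardware the value would then be written through a volatile halfword pointer to io_address(0x000); for instance, disp_cr_value(3, DISP_BG2) selects the mode 3 bitmap on BG2 under these assumptions.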


Sound Registers

Specification not done yet.

Offset | Name | Type | Fields, bit F down to bit 0
addr name write mode - - - - - - - - - - - - - - - -


DMA Source Registers

Offset | Name | Type | Fields, bit F down to bit 0
$0B0 | DMA0_SRC_L | W | Low 16 of 27 bit source address
$0B2 | DMA0_SRC_H | W | - | High 11 of 27 bit source address
$0BC | DMA1_SRC_L | W | Low 16 of 28 bit source address
$0BE | DMA1_SRC_H | W | - | High 12 of 28 bit source address
$0C8 | DMA2_SRC_L | W | Low 16 of 28 bit source address
$0CA | DMA2_SRC_H | W | - | High 12 of 28 bit source address
$0D4 | DMA3_SRC_L | W | Low 16 of 28 bit source address
$0D6 | DMA3_SRC_H | W | - | High 12 of 28 bit source address


DMA Dest. Registers

Offset Name Type FEDCBA98 76543210
$0B4 DMA0_DEST_L W Low 16 of 27 bit destination address
$0B6 DMA0_DEST_H W   High 11 of 27 bit destination address
$0C0 DMA1_DEST_L W Low 16 of 27 bit destination address
$0C2 DMA1_DEST_H W   High 11 of 27 bit destination address
$0CC DMA2_DEST_L W Low 16 of 27 bit destination address
$0CE DMA2_DEST_H W   High 11 of 27 bit destination address
$0D8 DMA3_DEST_L W Low 16 of 28 bit destination address
$0DA DMA3_DEST_H W   High 12 of 28 bit destination address


DMA Count Registers

Offset Name Type FEDCBA98 76543210
$0B8 DMA0_SIZE W   Transfer Count
$0C4 DMA1_SIZE W   Transfer Count
$0D0 DMA2_SIZE W   Transfer Count
$0DC DMA3_SIZE W Transfer Count


DMA Control Registers

Offset | Name | Type | Fields, bit F down to bit 0
$0BA | DMA0_CR | R/W | Enabled | IRQ | Start Mode | - | Width | Repeat | Source Mode | Dest Mode | -
$0C6 | DMA1_CR | R/W | Enabled | IRQ | Start Mode | - | Width | Repeat | Source Mode | Dest Mode | -
$0D2 | DMA2_CR | R/W | Enabled | IRQ | Start Mode | - | Width | Repeat | Source Mode | Dest Mode | -
$0DE | DMA3_CR | R/W | Enabled | IRQ | Start Mode | reserved | Width | Repeat | Source Mode | Dest Mode | -

Dest. Mode (bits 5-6)

Source Mode (bits 7-8)

Repeat

When start modes 1-3 are selected and this bit is set, the DMA will continue to occur each interval until the enable bit in the control register is cleared.

Width

The amount of data transferred is specified by both the count register for a DMA channel and this bit, which indicates either a halfword (0) or word (1) transfer. Thus, count*2 bytes will be transferred on start of DMA if this bit is clear, and count*4 if it is set.

Start Mode (bits 12-13)

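
The Width rule described above amounts to a one line calculation, sketched here with a hypothetical function name:

```c
/* Bytes moved by one DMA transfer: the count register gives the number
   of units, and the width bit selects halfword (0) or word (1) units. */
unsigned dma_bytes(unsigned count, int width_bit)
{
    return count * (width_bit ? 4u : 2u);
}
```

A count of 16 therefore moves 32 bytes with the width bit clear and 64 bytes with it set.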


Timer Registers

There are four 16-bit timers, which count up either at a specified frequency or via a cascade. When a timer overflows (crosses from FFFF to 0000), the following timer is incremented if its cascade bit is set, and an interrupt can occur if the interrupt request bit is set.
Timers are useful for implementing the passage of time in a game, to profile code, or when producing sound.

Timer Data Registers (TIMERx_DATA)

Offset Name Type FEDCBA98 76543210
$100 TIMER0_DATA R/W Timer Value
$104 TIMER1_DATA R/W Timer Value
$108 TIMER2_DATA R/W Timer Value
$10C TIMER3_DATA R/W Timer Value

Timer Value:

16 bit timer value.

Timer Control Registers (TIMERx_CR)

Offset | Name | Type | Fields, bit F down to bit 0
$102 | TIMER0_CR | R/W | - | Enabled | IRQ | - | Cascade | Frequency
$106 | TIMER1_CR | R/W | - | Enabled | IRQ | - | Cascade | Frequency
$10A | TIMER2_CR | R/W | - | Enabled | IRQ | - | Cascade | Frequency
$10E | TIMER3_CR | R/W | - | Enabled | IRQ | - | Cascade | Frequency

Frequency (bits 0-1):

Cascade (bit 2):

When set, the timer is part of a cascade chain timer, in which the lower timer's overflow is the update frequency, and the frequency bits are ignored. This bit is ignored for timer 0, since it does not have a lower timer to cascade from.

IRQ (bit 6):

When set, an interrupt request is generated each time the timer overflows.

Enabled (bit 7):

When set, the timer proceeds as normal, otherwise the timer is halted.
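
The overflow and cascade behaviour can be modelled in software as a rough sketch. This is a host-side simulation for illustration only, not hardware access code; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Software model of one timer: its 16 bit value and two control bits. */
typedef struct {
    uint16_t value;
    int cascade;  /* counts overflows of the previous timer when set */
    int enabled;  /* halted when clear */
} sim_timer;

/* Apply one count to timer i of an array of n timers. If it overflows
   from FFFF to 0000 and the following timer has its cascade bit set,
   that timer receives one count as well, and so on up the chain. */
void timer_count(sim_timer *timers, int n, int i)
{
    while (i < n && timers[i].enabled) {
        timers[i].value++;            /* uint16_t wraps from FFFF to 0000 */
        if (timers[i].value != 0)
            break;                    /* no overflow, nothing to cascade */
        i++;
        if (i < n && !timers[i].cascade)
            break;                    /* next timer is not in the chain */
    }
}
```

Chaining two timers this way behaves like a single 32 bit counter, which is one way to measure intervals longer than 65536 counts.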


Network Registers

Specification not done yet.

Offset | Name | Type | Fields, bit F down to bit 0
addr name write mode - - - - - - - - - - - - - - - -


Keypad Registers

There are 10 inputs available, corresponding to a 4 way direction pad, start, select, A, B, and two shoulder buttons L and R.

Key Status Register (KEYS)

Offset | Name | Type | Fields, bit F down to bit 0
$130 | KEYS | R/W | - | L Shoulder Released | R Shoulder Released | Down Released | Up Released | Left Released | Right Released | Start Released | Select Released | B Released | A Released

Each bit corresponds to a specific input, and is set when released (i.e. a standby state of 1).
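
Because a set bit means released, testing for a press means testing for a clear bit. The sketch below assumes the bit positions implied by the table order, with A in bit 0 up to the L shoulder button in bit 9; all names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions, read from the KEYS row from bit 0 upward. */
enum {
    KEY_A      = 1 << 0,
    KEY_B      = 1 << 1,
    KEY_SELECT = 1 << 2,
    KEY_START  = 1 << 3,
    KEY_RIGHT  = 1 << 4,
    KEY_LEFT   = 1 << 5,
    KEY_UP     = 1 << 6,
    KEY_DOWN   = 1 << 7,
    KEY_R      = 1 << 8,
    KEY_L      = 1 << 9,
    KEY_MASK   = 0x03FF
};

/* Nonzero if the given key is held: its bit reads 0 when pressed. */
int key_pressed(uint16_t keys, unsigned key)
{
    return (keys & key) == 0;
}

/* Invert the register so that a set bit means pressed. */
uint16_t keys_down(uint16_t keys)
{
    return (uint16_t)(~keys & KEY_MASK);
}
```

With nothing held the register reads 0x03FF under these assumptions, and keys_down returns 0.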


Interrupt Registers

The hardware interrupt vector of the ARM resides in BIOS, and the handler redirects the call to the address stored at 0x03007FFC. The interrupt handler that is pointed to must be in ARM, and return via a
BX LR
instruction.

Offset | Name | Type | Fields, bit F down to bit 0
$200 | IE | R/W | - | Cart | Keypad | DMA 3 | DMA 2 | DMA 1 | DMA 0 | Comms | Timer 3 | Timer 2 | Timer 1 | Timer 0 | Y-Trigger | H-Blank | V-Blank
$202 | IF | R/W | - | Cart | Keypad | DMA 3 | DMA 2 | DMA 1 | DMA 0 | Comms | Timer 3 | Timer 2 | Timer 1 | Timer 0 | Y-Trigger | H-Blank | V-Blank
$208 | IME | R/W | - | Enabled



System Registers

Offset | Name | Type | Fields, bit F down to bit 0
$204 | WS_CR | R/W | Game Pack Type (R) | Prefetch | - | Cart Clock | Bank 2 | Bank 1 | Bank 0 | SRAM mode
$300 | PAUSE | * | Power Down Mode | - | reserved

Debug Protocol

The debug protocol is accessed using a special register setup and a NOP, and functions only on Mappy VM 0.7b and above.

dprint prints a string to the MVM console.

ARM SDT Code:
void dprint(char *string) {
  __asm {
    mov r2, r0
    mov r0, #0xC0DED00D
    mov r1, #0
    and r0, r0, r0
  }
}

GCC Code:
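void dprint(const char *sz) {
  /* Each line is its own string literal ending in \n, since current
     versions of GCC reject string literals that span several lines. */
  asm volatile(
    "mov r2, %0\n"           /* r2 = address of the string */
    "ldr r0, =0xc0ded00d\n"  /* r0 = value that marks a debug request */
    "mov r1, #0\n"           /* r1 = 0 */
    "and r0, r0, r0\n"       /* the NOP that triggers the request */
    : /* No output */
    : "r" (sz)
    : "r0", "r1", "r2");
}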

(thanks to Darren Sillett for the correction to the GCC version)

This document is copyright © 2001 Joat, and may not be copied, altered, or redistributed in any form or manner without my prior written consent: this includes any copies whatsoever, even mirrors.