A word on the ARM7tdmi is 4 bytes, making a halfword 2 bytes. All numeric values are stored in little-endian format, where the less significant bytes are stored at lower memory addresses. This is the same format used on PCs, so no explicit conversion is necessary, but it should be noted.
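To make the byte order concrete, here is a minimal sketch in C that assembles a word and a halfword from bytes laid out in little-endian order. The function names `read_word_le` and `read_halfword_le` are illustrative only and do not come from any system header.

```c
#include <stdint.h>

/* Assemble a 32-bit word from four bytes stored in little-endian order:
   bytes[0] sits at the lowest address and is the least significant byte. */
uint32_t read_word_le(const uint8_t *bytes)
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Assemble a 16-bit halfword the same way from two bytes. */
uint16_t read_halfword_le(const uint8_t *bytes)
{
    return (uint16_t)(bytes[0] | (bytes[1] << 8));
}
```

On a little-endian machine the CPU does this implicitly on every word or halfword load; the explicit version is only needed when picking values apart byte by byte.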
The CPU (or central processing unit) is an ARM7tdmi core clocked at 16.78 MHz, a very powerful 32-bit RISC CPU with a hardware multiplier and a second, less extensive instruction set known as Thumb. The CPU can only execute one instruction set at a time, but the switch between instruction sets is absorbed into a branch, making the switch free. All ARM opcodes are contained in 4 bytes and MUST be word aligned, otherwise the CPU will enter an undefined state. Thumb opcodes are contained in a halfword and must likewise be halfword aligned. A nifty feature of the ARM instruction set is that all opcodes are conditionally executed, making branches much less necessary and allowing quite complex code to be written in a short space.
| Description | Base | Size | Access Width | Wait States | Notes |
| System ROM | $00000000 | unknown | 8/16/32 | none | Cannot be accessed in user mode |
| External RAM | $02000000 | $40000 | 8/16/32 | +2N/+2S | |
| Work RAM | $03000000 | $8000 | 8/16/32 | none | Typically variables and innerloop code are stored here |
| I/O Register Space | $04000000 | Not Contiguous | 8/16/32 | variable | All system interface is handled through here |
| Palette RAM | $05000000 | $400 | 16/32 | none | Background and sprite palette |
| Video RAM | $06000000 | $18000 | 16/32 | Accesses may stall during display | Contains all graphics and maps |
| Sprite RAM (OAM) | $07000000 | $400 | 16/32 | Cannot be accessed outside of V-Blank | Contains all sprite attributes |
| Game ROM | $08000000 | 0..32 MB | 8/16/32 | variable | All user code is initially held here |
| Game ROM Mirror | $0A000000 | 0..32 MB | 8/16/32 | variable | A mirror of the game ROM to allow use of multiple speed ROMs in one pack |
| Game ROM Mirror | $0C000000 | 0..32 MB | 8/16/32 | variable | A mirror of the game ROM to allow use of multiple speed ROMs in one pack |
| Cart RAM | $0E000000 | 0..64 KB | 8 | variable | Either SRAM or flash ROM |
The system ROM contains a boot-up sequence and a number of system utilities collectively called the BIOS (Basic Input Output System). This memory is banked out when the CPU is in user mode and can only be read in SVC mode. A BIOS function is executed using a swi call with arguments in the low registers.
All memory contained within the game pack has variable wait states controlled by the wait state register, where different settings for ROM and RAM may be selected. A wait state is an additional delay incurred during a memory access because of slower components. There are two types of memory access cycle: an N cycle, which is a non-sequential access and is typically the longer of the two, and an S cycle, or sequential access. A sequential access is any one that follows the last memory access by 4 bytes or less, and both N and S accesses cost one cycle from fast, internal memory. The wait states are listed as +N/+S and indicate the additional cycle penalty for each type of access. Thus a read from external RAM actually costs 3 cycles, not 1.
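The +N/+S arithmetic can be sketched in a few lines of C. This is a simplified model, assuming that a run of accesses to consecutive addresses costs one N access followed by S accesses; the function names are hypothetical.

```c
/* Cycles for a single access: one base cycle plus the wait states
   listed for that access type (+N or +S). */
unsigned access_cycles(unsigned wait_states)
{
    return 1u + wait_states;
}

/* Cycles for `count` accesses to consecutive addresses, modeled as
   one non-sequential access followed by sequential ones.
   This is a simplification for illustration. */
unsigned burst_cycles(unsigned count, unsigned wait_n, unsigned wait_s)
{
    if (count == 0)
        return 0;
    return access_cycles(wait_n) + (count - 1) * access_cycles(wait_s);
}
```

With the +2N/+2S figures from the memory map, `access_cycles(2)` gives the 3 cycles quoted for external RAM, while the same call with 0 wait states gives the single cycle of internal work RAM.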
The display is 240x160 pixels in dimension, and is capable of displaying 15-bit color. Each pixel of the display is actually composed of 3 separate elements stacked nearly on top of each other, each generating a different portion of the spectrum. Most colors visible to the human eye can be reasonably simulated using a composition of red, green, and blue light, and almost all computer displays are based on this principle. The display is no exception, and it uses a 5-bit quantization in each component, giving a range of 32 shades of each, and a total of 32768 possible colors.
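A 15-bit color is usually handled as a single halfword. The sketch below packs three 5-bit components into one; the bit order (red in the low 5 bits, then green, then blue) is an assumption not stated in the text above, and `rgb15` is a hypothetical helper name.

```c
#include <stdint.h>

/* Pack three 5-bit components (0..31 each) into a 15-bit color.
   Assumed layout: bits 0-4 red, bits 5-9 green, bits 10-14 blue. */
uint16_t rgb15(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)((r & 31u) | ((g & 31u) << 5) | ((b & 31u) << 10));
}
```

Under that assumed layout, `rgb15(31, 31, 31)` is white (0x7FFF) and `rgb15(0, 0, 0)` is black.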
Another concept that is useful to understand but not essential is persistence of vision, which is the fundamental basis of all video games, television, and films. Our eyes operate continuously, constantly adjusting to the influx of information we are presented with, but they have a latency in this adjustment. This depends on the individual, but for most sampled images, 30 different images per second are sufficient to give the illusion of motion, and 60-80 are more than adequate for a completely convincing illusion. It is for this reason that televisions operate at 50 to 60 Hz (frames per second), and the designers of the display saw fit to choose 59.727 Hz as our time base (16.78 MHz / 1232 cycles / 228 rasters). The display is automatically refreshed at this frequency, but for the illusion of motion to appear rather than a still image, we must update the display contents frequently. When we may do so is discussed in the following paragraphs.
A concept used on conventional consoles and computers is the blanking period, which occurs because the electron beam used to 'paint' an image on phosphors must travel back to the left or top side of the screen before it can continue painting the image. The time required for the beam to return to the left edge is known as the horizontal blank and lasts for a relatively short period. Similarly, the return to the top of the screen is known as the vertical blank, and lasts for a considerable time (in computer terms anyway; it is still a minute fraction of a second).
An LCD screen has virtually no need for such a distinction between rendering phases, as it is entirely electronic, with no mechanical beam deflection, but the blanking periods are still simulated for the simple reason that they are convenient for programmers and make cheaper hardware usable. Typically, the screen contents cannot be modified during display, otherwise the screen would appear to 'tear' or shimmer, as part of the displayed image was drawn before the modification and is not synchronous with the later, modified image. If there were no blanking periods in which to modify the image, a double buffer would be necessary, which would be modified for as long as needed and then instantly swapped with the currently displayed image. This would require more memory and extra hardware, and rule out the possibility of many raster effects.
The display we are concerned with has a visible area of 240x160 pixels, but to accommodate the blanking periods, the H/V clock (the master clock controlling rendering) operates on the larger domain of 308x228 pixels. The H/V clock runs such that one pixel is traversed every 4 CPU cycles, so each line of the display, or scanline, takes 1232 cycles to traverse, of which 272 (68 pixels) fall in the horizontal blank. Similarly, the vertical blank consists of 68 scanlines, or 83776 cycles, plenty of time to update the display. The scanline, or 'raster' as it is also called, is a very important concept that will be used many times ahead.
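The timing figures follow directly from the 308x228 clock domain and the 4 cycles per pixel. The sketch below reproduces that arithmetic; the constant and function names are hypothetical, and the exact clock of 16777216 Hz is an assumption standing in for the quoted 16.78 MHz.

```c
/* Display timing derived from the H/V clock domain described above. */
enum {
    CYCLES_PER_PIXEL = 4,
    H_TOTAL   = 308,   /* pixels per scanline, including blanking */
    V_TOTAL   = 228,   /* scanlines per frame, including blanking */
    V_VISIBLE = 160    /* visible scanlines                       */
};

/* Assumed exact CPU clock (the text quotes it as 16.78 MHz). */
#define CPU_CLOCK_HZ 16777216.0

unsigned cycles_per_scanline(void)
{
    return H_TOTAL * CYCLES_PER_PIXEL;
}

unsigned vblank_cycles(void)
{
    return (V_TOTAL - V_VISIBLE) * cycles_per_scanline();
}

unsigned cycles_per_frame(void)
{
    return V_TOTAL * cycles_per_scanline();
}

double refresh_rate_hz(void)
{
    return CPU_CLOCK_HZ / cycles_per_frame();
}
```

These give 1232 cycles per scanline, 83776 cycles of vertical blank, and a refresh rate of roughly 59.73 Hz, matching the figures in the text.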
| Displaying | Horizontal Blanking |
| Vertical Blanking | |
The video system is based on the concepts of 'planes' or 'layers', which can be assigned relative priorities to each other and to sprites. There are 4 available layers, conventionally labeled BG0, BG1, BG2, and BG3. Each plane can be individually controlled in almost every aspect, except what sort of graphics it can display, which is controlled by the overall video mode. There is a fifth plane that is always behind all others and cannot be controlled directly, called the backdrop. This backdrop can only be seen through transparent holes in the layers above, and is a solid plane of the color specified by palette entry 0 in the background palette (more to come on palettes).
There are 6 video modes, each with varying restrictions on the type of planes it can display, as summarised in the table below.
| Mode | BG0 | BG1 | BG2 | BG3 |
| 0 | Text | Text | Text | Text |
| 1 | Text | Text | Rot/Scale | - |
| 2 | - | - | Rot/Scale | Rot/Scale |
| 3 | - | - | 15-bit Bitmap | - |
| 4 | - | - | 8-bit Bitmap | - |
| 5 | - | - | 15-bit Bitmap | - |
A text screen is similar in effect to the text mode of computers, where there is an array of entries that specify which characters or 'tiles' are to be displayed at any given location. Most personal computers support multiple character or tile sizes, with the standard often being 8x12 or 9x16, however here they are always restricted to exactly 8x8 pixels.
A text screen can be individually panned in reference to the viewport, and can take on any of 4 different sizes: 256x256 pixels, 512x256 pixels, 256x512 pixels, and 512x512 pixels (or 32x32 to 64x64 tiles). There are a variety of I/O registers associated with each text screen, but the key register for each background is the control register, where the mapping and character base addresses and sizes can be specified.
Two types of tiles can be used: 16 color and 256 color (however, entry zero in each is always transparent). A 16 color tile takes up 32 bytes, and up to 1024 of these may be addressed in a tile map; however, only 512 full color tiles may be addressed at any time, using even tile indices (i.e. 0, 2, 4...). The text plane is confined to use only one type of tile throughout, as specified in the control register.
Each text screen map entry occupies a halfword and contains a 10 bit tile index and a few other salient attributes, such as horizontal and vertical flipping, and a palette index if the plane is in 16 color mode.
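As an illustration, a map entry can be assembled with a small helper. The field positions used here (tile index in bits 0-9, horizontal flip in bit 10, vertical flip in bit 11, palette index in bits 12-15) are an assumption, since the text names the fields but not their positions; `text_map_entry` is a hypothetical name.

```c
#include <stdint.h>

/* Build one halfword text-screen map entry.
   Assumed layout: bits 0-9 tile index, bit 10 horizontal flip,
   bit 11 vertical flip, bits 12-15 palette index (16 color mode). */
uint16_t text_map_entry(unsigned tile, int hflip, int vflip, unsigned palette)
{
    return (uint16_t)((tile & 0x3FFu)
                    | (hflip ? (1u << 10) : 0u)
                    | (vflip ? (1u << 11) : 0u)
                    | ((palette & 0xFu) << 12));
}
```

For example, `text_map_entry(5, 0, 1, 2)` selects tile 5, flipped vertically, drawn with 16 color palette 2.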
A rotation and scaling capable screen is similar to a text screen, in that it is composed of a tile map and tiles. However, many powerful effects can be realized in these planes, as there is a hardware texture shear matrix available. These planes are more restricted than their text brethren and may only address 256 full color tiles, and thus the tile map entries are 1 byte in size. The screen is also always square, but its size may be chosen from 128, 256, 512, or 1024 pixels on a side (or 16 to 128 tiles square).
The texture (rot/scale plane) is addressed for each pixel on the screen via a 2x2 matrix and a reference vector. The formula for converting from screen space to texture space is thus:
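$$
\begin{pmatrix} \text{texture}_x \\ \text{texture}_y \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} \text{screen}_x - \text{center}_x \\ \text{screen}_y - \text{center}_y \end{pmatrix} + \begin{pmatrix} \text{center}_x \\ \text{center}_y \end{pmatrix}
$$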
If a rotated texture is desired, the shear matrix becomes a standard rotation matrix, and scaling coefficients can be specified at the same time for added flexibility.
$$
A = \frac{1}{a}\cos\theta \qquad B = \frac{1}{a}\sin\theta \qquad C = -\frac{1}{b}\sin\theta \qquad D = \frac{1}{b}\cos\theta
$$
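The screen-to-texture mapping can be sketched in C as follows. This is a software illustration of the formula only, not a description of how the hardware evaluates it. The coefficients are assumed to be 8.8 fixed point values (256 represents 1.0), the format the register tables give for the texture deltas; `Matrix88` and `screen_to_texture` are hypothetical names.

```c
#include <stdint.h>

/* 2x2 matrix with 8.8 fixed point coefficients (256 == 1.0). */
typedef struct {
    int16_t a, b, c, d;
} Matrix88;

/* Map a screen pixel to texture space:
   texture = M * (screen - center) + center.
   The products carry 8 fractional bits, so they are shifted right by 8.
   Assumes >> on negative values is an arithmetic shift, as on ARM and
   most compilers. */
void screen_to_texture(const Matrix88 *m, int sx, int sy,
                       int cx, int cy, int *tx, int *ty)
{
    int dx = sx - cx;
    int dy = sy - cy;

    *tx = ((m->a * dx + m->b * dy) >> 8) + cx;
    *ty = ((m->c * dx + m->d * dy) >> 8) + cy;
}
```

With A = D = 256 and B = C = 0 the mapping is the identity; with A = D = 128 each screen step covers half a texture pixel, so the texture appears at twice its size.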
The actual matrix coefficients are stored in I/O registers in a format called fixed point, which simulates a floating point number in a standard 2's complement format. The width of a fixed point number is quoted as its integer portion, followed by its fractional portion, so an 8.8 fixed point number is stored in 16 bits, and the lower 8 bits are designated as a fraction, although there is no explicit information contained in the number about this. When multiplying or dividing two fixed point numbers, care must be taken to renormalize the resultant value. For instance, when multiplying two 8.8 numbers, you are actually multiplying (integer1*2^8 + fraction1) * (integer2*2^8 + fraction2), which gives (integer1*integer2*2^16 + fraction1*integer2*2^8 + fraction2*integer1*2^8 + fraction1*fraction2), which is now a 16.16 fixed point number. A shift right by 8 bits will renormalize this number back to 8 fractional bits. When multiplying or dividing a fixed point number by a scalar, no renormalization is necessary, nor is it necessary when adding or subtracting two fixed point numbers, but you must convert a scalar number into a fixed point number before adding it to or subtracting it from a fixed point number.
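A minimal sketch of 8.8 fixed point arithmetic in C, showing where the renormalizing shift goes. The type and function names (`fix88`, `int_to_fix`, `fix_mul`, `fix_div`) are hypothetical, and no overflow checking is done.

```c
#include <stdint.h>

typedef int16_t fix88;      /* 8.8 fixed point: 256 == 1.0 */
#define FIX_SHIFT 8

/* Convert a scalar integer to 8.8 before adding or subtracting it. */
fix88 int_to_fix(int n)
{
    return (fix88)(n * (1 << FIX_SHIFT));
}

/* Multiply two 8.8 numbers: form the wide product in 32 bits, then
   shift right by 8 to renormalize. Assumes >> on negative values is
   an arithmetic shift. */
fix88 fix_mul(fix88 x, fix88 y)
{
    int32_t wide = (int32_t)x * (int32_t)y;
    return (fix88)(wide >> FIX_SHIFT);
}

/* Divide two 8.8 numbers: pre-scale the dividend by 2^8 so the
   quotient keeps its 8 fractional bits. The caller must ensure y != 0. */
fix88 fix_div(fix88 x, fix88 y)
{
    return (fix88)(((int32_t)x * (1 << FIX_SHIFT)) / y);
}
```

For instance, 1.5 is 0x0180 and 2.0 is 0x0200; `fix_mul(0x0180, 0x0200)` yields 0x0300, which is 3.0. Multiplying by a plain integer needs no shift: `0x0180 * 3` is 0x0480, or 4.5.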
The effect commonly dubbed 'mode 7', used to achieve a 3D plane, is not directly supported in hardware, but can be simulated with clever programming. (The name is actually a misnomer, as mode 7 was just a rot/scale mode, where this effect had to be faked in the same manner as described here.) If the matrix registers are updated each scanline with a suitable formula, the 1/z distortion can be overcome.
The math for doing so varies depending on the effect and flexibility required, but can be derived with some basic geometry on paper before coding. The CPU is not really fast enough to do all of the math during rendering, so a common technique is to precalculate the values in a table which is copied to the I/O registers during each horizontal retrace, either via the CPU or a repeating DMA channel.
A bitmap screen is perhaps more familiar to those from a personal computer background. It simply consists of a 2 dimensional array of colors that specify what color each screen pixel should be. There are actually 3 different bitmap modes available, each with its own pros and cons. In all three modes however, the portion of VRAM reserved for sprites is cut in half, but the numbering system is not changed (see the section on sprites for more information).
Mode 3 is a 240x160 screen, where each entry is 15 bits packed in a halfword, capable of any color from the full displayable range. However, this uses up the majority of VRAM and does not allow for a back or double buffer to access while the front is being displayed and thus is typically only used for static images, such as titles, cutscenes, and credits.
Mode 4 is a 240x160 paletted mode, where each entry is an 8-bit index into the background color palette. Since this uses half of the VRAM of mode 3, two pages of VRAM are provided, and which one is displayed at any given time can be selected via a bit in the global display control register. This mode allows for both double-buffered rendering and palette effects, and is thus suitable for a wide range of applications.
Mode 5 is a 160x128 pixel screen with 15-bit pixels, and is a hybrid of modes 3 and 4. It provides two pages like mode 4, but the full color range of mode 3. The reduced resolution reserves this mode typically for cutscenes and FMV (full motion video) interludes.
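The trade-offs between the three bitmap modes come down to simple arithmetic on the frame size versus the $18000 bytes of VRAM listed in the memory map. The sketch below works this out; the names are hypothetical and the space reserved for sprites is ignored.

```c
/* Total VRAM size from the memory map. */
#define VRAM_SIZE 0x18000u

/* Bytes needed for one frame in each bitmap mode. */
unsigned mode3_frame_bytes(void) { return 240u * 160u * 2u; } /* 15-bit pixels  */
unsigned mode4_page_bytes(void)  { return 240u * 160u * 1u; } /* 8-bit indices  */
unsigned mode5_page_bytes(void)  { return 160u * 128u * 2u; } /* 15-bit pixels  */

/* Nonzero if two pages of the given size fit in VRAM
   (sprite data is ignored in this rough check). */
int fits_two_pages(unsigned page_bytes)
{
    return 2u * page_bytes <= VRAM_SIZE;
}
```

A mode 3 frame takes 76800 of the 98304 bytes, so a second frame cannot fit, whereas mode 4 (38400 bytes per page) and mode 5 (40960 bytes per page) both leave room for two pages.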
| Offset | Name | Type | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| $000 | DISP_CR | R/W | Sprite Windows Enabled | WIN1 Enabled | WIN0 Enabled | Sprites Enabled | BG3 Enabled | BG2 Enabled | BG1 Enabled | BG0 Enabled | Forced Blank | 1D Sprite Mode | H-Blank OAM Access OK | Frame Buffer Select | CGB Mode | Video Mode | ||
| $004 | DISP_SR | R/W | Y-Line Trigger Value | - | Enable Y-Trigger IRQ | Enable H-Blank IRQ | Enable V-Blank IRQ | Y Triggered (R) | In H-Blank (R) | In V-Blank (R) | ||||||||
| $006 | DISP_Y | R |   | Current Scanline | ||||||||||||||
| $008 | BG0_CR | R/W | Screen Size | - | Screen Base Addr | Unified Palette | Mosaic | 0 | 0 | Tile Base Addr | Priority | |||||||
| $00A | BG1_CR | R/W | Screen Size | - | Screen Base Addr | Unified Palette | Mosaic | 0 | 0 | Tile Base Addr | Priority | |||||||
| $00C | BG2_CR | R/W | Screen Size | Overflow Mode | Screen Base Addr | Unified Palette | Mosaic | 0 | 0 | Tile Base Addr | Priority | |||||||
| $00E | BG3_CR | R/W | Screen Size | Overflow Mode | Screen Base Addr | Unified Palette | Mosaic | 0 | 0 | Tile Base Addr | Priority | |||||||
| $010 | BG0_X | W |   | Horizontal Scroll | ||||||||||||||
| $012 | BG0_Y | W |   | Vertical Scroll | ||||||||||||||
| $014 | BG1_X | W |   | Horizontal Scroll | ||||||||||||||
| $016 | BG1_Y | W |   | Vertical Scroll | ||||||||||||||
| $018 | BG2_X | W |   | Horizontal Scroll | ||||||||||||||
| $01A | BG2_Y | W |   | Vertical Scroll | ||||||||||||||
| $01C | BG3_X | W |   | Horizontal Scroll | ||||||||||||||
| $01E | BG3_Y | W |   | Vertical Scroll | ||||||||||||||
| $020 | BG2_DX | W | 8.8 H-Step, X-Dir Texture Delta | |||||||||||||||
| $022 | BG2_VDX | W | 8.8 V-Step, X-Dir Texture Delta | |||||||||||||||
| $024 | BG2_DY | W | 8.8 H-Step, Y-Dir Texture Delta | |||||||||||||||
| $026 | BG2_VDY | W | 8.8 V-Step, Y-Dir Texture Delta | |||||||||||||||
| $028 | BG2_XL | W | Low 16 of 20.8 X Center | |||||||||||||||
| $02A | BG2_XH | W |   | High 12 of 20.8 X Center | ||||||||||||||
| $02C | BG2_YL | W | Low 16 of 20.8 Y Center | |||||||||||||||
| $02E | BG2_YH | W |   | High 12 of 20.8 Y Center | ||||||||||||||
| $030 | BG3_DX | W | 8.8 H-Step, X-Dir Texture Delta | |||||||||||||||
| $032 | BG3_VDX | W | 8.8 V-Step, X-Dir Texture Delta | |||||||||||||||
| $034 | BG3_DY | W | 8.8 H-Step, Y-Dir Texture Delta | |||||||||||||||
| $036 | BG3_VDY | W | 8.8 V-Step, Y-Dir Texture Delta | |||||||||||||||
| $038 | BG3_XL | W | Low 16 of 20.8 X Center | |||||||||||||||
| $03A | BG3_XH | W |   | High 12 of 20.8 X Center | ||||||||||||||
| $03C | BG3_YL | W | Low 16 of 20.8 Y Center | |||||||||||||||
| $03E | BG3_YH | W |   | High 12 of 20.8 Y Center | ||||||||||||||
| $040 | WIN0_H | W | Left (x0) | Right (x1) | ||||||||||||||
| $042 | WIN1_H | W | Left (x0) | Right (x1) | ||||||||||||||
| $044 | WIN0_V | W | Top (y0) | Bottom (y1) | ||||||||||||||
| $046 | WIN1_V | W | Top (y0) | Bottom (y1) | ||||||||||||||
| $048 | WIN_IN | R/W |   | Blends in Win1 | Sprites in Win1 | BG3 in Win1 | BG2 in Win1 | BG1 in Win1 | BG0 in Win1 |   | Blends in Win0 | Sprites in Win0 | BG3 in Win0 | BG2 in Win0 | BG1 in Win0 | BG0 in Win0 | ||
| $04A | WIN_OUT | R/W |   | Blends in Sprite Win | Sprites in Sprite Win | BG3 in Sprite Win | BG2 in Sprite Win | BG1 in Sprite Win | BG0 in Sprite Win |   | Blends outside | Sprites outside | BG3 outside | BG2 outside | BG1 outside | BG0 outside | ||
| $04C | MOSAIC | W | Sprite Y Level | Sprite X Level | Background Y Level | Background X Level | ||||||||||||
| $050 | BLEND_CR | R/W | - | - | reserved | Blend Sprites in alpha | Blend BG3 in alpha | Blend BG2 in alpha | Blend BG1 in alpha | Blend BG0 in alpha | Blend Mode | reserved | Blend Sprites | Blend BG3 | Blend BG2 | Blend BG1 | Blend BG0 | |
| $052 | BLEND_AB | W | - | Coefficient B | - | Coefficient A | ||||||||||||
| $054 | BLEND_Y | W | - | Coefficient Y | ||||||||||||||
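As an example of reading the table, the sketch below composes a DISP_CR value from the bit positions in its row: video mode in bits 0-2, frame buffer select in bit 4, forced blank in bit 7, and the layer enables in bits 8-12. The macro and function names are hypothetical, not taken from any official header.

```c
#include <stdint.h>

/* Bit fields of DISP_CR, taken from the register table above. */
#define DISP_MODE(n)         ((n) & 7u)
#define DISP_FRAME_SELECT    (1u << 4)
#define DISP_FORCED_BLANK    (1u << 7)
#define DISP_BG0_ENABLE      (1u << 8)
#define DISP_BG1_ENABLE      (1u << 9)
#define DISP_BG2_ENABLE      (1u << 10)
#define DISP_BG3_ENABLE      (1u << 11)
#define DISP_SPRITES_ENABLE  (1u << 12)

/* Compose a DISP_CR value from a video mode and a set of enable flags. */
uint16_t disp_cr_value(unsigned mode, unsigned flags)
{
    return (uint16_t)(DISP_MODE(mode) | flags);
}

/* Toggle the displayed page in the two-page bitmap modes. */
uint16_t flip_page(uint16_t disp_cr)
{
    return (uint16_t)(disp_cr ^ DISP_FRAME_SELECT);
}
```

For instance, `disp_cr_value(3, DISP_BG2_ENABLE)` gives 0x0403, which selects mode 3 with only BG2 shown. On hardware the result would be written to the halfword at $04000000 (I/O base plus offset $000).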
| Offset | Name | Type | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| $0B0 | DMA0_SRC_L | W | Low 16 of 27 bit source address | |||||||||||||||
| $0B2 | DMA0_SRC_H | W |   | High 11 of 27 bit source address | ||||||||||||||
| $0BC | DMA1_SRC_L | W | Low 16 of 28 bit source address | |||||||||||||||
| $0BE | DMA1_SRC_H | W |   | High 12 of 28 bit source address | ||||||||||||||
| $0C8 | DMA2_SRC_L | W | Low 16 of 28 bit source address | |||||||||||||||
| $0CA | DMA2_SRC_H | W |   | High 12 of 28 bit source address | ||||||||||||||
| $0D4 | DMA3_SRC_L | W | Low 16 of 28 bit source address | |||||||||||||||
| $0D6 | DMA3_SRC_H | W |   | High 12 of 28 bit source address | ||||||||||||||
| Offset | Name | Type | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| $0B4 | DMA0_DEST_L | W | Low 16 of 27 bit destination address | |||||||||||||||
| $0B6 | DMA0_DEST_H | W |   | High 11 of 27 bit destination address | ||||||||||||||
| $0C0 | DMA1_DEST_L | W | Low 16 of 27 bit destination address | |||||||||||||||
| $0C2 | DMA1_DEST_H | W |   | High 11 of 27 bit destination address | ||||||||||||||
| $0CC | DMA2_DEST_L | W | Low 16 of 27 bit destination address | |||||||||||||||
| $0CE | DMA2_DEST_H | W |   | High 11 of 27 bit destination address | ||||||||||||||
| $0D8 | DMA3_DEST_L | W | Low 16 of 28 bit destination address | |||||||||||||||
| $0DA | DMA3_DEST_H | W |   | High 12 of 28 bit destination address | ||||||||||||||
| Offset | Name | Type | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| $0B8 | DMA0_SIZE | W |   | Transfer Count | ||||||||||||||
| $0C4 | DMA1_SIZE | W |   | Transfer Count | ||||||||||||||
| $0D0 | DMA2_SIZE | W |   | Transfer Count | ||||||||||||||
| $0DC | DMA3_SIZE | W | Transfer Count | |||||||||||||||
| Offset | Name | Type | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| $0BA | DMA0_CR | R/W | Enabled | IRQ | Start Mode |   | Width | Repeat | Source Mode | Dest Mode |   | |||||||
| $0C6 | DMA1_CR | R/W | Enabled | IRQ | Start Mode |   | Width | Repeat | Source Mode | Dest Mode |   | |||||||
| $0D2 | DMA2_CR | R/W | Enabled | IRQ | Start Mode |   | Width | Repeat | Source Mode | Dest Mode |   | |||||||
| $0DE | DMA3_CR | R/W | Enabled | IRQ | Start Mode | reserved | Width | Repeat | Source Mode | Dest Mode |   | |||||||
| Offset | Name | Type | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| $100 | TIMER0_DATA | R/W | Timer Value | |||||||||||||||
| $104 | TIMER1_DATA | R/W | Timer Value | |||||||||||||||
| $108 | TIMER2_DATA | R/W | Timer Value | |||||||||||||||
| $10C | TIMER3_DATA | R/W | Timer Value | |||||||||||||||
| Offset | Name | Type | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| $102 | TIMER0_CR | R/W |   | Enabled | IRQ |   | Cascade | Frequency | ||||||||||
| $106 | TIMER1_CR | R/W |   | Enabled | IRQ |   | Cascade | Frequency | ||||||||||
| $10A | TIMER2_CR | R/W |   | Enabled | IRQ |   | Cascade | Frequency | ||||||||||
| $10E | TIMER3_CR | R/W |   | Enabled | IRQ |   | Cascade | Frequency | ||||||||||
| Offset | Name | Type | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| $130 | KEYS | R/W |   | L Shoulder Released | R Shoulder Released | Down Released | Up Released | Left Released | Right Released | Start Released | Select Released | B Released | A Released | |||||
Each bit corresponds to a specific input, and is set when released (i.e. a standby state of 1).
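Because the bits read 1 when released, the raw value is usually inverted before testing. The sketch below does this using the bit order from the KEYS row (A in bit 0 through L Shoulder in bit 9); the names are hypothetical.

```c
#include <stdint.h>

/* Bit positions from the KEYS register table above. */
enum {
    KEY_A      = 1 << 0,
    KEY_B      = 1 << 1,
    KEY_SELECT = 1 << 2,
    KEY_START  = 1 << 3,
    KEY_RIGHT  = 1 << 4,
    KEY_LEFT   = 1 << 5,
    KEY_UP     = 1 << 6,
    KEY_DOWN   = 1 << 7,
    KEY_R      = 1 << 8,
    KEY_L      = 1 << 9,
    KEY_MASK   = 0x03FF
};

/* Convert the raw register value (1 = released) into a mask
   where 1 = pressed. */
uint16_t keys_pressed(uint16_t raw)
{
    return (uint16_t)(~raw & KEY_MASK);
}

/* Nonzero if the given key is currently held down. */
int key_is_down(uint16_t raw, uint16_t key)
{
    return (keys_pressed(raw) & key) != 0;
}
```

Here `raw` stands for the halfword read from offset $130; a value of 0x03FF means nothing is pressed, and 0x03F7 means only Start is held.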
BX LR instruction.
| Offset | Name | Type | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| $200 | IE | R/W |   | Cart | Keypad | DMA 3 | DMA 2 | DMA 1 | DMA 0 | Comms | Timer 3 | Timer 2 | Timer 1 | Timer 0 | Y-Trigger | H-Blank | V-Blank | |
| $202 | IF | R/W |   | Cart | Keypad | DMA 3 | DMA 2 | DMA 1 | DMA 0 | Comms | Timer 3 | Timer 2 | Timer 1 | Timer 0 | Y-Trigger | H-Blank | V-Blank | |
| $208 | IME | R/W |   | Enabled | ||||||||||||||
| Offset | Name | Type | F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| $204 | WS_CR | R/W | Game Pack Type (R) | Prefetch |   | Cart Clock | Bank 2 | Bank 1 | Bank 0 | SRAM mode | ||||||||
| $300 | PAUSE | * | Power Down | Mode |   | reserved | ||||||||||||
ARM SDT Code:
void dprint(char *string) {
    __asm {
        mov r2, r0
        mov r0, #0xC0DED00D
        mov r1, #0
        and r0, r0, r0
    }
}
GCC Code:
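void dprint(const char *sz) {
    /* Each instruction is its own string literal ending in "\n\t",
       since current GCC rejects raw newlines inside a string. */
    asm volatile(
        "mov r2, %0\n\t"            /* r2 = pointer to the string */
        "ldr r0, =0xc0ded00d\n\t"   /* r0 = magic constant */
        "mov r1, #0\n\t"
        "and r0, r0, r0\n\t"
        : /* No output */
        : "r" (sz)
        : "r0", "r1", "r2");
}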
(thanks to Darren Sillett for the correction to the GCC version)
This document is copyright © 2001 Joat, and may not be copied, altered, or redistributed in any form or manner without my prior written consent: this includes any copies whatsoever, even mirrors.