Ramblings of an NSF enthusiast (guide to writing an NSF player)

To do a bitwise and, line up the two binary numbers as in grade-school addition. If both bits in a column are ones, the resulting bit is a 1, otherwise it's a 0.

0010101100
0011010110
----------------
0010000100


Here is 0x07FF in binary: 11111111111 (eleven 1s). So you can see that any bit higher than the 11th bit is going to be anded with a 0, and will surely be turned off, and each of the first 11 bits will be anded with a 1, so it will stay on if it was on, and otherwise stay off.
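
In case it helps, here is a rough C sketch of that masking. The function name mask_low_11 is made up for illustration; the only real content is the & with 0x07FF.

```c
#include <stdint.h>

/* Keep only the low 11 bits of a value.
   0x07FF is eleven 1 bits, so every higher bit is anded with 0. */
static uint16_t mask_low_11(uint16_t value)
{
    return (uint16_t)(value & 0x07FF);
}
```

So mask_low_11(0x1234) gives 0x0234, and anything already below 0x0800 comes back unchanged.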

In assembly, it's nice to know the hexadecimal versions of numbers which are all 1's up to some i'th bit, and to also realize that powers of two correspond to numbers with single on bits.

ex:

10 = 2
100 = 4
1000 = 8
10000 = 16
100000 = 32
1000000 = 64
....
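
A small sketch of both ideas in C, assuming you stay within 32 bits. The helper names bit and low_mask are hypothetical, just for illustration.

```c
#include <stdint.h>

/* The value with only bit n set, i.e. 2 to the power n (n in 0..31). */
static uint32_t bit(unsigned n)
{
    return (uint32_t)1 << n;
}

/* The value with the lowest n bits all set (n in 0..31).
   One less than a power of two is all 1s below that bit. */
static uint32_t low_mask(unsigned n)
{
    return ((uint32_t)1 << n) - 1;
}
```

For example bit(6) is 64 and low_mask(11) is 0x07FF.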


This is why you often see options which are stored in bit-fields set using | or &. You can use each bit as a true/false variable. To set the bit true, bitwise or it with its corresponding power of 2. To set it false, & it with the not (~) of the corresponding power of two.

ex:
#define USE_P1 64

// USE_P1 is this number   01000000
// ~ USE_P1 is this number 10111111

uint8_t options = 0;

//turn on USE_P1 option
options |= USE_P1;

//turn off USE_P1
options &= ( ~ USE_P1 );

Also you could bitwise xor options with USE_P1, but note that xor toggles the bit: it only turns the option off if it was on to begin with.
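
To make the difference between clearing and xor-ing concrete, here is a sketch with one helper per operation. The helper names are made up for this example; USE_P1 is the same flag as above.

```c
#include <stdint.h>

#define USE_P1 64  /* 01000000 */

/* Turn the flag on, whatever it was before. */
static uint8_t set_flag(uint8_t options, uint8_t flag)
{
    return (uint8_t)(options | flag);
}

/* Turn the flag off, whatever it was before. */
static uint8_t clear_flag(uint8_t options, uint8_t flag)
{
    return (uint8_t)(options & (uint8_t)~flag);
}

/* Flip the flag: on becomes off, off becomes on. */
static uint8_t toggle_flag(uint8_t options, uint8_t flag)
{
    return (uint8_t)(options ^ flag);
}

/* Nonzero if the flag is currently on. */
static int has_flag(uint8_t options, uint8_t flag)
{
    return (options & flag) != 0;
}
```

Note that toggle_flag(0, USE_P1) turns the option on, which is why & with the complement is the safer way to force a bit off.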


EDIT: correction. Thanks Dische
Yes, VRAM is the same way.

EDIT:

@hritwin... your example is illustrating bitwise complement (~) not logical not (!)

options &= ( ~ USE_P1 ); // <- this is what you meant
closed account (9wqjE3v7)
Ok, thanks. Also, a little off-topic, does the 6502 have an ALE equivalent to limit interface DIP usage?
You lost me there. Don't know what that is. So I'm going to say "no". Haha.
closed account (9wqjE3v7)
oh...oops :*

By ALE I meant 'Address Latch Enable', a pin Intel used on older processors to enable the sharing of data and address lines as a workaround for the limited 40 interface pins. I was wondering if the 6502 had a similar concept to this.
Ah. Nah I wouldn't know.
closed account (N36fSL3A)
How would I even get the pixel data from the PPU to render in OGL with acceptable performance? Isn't accessing textures slow?
There's no way around sending all the pixel data to the GPU every frame. And yes it's slow, but since the NES screen res is so small (256x240) it's not that bad.

Where you get the OpenGL performance boost is stretching the image and/or applying filters (as shaders).
closed account (N36fSL3A)
Any ROM loading tutorials? Can't find any on Google.
Like full NES ROM? or are you just interested in NSFs?
closed account (N36fSL3A)
A full NES ROM, I'm going for an emulator.
Here's the file layout:

http://wiki.nesdev.com/w/index.php/INES


The first 0x10 bytes are the header. The most important parts of the header are:

1) The mapper number (start with Mapper 0! Very simple games like Ice Climber, Balloon Fight, Excitebike)

2) The number of PRG banks. This is the size of the PRG ROM on the cartridge. PRG ROM is ROM that is visible in CPU addressing space. For mapper 0, you need to support 16K and 32K PRG sizes.

if 16K ($4000 bytes):
The PRG ROM should be readable from $C000-FFFF
and also mirrored at $8000-BFFF


if 32K ($8000 bytes):
The PRG ROM should be readable from $8000-FFFF


3) The number of CHR ROM banks. For mapper 0 ROMs this should always be 8K since that's the max available without a mapper. CHR is the "pattern tables" and should be placed at $0000-1FFF in PPU memory (not CPU memory). It is not directly accessible by the CPU.
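
The mapper 0 PRG layout described above can be sketched like this. The struct and function names are hypothetical, and it assumes prg_size is exactly 0x4000 or 0x8000 so the mask trick works.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cartridge struct for this sketch:
   prg points at the PRG ROM bytes, prg_size is 0x4000 (16K) or 0x8000 (32K). */
struct cart {
    const uint8_t *prg;
    size_t prg_size;
};

/* Read a byte from CPU address $8000-$FFFF on a mapper 0 cart.
   The caller must only pass addresses in that range.
   With 16K of PRG the mask is 0x3FFF, so $8000-$BFFF and $C000-$FFFF
   hit the same bytes (the mirror). With 32K the mask is 0x7FFF. */
static uint8_t cart_read_prg(const struct cart *c, uint16_t addr)
{
    size_t offset = (size_t)(addr - 0x8000) & (c->prg_size - 1);
    return c->prg[offset];
}
```

The nice part of masking is that one code path handles both sizes, with no special case for the mirror.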



After the 0x10 byte header is the PRG ROM. Followed by the CHR ROM.
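
Here is a minimal sketch of pulling those fields out of a file that has already been read into memory. The struct and function names are made up. The byte positions follow the iNES page linked above (byte 4 = PRG banks in 16K units, byte 5 = CHR banks in 8K units, mapper number split across the high nibbles of bytes 6 and 7). One thing not mentioned above: some ROMs have a 512-byte "trainer" between the header and the PRG ROM, flagged by bit 2 of byte 6, so the sketch skips it when present.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result struct for this sketch. */
struct ines_info {
    unsigned mapper;     /* mapper number */
    unsigned prg_banks;  /* PRG ROM size in 16K units */
    unsigned chr_banks;  /* CHR ROM size in 8K units */
    size_t   prg_offset; /* where PRG ROM starts in the file */
    size_t   chr_offset; /* where CHR ROM starts in the file */
};

/* Returns 1 on success, 0 if the data does not look like a complete iNES file. */
static int ines_parse(const uint8_t *data, size_t len, struct ines_info *out)
{
    /* Header is 16 bytes and starts with "NES" followed by 0x1A. */
    if (len < 16 || memcmp(data, "NES\x1A", 4) != 0)
        return 0;

    out->prg_banks = data[4];
    out->chr_banks = data[5];
    /* Low nibble of the mapper is the high nibble of byte 6,
       high nibble of the mapper is the high nibble of byte 7. */
    out->mapper = (unsigned)(data[6] >> 4) | (unsigned)(data[7] & 0xF0);

    size_t trainer = (data[6] & 0x04) ? 512 : 0;
    out->prg_offset = 16 + trainer;
    out->chr_offset = out->prg_offset + (size_t)out->prg_banks * 0x4000;

    /* Make sure the file is big enough to hold what the header claims. */
    size_t end = out->chr_offset + (size_t)out->chr_banks * 0x2000;
    return len >= end;
}
```

For a first emulator you would probably reject anything where mapper is not 0 and come back to the rest later.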
Since Lumpkin is doing a full NES emu... PPU BREAKDOWN


Preface:

I'm going to assume NTSC timing (US/Japan). PAL timings (Europe) are slightly different. I'll make a note at the end to differentiate between them.

THE FRAME / TIMING OVERVIEW

The PPU is a separate unit that you can think of as running independently of the CPU.... like it were its own processor. While the CPU is running, the PPU will be fetching and rendering data and doing its own logic. You might think "oh, then I'll just make it a separate thread!" Don't. You'll have to sync the timing between the two too often and it'll be super slow. I've tried it. (Cooperative threading is ok.. just not preemptive -- but I'll get into this later)

The NTSC frame consists of 262 "scanlines". Each scanline basically represents 1 row of pixels and consists of 341 "dots" or cycles. I'm going to use the term dots as I'm more familiar with it in this context, but just remember that 1 dot = 1 PPU cycle.

The frame is like so:


+-------------------------------------------------+
| 1 "idle" scanline                               |
+-------------------------------------------------+
|                                                 |
| 20 scanlines of VBlank                          |
|                                                 |
+-------------------------------------------------+
| 1 "prerender" scanline (aka scanline -1)        |
+-------------------------------------------------+
|                                                 |
|                                                 |
|                                                 |
| 240 "render" scanlines  (aka scanlines 0-239)   |
|                                                 |
|                                                 |
|                                                 |
|                                                 |
+-------------------------------------------------+


This frame repeats indefinitely. Also note this frame is constant... and is merely indicative of the passage of time. There is nothing the game can do to "stop" or "pause" the passage of the frame... it just progresses naturally as time passes... as instructions are executed.

So: 262 scanlines * 341 dots per scanline = 89342 dots per frame

Timing wise... 3 dots = 1 CPU cycle. So when the CPU executes an instruction... let's say it executes LDA zero page.... that instruction takes 3 CPU cycles to complete. That means that 9 dots have passed on the PPU.

More math:

89342 dots per frame / 3 dots per CPU cycle = 29780.6667 CPU cycles per frame

1789772.727272 cycles per second (CPU clock rate) / 29780.6667 CPU cycles per frame = ~60.0984776 frames per second (slightly faster than the expected 60 FPS)
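
The same arithmetic as C constants, in case you want it in your emu. The names are invented for this sketch; the numbers are the ones from above.

```c
/* NTSC frame geometry, in PPU dots. */
enum {
    SCANLINES_PER_FRAME = 262,
    DOTS_PER_SCANLINE   = 341,
    DOTS_PER_CPU_CYCLE  = 3,
    DOTS_PER_FRAME      = SCANLINES_PER_FRAME * DOTS_PER_SCANLINE
};

/* CPU cycles in one frame. Not a whole number, which is one reason
   to count time in dots rather than CPU cycles. */
static double cpu_cycles_per_frame(void)
{
    return (double)DOTS_PER_FRAME / DOTS_PER_CPU_CYCLE;
}

/* Frames per second for a given CPU clock rate in Hz. */
static double frames_per_second(double cpu_hz)
{
    return cpu_hz / cpu_cycles_per_frame();
}
```

Calling frames_per_second(1789772.727272) gives the roughly 60.098 figure worked out above.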


At the start of VBlank, the 'vblank' status bit of $2002 is set (so reading $2002 will return the high bit set), and the PPU will generate an NMI (if enabled via the high bit of $2000). When an NMI is triggered, an interrupt occurs in the CPU (current PC and status flags are pushed to the stack, and the CPU jumps to the 'NMI' vector, specified by the address at $FFFA). NMIs are how games get notified that VBlank has started. One should happen every frame (when enabled). When disabled they should not occur.
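
A rough sketch of what the emulator does at the start of VBlank. Everything here is simplified: the struct is hypothetical, memory is a flat 64K array, and the function would be called by your PPU code when it reaches the first VBlank dot. Details not spelled out above (stack living at $0100-$01FF, high byte of PC pushed first, bit 5 set and bit 4 clear in the pushed status, the interrupt-disable flag being set) are standard 6502 interrupt behavior.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified machine state for this sketch. */
struct nes {
    uint8_t  mem[0x10000]; /* flat CPU address space (a simplification) */
    uint16_t pc;
    uint8_t  sp;           /* stack pointer, stack lives at $0100-$01FF */
    uint8_t  status;       /* CPU status flags */
    uint8_t  ppu_ctrl;     /* last value written to $2000 */
    uint8_t  ppu_status;   /* value read back from $2002 */
};

static void push(struct nes *n, uint8_t value)
{
    n->mem[0x0100 + n->sp] = value;
    n->sp--; /* wraps within the stack page because sp is 8 bits */
}

/* Call when the PPU reaches the start of VBlank. */
static void start_vblank(struct nes *n)
{
    n->ppu_status |= 0x80;               /* set the vblank bit of $2002 */

    if (n->ppu_ctrl & 0x80) {            /* NMI enabled via high bit of $2000 */
        push(n, (uint8_t)(n->pc >> 8));  /* PC high byte */
        push(n, (uint8_t)(n->pc & 0xFF));/* PC low byte */
        push(n, (uint8_t)((n->status | 0x20) & ~0x10)); /* status flags */
        n->status |= 0x04;               /* set interrupt-disable flag */
        /* Jump to the address stored at $FFFA/$FFFB (low byte first). */
        n->pc = (uint16_t)(n->mem[0xFFFA] | (n->mem[0xFFFB] << 8));
    }
}
```

A real emulator would deliver the NMI between instructions rather than calling straight into the CPU state like this, but the stack and vector handling are the same.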

VBLANK, PPU ON/OFF SWITCH

The $2001 PPU Register has 2 very important control bits.
ref: http://wiki.nesdev.com/w/index.php/PPU_registers#Mask_.28.242001.29_.3E_write

Bit 3 (0x08), when set, the background (BG) will be visible
Bit 4 (0x10), when set, the sprites (movable objects, like Mario, Link, etc) will be visible

When either of these bits is set... the PPU is considered "on". When both of them are clear, the PPU is considered "off" and behaves differently than when it is on.

The game will set/clear these bits by writing to $2001:


LDA #$18
STA $2001  ; Turns on the PPU
LDA #$00
STA $2001  ; Turns off the PPU


Note that as previously mentioned... turning off the PPU does not stop the frame from progressing. "off" is a bit of a misnomer because the PPU is still powered and is doing stuff... it's just doing less work.


During the "idle" and "VBlank" scanlines... the PPU is effectively doing nothing but waiting for time to pass. During idle and VBlank scanlines the PPU is "not in rendering"

The "prerender" and "render" scanlines are when the PPU is drawing pixels to the screen, updating internal registers, and doing all its crazy work to get the image displayed. If the PPU is on during these scanlines... the PPU is "in rendering". If the PPU is off during this time... it is "not in rendering".
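
The on/off and "in rendering" rules can be sketched as a couple of small functions. The names are made up, and the line numbering is an assumption for this sketch: lines are counted 0-261 in the order of the frame diagram above (0 = idle, 1-20 = VBlank, 21 = prerender, 22-261 = render). Your emu may well number them differently.

```c
#include <stdint.h>

enum line_kind { LINE_IDLE, LINE_VBLANK, LINE_PRERENDER, LINE_RENDER };

/* Classify a line index 0..261, counted in frame-diagram order. */
static enum line_kind classify_line(int line)
{
    if (line == 0)  return LINE_IDLE;
    if (line <= 20) return LINE_VBLANK;
    if (line == 21) return LINE_PRERENDER;
    return LINE_RENDER;
}

/* The PPU is "on" if BG (0x08) or sprites (0x10) are enabled in $2001. */
static int ppu_is_on(uint8_t mask_2001)
{
    return (mask_2001 & 0x18) != 0;
}

/* "In rendering" = PPU on AND on a prerender or render scanline. */
static int in_rendering(int line, uint8_t mask_2001)
{
    enum line_kind kind = classify_line(line);
    return ppu_is_on(mask_2001) &&
           (kind == LINE_PRERENDER || kind == LINE_RENDER);
}
```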

When the PPU is "in rendering" (or from here on, just "rendering"), it is unsafe to access various PPU registers. For example, $2007 (CPU<->PPU data port) can only be accessed outside of rendering. Same with $2003 and $2004 (Sprite address and data regs). Therefore games can only update sprites/BG (i.e. change what's on screen) outside of rendering. This means either waiting for VBlank (or idle scanline)... or forcibly turning the PPU off via $2001.

Typically, games will turn the PPU off... draw the entire screen by doing a bazillion writes to $2007, then will turn the PPU on. While the PPU is on.. every frame, during VBlank, they will make minor changes to the screen.

For example in Super Mario Bros... once you press start at the title screen:

- the screen will go black for a few frames because the PPU is switched off so it can clear the BG and print how many lives you have remaining
- then the screen will turn back on (PPU on) to show that info to you for a few frames.
- then it goes black again (PPU off) to draw the visible tiles from level 1-1
- then PPU goes on again and you start playing the game
- while playing the game, the PPU remains on... and as you move through the level, more of it is drawn with the PPU remaining on (because the drawing is being done by the CPU during VBlank)


MORE SCANLINE DETAILS

As mentioned, the PPU is effectively just waiting during the idle and vblank scanlines... so nothing happens on those lines.

However during rendering, the PPU is doing all sorts of crap. I'm not going to get into details just yet. For extreme details on the timing you can refer to the wiki (or ask... I don't mind answering... I just don't want to overwhelm you). I recommend you start with a simple emu and don't focus too much on the extreme timing. At least not until later.

If interested in the exact timing... a diagram can be found here:
http://wiki.nesdev.com/w/images/d/d1/Ntsc_timing.png (note it refers to the "idle" scanline as "post-render", but it's the same thing)

The basic work done by the PPU during a rendering scanline is:
- Fetch tile data for visible tiles
- Output pixels (1 pixel per dot). Only outputs 256 pixels.
- Updates scroll (will explain in another section)
- Fetches tiles for sprite data for the next scanline


For a basic emu... if you don't care about cycle-level timing (pixel accurate emulation)... you can just draw one row of pixels every 341 dots.
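
A bare-bones sketch of that scanline-at-a-time approach. The struct is hypothetical, rows_drawn is just a stand-in for whatever actually draws a row of pixels, and lines are counted 0-261 in frame-diagram order (so 22-261 are the 240 render lines), which is an assumption of this sketch.

```c
/* Hypothetical minimal PPU position tracker. */
struct ppu {
    int  dot;        /* 0..340 within the current scanline */
    int  line;       /* 0..261 within the current frame */
    int  rows_drawn; /* stand-in for real drawing: rows rendered so far */
    long frames;     /* completed frames */
};

/* Advance the PPU by some number of dots.
   Every time a render scanline finishes (341 dots), "draw" one row. */
static void ppu_run_dots(struct ppu *p, long dots)
{
    while (dots-- > 0) {
        if (++p->dot < 341)
            continue;

        p->dot = 0;
        if (p->line >= 22)       /* a render line just finished */
            p->rows_drawn++;     /* real code would draw 256 pixels here */

        if (++p->line == 262) {  /* end of frame, wrap around */
            p->line = 0;
            p->frames++;
        }
    }
}
```

Running it for 262 * 341 dots from a zeroed struct draws exactly 240 rows and lands back at dot 0 of line 0.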



SYNCING PPU AND CPU + TECHNIQUES

It is important to keep track of where both the CPU and PPU are in the current frame. Games will do various raster effects by writing to PPU registers mid-frame (and in some cases, even mid-scanline!). The most common technique is basic screen splitting, where the game will change the scroll mid-screen.

For an example of splitting the screen... you can look at Super Mario Bros. It has a status bar at the top of the screen which stays stationary, whereas the map scrolls horizontally below it. This is accomplished by having the scroll set to 0,0 at the start of the frame... then about 70 or so scanlines into rendering... it will change the PPU scroll values. This results in the visible "split". Super Mario Bros waits those 70 or so scanlines by having the CPU effectively wait/spin until a certain number of CPU cycles have passed. (and... it also uses "Sprite 0 hit" which is another topic I'll get into later).

There are a couple of techniques for emulating CPU and PPU timing interactions. The most common one, and the one I'm going to focus on, is the "catch up" approach. It works like so:

- You run the CPU ahead of the PPU... have it run for a full frame's worth of cycles.
- Keep track of a CPU 'timestamp' which increments with each passing cycle.
- Whenever the CPU does something that could impact the PPU (ie: read/write any PPU register), you pause CPU execution and run the PPU up to the current CPU timestamp.
- Once the PPU "catches up" to the CPU timestamp... you perform the register read/write, and continue executing the CPU.
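
The steps above can be sketched roughly like this. All the names are hypothetical, timestamps are counted in PPU dots (so +3 per CPU cycle), and the actual PPU work is left as a comment. The regs array and the "addr & 7" indexing are just a stand-in for real register handling ($2000-$2007 is 8 registers).

```c
#include <stdint.h>

/* Hypothetical shared timing state. Both timestamps are in PPU dots. */
struct sys {
    int64_t cpu_time; /* how far the CPU has run */
    int64_t ppu_time; /* how far the PPU has been emulated */
};

/* Called by the CPU core as cycles pass: 1 CPU cycle = 3 dots. */
static void cpu_tick(struct sys *s, int cpu_cycles)
{
    s->cpu_time += 3 * (int64_t)cpu_cycles;
}

/* Run the PPU up to the CPU's timestamp. Returns how many dots it ran. */
static int64_t ppu_catch_up(struct sys *s)
{
    int64_t dots = s->cpu_time - s->ppu_time;
    /* A real emulator would run its PPU logic for `dots` dots here. */
    s->ppu_time = s->cpu_time;
    return dots;
}

/* CPU write to a PPU register: catch up first, then do the write. */
static void ppu_reg_write(struct sys *s, uint8_t regs[8],
                          uint16_t addr, uint8_t value)
{
    ppu_catch_up(s);
    regs[addr & 7] = value;
}
```

Register reads would work the same way: catch up, then read. At the end of the frame you do one last catch-up so the PPU finishes whatever is left.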


This can be done in a single thread. Or if you want to get very adventurous you can try multithreading. Note that I do not recommend preemptive multithreading (which is what you are thinking of when I say "multithreading"). However cooperative multithreading actually works pretty well. The idea with cooperative is that the two threads do not run simultaneously. Instead, only one thread is running at a time.. and you can "switch" to other threads whenever you want.

The reason cooperative is better for this is because you will have to sync up the PPU and CPU a LOT. Several times (possibly several dozen times) per frame. Having one thread wait for the other that many times creates a lot of blocking which ultimately makes things very slow.

If interested in cooperative threading... a very simple to use lib is libco, which is available here:
https://www.dropbox.com/s/cqfhidb8djug9lq/libco.zip

Libco is used in various emus.


As for keeping track of timestamps... I recommend keeping both timestamps in the same "base" so comparisons between them are easier. So for example:

- increment the CPU timestamp by 3 every CPU cycle
- increment the PPU timestamp by 1 every dot

That will keep the 3:1 ratio. OR, a better way might be:

- increment the CPU timestamp by 15 every CPU cycle
- increment the PPU timestamp by 5 every dot

This will keep the 3:1 ratio, and will also make PAL support easier if/when you add it in the future.
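
A sketch of why the 15/5 base is handy. The struct and names are made up. The PAL numbers are an assumption not stated above: PAL runs 3.2 dots per CPU cycle, which in this base works out to 16 per CPU cycle and 5 per dot.

```c
/* How much to add to each timestamp per CPU cycle / per PPU dot. */
struct timing {
    int cpu_step;
    int ppu_step;
};

static const struct timing NTSC_TIMING = { 15, 5 }; /* 3 dots per CPU cycle */
static const struct timing PAL_TIMING  = { 16, 5 }; /* assumed: 3.2 dots per CPU cycle */

/* Whole PPU dots that fit in the given number of CPU cycles. */
static long dots_elapsed(struct timing t, long cpu_cycles)
{
    return cpu_cycles * t.cpu_step / t.ppu_step;
}
```

With plain 3 and 1 you cannot express 3.2 in integers, but with this base switching regions is just a matter of swapping which struct you use.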



BUMP

With a demo NSF player I whipped up:

https://www.dropbox.com/s/5brkisuy8xhvoc8/nsfplayer.zip


weeeeeeeee

source and binary included.

Also includes some sample nsfs.
closed account (N36fSL3A)
I think the hardest part of this project is deciphering how the opcodes work.
Do you mean the addressing modes? Like "Zero Page", "Absolute" and all that?

Or do you mean the instructions, like LDA, STA, etc?

Like I say I'm very happy to answer Qs.
closed account (N36fSL3A)
Well I found a table of all opcodes but I don't know how to decipher the table.

(instructions)
This is the best 6502 reference page, IMO:

http://www.obelisk.demon.co.uk/6502/reference.html


So for example...

opcode 0x69 is ADC Immediate.

'ADC' is the instruction.
'Immediate' is the addressing mode.

ADC adds a value into the A register (with carry, as illustrated in my previous post).
The addressing mode determines where the operand comes from. IE: what number to add.

Immediate mode means the value comes immediately after the opcode. So if you run across these two bytes in the ROM:

69 06

That is ADC #$06 which would add 6 to A.
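
Here is roughly what that looks like in an emulator, as a sketch that handles only opcode 0x69. The struct and function names are hypothetical. The flag bit values, the overflow formula and the 2-cycle cost are standard 6502 facts, not something from the table itself. Decimal mode is ignored, which is fine for the NES since its CPU doesn't support it.

```c
#include <stdint.h>

enum { FLAG_C = 0x01, FLAG_Z = 0x02, FLAG_V = 0x40, FLAG_N = 0x80 };

/* Hypothetical minimal CPU state for this sketch. */
struct cpu {
    uint8_t  a;      /* accumulator */
    uint8_t  status; /* flags */
    uint16_t pc;     /* program counter */
};

/* The instruction: A = A + operand + carry, then update C, Z, V, N. */
static void adc(struct cpu *c, uint8_t operand)
{
    unsigned sum = (unsigned)c->a + operand + (c->status & FLAG_C);
    uint8_t result = (uint8_t)sum;

    c->status &= (uint8_t)~(FLAG_C | FLAG_Z | FLAG_V | FLAG_N);
    if (sum > 0xFF)    c->status |= FLAG_C; /* carried out of bit 7 */
    if (result == 0)   c->status |= FLAG_Z;
    if (result & 0x80) c->status |= FLAG_N;
    /* Signed overflow: both inputs had the same sign, result's sign differs. */
    if (~(c->a ^ operand) & (c->a ^ result) & 0x80)
        c->status |= FLAG_V;

    c->a = result;
}

/* Fetch and execute one instruction. Returns CPU cycles used, 0 if unknown. */
static int cpu_step(struct cpu *c, const uint8_t *mem)
{
    uint8_t opcode = mem[c->pc++];
    switch (opcode) {
    case 0x69:                    /* ADC Immediate */
        adc(c, mem[c->pc++]);     /* addressing mode: operand is the next byte */
        return 2;
    default:
        return 0;                 /* everything else left out of this sketch */
    }
}
```

The useful split is that adc() only knows about the instruction, and the case in the switch only knows about the addressing mode. The other ADC opcodes reuse adc() and just fetch the operand from somewhere else.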


edit: had typed lda instead of adc. doh
closed account (NUj6URfi)
Sounds like you guys know a lot about emulation. Does anyone know why there hasn't been an Xbox emulator?