NES Emulator

closed account (N36fSL3A)
Okay so I decided to suck it up and just start to redo my previously lost NES emulator. I haven't worked on anything NES-related recently so this is all new to me now.

This is going to be the thread where I ask questions about writing an NES emulator, so I'll try to keep it alive in case anyone else feels interested and wants to do the same.

I'm currently having trouble deciphering the .nes file format (The creator *had* to make it more complicated than it needed to be).

First off I'd like to make sure that it's correct:
struct FileHeader
{
	uint8_t  id[4];
	uint8_t  numPRGROM16;
	uint8_t  numCHRROM8;
	uint8_t  flags1;
	uint8_t  flags2;
	uint8_t  numPRGRAM8;
	uint8_t  flags3;
	uint8_t  flags4;
	uint8_t  reserved[5];
};


How am I supposed to extract the data from the flags? What do they mean?
I was fixing to recommend seeing if Disch would take a minute to post the links he knows of. Also won't hurt to link to this thread by him: http://www.cplusplus.com/forum/lounge/119813/
The question is how much you now know about computer architecture. Fully emulating a NES requires you to know how virtual memory, cache memory, memory interleaving (for banks), and addressing modes/opcodes work. Then you'd need to be able to parse the game image file and validate that the header is right, if it is not a raw image. Then you'd have to load in the individual instructions and decode them into something usable. I don't know if you completed your last NES project, but I imagine it would be difficult if you didn't know anything about the engineering of virtual machines in general.

Dude, if you pull this off before you are in high school, you are either passionate or you have no personal life XD. I still haven't had enough time to make my own emulator, even though I know how. Flags are used because they are a type of encoding.

For example, in Windows there are several functions that require flags to work.
The flags serve as identifiers when they can be properly decoded. Since you've commented on some of my threads, I'll help you some.

Something I'm writing (in a more generic adaptation) for my engine:

#define RED_M 0x00ff0000
#define GREEN_M 0x0000ff00
#define BLUE_M 0x000000ff
#define ALPHA_M 0xff000000 


If you have a pixel in ARGB format (0xAARRGGBB, which is the layout the masks above describe), for example,
|A|R|G|B|

pixel = RED_M | GREEN_M | BLUE_M | ALPHA_M

gives a white pixel with full opacity, depending on the system.

It gets deeper than that, but it is beautiful.
To check whether a certain field is set, you can use the '&' operator
to see if any of the mask's bits are set:
if(RED_M & pixel) //detected red, decode the red, and do shit with it.

This type of masking is used with controllers, to check if a button is pressed, at the levels closer to the hardware.

for example..

 
GetAsyncKeyState(virtual_key_code) & 0x8000

will check whether the key with that virtual-key code is currently held down (the most significant bit of the return value is set while the key is down).
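
To make the pixel-mask idea above concrete, here is a minimal sketch that packs and unpacks channels with the same mask values. It assumes the 0xAARRGGBB layout that those masks imply; `packARGB` and `channel` are made-up helper names for illustration, not part of any engine or API.

```cpp
#include <cstdint>

// Same values as the macros above, for a 32-bit pixel laid out as 0xAARRGGBB.
constexpr std::uint32_t kAlphaMask = 0xff000000u;
constexpr std::uint32_t kRedMask   = 0x00ff0000u;
constexpr std::uint32_t kGreenMask = 0x0000ff00u;
constexpr std::uint32_t kBlueMask  = 0x000000ffu;

// Hypothetical helper: build a pixel from four 8-bit channels.
constexpr std::uint32_t packARGB(std::uint8_t a, std::uint8_t r,
                                 std::uint8_t g, std::uint8_t b)
{
    return (std::uint32_t{a} << 24) | (std::uint32_t{r} << 16) |
           (std::uint32_t{g} << 8)  |  std::uint32_t{b};
}

// Hypothetical helper: isolate one channel with its mask, then shift it down
// so the result is a plain 0-255 value.
constexpr std::uint8_t channel(std::uint32_t pixel, std::uint32_t mask, unsigned shift)
{
    return static_cast<std::uint8_t>((pixel & mask) >> shift);
}
```

Testing `pixel & kRedMask` alone only answers "is any red present?"; the shift is what turns the masked field back into a usable number.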
Fredbill wrote:
I'm currently having trouble deciphering the .nes file format (The creator *had* to make it more complicated than it needed to be).


It was actually simple initially, but kept getting expanded as more and more games got dumped. They tried to keep it backwards compatible.

Fredbill wrote:
How am I supposed to extract the data from the flags?


Bitwise operations, good buddy. I gave a breakdown of how they work in this post:
http://www.cplusplus.com/forum/beginner/72705/#msg387899


Here's a snippet from the nesdev wiki:
http://wiki.nesdev.com/w/index.php/INES#Flags_6

76543210
||||||||
||||+||+- 0xx0: vertical arrangement/horizontal mirroring (CIRAM A10 = PPU A11)
|||| ||   0xx1: horizontal arrangement/vertical mirroring (CIRAM A10 = PPU A10)
|||| ||   1xxx: four-screen VRAM
|||| |+-- 1: SRAM in CPU $6000-$7FFF, if present, is battery backed
|||| +--- 1: 512-byte trainer at $7000-$71FF (stored before PRG data)
++++----- Lower nybble of mapper number


(ignore bit 3 for now... I'll explain later)

As this chart illustrates... after we ignore bit 3... we can see that bit 0 selects whether or not to use Horizontal or Vertical mirroring.

So we can just use the AND operator to extract the bit we're interested in. In this case, since we're interested in the lowest bit (bit 0), we AND with 1 (since 1 == 00000001 in binary). "Flags 6" is byte 6 of the header, which is flags1 in your struct:

if( flags1 & 0x01 )
{
  // vertical mirroring
}
else
{
  // horizontal mirroring
}


Likewise, bit 1 (0x02) selects whether or not the SRAM is battery backed (so you can generate .sav files to save the user's game). So to extract that bit, we just AND with 2:

if( flags1 & 0x02 )
{
  // SRAM is battery backed
}
else
{
  // SRAM is not battery backed
}



Fredbill wrote:
What do they mean?


flags6: bit 0 = selects whether to use Horizontal or Vertical mirroring by default

flags6: bit 1 = indicates that SRAM is battery backed (the cartridge is capable of saving games between plays)

flags6: bit 2 = Ignore this. No games use it. It's a legacy thing for FFE mappers that are long obsolete.

flags6: bit 3 = This bit is only significant for mapper 4. And it is only used in 2 games (Rad Racer 2 and Gauntlet). You can safely ignore it for now.

flags6: bits4-7 = Determine the low 4 bits of the mapper number. Combine this with the bits from flags7 to form an 8-bit mapper number. The mapper number tells you the cartridge type and how the game will swap out PRG/CHR and other cartridge-side hardware info.


flags7: You can ignore all of this except the mapper bits. The low bits are practically never used in the wild.

Everything after flags7 in the header can be ignored, as in-the-wild ROMs don't use them or have them set incorrectly.
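
Pulling the flags6 bits together, a minimal sketch of a decoder might look like the following. It follows the wiki chart literally, so it also reads the trainer bit and the four-screen bit even though, as noted above, you can usually ignore them. `Flags6Info`, `decodeFlags6` and `prgOffset` are hypothetical names, and the 16-byte header size comes from the struct in the first post.

```cpp
#include <cstddef>
#include <cstdint>

enum class Mirroring { Horizontal, Vertical, FourScreen };

// Hypothetical container for the flags6 fields described above.
struct Flags6Info
{
    Mirroring    mirroring;
    bool         batteryBacked;
    bool         hasTrainer;
    std::uint8_t mapperLow;   // low 4 bits of the mapper number
};

Flags6Info decodeFlags6(std::uint8_t flags6)
{
    Flags6Info info{};
    if (flags6 & 0x08)      info.mirroring = Mirroring::FourScreen;  // bit 3 overrides bit 0
    else if (flags6 & 0x01) info.mirroring = Mirroring::Vertical;
    else                    info.mirroring = Mirroring::Horizontal;
    info.batteryBacked = (flags6 & 0x02) != 0;                       // bit 1
    info.hasTrainer    = (flags6 & 0x04) != 0;                       // bit 2
    info.mapperLow     = static_cast<std::uint8_t>(flags6 >> 4);     // bits 4-7
    return info;
}

// PRG data follows the 16-byte header, after the 512-byte trainer if present.
std::size_t prgOffset(const Flags6Info& info)
{
    return 16 + (info.hasTrainer ? 512 : 0);
}
```

Even if you never emulate a trainer, knowing it is there matters for the file offset: skipping it wrongly would shift every PRG byte by 512.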




A question I anticipate but you have not asked yet wrote:
So wtf is mirroring and what's the difference between horizontal, vertical, and 4-screen?


NES graphics work by having 'nametables'. A nametable is basically a 2D grid of tiles. Each tile is 8x8 pixels.

Each nametable consists of 32*30 tiles... effectively forming a full "screen".

A nametable is 0x0400 bytes (1K) in size.


Conceptually, there are 4 nametables on the NES. They reside in PPU memory starting at address $2000 (not to be confused with $2000 in CPU memory).

They're logically laid out like so (remember each NT is a "screen"):


  [ $2000 ][ $2400 ]
  [ $2800 ][ $2C00 ]


So... if the screen is not scrolled at all... the only NT that will be visible will be the $2000 NT. But as the screen scrolls to the right, you'll see more of $2400 and less of $2000. As you scroll down, you'll see more of $2800,$2C00. Etc.

They "wrap"... so as you scroll past the right of $2400 you will scroll back into the $2000 NT.
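
As a rough illustration of the 2x2 layout and the wrapping, here is a sketch that picks the nametable slot under the top-left corner of the screen. It assumes the scroll position is tracked as a non-negative pixel offset, which is a simplification for illustration and not how the PPU stores scroll internally; `nametableAt` is a made-up name.

```cpp
#include <cstdint>

// Each nametable covers 32x30 tiles of 8x8 pixels, i.e. 256x240 pixels.
constexpr int kScreenW = 32 * 8;
constexpr int kScreenH = 30 * 8;

// Hypothetical helper: given a non-negative scroll position in pixels, return
// the PPU address of the nametable slot holding the top-left visible pixel.
// Scrolling past the right or bottom edge wraps back around.
constexpr std::uint16_t nametableAt(int scrollX, int scrollY)
{
    return static_cast<std::uint16_t>(
        0x2000 + 0x0400 * (((scrollY / kScreenH) % 2) * 2    // row: 0 = top, 1 = bottom
                           + (scrollX / kScreenW) % 2));     // column: 0 = left, 1 = right
}
```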



This is all fine and good... however the NES does not physically have the memory for 4 nametables. It only has 2K of memory for nametables... so there are only two physical nametables. I call these nametables NTA and NTB.

Nametable mirroring (aka "mirroring") is the technique used to spread these 2 nametables across all 4 "slots".

The simplest form of mirroring is "1 screen" (not covered in the NES header). "1 screen" simply takes one of the nametables and puts it in all 4 slots. Example:

[ NTA ][ NTA ]
[ NTA ][ NTA ]


The idea here is that addresses $2000, $2400, $2800, and $2C00 all 'mirror' the same place in memory. Reading from any of those addresses will read the same byte. And writing to any of those addresses will change the value at all of them (ie: they all point to the same place).

In 1-screen, each of the four slots will appear identical.


More common forms of mirroring are horizontal and vertical. These simply arrange the nametables in a different fashion.

Horizontal:
[ NTA ][ NTA ]
[ NTB ][ NTB ]


Vertical:
[ NTA ][ NTB ]
[ NTA ][ NTB ]



Games which scroll horizontally (like Super Mario Bros) will tend to use vertical mirroring. Whereas games which scroll vertically (like Ice Climber) will tend to use horizontal mirroring.
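
In an emulator, the layouts above boil down to one small address translation. This is a sketch under the assumption that the 2K of physical nametable RAM is a single array with NTA in the first 1K and NTB in the second; `NtLayout` and `nametableOffset` are hypothetical names.

```cpp
#include <cstdint>

enum class NtLayout { OneScreen, Horizontal, Vertical };

// Hypothetical helper: map a PPU nametable address ($2000-$2FFF) to an offset
// into 2K of physical nametable RAM (NTA at 0x000-0x3FF, NTB at 0x400-0x7FF).
std::uint16_t nametableOffset(std::uint16_t ppuAddr, NtLayout layout)
{
    const std::uint16_t slot   = (ppuAddr >> 10) & 0x03;  // 0..3 for $2000, $2400, $2800, $2C00
    const std::uint16_t within = ppuAddr & 0x03FF;        // offset inside the 1K slot
    std::uint16_t table = 0;                              // 0 = NTA, 1 = NTB
    switch (layout) {
        case NtLayout::OneScreen:  table = 0;         break;  // NTA in every slot
        case NtLayout::Horizontal: table = slot >> 1; break;  // top row NTA, bottom row NTB
        case NtLayout::Vertical:   table = slot & 1;  break;  // left column NTA, right column NTB
    }
    return static_cast<std::uint16_t>(table * 0x0400 + within);
}
```

Because both reads and writes go through the same translation, a write to $2005 under horizontal mirroring shows up at $2405 automatically; no copying is needed.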


"4 screen" mirroring is extremely rare. I'm only aware of 3 games which use it, and one of them is incredibly obscure. Games which use 4-screen use additional memory on the cartridge to provide a full 4 nametables, so that none of the slots have to be mirrors -- they can all be unique.
Seriously Disch, with that type of knowledge you could easily get paid, man. Somebody would buy it, I guarantee. Nearly every post you make, I learn something from it.
I don't think there's much of a market for "misc NES technical trivia"
closed account (N36fSL3A)
Thanks Disch and DeXecipher. So which game would be the easiest for me to get my emulator working with?

I still don't understand why the memory was mirrored, it's just a waste of space.

And how do I combine the mapper bits?
closed account (10X9216C)
dex wrote:

something im writing (in a more generic adaptation) for my engine:

#define RED_M 0x00ff0000
#define GREEN_M 0x0000ff00
#define BLUE_M 0x000000ff
#define ALPHA_M 0xff000000


If you have a pixel in ARGB format (0xAARRGGBB), for example,
|A|R|G|B|

pixel = RED_M | GREEN_M | BLUE_M | ALPHA_M

gives a white image with full opacity, depending on the system

it gets deeper than that. But it is beautiful
to check to see if a certain field is set you can use the '&' operator
to see if it matches any of the mask
if(RED_M & pixel) //detected red, decode the red, and do shit with it.



struct Color
{
    unsigned char red;
    unsigned char green;
    unsigned char blue;
    unsigned char alpha;    
};

Color c;

if(c.red) // detected red...


I don't really understand why you are using macros to begin with, they aren't needed and as a whole using a struct is more meaningful.

void somefunc(Color);
void somefunc(int);


for example..


GetAsyncKeyState(virtual_key_code) & 0x8000

will check whether the key with that virtual-key code is currently held down.

You prob shouldn't take examples from windows API, most of it is god awful and in C.
closed account (N36fSL3A)
Just wondering if this is the correct way to retrieve the mapper.

mapper = (header.flags2 & 0x0F) | (header.flags1 & 0x0F) << 4;
Fredbill wrote:
So which game would be the easiest for me to get my emulator working with?


Ice Climber
Balloon Fight

Both are very basic and don't do any fancy tricks. They're great starters.

Fredbill wrote:
I still don't understand why the memory was mirrored, it's just a waste of space.


Memory was expensive. The NES only had 2K memory for nametables because 4K would have cost more.

On the other hand... address space is basically free (at least until you need an additional pin)... so having 4K of addresses doesn't cost any more than having 2K.

Fredbill wrote:
Just wondering if this is the correct way to retrieve the mapper.


Close. You're getting the low 4 bits when the mapper number is actually stored in the high bits.

You want this:

 
mapper = (header.flags2 & 0xF0) | ((header.flags1 & 0xF0) >> 4);


Assuming flags2 is hdr[7] and flags1 is hdr[6]
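
As a worked example of the corrected expression, here is a small sketch wrapped in a function. `mapperNumber` is a made-up name and its parameters are simply header bytes 6 and 7.

```cpp
#include <cstdint>

// The high nibble of header byte 7 becomes the high nibble of the mapper
// number, and the high nibble of header byte 6 becomes the low nibble.
constexpr std::uint8_t mapperNumber(std::uint8_t flags6, std::uint8_t flags7)
{
    return static_cast<std::uint8_t>((flags7 & 0xF0) | ((flags6 & 0xF0) >> 4));
}
```

With flags6 = 0x12 and flags7 = 0x40 this gives 0x40 | 0x01 = 0x41. The earlier attempt would instead combine the low nibbles (0x0 and 0x2) and come out as 0x20.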
closed account (N36fSL3A)
Okay, quick question. Does the PC register increment before or after the instruction is handled?

Example:
uint8_t opcode = memory.Read(pc);
pc++;
HandleOpcode(opcode);


Or:
uint8_t opcode = memory.Read(pc);
HandleOpcode(opcode);
pc++;

It increments after the fetch. So your first example is correct.

Note that the opcode might also increment the PC by up to 2 more bytes depending on the addressing mode.
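
To show how the operand bytes move the PC along, here is a minimal sketch of a fetch/execute step covering three 6502 opcodes (NOP, LDA immediate, LDA absolute). The `Cpu` struct, its flat 64K memory array, and the lack of status flags are simplifications made up for illustration, not a real CPU core.

```cpp
#include <array>
#include <cstdint>

// Hypothetical minimal CPU: just enough state to show how PC advances.
struct Cpu
{
    std::array<std::uint8_t, 0x10000> mem{};  // flat 64K, no memory mapping
    std::uint16_t pc = 0;
    std::uint8_t  a  = 0;

    // Read the byte at PC, then increment PC.
    std::uint8_t fetch() { return mem[pc++]; }

    void step()
    {
        const std::uint8_t opcode = fetch();  // PC now points at the first operand byte
        switch (opcode) {
            case 0xEA:                        // NOP: no operand, 1 byte total
                break;
            case 0xA9:                        // LDA #imm: 1 operand byte, 2 bytes total
                a = fetch();
                break;
            case 0xAD: {                      // LDA abs: 2 operand bytes (low, high), 3 bytes total
                const std::uint16_t lo = fetch();
                const std::uint16_t hi = fetch();
                a = mem[static_cast<std::uint16_t>(lo | (hi << 8))];
                break;
            }
            default:                          // every other opcode is left out of this sketch
                break;
        }
    }
};
```

Routing every opcode and operand read through `fetch()` means the PC bookkeeping lives in one place, so each addressing mode only has to say how many bytes it consumes.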
closed account (N36fSL3A)
Which way do you think is the most efficient way to implement rom bank switching?
I wouldn't worry about it yet. Mapper 0 games have no bankswitching. At least get them working first before worrying about more complex mappers.


That said... the typical approach is to break the memory into "slots" and have a pointer for each slot. When the game reads from an address, you extract the slot index from the address and read from that slot's pointer.

For example:

// ref:
#include <cstdint>
typedef uint8_t  u8;
typedef uint16_t u16;

// For PRG, swapping in 4K units is typically good:
// So you break the address space into 16 slots, each slot is 4K in size.
//
// IE:  address $8632 would read from slot $8, offset $632

u8 PRGROMBuffer[0x40000];  // a 256K PRG ROM

u8* Prg[0x10];  // your 16 PRG slots
  // these pointers would point to different areas of the PRGROMBuffer

// the function to read from a given PRG address
//  (a cpu read)
u8 read(u16 addr)
{
    int slot = (addr & 0xF000) >> 12;  // extract the slot number
    return Prg[slot][addr & 0x0FFF];  // read from that slot
}


Then... all you have to do to switch banks is change a pointer or two.

EDIT: note that the above 'read' function would work for PRG reads only.. so typically only for addresses $6000 and up. For other addresses, you'll have to map read to some other logic to read from system RAM or CPU/APU/PPU registers. /EDIT
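
To illustrate the "change a pointer or two" part, here is a sketch of swapping a 16K bank by repointing four consecutive 4K slots. `PrgBanks` and `map16K` are hypothetical names, and wrapping out-of-range bank numbers with a modulo is an assumption on my part rather than something stated above. It also assumes the ROM size is a non-zero multiple of 16K and that `cpuBase` is $8000 or $C000.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical PRG state built on the 4K-slot idea above.
struct PrgBanks
{
    std::vector<std::uint8_t> rom;   // the whole PRG ROM image
    std::uint8_t* slot[0x10] = {};   // one pointer per 4K of CPU address space

    // Point the 16K CPU window starting at cpuBase at the given 16K ROM bank.
    // Assumption: bank numbers past the end of the ROM wrap around.
    void map16K(std::uint16_t cpuBase, std::size_t bank)
    {
        const std::size_t bankCount = rom.size() / 0x4000;
        const std::size_t offset    = (bank % bankCount) * 0x4000;
        const int first = cpuBase >> 12;                 // e.g. $8000 -> slot 8
        for (int i = 0; i < 4; ++i)                      // 4 slots x 4K = 16K
            slot[first + i] = rom.data() + offset + i * 0x1000;
    }

    std::uint8_t read(std::uint16_t addr) const
    {
        return slot[addr >> 12][addr & 0x0FFF];
    }
};
```

The read path never changes when a bank is swapped; only the four pointers do, which is why this approach stays cheap even for games that switch banks constantly.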



CHR swapping is the same idea. Only you'd break it up into smaller slots (since CHR swapping typically has finer granularity).

1K CHR slots works for all games I'm aware of. So 8 slots of 1K will get you the CHR you need.
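
The CHR side can be sketched the same way, just with a 10-bit offset instead of a 12-bit one. `ChrBanks` is a made-up name, and the sketch assumes CHR occupies PPU addresses $0000-$1FFF (8 slots x 1K).

```cpp
#include <cstdint>

// Eight 1K CHR slots covering PPU $0000-$1FFF; each pointer targets some
// 1K region of the CHR data.
struct ChrBanks
{
    const std::uint8_t* slot[8] = {};

    std::uint8_t read(std::uint16_t ppuAddr) const
    {
        return slot[(ppuAddr >> 10) & 0x07][ppuAddr & 0x03FF];
    }
};
```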
@mysolar
I'm implementing the Windows part of my game engine on top of Windows, therefore I am using WINAPI for certain things. Macros are good for initialization and for checking/masking fields using the '&' operator. I wouldn't do the following, because floating point numbers are iffy (because of the way they are encoded) and slower. It depends on your implementation. I implement scalable code. Just about every company that deals with C-related APIs uses macros. But it is definitely not necessary to use macros for everything.

struct Color{
    float r;
    float g;
    float b;
    float a;
}c;

int main(){
    //what should it be in this case, since 7 decimals is the limit?
    Color c{0.0000001,0.0,0.0,0.0};
    if(c.r) return -1;
    return 0;
}
closed account (N36fSL3A)
struct Color
{
    uint8_t r;
    uint8_t b;
    uint8_t g;
    uint8_t a;
};
closed account (10X9216C)
I'm implementing the Windows part of my game engine on top of Windows, therefore I am using WINAPI

That doesn't really mean you have to use windows bad practices.

Macros are good for initialization and for checking/masking fields using the '&' operator.


You can achieve the same result without using a macro:
const int red_mask = 0x00ff0000;
//or
constexpr int red_mask = 0x00ff0000;
if(pixel & red_mask)


I implement scalable code

How does this make your code scalable... Stop trying to justify bad practices.

Just about every company that deals with C-related APIs uses macros. But it is definitely not necessary to use macros for everything.

I don't know of any recent games written in C. Also, most game APIs are written exclusively in C++ (Havok, DirectX, Bullet, etc.) and then have wrappers for C, Python, Java, etc.
closed account (N36fSL3A)
For the PPU should I use multithreading?
Fredbill wrote:
For the PPU should I use multithreading?


No. At least not pre-emptive multithreading (which is the kind of threading you're probably thinking of).

It might seem like a good idea because the PPU and CPU are running in parallel... but games will frequently do timed writes to PPU regs which change the displayed image partway through the screen.

In order to emulate those, you will need to "sync" the CPU and PPU at least at each of those writes. Syncing two threads is slow and painful.



On the other hand... there's "cooperative" multithreading... which runs in one thread, but creates several different pseudo-threads and allows you to explicitly switch between them.

IE, you could have a CPU thread and a PPU thread. And while they will never run in parallel... you can run the CPU for a while, then explicitly switch to the PPU thread and run the PPU for a while, then switch back, etc.

I did a test with both co-op and pre-emptive multithreading a while back. Cooperative was not only easier to work with (getting pre-emptive to work without deadlocks was actually kind of tricky) but it also performed better.

http://forums.nesdev.com/viewtopic.php?p=96069#p96069

If you want to play around with coop-multithreading, I recommend libco:

http://byuu.org/programming/libco/
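
If you would rather avoid threads of either kind, one plain single-threaded alternative is a "catch-up" scheme: the CPU runs ahead while counting cycles, and the PPU is brought up to date just before any PPU register access. To be clear, this sketch does not use libco and is not necessarily what the linked test did. `Ppu`, `Scheduler`, `cpuTick` and `syncPpu` are made-up names, and the 3 PPU dots per CPU cycle ratio assumes an NTSC console.

```cpp
#include <cstdint>

// Assumption: NTSC timing, where the PPU runs 3 dots for every CPU cycle.
constexpr std::uint64_t kDotsPerCpuCycle = 3;

// Hypothetical PPU stub: only tracks how many dots it has rendered so far.
struct Ppu
{
    std::uint64_t dots = 0;

    void runTo(std::uint64_t targetDots)
    {
        while (dots < targetDots)
            ++dots;   // a real PPU would render one dot here
    }
};

// Hypothetical scheduler: the CPU runs ahead and the PPU catches up on demand.
struct Scheduler
{
    std::uint64_t cpuCycles = 0;
    Ppu ppu;

    // Called after each CPU instruction with the number of cycles it took.
    void cpuTick(std::uint64_t cycles) { cpuCycles += cycles; }

    // Call before any CPU access to a PPU register, so a timed write takes
    // effect at the right point partway through the frame.
    void syncPpu() { ppu.runTo(cpuCycles * kDotsPerCpuCycle); }
};
```

The sync points are the same ones described above (the timed writes to PPU regs); the difference is only that syncing is a function call rather than a thread switch.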