NES database lookup #611
yo1dog started this conversation in Coding Corner
Replies: 2 comments 2 replies
-
I realized there was a bug with the way I was hashing the CHR ROM. Here is an updated index which maps to full NES2.0 header data and No-Intro names: nesIndex4.json.txt
-
Interesting idea 👍 But I'm not sure it will work on physical cartridges (at least not all of them), because with some carts you have to set the mapper first before you can read the bytes of the ROM.
-
Don't know much about NES in general, but I had a thought: AFAIK, dumping NES carts is difficult because it requires knowing the PRG/CHR ROM sizes beforehand. Hence the need to manually look up and enter these values from places like http://nes.dnsabr.com. Cart dumper automates this by storing a database of hashes of globally known seekable sections of the PRG ROM. However, this does not always work, because these sections can be identical between different games, which causes conflicts.
My mind immediately jumps to a B-tree-index-style strategy: progressively scan and hash the PRG and CHR ROMs, using the current match set to increase the seekable area and narrow down the possibilities.
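Let's pretend we have a database of the entire NES library which contains hashes for the first n bytes of each PRG ROM:

| Game | 16k | 128k | 512k |
|---|---|---|---|
| (unknown) | 9f115a9e | | |
| Zelda | a3b3e36e | 53b88a7a | |
| Contra | a3b3e36e | b0c8c11e | |
| Kirby | a3b3e36e | 2864f7cc | 4f3ad289 |
| Tetris | a3b3e36e | 2864f7cc | d2dce641 |
| Gradius | dd611655 | f5525447 | 707c2b2f |
| Frogger | 4bf93c55 | 3b2dc183 | 12c70afd |
| PacMan | 4bf93c55 | 3b2dc183 | 8c7869e6 |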
Now we are dumping a cart. We know the minimum size is 16k, so it is safe to read and hash the first 16k. Doing so produces a3b3e36e. This matches Zelda, Contra, Kirby, and Tetris. Of the 4, the smallest size is 128k, so we continue reading and hashing to 128k. Now we produce 2864f7cc, which matches Kirby and Tetris. Both games are 512k, so we continue reading and hashing to 512k and produce d2dce641, which matches Tetris.

Theoretically this would resolve all ambiguity except for (very rare) cases in which both the PRG ROM and CHR ROM begin with the entirety of another PRG ROM and CHR ROM.
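The walkthrough above can be sketched roughly like this in Python. This is only an illustration under some assumptions: `build_db`, `identify`, and `read_prg` are made-up names, the ROM contents are synthetic filler rather than real dumps, and only the PRG ROM is considered. Checkpoints are the known minimum size plus every ROM size in the set, so each remaining candidate always has a hash at the size we read next.

```python
import zlib

KIB = 1024

def build_db(roms, min_size):
    """Hash each PRG ROM at every checkpoint (min_size plus every ROM size in the set)."""
    checkpoints = {min_size} | {len(prg) for prg in roms.values()}
    return {
        name: {
            "size": len(prg),
            "hashes": {n: zlib.crc32(prg[:n]) for n in checkpoints if n <= len(prg)},
        }
        for name, prg in roms.items()
    }

def identify(read_prg, db, min_size):
    """read_prg(n) is a hypothetical callback returning the first n bytes of the cart."""
    candidates = set(db)
    size = min_size
    while True:
        digest = zlib.crc32(read_prg(size))
        # Keep only the games whose prefix hash at this size matches the cart.
        candidates = {g for g in candidates if db[g]["hashes"].get(size) == digest}
        if len(candidates) <= 1:
            return sorted(candidates)
        # It is only safe to read up to the smallest remaining candidate.
        next_size = min(db[g]["size"] for g in candidates)
        if next_size <= size:
            return sorted(candidates)  # one ROM is a prefix of another: still ambiguous
        size = next_size

def fill(value, kib):
    """Synthetic filler data standing in for real ROM contents."""
    return bytes([value]) * (kib * KIB)

roms = {
    "Zelda": fill(1, 16) + fill(2, 112),
    "Contra": fill(1, 16) + fill(3, 112),
    "Kirby": fill(1, 16) + fill(4, 112) + fill(5, 384),
    "Tetris": fill(1, 16) + fill(4, 112) + fill(6, 384),
    "Gradius": fill(7, 512),
}
db = build_db(roms, 16 * KIB)
cart = roms["Tetris"]
result = identify(lambda n: cart[:n], db, 16 * KIB)
print(result)  # ['Tetris']
```

If one ROM in the set is an exact prefix of another, `identify` returns both names, which is the rare unresolved case mentioned above.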
The database could be minimized drastically to only the hashes required to resolve conflicts. For example, there is no reason to store the 128k and 512k hashes for Gradius because its 16k hash is unique. Same with the 128k hashes for Frogger and PacMan, as they provide no disambiguation. In fact, rather than storing a flat lookup table, you could store an index-tree-like structure instead that contains disambiguation instructions:
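One possible shape for such a tree, as a rough Python sketch rather than the format of the attached index: each node says how many bytes to hash next, and each child is keyed by the resulting CRC32. The names `build_tree` and `lookup`, the `"read"`/`"children"` keys, and the filler ROM data are all assumptions made for illustration.

```python
import json
import zlib

KIB = 1024

def build_tree(roms, names, size):
    """Build a node: hash the first `size` bytes, then follow the matching child."""
    groups = {}
    for name in names:
        groups.setdefault(zlib.crc32(roms[name][:size]), []).append(name)
    children = {}
    for digest, group in groups.items():
        key = f"{digest:08x}"  # hex string keys keep the tree JSON-serializable
        next_size = min(len(roms[g]) for g in group)
        if len(group) == 1 or next_size <= size:
            children[key] = sorted(group)  # leaf: resolved, or cannot be resolved further
        else:
            children[key] = build_tree(roms, group, next_size)
    return {"read": size, "children": children}

def lookup(tree, read_prg):
    """Walk the tree; read_prg(n) is a hypothetical callback returning the first n bytes."""
    node = tree
    while isinstance(node, dict):
        digest = f"{zlib.crc32(read_prg(node['read'])):08x}"
        node = node["children"].get(digest, [])
    return node

def fill(value, kib):
    """Synthetic filler data standing in for real ROM contents."""
    return bytes([value]) * (kib * KIB)

roms = {
    "Zelda": fill(1, 16) + fill(2, 112),
    "Contra": fill(1, 16) + fill(3, 112),
    "Kirby": fill(1, 16) + fill(4, 112) + fill(5, 384),
    "Tetris": fill(1, 16) + fill(4, 112) + fill(6, 384),
    "Gradius": fill(7, 512),
    "Frogger": fill(8, 128) + fill(9, 384),
    "PacMan": fill(8, 128) + fill(10, 384),
}
tree = build_tree(roms, list(roms), 16 * KIB)
found = lookup(tree, lambda n: roms["Tetris"][:n])
print(found)  # ['Tetris']
print(len(json.dumps(tree)))  # size of the serialized index in characters
```

With this layout, a game with a unique 16k hash is stored as a single leaf, and a pair that shares both its 16k and 128k prefixes jumps straight from the 16k node to a 512k node, so the non-disambiguating hashes are never stored.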
I tested this theory on a headerless No-Intro ROM set using NES2.0 DB. It was able to index and distinguish all but 21 of the 3,560 ROMs. The Virtual Console and cassette dumps can be excluded, which brings the number down to 9, only 3 of which are "standard" games:
All of these are instances in which the original/parent ROM is included in its entirety at the start of the child ROM.
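A check for this parent/child situation could look something like the sketch below. It assumes each ROM is available as a `(prg, chr)` pair of byte strings; `find_prefix_conflicts` and the sample data are hypothetical and not taken from the actual test run.

```python
def find_prefix_conflicts(roms):
    """Return (parent, child) pairs where the child's PRG and CHR both start with the parent's."""
    conflicts = []
    for parent, (p_prg, p_chr) in roms.items():
        for child, (c_prg, c_chr) in roms.items():
            if parent == child:
                continue
            if c_prg.startswith(p_prg) and c_chr.startswith(p_chr):
                conflicts.append((parent, child))
    return sorted(conflicts)

# Synthetic example data: "Child" contains "Parent" in full at the start of both ROMs.
sample = {
    "Parent": (b"\x01" * 16, b"\x02" * 8),
    "Child": (b"\x01" * 16 + b"\x03" * 16, b"\x02" * 8 + b"\x04" * 8),
    "Other": (b"\x05" * 16, b"\x02" * 8),
}
conflicts = find_prefix_conflicts(sample)
print(conflicts)  # [('Parent', 'Child')]
```

Pairs found this way are exactly the ones that progressive prefix hashing cannot tell apart, since every safe read of the child looks identical to the parent.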
I attached the generated index in JSON. Right now it's just a map of partial CRC32 to full CRC32, but it could instead map to game name, PRG ROM size, mapper, etc.
nesIndex2.json.txt
Thoughts?