Tuesday, November 26, 2013

The Card Format

Introduction

Well - here it is: this one...

This one's a tad rough - many elements, multiple files, but very extensive and probably the most valuable of all the file formats in this game to crack. A little bit of encryption, a bit of compression, some character encoding; they have it all.

This is more than likely going to be a multi-part post, as there are still some things to clear up, so let's get to it!!!

Raw File

Standard LH5 compressed file; roughly 80k compressed. It's the 0x08 flag followed by a 3 byte size; you know the drill by now.
The first two size bytes are the usual big endian 16-bit value, but the third byte gets shifted left by 0xF (15) and OR'd on top, meaning:
0x8868 | (0x05 << 0xF) = 0x8868 | 0x28000 = 0x28868, or 165992 bytes.
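Here's a minimal sketch of that size calculation in Python. It assumes the header bytes sit in the order 0x08, high byte, low byte, extra byte, which is one reading of the example above; the function name is made up for illustration.

```python
def card_uncompressed_size(header: bytes) -> int:
    """Decode the 0x08 + 3 byte size header used by the card files.

    Assumed layout (hypothetical reading of the example above):
    header[0] = 0x08 flag, header[1:3] = big endian 16-bit size,
    header[3] = extra byte that is shifted left by 15 and OR'd in.
    """
    if len(header) < 4 or header[0] != 0x08:
        raise ValueError("not a 0x08 card header")
    base = (header[1] << 8) | header[2]   # the usual big endian two bytes
    return base | (header[3] << 0xF)      # third byte: << 15, then OR


print(card_uncompressed_size(bytes([0x08, 0x88, 0x68, 0x05])))  # 165992
```

Note that the shift is 15, not 16, so the extra byte's lowest bit lands on the same bit as the top bit of the 16-bit size; OR (not addition) is what keeps the example at 0x28868.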

For Dreamcast: 

All compressed cards use a single format (0x08 header with a 3 byte size)

[see above]

For Playstation 2: 

Most are like the Dreamcast ones, but some cards (hidden ones such as Carbunfly) start with 0x09... more on that later.



An extracted card header looks like this:


An extracted 0x09 compressed card looks like this:


Whoa - well... something's wrong here. Maybe the developers didn't want these cards showing up somewhere, or maybe they knew about tools that could dump the cards; hard to say.

At first glance the card looks corrupt, but we can see the start of the word 'Ca' in one part, and it seems to still retain some form...

Assuming this has to be decrypted in memory, and therefore has to be simple enough to do quickly, I started looking online for Carbunfly's stats, because it appears that the short (16-bit) values are still in place (you can see the header size still looks like a 16 bit value).


The Hypothesis


Referring back to the 'corrupted' data, let's say that our ST value (40), which is 0x28, is SUPPOSED to actually BE 0x28. It's currently 0xFFDC, so what's 0x28 - 0xFFDC (wrapping around at 16 bits)? WELL - it's 0x4C!!!

We've run into some kind of chain cipher algorithm! Because we're 'scientists', let's actually confirm this; the next value (HP) should be 0x28 as well, and its encrypted word is 0x00. What's 0x28 - 0x00? It's 0x28, which is exactly the decrypted value right behind it!!!

One more! The next one is 100 for G (0x64), stored as 0x3C; what's 0x64 - 0x3C? 0x28!!!!! YESSIR!!!



OK! Excitement aside, let's write a decryptor!

The Decrypt Algorithm


For every 16-bit value after the first:
- Add it to the (already decrypted) 16 bits behind it, wrapping around at 0xFFFF.
- Done!

In Python?
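A minimal sketch in Python, not necessarily what the original script did. It assumes the decompressed card is a run of little endian 16-bit words (the PS2 is little endian) and that each word is added to the already decrypted word behind it, which is what a simple in-place forward pass does and what the three checks above line up with. `decrypt_card` is a hypothetical name.

```python
import struct


def decrypt_card(data: bytes) -> bytes:
    """Undo the chain cipher on a decompressed 0x09 card (sketch).

    Assumptions: little endian 16-bit words; the first word is left
    as-is; every later word is added to the already decrypted word
    before it, wrapping at 16 bits.
    """
    even = len(data) - (len(data) % 2)          # leave a trailing odd byte alone
    words = list(struct.unpack(f"<{even // 2}H", data[:even]))
    for i in range(1, len(words)):
        # words[i - 1] has already been decrypted by the previous iteration
        words[i] = (words[i] + words[i - 1]) & 0xFFFF
    return struct.pack(f"<{len(words)}H", *words) + data[even:]


# Illustrative words built from the values discussed above.
encrypted = struct.pack("<4H", 0x004C, 0xFFDC, 0x0000, 0x003C)
print([hex(w) for w in struct.unpack("<4H", decrypt_card(encrypted))])
# ['0x4c', '0x28', '0x28', '0x64']
```

Running the same loop the other way (subtracting the previous plain word) re-encrypts, which makes for an easy round-trip test.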


On to everything else!

Card File Structure


As a breakdown, the card file structure is as follows:

[Header]
[Strings]
[Sprite file - if creature]
[Animation Data - if creature]
[Card Graphic]


Header


This area holds a number of card-related metadata values, such as:
  • Size of the pre-text header in bytes
  • ST (which equates to the card's attack rating)
  • HP (well...hp)
  • MHP (Max HP)
  • G (Cost to use card)
  • Type (Neutral, Earth, Air, Fire, Water, Spell, Weapon, Armor, etc.)
  • Land restriction (can't use with types above)
  • ID of the artist who drew the card (somewhere in here)
  • Item restriction (can't use with armor, spell, scroll, etc.)
  • Other values (will come back to these - most are things like extra cost to use, etc.)
  • Card ID in the set
  • Offset to Card Graphic
  • Offset to Sprite
  • Offset to Animation Data



Strings


  • Title of card (always at offset 0x34)
  • Description Page 1 (Normally with a repeated title header)
  • Description Page 2
  • Description Page 3



Only a couple of notes here:

The devs used special non-printable byte values to display in-game icons (like elements and weapons) instead of text. This makes Python throw a shitfit, so I replace them with special placeholder characters to deal with later.
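A sketch of that placeholder trick in Python. Which byte values are icons isn't spelled out here, so this assumes, hypothetically, that bytes below 0x20 are icon codes, that a string ends at its first 0x00, and that the rest is Shift-JIS like the Dreamcast strings. Splitting on bytes below 0x20 is safe for Shift-JIS because its second (trail) bytes are never below 0x40, so a double-byte character can't be cut in half. The `{XX}` token format is an arbitrary choice.

```python
def decode_card_string(raw: bytes) -> str:
    """Decode a card string, swapping icon bytes for {XX} placeholders.

    Assumptions (hypothetical): the string ends at the first 0x00, bytes
    below 0x20 are icon codes, and everything else is Shift-JIS text.
    """
    raw = raw.split(b"\x00", 1)[0]
    parts: list[str] = []
    run = bytearray()                       # pending Shift-JIS text bytes
    for b in raw:
        if b < 0x20:
            if run:
                parts.append(run.decode("shift_jis", errors="replace"))
                run.clear()
            parts.append("{%02X}" % b)      # placeholder for the icon
        else:
            run.append(b)
    if run:
        parts.append(run.decode("shift_jis", errors="replace"))
    return "".join(parts)


print(decode_card_string(b"Fire\x05Ball\x00junk"))  # Fire{05}Ball
```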


The Dreamcast strings are all Shift-JIS... Python's default JSON module doesn't like them, so partially because of that, and partially out of laziness, the JSON I'll talk about later on isn't pretty-printed.




Sprite File


This one is a basic LH5 compressed file; the two consoles differ greatly, however.



The Dreamcast uses an 8 bit texture that has been twiddled (the pixels are reordered in a Z-order pattern that interleaves the bits of the x and y coordinates, so neighboring pixels sit close together in memory and load faster into the GPU). They also have no color data embedded; an external palette is used, which is yet to be found (more on this in part 2). Basically, they're a real mess to extract (lots of math to flip stuff around, etc.)
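A minimal untwiddling sketch in Python, assuming a square, power-of-two, 8 bit texture and the PowerVR ordering where bit 0 of the twiddled index comes from y and bit 1 from x (so each 2x2 block is stored top-left, bottom-left, top-right, bottom-right). Rectangular textures are twiddled as a series of square blocks, which this doesn't handle, and the palette lookup is a separate step. The function names are hypothetical.

```python
def twiddle_index(x: int, y: int) -> int:
    """Interleave coordinate bits: y in the even positions, x in the odd ones."""
    index, bit = 0, 0
    while (x >> bit) or (y >> bit):
        index |= ((y >> bit) & 1) << (2 * bit)
        index |= ((x >> bit) & 1) << (2 * bit + 1)
        bit += 1
    return index


def untwiddle(data: bytes, size: int) -> bytes:
    """Reorder a twiddled size x size 8 bit texture into plain row order."""
    if size <= 0 or size & (size - 1) or len(data) < size * size:
        raise ValueError("expected a square power-of-two texture")
    out = bytearray(size * size)
    for y in range(size):
        for x in range(size):
            out[y * size + x] = data[twiddle_index(x, y)]
    return bytes(out)


print(list(untwiddle(bytes([0, 1, 2, 3]), 2)))  # [0, 2, 1, 3]
```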



The Playstation 2 version is 8 bit as well, but is indexed and uses a CLUT (also note that the PS2 re-adjusts the width of the sprite):



Basically, a Color Lookup Table is a palette of all the colors in an image. Instead
of storing color values, each pixel need only store a 1 byte index of the color it
requires at that spot, so the palette can hold up to 256 colors of any depth (256 16 bit
colors take 512 bytes; 256 32 bit ARGB colors take 1024 bytes).

(lifted from wikipedia)



To reconstruct this image is fairly easy then, we:
  • Read all the 16 bit colors in the palette, convert them to 32 bit RGB values
  • Go through each pixel, find which number out of 256 it points to
  • Draw a pixel at [x,y] on a new image with that color.
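The three steps above as a Python sketch. The post calls the PS2 format ARGB1555; this sketch assumes the GS's native 16-bit layout, with alpha in bit 15 and red in the low five bits, stored little endian (if red and blue come out swapped, swap the r and b shifts). It also assumes one byte per pixel index. Function names are hypothetical.

```python
import struct


def rgba5551_to_rgba8888(c: int) -> tuple[int, int, int, int]:
    """Expand one 16-bit color (A1 B5 G5 R5, red in the low bits) to 8 bits per channel."""
    def expand(v: int) -> int:
        return (v << 3) | (v >> 2)          # 5-bit -> 8-bit, so 0x1F maps to 0xFF
    r = c & 0x1F
    g = (c >> 5) & 0x1F
    b = (c >> 10) & 0x1F
    a = 0xFF if c & 0x8000 else 0x00
    return expand(r), expand(g), expand(b), a


def decode_indexed(pixels: bytes, palette_raw: bytes) -> list[tuple[int, int, int, int]]:
    """Convert the 16-bit palette, then look up every 1-byte pixel index in it."""
    count = len(palette_raw) // 2
    palette = [rgba5551_to_rgba8888(v)
               for v in struct.unpack(f"<{count}H", palette_raw[:count * 2])]
    return [palette[i] for i in pixels]
```

The returned list is in row order, so Pillow's `Image.new("RGBA", (width, height))` followed by `putdata()` is one way to turn it into a PNG.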

What if you don't have enough colors to fill the palette? Well, it just repeats the
colors you do have until the palette memory is 1024 bytes in size (LOL).





Animation Format

This one is interesting - I haven't fully figured out how it works yet. It seems to be a series of values specifying each frame's upper-right coordinate, width and height, and some kind of projection value. The first two bytes are definitely the size of the data, however.


Card Graphic


This, unlike the sprite, isn't compressed. It's 256x320 ARGB1555 in the Playstation 2 version (as basically everything is) and RGB565 with a BGR channel swap in the Dreamcast version, so swap those channels or you'll get blue when you want red!
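A sketch of the Dreamcast side in Python, assuming little endian 16-bit pixels (the Dreamcast's SH-4 runs little endian) stored row by row, 256 wide by 320 high. With `swap_rb` on, the top five bits are treated as blue rather than red, which is the channel swap described above. Names are hypothetical.

```python
import struct


def rgb565_to_rgb888(pixel: int, swap_rb: bool = True) -> tuple[int, int, int]:
    """Expand one RGB565 pixel to 8 bits per channel, optionally swapping R and B."""
    hi = (pixel >> 11) & 0x1F               # red in plain RGB565
    g = (pixel >> 5) & 0x3F
    lo = pixel & 0x1F                       # blue in plain RGB565
    hi8 = (hi << 3) | (hi >> 2)
    g8 = (g << 2) | (g >> 4)                # 6-bit green -> 8-bit
    lo8 = (lo << 3) | (lo >> 2)
    # The Dreamcast card art is BGR, so the top field is really blue.
    return (lo8, g8, hi8) if swap_rb else (hi8, g8, lo8)


def decode_card_graphic(data: bytes, width: int = 256, height: int = 320,
                        swap_rb: bool = True) -> list[tuple[int, int, int]]:
    """Decode an uncompressed RGB565 card graphic into a row-order pixel list."""
    count = width * height
    pixels = struct.unpack(f"<{count}H", data[:count * 2])
    return [rgb565_to_rgb888(p, swap_rb) for p in pixels]
```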



The end result is something like:


Oprah Moment

For fun, I wrote a WIP tool that dumps all the card files, decrypts the PS2 encrypted cards, and writes the results like so:
  • Metadata -> JSON file
  • Sprite -> PNG
  • Animation -> Bin file
  • Card Graphic -> PNG


I've also zipped them below for those interested.

Cards

Stay tuned for Part 2.

Wednesday, November 20, 2013

Hacking the Map Format - Part Deux

The previous method was a little too hacky for me - let's do this the right way:

The map file basically has two values at the front (map rows and map columns):


Why didn't I notice this earlier? Well, originally I was brute-force decompressing the images, and
the second multi-file archive format is a little different than I first thought;
more on that in a later post. For now, let's attack this map format.

So this is how I normally plan an attack:

1. What do I know?
- I know the width and height of the map in blocks
- Knowing the above, I can figure out the number of files (WIDTH*HEIGHT)
- I know that each block is 64x64
- I know that the end result is a big blitted map

2. What do I have?
- I have parts of the map.
- I have information about the map in its final form.

3. What do I want?
- I want a compiled map.

4. How can I get what I want with as little manual effort as possible?
- I could stream every step:
*For each file
>Carve out
>Unpack
>Convert to ARGB8888 from ARGB1555 (16->32bit)
>Stick it on a blank image

So that's our workflow... in python:
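A sketch of that streamed workflow in Python (not the original script). The `unpack` parameter stands in for an LH5 decoder, which Python's standard library doesn't have, so you'd plug one in; it defaults to passing the data through unchanged. It assumes tiles are stored row by row, 64x64, as little endian 16-bit pixels with alpha in bit 15 and red in the low five bits (the PS2's native 16-bit layout; swap the r and b shifts if colors come out wrong).

```python
import struct
from typing import Callable, Iterable


def rgba5551_to_rgba8888(c: int) -> bytes:
    """16 -> 32 bit: expand each 5-bit channel and the 1-bit alpha."""
    def expand(v: int) -> int:
        return (v << 3) | (v >> 2)
    return bytes((expand(c & 0x1F), expand((c >> 5) & 0x1F),
                  expand((c >> 10) & 0x1F), 0xFF if c & 0x8000 else 0x00))


def compose_map(chunks: Iterable[bytes], rows: int, cols: int,
                unpack: Callable[[bytes], bytes] = lambda b: b,
                tile: int = 64) -> tuple[bytes, int, int]:
    """Unpack each carved-out chunk, convert it, and blit it onto one RGBA canvas."""
    width, height = cols * tile, rows * tile
    canvas = bytearray(width * height * 4)               # blank RGBA image
    for index, chunk in enumerate(chunks):
        if index >= rows * cols:
            raise ValueError("more tiles than rows * cols")
        row, col = divmod(index, cols)                   # row-by-row tile order
        pixels = struct.unpack(f"<{tile * tile}H", unpack(chunk)[:tile * tile * 2])
        for ty in range(tile):
            base = ((row * tile + ty) * width + col * tile) * 4
            for tx in range(tile):
                off = base + tx * 4
                canvas[off:off + 4] = rgba5551_to_rgba8888(pixels[ty * tile + tx])
    return bytes(canvas), width, height
```

With Pillow installed, `Image.frombytes("RGBA", (width, height), data).save("map.png")` writes the result out.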


As promised, in no particular order, here's a zip of all the maps dumped (including the unreleased ones):

Culdcept 2 Maps

Next up - The Card Format

Tuesday, November 19, 2013

Part4: The Map Format

Ok - so a ton of things have happened, and I have a backlog of about 5 or so posts that haven't been written yet.


We left off with finding that each map is stored in 64x64 chunks. I'm sure that in this crazy meta-format they have, somewhere there's a number that's gonna tell me how many chunks wide this map is (probably even statically in the binary as people love doing that kind of thing), buuuuut I think we'll just make a map hacker - gogogo!


So, going on a MASSIVE assumption that they made the maps even (i.e. the end result is a perfect rectangle), we can do this pretty quickly - let's look at the workflow:

1. We parse every file in the map chunk directory (order is going to matter).
2. For each file, we:
        - Read in the pixel data.
        - Paste it onto a WIP framebuffer in memory.
        - Do the classic first-graphics-assignment-in-undergrad thing of keeping track of when we need to jump down to the next row.

Sound easy? Good! Python will help a lot here:
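The "jump down" bookkeeping boils down to one `divmod`. A small sketch with hypothetical names: `tile_origin` places tile n for a guessed width, and, if the maps really are perfect rectangles as assumed above, the true width has to divide the tile count exactly, which `candidate_widths` uses to narrow down the guesses.

```python
def tile_origin(index: int, width_in_tiles: int, tile: int = 64) -> tuple[int, int]:
    """Pixel position of tile `index` when tiles are laid out row by row."""
    row, col = divmod(index, width_in_tiles)   # jump down every width_in_tiles tiles
    return col * tile, row * tile


def candidate_widths(tile_count: int) -> list[int]:
    """Widths (in tiles) that give a perfect rectangle for this many tiles."""
    return [w for w in range(1, tile_count + 1) if tile_count % w == 0]


print(tile_origin(25, 24))      # (64, 64)
print(candidate_widths(12))     # [1, 2, 3, 4, 6, 12]
```

A wrong width shifts each row by the difference, so the error grows toward the bottom of the image.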



Basically, this program takes in two arguments - what I "think" the width of the map is in tiles, and a number so I can quickly name the output something unique.

What next? Well, here we have attempt #1 - width of 20; doesn't look quite right:


We'll try a little more (23). With this one, we can tell it's getting close because the board almost lines up (looking toward the top for alignment makes sense, since a wrong width shifts every row a bit further and things get rather chaotic toward the bottom):


Seems like 24 is the magic number for this one! 



For the sake of scale - this is about the zoom level in-game:



As a bonus, it's interesting to note that the Playstation 2 version of this game contained extra maps not seen before (they're marked with notes and look like test maps that were never active in the game):











At some point, I'll upload the full resolution maps if anyone wants to take a look at them - they're actually pretty well done.

Next time, we'll talk about the CARD files - this one's gonna be fun, so stick around. :)



Thursday, November 14, 2013

Huffman Encoding - Pt 3

Quite a bit of progress:

After looking at the ASM again, there appear to be multiple formats.

The first format is the type that starts with (offset, size) values,
but the 'gotcha' is that there isn't any indication of how many files
there are (that's most likely in the executable), other than reading entries until
you hit the offset of the first file.
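A sketch of "read until you hit the first file" in Python. It assumes, hypothetically, that each entry is two little endian 32-bit ints and that file data starts right where the table ends, so the smallest offset seen so far bounds the table.

```python
import struct


def read_offset_table(blob: bytes) -> list[tuple[int, int]]:
    """Read (offset, size) pairs until we run into the first file's data.

    Assumes each entry is two little endian 32-bit ints (hypothetical layout).
    """
    entries: list[tuple[int, int]] = []
    pos, limit = 0, len(blob)
    while pos + 8 <= limit:
        offset, size = struct.unpack_from("<II", blob, pos)
        limit = min(limit, offset)      # the table can't run past the first file
        entries.append((offset, size))
        pos += 8
    return entries


def extract_files(blob: bytes) -> list[bytes]:
    """Carve out every file listed in the table."""
    return [blob[o:o + s] for o, s in read_offset_table(blob)]
```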



The second format is our compression data, prefixed with 0x08. There are actually
TWO execution paths here, depending on whether the prefix is 0x0C or 0x08:


0x0C appears to be our standard 64K sliding window (LH7) whereas
anything else (0x08 included) appears to be an 8k sliding window (LH5).

The next two bytes are the uncompressed size, but our compiler does something
odd; it takes the first byte, lshifts 8, then ORs the second byte onto it
like this:

0x08 0x20 0x40

flag = 0x08 (LH5)

uncompressed_size = 0x20 << 0x8 (0x2000)
uncompressed_size |= 0x40 (0x2000 | 0x40 = 0x2040)

Basically, it's reading the 16-bit size as big endian on a little endian system.
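The same header logic as a Python sketch (the function name is made up). It covers only the flag byte plus two-byte size described here; `(data[1] << 8) | data[2]` is exactly `int.from_bytes(data[1:3], "big")`.

```python
def parse_lz_header(data: bytes) -> tuple[str, int, int]:
    """Return (method, window size, uncompressed size) for a 3 byte header."""
    flag = data[0]
    if flag == 0x0C:
        method, window = "lh7", 64 * 1024    # 64K sliding window
    else:
        method, window = "lh5", 8 * 1024     # 0x08 and everything else: 8K window
    size = (data[1] << 8) | data[2]          # first byte << 8, then OR the second
    return method, window, size


print(parse_lz_header(bytes([0x08, 0x20, 0x40])))  # ('lh5', 8192, 8256)
```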




Our third and final type (so far) is rather odd - basically, it's a collection of
composited files (not like a directory) split up into compression chunks.
These files generally give themselves away by not starting with 0x08+size, but
rather with a strange int that varies (probably a checksum), followed by
0x00 + 0x08 + UncSize.
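A rough heuristic for telling the formats apart, as a Python sketch. It assumes the leading "strange int" is 4 bytes, so the 0x00 + 0x08 marker would sit at offsets 4 and 5; that's a guess, and a random offset table could occasionally trip it.

```python
def classify_blob(data: bytes) -> str:
    """Rough guess at which of the three formats a blob is (heuristic sketch)."""
    if data[:1] in (b"\x08", b"\x0c"):
        return "compressed"          # flag byte + size straight away
    if data[4:6] == b"\x00\x08":
        return "chunked"             # varying int, then 0x00 + 0x08 + UncSize
    return "other"                   # e.g. an (offset, size) table
```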



Generally, these files are in 8192 byte chunks; 8k sliding window, remember?
Extracting individual pieces gets you something like this:





Dear god - all the backgrounds are in 64x64 parts ><



Wednesday, November 13, 2013

Huffman Encoding - Part 2 (The Legend Continues)

Ok, so we have some uncompressed chunks; what now? Comparing the DS version (which uses the same format), we can see that the plaintext is in some kind of file structure. The figure on the left is actually a memory dump from Demul at runtime (basically dumping the Dreamcast RAM while the game is running).


A better way to view the memory, however, is with savestates. Demul saves the state of the virtual 'system' in a binary file like the one below. As we can see, it has an int value at the top, followed by 0x78 0xXX; even if the source code weren't available (it is, and it uses zlib exclusively for compression), that would be a clear indication of a zlib stream.


Throw the zlib data into a stream decompressor and we have the 16MB dump of the Dreamcast RAM at the time the state was taken. The RAM itself has many interesting bits - especially the one I marked below in blue, which is rather familiar...
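A minimal sketch of unpacking a savestate with Python's `zlib` module. It assumes the leading int is 4 bytes and the zlib stream starts right after it. `looks_like_zlib` checks the two header bytes the way the zlib format defines them (compression method 8 in the low nibble of the first byte, and the two bytes read as a big endian number divisible by 31), which is why 0x78 shows up so often. The function names are made up.

```python
import struct
import zlib


def looks_like_zlib(cmf: int, flg: int) -> bool:
    """True if two bytes form a valid zlib header (deflate, checksum bits OK)."""
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


def unpack_savestate(raw: bytes, header_size: int = 4) -> bytes:
    """Skip the leading int (assumed 4 bytes) and inflate the rest."""
    if len(raw) < header_size + 2 or not looks_like_zlib(raw[header_size], raw[header_size + 1]):
        raise ValueError("no zlib header after the leading int")
    stream = zlib.decompressobj()
    return stream.decompress(raw[header_size:]) + stream.flush()
```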


Aha! It's from our DAT file, though not from the top, because this group of entries belongs to a compressed sub-file inside our compressed file (a 'yo-dawg' situation, indeed).




Another interesting bit is what appears to be our Model Format! It uses a header of "MODL" for both DS and DC versions - more than likely pointing to this being some kind of proprietary format.
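Hunting for magic strings like this in a RAM dump is easy to script. A small sketch (hypothetical function name) that lists every offset of "MODL" in a dump; each hit is a starting point for a closer look in a hex editor.

```python
def find_magic(dump: bytes, magic: bytes = b"MODL") -> list[int]:
    """Return every offset where `magic` appears in the dump."""
    hits = []
    pos = dump.find(magic)
    while pos != -1:
        hits.append(pos)
        pos = dump.find(magic, pos + 1)
    return hits


print(find_magic(b"xxMODLyyMODL"))  # [2, 8]
```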


We've also found this - not sure what it is yet...

Some of these files have to be the card graphics. Chances are, they're compressed as a straight-up texture binary (not even in a PVR format or anything - ready to get thrown at the GPU).




Opening a binary file in GIMP after renaming it with a .data extension (so GIMP imports it as raw image data) is quite handy for finding graphics in unknown files.

A little messy... let's see if we can mess around with the resolution (keeping in mind it's more than likely in powers of 2) to get a picture.
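The resolution guessing can be narrowed down by script before touching GIMP. This sketch (hypothetical name) lists power-of-two widths that use every byte exactly, assuming a fixed bytes-per-pixel value (2 for 16-bit color); the height is whatever is left over, since not everything is a power of two on both sides (the card graphics are 256x320, for example).

```python
def candidate_resolutions(byte_count: int, bytes_per_pixel: int = 2,
                          min_width: int = 8, max_width: int = 1024) -> list[tuple[int, int]]:
    """(width, height) pairs with a power-of-two width that use every byte exactly."""
    if byte_count % bytes_per_pixel:
        return []
    pixels = byte_count // bytes_per_pixel
    results = []
    width = min_width                       # assumed to be a power of two itself
    while width <= max_width:
        if pixels % width == 0:
            results.append((width, pixels // width))
        width *= 2
    return results
```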



Ahhh! There we go! The color data's a little messed up - will have to fix that.


This one's color is a bit more true to the game:



More as it develops!