TOMB5 - Feb'19 status update

The project has come a long way since June of 2017.

Since the beginning of the project, the codebase has been divided into two separate branches: PC and PSX, which share a common "GAME" folder which contains the platform-independent game logic code.

More progress has been made on the PSX version since a lot of code is plain old MIPS assembly which we haven't had to decompile to C, so the PC version is still lagging behind. Problems with the old DirectX version used by the game (DX5) makes all of this much harder.

Also, debugging the PSX version has always been a pain in the ass because of the need to run it in an emulator (we've used no$psx most of the time).

To solve that problem, we've started working on what we call the Emulator, which is simply put an implementation of the PSY-Q PSX SDK acting as an HLE emulator for the game. That way, we can debug the game directly in VS which is quite appreciable. We simply need to link the game binary against our emulator DLL instead of the standard PSY-Q libs. The emulator is based on SDL for windowing and OpenGL for 3D rendering.

Behold, pics.

Title screen
Secret cutscene menu used for debugging purposes, only reachable through RAM editing
Internal beta title screen
Game load screen
Level loading screen

You can check the code out on GitHub.

Paella, or how to emulate PSX games in Windows CE userland

Lately, I've been decompiling Tomb Raider 5 with some friends and while researching potential sources of debug informations that could help the process, I stumbled upon the Pocket PC version of Tomb Raider 1. It was ported by Ideaworks3D, a London-based game development company specialized in porting.

It's supposed to run on low-performance handheld devices running Windows Mobile/CE 5.0 and thus one would imagine that they have simply taken the Windows code and tweaked it a little bit to make it run on CE. Well as I discovered, it's more complicated than that. First, there is no Windows version of TR1, it was only released for DOS and was never ported to either Win16 or Win32. Second, they actually didn't take the PC version as a base, but the PSX version.

It may seem weird, why take the PSX version if your product is going to run on Windows CE. As it appears, Ideaworks3D seems to have developed an in-house userland syscall JIT translator for PSX, and uses it when porting games to CE. In other words, they compile the PSX codebase to ARM code and link that binary against a DLL file called iepaella.dll which contains implementations of PSX syscalls that call the WinCE API. Apparently, they have also made a version of that DLL which runs on standard Win32 that they used to make an ActiveX port of their Pocket PC port of the PSX version. It allowed playing TR1 in Internet Explorer. Not sure why anyone would ever do that, though.

I find this quite interesting because the main game executable seems to have a code near-identical to the original PSX version, which means that "IEPaella" is effectively a full-featured userland PSX emulator for WinCE and Win32 that is capable of mapping the PSX system routines to DirectX API calls. I haven't been able to find any other similar product on the internet. The only software that could be considered similar is Usercorn, a userland emulator based on Unicorn which implements most of the Linux, BSD and Darwin syscalls and even some DOS interrupts. It's very basic though, nowhere near what Paella does.

Currently, the codebase of the decompilation project is divided in 3 main folders:

  • GAME contains the shared game code
  • SPEC_PSX contains the PSX platform code
  • SPEC_PC contains the PC platform code

Debugging on PSX is much harder than on PC, because the binary is run in an emulator and you can't just run the code step-by-step to see where it crashes. Using Paella would allow doing such a thing because the binary is effectively being run on the computer and works like any other C++ program. We're currently search for ways to implement Paella support in TOMB5, but it may take time because not all PSX syscalls are implemented. It will eventually work, though.

PyQt, or how to break Unicode in 2018

For my latest project (Turing), I chose to use PyQt 5 for the GUI because it's what seemed the best to me to allow the whole thing to be cross-platform without much of a hassle. It has done its job quite well for about everything.

After some months of development though, I ran into an issue I was unable to fix : there is a bug, somewhere inbetween pylupdate5 (that scans code files for lines to translate) and lrelease (that compiles .ts files to .qm files) which basically prevents using non-ASCII characters in source strings. You can use them in translated strings, but not in the code files.

Quite strange, since both the code files and the .ts files (which are in XML format) are encoded in UTF-8. Or so I thought.

It seems that lrelease assumes that everything is ASCII (well, to be precise, Latin-1) if you don't specify it at each <message> element in the file, even though the very first line of the file (the XML header) specifies the encoding, in this case utf-8. pylupdate5 has no problem with that and assumes UTF-8 by default.

The workaround is to add an attribute to each <message> element in the .ts file to force lrelease to understand that it's UTF-8. Basically, replacing <message> by <message encoding="UTF-8"> everywhere. The problem is that when pylupdate5 re-saves the file after adding the new lines from the code, it discards the attributes (maybe it assumes that they aren't needed, which is a correct assumption given that it's a freaking XML file). Thus, I needed to write a script that is executed between the two calls to make sure that lrelease always gets fed a file with the attributes, so that it always parses it correctly.

for ts in glob.iglob('../**/*.ts', recursive=True):
    with open(ts, "r", encoding="utf8") as f:
        orig = f.read()
        
    orig = orig.replace('<message>', '<message encoding="UTF-8">')
    
    with open(ts, "w", encoding="utf8") as f:
        f.write(orig)

I don't know if it's a bug on the PyQt side or on the Qt side, but it stills seems quite weird to me that we still encounter this kind of problems in 2018, when every sane piece of software uses UTF-8 by default.

This article by Joel Spolsky is from 2003, and it seems that back then the basic knowledge of how to correctly use encodings had already been invented, so why are so many programs still unable to use text correctly?