Why do this?

I recently watched a presentation at the PSP Homebrew Developer Conference by Precise Museum. In it, they talked about their new translation patch for the PSP game Puyo Puyo!! 20th anniversary, and a problem they came across while making it: if a non-full-width character (i.e. a character narrower than the font grid) appears in any row other than the first row of the font file, it does not display properly. They worked around this strange bug by shrinking the font and rearranging the characters.

This caught my interest because 5 years ago, when I started tinkering with the Wii version of this game in order to make a Chinese translation patch, I came across this bug too. It was not much of a problem for me, as Chinese characters are all full-width and I could easily fit the few other characters into the first row. (The Chinese patch later stalled because I couldn’t find translators in the Chinese Puyo fandom.)

I happened to have Ghidra and some spare time at hand, so I decided to dig into this bug to find its cause and, potentially, a fix. I don’t have much experience in reverse engineering: I have only worked on two Unity games (which are very easy to analyze, as they are not obfuscated and their symbols are in plain sight). Let’s see if I can learn anything this time.

Background: Text formats used by the game

Before we begin, I need to introduce the file formats this game uses to display text. There are already tools for editing them: Puyo tools and Puyo text editor. By reading their source code, I can learn about the formats. Let me borrow an image from the presentation:

fnt file structure

The image shows the FNT file, which the game uses to store its font. The file header defines the grid size and character count, and a table follows, describing each character’s UTF-16 encoding and width. After the list is a texture file (GIM on PSP, or GVR on Wii) storing the graphics for each character. Note that the placements of the characters are aligned in the grid defined by the file header, and are not affected by the width written in the table.
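As a rough sketch of this layout, the C code below reads the header fields and one table entry from a raw FNT buffer. The offsets are inferred from the decompiled functions shown later in this post (grid height at 0x4, grid width at 0x8, character count at 0xC, 4-byte table entries starting at 0x10). The struct and function names are hypothetical, and the fields are assumed to be little endian in the file, as the byte swaps in parse_fnt suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; offsets are inferred from the decompiled code. */
typedef struct {
    uint32_t grid_height;  /* file offset 0x4 */
    uint32_t grid_width;   /* file offset 0x8 */
    uint32_t char_count;   /* file offset 0xC */
} FntHeader;

typedef struct {
    uint16_t code_unit;    /* UTF-16 code unit of the character */
    uint16_t width;        /* display width of the character */
} FntEntry;

/* Read little-endian integers from a byte buffer. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Read the three header fields that the game byte-swaps on load. */
FntHeader fnt_read_header(const uint8_t *fnt) {
    assert(fnt != NULL);
    FntHeader h;
    h.grid_height = read_le32(fnt + 0x4);
    h.grid_width  = read_le32(fnt + 0x8);
    h.char_count  = read_le32(fnt + 0xC);
    return h;
}

/* Read table entry `index`; entries are 4 bytes each, starting at 0x10. */
FntEntry fnt_read_entry(const uint8_t *fnt, uint32_t index) {
    assert(fnt != NULL);
    const uint8_t *e = fnt + 0x10 + (size_t)index * 4;
    FntEntry entry = { read_le16(e), read_le16(e + 2) };
    return entry;
}

/* The texture follows directly after the character table. */
size_t fnt_texture_offset(const FntHeader *h) {
    assert(h != NULL);
    return 0x10 + (size_t)h->char_count * 4;
}
```

The caller is expected to check `index` against `char_count` and the buffer size before reading an entry; this sketch does not do any bounds checking.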

There is an MTX file for each FNT file. MTX files store the string data used by the game, and each character in a string is encoded as a 2-byte index into the FNT table. For example, the first character in the table of the font file shown in the image is encoded as 0000.
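To illustrate the index-based encoding, here is a minimal sketch that maps a run of MTX character indices back to UTF-16 code units through the FNT table. The function name and the array-based inputs are hypothetical. The sketch assumes the indices have already been read from the file, and it does not handle string terminators or any control codes the real format may use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: looks up each MTX index in the FNT table's list of
 * UTF-16 code units. Indices outside the table are replaced with U+FFFD,
 * because this sketch does not know what they mean.
 * Returns the number of indices that could not be mapped.
 */
size_t mtx_decode(const uint16_t *indices, size_t count,
                  const uint16_t *fnt_code_units, size_t table_len,
                  uint16_t *out)
{
    assert(indices != NULL && fnt_code_units != NULL && out != NULL);
    size_t unmapped = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t idx = indices[i];
        if (idx < table_len) {
            out[i] = fnt_code_units[idx];
        } else {
            out[i] = 0xFFFD;
            unmapped++;
        }
    }
    return unmapped;
}
```

One consequence of this design is that a string is only meaningful together with its matching FNT file: reordering the font table changes what every index refers to.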

The real work

Preparation

In order to parse the DOL executable files for Wii, I need to install an extension for Ghidra. Then I can extract the main.dol executable from the game files with the Dolphin emulator and load it into Ghidra.

Locating relevant code

Apparently, there are no useful symbol tables or debug info in the executable file, so locating the logic for processing FNT files is like finding a needle in a haystack. Fortunately, by searching for FNT in the program, I found that there is actually a pair of FNT and MTX files “embedded” in the data sections. These files seem to store some prompts used during the game’s startup, such as the message about creating a savefile and some error messages.

embedded fnt file

If I follow the XREF here, I see this function (I renamed it to keep track):

undefined4 * load_embed_text(undefined4 *param_1)
{
  uint size;
  uint size_00;
  undefined *puVar1;

  *param_1 = 0;
  param_1[4] = 0;
  param_1[5] = 0;
  param_1[6] = 0;
  param_1[7] = 0;
  param_1[8] = 0;
  param_1[9] = 3;
  param_1[10] = 0;
  param_1[0xb] = 0;
  param_1[0xc] = 0;
  param_1[0xd] = 0xffffffff;
  *(undefined2 *)(param_1 + 0xe) = 0;
  puVar1 = malloc_simple(0x8c);
  if (puVar1 != (undefined *)0x0) {
    FUN_800305e4(puVar1,0,1);
  }
  size_00 = x402c8;
  *param_1 = puVar1;
  puVar1 = malloc(size_00,0x20,0x20,0);
  size = DAT_803606ec;
  param_1[4] = puVar1;
  puVar1 = malloc(size,0x20,0x20,0);
  param_1[5] = puVar1;
  memmove((undefined8 *)param_1[4],(undefined8 *)ptr2fnt,size_00);
  memmove((undefined8 *)param_1[5],(undefined8 *)ptr2mtx,size);
  puVar1 = malloc_simple(0x34);
  if (puVar1 != (undefined *)0x0) {
    FUN_800315d8(puVar1,(undefined *)*param_1,(undefined *)param_1[4],size_00,1);
  }
  FUN_800308e4(*param_1,puVar1,0);
  puVar1 = malloc_simple(0x28);
  if (puVar1 != (undefined *)0x0) {
    FUN_80031abc(puVar1,*param_1,param_1[5],size);
  }
  FUN_80030a54(*param_1,puVar1);
  FUN_80030a5c(*param_1,0);
  memset_((int)(param_1 + 1),0,0xc);
  puVar1 = malloc_simple(0x50);
  if (puVar1 != (undefined *)0x0) {
    FUN_8001d3ac(puVar1,*param_1,DAT_802a2810);
  }
  param_1[1] = puVar1;
  *(undefined4 *)(puVar1 + 0x18) = 0;
  *(undefined4 *)(puVar1 + 0x1c) = 0;
  *(undefined *)(param_1[1] + 0x11) = 3;
  *(byte *)(param_1[1] + 0x10) = *(byte *)(param_1[1] + 0x10) | 1;
  FUN_8002f8cc(param_1,param_1[7]);
  return param_1;
}

Note: the malloc, memmove, and memset function names are guesses based on inspecting the function bodies and their usages. It surprised me that LLMs can do a great job of guessing a decompiled function’s purpose, even when the function is heavily optimized.

The program copies the contents of the FNT file to newly allocated memory, and then calls FUN_800315d8:

int FUN_800315d8(undefined *param_1,undefined *param_2,undefined *param_3,uint size,int param_5)
{
  list_push_front((ListNode *)param_1,param_2);
  *(undefined **)(param_1 + 8) = &DAT_801a2944;
  *(undefined4 *)(param_1 + 0x20) = 0;
  *(undefined4 *)(param_1 + 0x24) = 0;
  *(undefined4 *)(param_1 + 0x28) = 0;
  *(undefined **)(param_1 + 0x2c) = param_2;
  param_1[0x30] = 0;
  parse_fnt((uint *)param_1,(uint *)param_3,size,param_5);
  return (int)param_1;
}

The parse_fnt called here is what actually parses the FNT file. It’s very long, so I show only part of it here:

void parse_fnt(uint *param_1,uint *fnt_base,int param_3,int param_4)
{
  ...
  param_1[4] = (uint)fnt_base;
  if (fnt_base == (uint *)0x0) {
    param_1[5] = 0;
    param_1[6] = 0;
    param_1[7] = 0;
    return;
  }
  uVar11 = fnt_base[1];
  fnt_base[1] = uVar11 << 0x18 | (uVar11 & 0xff00) << 8 | uVar11 >> 0x18 | uVar11 >> 8 & 0xff00;
  uVar11 = *(uint *)(param_1[4] + 8);
  *(uint *)(param_1[4] + 8) =
       uVar11 << 0x18 | (uVar11 & 0xff00) << 8 | uVar11 >> 0x18 | uVar11 >> 8 & 0xff00;
  uVar11 = *(uint *)(param_1[4] + 0xc);
  *(uint *)(param_1[4] + 0xc) =
       uVar11 << 0x18 | (uVar11 & 0xff00) << 8 | uVar11 >> 0x18 | uVar11 >> 8 & 0xff00;
  param_1[5] = 0;
  uVar11 = param_1[4] + 0x10;
  param_1[6] = uVar11;
  param_1[7] = 4;
  param_1[8] = uVar11 + *(int *)(param_1[4] + 0xc) * 4;
  for (uVar11 = 0; uVar11 < *(uint *)(param_1[4] + 0xc); uVar11 = uVar11 + 1) {
    puVar10 = (ushort *)(param_1[6] + param_1[7] * uVar11);
    uVar2 = *(ushort *)(param_1[6] + param_1[7] * uVar11);
    *puVar10 = uVar2 >> 8 | uVar2 << 8;
    puVar10[1] = puVar10[1] >> 8 | puVar10[1] << 8;
  }
  ...

Note that this just converts each field in the FNT file from little endian to big endian (the byte order PowerPC uses). Also, this function is called by 2 different functions, which suggests that other FNT files (the non-embedded ones) are probably parsed by it as well.
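The shift-and-mask expressions in the decompiled code are ordinary byte swaps. As a small sketch, they can be written as the helpers below; the names swap32, swap16 and swap_fnt_header are my own and do not come from the game.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same expression as in the decompiled parse_fnt, with explicit parentheses. */
uint32_t swap32(uint32_t v) {
    return v << 24 | (v & 0xff00u) << 8 | v >> 24 | ((v >> 8) & 0xff00u);
}

/* 16-bit variant, as applied to the two fields of each table entry. */
uint16_t swap16(uint16_t v) {
    return (uint16_t)(v >> 8 | v << 8);
}

/* Swaps header words 1..3 in place, mirroring the start of parse_fnt. */
void swap_fnt_header(uint32_t *fnt_base) {
    assert(fnt_base != NULL);
    for (int i = 1; i <= 3; i++) {
        fnt_base[i] = swap32(fnt_base[i]);
    }
}
```

Because the swap happens in place on the loaded buffer, any later read of the header or the table (such as one caught by a memory breakpoint) sees values that are already in big-endian order.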

Since the embedded text only shows up once, when creating a new savefile, it would be inconvenient to debug with it; I would rather debug with the text files used in the main menu. So I fired up the Dolphin emulator, set a breakpoint on parse_fnt, and let the game run. The breakpoint was triggered upon opening the main menu.

Now I can get the memory address of the loaded FNT file by reading register r4 (which holds the 2nd parameter).

fnt loaded in memory

There seems to be no logic for cropping the textures in parse_fnt, so I set a memory breakpoint on the entire character table to see when the game reads it.

A few frames later, the memory breakpoint is triggered, and this function is called each time a character appears on the screen:

void FUN_80030318(int *param_1,undefined2 param_2,uint charindex,undefined4 param_4,
                 undefined4 param_5,undefined4 param_6,undefined4 param_7)
{
  uint uVar1;
  int iVar2;
  int *piVar3;
  undefined2 *puVar4;
  uint charcnt_perline;
  int *iVar7;
  undefined4 local_38;
  undefined4 local_34;

  iVar7 = (int *)((int *)param_1[2])[4];
  iVar2 = FUN_80031784((int *)param_1[2],charindex);
  piVar3 = (int *)FUN_8001b238(param_5);
  (**(code **)(*piVar3 + 0x10))
            (piVar3,4,*(undefined4 *)(param_1[2] + 0x28),3,iVar7[2] & 0xffff,iVar7[1] & 0xffff);
  uVar1 = countLeadingZeros((uint)*(byte *)((int)param_1 + 0x12));
  (**(code **)(*piVar3 + 0x1c))(piVar3,param_4,uVar1 >> 5);
  piVar3 = (int *)FUN_80024ba8(param_1[1],piVar3);
  local_38 = param_6;
  local_34 = param_7;
  (**(code **)(piVar3[2] + 0x18))(piVar3,&local_38);
  *(undefined *)((int)piVar3 + 0x11) = *(undefined *)(param_1 + 4);
  charcnt_perline = 0x200 / (*(ushort *)(iVar2 + 2) + 1); //!!!!!
  FUN_80028c68(piVar3,(iVar7[2] + 1) * (charindex - (charindex / charcnt_perline) * charcnt_perline)
                      * 0x1000,(iVar7[1] + 2) * (charindex / charcnt_perline) * 0x1000);
  puVar4 = (undefined2 *)malloc_simple(0xc);
  puVar4[1] = *(undefined2 *)(iVar2 + 2);
  *puVar4 = param_2;
  *(int *)(puVar4 + 4) = param_1[3];
  *(int **)(puVar4 + 2) = piVar3;
  param_1[3] = (int)puVar4;
  return;
}

By inspecting register values and memory, I learn that iVar7 points to the FNT file’s contents. By recording the 3rd parameter’s value on each invocation, I learn that it is the index of the upcoming character, so I renamed it to charindex. iVar7[1] and iVar7[2] are the grid’s height and width respectively.

The suspicious line charcnt_perline = 0x200 / (*(ushort *)(iVar2 + 2) + 1) near the end of the function is what triggers the memory breakpoint. It reads a character’s width to calculate the number of characters in each row of the grid. The next line seems to calculate where in the texture to crop out the character to render: (iVar7[2] + 1) * (charindex - (charindex / charcnt_perline) * charcnt_perline) is the horizontal position and (iVar7[1] + 2) * (charindex / charcnt_perline) is the vertical position, each multiplied by 0x1000.

Now I know exactly where this bug comes from: due to a developer’s oversight, the number of characters in each row is calculated from the character width instead of the grid width, so the grid position derived from it is wrong. The characters in the first row are unaffected: a character width that is no larger than the grid width can only overestimate the per-row count, so their row number is still zero and their column is still their index.
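The effect is easier to see with concrete numbers. The sketch below reproduces the position arithmetic from the decompiled function in plain C, leaving out the 0x1000 scale factor. The function and type names are hypothetical, and the grid and character sizes in the usage note are made up for illustration rather than taken from a real FNT file.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t x;
    uint32_t y;
} CropPos;

/*
 * width_for_row_count is the value used to derive the characters per row.
 * The game passes the character's own width here (the bug); passing the
 * grid width instead models the fix.
 */
CropPos crop_position(uint32_t grid_width, uint32_t grid_height,
                      uint32_t width_for_row_count, uint32_t charindex)
{
    uint32_t per_row = 0x200 / (width_for_row_count + 1);
    assert(per_row != 0);
    CropPos pos;
    /* charindex - (charindex / per_row) * per_row is just charindex % per_row */
    pos.x = (grid_width + 1) * (charindex % per_row);
    pos.y = (grid_height + 2) * (charindex / per_row);
    return pos;
}
```

With an assumed grid of width 23 and height 22, the correct per-row count is 0x200 / 24 = 21, so index 25 belongs at column 4 of row 1, i.e. (96, 24). For a character of width 11, the buggy divisor gives 0x200 / 12 = 42 per row, so the same index is placed at column 25 of row 0, i.e. (600, 0). For any index below 21, both variants agree.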

The Fix

Now that I know the root of this bug, I can easily fix it by changing charcnt_perline = 0x200 / (*(ushort *)(iVar2 + 2) + 1) to charcnt_perline = 0x200 / (iVar7[2] + 1). Specifically, this means changing the instruction at 0x800303f8 to 80 9a 00 08, which can easily be done manually in a hex editor.
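To patch the bytes in a hex editor, the memory address first has to be translated into an offset inside main.dol. The sketch below does that under the assumption that the file follows the commonly documented DOL header layout: 18 big-endian section file offsets at 0x00, 18 load addresses at 0x48, and 18 section sizes at 0x90. That layout is not taken from this game’s analysis, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOL_SECTION_COUNT 18  /* assumed: 7 text sections + 11 data sections */

/* Read a big-endian 32-bit integer from a byte buffer. */
static uint32_t read_be32(const uint8_t *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/*
 * Translates a load address into a file offset using the section tables
 * in the DOL header. Returns -1 if no section contains the address.
 */
int64_t dol_address_to_offset(const uint8_t *dol_header, uint32_t address) {
    assert(dol_header != NULL);
    for (int i = 0; i < DOL_SECTION_COUNT; i++) {
        uint32_t file_off  = read_be32(dol_header + 0x00 + 4 * i);
        uint32_t load_addr = read_be32(dol_header + 0x48 + 4 * i);
        uint32_t size      = read_be32(dol_header + 0x90 + 4 * i);
        if (size != 0 && address >= load_addr && address - load_addr < size) {
            return (int64_t)file_off + (address - load_addr);
        }
    }
    return -1;
}
```

Given the first 0x100 bytes of main.dol, calling `dol_address_to_offset(header, 0x800303f8)` would return the position where the four replacement bytes go, provided the layout assumption holds for this file.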

I made an FNT file (picture below) and some text, and patched them into the game to see if the fix works.

fnt file used

result comparison

You can see that the English letters in the second row can now be properly displayed. (Some letters have their lower parts cut off because my script to build the FNT file didn’t process the shapes correctly.)

Conclusion

Although I now have a fix for this bug in the Wii version of the game, it does not seem to have much value. The PSP English patch already has a workaround, and the Chinese patch (if I were to make it) doesn’t even have to modify the grid sizes. Moreover, this bug still doesn’t have a fix on the PSP version, and I can see that fixing it there could be much harder: the PSP executable doesn’t have embedded text, which makes it harder to figure out where the problematic function is.

Anyway, I hope this information is useful to somebody. Maybe I can try to find a similar function in the PSP version and fix this bug there. Also, I had some fun diving into this game, which I guess is the most valuable part of this experience.
