Reverse engineering software licensing from early-2000s abandonware – Part 3
In part 2, we reverse engineered the decrypted format of the licence file data for this particular software. In this part, we investigate how exactly that licence file is encrypted.
Into the fray
In part 2, we identified that the decrypted licence file data appears to originate from a call to FUN_004e8028. The relevant disassembly of this function is:
undefined FUN_004e8028()
undefined AL:1 <RETURN>
FUN_004e8028
; ...
004e803b b1 01 MOV CL,0x1
004e803d 8b d6 MOV EDX,ESI
004e803f 8b 18 MOV EBX,dword ptr [EAX]
004e8041 ff 53 14 CALL dword ptr [EBX + 0x14]
; ...
Again, this is a dynamic call, so we use winedbg/GDB to investigate:
$ winedbg --gdb foobar.exe
Wine-gdb> b *0x4e8041
Breakpoint 1 at 0x4e8041
Wine-gdb> c
Continuing.
Breakpoint 1, 0x004e8041 in ?? ()
Wine-gdb> info reg
eax 0x1012800 16852992
ecx 0x32fe01 3341825
edx 0x10123f4 16851956
ebx 0x4e1300 5116672
[...]
The call target is read from [ebx + 0x14], where ebx is 0x4e1300. This address lies within a region of memory that Ghidra identifies as VMT_4E12B4_TCipher_Blowfish, i.e. the virtual method table of a class named TCipher_Blowfish.
Googling for TCipher_Blowfish leads us to suspect that this is probably from the Delphi Encryption Compendium (DEC) library. Since this software binary is from 2004, the likely version of DEC used was DEC 3.0, released in 1999. Having access to the source code of DEC 3.0 will be of great value later in this process.
Blowfish is a symmetric-key block cipher designed in 1993. Its blocks are 64 bits (8 bytes). Although not considered insecure, it has been superseded in modern use by the Advanced Encryption Standard (AES), which was established in 2001 – too late for AES to be included in DEC 3.0.
Continuing to follow where the call leads:
Wine-gdb> si
0x004d12ac in ?? ()
The called function at 0x4d12ac is a large function which calls into many other functions, so let us turn to another approach to narrow the search.
We know, from part 2, that the decrypted licence data is written to 0x1013ff4, so we set a watchpoint in winedbg/GDB and wait for the decrypted data to be written:
Wine-gdb> watch *0x1013ff4
Hardware watchpoint 1: *0x1013ff4
Wine-gdb> commands 1
>x/8bx 0x1013ff4
>end
Wine-gdb> c
Continuing.
[...]
Hardware watchpoint 1: *0x1013ff4
Old value = [...]
New value = [...]
0x00402a3f in ?? ()
0x1013ff4: 0x86 0x0d 0x96 0x9f 0xb8 0xa3 0xec 0x46
Wine-gdb> bt
#0 0x00402a3f in ?? ()
#1 0x004e8044 in ?? ()
#2 0x004e85e9 in ?? ()
#3 0x00404623 in ?? ()
#4 0x7b62dd10 in ?? ()
#5 0x7bc54377 in ?? ()
#6 0x7bc54a30 in ?? ()
#7 0x00000000 in ?? ()
0x402a3f is located within System.Move, so we need to climb up the call stack to identify where this is being called from. The backtrace generated by winedbg/GDB here is incorrect, so we manually set a breakpoint for the return instruction in System.Move and see where execution returns to:
Wine-gdb> b *0x402a4f
Breakpoint 2 at 0x402a4f
Wine-gdb> c
Continuing.
Breakpoint 2, 0x00402a4f in ?? ()
Wine-gdb> si
0x004d138a in ?? ()
The call to Move, then, is the code immediately preceding 0x4d138a. The Ghidra disassembly for this is:
004d137c 8b d0 MOV EDX,EAX
004d137e 8b 45 f8 MOV EAX,dword ptr [EBP + local_c]
004d1381 8b 40 04 MOV EAX,dword ptr [EAX + 0x4]
004d1384 59 POP ECX
004d1385 e8 86 16 f3 ff CALL Move
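From the Delphi documentation, we know that Move(Source, Dest, Count) copies Count bytes from Source to Dest. Under Delphi's register calling convention, the first three parameters are passed in eax, edx and ecx, so Move copies ecx bytes from the address in eax to the address in edx. We can therefore set a breakpoint at the function call to determine where the decrypted data is being copied from: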
$ winedbg --gdb foobar.exe
Wine-gdb> b *0x4d1385
Breakpoint 1 at 0x4d1385
Wine-gdb> c
Continuing.
Breakpoint 1, 0x004d1385 in ?? ()
Wine-gdb> info reg
eax 0x925a08 9591304
ecx 0x3c 60
edx 0x1013ff4 16859124
ebx 0x32fe01 3341825
[...]
Wine-gdb> x/32bx $eax
0x925a08: 0x86 0x0d 0x96 0x9f 0xb8 0xa3 0xec 0x46
0x925a10: 0x13 0x1f 0x7b 0x8a 0x4a 0x96 0x31 0xbd
0x925a18: 0x24 0x43 0x30 0x2c 0x72 0xc2 0x6c 0x0e
0x925a20: 0xd3 0x58 0xc3 0xca 0xed 0xf6 0xb9 0x13
We see, then, that the decrypted data (0x3c bytes, per ecx) is actually being copied from a buffer at 0x925a08. To identify where the actual decryption is performed, we repeat the process: we set a watchpoint on 0x925a08 and wait for the decrypted data to be written:
$ winedbg --gdb foobar.exe
Wine-gdb> watch *0x925a08
Hardware watchpoint 1: *0x925a08
Wine-gdb> commands 1
>x/8bx 0x925a08
>end
Wine-gdb> c
Continuing.
[...]
Hardware watchpoint 1: *0x925a08
Old value = [...]
New value = [...]
0x00402a23 in ?? ()
0x925a08: 0x86 0x0d 0x96 0x9f 0xb8 0xa3 0xec 0x46
Wine-gdb>
This is again within the Move function, and the backtrace generated by winedbg/GDB is again incorrect, but after manually traversing the call stack a few levels, we reach FUN_004d0f4c, which is one of the functions called by FUN_004d12ac.
We see that this function is referred to in the virtual method tables (VMTs) of a number of classes, including TProtection, TCipher (and subclasses like TCipher_Blowfish), THash (and subclasses like THash_SHA1) and TMAC. We therefore surmise that it must be a high-level method defined in a common superclass of these. Examining the DEC 3.0 source code, we identify the likely culprit, TProtection.
The FUN_004d0f4c function has a distinctive structure in Ghidra's decompiler view, outlined below:
void FUN_004d0f4c(code **param_1,code **param_2,code **param_3,byte param_4,int param_5) {
    // ...
    if (param_2 == (code **)0x0) {
        return;
    }
    local_10 = param_3;
    if (param_3 == (code **)0x0) {
        local_10 = param_2;
    }
    // ...
    if (param_5 < 0) {
        // ...
    }
    // ...
    if (param_4 == 3) {
        while (0 < iVar1) {
            // ...
        }
    }
    else {
        while (0 < iVar1) {
            // ...
        }
    }
    // ...
}
Scrolling through the DEC source code for TProtection in DECUtil.pas, we can match this structure with the TProtection.CodeStream procedure:
procedure TProtection.CodeStream(Source, Dest: TStream; DataSize: Integer; Action: TPAction);
// ...
begin
  if Source = nil then Exit;
  if Dest = nil then Dest := Source;
  if DataSize < 0 then
  begin
    // ...
  end;
  // ...
  try
    // ...
    if Action = paCalc then
    begin
      while DataSize > 0 do
      begin
        // ...
      end;
    end else
    begin
      while DataSize > 0 do
      begin
        // ...
        CodeBuf(Buf^, Len, Action);
        // ...
      end;
    end;
    // ...
  end;
end;
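Matching the two, param_2, param_3, param_5 and param_4 correspond to Source, Dest, DataSize and Action respectively (param_1 being the implicit Self), which also implies that paCalc has the ordinal value 3. Examining the Delphi source further, we observe that TProtection.CodeBuf, called in the second loop, is the function through which the decryption proceeds; it leads to TCipher.CodeBuf and, from there, to TCipher.DecodeBuffer. By following the equivalent disassembly, we identify FUN_004e28b8 as TCipher.DecodeBuffer.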
Nonstandard crypto from the '90s
The main body of FUN_004e28b8 is a large switch statement with multiple cases. By stepping through the function in winedbg/GDB, we identify that the software lands in the equivalent of the following case:
type
  // ...
  TCipherMode = (cmCTS, cmCBC, cmCFB, cmOFB, cmECB, cmCTSMAC, cmCBCMAC, cmCFBMAC);
{ the Cipher Modes:
  cmCTS     Cipher Text Stealing, a Variant from cmCBC, but relaxes
            the restriction that the DataSize must be a mulitply from BufSize,
            this is the Defaultmode, fast and Bytewise
  [...]
}
// ...
procedure TCipher.DecodeBuffer(const Source; var Dest; DataSize: Integer);
var
  S,D,F,B: PByte;
begin
  // ...
  S := @Source;
  D := @Dest;
  case FMode of
    // ...
    cmCTS:
      begin
        if S <> D then Move(S^, D^, DataSize);
        F := FFeedback;
        B := FBuffer;
        while DataSize >= FBufSize do
        begin
          XORBuffers(D, F, FBufSize, B);
          Decode(D);
          XORBuffers(D, F, FBufSize, D);
          S := B;
          B := F;
          F := S;
          Inc(D, FBufSize);
          Dec(DataSize, FBufSize);
        end;
        if F <> FFeedback then Move(F^, FFeedback^, FBufSize);
        if DataSize > 0 then
        begin
          Move(FFeedback^, FBuffer^, FBufSize);
          Encode(FBuffer);
          XORBuffers(FBuffer, D, DataSize, D);
          XORBuffers(FBuffer, FFeedback, FBufSize, FFeedback);
        end;
      end;
    // ...
  end;
end;
This algorithm is a nonstandard modification of the ciphertext stealing block cipher mode of operation.[1]
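Writing C_1, ..., C_n for the full 8-byte ciphertext blocks, C* for any trailing partial block, and D_K and E_K for Blowfish decryption and encryption under the key K, the cmCTS branch above can be summarised as follows. The notation is our own, and it assumes that XORBuffers(A, B, Size, Dest) writes A XOR B into Dest, which is how the code appears to use it:

```latex
\begin{aligned}
F_0 &= \mathrm{IV} \\
P_i &= D_K(C_i) \oplus F_{i-1}, \qquad i = 1, \dots, n \\
F_i &= C_i \oplus F_{i-1} = \mathrm{IV} \oplus C_1 \oplus C_2 \oplus \cdots \oplus C_i \\
P^{*} &= C^{*} \oplus \operatorname{trunc}_{|C^{*}|}\bigl(E_K(F_n)\bigr)
\end{aligned}
```

Unrolling the recurrence shows that the feedback F_i is the running XOR of the IV and every ciphertext block so far, whereas standard CBC feeds back only the previous ciphertext block (P_i = D_K(C_i) XOR C_{i-1}, with C_0 = IV). The trailing partial block is never passed to D_K at all; it is simply XORed with a truncated encryption of the final feedback value.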
Note that each iteration XORs the data with the feedback buffer F, which initially points to FFeedback, so in the first iteration FFeedback acts as an initialisation vector (IV). Using winedbg/GDB, we can extract the initial value of FFeedback (i.e. the IV), noting that Blowfish operates on blocks of 64 bits (8 bytes)[2]:
$ winedbg --gdb foobar.exe
Wine-gdb> b *0x4e29ee
Breakpoint 1 at 0x4e29ee
Wine-gdb> c
Continuing.
Breakpoint 1, 0x004e29ee in ?? ()
Wine-gdb> x/8bx $eax
0x1012f70: 0x01 0x23 0x45 0x67 0x89 0xab 0xcd 0xef
Mangled ciphertext?
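From TCipher.DecodeBuffer, we can also extract the raw input data, Source, passed to the Blowfish decryption algorithm. Under the register calling convention, on entry to this method eax holds Self and edx holds Source: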
$ winedbg --gdb foobar.exe
Wine-gdb> b *0x4e28b8
Breakpoint 1 at 0x4e28b8
Wine-gdb> c
Continuing.
Breakpoint 1, 0x004e28b8 in ?? ()
Wine-gdb> x/64bx $edx
0x1014038: 0x69 0xa6 0x9a 0x69 0xa6 0x9a 0x69 0xa6
0x1014040: 0x9a 0x69 0xa6 0x9a 0x69 0xa6 0x9a 0x69
0x1014048: 0xa6 0x9a 0x69 0xa6 0x9a 0x69 0xa6 0x9a
0x1014050: 0x69 0xa6 0x9a 0x69 0xa6 0x9a 0x69 0xa6
0x1014058: 0x9a 0x69 0xa6 0x9a 0x69 0xa6 0x9a 0x69
0x1014060: 0xa6 0x9a 0x69 0xa6 0x9a 0x69 0xa6 0x9a
0x1014068: 0x69 0xa6 0x9a 0x69 0xa6 0x9a 0x69 0xa6
0x1014070: 0x9a 0x69 0xa6 0x9a 0x00 0x00 0x00 0x00
Now this is unusual. Recall that our license.bin file contains 0x50 ASCII 'a' characters (0x61). However, the input to Blowfish is only 0x3c bytes long, and appears to be a repetition of the bytes "0x69 0xa6 0x9a".
By putting strings of ASCII 'a' characters of different lengths into license.bin and running
printf 'b *0x4e28b8\nc\nx/64bx $edx\n' | winedbg --gdb foobar.exe
we can see what input is passed to Blowfish for inputs of varying length:
- 1 character: breakpoint is never reached
- 2 characters:
0x1014000: 0x69 0x00
- 3 characters:
0x1014000: 0x69 0xa6 0x00
- 4 characters:
0x1013ff4: 0x69 0xa6 0x9a 0x00
- 5 characters:
0x1013ff4: 0x69 0xa6 0x9a 0x00
- 6 characters:
0x1013ff4: 0x69 0xa6 0x9a 0x69 0x00
- 7 characters:
0x1013ff4: 0x69 0xa6 0x9a 0x69 0xa6 0x00
- 8 characters:
0x1014000: 0x69 0xa6 0x9a 0x69 0xa6 0x9a 0x00
- 9 characters:
0x1014000: 0x69 0xa6 0x9a 0x69 0xa6 0x9a 0x00
It appears that every 4 bytes of license.bin contribute only 3 bytes to the input to Blowfish. The result also appears periodic, repeating "0x69 0xa6 0x9a". Astute readers may already have an idea of what this algorithm is: in Base64, each character represents 6 bits of data, so each group of 4 Base64 characters represents 24 bits, i.e. 3 bytes.[3]
We can confirm this by noting that a long string of 'a's in Base64 decodes to repetitions of "0x69 0xa6 0x9a":
$ echo -n aaaaaaaaaaaaaaaa | base64 -d | xxd
00000000: 69a6 9a69 a69a 69a6 9a69 a69a i..i..i..i..
We deduce, then, that license.bin contains the Base64 encoding of the raw ciphertext that will be passed to Blowfish.
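The arithmetic can also be checked by hand: 'a' sits at index 26 in the standard Base64 alphabet, i.e. 011010 in binary, and four of them concatenate to the 24 bits 01101001 10100110 10011010, which are exactly 0x69 0xa6 0x9a. The sketch below is a hypothetical model (the name lenient_b64decode is ours, and this is not the TStringFormat_MIME64 code): it packs 6-bit digits into bytes and silently drops any leftover bits. Under that assumption, n characters yield floor(6n/8) bytes, which matches the lengths observed above, including a single character yielding no bytes at all, consistent with the breakpoint never being reached.

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def lenient_b64decode(text: str) -> bytes:
    """Hypothetical model: pack 6-bit Base64 digits into bytes, dropping leftover bits.

    Raises ValueError for characters outside the alphabet (including '=' padding).
    """
    out = bytearray()
    acc = 0    # bit accumulator holding not-yet-emitted bits
    nbits = 0  # how many valid bits are currently in acc
    for ch in text:
        acc = (acc << 6) | ALPHABET.index(ch)
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append(acc >> nbits)   # emit the top 8 bits
            acc &= (1 << nbits) - 1    # keep only the unconsumed low bits
    return bytes(out)


# 'a' is index 26 = 0b011010; four of them give 0x69 0xa6 0x9a
print(lenient_b64decode("aaaa").hex(" "))  # 69 a6 9a
for n in range(1, 10):
    print(f"{n}: {lenient_b64decode('a' * n).hex(' ')}")
```

With the 0x50 characters of our test license.bin, this model yields 0x3c bytes, the same length seen at the breakpoint.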
Identifying the encryption key
At this point, we have everything we need to reimplement the Blowfish encryption/decryption, except for the Blowfish key. The key is not passed to TCipher.DecodeBuffer, so we must look elsewhere.
The Blowfish key schedule involves initialising a P-array and S-boxes, which are then combined with the key. The initial values in the P-array and S-boxes comprise a large number of distinctive nothing-up-my-sleeve numbers. For example, the first entry of the P-array has initial value 0x243F6A88.
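These constants are the hexadecimal digits of the fractional part of pi: the 18 P-array entries are its first 18 32-bit words, starting with 0x243F6A88, and the S-boxes continue the same expansion. That is what makes them such reliable search targets. The sketch below recomputes the P-array initial values using only integer fixed-point arithmetic and Machin's formula, pi = 16*arctan(1/5) - 4*arctan(1/239); the function names are our own, for illustration.

```python
def arctan_inv(x: int, scale: int) -> int:
    """Return arctan(1/x) * scale as an integer, using the alternating Taylor series."""
    term = scale // x            # scale / x^1
    total = term
    x_squared = x * x
    k = 1
    while term:
        term //= x_squared       # scale / x^(2k+1)
        piece = term // (2 * k + 1)
        total += -piece if k % 2 else piece
        k += 1
    return total


def blowfish_p_init(count: int = 18) -> list[int]:
    """First `count` 32-bit words of the fractional part of pi."""
    guard = 64                                # extra bits to absorb truncation error
    scale = 1 << (32 * count + guard)
    pi_fixed = 16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)
    frac = (pi_fixed - 3 * scale) >> guard    # drop the integer part and guard bits
    return [(frac >> (32 * (count - 1 - i))) & 0xFFFFFFFF for i in range(count)]


print([f"{w:08X}" for w in blowfish_p_init()[:4]])
# ['243F6A88', '85A308D3', '13198A2E', '03707344']
```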
Using Ghidra, we can search for this value and locate it at address 0x502b20.
This address is referred to by FUN_004e3338. The relevant part of Ghidra's decompilation of this function reads:
void FUN_004e3338(int *param_1,int param_2,int param_3,undefined4 param_4) {
    // ...
    local_c = param_3;
    local_8 = param_1;
    // ...
    iVar5 = 0;
    iVar6 = 0;
    do {
        puVar1 = (uint *)(iVar3 + iVar6 * 4);
        *puVar1 = *puVar1 ^ (uint)*(byte *)(param_2 + iVar5 % local_c) * 0x1000000 +
                            (uint)*(byte *)(param_2 + (iVar5 + 1) % local_c) * 0x10000 +
                            (uint)*(byte *)(param_2 + (iVar5 + 2) % local_c) * 0x100 +
                            (uint)*(byte *)(param_2 + (iVar5 + 3) % local_c);
        iVar5 = (iVar5 + 4) % local_c;
        iVar6 = iVar6 + 1;
    } while (iVar6 != 0x12);
    // ...
}
Compare this with this algorithm from Wikipedia for the initialisation of the P-array:
/* initialize P box w/ key*/
uint32_t k;
for (short i = 0, p = 0; i < 18; i++) {
    k = 0x00;
    for (short j = 0; j < 4; j++) {
        k = (k << 8) | (uint8_t) key[p];
        p = (p + 1) % key_len;
    }
    P[i] ^= k;
}
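At first glance, these algorithms do not look entirely alike. However, note that i counts from 0 up to (but not including) 18, just as iVar6 counts from 0 until it reaches 0x12. Similarly, the repeated 8-bit left shifts in C amount to multiplying each byte by a power of 0x100, and because the shifted bytes never overlap, OR-ing them together gives the same result as adding them.
Now, note that key is indexed by p, a counter modulo key_len. Equivalently, param_2 is indexed by iVar5 modulo local_c (i.e. param_3). The decompiled loop advances iVar5 by 4 per 32-bit word rather than by 1 per byte, but since every byte offset is also reduced modulo the key length, both loops read the key bytes in the same order.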
We therefore identify param_2 (edx) as a pointer to the key and param_3 (ecx) as the length of the key.
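To convince ourselves that the two loops really are equivalent, including for key lengths that are not a multiple of 4, we can transcribe both into Python and compare their results. This is a sketch of just the P-array XOR step (the rest of the Blowfish key schedule, which then repeatedly encrypts to overwrite P and the S-boxes, is omitted), and the function names are hypothetical.

```python
def mix_wikipedia(p: list[int], key: bytes) -> list[int]:
    """Transcription of the Wikipedia loop: cycle through key bytes one at a time."""
    out = list(p)
    pos = 0
    for i in range(18):
        k = 0
        for _ in range(4):
            k = (k << 8) | key[pos]
            pos = (pos + 1) % len(key)
        out[i] ^= k
    return out


def mix_ghidra(p: list[int], key: bytes) -> list[int]:
    """Transcription of FUN_004e3338's loop: param_2 -> key, param_3/local_c -> len(key)."""
    out = list(p)
    n = len(key)
    i_var5 = 0                     # byte offset into the key
    for i_var6 in range(0x12):     # P-array index
        out[i_var6] ^= (key[i_var5 % n] * 0x1000000
                        + key[(i_var5 + 1) % n] * 0x10000
                        + key[(i_var5 + 2) % n] * 0x100
                        + key[(i_var5 + 3) % n])
        i_var5 = (i_var5 + 4) % n
    return out


# Compare on every Blowfish key length (4 to 56 bytes), starting from an all-zero P-array
for length in range(4, 57):
    key = bytes((7 * j + 1) % 256 for j in range(length))
    assert mix_wikipedia([0] * 18, key) == mix_ghidra([0] * 18, key)
print("both loops agree")
```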
Using winedbg/GDB one final time, we can now inspect these registers when FUN_004e3338 is called to extract the Blowfish key[4]:
$ winedbg --gdb foobar.exe
Wine-gdb> b *0x4e3338
Breakpoint 1 at 0x4e3338
Wine-gdb> c
Continuing.
Breakpoint 1, 0x004e3338 in ?? ()
Wine-gdb> info reg
eax 0x1012800 16852992
ecx 0x20 32
edx 0x1011f04 16850692
ebx 0x4e1300 5116672
[...]
Wine-gdb> x/32bx $edx
0x1011f04: 0x11 0x22 0x33 0x44 0x55 0x66 0x77 0x88
0x1011f0c: 0x99 0xaa 0xbb 0xcc 0xdd 0xee 0xff 0x11
0x1011f14: 0x22 0x33 0x44 0x55 0x66 0x77 0x88 0x99
0x1011f1c: 0xaa 0xbb 0xcc 0xdd 0xee 0xff 0x11 0x22
Implementing decryption
With this, we can now implement the complete decryption process used in this software:
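import base64

from Crypto.Cipher import Blowfish  # PyCryptodome

# Contents of license.bin (here, the test file of 0x50 ASCII 'a' characters),
# which holds the Base64 encoding of the raw ciphertext
licence_file = b'a' * 0x50
ciphertext = base64.b64decode(licence_file)

# Placeholder key and IV (see footnote 2)
key = bytes.fromhex('11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff 11 22')
iv = bytes.fromhex('01 23 45 67 89 ab cd ef')
cipher = Blowfish.new(key, Blowfish.MODE_ECB)

# Nonstandard DEC CTS mode (the cmCTS case of TCipher.DecodeBuffer)
bs = Blowfish.block_size
full = len(ciphertext) - len(ciphertext) % bs  # length covered by whole blocks
plaintext = bytearray()
f = iv  # feedback register, starts as the IV
for i in range(0, full, bs):
    c = ciphertext[i:i + bs]
    plaintext += bytes(x ^ y for x, y in zip(cipher.decrypt(c), f))
    f = bytes(x ^ y for x, y in zip(c, f))  # new feedback = C xor old feedback
if full < len(ciphertext):
    # Trailing partial block: XOR with the encryption of the feedback, truncated
    keystream = cipher.encrypt(f)
    plaintext += bytes(x ^ y for x, y in zip(ciphertext[full:], keystream))

print(plaintext.hex())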
We confirm that this code produces the same decrypted ‘licence data’ as was found at 0x1013ff4 in part 2.
Generating valid licence files
We can now reverse the decryption process to derive valid licence files containing arbitrary licence data of our choice:
import base64

from Crypto.Cipher import Blowfish  # PyCryptodome

# Arbitrary licence data in the format reverse engineered in part 2
plaintext = b"Functionality=1\nRegBy=That's all folks!\nRegTo=Thanks for playing!\nEmail=foobar@example.com"

# Same placeholder key and IV as for decryption
key = bytes.fromhex('11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff 11 22')
iv = bytes.fromhex('01 23 45 67 89 ab cd ef')
cipher = Blowfish.new(key, Blowfish.MODE_ECB)

# Nonstandard DEC CTS mode, run in the encryption direction
bs = Blowfish.block_size
full = len(plaintext) - len(plaintext) % bs  # length covered by whole blocks
ciphertext = bytearray()
f = iv  # feedback register, starts as the IV
for i in range(0, full, bs):
    s = plaintext[i:i + bs]
    c = cipher.encrypt(bytes(x ^ y for x, y in zip(s, f)))
    f = bytes(x ^ y for x, y in zip(c, f))  # new feedback = C xor old feedback
    ciphertext += c
if full < len(plaintext):
    # Trailing partial block: XOR with the encryption of the feedback, truncated
    keystream = cipher.encrypt(f)
    ciphertext += bytes(x ^ y for x, y in zip(plaintext[full:], keystream))

# license.bin holds the Base64 encoding of the ciphertext
print(base64.b64encode(ciphertext).decode('utf-8'))
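The two scripts above should be inverses of each other. One way to check this is to factor the mode into functions that take the block cipher as a parameter and round-trip messages of every length. The sketch below does that; to stay self-contained it uses a deliberately trivial, insecure 8-byte "toy cipher" as a hypothetical stand-in for Blowfish, and all function names are our own.

```python
BLOCK_SIZE = 8  # Blowfish block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def dec_cts_encrypt(encrypt_block, iv: bytes, plaintext: bytes) -> bytes:
    out = bytearray()
    f = iv
    full = len(plaintext) - len(plaintext) % BLOCK_SIZE
    for i in range(0, full, BLOCK_SIZE):
        c = encrypt_block(xor(plaintext[i:i + BLOCK_SIZE], f))
        f = xor(c, f)  # feedback becomes C xor old feedback
        out += c
    if full < len(plaintext):
        out += xor(plaintext[full:], encrypt_block(f))  # partial tail
    return bytes(out)


def dec_cts_decrypt(encrypt_block, decrypt_block, iv: bytes, ciphertext: bytes) -> bytes:
    out = bytearray()
    f = iv
    full = len(ciphertext) - len(ciphertext) % BLOCK_SIZE
    for i in range(0, full, BLOCK_SIZE):
        c = ciphertext[i:i + BLOCK_SIZE]
        out += xor(decrypt_block(c), f)
        f = xor(c, f)
    if full < len(ciphertext):
        out += xor(ciphertext[full:], encrypt_block(f))  # tail uses encryption, not decryption
    return bytes(out)


def toy_encrypt(block: bytes) -> bytes:
    """Hypothetical, insecure stand-in cipher: rotate left one byte, then XOR with 0x5A."""
    return xor(block[1:] + block[:1], b"\x5a" * BLOCK_SIZE)


def toy_decrypt(block: bytes) -> bytes:
    unmasked = xor(block, b"\x5a" * BLOCK_SIZE)
    return unmasked[-1:] + unmasked[:-1]


iv = bytes.fromhex("0123456789abcdef")
for length in range(0, 41):
    message = bytes(range(length))
    ct = dec_cts_encrypt(toy_encrypt, iv, message)
    assert len(ct) == length
    assert dec_cts_decrypt(toy_encrypt, toy_decrypt, iv, ct) == message
print("round trip OK for lengths 0-40")
```

With PyCryptodome, the same functions could be driven by the real cipher, e.g. by creating cipher = Blowfish.new(key, Blowfish.MODE_ECB) and passing cipher.encrypt and cipher.decrypt as the block functions.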
Thanks for reading to the end! If you enjoyed this series, you may enjoy my writeup of an earlier reverse engineering project looking at an early-2010s gaming DRM system.
Footnotes
1. In DEC 3.0, this was the default block cipher mode of operation! In the modern DEC library, this mode is referred to as ‘CTS3’ and described as ‘a proprietary mode developed by Frederik Winkelsdorf’, which ‘has a less secure padding of the truncated final block’, and is disabled by default. See also here where it is reviewed unfavourably.
2. In this series, magic numbers have been replaced with placeholders for demonstration purposes.
3. For less astute readers, such as myself, the process may involve wallowing around in Ghidra until stumbling upon a relevant reference to TStringFormat_MIME64, i.e. the Base64 transfer encoding for MIME. Interestingly, the TStringFormat_MIME64 implementation of Base64 accepts invalid Base64 input without warning, producing garbage output or, alternatively, stack overflows and crashes. This was very fun to diagnose when reverse engineering(!)
4. Further analysis shows this key is derived from computing the RIPEMD-256 hash of "innocuous-looking string" in TSecurity.Create from part 2. Because the key is fixed, it was unnecessary to go into the details here, but it is an interesting ‘hiding in plain sight’ approach.