Steganography: decoding PICO-8 cartridges
PICO-8 is a very cool product. It’s a fantasy gaming console that allows you to make, share and play small games on Windows, Mac and Linux.
Here’s a nice animation taken from their homepage and, as you can see, it includes a command line, development tools and the ability to play games.
Cartridges
This is a PICO-8 cartridge. The interesting part is that besides looking like an old-school console cartridge, the source code and the game assets are embedded inside the image.
I was familiar with steganography, so I supposed that the developers might have used it to hide the game assets inside the image, and this was confirmed by taking a look at the P8.PNG file format.
Just for fun, let’s try to decode a cartridge and extract the game source code. By the way, the decoder’s full source code is available here.
Steganography
The most common use case for steganography is hiding secret data within an ordinary non-secret file in order to avoid detection. For example, you could insert a secret message in one of your pictures, publish it on Instagram and as long as inserting the message did not alter the original image that much, it would be quite hard to detect.
However, in this case, steganography is being used in a very creative way for convenience reasons: you just need to share an image that looks like an actual cartridge and that’s it, everything is self-contained in that single file.
One of the most basic forms of image steganography is replacing the least significant bits of the color channels with secret bits of information. Let’s suppose that we have the following pixel in RGB format:
RGB color (139, 81, 255)
- R: 10001011 = 139
- G: 01010001 = 81
- B: 11111111 = 255
We could insert the six secret bits 101010 by replacing the last two bits of each channel with 10:
- R: 10001010 -> 138
- G: 01010010 -> 82
- B: 11111110 -> 254
The colors are different, but it’s very hard to notice. If you were shown a picture which included the second shade of purple, it would look perfectly normal, especially if you hadn’t seen the original in the first place.
Original
New
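The bit replacement described above can be sketched in a few lines of Python. The helper names `embed_two_bits` and `extract_two_bits` are made up for this illustration and are not part of the decoder.

```python
def embed_two_bits(channel, secret):
    """Replace the two least significant bits of an 8-bit channel value."""
    return (channel & 0b11111100) | (secret & 0b11)


def extract_two_bits(channel):
    """Read back the two least significant bits of a channel value."""
    return channel & 0b11


original = (139, 81, 255)
# The secret bits 101010, split into one 2-bit pair per channel.
secret_pairs = (0b10, 0b10, 0b10)

modified = tuple(embed_two_bits(c, s) for c, s in zip(original, secret_pairs))
recovered = tuple(extract_two_bits(c) for c in modified)

print(modified)   # (138, 82, 254)
print(recovered)  # (2, 2, 2)
```

Each channel changes by at most 3 out of 255, which is why the two shades are so hard to tell apart.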
PICO-8 does something similar: each byte is stored in the two least significant bits of each of the four color channels, ordered ARGB. Also, cartridges are 205 pixels high and 160 pixels wide, so at one byte per pixel the maximum capacity is 205 × 160 = 32,800 bytes.
The code for this part is quite simple. The first step is using pypng to get the PNG data.
import png  # provided by the pypng package

image = png.Reader(filename)
(width, height, rows, info) = image.read()
hidden_data = unsteganize_png(width, height, rows, info)
And this is the method responsible for reversing the steganographic process:
def unsteganize_png(width, height, rows, info):
    hidden_data = [0] * width * height
    # planes 4 = [R G B A]
    planes = info['planes']
    assert planes == 4
    for row, row_data in enumerate(rows):
        for col in range(width):
            # keep the last 2 bits only
            R = row_data[col * planes + 0] & int('00000011', 2)
            G = row_data[col * planes + 1] & int('00000011', 2)
            B = row_data[col * planes + 2] & int('00000011', 2)
            A = row_data[col * planes + 3] & int('00000011', 2)
            # PICO likes them in ARGB format
            pico_byte = A << 6 | R << 4 | G << 2 | B
            hidden_data[(row * width) + col] = pico_byte
    return hidden_data
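As a sanity check of the ARGB bit layout, here is a minimal round-trip sketch for a single pixel. `hide_byte_in_pixel` is a hypothetical inverse written only for this illustration (the post itself only decodes), and pixels are assumed to be `(R, G, B, A)` tuples as delivered by pypng.

```python
def hide_byte_in_pixel(pixel, value):
    """Store one byte in the two low bits of each channel (A, R, G, B order)."""
    r, g, b, a = pixel
    return (
        (r & 0xFC) | ((value >> 4) & 0b11),  # bits 5-4 go to red
        (g & 0xFC) | ((value >> 2) & 0b11),  # bits 3-2 go to green
        (b & 0xFC) | (value & 0b11),         # bits 1-0 go to blue
        (a & 0xFC) | ((value >> 6) & 0b11),  # bits 7-6 go to alpha
    )


def read_byte_from_pixel(pixel):
    """Rebuild the hidden byte, mirroring the per-pixel step of the decoder."""
    r, g, b, a = (channel & 0b11 for channel in pixel)
    return a << 6 | r << 4 | g << 2 | b


pixel = hide_byte_in_pixel((139, 81, 255, 255), 0x2D)
print(pixel)                             # (138, 83, 253, 252)
print(hex(read_byte_from_pixel(pixel)))  # 0x2d
print(205 * 160)                         # 32800 pixels, one byte each
```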
Data Layout
- Bytes 0x0000-0x42ff are used for game assets such as the spritesheet, music, sound effects, etc.
- Bytes 0x4300-0x7fff are used to store the source code, which is written in a subset of Lua.
For the purposes of this post, we are only interested in the Lua source code, so we are going to work with the bytes starting at 0x4300.
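A minimal sketch of that split, assuming `hidden_data` is the list of bytes recovered from the image. The function name `split_cartridge` and the constants are hypothetical; only the two address ranges come from the layout above, and anything past 0x7fff is simply ignored here.

```python
CODE_START = 0x4300  # first byte of the Lua code region
CODE_END = 0x8000    # one past the last code byte (0x7fff)


def split_cartridge(hidden_data):
    """Return (assets, code) as two slices of the hidden cartridge data."""
    assets = hidden_data[:CODE_START]
    code = hidden_data[CODE_START:CODE_END]
    return assets, code


# Dummy data with the same size as a real cartridge (205 x 160 bytes).
fake_data = [0] * (205 * 160)
assets, code = split_cartridge(fake_data)
print(len(assets))  # 17152
print(len(code))    # 15616
```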
Dealing with Compression
P8.PNG has three different compression options:
- No compression
- Old compression
- New compression
Remember that, due to the size of the PNG image (205 high x 160 wide), the amount of data is very limited, so the developers had to come up with creative ways to put more stuff inside the cartridges.
This is how we can detect which compression version we are dealing with:
def get_version(hidden_data):
    if bytes(hidden_data[0x4300:0x4304]) == b'\x00pxa':
        return FORMAT.NEW_COMPRESSED_FORMAT
    elif bytes(hidden_data[0x4300:0x4304]) == b':c:\x00':
        return FORMAT.OLD_COMPRESSED_FORMAT
    else:
        return FORMAT.PLAINTEXT_FORMAT
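The snippet above refers to a `FORMAT` object whose definition is not shown in this post. One plausible way to define it, purely as an assumption, is a small `Enum`. The sketch below pairs it with a hypothetical `classify_header` helper that works on the four header bytes alone, so it can be tried without a cartridge.

```python
from enum import Enum


class FORMAT(Enum):
    # Assumed definition; the member values are arbitrary.
    PLAINTEXT_FORMAT = 0
    OLD_COMPRESSED_FORMAT = 1
    NEW_COMPRESSED_FORMAT = 2


def classify_header(header):
    """Map the four bytes found at 0x4300-0x4303 to a FORMAT member."""
    if header == b'\x00pxa':
        return FORMAT.NEW_COMPRESSED_FORMAT
    if header == b':c:\x00':
        return FORMAT.OLD_COMPRESSED_FORMAT
    return FORMAT.PLAINTEXT_FORMAT


print(classify_header(b'\x00pxa'))  # FORMAT.NEW_COMPRESSED_FORMAT
print(classify_header(b':c:\x00'))  # FORMAT.OLD_COMPRESSED_FORMAT
print(classify_header(b'-- t'))     # FORMAT.PLAINTEXT_FORMAT
```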
No compression
This case is quite easy: we just need to read the code region as ASCII, up to the first null byte.
def get_code_plaintext(hidden_data):
    # the code is stored as plaintext (ASCII), up to the first
    # null byte
    code = []
    code_pos = 0x4300
    while code_pos < 0x8000:
        curr_byte = hidden_data[code_pos]
        if curr_byte == 0:
            break
        code.append(chr(curr_byte))
        code_pos += 1
    return "".join(code) + "\n"
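The same idea can also be written with slicing instead of an explicit loop. This is only an alternative sketch, not the code used by the decoder; the name `get_code_plaintext_sliced` is hypothetical, and `latin-1` is chosen because it maps every byte to the same code point that `chr()` would produce.

```python
CODE_START = 0x4300
CODE_END = 0x8000


def get_code_plaintext_sliced(hidden_data):
    """Return the code region as text, cut at the first null byte."""
    raw = bytes(hidden_data[CODE_START:CODE_END])
    # Keep everything before the first null byte (or all of it if none).
    code = raw.split(b"\x00", 1)[0]
    return code.decode("latin-1") + "\n"


# Dummy cartridge data with a short program in the code region.
fake_data = [0] * 0x8000
program = b"print(1)"
fake_data[CODE_START:CODE_START + len(program)] = program

print(repr(get_code_plaintext_sliced(fake_data)))  # 'print(1)\n'
```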
Let’s try with The Adventures of Jelpi. Here’s an extract of the output:
-- the adventures of jelpi
-- by zep
-- to do:
-- levels and monsters
-- title / restart logic
-- block loot
-- top-solid ground
-- better duping
-- config: num_players 1 or 2
num_players = 1
corrupt_mode = false
max_actors = 128
music(0, 0, 3)
Finally, let’s confirm the results using PICO-8 itself to inspect the source code.
Old Compression
The old compression case is more complicated. Basically, we have to read each byte and, based on its value, one of three things can happen:
- We emit a character directly.
- We search for a character using a lookup table.
- We get an offset/length and we use it to copy/paste a segment of the already decoded data.
def get_code_oldcompression(hidden_data):
    CHAR_TABLE = \
        ' \n 0123456789abcdefghijklmnopqrstuvwxyz!#%(){}[]<>+=/*:;.,~_'
    # bytes 0x4304-0x4305 are the length of the decompressed code,
    # stored MSB first.
    decompressed_length = (hidden_data[0x4304] << 8) | \
        hidden_data[0x4305]
    # The next two bytes (0x4306-0x4307) are always zero.
    assert hidden_data[0x4306] == 0
    assert hidden_data[0x4307] == 0
    code = []
    code_pos = 0x4308
    while len(code) < decompressed_length:
        curr_byte = hidden_data[code_pos]
        if curr_byte == 0x00:
            # 0x00: Copy the next byte directly to the output
            # stream.
            code.append(chr(hidden_data[code_pos + 1]))
            code_pos += 2
        elif curr_byte <= 0x3b:
            # 0x01-0x3b: Emit a character from a lookup table
            code.append(CHAR_TABLE[curr_byte])
            code_pos += 1
        else:
            # 0x3c-0xff: Calculate an offset and length from this byte
            # and the next byte, then copy those bytes from what has
            # already been emitted. In other words, go back "offset"
            # characters in the output stream, copy "length" characters,
            # then paste them to the end of the output stream.
            next_byte = hidden_data[code_pos + 1]
            # this magic stuff comes from the format specification
            offset = (curr_byte - 0x3c) * 16 + (next_byte & 0xf)
            index = len(code) - offset
            length = (next_byte >> 4) + 2
            try:
                for i in range(length):
                    b = code[index + i]
                    code.append(b)
            except IndexError as e:
                return f"ERROR DECODING\noffset={offset} length={length}\n\n" \
                    + "".join(code)
            code_pos += 2
    return "".join(code)
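To make the offset/length arithmetic of the copy case more concrete, here is a small worked example. The helpers `decode_backref` and `apply_backref` are hypothetical names introduced for this sketch; the formulas are the same ones used in the decoder above.

```python
def decode_backref(curr_byte, next_byte):
    """Turn a two-byte back-reference into (offset, length)."""
    offset = (curr_byte - 0x3c) * 16 + (next_byte & 0xf)
    length = (next_byte >> 4) + 2
    return offset, length


def apply_backref(output, offset, length):
    """Copy `length` items starting `offset` items back onto the end."""
    start = len(output) - offset
    for i in range(length):
        output.append(output[start + i])
    return output


# 0x3c 0x24: the low nibble gives offset 4, the high nibble gives 2 + 2 = 4.
offset, length = decode_backref(0x3c, 0x24)
text = "".join(apply_backref(list("x=1\n"), offset, length))

print(offset, length)  # 4 4
print(repr(text))      # 'x=1\nx=1\n'
```

With these formulas a back-reference can reach at most 3135 characters back and copy between 2 and 17 characters at a time.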
This time let’s try with Barp The Balldragon. These are the first lines of the decoded source code:
-- barp the balldragon
a,b,
c,d=
0,
1,
1,
{0,0,0,0}
e=0
f={0,0xffff}
g={0xffff,0,h=f}
i={0,1,h=g}
j={1,0,h=i}
f.h=j
k="abcdefghijklmnopqrstuvwxyz1234567890`~!@#$%^&*()-_=+[{]}|;:',./?"
function l()
m={}
for n=1,64 do
m[sub(k,n,n)]=n-1
And they look fine compared to what PICO-8 displays (not considering that PICO-8 displays everything in uppercase).
New Compression
Finally, the new compression format is the most complicated one. In this case, we need to treat the data as a stream of bits and interpret it bit by bit. Each group of bits (which can be smaller or larger than a byte) starts with a header, and one of two things can happen based on that header:
- We get a value and then use it to emit a character using a move-to-front mapping.
- We get an offset/length and we use it to copy/paste a segment of the already decoded data. Unlike the old compression format, there’s a case that allows the copied data to be pasted multiple times.
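The move-to-front mapping can be illustrated in isolation, assuming the indices have already been extracted from the bit stream. `mtf_decode` is a hypothetical helper written for this sketch and is not part of the decoder.

```python
def mtf_decode(indices):
    """Decode a list of move-to-front indices into bytes."""
    # Initially, each of the 256 possible bytes maps to itself.
    table = list(range(256))
    out = []
    for index in indices:
        value = table.pop(index)
        out.append(value)
        # The byte just emitted moves to the front of the table.
        table.insert(0, value)
    return bytes(out)


decoded = mtf_decode([ord("a"), 0, ord("b"), 1, 1])
print(decoded)  # b'aabab'
```

Recently used characters end up near the front of the table, so they get small indices, and small indices take fewer bits to encode.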
stream_str = ""
stream_pos = 0

def read_bit():
    global stream_pos
    bit = stream_str[stream_pos: stream_pos + 1]
    stream_pos += 1
    return bit

def read_bits(positions):
    global stream_pos
    # reverse the bits, so the first bit read becomes the
    # least significant one
    inv_bits = stream_str[stream_pos: stream_pos + positions][::-1]
    stream_pos += positions
    return inv_bits

def get_code_newcompression(hidden_data):
    global stream_pos, stream_str
    code_str = ""
    # bytes 0x4304-0x4305 are the length of the decompressed code,
    # stored MSB first.
    decompressed_code_length = (hidden_data[0x4304] << 8) \
        | hidden_data[0x4305]
    # The next two bytes (0x4306-0x4307) are the length of the
    # compressed data + 8 for this 8-byte header, stored MSB first.
    compressed_data_length = hidden_data[0x4306] << 8 \
        | hidden_data[0x4307]
    # The decompression algorithm maintains a "move-to-front" mapping
    # of the 256 possible bytes. Initially, each of the 256 possible
    # bytes maps to itself.
    move_to_front = []
    for i in range(256):
        move_to_front.append(i)
    # The decompression algorithm processes the compressed data bit
    # by bit - going from LSB to MSB of each byte - until the data
    # length of decompressed characters has been emitted. We create
    # a string with the bits of each byte reversed to simulate the
    # decoding stream
    stream = []
    code_pos = 0x4308
    while code_pos < 0x8000:
        # convert to binary and reverse
        stream.append(format(hidden_data[code_pos], '08b')[::-1])
        code_pos += 1
    stream_str = "".join(stream)
    stream_pos = 0
    while len(code_str) < decompressed_code_length:
        # Each group of bits starts with a single header bit,
        # specifying the group's type.
        header = read_bit()
        if header == "1":
            # header bit = 1 -> get a character from the index
            # these values and bit manipulations are documented
            # in the P8.PNG spec
            unary = 0
            while read_bit() == "1":
                unary += 1
            unary_mask = ((1 << unary) - 1)
            bin_str = read_bits(4 + unary)
            index = int(bin_str, 2) + (unary_mask << 4)
            try:
                # get and emit character
                c = chr(move_to_front[index])
                code_str += c
            except IndexError as e:
                err_str = f"ERROR DECODING\nindex={index}\n\n"
                print(err_str)
                return err_str + code_str
            # update move_to_front data structure
            move_to_front.insert(0, move_to_front.pop(index))
        else:
            # header bit = 0 -> copy/paste a segment
            # these values and bit manipulations are documented
            # in the P8.PNG spec
            if read_bit() == "1":
                if read_bit() == "1":
                    offset_positions = 5
                else:
                    offset_positions = 10
            else:
                offset_positions = 15
            offset_bits = read_bits(offset_positions)
            offset_backwards = int(offset_bits, 2) + 1
            length = 3
            while True:
                part = int(read_bits(3), 2)
                length += part
                if part != 7:
                    break
            # Then we go back "offset" characters in the output stream,
            # and copy "length" characters to the end of the output stream.
            # "length" may be larger than "offset", in which case we
            # effectively repeat a pattern of "offset" characters.
            if offset_backwards > len(code_str):
                err_str = f"ERROR DECODING\nback_offset={offset_backwards} len code_str={len(code_str)}\n\n"
                print(err_str)
                return err_str + code_str
            else:
                if -offset_backwards + length >= 0:
                    chunk = code_str[-offset_backwards:]
                else:
                    chunk = code_str[-offset_backwards: -offset_backwards + length]
                assert len(chunk) > 0
                if length > offset_backwards:
                    chunk += repeat_to_length(chunk, length - offset_backwards)
                code_str += chunk
    return code_str

def repeat_to_length(string_to_expand, length):
    return (string_to_expand * (int(length/len(string_to_expand))+1))[:length]
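Two details of the listing above are easy to get wrong: the LSB-first bit order and the copy whose length exceeds its offset. The sketch below shows both on tiny inputs. `BitReader` and `copy_from_history` are hypothetical names, and they use integer arithmetic rather than the string-based stream of the decoder, but they are meant to behave the same way.

```python
class BitReader:
    """Reads bits from bytes, least significant bit of each byte first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read_bit(self):
        byte = self.data[self.pos // 8]
        bit = (byte >> (self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, count):
        # The first bit read is the least significant bit of the result.
        value = 0
        for i in range(count):
            value |= self.read_bit() << i
        return value


def copy_from_history(text, offset, length):
    """Append `length` characters, starting `offset` characters back."""
    start = len(text) - offset
    for i in range(length):
        # Characters appended earlier in this loop can be copied again,
        # which is what repeats the pattern when length > offset.
        text += text[start + i]
    return text


# One byte holding: header bit 1, unary terminator 0, 4-bit index 5.
reader = BitReader(bytes([0b00010101]))
header = reader.read_bit()
unary_end = reader.read_bit()
index = reader.read_bits(4)
print(header, unary_end, index)  # 1 0 5

print(copy_from_history("ab", 2, 7))  # ababababa
```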
Let’s try with Wolf Hunter. Here’s an extract of the output:
-- game state
-- create the two units and
-- provide helper functions to switch turns.
function new_game_state()
-- create player unit
local player_events = {"menu"}
local player_items = {"crossbow", "elixir", "silver knife"}
local player_hp = 100
local player = new_unit("player", player_hp, player_events, player_items)
-- create enemy unit
local enemy_events = {"slash", "dark charge", "strong defend", "raging strike", "ravage", "cleave"}
local enemy_hp = 400
local enemy = new_unit("werewolf", enemy_hp, enemy_events)
-- game state
local state = {
And it looks like we’ve decoded it correctly.
Special characters
Besides what’s documented, my guess is that PICO-8 initializes the move-to-front data structure with some Unicode values. For example, in this screenshot the IDE shows a heart symbol, which corresponds to Unicode U+2665.
Final thoughts
PICO-8 is a very cool and interesting gaming console and I very much liked their use of steganography to allow sharing cartridges in a more convenient way. I even bought PICO-8 because I wanted to make sure the decoding was going well, but also because it’s fun and maybe I’ll try to code a simple game in the future. In any case, the decoder’s full source code is available here.
Thanks for reading and happy new year!!! 😃