The Python interpreter parses source files and compiles them to bytecode. For imported modules, this bytecode is cached in files with a .pyc extension, usually in a __pycache__ directory (Python 3) or, in older and some legacy layouts, next to the Python source file. Depending on how the interpreter, libraries and code have been installed, you can end up in a number of exploitable situations.
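As a minimal illustration (not part of the original research), the standard library can show both where an imported module's bytecode would be cached and how that file gets written. The module name `example_mod` and the temporary directory are hypothetical:

```python
import importlib.util
import pathlib
import py_compile
import tempfile

# Hypothetical module written to a throwaway directory for illustration.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "example_mod.py"
source.write_text("GREETING = 'hello'\n")

# PEP 3147 location where the interpreter would cache the bytecode,
# e.g. __pycache__/example_mod.cpython-312.pyc on CPython 3.12.
expected_cache = importlib.util.cache_from_source(str(source))

# py_compile writes the same file an import of example_mod would produce
# and returns the path it wrote to.
written = py_compile.compile(str(source), doraise=True)

print(expected_cache)
print(written == expected_cache)
```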

The Attack Vector

If you find a .pyc file owned by a user with higher privileges than the owner of the source file, you have a vector for privilege escalation. At some point the higher-privileged user ran code that imported a library you have write permissions over, and if they do it again you can have them execute any code you want.
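A minimal sketch of how one might hunt for that condition on a POSIX system, assuming PEP 3147 `__pycache__` layouts. The function name `find_owner_mismatches` is hypothetical, and it only compares numeric UIDs; deciding which owner counts as "higher privileged" (for example UID 0) is left to the reader:

```python
import importlib.util
import os


def find_owner_mismatches(root):
    """Yield (pyc, source, pyc_uid, source_uid) for cached bytecode whose
    owner differs from the owner of the matching source file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # Only look inside PEP 3147 cache directories.
        if os.path.basename(dirpath) != "__pycache__":
            continue
        for name in filenames:
            if not name.endswith(".pyc"):
                continue
            pyc = os.path.join(dirpath, name)
            try:
                # Map __pycache__/name.cpython-XY.pyc back to name.py.
                source = importlib.util.source_from_cache(pyc)
                pyc_uid = os.stat(pyc).st_uid
                source_uid = os.stat(source).st_uid
            except (ValueError, OSError):
                # Not a PEP 3147 file name, or the source is missing/unreadable.
                continue
            if pyc_uid != source_uid:
                yield pyc, source, pyc_uid, source_uid
```

A hit where `pyc_uid` is 0 and the source belongs to your own account matches the situation described above.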

If you find a writable .pyc file in a path shared with other users or system services then, as above, you can generate your own Python bytecode and put it in place. When those users run the programmes that import the stomped module, you again have code execution.
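One way to enumerate candidates is sketched below; the function name `find_writable_pycs` is hypothetical and the directories to scan (for example the entries of `sys.path`) are the reader's choice. It treats a .pyc as interesting if the file itself is writable or if its directory allows the file to be replaced:

```python
import os


def find_writable_pycs(roots):
    """Return (path, file_writable, dir_writable) for every .pyc under roots
    that the current user could overwrite or replace."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            # Replacing a file (unlink + create) needs write and search
            # permission on its directory. Note: this ignores the sticky bit,
            # which can still prevent removing another user's file.
            dir_writable = os.access(dirpath, os.W_OK | os.X_OK)
            for name in filenames:
                if not name.endswith(".pyc"):
                    continue
                path = os.path.join(dirpath, name)
                # Overwriting in place needs write permission on the file.
                file_writable = os.access(path, os.W_OK)
                if file_writable or dir_writable:
                    hits.append((path, file_writable, dir_writable))
    return hits
```

`os.access` checks against the real user ID, which is what matters when scanning as an unprivileged user.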

Mitigations (and How They Fail)

There are a couple of mitigations in place. First, the .pyc file has a header that records the source file's modification timestamp (and, since Python 3.3, its size). These have to match the source file; if they don't, the interpreter treats the .pyc as stale and recompiles it.
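The header layout is described in PEP 552: a 4-byte magic number, a 4-byte little-endian flags word, then either the source mtime and size (timestamp mode) or an 8-byte source hash. A minimal parser, assuming CPython 3.7 or later (the function name `read_pyc_header` is hypothetical):

```python
import importlib.util
import struct


def read_pyc_header(path):
    """Parse the 16-byte header used by CPython 3.7+ .pyc files (PEP 552)."""
    with open(path, "rb") as f:
        header = f.read(16)
    if len(header) < 16:
        raise ValueError(f"{path}: truncated header")
    magic = header[:4]
    (flags,) = struct.unpack("<I", header[4:8])
    info = {"magic_ok": magic == importlib.util.MAGIC_NUMBER, "flags": flags}
    if flags & 0b01:
        # Hash-based pyc: 8-byte hash of the source; bit 1 means "check source".
        info["source_hash"] = header[8:16]
        info["check_source"] = bool(flags & 0b10)
    else:
        # Timestamp-based pyc: source mtime and size, each truncated to 32 bits.
        mtime, size = struct.unpack("<II", header[8:16])
        info["source_mtime"] = mtime
        info["source_size"] = size
    return info
```

In timestamp mode both fields describe the source file only; nothing in the header is derived from the bytecode that follows it.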

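The second mitigation, introduced in Python 3.7, is a SipHash of the source file stored in the compiled file's header (https://peps.python.org/pep-0552/). At first glance this looks like a dead end, as finding collisions on a 64-bit SipHash is going to take some time. However, the hash is computed over the source rather than the bytecode, so it only detects a changed source; it does not authenticate the bytecode that follows the header.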
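A short sketch showing which bytes the PEP 552 hash covers, assuming CPython 3.7+ and a hypothetical module `hashed_mod`. It asks `py_compile` for a checked hash-based .pyc and then recomputes the stored value from the source alone with `importlib.util.source_hash`:

```python
import importlib.util
import pathlib
import py_compile
import tempfile

workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "hashed_mod.py"  # hypothetical module
source.write_text("VALUE = 1\n")

# Request a checked hash-based pyc (PEP 552, Python 3.7+).
cfile = py_compile.compile(
    str(source),
    doraise=True,
    invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
)

header = pathlib.Path(cfile).read_bytes()[:16]
flags = int.from_bytes(header[4:8], "little")
stored_hash = header[8:16]

# The stored value is a hash of the source bytes, not of the bytecode
# that follows the header.
recomputed = importlib.util.source_hash(source.read_bytes())
print(flags, stored_hash == recomputed)  # prints: 3 True
```

A flags value of 3 marks a checked hash-based file; an unchecked one would show 1.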

Some experimentation showed that, in practice, these hashes are largely irrelevant because they are almost never used: in a sample of about 20,000 files I spotted maybe a hundred with a hash-based header. And if you do find a .pyc with this hash, you can simply remove it and put a file with a timestamp-based header in its place.
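A survey like the one above could be approximated by reading only the flags word of each header. This is a sketch, not the author's tooling; the function name `survey_pyc_modes` and the bucket names are hypothetical. PEP 552 also defines unchecked hash-based files, which the interpreter by default loads without comparing them against the source at all (see the `--check-hash-based-pycs` option), so they are counted separately:

```python
import collections
import os


def survey_pyc_modes(root):
    """Count .pyc files under root by invalidation mode (PEP 552 flags word)."""
    counts = collections.Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".pyc"):
                continue
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    header = f.read(8)
            except OSError:
                counts["unreadable"] += 1
                continue
            if len(header) < 8:
                counts["truncated"] += 1
                continue
            flags = int.from_bytes(header[4:8], "little")
            if flags == 0:
                counts["timestamp"] += 1
            elif flags == 0b11:
                counts["checked-hash"] += 1
            elif flags == 0b01:
                counts["unchecked-hash"] += 1
            else:
                # Pre-3.7 pycs store the mtime at this offset, so this bucket
                # mostly catches older files rather than malformed ones.
                counts["other"] += 1
    return counts
```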

Orphaned Bytecode

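If a .pyc file is present but has no associated source, you have two avenues of attack:

One: adding the .py file, which takes precedence over the bytecode and forces a recompilation
Two: stomping on the .pyc file as before; with no source to compare against, the header's timestamp or hash is never checked (this only works for a .pyc in the legacy location next to where the .py would be, as CPython 3 ignores a __pycache__ entry that has no source)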
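The two cases can be told apart with a small scanner. This is a sketch under stated assumptions: the function name `find_orphaned_pycs` is hypothetical, and any .pyc whose name is not in PEP 3147 form is treated as legacy-layout, so a malformed name inside __pycache__ may be misreported:

```python
import importlib.util
import os


def find_orphaned_pycs(root):
    """Return (pyc, expected_source, loaded_without_source) for every .pyc
    under root whose .py source does not exist."""
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".pyc"):
                continue
            pyc = os.path.join(dirpath, name)
            try:
                # PEP 3147 layout: __pycache__/mod.cpython-XY.pyc -> mod.py.
                source = importlib.util.source_from_cache(pyc)
                # CPython 3 ignores a __pycache__ entry with no source, so
                # only adding the .py (avenue One) makes it relevant.
                loaded_without_source = False
            except ValueError:
                # Legacy layout: mod.pyc next to where mod.py would be; this
                # is imported directly when mod.py is absent (avenue Two).
                source = pyc[:-1]
                loaded_without_source = True
            if not os.path.exists(source):
                orphans.append((pyc, source, loaded_without_source))
    return orphans
```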

Research Process

Thanks to GPT Codex 5 and OpenCode, I was able to rig up a set of tests and verifications in a few hours. Now comes the slow task of finding some exploitable conditions.