Illustrate how to partially or completely obfuscate a python package with cython.
This is useful when you want to distribute a package without sharing its source code, while getting at least the same level of obfuscation as with a language compiled with gcc/clang/cl.
Note that even though we are using cython, do not expect huge speedups. The compiled code will not be optimized, since no type annotations are added. Nonetheless, the cython developers claim that you can get a 20%-50% speedup when compiling pure python files with cython.
The goal of this project is to share obfuscated code, not to speed up your code.
Free software: MIT license
Invoking the following command:
python partial_cythonization/cli.py path/to/repo_dir/pkg_to_obfuscate path/to/obfuscated_pkg_destination
will scan all the source files in pkg_to_obfuscate and select those with the following comment at the top:
# obfuscate_with_cython: True
def foo(): ...
def bar(): ...
For each such file, it will compile the file to a python extension file and copy it to the destination directory. Other source files will be copied as is.
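As an illustration only, the marker check can be sketched as below. The helper names are hypothetical and this is not the tool's actual code. It assumes "at the top" means anywhere in the leading block of comment and blank lines, and that the marker must match the text shown above exactly.

```python
from pathlib import Path

# Marker text copied from the example above.
MARKER = "# obfuscate_with_cython: True"


def has_obfuscation_marker(source: str) -> bool:
    """Return True if the marker appears before the first line of code.

    Assumption: only the leading comment/blank lines are searched, and the
    marker must match exactly. The real tool may be stricter or looser.
    """
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines before the first statement
        if not stripped.startswith("#"):
            return False  # reached real code without seeing the marker
        if stripped == MARKER:
            return True
    return False


def files_to_compile(pkg_dir: str) -> list[Path]:
    """List the .py files under pkg_dir that carry the marker (hypothetical helper)."""
    return sorted(
        path
        for path in Path(pkg_dir).rglob("*.py")
        if has_obfuscation_marker(path.read_text(encoding="utf-8"))
    )
```

A shebang or encoding line before the marker is tolerated, since it is also a comment.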
With the --compile-all/-a flag, all eligible files will be obfuscated:
python partial_cythonization/cli.py path/to/repo_dir/pkg_to_obfuscate path/to/obfuscated_pkg_destination -a
An eligible file is a file that:
- is a python source file with extension .py
- does not use numba
- is not ignored by the always_exclude rule in the configuration file
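The tool's actual detection logic is not described here. The sketch below is a hypothetical reading of the three rules, assuming numba usage is detected from import statements and that always_exclude patterns are fnmatch-style globs matched against POSIX paths relative to the package root.

```python
import ast
import fnmatch


def uses_numba(source: str) -> bool:
    """Return True if the module imports numba (assumed detection rule)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # import numba / import numba.typed as nt
            if any(alias.name.split(".")[0] == "numba" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # from numba import njit; relative imports (level > 0) are local modules
            if node.level == 0 and node.module and node.module.split(".")[0] == "numba":
                return True
    return False


def is_eligible(rel_path: str, source: str, always_exclude: list[str]) -> bool:
    """Apply the three eligibility rules to one file (hypothetical helper)."""
    if not rel_path.endswith(".py"):
        return False
    if any(fnmatch.fnmatchcase(rel_path, pattern) for pattern in always_exclude):
        return False
    return not uses_numba(source)
```

Parsing with ast avoids false positives from the word "numba" appearing in comments or strings, at the cost of missing dynamic imports such as importlib.import_module("numba").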
Since extensions are built in-place, after a successful run the source package will contain .c files and the corresponding .pyd/.so files.
You may use the -c/--clean flag to remove them.
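To picture what the clean-up removes, the sketch below lists the files an in-place build typically leaves next to each source file. It assumes the layout described above (mod.c and mod plus an extension suffix beside mod.py); the function names are hypothetical and this is not the tool's -c/--clean implementation.

```python
import importlib.machinery
from pathlib import Path


def build_artifacts(pkg_dir: str) -> list[Path]:
    """List in-place build outputs sitting next to .py sources.

    Assumption: for every mod.py, cython wrote mod.c and an extension module
    named mod<suffix>, where suffix is one of EXTENSION_SUFFIXES
    (e.g. ".cpython-311-x86_64-linux-gnu.so" or ".cp311-win_amd64.pyd").
    """
    artifacts = []
    for source in Path(pkg_dir).rglob("*.py"):
        candidates = [source.with_suffix(".c")]
        candidates += [
            source.with_name(source.stem + suffix)
            for suffix in importlib.machinery.EXTENSION_SUFFIXES
        ]
        artifacts.extend(path for path in candidates if path.exists())
    return sorted(artifacts)


def clean(pkg_dir: str) -> None:
    """Delete the generated files, leaving the .py sources untouched."""
    for path in build_artifacts(pkg_dir):
        path.unlink()
```

Only files paired with an existing .py source are considered, so hand-written .c files elsewhere in the tree are left alone.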
The best way to make sure that the resulting package behaves the same as the original one is to run the tests. This program copies the tests/ folder into the destination directory, if it exists. These tests should run against the obfuscated package the same way they run against the original package, assuming the test source files are relocatable.
If the tests are not relocatable, we do not provide a turn-key solution.
You may however run the original test suite by changing the PYTHONPATH variable to point to the obfuscated package directory instead of the original one in the source tree.
For instance:
python partial_cythonization/cli.py path/to/repo_dir/pkg_to_obfuscate path/to/obfuscated_pkg_destination
cd path/to/repo_dir
export PYTHONPATH=path/to/obfuscated_pkg_destination
pytest tests/ -v
The configuration file is a toml file that must be passed to the command line with the --config option.
It supports three keys:
- include_data: a list of file patterns for data files to be included in the obfuscated package.
- always_exclude: a list of file patterns for files to be excluded from the obfuscated package in every case.
- never_obfuscate: a list of file patterns for files that should never be obfuscated with cython.
For example:
include_data = [
"*data/*.csv",
"*.txt",
]
always_exclude = [
"some_package/subpkg2/*"
]
never_obfuscate = [
# usually, no added value to obfuscate these files
"*/__init__.py",
]
- Compile selected or all .py files to python extension files.
- Copy an obfuscated version of the package to a destination directory.
- Detect source files using numba and skip them.
- Detect source files using cython and copy the compiled extension files, skipping the .pyx sources.
This utility should work on any platform supported by cython. It has been tested on:
- Windows 10
- macOS 10.14 (Apple Silicon)
- Ubuntu 20.04 (where the tests run)
with cython 3.x and Python 3.10 and 3.11.
If a module uses numba, it will be skipped. Numba compiles a function by analyzing its python bytecode at runtime. After conversion with cython, the function is native code and this bytecode is gone.
If you want to obfuscate numba code for delivery, you should precompile your numba code using the Ahead-of-Time (AOT) compiler: numba-aot, or its future replacement.
If this is not possible, you may convert your python code to cython and precompile it before running this tool, as explained in the next section. You may also write a native C or C++ extension with tools such as pybind11 or cffi, or a Rust extension using PyO3.
All these approaches are orthogonal to this tool.
If you have your own .pyx files to compile with cython, you should compile them before running this tool.
Additionally, this tool is not intended to package a python program with all its dependencies as a stand-alone redistributable. It is not meant to replace tools such as PyInstaller, Py2App, cx_freeze etc.
However, it can be used to pre-process a collection of packages to be then included in such a redistributable.
Any dynamic discovery of python modules should not filter only on the ".py" file suffix.
For the equivalent code to work after conversion with cython, you must also handle native extension suffixes such as ".cpXYY-platform_arch.pyd".
In python, you can get the extension suffix that cython will use with:
import sysconfig
sysconfig.get_config_var("EXT_SUFFIX")
If you use python type annotations, the generated cython code will use them to add runtime type checking.
def foo(a: int, b: int): ...
foo(3.2, 12) # No error in pure python, but TypeError will be raised after the module is cythonized
# Python linters and type checkers will also warn you about the mismatched types
Similar to the last example: pure python accepts an IntEnum value wherever an int is expected (IntEnum is a subclass of int), but cythonized code does not.
from enum import IntEnum

class State(IntEnum):
    AAA = 0
    BBB = 1

def foo(a: int, b: int): ...

foo(State.AAA, 2)  # ok in pure python, TypeError in cython
Two workarounds are possible.
First, widen the annotation to accept the enum explicitly. This may be preferable if you want to restrict input values.
from enum import IntEnum
from typing import Union

class State(IntEnum):
    AAA = 0
    BBB = 1

def foo(a: Union[int, State], b: int): ...

foo(State.AAA, 2)  # ok everywhere
foo(1, 2)  # still ok
Second, pass the underlying int value explicitly, in case int is more appropriate or you cannot modify the function.
from enum import IntEnum

class State(IntEnum):
    AAA = 0
    BBB = 1

def foo(a: int, b: int): ...

foo(State.AAA.value, 2)  # ok everywhere
foo(State.AAA, 2)  # not ok, same as the initial example
In python, the following snippet is legal:
def bar(msg: str): ...

def foo(value: int):
    value = str(value)  # TypeError in cython, `value` is typed as an int
    bar(value)
In cython, like in C and C++, it is not possible to change the type of a variable this way.
Instead, use this:
def bar(msg: str): ...

def foo(value: int):
    msg = str(value)
    bar(msg)
Sometimes, you may also find more appropriate names for the converted value.
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.