Faster Python made easier with Cython’s pure Python mode

Now you can have Cython’s speed boost without its cumbersome syntax, using the pure Python syntax you know and love.

Cython has long been one of the great secret weapons of Python performance, letting you turn Python code into C for speed. But Cython also has long suffered from a cumbersome and counterintuitive syntax, an odd hybrid of C and Python. To add insult to injury, Cython code can’t be processed by any of the current roster of Python linting and formatting tools.

The good news: In recent years Cython has developed an alternate syntax, called pure Python mode. As the name implies, pure Python mode uses native Python syntax to express Cython’s behaviors and constructs, making it much easier for Python programmers to get started with Cython.

Pure Python mode also enhances one of Cython’s biggest advantages: It makes it easier to start with a conventional Python codebase and incrementally transform it into C code. Furthermore, Cython code written in pure Python mode can optionally run as regular Python, although without Cython’s speed gains.

Finally, pure Python mode allows Python linting and code analysis tools to work with Cython modules. The existing culture of Python tooling doesn’t have to end at the Cython barrier.

The original Cython syntax

Below is a short module written in plain Python, the starting point for the Cython version that follows. It calculates (not very efficiently) the Fibonacci number at a given position in the sequence. (Note that we’re using classes here not because they are the best way to solve this problem, but because it’s worth demonstrating how they map to equivalent elements in Cython.)

class Fibonacci:    
    def __init__(self, start: int):
        if start<0:
            raise ValueError("Starting number must not be negative")
        self.n = start

    def calc(self) -> int:
        return self._calc(self.n)
    
    def _calc(self, val: int) -> int:
        if val == 0:
            return 0
        elif val == 1 or val == 2:
            return 1
        else:
            return self._calc(val-1)+self._calc(val-2)

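On an AMD Ryzen 5 3600, this code executes in about 20 seconds. The standard Python interpreter is slow at this kind of arithmetic: every integer is a full Python object, and every method call is dispatched dynamically. If we rewrite this code in Cython, we can speed things up considerably.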
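A timing like this can be reproduced with a small harness along the following lines. This is a minimal sketch, not the measurement behind the figure above: the input value (20) is an assumption chosen so the script finishes quickly, since the text does not say which starting number was used, and time.perf_counter is simply one reasonable standard-library timer.

```python
import time


class Fibonacci:
    def __init__(self, start: int):
        if start < 0:
            raise ValueError("Starting number must not be negative")
        self.n = start

    def calc(self) -> int:
        return self._calc(self.n)

    def _calc(self, val: int) -> int:
        if val == 0:
            return 0
        elif val == 1 or val == 2:
            return 1
        else:
            return self._calc(val - 1) + self._calc(val - 2)


def time_calc(n: int) -> tuple:
    """Return (result, elapsed seconds) for one Fibonacci(n).calc() call."""
    start = time.perf_counter()
    result = Fibonacci(n).calc()
    return result, time.perf_counter() - start


# n = 20 is an assumed input; the runtime grows exponentially with n.
result, elapsed = time_calc(20)
print(f"Fibonacci(20).calc() = {result} in {elapsed:.4f} s")
```

Because the recursion recomputes the same values over and over, each increment of n multiplies the work by roughly 1.6, so it is worth raising the input gradually when looking for a value that runs long enough to compare implementations.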

Here is a Cython version of the same code (saved with a .pyx file extension):

cdef class Fibonacci:
    cdef int n
    def __init__(self, int start):
        if start<0:
            raise ValueError("Starting number must not be negative")
        self.n = start

    cpdef int calc(self):
        return self._calc(self.n)
    
    cdef int _calc(self, int val):
        if val == 0:
            return 0
        elif val == 1 or val == 2:
            return 1
        else:
            return self._calc(val-1)+self._calc(val-2)

This Cython code runs far faster: about half a second on the same hardware! But as you might have noticed, Cython’s syntax can be confusing.

If you squint hard, you can see the original Python syntax still in there, albeit buried under a number of other things that aren’t Python. cdef and cpdef, for instance, declare C-level functions: a cdef function can be called only from Cython code, while a cpdef function also gets a wrapper that makes it callable from Python. Also, the type declarations used on attributes and function signatures are nothing like the type hinting syntax we use in Python generally.

There are many other ways Cython syntax can be hard to parse, but this example should give you the general idea.

Pure Python syntax in Cython

Here is the same module, rewritten in pure Python mode (and saved with the regular .py extension):

import cython

@cython.cclass
class Fibonacci:    
    n: cython.int
    def __init__(self, start:cython.int):
        if start<0:
            raise ValueError("Starting number must not be negative")
        self.n = start

    @cython.ccall
    def calc(self) -> cython.int:
        return self._calc(self.n)
    
    @cython.cfunc
    def _calc(self, val: cython.int) -> cython.int:
        if val == 0:
            return 0
        elif val == 1 or val == 2:
            return 1
        else:
            return self._calc(val-1)+self._calc(val-2)

Several things about this code should stand out right away:

  • We add Cython features to our code by way of the cython import, not custom syntax. All of the syntax shown here is standard Python.
  • Type hints for our variables are done in the conventional Python way. For instance, the variable n is declared at the class level with a Python type hint. Likewise, function signatures use Python-style type hints to indicate what they take in and return.
  • To declare Cython functions and classes, we use a decorator (a standard bit of Python syntax) instead of the cdef/cpdef keywords (not standard at all).

Another useful aspect of the pure Python syntax: This code can run as-is in regular Python. We can simply import the module and use it without compiling it, although we won’t get the speed benefits Cython provides. If we do compile it, importing it will import the compiled version. This makes it easier to perform the kinds of incremental transformations of Python code that Cython was designed to make possible.
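One way to check which version was actually imported is to look at the file a module was loaded from. The sketch below is an illustration rather than anything Cython itself provides: the helper name is hypothetical, and it relies on compiled extension modules carrying one of the suffixes listed in importlib.machinery.EXTENSION_SUFFIXES (such as .so or .pyd). The standard library's json package stands in for an uncompiled module.

```python
import importlib.machinery
import json


def is_extension_module(module) -> bool:
    """Report whether a module was loaded from a compiled extension file."""
    path = getattr(module, "__file__", None)
    if path is None:
        # Built-in modules have no file on disk to inspect.
        return False
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


# json is implemented in Python source, so this reports False.
print(is_extension_module(json))
```

Applied to a hypothetical fib module, the helper would be expected to return False while only fib.py exists and True once a compiled fib extension sits next to it. Inside the module itself, cython.compiled answers the same question.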

Compiled and uncompiled code in pure Python mode

A useful feature in pure Python mode is a way to create alternate code paths within a module based on whether the code is running in regular Python mode or compiled Cython mode. An example:

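# Assumes array_size is an int, "import array as arr" has run, and
# PyMem_Malloc is available to the compiled module.
if cython.compiled:
    # Compiled: a raw C int pointer into memory from Python's allocator.
    data = cython.cast(
        cython.p_int, PyMem_Malloc(array_size * cython.sizeof(cython.int))
    )
else:
    # Uncompiled: a regular Python array of C ints.
    data = arr.array("i", [0] * array_size)

# Indexed assignment works the same way on either object.
data[0] = 32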

Here we’re assigning one of two possible values to data based on whether or not this code is compiled. If compiled, data is a C pointer to a block of memory obtained from the Python runtime’s allocator. If not compiled, data is a Python array.array object holding C ints (32 bits wide on most platforms). In both cases, the same indexing code reads and sets the elements.
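The uncompiled branch can be tried on its own in regular Python. The following sketch uses only the standard library; array_size is an assumed value, and the item size is printed rather than fixed in the code because the width of a C int depends on the platform.

```python
import array as arr

array_size = 8  # assumed size, for illustration only

# Typecode "i" stores signed C ints in one contiguous buffer.
data = arr.array("i", [0] * array_size)

# The same indexed assignment that the compiled branch applies to a C pointer.
data[0] = 32

print(data[0], len(data))  # first element and element count
print(data.itemsize)       # bytes per element on this platform
```

The two branches are not fully interchangeable. An array.array checks bounds and raises IndexError on a bad index, while indexing a raw C pointer performs no such check, and memory obtained from PyMem_Malloc has to be released explicitly with PyMem_Free.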

What pure Python mode doesn’t do (yet)

Pure Python mode has a few limitations that mean you can’t yet use it in every case where “classic” Cython works.

First, pure Python mode doesn’t support the full range of PEP 484 type annotations. Annotations such as Final or Union aren’t respected. The main reason for using PEP 484-style annotations is to provide a convenient way to include Cython’s own type hints, so many type hints used in regular Python aren’t supported yet.

Second, pure Python mode doesn’t support packed C structs (vitally important for working with some C libraries) or C-style enums. Both of these features could conceivably be supported in pure Python mode in some form, but right now the only way to work with them is with the legacy Cython format.

Finally, you can’t call into C functions from pure Python mode the way you can from regular Cython. Normally, in Cython, you can call into a C function by including a reference to it like this:

cdef extern from "math.h":
    cpdef double sin(double x)

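Pure Python mode provides no mechanism to do this directly, but the Cython documentation describes a workaround: keep the C declaration in a companion .pxd file and, when the module runs uncompiled, conditionally import an equivalent Python function in its place.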
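The workaround in the Cython documentation pairs a .pxd declaration with a conditional import. The sketch below shows the Python side only. The file name mymodule is illustrative, the .pxd contents appear as a comment, and the try/except around the cython import is an addition for this sketch so that it also runs where Cython is not installed. The documented form is a bare import cython followed by the same if not cython.compiled check, and the guarded variant here has only been reasoned about for the uncompiled case.

```python
# mymodule.py (hypothetical file name)
#
# A companion mymodule.pxd, read only by the Cython compiler, would
# carry the C declaration from the text:
#
#     cdef extern from "math.h":
#         cpdef double sin(double x)

try:
    import cython
except ImportError:
    # Assumption for this sketch only: stand in for Cython when it is
    # not installed, reporting that the code is running uncompiled.
    from types import SimpleNamespace
    cython = SimpleNamespace(compiled=False)

if not cython.compiled:
    # Uncompiled: substitute the equivalent Python function.
    from math import sin

# Compiled: resolves to sin() from math.h. Uncompiled: math.sin().
print(sin(0))
```

This approach only works when a Python function with matching behavior exists, as math.sin does for the C sin. For a C function with no Python counterpart, the uncompiled path would need a hand-written replacement.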

Copyright © 2022 IDG Communications, Inc.