Samuel Martins I am a full-stack developer who loves sharing the knowledge accumulated over the years with people. The different technologies that I have encountered through my journey allows me to relate to beginners and seniors alike. I write about all things tech.

Improve Python performance using Cython



Cython is both a module and a language that Pythoneers use to speed up their code.

How does Cython work? What is it? Should you write all your Python code with Cython? Just how fast does it make your code? And does it always work?

In this tutorial, we’ll introduce you to Cython and explain why you should use it when writing Python code. We’ll also review Cython’s compilation pipeline and common usage scenarios and walk you through installation and setup.

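We’ll cover the following with practical examples:

What is Cython?
What does Cython bring to the table?
Cython’s compilation pipeline
When to use Cython
Python vs. Cython: Comparing performance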

The aim of this guide is to help you develop a better understanding of Cython and how it speeds up Python using a simple prime finding program.

What is Cython?

Cython can be considered both a module and a programming language that (sort of) extends Python by enabling the use of static typing borrowed from C/C++. Basically, all Python code is valid Cython, but not the other way around.

Keep in mind, you can convert Python to Cython and vice versa. If this is not easy to grasp, think about the relationship between C and C++ or JavaScript and TypeScript. You can directly copy your existing Python code into a Cython file and then compile it to boost performance.

What does Cython bring to the table?

It’s common knowledge that Python is more efficient to write than C, given that it’s a high-level language. While this is true, there is a downside to using Python as opposed to C/C++.

Python is efficient for the developer but slow at runtime. C, on the other hand, is less efficient to write but faster than Python. Cython, therefore, aims to bring the performance benefits of C to Python while maintaining the efficiency Python developers have come to expect.

To understand this further, you need to first understand how Python code is executed. In the execution process (i.e., in the interpreter), the Python source code goes through a compiler, which acts as a translator to convert the source code into an intermediate, platform-independent bytecode.


After that, the Python virtual machine executes the bytecode one instruction at a time. Since this happens on the fly at runtime, this step-by-step interpretation makes the process slow compared to a compiled language.
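To make the bytecode step more concrete, here is a minimal sketch that uses the standard library dis module to list the instructions CPython generates for a tiny function. The function name add_one is a hypothetical example, and the exact instruction names vary between Python versions, so no expected output is shown.

```python
import dis


def add_one(n):
    # A deliberately tiny function so the bytecode listing stays short.
    return n + 1


# Each entry is one bytecode instruction that the Python virtual machine
# interprets at runtime. The names differ between CPython versions.
instructions = [ins.opname for ins in dis.get_instructions(add_one)]
print(instructions)
```

Even a one-line function expands into several instructions, each of which the interpreter has to dispatch while the program runs.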

If you compare this to the block diagram of a compiled language, the source code is converted into machine code that can run directly on the architecture. This is very fast compared to the process used by an interpreter.

The downside to this approach is that machine code is dependent on the platform, meaning you cannot run the same code on different platforms.

Now you can see what both concepts bring to the table. C brings static typing to Python and Python brings efficiency to C.

Cython’s compilation pipeline

What does the Cython pipeline look like? Compilation in Cython is a two-step process.

In the first step, your Cython code is converted into equivalent optimized and platform-independent C or C++ code. From there, the C or C++ source code is compiled into a shared object file by a C or C++ compiler. However, this shared object file is platform-dependent. It has a *.so extension on Linux or macOS and a *.pyd extension on Windows.
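If you are unsure which extension your own platform uses, the standard library can tell you. The sketch below only queries the running interpreter. It does not compile anything, and the variable names are illustrative.

```python
import importlib.machinery
import sysconfig

# The suffix this interpreter gives to compiled extension modules.
# It usually ends in .so on Linux and macOS and in .pyd on Windows.
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")

# Every suffix the import system accepts for extension modules.
all_suffixes = importlib.machinery.EXTENSION_SUFFIXES

print("Default extension suffix:", ext_suffix)
print("Accepted suffixes:", all_suffixes)
```

The suffix typically also encodes the Python version and platform, which is one reason a shared object built on one machine cannot simply be copied to another.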

When to use Cython

In which scenarios might you need to use Cython? Does it work everywhere every time?

Well, yes and no. Using Cython everywhere doesn’t always guarantee increased speed. However, you can use it in functions that involve a lot of mathematical operations and loop iterations. That’s because declaring the types up front lets the compiled code skip much of Python’s dynamic type handling, which matters most in loops where the same variables are read and updated many times.

Another great use case is when you already have a C or C++ library that needs a Python interface. In this case, you can use Cython to create a wrapper for the library.

Python vs. Cython: Comparing performance

Now let’s create an example project to see Cython in action.

The first step is to open up the terminal, set up a safe environment to work in (optional), and install Cython with other required dependencies.

$ sudo apt install build-essential

This will make the gcc compiler available in case your computer doesn’t have it.

$ sudo apt install python3-venv

This lets you create an isolated virtual environment to work in. This step is not necessary, but it’s always good to create your projects in a separate virtual environment so dependencies don’t conflict.

$ sudo pip3 install cython

This installs Cython onto your machine.
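To confirm the installation worked in the environment you are actually using, a quick check like the following can help. This is a minimal sketch. The helper name cython_available is hypothetical, and the script only reports whether the Cython package can be found.

```python
import importlib.util


def cython_available():
    """Return True if the Cython package can be imported in this environment."""
    return importlib.util.find_spec("Cython") is not None


if cython_available():
    import Cython

    print("Cython version:", Cython.__version__)
else:
    print("Cython is not installed in this environment")
```

If the second message appears even though the install command succeeded, the package was probably installed for a different interpreter or outside the active virtual environment.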

Now that installation is complete, we can get started.

In this demonstration, we’ll write two simple functions in the same file, called main.pyx, to find the first few prime numbers. We’ll write one in basic Python and another in Cython. From there, we’ll execute both and measure the difference in execution time.

Note that all your files for this demonstration will be in one directory. Also, instead of putting the .py extension in this file, you’ll use .pyx since you installed Cython already on your machine or environment.

# 1. The basic Python function

"""
This function returns a list of the first prime numbers. How many it returns depends on the value you pass as the input parameter. The list of primes found so far is empty in the beginning.
"""
def prime_finder_py(amount):
  primes = []
  found = 0
  number = 2

  while found < amount:
    for x in primes:
      if number % x == 0:
        break
    else:
      # the for loop finished without a break, so number is prime
      primes.append(number)
      found += 1

    number += 1

  return primes

"""
The only thing the modulo check does is test whether the number you are currently checking is divisible by a prime already found. A number is appended to this list if and only if no prime below it is able to divide it.

The increment at the end of the while loop ensures that the loop moves from one number to the next regardless of whether or not the current number was added to the primes list.
"""
# 2. The Cython function

"""
First of all, you should declare these variables up front because you don’t want to be defining them on the fly, since we are trying to optimize Python using the C syntax.

Also, in C programming, you have to define your arrays with a fixed size, just like the primes array below.

The call to min is a fail-safe in case you choose a number that is beyond this limit (which you can change, by the way).
"""

def prime_finder_cy(int amount):
  cdef int number, x, found
  cdef int primes[50000]
  amount = min(amount, 50000)

  found = 0
  number = 2
  while found < amount:
    for x in primes[:found]:
      if number % x == 0:
        break
    else:
      # the for loop finished without a break, so number is prime
      primes[found] = number
      found += 1

    number += 1

  return_list = [p for p in primes[:found]]
  return return_list


'''
The for loop needs a small tweak because you don't want to go through the whole fixed-size array when only part of it has been filled. Therefore, the loop only goes up to the index 'found', which is the number of primes stored so far.

The list comprehension at the end makes sure that you return only the elements you need and not the entire length of the array.
'''

As you can see, the logic of how we find the prime numbers is exactly the same. You are not changing anything. You actually have more code in the Cython syntax.

If you look at the Cython implementation, you’ll notice that you have a fixed size array with superfluous free slots. You have type definitions and some extra code. You’d think this would make for slower performance due to the simple fact that there is more code. Still, you’ll see that the Cython code is way faster than the Python code.
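Before compiling anything, it can be worth sanity-checking the algorithm itself in plain Python. The sketch below restates the same trial-division logic under a hypothetical name, prime_finder_reference, and compares its output against the first ten primes, which are well known.

```python
def prime_finder_reference(amount):
    """Return the first `amount` primes using trial division by earlier primes."""
    primes = []
    number = 2
    while len(primes) < amount:
        for x in primes:
            if number % x == 0:
                break  # number has a prime divisor, so it is not prime
        else:
            # The for loop ended without a break: number is prime.
            primes.append(number)
        number += 1
    return primes


first_ten = prime_finder_reference(10)
print(first_ten)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Once the compiled module is available, the same expected list can be used to check that both functions in main.pyx return identical results.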

Create another file within the same directory and name it anything with a .py extension. For this example, I named mine, setup.py.

In the setup.py file, import setup from setuptools and cythonize from Cython.Build, like so:

from setuptools import setup
from Cython.Build import cythonize

All you need to do in this file is add the following snippet of code:

from setuptools import setup
from Cython.Build import cythonize

setup(
  ext_modules = cythonize('main.pyx')
)

After that, you don’t just run this in your IDE; you have to run it from the terminal. Open that directory in the terminal and execute the following command:

$ python setup.py build_ext --inplace

This command will generate a main.c file and the .so file in case you’re working with Linux or a .pyd if you’re working with Windows.

From here, you no longer need the main.pyx file. You only need the *.so file and another new file to test the functions.

You can call the new .py file anything you want; for the purpose of this example, we’ll name it test.py

In the test.py file, you need to import main, which is the binary file, and time, which you’ll use to compare the execution times.

Don’t worry — you’re almost there.

After importing main and time, you can start calling your function by looking into the main import, like this:

import main
import time

# example call
print( main.prime_finder_py(x) )
print( main.prime_finder_cy(x) )

'''
The x in the parentheses is the number of prime numbers
the program should return. Replace it with an integer.
'''

Now for the fun part.

To determine the amount of time the functions are running, you need to add a time variable and use the time module you imported.

import main
import time

x = 20000  # number of primes to find

start_py = time.time()  # records time before function runs
print(main.prime_finder_py(x))
end_py = time.time()  # records time after function has run

time_py = end_py - start_py

start_cy = time.time()  # records time before function runs
print(main.prime_finder_cy(x))
end_cy = time.time()  # records time after function has run

time_cy = end_cy - start_cy

if time_cy < time_py:
  print('The Cython implementation is faster')
else:
  print('The Python implementation is faster')

For the most part, this code is pretty straightforward. Basically, if you run this test.py file in your IDE, the first part records the time taken by the Python function to run. The second part does the same for the Cython function. The if statement compares the two computed execution time values and evaluates which function is faster than the other.

Keep in mind that you have to pass a large number as the parameter, otherwise you will not notice the difference. Try 20,000 as your parameter and see what happens. You can even add print statements to see the exact values of the time variables for each function. Have fun with it.
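A single measurement can be noisy, so one option is to run each function a few times and keep the best time. The following is a minimal sketch, not part of the original test file: time_call is a hypothetical helper, sum_of_squares is a stand-in workload so the snippet runs without the compiled module, and time.perf_counter is swapped in for time.time because it is designed for measuring short intervals.

```python
import time


def time_call(func, *args, repeats=3):
    """Call func(*args) `repeats` times and return (result, best_seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return result, best


def sum_of_squares(n):
    # Stand-in CPU-bound workload used only for this illustration.
    return sum(i * i for i in range(n))


result, seconds = time_call(sum_of_squares, 100_000)
print(f"sum_of_squares took {seconds:.6f} seconds (best of 3)")
```

With the compiled module in place, you could pass main.prime_finder_py and main.prime_finder_cy to the same helper and print both timings side by side.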

This speedup comes from the static typing that Cython has added. You didn’t change the algorithmic complexity or accidentally cache anything. Basically, you sacrificed some of Python’s flexibility for a massive improvement in execution time.

Conclusion

Now that we’ve gone through this exercise, does introducing Cython to your Python code help? Yes, but not always.

When operations are CPU-bound, meaning all the runtime is spent manipulating a few values inside CPU registers and little to no data movement is required, Cython will very likely improve performance by introducing statically typed variables and shared object libraries. However, it cannot help when IO-bound (e.g., reading a large file from disk) or network-bound (e.g., downloading a file from an FTP server) operations are the bottleneck.

So, when introducing Cython to your Python code, you first need to profile your code and determine what kind of bottleneck you have.
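As a starting point for that profiling step, the standard library’s cProfile module shows where the time goes. This is a minimal sketch under stated assumptions: busy_loop is a hypothetical stand-in for your own code, and the report is captured in a string so it can be printed or inspected.

```python
import cProfile
import io
import pstats


def busy_loop(n):
    # Stand-in CPU-bound function; replace it with the code you want to profile.
    total = 0
    for i in range(n):
        total += i % 7
    return total


profiler = cProfile.Profile()
profiler.enable()
busy_loop(200_000)
profiler.disable()

# Collect the five most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Functions that dominate the cumulative time and consist mostly of arithmetic and loops are the usual candidates for moving into a .pyx file. If the time is spent waiting on disk or network calls instead, Cython is unlikely to help.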
