8.1. Performance Optimization

8.1.1. PyPy

8.1.3. Line Profiling

  • pip install line_profiler

8.1.4. Numpy vectorization

Scipy Ecosystem:

../_images/scipy-ecosystem.png

8.1.5. Specialized data structures

  • scipy.spatial - for spatial queries like distances, nearest neighbours, etc.

  • pandas - for SQL-like grouping and aggregation

  • xarray - for grouping across multiple dimensions

  • scipy.sparse - sparse metrics for 2-dimensional structured data

  • sparse (package) - for N-dimensional structured data

  • scipy.sparse.csgraph - for graph-like problems (e.g. finding shortest paths)

8.1.6. Cython

def primes(int kmax):   # The argument will be converted to int or raise a TypeError.
    cdef int n, k, i    # These variables are declared with C types.
    cdef int p[1000]    # Another C type
    result = []         # A Python type

    if kmax > 1000:
        kmax = 1000

    k = 0
    n = 2

    while k < kmax:
        i = 0

        while i < k and n % p[i] != 0:
            i = i + 1

        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)

        n = n + 1
    return result
In [1]: %load_ext Cython

In [2]: %%cython
   ...: def f(n):
   ...:     a = 0
   ...:     for i in range(n):
   ...:         a += i
   ...:     return a
   ...:
   ...: cpdef g(int n):
   ...:     cdef int a = 0, i
   ...:     for i in range(n):
   ...:         a += i
   ...:     return a
   ...:

In [3]: %timeit f(1000000)
42.7 ms ± 783 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [4]: %timeit g(1000000)
74 µs ± 16.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# which gives a 585 times improvement over the pure-python version

Cython compiling:

../_images/performance-cython.png

8.1.7. Numba

Numba gives you the power to speed up your applications with high performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters.

from numba import jit, int32


@jit(nogil=True)
def do_something():
    pass


@jit(int32(int32, int32))
def add(x, y):
    return x + y

8.1.8. Dask

Dask natively scales Python. Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love

8.1.9. Find existing implementation

8.1.10. Contains

  • Use set instead of list

  • Jeżeli masz listę w której sprawdzasz czy element występuje, to zamień listę na set, dzięki temu będzie lepsza złożoność

NAMES = ['José', 'Иван', 'Max']

if 'Max' in NAMES:
    pass
NAMES = {'José', 'Иван', 'Max'}

if 'Max' in NAMES:
    pass

8.1.11. String Concatenation

How many string are there in a memory?:

firstname = 'Jan'
lastname = 'Twardowski'

firstname + ' ' + lastname
# Jan Twardowski

How many string are there in a memory?:

firstname = 'Jan'
lastname = 'Twardowski'

f'{firstname} {lastname}'
# Jan Twardowski

How many string are there in a memory?:

firstname = 'Jan'
lastname = 'Twardowski'
age = 42

'Hello ' + firstname + ' ' + lastname + ' ' + str(age) + '!'
# 'Hello Jan Twardowski 42!'

How many string are there in a memory?:

firstname = 'Jan'
lastname = 'Twardowski'
age = 42

f'Hello {firstname} {lastname} {age}!'
# 'Hello Jan Twardowski 42!'

Use list.append() instead of str + str:

# Performance - Method concatenates strings using + in a loop
html = '<table>'

for element in lista:
    html += f'<tr><td>{element}</td></tr>'

html += '</table>'
print(html)
# Problem solved
html = ['<table>']

for element in lista:
    html.append(f'<tr><td>{element}</td></tr>')

html.append('</table>')
print(''.join(html))

8.1.12. Range between two float

  • Uwaga na set zawierający floaty, bo pomiędzy dwoma wartościami jest nieskończona ilość wyrażeń

range(0, 2)
# 0
# 1

range(0.0, 2.0)
# ...

8.1.13. Inne

  • Jeżeli coś collections.deque - Double ended Queue

  • Serializowanie kolejki przy wielowątkowości

8.1.15. Assignments

Code 8.1. Solution
"""
* Assignment: Memoization
* Complexity: medium
* Lines of code: 5 lines
* Time: 13 min

English:
    TODO: English Translation
    X. Run doctests - all must succeed

Polish:
    1. Użyj danych z sekcji "Given" (patrz poniżej)
    2. Stwórz pusty `dict` o nazwie `CACHE`
    3. W słowniku będziemy przechowywali wyniki wyliczenia funkcji dla zadanych parametrów:
        a. klucz: argument funkcji
        b. wartość: wynik obliczeń
    4. Zmodyfikuj funkcję `factorial_cache(n: int)`
    5. Przed uruchomieniem funkcji, sprawdź czy wynik został już wcześniej obliczony:
        a. jeżeli tak, to zwraca dane z `CACHE`
        b. jeżeli nie, to oblicza, aktualizuje `CACHE`, a następnie zwraca wartość
    6. Porównaj prędkość działania
    X. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    TODO: Doctests
"""


# Given
from timeit import timeit
import sys
sys.setrecursionlimit(5000)


def factorial_cache(n: int) -> int:
    ...


# Do not modify anything below
def factorial_nocache(n: int) -> int:
    if n == 0:
        return 1
    else:
        return n * factorial_nocache(n - 1)


duration_cache = timeit(
    stmt='factorial_cache(500); factorial_cache(400); factorial_cache(450); factorial_cache(350)',
    globals=globals(),
    number=10000,
)

duration_nocache = timeit(
    stmt='factorial_nocache(500); factorial_nocache(400); factorial_nocache(450); factorial_nocache(350)',
    globals=globals(),
    number=10000
)

print(f'factorial_cache time: {round(duration_cache, 4)} seconds')
print(f'factorial_nocache time: {round(duration_nocache, 3)} seconds')
print(f'Cached solution is {round(duration_nocache / duration_cache, 1)} times faster')