This page was generated from docs/tutorials/PerformanceCliffordTutorial.ipynb. Interactive online version: .
Writing high(ish) performance code with Clifford and Numba via Numpy¶
This document describes how to take algorithms developed in the clifford package with notation that is close to the maths and convert it into numerically efficient and fast running code. To do this we will expose the underlying representation of multivector as a numpy array of canonical basis vector coefficients and operate directly on these arrays in a manner that is conducive to JIT compilation with numba.
First import the Clifford library as well as numpy and numba¶
[1]:
import clifford as cf
import numpy as np
import numba
Choose a specific space¶
For this document we will use 3d euclidean space embedded in the conformal framework giving a Cl(4,1) algebra. We will also rename some of the variables to match the notation that used by Lasenby et al. in “A Covariant Approach to Geometry using Geometric Algebra”
[2]:
from clifford import g3c
# Get the layout in our local namespace etc etc
layout = g3c.layout
locals().update(g3c.blades)
ep, en, up, down, homo, E0, ninf, no = (g3c.stuff["ep"], g3c.stuff["en"],
g3c.stuff["up"], g3c.stuff["down"], g3c.stuff["homo"],
g3c.stuff["E0"], g3c.stuff["einf"], -g3c.stuff["eo"])
# Define a few useful terms
E = ninf^(no)
I5 = e12345
I3 = e123
Performance of mathematically idiomatic Clifford algorithms¶
By default the Clifford library sacrifices performance for syntactic convenience.
Consider a function that applies a rotor to a multivector:
[3]:
def apply_rotor(R,mv):
return R*mv*~R
We will define a rotor that takes one line to another:
[4]:
line_one = (up(0)^up(e1)^ninf).normal()
line_two = (up(0)^up(e2)^ninf).normal()
R = 1 + line_two*line_one
Check that this works
[5]:
print(line_two)
print(apply_rotor(R,line_one).normal())
(1.0^e245)
(1.0^e245)
We would like to improve the speed of our algorithm, first we will profile it and see where it spends its time
[6]:
#%%prun -s cumtime
#for i in range(1000000):
# apply_rotor(R,line_one)
An example profile output from running this notebook on the author’s laptop is as follows:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 66.290 66.290 {built-in method builtins.exec}
1 0.757 0.757 66.290 66.290 <string>:2(<module>)
1000000 3.818 0.000 65.534 0.000 <ipython-input-13-70a01003bf51>:1(apply_rotor)
2000000 9.269 0.000 55.641 0.000 __init__.py:751(__mul__)
2000000 3.167 0.000 29.900 0.000 __init__.py:717(_checkOther)
2000000 1.371 0.000 19.906 0.000 __init__.py:420(__ne__)
2000000 6.000 0.000 18.535 0.000 numeric.py:2565(array_equal)
2000000 10.505 0.000 10.505 0.000 __init__.py:260(mv_mult)
We can see that the function spends almost all of its time in __mul__
and within __mul__
it spends most of its time in _checkOther
. In fact it only spends a small fraction of its time in mv_mult
which does the numerical multivector multiplication. To write more performant code we need to strip away the high level abstractions and deal with the underlying representations of the blade component data.
Canonical blade coefficient representation in Clifford¶
In Clifford a multivector is internally represented as a numpy array of the coefficients of the canonical basis vectors, they are arranged in order of grade. So for our 4,1 algebra the first element is the scalar part, the next 5 are the vector coefficients, the next 10 are the bivectors, the next 10 the triectors, the next 5 the quadvectors and the final value is the pseudoscalar coefficient.
[7]:
(5.0*e1 - e2 + e12 + e135 + np.pi*e1234).value
[7]:
array([ 0. , 5. , -1. , 0. , 0. ,
0. , 1. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
1. , 0. , 0. , 0. , 0. ,
0. , 3.14159265, 0. , 0. , 0. ,
0. , 0. ])
Exploiting blade representation to write a fast function¶
We can rewrite our rotor application function using the functions that the layout exposes for operations on the numpy arrays themselves.
[8]:
def apply_rotor_faster(R,mv):
return layout.MultiVector(layout.gmt_func(R.value,layout.gmt_func(mv.value,layout.adjoint_func(R.value))) )
[9]:
#%%prun -s cumtime
#for i in range(1000000):
# apply_rotor_faster(R,line_one)
This gives a much faster function
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 19.567 19.567 {built-in method builtins.exec}
1 0.631 0.631 19.567 19.567 <string>:2(<module>)
1000000 7.373 0.000 18.936 0.000 <ipython-input-35-6a5344d83bdb>:1(apply_rotor_faster)
2000000 9.125 0.000 9.125 0.000 __init__.py:260(mv_mult)
1000000 1.021 0.000 1.619 0.000 __init__.py:677(__init__)
1000000 0.819 0.000 0.819 0.000 __init__.py:244(adjoint_func)
We have successfully skipped past the higher level checks on the multivectors while maintaining exactly the same function signature. It is important to check that we still have the correct answer:
[10]:
print(line_two)
print(apply_rotor_faster(R,line_one).normal())
(1.0^e245)
(1.0^e245)
The performance improvements gained by rewriting our function are significant but it comes at the cost of readability.
By loading the layouts gmt_func and adjoint_func into the global namespace before the function is defined and separating the value operations from the multivector wrapper we can make our code more concise.
[11]:
gmt_func = layout.gmt_func
adjoint_func = layout.adjoint_func
def apply_rotor_val(R_val,mv_val):
return gmt_func(R_val,gmt_func(mv_val,adjoint_func(R_val)))
def apply_rotor_wrapped(R,mv):
return cf.MultiVector(layout,apply_rotor_val(R.value,mv.value))
[12]:
#%%prun -s cumtime
#for i in range(1000000):
# apply_rotor_wrapped(R,line_one)
The time cost is essentially the same, there is probably some minor overhead from the function call itself
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 19.621 19.621 {built-in method builtins.exec}
1 0.557 0.557 19.621 19.621 <string>:2(<module>)
1000000 1.421 0.000 19.064 0.000 <ipython-input-38-a1e0b5c53cdc>:7(apply_rotor_wrapped)
1000000 6.079 0.000 16.033 0.000 <ipython-input-38-a1e0b5c53cdc>:4(apply_rotor_val)
2000000 9.154 0.000 9.154 0.000 __init__.py:260(mv_mult)
1000000 1.017 0.000 1.610 0.000 __init__.py:677(__init__)
1000000 0.800 0.000 0.800 0.000 __init__.py:244(adjoint_func)
[13]:
print(line_two)
print(apply_rotor_wrapped(R,line_one).normal())
(1.0^e245)
(1.0^e245)
The additional advantage of splitting the function like this is that the numba JIT compiler can reason about the memory layout of numpy arrays in no python mode as long as no pure python objects are operated upon within the function. This means we can JIT our function that operates on the value directly.
[14]:
@numba.njit
def apply_rotor_val_numba(R_val,mv_val):
return gmt_func(R_val,gmt_func(mv_val,adjoint_func(R_val)))
def apply_rotor_wrapped_numba(R,mv):
return cf.MultiVector(layout,apply_rotor_val_numba(R.value,mv.value))
[15]:
#%%prun -s cumtime
#for i in range(1000000):
# apply_rotor_wrapped_numba(R,line_one)
This gives a small improvement in performance but more importantly it allows us to write larger functions that also use the jitted apply_rotor_val_numba
and are themselves jitted.
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 16.033 16.033 {built-in method builtins.exec}
1 0.605 0.605 16.033 16.033 <string>:2(<module>)
1000000 2.585 0.000 15.428 0.000 <ipython-input-42-1142126d93ca>:5(apply_rotor_wrapped_numba)
1000000 8.606 0.000 8.606 0.000 <ipython-input-42-1142126d93ca>:1(apply_rotor_val_numba)
1 0.000 0.000 2.716 2.716 dispatcher.py:294(_compile_for_args)
7/1 0.000 0.000 2.716 2.716 dispatcher.py:554(compile)
Composing larger functions¶
By chaining together functions that operate on the value arrays of multivectors it is easy to construct fast and readable code
[16]:
I5_val = I5.value
omt_func = layout.omt_func
def dual_mv(mv):
return -I5*mv
def meet_unwrapped(mv_a,mv_b):
return -dual_mv(dual_mv(mv_a)^dual_mv(mv_b))
@numba.njit
def dual_val(mv_val):
return -gmt_func(I5_val,mv_val)
@numba.njit
def meet_val(mv_a_val,mv_b_val):
return -dual_val( omt_func( dual_val(mv_a_val) , dual_val(mv_b_val)) )
def meet_wrapped(mv_a,mv_b):
return cf.layout.MultiVector(meet_val(mv_a.value, mv_b.value))
sphere = (up(0)^up(e1)^up(e2)^up(e3)).normal()
print(sphere.meet(line_one).normal().normal())
print(meet_unwrapped(sphere,line_one).normal())
print(meet_wrapped(line_one,sphere).normal())
(1.0^e14) - (1.0^e15) - (1.0^e45)
(1.0^e14) - (1.0^e15) - (1.0^e45)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [16], in <module>
22 print(sphere.meet(line_one).normal().normal())
23 print(meet_unwrapped(sphere,line_one).normal())
---> 24 print(meet_wrapped(line_one,sphere).normal())
Input In [16], in meet_wrapped(mv_a, mv_b)
18 def meet_wrapped(mv_a,mv_b):
---> 19 return cf.layout.MultiVector(meet_val(mv_a.value, mv_b.value))
AttributeError: module 'clifford' has no attribute 'layout'
[17]:
#%%prun -s cumtime
#for i in range(100000):
# meet_unwrapped(sphere,line_one)
ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 13.216 13.216 {built-in method builtins.exec} 1 0.085 0.085 13.216 13.216 <string>:2(<module>) 100000 0.418 0.000 13.131 0.000 <ipython-input-98-f91457c8741a>:7(meet_unwrapped) 300000 0.681 0.000 9.893 0.000 <ipython-input-98-f91457c8741a>:4(dual_mv) 300000 1.383 0.000 8.127 0.000 __init__.py:751(__mul__) 400000 0.626 0.000 5.762 0.000 __init__.py:717(_checkOther) 400000 0.270 0.000 3.815 0.000 __init__.py:420(__ne__) 400000 1.106 0.000 3.544 0.000 numeric.py:2565(array_equal) 100000 0.460 0.000 2.439 0.000 __init__.py:783(__xor__) 800000 0.662 0.000 2.053 0.000 __init__.py:740(_newMV) 400000 1.815 0.000 1.815 0.000 __init__.py:260(mv_mult)
[18]:
#%%prun -s cumtime
#for i in range(100000):
# meet_wrapped(sphere,line_one)
ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 1.951 1.951 {built-in method builtins.exec} 1 0.063 0.063 1.951 1.951 <string>:2(<module>) 100000 0.274 0.000 1.888 0.000 <ipython-input-98-f91457c8741a>:18(meet_wrapped) 100000 1.448 0.000 1.448 0.000 <ipython-input-98-f91457c8741a>:14(meet_val) 100000 0.096 0.000 0.166 0.000 __init__.py:677(__init__)
Algorithms exploiting known sparseness of MultiVector value array¶
The standard multiplication generator function for two general multivectors is as follows:
[19]:
def get_mult_function(mult_table,n_dims):
'''
Returns a function that implements the mult_table on two input multivectors
'''
non_zero_indices = mult_table.nonzero()
k_list = non_zero_indices[0]
l_list = non_zero_indices[1]
m_list = non_zero_indices[2]
mult_table_vals = np.array([mult_table[k,l,m] for k,l,m in np.transpose(non_zero_indices)],dtype=int)
@numba.njit
def mv_mult(value,other_value):
output = np.zeros(n_dims)
for ind,k in enumerate(k_list):
l = l_list[ind]
m = m_list[ind]
output[l] += value[k]*mult_table_vals[ind]*other_value[m]
return output
return mv_mult
There are however instances in which we might be able to use the known sparseness of the input data value representation to speed up the operations. For example, in Cl(4,1) rotors only contain even grade blades and we can therefore remove all the operations accessing odd grade objects.
[20]:
def get_grade_from_index(index_in):
if index_in == 0:
return 0
elif index_in < 6:
return 1
elif index_in < 16:
return 2
elif index_in < 26:
return 3
elif index_in < 31:
return 4
elif index_in == 31:
return 5
else:
raise ValueError('Index is out of multivector bounds')
def get_sparse_mult_function(mult_table,n_dims,grades_a,grades_b):
'''
Returns a function that implements the mult_table on two input multivectors
'''
non_zero_indices = mult_table.nonzero()
k_list = non_zero_indices[0]
l_list = non_zero_indices[1]
m_list = non_zero_indices[2]
mult_table_vals = np.array([mult_table[k,l,m] for k,l,m in np.transpose(non_zero_indices)],dtype=int)
# Now filter out the sparseness
filter_mask = np.zeros(len(k_list), dtype=bool)
for i in range(len(filter_mask)):
if get_grade_from_index(k_list[i]) in grades_a:
if get_grade_from_index(m_list[i]) in grades_b:
filter_mask[i] = 1
k_list = k_list[filter_mask]
l_list = l_list[filter_mask]
m_list = m_list[filter_mask]
mult_table_vals = mult_table_vals[filter_mask]
@numba.njit
def mv_mult(value,other_value):
output = np.zeros(n_dims)
for ind,k in enumerate(k_list):
l = l_list[ind]
m = m_list[ind]
output[l] += value[k]*mult_table_vals[ind]*other_value[m]
return output
return mv_mult
[21]:
left_rotor_mult = get_sparse_mult_function(layout.gmt,layout.gaDims,[0,2,4],[0,1,2,3,4,5])
right_rotor_mult = get_sparse_mult_function(layout.gmt,layout.gaDims,[0,1,2,3,4,5],[0,2,4])
@numba.njit
def sparse_apply_rotor_val(R_val,mv_val):
return left_rotor_mult(R_val,right_rotor_mult(mv_val,adjoint_func(R_val)))
def sparse_apply_rotor(R,mv):
return cf.MultiVector(layout,sparse_apply_rotor_val(R.value,mv.value))
[22]:
#%%prun -s cumtime
#for i in range(1000000):
# sparse_apply_rotor(R,line_one)
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 9.490 9.490 {built-in method builtins.exec}
1 0.624 0.624 9.489 9.489 <string>:2(<module>)
1000000 2.684 0.000 8.865 0.000 <ipython-input-146-f75aae3ce595>:8(sparse_apply_rotor)
1000000 4.651 0.000 4.651 0.000 <ipython-input-146-f75aae3ce595>:4(sparse_apply_rotor_val)
1000000 0.934 0.000 1.530 0.000 __init__.py:677(__init__)
1000000 0.596 0.000 0.596 0.000 {built-in method numpy.core.multiarray.array}
[23]:
print(line_two)
print(sparse_apply_rotor(R,line_one).normal())
(1.0^e245)
(1.0^e245)
We can do the same with the meet operation that we defined earlier if we know what grade objects we are meeting
[24]:
left_pseudo_mult = get_sparse_mult_function(layout.gmt,layout.gaDims,[5],[0,1,2,3,4,5])
sparse_omt_2_1 = get_sparse_mult_function(layout.omt,layout.gaDims,[2],[1])
@numba.njit
def dual_sparse_val(mv_val):
return -left_pseudo_mult(I5_val,mv_val)
@numba.njit
def meet_sparse_3_4_val(mv_a_val,mv_b_val):
return -dual_sparse_val( sparse_omt_2_1( dual_sparse_val(mv_a_val) , dual_sparse_val(mv_b_val)) )
def meet_sparse_3_4(mv_a,mv_b):
return cf.layout.MultiVector(meet_sparse_3_4_val(mv_a.value, mv_b.value))
print(sphere.meet(line_one).normal().normal())
print(meet_sparse_3_4(line_one,sphere).normal())
(1.0^e14) - (1.0^e15) - (1.0^e45)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [24], in <module>
13 return cf.layout.MultiVector(meet_sparse_3_4_val(mv_a.value, mv_b.value))
15 print(sphere.meet(line_one).normal().normal())
---> 16 print(meet_sparse_3_4(line_one,sphere).normal())
Input In [24], in meet_sparse_3_4(mv_a, mv_b)
12 def meet_sparse_3_4(mv_a,mv_b):
---> 13 return cf.layout.MultiVector(meet_sparse_3_4_val(mv_a.value, mv_b.value))
AttributeError: module 'clifford' has no attribute 'layout'
[25]:
#%%prun -s cumtime
#for i in range(100000):
# meet_sparse_3_4(line_one,sphere)
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.725 0.725 {built-in method builtins.exec}
1 0.058 0.058 0.725 0.725 <string>:2(<module>)
100000 0.252 0.000 0.667 0.000 <ipython-input-156-f346d0563682>:12(meet_sparse_3_4)
100000 0.267 0.000 0.267 0.000 <ipython-input-156-f346d0563682>:8(meet_sparse_3_4_val)
100000 0.088 0.000 0.148 0.000 __init__.py:677(__init__)
Future work on performance¶
Investigate efficient operations on containers of large numbers of multivectors.
Possibly investigate http://numba.pydata.org/numba-doc/0.13/CUDAJit.html for larger algebras/other areas in which GPU memory latency will not be such a large factor, ie, lots of bulk parallel numerical operations.
[ ]: