Elementwise and vector operations in Python
Speeding up Python code using elementwise computation
Standard Python is fast enough for the computational needs of most users. However, some problems require faster computation. One technique for speeding up computation is to perform calculations elementwise on arrays.
To illustrate how effective this technique can be, we will use a helper function to compare how long the same calculation takes when it is performed in different ways.
import time
def time_it(msg='', verbose=True, start=False):
    '''
    Used to output the time difference between calls to time_it.
    Parameters
    ==========
    msg : str (optional)
        This message will be printed before the time value in seconds.
    verbose : boolean (optional, default=True)
        If True, the function will print out the message and time.
    start : boolean (optional, default=False)
        If True, an initial time is stored in a global variable.
    Returns
    =======
    time_difference : float
        The time difference in seconds as a floating point value
        (None is returned when start=True).
    '''
    def _set_start():
        # Store the reference time globally so later calls can measure from it
        global _start_time_for_timing
        _start_time_for_timing = time.time()
        if verbose and start:
            print('Setting initial time to {} seconds past epoch.'.format(_start_time_for_timing))
    if start:
        _set_start()
    elif '_start_time_for_timing' in globals():
        dt = time.time() - _start_time_for_timing
        if verbose:
            print('{} {:.4f} seconds'.format(msg, dt))
        # Restart the clock so the next call measures only the next step
        _set_start()
        return dt
    else:
        raise UserWarning("Start time not set.")
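The time_it helper above keeps its start time in a global variable. A minimal alternative sketch, assuming you only need the elapsed time of one block of code at a time, is a context manager built on time.perf_counter, which the standard library provides for measuring short intervals. The class name Timer is hypothetical and not part of any library.

```python
import time


class Timer:
    """Hypothetical helper: measure the time spent inside a `with` block."""

    def __init__(self, msg='', verbose=True):
        self.msg = msg
        self.verbose = verbose
        self.elapsed = None  # set when the block exits

    def __enter__(self):
        # perf_counter is a monotonic, high-resolution clock meant for intervals
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self._start
        if self.verbose:
            print('{} {:.4f} seconds'.format(self.msg, self.elapsed))
        return False  # do not suppress exceptions raised inside the block


with Timer('Sum of one million integers:') as t:
    total = sum(range(1000000))
```

Because no state is shared between Timer objects, two measurements cannot accidentally reset each other's start time.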
Use NumPy to perform elementwise calculations
The Python package NumPy is widely used by the Python community for both elementwise and matrix calculations. NumPy provides an array type, numpy.ndarray, that overloads almost all of Python's basic arithmetic operators, such as + and +=, so that they act on entire arrays instead of single objects.
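A short sketch of one practical difference between + and += on an ndarray: + builds a new array, while += updates the existing array in place. The variable names are illustrative only.

```python
import numpy as np

a = np.ones(3)
b = a          # b refers to the same array object as a
a = a + 1      # '+' allocates a new array and rebinds a to it
print(a, b)    # b still holds the original values

c = np.ones(3)
d = c          # d refers to the same array object as c
c += 1         # '+=' modifies the existing array in place
print(c, d)    # d sees the change because no new array was created
```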
Now let us see how much this speeds up the calculation:
import numpy as np
# Set the array length
N = 5000000
python_list = [1] * N
numpy_array = np.ones(N)
print('Add 1 to each element:\n')
time_it(start=True, verbose=False)
for index, item in enumerate(python_list):
    python_list[index] += 1
dtA = time_it('A) Using a Python loop:')
python_list = [item + 1 for item in python_list]
dtB = time_it('B) Using a Python list comprehension:')
python_list = [1] * N
python_list = [item + 1 for item in python_list]
dtC = time_it('C) Using a Python list comprehension (including list creation):')
python_list = list(map(lambda a: a + 1, python_list))
dtD = time_it('D) Using Python iterators:')
numpy_array = numpy_array + 1
dtE = time_it('E) The elementwise way using numpy:')
numpy_array = np.ones(N)
numpy_array = numpy_array + 1
dtF = time_it('F) The elementwise way using numpy (including array creation):')
numpy_array += 1
dtG = time_it('G) The fancy elementwise way using numpy:')
numpy_array = np.ones(N)
numpy_array += 1
dtH = time_it('H) The fancy elementwise way using numpy (including array creation):')
print('\nConclusion:\n\tThe fastest Python method is using a list comprehension: {:.4f} seconds.'.format(dtB))
print('\nConclusion:\n\tThe fastest elementwise method is the fancy way (with +=): {:.4f} seconds.'.format(dtG))
print('\nConclusion:\n\tThe fastest elementwise approach including array creation is {:.0f} times faster'.format(dtC/dtH) +\
'\n\tthan the fastest native Python approach including list creation.')
print('\nConclusion:\n\tThe fastest elementwise approach is {:.0f} times faster'.format(dtB/dtG) +\
'\n\tthan the fastest native Python approach.')
print('\nFinal conclusion:\n\tElementwise operations can speed up your code significantly and the speed up'
'\n\tis more dramatic if the array object is used for more than one elementwise operation.')
Add 1 to each element:
A) Using a Python loop: 1.2347 seconds
B) Using a Python list comprehension: 0.3354 seconds
C) Using a Python list comprehension (including list creation): 0.3514 seconds
D) Using Python iterators: 0.8107 seconds
E) The elementwise way using numpy: 0.0313 seconds
F) The elementwise way using numpy (including array creation): 0.0528 seconds
G) The fancy elementwise way using numpy: 0.0102 seconds
H) The fancy elementwise way using numpy (including array creation): 0.0416 seconds
Conclusion:
The fastest Python method is using a list comprehension: 0.3354 seconds.
Conclusion:
The fastest elementwise method is the fancy way (with +=): 0.0102 seconds.
Conclusion:
The fastest elementwise approach including array creation is 8 times faster
than the fastest native Python approach including list creation.
Conclusion:
The fastest elementwise approach is 33 times faster
than the fastest native Python approach.
Final conclusion:
Elementwise operations can speed up your code significantly and the speed up
is more dramatic if the array object is used for more than one elementwise operation.
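Part of the advantage of += in methods G and H is that the result is written into the existing array instead of a newly allocated one. NumPy's universal functions, such as np.add, accept an out argument that makes this explicit. The sketch below, using a smaller array than the benchmark, shows that np.add(y, 1, out=y) has the same effect as y += 1 and returns the very array it wrote into.

```python
import numpy as np

x = np.ones(1000)
y = np.ones(1000)

x += 1                          # in-place operator
result = np.add(y, 1, out=y)    # explicit ufunc call that writes into y

print(result is y)              # the ufunc returns the array passed as out
print(np.array_equal(x, y))     # both approaches give the same values
```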
A few examples of how one can use elementwise operations
Elementwise expressions look like ordinary arithmetic on single numbers, which keeps your code readable:
N = 10
print('Make {} spheres with random radii between 0 and 1:\n'.format(N))
radii = np.random.rand(N)
# Elementwise: the sphere volume formula is applied to every radius at once
volume = 4 / 3 * np.pi * radii**3
for v, r in zip(volume, radii):
    print('\tVolume: {:<8.5f} Radius: {:.3f}'.format(v, r))
Make 10 spheres with random radii between 0 and 1:
Volume: 3.75505 Radius: 0.964
Volume: 0.15448 Radius: 0.333
Volume: 0.16575 Radius: 0.341
Volume: 2.29283 Radius: 0.818
Volume: 0.03711 Radius: 0.207
Volume: 0.63781 Radius: 0.534
Volume: 2.43708 Radius: 0.835
Volume: 2.90491 Radius: 0.885
Volume: 0.01744 Radius: 0.161
Volume: 1.85815 Radius: 0.763
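To check that the elementwise expression computes the same thing as an explicit loop, a minimal sketch compares the two with np.allclose. It uses a fixed seed through np.random.default_rng (available in NumPy 1.17 and later) so the run is reproducible; the seed value is arbitrary.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed for a reproducible example
radii = rng.random(10)                # 10 radii in the interval [0, 1)

# Elementwise: one expression for all spheres
volume_vectorized = 4 / 3 * np.pi * radii**3

# Loop: one sphere at a time with plain Python floats
volume_loop = [4 / 3 * math.pi * float(r)**3 for r in radii]

print(np.allclose(volume_vectorized, volume_loop))
```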
N = 10
print('\nScale and offset {} random xyz coordinates:\n'.format(N))
random_coords = np.random.rand(N, 3)
# Offset all coordinates to span +/- 0.5 of the origin
offset_coords = random_coords - 0.5
# Scale all coordinates to lie between -1 and +1
scale_coords = offset_coords / 0.5
line_tmpl = '{:<20}{:<20}{:<20}'
coord_tmpl = '({:.2f}, {:.2f}, {:.2f})'
print(line_tmpl.format('Random (0 to 1)', 'Offset (-0.5 to 0.5)', 'Scaled (-1 to 1)'))
for rc, oc, sc in zip(random_coords, offset_coords, scale_coords):
    print(line_tmpl.format(coord_tmpl.format(*rc),
                           coord_tmpl.format(*oc),
                           coord_tmpl.format(*sc)))
Scale and offset 10 random xyz coordinates:
Random (0 to 1) Offset (-0.5 to 0.5)Scaled (-1 to 1)
(0.67, 0.89, 0.28) (0.17, 0.39, -0.22) (0.33, 0.77, -0.44)
(0.16, 0.20, 0.38) (-0.34, -0.30, -0.12)(-0.67, -0.59, -0.24)
(0.49, 0.84, 0.20) (-0.01, 0.34, -0.30)(-0.02, 0.67, -0.60)
(0.91, 0.09, 0.88) (0.41, -0.41, 0.38) (0.82, -0.83, 0.75)
(0.94, 0.40, 0.35) (0.44, -0.10, -0.15)(0.88, -0.20, -0.30)
(0.64, 0.23, 0.71) (0.14, -0.27, 0.21) (0.29, -0.53, 0.41)
(0.15, 0.09, 0.50) (-0.35, -0.41, -0.00)(-0.70, -0.82, -0.01)
(0.62, 0.52, 0.61) (0.12, 0.02, 0.11) (0.24, 0.04, 0.23)
(0.31, 0.22, 0.06) (-0.19, -0.28, -0.44)(-0.37, -0.57, -0.88)
(0.99, 0.83, 0.89) (0.49, 0.33, 0.39) (0.98, 0.66, 0.78)
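The offset and the scaling can also be combined into a single expression, and the scalar can be replaced by one value per axis. In the sketch below, combining an array of shape (N, 3) with arrays of shape (3,) relies on NumPy broadcasting, which applies the three per-axis values to every row. The box limits are made up for illustration.

```python
import numpy as np

N = 10
random_coords = np.random.rand(N, 3)

# Offset and scale in one step: maps [0, 1) onto [-1, 1)
scaled = 2 * random_coords - 1

# Hypothetical box: x in [0, 10), y in [-5, 5), z in [100, 102)
lower = np.array([0.0, -5.0, 100.0])
upper = np.array([10.0, 5.0, 102.0])

# (3,) + (N, 3) * (3,) broadcasts the per-axis values across all N rows
box_coords = lower + random_coords * (upper - lower)

print(box_coords.shape)
```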
Vector operations on coordinates
The vector operations below illustrate how they differ from the elementwise operations above.
coord_A = np.array([1,2,3])
coord_B = np.array([6,5,4])
print('Operations on coordinates A{} and B{}:\n'.format(coord_A, coord_B))
print('Dot product: {}'.format(coord_A.dot(coord_B)))
print('Distance between points: {:.3f}'.format(np.linalg.norm(coord_A - coord_B)))
print('Cross product: {}'.format(np.cross(coord_A, coord_B)))
print('Elementwise product: {}'.format(coord_A * coord_B))
Operations on coordinates A[1 2 3] and B[6 5 4]:
Dot product: 28
Distance between points: 5.916
Cross product: [-7 14 -7]
Elementwise product: [ 6 10 12]
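The vector operations can be related back to elementwise ones. The sketch below, using the same coord_A and coord_B, rebuilds the dot product as the sum of the elementwise product, the distance as the square root of the summed squared differences, and the cross product component by component, then compares each with the NumPy result.

```python
import numpy as np

coord_A = np.array([1, 2, 3])
coord_B = np.array([6, 5, 4])

# Dot product = sum of the elementwise product
dot_from_elementwise = np.sum(coord_A * coord_B)

# Euclidean distance = sqrt of the sum of squared elementwise differences
distance_from_elementwise = np.sqrt(np.sum((coord_A - coord_B)**2))

# Cross product written out component by component
a1, a2, a3 = coord_A
b1, b2, b3 = coord_B
cross_by_hand = np.array([a2*b3 - a3*b2, a3*b1 - a1*b3, a1*b2 - a2*b1])

print(dot_from_elementwise == coord_A.dot(coord_B))
print(np.isclose(distance_from_elementwise, np.linalg.norm(coord_A - coord_B)))
print(np.array_equal(cross_by_hand, np.cross(coord_A, coord_B)))
```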
Real-world example calculation
Lucas, Mia, Leon and Hannah each bought different quantities of three chocolate varieties.
The brand names are A, B and C.
Lucas bought 100 g of brand A, 175 g of brand B and 210 g of C. Mia chose 90 g of A, 160 g of B and 150 g of C. Leon bought 200 g of A, 50 g of B and 100 g of C. Hannah didn’t buy brand B, but bought 120 g of A and 310 g of C.
A costs 2.98€ per 100 g. B costs 3.90€ per 100 g. C costs 1.99€ per 100 g.
To calculate how much each of them paid for their chocolates, we can use Python, NumPy and matrix multiplication.
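Each person's total is a weighted sum: the mass of each brand times that brand's price per 100 g, divided by 100 because the prices are quoted per 100 g rather than per gram. A plain-Python sketch of that arithmetic, one row per person, shows what the matrix multiplication computes:

```python
masses_in_g = {
    'Lucas':  [100, 175, 210],
    'Mia':    [90, 160, 150],
    'Leon':   [200, 50, 100],
    'Hannah': [120, 0, 310],
}
euros_per_100g = [2.98, 3.90, 1.99]   # brands A, B and C

totals = {}
for name, masses in masses_in_g.items():
    # Row-times-column product: sum of mass * price over the three brands
    weighted_sum = sum(m * p for m, p in zip(masses, euros_per_100g))
    totals[name] = weighted_sum / 100  # prices are per 100 g, masses are in g

for name, total in totals.items():
    print('{:<8}{:.3f} euros'.format(name, total))
```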
import numpy as np
# Each column is a chocolate variety and each row is a person.
mass_matrix = np.array([[100, 175, 210],
[90, 160, 150],
[200, 50, 100],
[120, 0, 310]])
print('Numpy mass_matrix:')
print(mass_matrix)
print()
# The columns are the same chocolate varieties as the mass_matrix.
cost_per_100g = np.array([2.98, 3.90, 1.99])
print('Numpy cost matrix:')
print(cost_per_100g)
print()
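# Matrix calculation using numpy:
# mass in g times price per 100 g gives the cost in hundredths of a euro (cents)
money_spent_in_cents = np.dot(mass_matrix, cost_per_100g)
# Dividing by a scalar is elementwise, so no array of 100s is needed
money_spent_in_euros = money_spent_in_cents / 100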
print('Numpy solution:')
print(money_spent_in_euros)
print()
Numpy mass_matrix:
[[100 175 210]
[ 90 160 150]
[200 50 100]
[120 0 310]]
Numpy cost matrix:
[ 2.98 3.9 1.99]
Numpy solution:
[ 13.984 11.907 9.9 9.745]
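Since Python 3.5, the @ operator performs matrix multiplication, and for a 2-D array and a 1-D array it gives the same result as np.dot. Dividing by the plain scalar 100 converts the result to euros, because NumPy broadcasts the scalar over every element:

```python
import numpy as np

mass_matrix = np.array([[100, 175, 210],
                        [90, 160, 150],
                        [200, 50, 100],
                        [120, 0, 310]])
cost_per_100g = np.array([2.98, 3.90, 1.99])

# '@' is matrix multiplication; here (4, 3) @ (3,) -> (4,)
money_spent_in_euros = mass_matrix @ cost_per_100g / 100

print(money_spent_in_euros)
print(np.allclose(money_spent_in_euros, np.dot(mass_matrix, cost_per_100g) / 100))
```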
Using pandas for the same task
To complement the NumPy matrix calculation, we can use pandas to label the rows and columns, which makes the answer easier to read:
import pandas as pd
# Using a pandas.DataFrame object, we can keep track of the names of columns and rows:
chocolate_brands = ['A', 'B', 'C']
names_of_people = ['Lucas', 'Mia', 'Leon', 'Hannah']
# pandas.DataFrame version of mass_matrix:
mass_df = pd.DataFrame(mass_matrix, columns=chocolate_brands, index=names_of_people)
print('Pandas mass matrix:')
print(mass_df)
print()
# pandas.DataFrame version of cost matrix:
cost_df = pd.DataFrame(cost_per_100g, index=chocolate_brands, columns=['Cost in euros per 100g'])
print('Pandas cost matrix:')
print(cost_df)
print()
# Matrix calculation using pandas:
# NB!: The columns of the left matrix must match the index values of the right matrix.
money_spent_df = mass_df.dot(cost_df)
money_spent_df.columns = ['Money spent in euro cents']
money_spent_df['Money spent in euros'] = money_spent_df['Money spent in euro cents'] / 100
print('Pandas solution:')
print(money_spent_df)
Pandas mass matrix:
A B C
Lucas 100 175 210
Mia 90 160 150
Leon 200 50 100
Hannah 120 0 310
Pandas cost matrix:
Cost in euros per 100g
A 2.98
B 3.90
C 1.99
Pandas solution:
Money spent in euro cents Money spent in euros
Lucas 1398.4 13.984
Mia 1190.7 11.907
Leon 990.0 9.900
Hannah 974.5 9.745
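Because DataFrame.dot matches the column labels of the left frame against the index labels of the right operand, the prices can also be given as a pandas.Series listed in any order. A sketch, assuming the same data as above: if a label on one side has no partner on the other, pandas raises a ValueError instead of silently multiplying mismatched brands.

```python
import pandas as pd

mass_df = pd.DataFrame([[100, 175, 210],
                        [90, 160, 150],
                        [200, 50, 100],
                        [120, 0, 310]],
                       columns=['A', 'B', 'C'],
                       index=['Lucas', 'Mia', 'Leon', 'Hannah'])

# Prices listed in a different order than the columns of mass_df
prices = pd.Series({'C': 1.99, 'A': 2.98, 'B': 3.90})

# Labels are aligned before multiplying, so the order of prices does not matter
euros = mass_df.dot(prices) / 100
print(euros)

# A price list containing a brand that mass_df does not have cannot be aligned
wrong_prices = pd.Series({'A': 2.98, 'B': 3.90, 'D': 1.99})
try:
    mass_df.dot(wrong_prices)
    aligned = True
except ValueError as err:
    aligned = False
    print('ValueError:', err)
```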