Elementwise and vector operations in Python

Speeding up Python code using elementwise computation

Standard Python is fast enough to satisfy the computational needs of most users. However, some problems require faster computation. One technique for speeding up numerical code is to perform calculations elementwise on whole arrays instead of looping over individual items.

To illustrate how effective this technique can be, we will use a helper function that compares the time taken to perform the same calculation in different ways.

import time

def time_it(msg='', verbose=True, start=False):
    '''
    Used to output the time difference between calls to time_it.

    Parameters
    ==========
    msg : str (optional)
          This message will be printed before the time value in seconds

    verbose: boolean (optional, default=True)
             If True, the function will print out the message and time.

    start : boolean (optional, default=False)
            If True, the current time is stored in a global variable
            and nothing is returned.

    Returns
    =======
    time_difference : float
                      The time difference in seconds as a floating point value.
    '''
    def _set_start():
        global _start_time_for_timing
        _start_time_for_timing = time.time() 
        if verbose and start:
            print('Setting initial time to {} seconds past epoch.'.format(_start_time_for_timing))

    if start:
        _set_start()

    elif '_start_time_for_timing' in globals():
        dt = time.time() - _start_time_for_timing

        if verbose:
            print('{} {:.4f} seconds'.format(msg, dt))

        _set_start()
        return dt

    else:
        # Calling time_it before a start time exists is a usage error.
        raise RuntimeError("Start time not set. Call time_it(start=True) first.")
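
As a cross-check on a hand-written timer like the one above, the standard library's timeit module can time a callable over several trials. The sketch below is only an illustration: the workload, the list size and the names data and add_one_comprehension are assumptions, not part of the original notebook.

```python
import timeit

# Hypothetical workload: add 1 to each element of a small list.
data = [1] * 100000

def add_one_comprehension():
    return [item + 1 for item in data]

# Run the callable `number` times per trial and repeat the trial 3 times.
# timeit.repeat returns one total time (in seconds) per trial.
number = 10
trial_times = timeit.repeat(add_one_comprehension, number=number, repeat=3)

# The minimum over the trials is usually the least noisy estimate.
best_per_call = min(trial_times) / number
print('Best of 3 trials: {:.6f} seconds per call'.format(best_per_call))
```

Taking the minimum rather than the mean reduces the influence of other processes that happen to run during a trial.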

Use NumPy to perform elementwise calculations

The NumPy package is widely used by the Python community for both elementwise and matrix calculations. It provides an array type, numpy.ndarray, that overloads almost all basic Python operators, such as + and +=, to work on entire arrays instead of single objects.
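
A minimal sketch of what this means in practice: the same operators behave differently on a list and on an array. The variable names are chosen only for this illustration.

```python
import numpy as np

python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

# For a list, + means concatenation and * means repetition.
print(python_list + [1])           # [1, 2, 3, 1]
print(python_list * 2)             # [1, 2, 3, 1, 2, 3]

# For an ndarray, + and * are applied to every element.
print((numpy_array + 1).tolist())  # [2, 3, 4]
print((numpy_array * 2).tolist())  # [2, 4, 6]
```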

Now let us see how this can speed up calculation time:

import numpy as np

# Set the array length
N = 5000000

python_list = [1] * N
numpy_array = np.ones(N)

print('Add 1 to each element:\n')

time_it(start=True, verbose=False)

for index, item in enumerate(python_list):
    python_list[index] += 1
dtA = time_it('A) Using a Python loop:')

python_list = [item + 1 for item in python_list]
dtB = time_it('B) Using a Python list comprehension:')

python_list = [1] * N
python_list = [item + 1 for item in python_list]
dtC = time_it('C) Using a Python list comprehension (including list creation):')

python_list = list(map(lambda a: a + 1, python_list))
dtD = time_it('D) Using Python iterators:')

numpy_array = numpy_array + 1
dtE = time_it('E) The elementwise way using numpy:')

numpy_array = np.ones(N)
numpy_array = numpy_array + 1
dtF = time_it('F) The elementwise way using numpy (including array creation):')

numpy_array += 1
dtG = time_it('G) The fancy elementwise way using numpy:')

numpy_array = np.ones(N)
numpy_array += 1
dtH = time_it('H) The fancy elementwise way using numpy (including array creation):')

print('\nConclusion:\n\tThe fastest Python method is using a list comprehension: {:.4f} seconds.'.format(dtB))

print('\nConclusion:\n\tThe fastest elementwise method is the fancy way (with +=): {:.4f} seconds.'.format(dtG))

print('\nConclusion:\n\tThe fastest elementwise approach including array creation is {:.0f} times faster'.format(dtC/dtH) +\
      '\n\tthan the fastest native Python approach including list creation.')

print('\nConclusion:\n\tThe fastest elementwise approach is {:.0f} times faster'.format(dtB/dtG) +\
      '\n\tthan the fastest native Python approach.')

print('\nFinal conclusion:\n\tElementwise operations can speed up your code significantly and the speed up' 
      '\n\tis more dramatic if the array object is used for more than one elementwise operation.')

Add 1 to each element:

A) Using a Python loop: 1.2347 seconds
B) Using a Python list comprehension: 0.3354 seconds
C) Using a Python list comprehension (including list creation): 0.3514 seconds
D) Using Python iterators: 0.8107 seconds
E) The elementwise way using numpy: 0.0313 seconds
F) The elementwise way using numpy (including array creation): 0.0528 seconds
G) The fancy elementwise way using numpy: 0.0102 seconds
H) The fancy elementwise way using numpy (including array creation): 0.0416 seconds

Conclusion:
    The fastest Python method is using a list comprehension: 0.3354 seconds.

Conclusion:
    The fastest elementwise method is the fancy way (with +=): 0.0102 seconds.

Conclusion:
    The fastest elementwise approach including array creation is 8 times faster
    than the fastest native Python approach including list creation.

Conclusion:
    The fastest elementwise approach is 33 times faster
    than the fastest native Python approach.

Final conclusion:
    Elementwise operations can speed up your code significantly and the speed up
    is more dramatic if the array object is used for more than one elementwise operation.
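
One likely reason the += variant is faster than numpy_array = numpy_array + 1 is that it writes the result into the existing array instead of allocating a new one. The following sketch shows the difference in behaviour; the names a, b, alias and alias_b are introduced only for this illustration.

```python
import numpy as np

a = np.ones(5)
alias = a                 # a second name for the same array object

a = a + 1                 # builds a new array and rebinds the name `a`
print(a is alias)         # False
print(alias.tolist())     # [1.0, 1.0, 1.0, 1.0, 1.0]  (old array unchanged)

b = np.ones(5)
alias_b = b

b += 1                    # modifies the existing array in place
print(b is alias_b)       # True
print(alias_b.tolist())   # [2.0, 2.0, 2.0, 2.0, 2.0]  (change visible via alias)
```

The in-place form saves an allocation, but it also changes the data seen by every other reference to the same array, so it should be used deliberately.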

A few examples of how one can use elementwise operations

Elementwise expressions look like ordinary Python arithmetic, which keeps the code readable:

N = 10

print('Make {} spheres with random radii between 0 and 1:\n'.format(N))
radii = np.random.rand(N)
volume = 4.0 / 3.0 * np.pi * radii**3

for v, r in zip(volume, radii):
    print('\tVolume: {:<8.5f} Radii: {:.3f}'.format(v, r))

Make 10 spheres with random radii between 0 and 1:

    Volume: 3.75505  Radii: 0.964
    Volume: 0.15448  Radii: 0.333
    Volume: 0.16575  Radii: 0.341
    Volume: 2.29283  Radii: 0.818
    Volume: 0.03711  Radii: 0.207
    Volume: 0.63781  Radii: 0.534
    Volume: 2.43708  Radii: 0.835
    Volume: 2.90491  Radii: 0.885
    Volume: 0.01744  Radii: 0.161
    Volume: 1.85815  Radii: 0.763



N = 10

print('\nScale and offset {} random xyz coordinates:\n'.format(N))
random_coords = np.random.rand(N, 3)

# Offset all coordinates to span +/- 0.5 of the origin
offset_coords = random_coords - 0.5

# Scale all coordinates to lie between -1 and +1
scale_coords = offset_coords / 0.5

# Each column must be wider than the longest formatted coordinate (21 characters).
line_tmpl = '{:<24}{:<24}{}'
coord_tmpl = '({:.2f}, {:.2f}, {:.2f})'

print(line_tmpl.format('Random (0 to 1)', 'Offset (-0.5 to 0.5)', 'Scaled (-1 to 1)'))
for rc, oc, sc in zip(random_coords, offset_coords, scale_coords):
    print(line_tmpl.format(coord_tmpl.format(*rc),
                           coord_tmpl.format(*oc),
                           coord_tmpl.format(*sc)))


Scale and offset 10 random xyz coordinates:

Random (0 to 1)         Offset (-0.5 to 0.5)    Scaled (-1 to 1)
(0.67, 0.89, 0.28)      (0.17, 0.39, -0.22)     (0.33, 0.77, -0.44)
(0.16, 0.20, 0.38)      (-0.34, -0.30, -0.12)   (-0.67, -0.59, -0.24)
(0.49, 0.84, 0.20)      (-0.01, 0.34, -0.30)    (-0.02, 0.67, -0.60)
(0.91, 0.09, 0.88)      (0.41, -0.41, 0.38)     (0.82, -0.83, 0.75)
(0.94, 0.40, 0.35)      (0.44, -0.10, -0.15)    (0.88, -0.20, -0.30)
(0.64, 0.23, 0.71)      (0.14, -0.27, 0.21)     (0.29, -0.53, 0.41)
(0.15, 0.09, 0.50)      (-0.35, -0.41, -0.00)   (-0.70, -0.82, -0.01)
(0.62, 0.52, 0.61)      (0.12, 0.02, 0.11)      (0.24, 0.04, 0.23)
(0.31, 0.22, 0.06)      (-0.19, -0.28, -0.44)   (-0.37, -0.57, -0.88)
(0.99, 0.83, 0.89)      (0.49, 0.33, 0.39)      (0.98, 0.66, 0.78)
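
The offset and scale above use a single number for all three axes. NumPy's broadcasting also allows one value per axis: an array of shape (3,) is applied to every row of an (N, 3) array. The sketch below assumes a hypothetical bounding box given by lower and upper, names that do not appear in the original example.

```python
import numpy as np

N = 10
random_coords = np.random.rand(N, 3)    # values in [0, 1)

# Hypothetical per-axis limits: x in [0, 10], y in [-5, 5], z in [100, 200]
lower = np.array([0.0, -5.0, 100.0])
upper = np.array([10.0, 5.0, 200.0])

# Shapes (N, 3) and (3,) broadcast: each row is scaled and offset per axis.
boxed_coords = lower + random_coords * (upper - lower)

print(boxed_coords.shape)
print('All inside the box:', bool(np.all((boxed_coords >= lower) & (boxed_coords <= upper))))
```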

Vector operations on coordinates

The examples below show how true vector operations, such as the dot product and the cross product, differ from the elementwise operations above.

coord_A = np.array([1,2,3])
coord_B = np.array([6,5,4])

print('Operations on coordinates A{} and B{}:\n'.format(coord_A, coord_B))

print('Dot product: {}'.format(coord_A.dot(coord_B)))
print('Distance between points: {:.3f}'.format(np.linalg.norm(coord_A - coord_B)))
print('Cross product: {}'.format(np.cross(coord_A, coord_B)))
print('Elementwise product: {}'.format(coord_A * coord_B))

Operations on coordinates A[1 2 3] and B[6 5 4]:

Dot product: 28
Distance between points: 5.916
Cross product: [-7 14 -7]
Elementwise product: [ 6 10 12]
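
The two kinds of operation are closely related: the dot product is the sum of the elementwise product, and the distance is the square root of the summed squared differences. A short sketch using the same two coordinates (the @ operator assumes Python 3.5 or later):

```python
import numpy as np

coord_A = np.array([1, 2, 3])
coord_B = np.array([6, 5, 4])

# Dot product built from the elementwise product: 6 + 10 + 12
elementwise = coord_A * coord_B
print(elementwise.sum())            # 28

# The @ operator gives the same result as coord_A.dot(coord_B)
print(coord_A @ coord_B)            # 28

# Euclidean distance built from elementwise operations: sqrt(25 + 9 + 1)
difference = coord_A - coord_B
distance = np.sqrt((difference ** 2).sum())
print('{:.3f}'.format(distance))    # 5.916
```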

Real world example calculation

Lucas, Mia, Leon and Hannah each bought different quantities of three different chocolate varieties.

The brand names are A, B and C.

Lucas bought 100 g of brand A, 175 g of brand B and 210 g of C. Mia chose 90 g of A, 160 g of B and 150 g of C. Leon bought 200 g of A, 50 g of B and 100 g of C. Hannah didn’t purchase brand B, but did purchase 310 g of C and 120 g of A.

A costs 2.98€ per 100 g. B costs 3.90€ per 100 g. C costs 1.99€ per 100 g.

To calculate how much each of them paid for their chocolates, we can use Python, NumPy and matrix multiplication.

import numpy as np

# Each column is a chocolate variety and each row is a person.
mass_matrix = np.array([[100, 175, 210], 
                        [90, 160, 150], 
                        [200, 50, 100], 
                        [120, 0, 310]])
print('Numpy mass_matrix:')
print(mass_matrix)
print()

# The columns are the same chocolate varieties as the mass_matrix.
cost_per_100g = np.array([2.98, 3.90, 1.99])
print('Numpy cost matrix:')
print(cost_per_100g)
print()

# Matrix-vector product using numpy.
# Grams multiplied by euros per 100 g gives the amount in euro cents.
money_spent_in_cents = np.dot(mass_matrix, cost_per_100g)
money_spent_in_euros = money_spent_in_cents / 100
print('Numpy solution:')
print(money_spent_in_euros)
print()

Numpy mass_matrix:
[[100 175 210]
 [ 90 160 150]
 [200  50 100]
 [120   0 310]]

Numpy cost matrix:
[ 2.98  3.9   1.99]

Numpy solution:
[ 13.984  11.907   9.9     9.745]
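
To see what the matrix product computes, the same totals can be written as one sum of products per person. This is only a verification sketch; the name totals_in_cents is introduced here for illustration.

```python
import numpy as np

mass_matrix = np.array([[100, 175, 210],
                        [90, 160, 150],
                        [200, 50, 100],
                        [120, 0, 310]])
cost_per_100g = np.array([2.98, 3.90, 1.99])

# Plain Python: for each person (row), sum mass * cost over the three brands.
totals_in_cents = []
for row in mass_matrix:
    total = sum(mass * cost for mass, cost in zip(row, cost_per_100g))
    totals_in_cents.append(total)

# The same result from the matrix product and from an elementwise
# multiplication followed by a sum along each row.
assert np.allclose(totals_in_cents, mass_matrix.dot(cost_per_100g))
assert np.allclose(totals_in_cents, (mass_matrix * cost_per_100g).sum(axis=1))

print(np.round(np.array(totals_in_cents) / 100, 3))
```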

Using pandas for the same task

To supplement the NumPy calculation, we can use pandas, which labels the rows and columns and makes the answer easier to read:

import pandas as pd

# Using a pandas.DataFrame object, we can keep track of the names of columns and rows:
chocolate_brands = ['A', 'B', 'C']
names_of_people = ['Lucas', 'Mia', 'Leon', 'Hannah']

# pandas.DataFrame version of mass_matrix:
mass_df = pd.DataFrame(mass_matrix, columns=chocolate_brands, index=names_of_people)
print('Pandas mass matrix:')
print(mass_df)
print()

# pandas.DataFrame version of cost matrix:
cost_df = pd.DataFrame(cost_per_100g, index=chocolate_brands, columns=['Cost in euros per 100g'])
print('Pandas cost matrix:')
print(cost_df)
print()

# Matrix calculation using pandas:
# NB!: The columns of the left matrix must match the index values of the right matrix.
money_spent_df = mass_df.dot(cost_df)
money_spent_df.columns = ['Money spent in euro cents']
money_spent_df['Money spent in euros'] = money_spent_df['Money spent in euro cents'] / 100

print('Pandas solution:')
print(money_spent_df)

Pandas mass matrix:
          A    B    C
Lucas   100  175  210
Mia      90  160  150
Leon    200   50  100
Hannah  120    0  310

Pandas cost matrix:
   Cost in euros per 100g
A                    2.98
B                    3.90
C                    1.99

Pandas solution:
        Money spent in euro cents  Money spent in euros
Lucas                      1398.4                13.984
Mia                        1190.7                11.907
Leon                        990.0                 9.900
Hannah                      974.5                 9.745
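
The label matching mentioned in the code comment is done by name, not by position. The sketch below assumes a hypothetical price list, cost_series, whose entries are in a different order than the columns of the mass table; dot() still pairs each brand with its own price, and a missing label raises an error instead of silently giving a wrong total.

```python
import pandas as pd

mass_df = pd.DataFrame([[100, 175, 210],
                        [90, 160, 150],
                        [200, 50, 100],
                        [120, 0, 310]],
                       columns=['A', 'B', 'C'],
                       index=['Lucas', 'Mia', 'Leon', 'Hannah'])

# Hypothetical price list given in a different order than the columns above.
cost_series = pd.Series({'C': 1.99, 'A': 2.98, 'B': 3.90})

# Column labels of mass_df are aligned with the index labels of cost_series.
money_spent_in_euros = mass_df.dot(cost_series) / 100
print(money_spent_in_euros.round(3))

# Labels that do not match raise a ValueError.
try:
    mass_df.dot(cost_series.drop('B'))
except ValueError as error:
    print('Mismatch detected:', error)
```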


