Understanding the Basics of NumPy

Subash Basnet
Bajra Technologies Blog
9 min read · Jan 1, 2020

Ndarray: The Heart of the Library

  • The NumPy library is built around one main object, the ndarray (N-dimensional array).
  • An ndarray is a multidimensional, homogeneous array: all items have the same type and the same size.
  • The shape is a tuple of integers giving the size of the array along each dimension.
  • The dimensions are called axes.
  • The rank is the number of axes.
  • The size is the total number of elements in the array.

>> import numpy as np
>> a = np.array([1, 2, 3])
>> b = np.array([
[1, 2, 3],
[2, 3, 4]
])
>> c = np.array([
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]]
])
>> print(f'a.shape {a.shape} \nb.shape {b.shape} \nc.shape {c.shape}')
a.shape (3,)
b.shape (2, 3)
c.shape (2, 3, 3)
>> print(f'a.ndim {a.ndim} \nb.ndim {b.ndim} \nc.ndim {c.ndim}')
a.ndim 1
b.ndim 2
c.ndim 3
>> print(f'a.size {a.size} \nb.size {b.size} \nc.size {c.size}')
a.size 3
b.size 6
c.size 18

By default, the array() function infers the most suitable data type from the values contained in the sequence of lists or tuples. You can also set the type explicitly by passing the dtype argument to the function.

>> d = np.array([[1, 2, 3],[4, 5, 6]], dtype=complex)
>> d
array([[1.+0.j, 2.+0.j, 3.+0.j],
[4.+0.j, 5.+0.j, 6.+0.j]])
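
As a small illustration of the default behaviour described above, the sketch below shows which dtype array() infers for a few inputs. The variable names are made up for this example, and the exact integer width depends on the platform, so only its kind is printed.

```python
import numpy as np

# Only integers: NumPy picks an integer dtype (its width is platform dependent).
ints = np.array([1, 2, 3])

# A single float promotes the whole array to float64.
floats = np.array([1, 2.5, 3])

# A single complex value promotes the whole array to complex128.
complexes = np.array([1, 2.5, 3 + 0j])

print(ints.dtype.kind, floats.dtype, complexes.dtype)

# astype() returns a converted copy and leaves the original untouched.
as_float = ints.astype(float)
print(as_float.dtype)
```

Because an ndarray is homogeneous, mixing types in the input always ends with one common dtype for every element.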

1. Intrinsic Creation of Arrays

The zeros() function, for example, creates an array filled with zeros, with the dimensions given by the shape argument.

>> np.zeros((3,3))
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])

The ones() function creates an array filled with ones in a very similar way.

>> np.ones((2,3,3))
array([[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]],
[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]]])

The arange() function generates NumPy arrays containing numerical sequences. In the simplest case you specify two arguments:

  • the starting value and
  • the final value, which is not included in the sequence.

You can also specify a third argument:

  • the step, that is, the gap between one value and the next in the sequence.

The reshape() method rearranges a one-dimensional array into the shape given by its arguments, provided the total number of elements stays the same.

>> np.arange(20,50,5).reshape(3,2)
array([[20, 25],
[30, 35],
[40, 45]])
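
To make the role of the three arange() arguments concrete, here is a minimal sketch. The variable names are only illustrative; the point is that the final value never appears in the result.

```python
import numpy as np

# start=20, stop=50, step=5: the stop value 50 is not part of the sequence.
seq = np.arange(20, 50, 5)
print(seq)        # [20 25 30 35 40 45]
print(seq.size)   # 6

# With a single argument, the sequence starts at 0 and the step is 1.
first_five = np.arange(5)
print(first_five)  # [0 1 2 3 4]
```

The six elements are what makes reshape(3,2) possible in the example above: 3 x 2 must equal the number of elements.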

The linspace() function still takes

  • a starting value and
  • a final value, which is included in the result by default.

But the third argument

  • defines the number of elements into which we want the interval to be split.

>> np.linspace(0,10,6).reshape(3,2)
array([[ 0.,  2.],
[ 4., 6.],
[ 8., 10.]])
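
The following sketch shows how linspace() treats the final value and how to find out the spacing it chose. It relies on the retstep and endpoint keyword arguments of linspace(); the variable names are only illustrative.

```python
import numpy as np

# Six evenly spaced points from 0 to 10, with the end point included.
# retstep=True additionally returns the spacing between neighbouring points.
points, step = np.linspace(0, 10, 6, retstep=True)
print(points)  # [ 0.  2.  4.  6.  8. 10.]
print(step)    # 2.0

# endpoint=False leaves out the final value, much like arange() does.
half_open = np.linspace(0, 10, 5, endpoint=False)
print(half_open)  # [0. 2. 4. 6. 8.]
```

In short, arange() is driven by the step, while linspace() is driven by the number of points.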

The random() function of the np.random module generates an array of random numbers with the shape given as its argument.

>> np.random.random((2,3,3)) # Uniform random floats in [0.0, 1.0)
array([[[0.5243566 , 0.84181683, 0.3393915 ],
[0.8884298 , 0.30353638, 0.35253877],
[0.52712134, 0.28335799, 0.4991305 ]],
[[0.20654861, 0.61035835, 0.37098931],
[0.9099355 , 0.96222022, 0.60498038],
[0.11343631, 0.38183897, 0.94631226]]])
>> np.random.randn(2,3,3) # Standard normal distribution
array([[[-0.48662817, 0.47170094, 0.49174862],
[ 0.56445077, -0.67791087, -0.49320828],
[-0.71970416, 0.49739164, 0.57248729]],
[[-0.70184875, 0.90284421, 0.69676968],
[-0.18445229, -0.38261537, -0.00978085],
[-1.90030073, 0.30344708, -0.39163467]]])
>> np.random.rand(2,3,3) # Uniform random floats in [0.0, 1.0)
array([[[0.17477309, 0.35581288, 0.38988341],
[0.67561688, 0.03666337, 0.05290062],
[0.24533978, 0.74405349, 0.0154386 ]],
[[0.72101685, 0.58234122, 0.6292936 ],
[0.41629685, 0.09728456, 0.12894201],
[0.15621351, 0.37102581, 0.92031595]]])
# 6 random integers from 1 to 99 (the upper bound, 100, is excluded)
>> np.random.randint(1,100,6).reshape(3,2)
array([[29, 22],
[96, 30],
[25, 55]])
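
The numbers above change on every run. If you need repeatable results, one option in NumPy 1.17 and later is the Generator interface created by np.random.default_rng(). The sketch below is not part of the original examples; the seed 42 is an arbitrary choice.

```python
import numpy as np

# A seeded generator produces the same sequence on every run.
rng = np.random.default_rng(seed=42)

uniform = rng.random((2, 3))             # floats in [0.0, 1.0)
normal = rng.standard_normal((2, 3))     # standard normal distribution
integers = rng.integers(1, 100, size=6).reshape(3, 2)  # 1 <= value < 100

# A second generator with the same seed repeats the first draw exactly.
rng_again = np.random.default_rng(seed=42)
repeatable = np.array_equal(uniform, rng_again.random((2, 3)))
print(repeatable)  # True
```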

2. Matrix Product

>> A = np.arange(0,9).reshape(3,3)
>> B = np.ones((3,3))
# Way 1
>> np.dot(A,B)
array([[ 3., 3., 3.],
[12., 12., 12.],
[21., 21., 21.]])
# Way 2
>> A.dot(B)
array([[ 3., 3., 3.],
[12., 12., 12.],
[21., 21., 21.]])
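
A third way, available since Python 3.5, is the @ operator. The sketch below assumes the same A and B as above and also contrasts the matrix product with the element-wise product, which is easy to confuse with it.

```python
import numpy as np

A = np.arange(0, 9).reshape(3, 3)
B = np.ones((3, 3))

# For 2-D arrays, A @ B gives the same result as np.dot(A, B).
matrix_product = A @ B
print(matrix_product)

# A * B is NOT a matrix product: it multiplies matching elements.
elementwise = A * B
print(elementwise)
```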

3. Increment and Decrement

>> B += 1
>> B
array([[2., 2., 2.],
[2., 2., 2.],
[2., 2., 2.]])
>> B -= 1
>> B
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
>> B *= 2
>> B
array([[2., 2., 2.],
[2., 2., 2.],
[2., 2., 2.]])

4. Universal Functions (ufunc)

A universal function or ufunc is a function operating on an array in an element-by-element fashion.

>> np.sqrt(B)
array([[1.41421356, 1.41421356, 1.41421356],
[1.41421356, 1.41421356, 1.41421356],
[1.41421356, 1.41421356, 1.41421356]])
>> np.log(B)
array([[0.69314718, 0.69314718, 0.69314718],
[0.69314718, 0.69314718, 0.69314718],
[0.69314718, 0.69314718, 0.69314718]])
>> np.sin(B)
array([[0.90929743, 0.90929743, 0.90929743],
[0.90929743, 0.90929743, 0.90929743],
[0.90929743, 0.90929743, 0.90929743]])
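
To see what "element-by-element" means in practice, the sketch below computes the square root once with an explicit Python loop and once with the ufunc. B is recreated here with the value 2.0 so that the example runs on its own.

```python
import math
import numpy as np

B = np.full((3, 3), 2.0)

# Explicit loops: call math.sqrt on every element, one at a time.
looped = np.array([[math.sqrt(value) for value in row] for row in B])

# The ufunc does the same work in a single call.
vectorised = np.sqrt(B)

print(np.allclose(looped, vectorised))   # True
print(isinstance(np.sqrt, np.ufunc))     # True
```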

5. Aggregate Functions

Aggregate functions perform an operation on a set of values and produce a single result.

>> A
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>> A.sum()
36
>> A.min()
0
>> A.max()
8
>> A.mean()
4.0
>> A.std()
2.581988897471611
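
The aggregate methods also accept an axis argument, in which case they return one value per column or per row instead of a single value for the whole array. A minimal sketch, assuming the same A as above:

```python
import numpy as np

A = np.arange(0, 9).reshape(3, 3)

# axis=0 collapses the rows: one result per column.
column_sums = A.sum(axis=0)
print(column_sums)  # [ 9 12 15]

# axis=1 collapses the columns: one result per row.
row_sums = A.sum(axis=1)
print(row_sums)     # [ 3 12 21]

column_max = A.max(axis=0)
print(column_max)   # [6 7 8]
```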

6. Indexing and Slicing

>> print(f'A[0] -> {A[0]}')
A[0] -> [0 1 2]
>> print(f'A[0][1] -> {A[0][1]}')
A[0][1] -> 1
>> print(f'A[0,1] -> {A[0,1]}')
A[0,1] -> 1

>> print(f'A[-1] -> {A[-1]}')
A[-1] -> [6 7 8]
>> print(f'A[-1][-1] -> {A[-1][-1]}')
A[-1][-1] -> 8
>> print(f'A[-1,-1] -> {A[-1,-1]}')
A[-1,-1] -> 8

# Rows 0 up to, but not including, 2: rows [0, 1]

>> A[0:2]
array([[0, 1, 2],
[3, 4, 5]])

# Rows 0 up to 3 in steps of 2: rows [0, 2]

>> A[0:3:2]
array([[0, 1, 2],
[6, 7, 8]])

# All rows in steps of 2, starting from 0: rows [0, 2]

>> A[::2]
array([[0, 1, 2],
[6, 7, 8]])

# Rows 0 up to 2 and columns 0 up to 2: rows [0, 1], columns [0, 1]

>> A[0:2,0:2]
array([[0, 1],
[3, 4]])

# Rows 0 and 2, columns 0 up to 2: rows [0, 2], columns [0, 1]

# Selected elements: (0,0), (0,1), (2,0), (2,1)
>> A[[0,2],0:2]
array([[0, 1],
[6, 7]])

7. Iterating

>> for row in A:
    print(row)
[0 1 2]
[3 4 5]
[6 7 8]
>> for item in A.flat:
    print(item)
0
1
2
3
4
5
6
7
8

If you want to apply an aggregate function to every single column or every single row, you can leave the iteration to NumPy by using the apply_along_axis() function.

# Aggregate Function
# axis = 0, elements by column
>> np.apply_along_axis(np.mean,axis=0,arr=A)
array([3., 4., 5.])
# axis = 1, elements by row
>> np.apply_along_axis(np.mean,axis=1,arr=A)
array([1., 4., 7.])
# We can also pass our own function
>> def foo(x):
    return x/2
>> np.apply_along_axis(foo,axis=1,arr=A)
array([[0. , 0.5, 1. ],
[1.5, 2. , 2.5],
[3. , 3.5, 4. ]])
>> np.apply_along_axis(lambda x:x/2,axis=1,arr=A)
array([[0. , 0.5, 1. ],
[1.5, 2. , 2.5],
[3. , 3.5, 4. ]])
# NumPy's ufunc
>> np.apply_along_axis(np.sqrt,axis=0,arr=A)
array([[0. , 1. , 1.41421356],
[1.73205081, 2. , 2.23606798],
[2.44948974, 2.64575131, 2.82842712]])
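
apply_along_axis() is most useful for a function that works on one row or column at a time and has no ready-made NumPy equivalent. The helper value_range below is hypothetical and exists only for this sketch. For purely element-wise operations such as x/2, writing the expression on the whole array gives the same result.

```python
import numpy as np

A = np.arange(0, 9).reshape(3, 3)

def value_range(values):
    """Hypothetical helper: largest minus smallest value of a 1-D slice."""
    return values.max() - values.min()

# The function receives one row (axis=1) or one column (axis=0) at a time.
per_row = np.apply_along_axis(value_range, axis=1, arr=A)
per_column = np.apply_along_axis(value_range, axis=0, arr=A)
print(per_row)     # [2 2 2]
print(per_column)  # [6 6 6]

# An element-wise operation does not need apply_along_axis at all.
halved = A / 2
same_result = np.array_equal(
    halved, np.apply_along_axis(lambda x: x / 2, axis=1, arr=A)
)
print(same_result)  # True
```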

8. Conditions and Boolean Arrays

>> A < 4
array([[ True,  True,  True],
[ True, False, False],
[False, False, False]])
# All the elements which are less than 4.
>> A[A < 4]
array([0, 1, 2, 3])
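
Boolean arrays can be combined, and they can drive replacements as well as selections. The sketch below is an extension of the example above, not part of the original text; note that conditions are combined with & and |, each wrapped in parentheses, rather than with the Python keywords and / or.

```python
import numpy as np

A = np.arange(0, 9).reshape(3, 3)

# Combine two conditions element by element.
mask = (A > 2) & (A < 7)
selected = A[mask]
print(selected)  # [3 4 5 6]

# np.where picks from the second argument where the condition holds,
# and from the third argument everywhere else.
replaced = np.where(A < 4, 0, A)
print(replaced)
```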

9. Shape Manipulation

To convert a two-dimensional array into a one-dimensional array, we can use ravel().

>> A = A.ravel()
>> A
array([0, 1, 2, 3, 4, 5, 6, 7, 8])

We can also change the shape with the reshape() method or by assigning a new tuple to the shape attribute.

>> A = A.reshape(3,3)
>> A
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>> A.shape = (9)
>> A
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
>> A.shape = (3,3)
>> A
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])

We can also swap the rows and the columns using transpose().

>> A.transpose()
array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])
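
Two details of reshaping are worth a short sketch: one dimension can be given as -1 so that NumPy works it out, and a shape that does not match the number of elements raises an error. The array names here are illustrative only.

```python
import numpy as np

flat = np.arange(12)

# -1 asks NumPy to infer that dimension: 12 elements / 3 rows = 4 columns.
grid = flat.reshape(3, -1)
print(grid.shape)    # (3, 4)

# .T is shorthand for transpose().
print(grid.T.shape)  # (4, 3)

# 12 elements cannot be arranged in 5 rows, so NumPy raises ValueError.
try:
    flat.reshape(5, -1)
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print("cannot reshape:", error)
```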

10. Array Manipulation

Joining Arrays

# Vertical stacking
>> np.vstack((A,B))
array([[0., 1., 2.],
[3., 4., 5.],
[6., 7., 8.],
[2., 2., 2.],
[2., 2., 2.],
[2., 2., 2.]])
# Horizontal stacking
>> np.hstack((A,B))
array([[0., 1., 2., 2., 2., 2.],
[3., 4., 5., 2., 2., 2.],
[6., 7., 8., 2., 2., 2.]])
# row_stack() is an alias of vstack() (deprecated since NumPy 2.0)
>> np.row_stack((A,B))
array([[0., 1., 2.],
[3., 4., 5.],
[6., 7., 8.],
[2., 2., 2.],
[2., 2., 2.],
[2., 2., 2.]])
# For 2-D arrays, column_stack() gives the same result as hstack()
>> np.column_stack((A,B))
array([[0., 1., 2., 2., 2., 2.],
[3., 4., 5., 2., 2., 2.],
[6., 7., 8., 2., 2., 2.]])

Splitting Arrays

>> C = np.arange(16).reshape((4,4))
>> C
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
>> [D,E] = np.hsplit(C,2)
>> print(f'D:{D}\n')
>> print(f'E:{E}')
D:[[ 0 1]
[ 4 5]
[ 8 9]
[12 13]]
E:[[ 2 3]
[ 6 7]
[10 11]
[14 15]]
# Up to column 3, then the remaining columns
>> [H,I] = np.hsplit(C,(3,))
>> print(f'H:{H}\n')
>> print(f'I:{I}\n')
H:[[ 0 1 2]
[ 4 5 6]
[ 8 9 10]
[12 13 14]]
I:[[ 3]
[ 7]
[11]
[15]]

# Up to column 2, then column 2 up to column 3, then the remaining columns

>> [H,I,J] = np.hsplit(C,(2,3))
>> print(f'H:{H}\n')
>> print(f'I:{I}\n')
>> print(f'J:{J}\n')
H:[[ 0 1]
[ 4 5]
[ 8 9]
[12 13]]
I:[[ 2]
[ 6]
[10]
[14]]
J:[[ 3]
[ 7]
[11]
[15]]
>> [F,G] = np.vsplit(C,2)
>> print(f'F:{F}\n')
>> print(f'G:{G}')
F:[[0 1 2 3]
[4 5 6 7]]
G:[[ 8 9 10 11]
[12 13 14 15]]
# Up to row 3, then the remaining rows
>> [K,L] = np.vsplit(C,(3,))
>> print(f'K:{K}\n')
>> print(f'L:{L}')
K:[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
L:[[12 13 14 15]]
# Up to row 2, then row 2 up to row 3, then the remaining rows
>> [K,L,M] = np.vsplit(C,(2,3))
>> print(f'K:{K}\n')
>> print(f'L:{L}\n')
>> print(f'M:{M}')
K:[[0 1 2 3]
[4 5 6 7]]
L:[[ 8 9 10 11]]
M:[[12 13 14 15]]

We can also use the split() function to split NumPy arrays.

# Up to column 1, then columns 1 up to 3, then the remaining columns
>> [A1,A2,A3] = np.split(C,[1,3],axis=1)
>> print(f'A1:{A1}\n')
>> print(f'A2:{A2}\n')
>> print(f'A3:{A3}')
A1:[[ 0]
[ 4]
[ 8]
[12]]
A2:[[ 1 2]
[ 5 6]
[ 9 10]
[13 14]]
A3:[[ 3]
[ 7]
[11]
[15]]
# Up to row 1, then rows 1 up to 3, then the remaining rows
>> [A1,A2,A3] = np.split(C,[1,3],axis=0)
>> print(f'A1:{A1}\n')
>> print(f'A2:{A2}\n')
>> print(f'A3:{A3}')
A1:[[0 1 2 3]]
A2:[[ 4 5 6 7]
[ 8 9 10 11]]
A3:[[12 13 14 15]]

11. Copies or Views of Objects

Assignment in NumPy never produces a copy of an array or of any element contained in it.

>> a = np.array([1,2,3,4])
>> b = a
>> a[2] = 0
>> b
array([1, 2, 0, 4])

Array b is just another name for array a. Although we changed the element at index 2 through a, we can see that b shows the change as well.

>> c = a[0:2]
>> a[0] = 0
>> c
array([0, 2])

Even a slice does not copy anything: it is a view that points to the same underlying data as the original array.

We can use the copy() method to copy the array to another variable and use it independently.

>> a = np.array([1,2,3,4])
>> c = a.copy()
>> a[0] = 0
>> c
array([1, 2, 3, 4])

In this case, even when you change the items in the array a, array c remains unchanged.
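
If you are unsure whether two arrays share data, NumPy can tell you. The sketch below repeats the three cases from this section (plain assignment, slicing, copy()) and checks each one with the is operator, the base attribute, and np.shares_memory(). The variable names alias, view and independent are chosen for this example.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

alias = a              # plain assignment: a second name for the same object
view = a[0:2]          # slice: a new array object that shares a's data
independent = a.copy() # copy(): a new array with its own data

a[0] = 0

print(alias is a)                        # True
print(view.base is a)                    # True
print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False

print(view)         # [0 2]
print(independent)  # [1 2 3 4]
```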

12. Vectorization

Vectorization, along with broadcasting, is the basis of the internal implementation of NumPy. Vectorization means that the code you write contains no explicit loops. The loops do not actually disappear; they are implemented internally by NumPy, and in your code they are replaced by other constructs such as array expressions.

NumPy expresses the element-wise multiplication of two arrays simply as:

A*B

In many other languages, the same operation has to be written with nested for loops:

for (i = 0; i < rows; i++) {
    for (j = 0; j < columns; j++) {
        c[i][j] = a[i][j] * b[i][j];
    }
}
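
The same comparison can be made within Python itself. The sketch below writes the nested loops by hand and checks that the vectorized expression gives an identical result; the small arrays a and b are made up for this example.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.full((2, 3), 2)

# Explicit loops, mirroring the nested for loops shown above.
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j] * b[i, j]

# Vectorized form: the loops run inside NumPy.
vectorised = a * b

print(np.array_equal(looped, vectorised))  # True
```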

13. Broadcasting

Broadcasting allows an operator or a function to act on two or more arrays even if these arrays do not have the same shape. That said, not every combination of shapes can be broadcast; the shapes must meet certain rules.

There are two rules of broadcasting:

  • If the arrays have a different number of dimensions, prepend a 1 to the shape of the smaller array for each missing dimension. If the shapes are then compatible, that is, in every dimension the sizes are equal or one of them is 1, you can apply broadcasting and move to the second rule.
  • Every dimension of size 1 is stretched to the size of the other array in that dimension, so that both arrays end up with the same shape and the element-wise function or operator is applicable.
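
The two rules can be written down as a short function that works on shapes alone. This is only a sketch of the rules as stated above, not NumPy's actual implementation, and broadcast_shape is a hypothetical name used for this example.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: shape that results from broadcasting two shapes."""
    ndim = max(len(shape_a), len(shape_b))

    # Rule 1: prepend 1s to the shorter shape until both have the same length.
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(padded_a, padded_b):
        # Rule 2: equal sizes are kept, a size of 1 is stretched to the other size.
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"incompatible dimensions {dim_a} and {dim_b}")
    return tuple(result)

print(broadcast_shape((4, 4), (4,)))          # (4, 4)
print(broadcast_shape((3, 1, 2), (3, 2, 1)))  # (3, 2, 2)

# NumPy's own answer for the second pair of shapes, for comparison.
print(np.broadcast(np.empty((3, 1, 2)), np.empty((3, 2, 1))).shape)  # (3, 2, 2)
```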

Let’s see an example

>> A = np.arange(16).reshape(4,4)
>> b = np.arange(4)
>> print(A.shape)
(4, 4)
>> print(b.shape)
(4,)
>> A + b
array([[ 0, 2, 4, 6],
[ 4, 6, 8, 10],
[ 8, 10, 12, 14],
[12, 14, 16, 18]])

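Rule 1 sets b’s shape to (1,4) by prepending a 1.
Rule 2 then stretches b along the first axis, repeating its single row four times to reach the shape (4,4), so [0, 1, 2, 3] is added to every row of A.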

Let’s see another example.

>> m = np.arange(6).reshape(3,1,2)
>> n = np.arange(6).reshape(3,2,1)
>> print(f'm:\n{m}\n\n n:\n{n}')
m:
[[[0 1]]

 [[2 3]]

 [[4 5]]]

 n:
[[[0]
  [1]]

 [[2]
  [3]]

 [[4]
  [5]]]

In this case, both arrays are extended (broadcast) to the common shape 3 x 2 x 2:

m* = [[[0 1]
       [0 1]]
      [[2 3]
       [2 3]]
      [[4 5]
       [4 5]]]

n* = [[[0 0]
       [1 1]]
      [[2 2]
       [3 3]]
      [[4 4]
       [5 5]]]

When we add them,

>> m + n
array([[[ 0,  1],
[ 1, 2]],
[[ 4, 5],
[ 5, 6]],
[[ 8, 9],
[ 9, 10]]])
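
The extended arrays m* and n* never have to be built by hand, but np.broadcast_to() can produce them if you want to inspect them. The sketch below uses it to confirm that adding the extended arrays gives the same result as m + n; the names m_star and n_star are chosen for this example.

```python
import numpy as np

m = np.arange(6).reshape(3, 1, 2)
n = np.arange(6).reshape(3, 2, 1)

# Stretch both arrays to the common shape explicitly.
m_star = np.broadcast_to(m, (3, 2, 2))
n_star = np.broadcast_to(n, (3, 2, 2))

print(m_star[0])  # first block of m*: the row [0 1] repeated twice
print(n_star[0])  # first block of n*: the column [0 1] repeated twice

# Adding the stretched arrays matches the broadcast sum.
print(np.array_equal(m_star + n_star, m + n))  # True
```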

14. Reading and Writing Array Data on Files

>> data = np.arange(0,120).reshape(12,-1)
>> np.save('saved_data',data)
>> loaded_data = np.load('saved_data.npy')
>> loaded_data
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[ 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[ 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
[110, 111, 112, 113, 114, 115, 116, 117, 118, 119]])

Suppose we have a CSV file, data.csv, with the following contents:

id,value1,value2,value3
1,123,1.4,23
2,110,0.5,18
3,164,2.1,19

>> data = np.genfromtxt('data.csv',delimiter=",",names=True)
>> data
array([(1., 123., 1.4, 23.), (2., 110., 0.5, 18.), (3., 164., 2.1, 19.)],
dtype=[('id', '<f8'), ('value1', '<f8'), ('value2', '<f8'), ('value3', '<f8')])

>> data['value2']
array([1.4, 0.5, 2.1])
>> data[0]
(1., 123., 1.4, 23.)
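
To try this without preparing a file by hand, the sketch below first writes the same CSV contents to a temporary directory and then reads them back with genfromtxt(). The temporary path and the variable names are assumptions made for this example.

```python
import os
import tempfile

import numpy as np

csv_text = (
    "id,value1,value2,value3\n"
    "1,123,1.4,23\n"
    "2,110,0.5,18\n"
    "3,164,2.1,19\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "data.csv")
    with open(csv_path, "w") as handle:
        handle.write(csv_text)

    # names=True takes the field names from the header row.
    data = np.genfromtxt(csv_path, delimiter=",", names=True)

print(data.dtype.names)  # ('id', 'value1', 'value2', 'value3')
print(data["value2"])    # [1.4 0.5 2.1]
```

The result is a structured array: each column is reached by its name and each row by its position.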

This post is based on the book Python Data Analytics with Pandas, Numpy and Matplotlib by Fabio Nelli.

I am a Computer Engineer graduated from Kathmandu University, Nepal. The existence is a program, we are here to add our part of code. Website: sbasnet.com.np