A Primer on Numpy

The core data structure in numpy is the array (ndarray).

Import Numpy

  • After installing numpy, simply import it. By convention it is imported under the alias np.
import numpy as np
  • Check version
print(np.__version__)

Why use numpy? Why not a Python list?

Python List

L = [i for i in range(10)]
print(L)

output

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
  • A Python list may contain elements of different data types
    • pro: flexibility
    • con: inefficiency, because the type of every element has to be checked individually

Python Array

Python also has an array type (in the standard array module), which can only hold elements of a single data type

import array
arr = array.array("i", (i for i in range(10)))
# "i" is the type code for a signed integer

An error will be raised if we assign a value of another data type to the array

arr[5] = "python"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-58-249200ff8ac8> in <module>
----> 1 arr[5]="python"

TypeError: an integer is required (got type str)
  • Python's array does not treat its data as a vector or matrix, so vector and matrix operations are not implemented for it
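To make the difference concrete, the short sketch below compares how the * operator behaves on a list, on a Python array, and on a numpy array. The variable names are chosen only for illustration.

```python
import array
import numpy as np

L = [1, 2, 3]
arr = array.array("i", L)
nparr = np.array(L)

# For a list and for a Python array, * means repetition, not arithmetic
list_times_two = L * 2
arr_times_two = arr * 2
print(list_times_two)       # [1, 2, 3, 1, 2, 3]
print(list(arr_times_two))  # [1, 2, 3, 1, 2, 3]

# For a numpy array, * is applied element by element
np_times_two = nparr * 2
print(np_times_two.tolist())  # [2, 4, 6]

# Element-wise arithmetic on a list needs an explicit loop
doubled = [2 * x for x in L]
print(doubled)              # [2, 4, 6]
```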

Numpy Array

It’s simple to create a numpy array:

nparr = np.array([i for i in range(10)])

nparr

output

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

A numpy array can only store one type of data

  • Check data type
nparr.dtype

output

dtype('int32')

If we assign a value of a different data type to an element (e.g. a float to an int array),
numpy converts the value to the array's data type. Here 5.8 is truncated to 5 and the dtype stays the same

nparr[5] = 5.8
print(nparr)
print(nparr.dtype)

output

[0 1 2 3 4 5 6 7 8 9]
int32
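If the fractional part needs to be kept, one option is to convert the array to a float type first. The sketch below uses astype, which is not covered in the text above; it returns a new array and leaves the original untouched. The name nparr_float is only illustrative.

```python
import numpy as np

nparr = np.array([i for i in range(10)])   # integer array 0..9
nparr[5] = 5.8                             # fractional part is dropped
print(nparr[5])                            # 5

# astype returns a converted copy; nparr itself keeps its integer dtype
nparr_float = nparr.astype(float)
nparr_float[5] = 5.8
print(nparr_float[5])                      # 5.8
print(nparr_float.dtype)                   # float64
```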

To create an array with float data type

nparr2 = np.array([1,2,3.0])
nparr2.dtype

output

dtype('float64')

Create Numpy Array and Matrix

Alternative ways to create an array

There are a number of other built-in functions for creating arrays

  • Create a zero array
npzero = np.zeros(10)
npzero
npzero.dtype

output

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
dtype('float64')

If we want to create a zero array of int data type

np.zeros(10, dtype=int)

output

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

To create a matrix of 5 rows x 6 cols, pass a tuple (rows, cols) instead of a single size

np.zeros((5,6))

output

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
  • Create an array of ones
np.ones(10)

output

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

Create a matrix of ones

np.ones((3,4))

output

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
  • Create an array/matrix filled with a specific value
np.full((3,4), 20) # int
# same as np.full(shape=(3,4), fill_value=20)
np.full((3,4), 20.0) # float

output

array([[20, 20, 20, 20],
       [20, 20, 20, 20],
       [20, 20, 20, 20]])

array([[20., 20., 20., 20.],
       [20., 20., 20., 20.],
       [20., 20., 20., 20.]])
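A quick way to confirm what these creation functions return is to inspect the shape, ndim and dtype attributes of the result. The following is a small sketch with illustrative variable names.

```python
import numpy as np

zeros = np.zeros((5, 6))
ones = np.ones((3, 4), dtype=int)
full = np.full((3, 4), 20.0)

print(zeros.shape, zeros.dtype)   # (5, 6) float64
print(ones.shape)                 # (3, 4)
print(ones.dtype)                 # an integer dtype; the exact width depends on the platform
print(full.shape, full.dtype)     # (3, 4) float64
print(zeros.ndim)                 # 2, i.e. a two-dimensional array
```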

arange method

In Python, there's a built-in function called range

[i for i in range(1,20,3)]
#range(start, end not inclusive, step)

output

[1, 4, 7, 10, 13, 16, 19]

In numpy, there's a function called arange, with similar usage

[i for i in np.arange(1,20,3)]

output is the same

[1, 4, 7, 10, 13, 16, 19]

However, the big difference is that in Python's range the step can only be an int,
whereas np.arange also accepts a float step

[i for i in range(0,20,0.4)]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-89-ccd0e40a6b64> in <module>
----> 1 [i for i in range(0,20,0.4)]

TypeError: 'float' object cannot be interpreted as an integer

For np.arange

[i for i in np.arange(15,20,0.5)]

output

[15.0, 15.5, 16.0, 16.5, 17.0, 17.5, 18.0, 18.5, 19.0, 19.5]
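The list comprehensions above are only there to mirror range. As a brief sketch, np.arange already returns a numpy array, and its dtype follows the arguments passed in:

```python
import numpy as np

int_steps = np.arange(1, 20, 3)        # integer arguments give an integer array
float_steps = np.arange(15, 20, 0.5)   # a float step gives a float array

print(type(float_steps))               # <class 'numpy.ndarray'>
print(float_steps.dtype)               # float64
print(len(float_steps))                # 10
print(float_steps[0], float_steps[-1]) # 15.0 19.5
```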

Linspace method

The linspace function creates a given number of evenly spaced points over a range.

To create 10 evenly spaced data points over the range from 0 to 20

Note: for linspace the end point 20 is inclusive, which is different from arange

np.linspace(0, 20, 10)
array([ 0.        ,  2.22222222,  4.44444444,  6.66666667,  8.88888889,
       11.11111111, 13.33333333, 15.55555556, 17.77777778, 20.        ])
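Because the end point is included, the spacing here is 20 / 9 rather than 2. The sketch below uses two optional linspace parameters that are not mentioned in the text above, retstep and endpoint, to show the spacing and to reproduce arange-style behaviour.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring points
points, step = np.linspace(0, 20, 10, retstep=True)
print(len(points))    # 10
print(step)           # the spacing, 20 / 9

# endpoint=False excludes 20, so the 10 points are 0, 2, ..., 18
half_open = np.linspace(0, 20, 10, endpoint=False)
print(half_open)
```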

Random

Random numbers are generated with the functions in the module

np.random

Int random

  • To get a random int from 0 up to, but not including, 10
np.random.randint(0,10)
  • To get an 8-element array of ints between 0 and 100 (100 not included)
np.random.randint(0,100,8)

output

array([47,  9, 23, 48, 79, 42, 32, 80])
  • To get a 4x5 random int matrix
np.random.randint(0,100,(4,5))

output

array([[69,  3, 25, 83, 48],
       [19, 52, 52, 13, 62],
       [ 6, 22, 79, 63, 48],
       [ 8,  8, 53, 21, 20]])

Set seed

  • All random numbers generated by these functions are pseudo-random.
  • Use a random seed to make sure that the random numbers generated on each run are the same

To set a seed

np.random.seed(42)

# run random matrix
np.random.randint(0,100,(4,5))
array([[51, 92, 14, 71, 60],
       [20, 82, 86, 74, 74],
       [87, 99, 23,  2, 21],
       [52,  1, 87, 29, 37]])

Repeating the above with the same seed gives the same matrix

np.random.seed(42)
np.random.randint(0,100,(4,5))
array([[51, 92, 14, 71, 60],
       [20, 82, 86, 74, 74],
       [87, 99, 23,  2, 21],
       [52,  1, 87, 29, 37]])
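As an alternative to the global seed, newer numpy versions (1.17 and later) provide a Generator object created with np.random.default_rng. It is not used in the text above, so treat the following as a minimal sketch: two generators built from the same seed produce the same numbers. Note that the values differ from those produced by np.random.seed(42) with randint.

```python
import numpy as np

# Two independent generators created from the same seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# integers(low, high, size) excludes high, like randint
a = rng_a.integers(0, 100, (4, 5))
b = rng_b.integers(0, 100, (4, 5))

print(np.array_equal(a, b))             # True
print(a.min() >= 0 and a.max() < 100)   # True
```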

Float Random

  • Single random float
np.random.random()
0.9922115592912175
  • A random array
np.random.random(10)
array([0.61748151, 0.61165316, 0.00706631, 0.02306243, 0.52477466,
       0.39986097, 0.04666566, 0.97375552, 0.23277134, 0.09060643])
  • A random 3x4 matrix
np.random.random((3,4))
array([[0.61838601, 0.38246199, 0.98323089, 0.46676289],
       [0.85994041, 0.68030754, 0.45049925, 0.01326496],
       [0.94220176, 0.56328822, 0.3854165 , 0.01596625]])
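np.random.random draws floats from the interval between 0 and 1 (1 not included). To get floats in another range, one common approach is to scale and shift the result. The bounds low and high below are hypothetical values chosen for illustration.

```python
import numpy as np

low, high = 5.0, 10.0   # hypothetical bounds

# Stretch the samples to the width of the range, then shift by low
samples = low + (high - low) * np.random.random(1000)

print(samples.shape)                  # (1000,)
print(samples.min(), samples.max())   # both lie between low and high
```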

Normal Distribution

In statistics, it's common to draw samples from a normal distribution.
To generate normally distributed random numbers:

  • A random number from the standard normal distribution (mean=0, standard deviation=1)
np.random.normal()
0.230893825622149
  • A random number from a normal distribution with mean=10, standard deviation=4
np.random.normal(10,4)
13.116770538060019
  • To get an array of normally distributed random numbers
np.random.normal(0,1,10)
array([-1.10109776,  1.13022819,  0.37311891, -0.38647295, -1.15877024,
        0.56611283, -0.70445345, -1.3779393 , -0.35311665, -0.46146572])
  • To get a matrix of normally distributed random numbers
np.random.normal(0,1,(3,5))
array([[ 0.39637394, -0.6256163 , -0.52246148,  0.0130339 ,  0.37776959],
       [ 0.0628246 ,  0.50159637, -0.14694539,  0.18062297,  0.96481058],
       [-1.06483115,  0.1087118 ,  0.10576365,  0.92066672, -0.22672246]])
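The second argument of np.random.normal is the standard deviation, not the variance. A simple way to check this, sketched below with an arbitrary seed and sample size, is to draw a large sample and look at its statistics.

```python
import numpy as np

np.random.seed(0)                            # arbitrary seed for reproducibility
samples = np.random.normal(10, 4, 100_000)   # mean 10, standard deviation 4

print(samples.mean())   # close to 10
print(samples.std())    # close to 4
print(samples.var())    # close to 16, i.e. 4 ** 2
```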

   Reprint policy


《A Primer on Numpy》 by Isaac Zhou is licensed under a Creative Commons Attribution 4.0 International License