NumPy
NumPy is a first class library for numerical programming
Numerical libraries such as NumPy provide code for common numerical operations
- Creating and manipulating arrays of numbers
- sums, means, matrix operations
- Generating random numbers, etc.
In non-compiled languages such as Python, MATLAB, etc., there is an extra value:
These languages are inherently slower than compiled languages like C and Fortran
Using numerical libraries, individual operations on large arrays can be shifted out to fast compiled code
Introduction to NumPy
Used by scientists all around the world
- Engineering, biology, astronomy, geosciences, artificial intelligence, etc.
Mature, fast and and stable
NumPy can be installed individually or as part of SciPy
- I recommend the latter as we will be using SciPy soon
- Follow the download instructions at scipy.org
NumPy provides the definition of a numerical array class, plus some utilities
- Arrays are implemented in C
- Array operations use highly optimized, compiled C code
The utilities are Python interfaces to fast numerical routines compiled from C and Fortran
The rest of the lecture assumes that NumPy has been imported as
np
>>> import numpy as np
Although this lecture cannot cover all the details of NumPy, remember
- The docstrings are your friends, so use them
- In the Python shell
help(np.poly1d)
- In an IPython shell
np.poly1d?
- In the Python shell
NumPy Arrays
Let's now turn to the basics of NumPy arrays
Internally, a NumPy array is an instance of
numpy.ndarray
(np.ndarray
)
Like a native Python list, except that
- Data
must be homogeneous (all elements of the same type) - These types must be one of the data types (
dtypes
) provided by NumPy
Data Types
Some dtypes:
bool | 8 bit True or False |
int32 | 32 bit integer |
int64 | 64 bit integer |
float32 | 32 bit floating point number |
float64 | 64 bit floating point number |
There are also dtypes to represent complex numbers, unsigned integers, etc
At present, the default dtype for arrays is
float64
- When you create arrays, they are of this type unless otherwise specified
Here's an example
>>> Z = np.zeros(10)
>>> Z
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> type(Z)
<type 'numpy.ndarray'>
>>> type(Z[0])
<type 'numpy.float64'>
zeros()
is a NumPy function that returns an ndarray
of zeros
If we had wanted to use integers we could have specified as follows:
>>> Z = np.zeros(10, dtype=int) # Here 'int' is equivalent to 'np.int32'
>>> Z
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
There's no problem mixing native Python numeric types and NumPy
dtypes
:>>> Z = np.zeros(1)
>>> Z
array([ 0.])
>>> Z[0] = Z[0] + 1 # Add a Python int to a NumPy float64
>>> Z
array([ 1.]) # The int is upcast to a float64 before addition
>>> type(Z[0])
<type 'numpy.float64'>
Shape and Dimension
Suppose we create an array such as
>>> Z = np.zeros(10)
At present,
Z
has no dimension:- It is neither a row vector nor a column vector
- We say that the array is "flat"
The dimension is recorded in the
shape
attribute, which is a tuple>>> Z.shape
(10,)
The shape tuple has only one element, which is the length of the array
- Recall that tuples with one element have a comma after that element
To give it dimension, we need to change the
shape
attribute>>> Z.shape = (10, 1) # Column vector: 10 rows and 1 column
>>> Z
array([[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.]])
>>> Z = np.zeros(4)
>>> Z.shape = (2, 2)
>>> Z
array([[ 0., 0.],
[ 0., 0.]])
In the last case, we could also pass a tuple to the
zeros()
function>>> Z = np.zeros((2, 2))
>>> Z.shape
(2, 2)
>>> Z
array([[ 0., 0.],
[ 0., 0.]])
Creating Arrays
NumPy arrays can be created from Python lists, tuples, etc. using
array()
or asarray()
>>> Z = np.array([10, 20]) # ndarray from Python list
>>> Z
array([10, 20])
>>> type(Z)
<type 'numpy.ndarray'>
>>> Z = np.array((10, 20), dtype=float) # Here 'float' is equivalent to 'np.float64'
>>> Z
array([ 10., 20.])
>>> Z = np.array([[1, 2], [3, 4]]) # 2D array from a list of lists
>>> Z
array([[1, 2],
[3, 4]])
>>> Z.shape
(2, 2)
To read in the data from a text file use
loadtxt()
Here is the text file
test.txt
1 2
3 4
Now
>>> Z = np.loadtxt('test.txt')
>>> Z
array([[ 1., 2.],
[ 3., 4.]])
In addition there are utility functions for creating standard arrays
- We already met
zeros()
- Another one is
empty()
>>> Z = np.empty(3)
>>> Z
array([ -2.41128814e-039, 1.48539705e-313, -8.30185262e-040])
What are those numbers, by the way?
- Python allocates 3 contiguous pieces of memory (RAM), each with 64 bits
- Contents of those memory slots is interpreted as a
float64
To generate an array of ones, use
ones()
:>>> Z = np.ones((4, 4))
>>> Z
array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
To set up a grid of evenly spaced numbers use
linspace()
>>> Z = np.linspace(2, 4, 5) # From 2 to 4, with 5 elements
>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> Z = np.linspace(2, 4, 6) # From 2 to 4, with 6 elements
>>> Z
array([ 2. , 2.4, 2.8, 3.2, 3.6, 4. ])
To create an identity matrix use
identity()
>>> Z = np.identity(2)
>>> Z
array([[ 1., 0.],
[ 0., 1.]])
Array Indexing
For a flat array, indexing is the same as Python sequences:
>>> Z = np.linspace(1, 2, 5)
>>> Z
array([ 1. , 1.25, 1.5 , 1.75, 2. ])
>>> Z[0]
1.0
>>> Z[0:2]
array([ 1. , 1.25])
>>> Z[-1]
2.0
For 2D arrays the syntax is as follows:
>>> Z = np.array([[1, 2], [3, 4]])
>>> Z
array([[1, 2],
[3, 4]])
>>> Z[0, 0]
1
>>> Z[0, 1]
2
And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows
>>> Z[0,:] # First row
array([1, 2])
>>> Z[:,1] # Second column
array([2, 4])
NumPy arrays of integers can be used to extract elements:
>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> indices = np.array((0, 2, 3))
>>> Z[indices]
array([ 2. , 3. , 3.5])
We can also use an array of
dtype bool
to extract elements>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> d = np.array([0, 1, 1, 0, 0], dtype=bool)
>>> d
array([False, True, True, False, False], dtype=bool)
>>> Z[d]
array([ 2.5, 3. ])
An aside: all elements of an array can be set equal to one number using slice notation:
>>> Z = np.empty(3)
>>> Z
array([ 3.45740277e-269, -6.35113306e-040, 5.88173752e-313])
>>> Z[:] = 42
>>> Z
array([ 42., 42., 42.])
Comparisons
In general, comparisons are done elementwise:
>>> Z = np.array([2, 3])
>>> Y = np.array([2, 3])
>>> Z == Y
array([ True, True], dtype=bool)
>>> Y[0] = 5
>>> Z == Y
array([False, True], dtype=bool)
>>> Z != Y
array([ True, False], dtype=bool)
The situation is similar for
>
, <
, >=
and <=
>>> Z > Y
array([False, False], dtype=bool)
>>> Z < Y
array([ True, False], dtype=bool)
>>> Z >= Y
array([False, True], dtype=bool)
>>> Z <= Y
array([ True, True], dtype=bool)
We can also do comparisons against scalars:
>>> Z = np.linspace(0, 10, 5)
>>> Z
array([ 0. , 2.5, 5. , 7.5, 10. ])
>>> Z > 3
array([False, False, True, True, True], dtype=bool)
This is particularly useful for conditional extraction
>>> b = Z > 3
>>> b
array([False, False, True, True, True], dtype=bool)
>>> Z[b]
array([ 5. , 7.5, 10. ]) # All elements of Z that are > 3
Of course we can do this in one step
>>> Z[Z > 3]
array([ 5. , 7.5, 10. ])
Algebraic Operations in NumPy
The algebraic operators
+
, -
, *
, /
and **
all act elementwise on arrays>>> A = np.array([1, 2, 3, 4])
>>> B = np.array([5, 6, 7, 8])
>>> A + B
array([ 6, 8, 10, 12])
>>> A * B
array([ 5, 12, 21, 32])
We can also add a scalar to each element as follows
>>> A + 10
array([11, 12, 13, 14])
Scalar multiplication is similar
>>> A = np.array([1, 2, 3, 4])
>>> A * 10
array([10, 20, 30, 40])
The two dimensional arrays follow the same general rules:
>>> A = np.ones((2, 2))
>>> B = np.ones((2, 2))
>>> A + B
array([[ 2., 2.],
[ 2., 2.]])
>>> A + 10
array([[ 11., 11.],
[ 11., 11.]])
>>> A * B
array([[ 1., 1.],
[ 1., 1.]])
Note that
A * B
is not the matrix product, it is an elementwise product
To do matrix multiplication we use the
dot()
function>>> A
array([[ 1., 1.],
[ 1., 1.]])
>>> B
array([[ 1., 1.],
[ 1., 1.]])
>>> np.dot(A, B)
array([[ 2., 2.],
[ 2., 2.]])
With
dot()
we can take the inner product of two flat arrays>>> A = np.array([1, 2])
>>> B = np.array([10, 20])
>>> np.dot(A, B) # Returns a scalar in this case
50
We can use
dot()
with a Python list or tuple>>> A = np.empty((2, 2))
>>> A
array([[ -2.41124330e-039, 0.00000000e+000],
[ 2.12199579e-314, 9.88131292e-324]])
>>> np.dot(A, (0, 1))
array([ 0.00000000e+000, 9.88131292e-324])
Here
dot()
knows we are postmultiplying, so (0, 1)
is treated as a column vector
I could go on but you get the general idea...
Array Methods
Here are some useful methods of the
ndarray
class>>> A = np.array((4, 3, 2, 1))
>>> A
array([4, 3, 2, 1])
>>> A.sort() # Sorts A in place
>>> A
array([1, 2, 3, 4])
>>> A.sum() # Sum
10
>>> A.mean() # Mean
2.5
>>> A.max() # Max
4
>>> A.argmax() # Returns the index of the maximal element
3
>>> A.cumsum() # Cumulative sum of the elements of A
array([ 1, 3, 6, 10])
>>> A.cumprod() # Cumulative product of the elements of A
array([ 1, 2, 6, 24])
>>> A.var() # Variance
1.25
>>> A.std() # Standard deviation
1.1180339887498949
>>> A.shape = (2, 2)
>>> A.T # Equivalent to A.transpose()
array([[1, 3],
[2, 4]])
A slightly more complicated method is
searchsorted()
- Suppose that
Z
is a nondecreasing array, then Z.searchsorted(a)
returns index of firstz
inZ
such thatz >= a
>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> Z.searchsorted(2.2)
1
>>> Z.searchsorted(2.5)
1
>>> Z.searchsorted(2.6)
2
Note that these methods have equivalent functions in the NumPy namespace
>>> A = np.array((4, 3, 2, 1))
>>> np.sum(A) # Use NumPy's sum(). The built-in sum() is slower.
10
>>> np.mean(A)
2.5
>>> np.sort(A)
array([1, 2, 3, 4])
Exercises
Problem 1
Consider evaluating the polynomial
at some x given the vector of coefficients
We can do this using
np.poly1d
, but for the sake of the exercise don't use this class
We also want to avoid
for
and while
loops, which are slow
Write a function which takes the vector of coefficients and a number x, and returns the value p(x)
- Hint: Use
np.cumprod()
Solution
import numpy as np
def p(coef, x):
X = np.empty(len(coef))
X[0] = 1
X[1:] = x
Y = np.cumprod(X) # Y = [1, x, x**2,...]
return np.dot(coef, Y)
0 comments:
Post a Comment