NumPy
NumPy is a first class library for numerical programming
Numerical libraries such as NumPy provide code for common numerical operations
- Creating and manipulating arrays of numbers
- sums, means, matrix operations
- Generating random numbers, etc.
In non-compiled languages such as Python, MATLAB, etc., there is an extra value:
These languages are inherently slower than compiled languages like C and Fortran
Using numerical libraries, individual operations on large arrays can be shifted out to fast compiled code
Introduction to NumPy
Used by scientists all around the world
- Engineering, biology, astronomy, geosciences, artificial intelligence, etc.
Mature, fast and and stable
NumPy can be installed individually or as part of SciPy
- I recommend the latter as we will be using SciPy soon
- Follow the download instructions at scipy.org
NumPy provides the definition of a numerical array class, plus some utilities
- Arrays are implemented in C
- Array operations use highly optimized, compiled C code
The utilities are Python interfaces to fast numerical routines compiled from C and Fortran
The rest of the lecture assumes that NumPy has been imported as
np>>> import numpy as np
Although this lecture cannot cover all the details of NumPy, remember
- The docstrings are your friends, so use them
- In the Python shell
help(np.poly1d) - In an IPython shell
np.poly1d?
- In the Python shell
NumPy Arrays
Let's now turn to the basics of NumPy arrays
Internally, a NumPy array is an instance of
numpy.ndarray (np.ndarray)
Like a native Python list, except that
- Data
must be homogeneous (all elements of the same type) - These types must be one of the data types (
dtypes) provided by NumPy
Data Types
Some dtypes:
| bool | 8 bit True or False |
| int32 | 32 bit integer |
| int64 | 64 bit integer |
| float32 | 32 bit floating point number |
| float64 | 64 bit floating point number |
There are also dtypes to represent complex numbers, unsigned integers, etc
At present, the default dtype for arrays is
float64- When you create arrays, they are of this type unless otherwise specified
Here's an example
>>> Z = np.zeros(10)
>>> Z
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> type(Z)
<type 'numpy.ndarray'>
>>> type(Z[0])
<type 'numpy.float64'>
zeros() is a NumPy function that returns an ndarray of zeros
If we had wanted to use integers we could have specified as follows:
>>> Z = np.zeros(10, dtype=int) # Here 'int' is equivalent to 'np.int32'
>>> Z
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
There's no problem mixing native Python numeric types and NumPy
dtypes:>>> Z = np.zeros(1)
>>> Z
array([ 0.])
>>> Z[0] = Z[0] + 1 # Add a Python int to a NumPy float64
>>> Z
array([ 1.]) # The int is upcast to a float64 before addition
>>> type(Z[0])
<type 'numpy.float64'>
Shape and Dimension
Suppose we create an array such as
>>> Z = np.zeros(10)
At present,
Z has no dimension:- It is neither a row vector nor a column vector
- We say that the array is "flat"
The dimension is recorded in the
shape attribute, which is a tuple>>> Z.shape
(10,)
The shape tuple has only one element, which is the length of the array
- Recall that tuples with one element have a comma after that element
To give it dimension, we need to change the
shape attribute>>> Z.shape = (10, 1) # Column vector: 10 rows and 1 column
>>> Z
array([[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.]])
>>> Z = np.zeros(4)
>>> Z.shape = (2, 2)
>>> Z
array([[ 0., 0.],
[ 0., 0.]])
In the last case, we could also pass a tuple to the
zeros() function>>> Z = np.zeros((2, 2))
>>> Z.shape
(2, 2)
>>> Z
array([[ 0., 0.],
[ 0., 0.]])
Creating Arrays
NumPy arrays can be created from Python lists, tuples, etc. using
array() or asarray()>>> Z = np.array([10, 20]) # ndarray from Python list
>>> Z
array([10, 20])
>>> type(Z)
<type 'numpy.ndarray'>
>>> Z = np.array((10, 20), dtype=float) # Here 'float' is equivalent to 'np.float64'
>>> Z
array([ 10., 20.])
>>> Z = np.array([[1, 2], [3, 4]]) # 2D array from a list of lists
>>> Z
array([[1, 2],
[3, 4]])
>>> Z.shape
(2, 2)
To read in the data from a text file use
loadtxt()
Here is the text file
test.txt1 2
3 4
Now
>>> Z = np.loadtxt('test.txt')
>>> Z
array([[ 1., 2.],
[ 3., 4.]])
In addition there are utility functions for creating standard arrays
- We already met
zeros() - Another one is
empty()
>>> Z = np.empty(3)
>>> Z
array([ -2.41128814e-039, 1.48539705e-313, -8.30185262e-040])
What are those numbers, by the way?
- Python allocates 3 contiguous pieces of memory (RAM), each with 64 bits
- Contents of those memory slots is interpreted as a
float64
To generate an array of ones, use
ones():>>> Z = np.ones((4, 4))
>>> Z
array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
To set up a grid of evenly spaced numbers use
linspace()>>> Z = np.linspace(2, 4, 5) # From 2 to 4, with 5 elements
>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> Z = np.linspace(2, 4, 6) # From 2 to 4, with 6 elements
>>> Z
array([ 2. , 2.4, 2.8, 3.2, 3.6, 4. ])
To create an identity matrix use
identity()>>> Z = np.identity(2)
>>> Z
array([[ 1., 0.],
[ 0., 1.]])
Array Indexing
For a flat array, indexing is the same as Python sequences:
>>> Z = np.linspace(1, 2, 5)
>>> Z
array([ 1. , 1.25, 1.5 , 1.75, 2. ])
>>> Z[0]
1.0
>>> Z[0:2]
array([ 1. , 1.25])
>>> Z[-1]
2.0
For 2D arrays the syntax is as follows:
>>> Z = np.array([[1, 2], [3, 4]])
>>> Z
array([[1, 2],
[3, 4]])
>>> Z[0, 0]
1
>>> Z[0, 1]
2
And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows
>>> Z[0,:] # First row
array([1, 2])
>>> Z[:,1] # Second column
array([2, 4])
NumPy arrays of integers can be used to extract elements:
>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> indices = np.array((0, 2, 3))
>>> Z[indices]
array([ 2. , 3. , 3.5])
We can also use an array of
dtype bool to extract elements>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> d = np.array([0, 1, 1, 0, 0], dtype=bool)
>>> d
array([False, True, True, False, False], dtype=bool)
>>> Z[d]
array([ 2.5, 3. ])
An aside: all elements of an array can be set equal to one number using slice notation:
>>> Z = np.empty(3)
>>> Z
array([ 3.45740277e-269, -6.35113306e-040, 5.88173752e-313])
>>> Z[:] = 42
>>> Z
array([ 42., 42., 42.])
Comparisons
In general, comparisons are done elementwise:
>>> Z = np.array([2, 3])
>>> Y = np.array([2, 3])
>>> Z == Y
array([ True, True], dtype=bool)
>>> Y[0] = 5
>>> Z == Y
array([False, True], dtype=bool)
>>> Z != Y
array([ True, False], dtype=bool)
The situation is similar for
>, <, >= and <=>>> Z > Y
array([False, False], dtype=bool)
>>> Z < Y
array([ True, False], dtype=bool)
>>> Z >= Y
array([False, True], dtype=bool)
>>> Z <= Y
array([ True, True], dtype=bool)
We can also do comparisons against scalars:
>>> Z = np.linspace(0, 10, 5)
>>> Z
array([ 0. , 2.5, 5. , 7.5, 10. ])
>>> Z > 3
array([False, False, True, True, True], dtype=bool)
This is particularly useful for conditional extraction
>>> b = Z > 3
>>> b
array([False, False, True, True, True], dtype=bool)
>>> Z[b]
array([ 5. , 7.5, 10. ]) # All elements of Z that are > 3
Of course we can do this in one step
>>> Z[Z > 3]
array([ 5. , 7.5, 10. ])
Algebraic Operations in NumPy
The algebraic operators
+, -, *, / and ** all act elementwise on arrays>>> A = np.array([1, 2, 3, 4])
>>> B = np.array([5, 6, 7, 8])
>>> A + B
array([ 6, 8, 10, 12])
>>> A * B
array([ 5, 12, 21, 32])
We can also add a scalar to each element as follows
>>> A + 10
array([11, 12, 13, 14])
Scalar multiplication is similar
>>> A = np.array([1, 2, 3, 4])
>>> A * 10
array([10, 20, 30, 40])
The two dimensional arrays follow the same general rules:
>>> A = np.ones((2, 2))
>>> B = np.ones((2, 2))
>>> A + B
array([[ 2., 2.],
[ 2., 2.]])
>>> A + 10
array([[ 11., 11.],
[ 11., 11.]])
>>> A * B
array([[ 1., 1.],
[ 1., 1.]])
Note that
A * B is not the matrix product, it is an elementwise product
To do matrix multiplication we use the
dot() function>>> A
array([[ 1., 1.],
[ 1., 1.]])
>>> B
array([[ 1., 1.],
[ 1., 1.]])
>>> np.dot(A, B)
array([[ 2., 2.],
[ 2., 2.]])
With
dot() we can take the inner product of two flat arrays>>> A = np.array([1, 2])
>>> B = np.array([10, 20])
>>> np.dot(A, B) # Returns a scalar in this case
50
We can use
dot() with a Python list or tuple>>> A = np.empty((2, 2))
>>> A
array([[ -2.41124330e-039, 0.00000000e+000],
[ 2.12199579e-314, 9.88131292e-324]])
>>> np.dot(A, (0, 1))
array([ 0.00000000e+000, 9.88131292e-324])
Here
dot() knows we are postmultiplying, so (0, 1) is treated as a column vector
I could go on but you get the general idea...
Array Methods
Here are some useful methods of the
ndarray class>>> A = np.array((4, 3, 2, 1))
>>> A
array([4, 3, 2, 1])
>>> A.sort() # Sorts A in place
>>> A
array([1, 2, 3, 4])
>>> A.sum() # Sum
10
>>> A.mean() # Mean
2.5
>>> A.max() # Max
4
>>> A.argmax() # Returns the index of the maximal element
3
>>> A.cumsum() # Cumulative sum of the elements of A
array([ 1, 3, 6, 10])
>>> A.cumprod() # Cumulative product of the elements of A
array([ 1, 2, 6, 24])
>>> A.var() # Variance
1.25
>>> A.std() # Standard deviation
1.1180339887498949
>>> A.shape = (2, 2)
>>> A.T # Equivalent to A.transpose()
array([[1, 3],
[2, 4]])
A slightly more complicated method is
searchsorted()- Suppose that
Zis a nondecreasing array, then Z.searchsorted(a)returns index of firstzinZsuch thatz >= a
>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> Z.searchsorted(2.2)
1
>>> Z.searchsorted(2.5)
1
>>> Z.searchsorted(2.6)
2
Note that these methods have equivalent functions in the NumPy namespace
>>> A = np.array((4, 3, 2, 1))
>>> np.sum(A) # Use NumPy's sum(). The built-in sum() is slower.
10
>>> np.mean(A)
2.5
>>> np.sort(A)
array([1, 2, 3, 4])
Exercises
Problem 1
Consider evaluating the polynomial
at some x given the vector of coefficients
We can do this using
np.poly1d, but for the sake of the exercise don't use this class
We also want to avoid
for and while loops, which are slow
Write a function which takes the vector of coefficients and a number x, and returns the value p(x)
- Hint: Use
np.cumprod()
Solution
import numpy as np
def p(coef, x):
X = np.empty(len(coef))
X[0] = 1
X[1:] = x
Y = np.cumprod(X) # Y = [1, x, x**2,...]
return np.dot(coef, Y)
0 comments:
Post a Comment