Accessing Data Along Multiple Dimensions in an Array
In this section, we will:
Define the “dimensionality” of an array.
Discuss the usefulness of ND-arrays.
Introduce the indexing and slicing scheme for accessing a multi-dimensional array’s contents.
We will encounter arrays of varying dimensionalities:
# A 0-D array
np.array(8)
# A 1-D array, shape-(3,)
np.array([2.3, 0.1, 9.1])
# A 2-D array, shape-(3, 2)
np.array([[93, 95],
          [84, 100],
          [99, 87]])
# A 3-D array, shape-(2, 2, 2)
np.array([[[0, 1],
           [2, 3]],
          [[4, 5],
           [6, 7]]])
Similar to Python’s sequences, we use 0-based indices and slicing to access the content of an array. However, we must specify an index/slice for each dimension of an array:
>>> import numpy as np
# A 3-D array
>>> x = np.array([[[0, 1],
...                [2, 3]],
...
...               [[4, 5],
...                [6, 7]]])
# get: sheet-0, both rows, flip order of columns
>>> x[0, :, ::-1]
array([[1, 0],
       [3, 2]])
One-dimensional Arrays
Let’s begin our discussion by constructing a simple ND-array containing three floating-point numbers.
>>> simple_array = np.array([2.3, 0.1, 9.1])
This array supports the same indexing scheme as Python’s sequences (lists, tuples, and strings):
+------+------+------+
| 2.3  | 0.1  | 9.1  |
+------+------+------+
   0      1      2
  -3     -2     -1
The first row of numbers gives the index of each entry, 0 through 2; the second row gives the corresponding negative indices. The slice from \(i\) to \(j\) returns an array containing the entries from index \(i\) up to, but not including, index \(j\):
>>> simple_array[0]
2.3
>>> simple_array[-2]
0.1
>>> simple_array[1:3]
array([ 0.1, 9.1])
>>> simple_array[3]
IndexError: index 3 is out of bounds for axis 0 with size 3
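As a quick check of the claim that a 1-D array follows the same indexing scheme as Python's sequences, the sketch below compares a list and an array built from the same numbers. The names `values`, `same_entry`, `same_slice`, and `raised` are introduced here only for illustration.

```python
import numpy as np

values = [2.3, 0.1, 9.1]
simple_array = np.array(values)

# integer indices and slices select the same entries from both objects
same_entry = values[-1] == simple_array[-1]
same_slice = values[1:3] == simple_array[1:3].tolist()
print(same_entry, same_slice)  # True True

# an out-of-bounds index raises IndexError for arrays, as it does for lists
try:
    simple_array[3]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```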
Given this indexing scheme, only one integer is needed to specify a unique entry in the array. Similarly, only one slice is needed to uniquely specify a subsequence of entries in the array. For this reason, we say that this is a 1-dimensional array. In general, the dimensionality of an array specifies the number of indices that are required to uniquely specify one of its entries.
Definition:
The dimensionality of an array specifies the number of indices that are required to uniquely specify one of its entries.
This definition of dimensionality is common far beyond NumPy; one must use three numbers to uniquely specify a point in physical space, which is why it is said that space consists of three dimensions.
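NumPy exposes this number directly through an array's `ndim` attribute. The following sketch checks it against the arrays listed at the start of this section; the variable names are chosen here for illustration.

```python
import numpy as np

# arrays of increasing dimensionality
zero_d = np.array(8)
one_d = np.array([2.3, 0.1, 9.1])
two_d = np.array([[93, 95], [84, 100], [99, 87]])

# `ndim` reports how many indices are needed to reach a single entry
dims = [arr.ndim for arr in (zero_d, one_d, two_d)]
print(dims)         # [0, 1, 2]

# the shape has one entry per dimension
print(two_d.shape)  # (3, 2)
```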
Two-dimensional Arrays
Before proceeding further down the path of high-dimensional arrays, let’s briefly consider a very simple dataset where the ability to access the data along multiple dimensions is manifestly desirable. Consider the following table from a gradebook:
          Exam 1 (%)   Exam 2 (%)
Ashley    93           95
Brad      84           100
Cassie    99           87
This dataset contains 6 grade-values. It is almost immediately clear that storing these in a 1-dimensional array is not ideal:
# using a 1-dimensional array to store the grades
>>> grades = np.array([93, 95, 84, 100, 99, 87])
While no data has been lost, accessing this data using a single index is less than convenient; we want to be able to specify both the student and the exam when accessing a grade. It is natural to ascribe two dimensions to this data. Let’s construct a 2D array containing these grades:
# using a 2-dimensional array to store the grades
>>> grades = np.array([[93, 95],
...                    [84, 100],
...                    [99, 87]])
NumPy is able to see the repeated structure among the list-of-lists-of-numbers passed to np.array, and resolve the two dimensions of data, which we deem the ‘student’ dimension and the ‘exam’ dimension, respectively.
Axis vs Dimension:
Although NumPy does formally recognize the concept of dimensionality precisely in the way that it is discussed here, its documentation refers to an individual dimension of an array as an axis. Thus you will see “axes” (pronounced “aksēz”) used in place of “dimensions”; however, they mean the same thing.
NumPy specifies the row-axis (students) of a 2D array as “axis-0” and the column-axis (exams) as “axis-1”. You must now provide two indices, one for each axis (dimension), to uniquely specify an element in this 2D array; the first number specifies an index along axis-0, the second specifies an index along axis-1. The zero-based indexing schema that we reviewed earlier applies to each axis of the ND-array:
                  -- axis-1 ->
                    -2     -1
                     0      1
         |        +------+------+
         |  -3, 0 |  93  |  95  |
         |        +------+------+
  axis-0 |  -2, 1 |  84  | 100  |
         |        +------+------+
         |  -1, 2 |  99  |  87  |
         V        +------+------+
Because grades has three entries along axis-0 and two entries along axis-1, it has a “shape” of (3, 2).
>>> grades.shape
(3, 2)
Integer Indexing
Thus, if we want to access Brad’s (item-1 along axis-0) score for Exam 1 (item-0 along axis-1) we simply specify:
# providing two numbers to access an element
# in a 2D-array
>>> grades[1, 0] # Brad's score on Exam 1
84
# negative indices work as with lists/tuples/strings
>>> grades[-2, 0] # Brad's score on Exam 1
84
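To see what the two-index interface saves us, the sketch below retrieves the same grade from the 1-dimensional layout used earlier, where the position must be computed by hand. The names `flat_grades`, `num_exams`, and `flat_index` are introduced here only for illustration.

```python
import numpy as np

flat_grades = np.array([93, 95, 84, 100, 99, 87])
grades = np.array([[93, 95],
                   [84, 100],
                   [99, 87]])

num_exams = 2         # number of entries along the 'exam' dimension
student, exam = 1, 0  # Brad, Exam 1

# with the 1D layout we must compute the position ourselves
flat_index = student * num_exams + exam
print(flat_grades[flat_index])  # 84

# with the 2D layout we state the student and the exam directly
print(grades[student, exam])    # 84
```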
Slice Indexing
We can also use slices to access subsequences of our data. Suppose we want the scores of all the students for Exam 2. We can slice from 0 through 3 along axis-0 (refer to the indexing diagram in the previous section) to include all the students, and specify index 1 on axis-1 to select Exam 2:
>>> grades[0:3, 1] # Exam 2 scores for all students
array([ 95, 100, 87])
As with Python sequences, you can specify an “empty” slice to include all possible entries along an axis, by default: grades[:, 1] is equivalent to grades[0:3, 1], in this instance. More generally, withholding either the ‘start’ or ‘stop’ value in a slice will result in the use of the smallest or largest valid index, respectively:
>>> grades[1:, 1] # equivalent to `grades[1:3, 1]`
array([ 100, 87])
>>> grades[:, :1] # equivalent to `grades[0:3, 0:1]`
array([[93],
[84],
[99]])
The output of grades[:, :1] might look somewhat funny. Because the axis-1 slice only includes one column of numbers, the shape of the resulting array is (3, 1). 0 is thus the only valid (non-negative) index for axis-1, since there is only one column to specify in the array.
You can also supply a “step” value to the slice. grades[::-1, :] will return the array of grades with the student-axis flipped (reverse-alphabetical order).
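The difference between an integer index and a length-1 slice, and the effect of a negative step, can be checked with a short sketch. The name `flipped` is introduced here for illustration.

```python
import numpy as np

grades = np.array([[93, 95],
                   [84, 100],
                   [99, 87]])

# an integer index removes the axis; a length-1 slice keeps it
print(grades[:, 0].shape)   # (3,)
print(grades[:, :1].shape)  # (3, 1)

# a step of -1 along axis-0 reverses the order of the students
flipped = grades[::-1, :]
print(flipped[0])           # [99 87]  (Cassie now comes first)
```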
Negative Indices
As indicated above, negative indices are valid too and are quite useful. If we want to access the scores of the latest exam for all of the students, we can specify:
# using a negative index and a slice
>>> grades[:, -1] # Latest exam scores (Exam 2), for all students
array([ 95, 100, 87])
Note that the value of using the negative index is that it will always provide you with the latest exam score; you need not check how many exams the students have taken.
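A minimal sketch of this benefit: if a third exam is recorded, the same expression picks it up without modification. The Exam 3 scores below are invented for this illustration, and `np.column_stack` is simply one convenient way to append a column.

```python
import numpy as np

grades = np.array([[93, 95],
                   [84, 100],
                   [99, 87]])

# hypothetical Exam 3 scores, made up for this example
exam_3 = np.array([88, 91, 79])

# append Exam 3 as a new column along axis-1
updated = np.column_stack([grades, exam_3])

print(grades[:, -1])   # [ 95 100  87]  (latest exam is Exam 2)
print(updated[:, -1])  # [88 91 79]     (latest exam is now Exam 3)
```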
Supplying Fewer Indices Than Dimensions
What happens if we only supply one index to our array? It may be surprising that grades[0] does not throw an error since we are specifying only one index to access data from a 2-dimensional array. Instead, NumPy will return all of the exam scores for student-0 (Ashley):
>>> grades[0]
array([ 93, 95])
This is because NumPy will automatically insert trailing slices for you if you don’t provide as many indices as there are dimensions for your array. grades[0] was treated as grades[0, :].
Suppose you have an \(N\)-dimensional array, and only provide \(j\) indices for the array; NumPy will automatically insert \(N-j\) trailing slices for you. In the case that \(N=5\) and \(j=3\), d5_array[0, 0, 0] is treated as d5_array[0, 0, 0, :, :].
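The trailing-slice rule can be verified on a concrete 5D array. In this sketch the shape (2, 3, 2, 4, 5) is arbitrary, and the array is built with `np.arange` and `reshape`, which is introduced later in this section.

```python
import numpy as np

# a 5D array; the particular shape is an arbitrary choice
d5_array = np.arange(2 * 3 * 2 * 4 * 5).reshape(2, 3, 2, 4, 5)

short = d5_array[0, 0, 0]        # three indices supplied
full = d5_array[0, 0, 0, :, :]   # the two trailing slices written out

print(short.shape)                  # (4, 5)
print(np.array_equal(short, full))  # True
```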
Thus far, we have discussed some rules for accessing data in arrays, all of which fall into the category that is designated “basic indexing” by the NumPy documentation. We will discuss the details of basic indexing and of “advanced indexing”, in full, in a later section. Note, however, that all of the slicing reviewed here produces a “view” of the original array. That is, no data is copied when you index into an array using integer indices and/or slices. Recall that slicing lists and tuples does produce copies of the data.
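The following sketch contrasts the two behaviors. It uses `np.shares_memory` to check whether two arrays overlap in memory; the names `exam_2`, `scores`, and `first_two` are introduced here for illustration.

```python
import numpy as np

grades = np.array([[93, 95],
                   [84, 100],
                   [99, 87]])

exam_2 = grades[:, 1]                    # basic indexing: a view
print(np.shares_memory(grades, exam_2))  # True

# writing through the view changes the original array
exam_2[0] = 0
print(grades[0, 1])                      # 0

# slicing a list copies, so the original list is untouched
scores = [93, 95, 84]
first_two = scores[:2]
first_two[0] = 0
print(scores[0])                         # 93
```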
FYI:
Keeping track of the meaning of an array’s various dimensions can quickly become unwieldy when working with real datasets. xarray is a Python library that provides functionality comparable to NumPy, but allows users to provide explicit labels for an array’s dimensions; that is, you can name each dimension. Using an xarray to select Brad’s scores could look like grades.sel(student='Brad'), for instance. This is a valuable library to look into at your leisure.
N-dimensional Arrays
Let’s build up some intuition for arrays with a dimensionality higher than 2. The following code creates a 3-dimensional array:
# a 3D array, shape-(2, 2, 2)
>>> d3_array = np.array([[[0, 1],
... [2, 3]],
...
... [[4, 5],
... [6, 7]]])
You can think of axis-0 as denoting which of the 2x2 “sheets” to select from. Then axis-1 specifies the row along the sheets, and axis-2 the column within the row:
Depicting the layout of a 3D array
sheet 0:
       [0, 1]
       [2, 3]
sheet 1:
       [4, 5]
       [6, 7]

   |        -- axis-2 ->
   |    |
   |  axis-1 [0, 1]
   |    |    [2, 3]
   |    V
axis-0
   |        -- axis-2 ->
   |    |
   |  axis-1 [4, 5]
   |    |    [6, 7]
   V    V
Thus d3_array[0, 1, 0] specifies the element residing in sheet-0, at row-1 and column-0:
# retrieving a single element from a 3D-array
>>> d3_array[0, 1, 0]
2
d3_array[:, 0, 0] specifies the elements in row-0 and column-0 of both sheets:
# retrieving a 1D sub-array from a 3D-array
>>> d3_array[:, 0, 0]
array([0, 4])
d3_array[1], which recall is shorthand for d3_array[1, :, :], selects both rows and both columns of sheet-1:
# retrieving a 2D sub-array from a 3D-array
>>> d3_array[1]
array([[4, 5],
[6, 7]])
In four dimensions, one can think of “stacks of sheets with rows and columns” where axis-0 selects the stack of sheets you are working with, axis-1 chooses the sheet, axis-2 chooses the row, and axis-3 chooses the column. Extrapolating to higher dimensions (“collections of stacks of sheets …”) continues in the same tedious fashion.
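A small sketch of the “stacks of sheets” picture, using an array whose entries simply count upward so each position is easy to verify. The shape (2, 3, 2, 2) and the name `d4_array` are choices made for this example.

```python
import numpy as np

# 2 stacks, each holding 3 sheets of 2 rows and 2 columns
d4_array = np.arange(24).reshape(2, 3, 2, 2)

# stack-1, sheet-2, row-0, column-1
print(d4_array[1, 2, 0, 1])  # 21

# sheet-0 from both stacks: the sheet axis is removed
print(d4_array[:, 0].shape)  # (2, 2, 2)
```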
Reading Comprehension: Multidimensional Indexing
Given the 3D, shape-(3, 3, 3) array:
>>> arr = np.array([[[ 0, 1, 2],
... [ 3, 4, 5],
... [ 6, 7, 8]],
...
... [[ 9, 10, 11],
... [12, 13, 14],
... [15, 16, 17]],
...
... [[18, 19, 20],
... [21, 22, 23],
... [24, 25, 26]]])
Index into the array to produce the following results:
#1
array([[ 2, 5, 8],
[11, 14, 17],
[20, 23, 26]])
#2
array([[ 3, 4, 5],
[12, 13, 14]])
#3
array([2, 5])
#4
array([[11, 10, 9],
[14, 13, 12],
[17, 16, 15]])
Zero-dimensional Arrays
A zero-dimensional array is simply a single number (a.k.a. a scalar value):
# creating a 0-dimensional array
>>> x = np.array(15.2)
This is not equivalent to a length-1 1D-array: np.array([15.2]). According to our definition of dimensionality, zero numbers are required to index into a 0D array as it is unnecessary to provide an identifier for a standalone number. Thus you cannot index into a 0D array.
# you cannot index into a 0D array
>>> x[0]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-2f755f117ac9> in <module>()
----> 1 x[0]
IndexError: too many indices for array
You must use the syntax arr.item() to retrieve the numerical entry from a 0D array:
>>> x.item()
15.2
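The distinction between a 0D array and a length-1 1D array shows up directly in their `ndim` and `shape` attributes, as the following sketch illustrates. The names `y` and `value` are introduced here for illustration.

```python
import numpy as np

x = np.array(15.2)      # 0D array
y = np.array([15.2])    # 1D array with one entry

print(x.ndim, x.shape)  # 0 ()
print(y.ndim, y.shape)  # 1 (1,)

# `item` returns a built-in Python number
value = x.item()
print(type(value))      # <class 'float'>

# the 1D array requires exactly one index
print(y[0])             # 15.2
```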
Zero-dimensional arrays do not show up in real applications very often. They are, however, important from the point of view of NumPy being self-consistent in how it treats dimensionality in its arrays, and it is important that you are at least exposed to a 0D array and understand its nuances.
Takeaway:
Although accessing data along varying dimensions is ultimately all a matter of judicious bookkeeping (you could access all of this data from a 1-dimensional array, after all), NumPy’s ability to provide users with an interface for accessing data along dimensions is incredibly useful. It affords us an ability to impose intuitive, abstract structure on our data.
Manipulating Arrays
NumPy provides an assortment of functions that allow us to manipulate the way that an array’s data can be accessed. These permit us to reshape an array, change its dimensionality, and swap the positions of its axes:
>>> x = np.array([[ 1, 2, 3, 4],
... [ 5, 6, 7, 8],
... [ 9, 10, 11, 12]])
# reshaping an array
>>> x.reshape(3, 2, 2)
array([[[ 1, 2],
[ 3, 4]],
[[ 5, 6],
[ 7, 8]],
[[ 9, 10],
[11, 12]]])
# Transposing an array: reversing
# the ordering of its axes. This interchanges
# the rows and columns of `x`
>>> x.transpose()
array([[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11],
[ 4, 8, 12]])
A complete listing of the available array-manipulation functions can be found in the official NumPy documentation. Among these functions, the reshape function is especially useful.
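For arrays with more than two dimensions, it can help to see how transposing moves entries around. The sketch below uses a shape-(2, 3, 4) array chosen for illustration; passing an explicit axis ordering to `transpose` is shown as one option among the available manipulation functions.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# with no arguments, transpose reverses the order of the axes
print(x.transpose().shape)         # (4, 3, 2)

# an explicit ordering of the axes can also be given
print(x.transpose(1, 0, 2).shape)  # (3, 2, 4)

# the entry at [i, j, k] is found at [k, j, i] after a full transpose
print(x[1, 2, 3] == x.transpose()[3, 2, 1])  # True
```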
Introducing the reshape Function
The reshape function allows you to change the dimensionality and axis-layout of a given array. This adjusts the indexing interface used to access the array’s underlying data, as was discussed earlier in this module. Let’s take a shape-(6,) array, and reshape it to a shape-(2, 3) array:
>>> import numpy as np
>>> x = np.array([0, 1, 2, 3, 4, 5])
# reshape a shape-(6,) array into a shape-(2, 3) array
>>> x.reshape(2, 3)
array([[0, 1, 2],
[3, 4, 5]])
You can also conveniently reshape an array by “setting” its shape via assignment:
# equivalent to: x = x.reshape(2, 3)
>>> x.shape = (2, 3)
Of course, the size of the initial array must match the size of the to-be reshaped array:
# an array with 5 numbers cannot be reshaped
# into a (3, 2) array
>>> np.array([0, 1, 2, 3, 4]).reshape(3, 2)
ValueError: total size of new array must be unchanged
Multidimensional arrays can be reshaped too:
# reshaping a multidimensional array
>>> x = np.array([[ 0, 1, 2, 3],
... [ 4, 5, 6, 7],
... [ 8, 9, 10, 11]])
# reshape from (3, 4) to (2, 3, 2)
>>> x.reshape(2, 3, 2)
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]])
Because the size of an input array and the resulting reshaped array must agree, you can specify one of the dimension-sizes in the reshape function to be -1, and this will cue NumPy to compute that dimension’s size for you. For example, if you are reshaping a shape-(36,) array into a shape-(3, 4, 3) array, the following are all valid:
# Equivalent ways of specifying a reshape
# np.arange(36) produces the shape-(36,) array ([0, 1, 2, ..., 35])
np.arange(36).reshape(3, 4, 3)   # (36,) --reshape--> (3, 4, 3)
np.arange(36).reshape(3, 4, -1)  # NumPy replaces -1 with 36/(3*4) -> 3
np.arange(36).reshape(3, -1, 3)  # NumPy replaces -1 with 36/(3*3) -> 4
np.arange(36).reshape(-1, 4, 3)  # NumPy replaces -1 with 36/(4*3) -> 3
You can use -1 to specify only one dimension:
>>> np.arange(36).reshape(3, -1, -1) # this is an ambiguous specification, and thus raises an error
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-207d18d18af2> in <module>()
----> 1 np.arange(36).reshape(3, -1, -1)
ValueError: can only specify one unknown dimension
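The inferred sizes can be confirmed by inspecting the resulting shapes. The sketch below also shows that the sizes you do supply must divide the total evenly; the names `shapes` and `failed` are introduced here for illustration.

```python
import numpy as np

x = np.arange(36)

# each call leaves exactly one size for NumPy to infer
shapes = [x.reshape(3, 4, -1).shape,
          x.reshape(3, -1, 3).shape,
          x.reshape(-1, 4, 3).shape]
print(shapes)  # [(3, 4, 3), (3, 4, 3), (3, 4, 3)]

# a lone -1 flattens an array without needing to know its size
print(x.reshape(3, 4, 3).reshape(-1).shape)  # (36,)

# 36 is not divisible by 5, so no valid size can be inferred
try:
    x.reshape(5, -1)
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```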
Reshaping Does Not Make a Copy of an Array:
For all straightforward applications of reshape, NumPy does not actually create a new copy of an array’s data when performing a reshape operation. Instead, the original array and the reshaped array reference the same underlying data. The reshaped array simply provides a new index-interface for accessing said data, and is thus referred to as a “view” of the original array (more on “views” in a later section).
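A brief sketch of this behavior, using `np.shares_memory` to test whether two arrays reference overlapping data. The last lines show one case that is not “straightforward”: reshaping a transposed array into one dimension, where NumPy has to copy the data. The names `y` and `z` are introduced here for illustration.

```python
import numpy as np

x = np.arange(6)
y = x.reshape(2, 3)

# the reshaped array is a view of the same data
print(np.shares_memory(x, y))  # True
y[0, 0] = -1
print(x[0])                    # -1

# flattening the transpose cannot be expressed as a view, so a copy is made
z = y.T.reshape(6)
print(np.shares_memory(y, z))  # False
```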
Reading Comprehension Solutions
Reading Comprehension: Multidimensional Indexing
>>> arr = np.array([[[ 0, 1, 2],
... [ 3, 4, 5],
... [ 6, 7, 8]],
...
... [[ 9, 10, 11],
... [12, 13, 14],
... [15, 16, 17]],
...
... [[18, 19, 20],
... [21, 22, 23],
... [24, 25, 26]]])
#1
>>> arr[:, :, 2]
array([[ 2, 5, 8],
[11, 14, 17],
[20, 23, 26]])
#2
>>> arr[0:2, 1, :]
array([[ 3, 4, 5],
[12, 13, 14]])
#3
>>> arr[0, :2, 2]
array([2, 5])
#4
>>> arr[1, :, ::-1]
array([[11, 10, 9],
[14, 13, 12],
[17, 16, 15]])