Import: Modules and Packages

Our tour through the essentials of Python and NumPy required us to regularly make use of import statements. This allowed us to access the functions and objects that are provided by the standard library and by NumPy.

# accessing `defaultdict` from the standard library's `collections` package
from collections import defaultdict

# import the entire numpy package, giving it the alias 'np'
import numpy as np

Despite our regular use of the import statement, we have thus far swept its details under the rug. Here, we will finally pay our due diligence and discuss Python’s import system, which entails understanding the way that code can be organized into modules and packages. Modules are individual .py files from which we can import functions and objects, and packages are collections of such modules. Detailing this packaging system will not only provide us with insight into the organization of the standard library and other collections of Python code, but it will permit us to create our own packages of code.

To conclude this section, we will demonstrate the process of installing a Python package on your system; supposing that you have written your own Python package, installing it enables you to import it anywhere on your system. A brief overview will be provided of the two most popular venues for hosting Python packages to the world at large: the Python Package Index (PyPI) and Anaconda.org.

The official Python tutorial provides a nice overview of this material and dives into details that we will not concern ourselves with here.

Auto-reload:

As you follow along with this section in a Jupyter notebook include the following code at the top of you notebook:

%load_ext autoreload
%autoreload 2

Executing these “magic commands” will inform your notebook to actively reload all imported modules and packages as they are modified. If you do not execute these commands, your notebook will not “see” any changes that you have made to a module that has already been imported, unless you restart the kernel for that notebook.

Modules

A Python ‘module’ refers to a single .py file that contains function definitions and variable-assignment statements. Importing a module will execute these statements, rendering the resulting objects available via the imported module.

Let’s create our own module and import it into an interactive Python session. Open a Jupyter notebook or IPython console in a known directory on your computer. Now, with an IDE or simple text editor (not software like Microsoft Word!) create a text file named my_module.py in the same directory as you Python session. The contents of my_module.py should be:

"""
Our first Python module. This initial string is a module-level documentation string.
It is not a necessary component of the module. It is a useful way to describe the
purpose of your module.
"""

print("I am being executed!")

some_list = ["a", 1, None]

def square(x):
    return x ** 2

def cube(x):
    return x ** 3

Returning to our interactive Python session we can import this module into our session. Python is able to “find” this module because it is in the present directory - more on this later. Importing my_module.py will execute all of its code in order from top to bottom, and will produce a Python object named my_module; this is an instance of the built-in module type. Note that we do not include the .py suffix in our import statement.

# importing my_module in our interactive session
>>> import my_module
I am being executed!

# produced is a object that is an instance of the module-type
>>> my_module
<module 'my_module' from 'usr/my_dir/my_module.py'>

>>> type(my_module)
module

As expected, importing our module causes our print statement to be executed, which explains why 'I am being executed!' is printed to the console. Next, the objects some_list, square, and cube are defined as the remaining code is executed. These are made available as attributes of our module object.

# all of the variables assigned in the module are made
# available as attributes of the module object
>>> my_module.some_list
['a', 1, None]

>>> my_module.square
<function my_module.square(x)>

>>> my_module.square(2)
4

>>> my_module.cube(2)
8

It is critical to understand that this is the means by which the contents of a module is made available to the environment in which it was imported. In this vein, a good way to get to know the contents of a module is to make use of the auto-completion feature provided by IDEs and interactive consoles to list all of the attributes of a module object. The built-in help function can also be used to summarize a module’s contents:

>>> help(my_module)
Help on module my_module:

NAME
    my_module

DESCRIPTION
    Our first Python module. This initial string is a module-level documentation string.
    It is not a necessary component of the module.

FUNCTIONS
    cube(x)

    square(x)

DATA
    some_list = ['a', 1, None]

FILE
    c:\users\ryan soklaski\desktop\learning_python\python\module5_oddsandends\my_module.py

Takeaway:

A module is simply a text file named with a .py suffix, whose contents consist of Python code. A module can be imported into an interactive console environment (e.g. a Jupyter notebook) or into another module. Importing a module executes that module’s code and produces a module-type object instance. Any variables that were assigned during the import are bound as attributes to that object.

Reading Comprehension: Creating a simple module

Create a simple math module named basic_math.py. It should make available the irrational numbers \(\pi\) and \(e\), and the function deg_to_rad, which converts an angle from degrees to radians.

Next, import this module and compute \(e^{i\pi}\) and compute 45 degrees in radians.

Import Statements

Python provides a flexible framework for importing modules and specific members of a module. We have already seen in our work with NumPy that we can specify an alias in our import statement; this can be especially convenient for import modules with long names:

# the module object for numpy is returned with the variable name `np`
>>> import numpy as np
>>> np.array([2., 3.])
array([2., 3.])

In general we can perform an alias for an import via: import <module_name> as <alias_name>.

Next, the pattern from <module_name> import <thing1>, <thing2>, ... permits us to import specific objects from the module instead of the importing the entire module as a whole. Let’s import square and some_list from basic_module.py:

>>> from my_module import square, some_list
I am being executed!

>>> some_list
['a', 1, None]

>>> square(2)
4

Note that the module is still being executed in full, however instead of producing the module-instance my_module, this imort statement instead only returns the specified objects that were defined in the module.

Lastly, you can specify * to refer to all of the module’s attributes.

# import all of the contents of `my_module`
>>> from my_module import *

You can include in your module a list named __all__, which stores attribute names as strings, to restrict those attributes that are referred to by *. That is, if we included __all__ = ["cube", "some_list"] within my_module.py, then from my_module import * will only import cube and some_list, and not square.

Lastly, we can also make use of aliasing for this style of import:

>>> from my_module import cube as my_cube
>>> my_cube(2)
8

Packages

It is not unusual, when working on larger scale projects, to want to organize one’s code into several modules. For example, suppose that we are writing software for performing facial recognition. We might want to have a camera module for taking pictures, a face-detection module for storing a model-class capable of detecting faces, and a database module that is used to store and update “seen” faces. These modules can be stored in a collective package.

A Python package is a directory containing a file with the name __init__.py, along with other Python modules and subpackages (i.e. subdirectories, each with its own __init__.py file and associated modules). The __init__.py file is of special importance - it serves as the indicator that its housing directory is to be treated as a package. For example, let’s build a bare-bones package with the following directory structure:

- your current Jupyter notebook / console session
- a_dir/
    |--__init__.py

Note that your interactive Python session (e.g. Jupyter notebook) should be in active in the same directory as a_dir/. The name of the directory containing the __init__.py file is the name of the package, thus the name of this package is a_dir. Suppose the contents of __init__.py is as follows:

def sum_func(x, y):
    return x + y

def divide_func(x, y):
    return x / y

As with a module, importing this package will execute the contents of __init__.py and make available sum_func and divide_func as attributes of the resulting module object:

# importing a python package
>>> import a_dir
>>> a_dir.divide_func(1, 2)
0.5

Let’s graduate to a more fleshed out package, which contains modules and subpackages. The following package, face_detection, possesses the modules utils, database, and model. It also contains the subpackage camera, which contains the config module and the calibration module.

- face_detection/
    |-- __init__.py
    |-- utils.py
    |-- database.py
    |-- model.py
    |-- camera/
        |-- __init__.py
        |-- calibration.py
        |-- config.py

We can access the contents of these modules via <package>.<module>, <package>.<subpackage>.<module> and so on. Consider the following examples:

# importing a function from the `database` module
>>> from face_detection.database import load_database

# importing the entire `model` module
>>> from face_detection import model

# importing a function from the `camera` subpackage
>>> from face_detection.camera.config import restore_default

See that the . syntax allows us to drill progressively deeper down into modules and subpackages, relative to our top-level package.

Reading Comprehension: Packages

Suppose that we are working with a package named mail. The mail/ directory contains an __init__.py module whose contents are:

def send_mail(x):
    return x

phrase_of_the_day = "get that package delievered!"

It also contains the module delivery.py with the contents:

def get_zip():
    """just a dummy function"""
    return 871092

Create this package.
Import the send_mail function
Import the delivery module and then execute it’s get-zip function

Intra-Module Imports

Modules within a package can import from one another; suppose, for instance, that both face_detection.database and face_detection.camera.calibration both want to leverage the face_detection.utils module. Recall the layout of this package:

- face_detection/
    |-- __init__.py
    |-- utils.py
    |-- database.py
    |-- model.py
    |-- camera/
        |-- __init__.py
        |-- calibration.py
        |-- config.py

There are two import styles that can be used to facilitate these inter-module imports: absolute imports and relative imports.

Absolute Imports

The absolute import style works by specifying all modules in terms of their absolute position in relation to the topmost package. Suppose that we want to import the utils module in our model module, stated as an absolute import, this would be:

import face_detection.utils

or, using an alias

import face_detection.utils as utils

Suppose that we want to import the utils module in face_detection.camera.calibration as well; the absolute import would look identical since the absolute location of utils relative to the top-level package does not change based on where we are using this import statement.

As an additional example, to import the camera configuration module anywhere in the package, the absolute import statement would appear as:

import face_detection.camera.config

The absolute import syntax supports all of the variations the we enumerated above, such as aliased imports and from <module> import <object>.

Relative Imports

Relative imports use dots to indicate the relative position of the imported module, based on the location of the module doing the importing. For example, suppose that we want to import the utils module in our model module using a relative import statement:

from . import utils

See that . serves to represent “the current package”. .. then refers to “one package above the current one”. Thus a relative import of utils from face_detection/camera/calibration.py would look like:

from .. import utils

We use .. because utils is not in the same package as calibration.py, but is one package above it. We can also import specific contents from a module in this way:

from ..utils import some_util_func

The relative import style is more restricted than the absolute import style. It is only able to embody the from <module> import <thing1>, <thing2> form of import statements. Despite their limited form, relative imports can be nice to use if you are dealing with a project with deeply nested sub-packages.

Installing a Package

PYTHONPATH and Site-Packages

Thus far in our reading of Python packages we have had to take care that all of the modules and packages that we have written reside in the same directory as our interactive Python session. How is it that we are able to import NumPy in any session or module, without any knowledge of where that package is located? This is because we have installed NumPy, which means that the package has been placed in our “Python path”, which is indicated as PYTHONPATH.

The PYTHONPATH specifies the directories that the Python interpreter will look in when importing modules. You can check your PYTHONPATH using sys.path:

# looking up `PYTHONPATH`
>>> import sys
>>> sys.path
['',
 '/home/TerranceWasabi/miniconda3/bin',
 '/home/TerranceWasabi/miniconda3/lib/python36.zip',
 '/home/TerranceWasabi/miniconda3/lib/python3.6',
 '/home/TerranceWasabi/miniconda3/lib/python3.6/lib-dynload',
 '/home/TerranceWasabi/miniconda3/lib/python3.6/site-packages'
]

Note that the first entry in PYTHONPATH is '', meaning that the Python interpreter will first look in the current directory when trying to do an import. If it does not find any package or module that satisfies the import statement, then it will proceed to check the next entry in PYTHONPATH. This is why we took care to create our modules and packages in the same directory as our active Python session.

Now note the last directory in PYTHONPATH: site-packages. Site-packages is the target directory in which all installed Python packages are placed by default. We can import NumPy wherever we’d like because it was placed in site-packages when it was installed, and the Python interpreter will always look in site-packages when attempting to fulfill an import statement.

In lieu of printing out your PYTHONPATH, you can look up the location of your site-packages directly:

# looking up your site-packages
>>> import site
>>> site.getsitepackages()
['/home/TerranceWasabi/miniconda3/lib/python3.6/site-packages']

It must be mentioned that we are sweeping some details under the rug here. Installing NumPy does not merely entail copying its various modules and packages wholesale into site-packages. That being said, we will not dive any deeper into the technical details of package installation beyond understanding where packages are installed and thus where our Python interpreter looks to import them.

Installing Your Own Python Package

Suppose that we are happy with the work we have done on our face_detector project. We will want to install this package - placing it in our site-packages directory so that we can import it irrespective of our Python interpreter’s working directory. Here we will construct a basic setup script that will allow us to accomplish this.

We note outright that the purpose of this section is strictly to provide you with the minimum set of instructions needed to install a package. We will not be diving into what is going on under the hood at all. Please refer to An Introduction to Distutils and Packaging Your Project for a deeper treatment of this topic.

Carrying on, we will want to create a setup-script, setup.py, in the same directory as our package. That is, our directory structure should look like:

- setup.py
- face_detection/
    |-- __init__.py
    |-- utils.py
    |-- database.py
    |-- model.py
    |-- camera/
        |-- __init__.py
        |-- calibration.py
        |-- config.py

The bare bones build script for preparing your package for installation, setup.py, is as follows:

# contents of setup.py
import setuptools

setuptools.setup(
    name="face_detection",
    version="1.0",
    packages=setuptools.find_packages(),
)

If you read through the additional materials linked above, you will see that there are many more fields of optional information that can be provided in this setup script, such as the author name, any installation requirements that the package has, and more.

Armed with this script, we are ready to install our package locally on our machine! In your terminal, navigate to the directory containing this setup script and your package that it being installed. Run

python setup.py install

and voilà, your package face_detection will have been installed to site-packages. You are now free to import this package from any directory on your machine. In order to uninstall this package from your machine execute the following from your terminal:

pip uninstall face_detection

One final but important detail. The installed version of your package will no longer “see” the source code. That is, if you go on to make any changes to your code, you will have to uninstall and reinstall your package before your will see the effects system-wide. Instead you can install your package in develop mode, such that a symbolic link to your source code is placed in your site-packages. Thus any changes that you make to your code will immediately be reflected in your system-wide installation. Thus, instead of running python setup.py install, execute the following to install a package in develop mode:

python setup.py develop

pip and conda: Package Managers

Python packages can be shared worldwide. There are two widely-used Python package managers, pip and conda. pip downloads and installs packages from The Python Package Index (PyPI), whereas conda downloads and installs packages from the Anaconda Cloud. Both conda and pip are installed as part of the Anaconda distribution.

To install a package via pip, you execute

pip install <package_name>

To install a package via conda, you execute

conda install <package_name>

Both managers will install packages to your site-packages directory.

There are substantial benefits for using conda rather than pip to install packages. First and foremost, conda has a powerful “environment solver”, which tracks the inter-dependencies of Python packages. Thus it will attempt to install, upgrade, and downgrade packages as needed to accommodate your installations. Additionally, the default list of packages available via conda are curated and maintained by Continuum Analytics, the creators of Anaconda. To elucidate one of the benefits of this, installing NumPy via pip will deliver the vanilla version of NumPy to you; conda will install an mkl-optimized version of NumPy, which can execute routines substantially faster. Finally, conda also serves as an environment manager, which allows you to maintain multiple, non-conflicting environments that can house different configurations of installed Python packages and even different versions of Python itself.

That being said, there are some benefits to using pip. PyPi is accessible and easy to upload packages to; this is likely the easiest means for distributing a Python package worldwide. As such, pip provides access to a wider range of Python packages. That being said, conda can also be directed to install packages from custom channels - providing access to packages outside of the curated Anaconda distribution. This has become a popular method of installation for machine learning libraries like PyTorch and TensorFlow.

You are free to install some packages using conda and others with pip. Just take care not to accidentally install the same package with both - this can lead to a real mess.

Links to Official Documentation

Reading Comprehension Exercise Solutions:

Creating a simple module: Solution

Create a simple math module named basic_math.py. It should make available the irrational numbers \(\pi\) and \(e\), and the function deg_to_rad, which converts an angle from degrees to radians.

The contents of basic_math.py should be:

"""Basic math constants and functions"""

pi = 3.141592653589793
e = 2.718281828459045

def deg_to_rad(angle):
    return (pi / 180) * angle

>>> import basic_math

# Euler's formula: e**(i * pi) = -1
>>> basic_math.e ** (basic_math.pi * complex(0,1))
(-1+1.2246467991473532e-16j)

>>> basic_math.deg_to_rad(45)
0.7853981633974483

Packages: Solutions

Create this package.

mail/
    |-- __init__.py
    |-- delivery.py

Import the send_mail function

>>> from mail import send_mail

Import the delivery module and then execute it’s get-zip function

>>> from mail import delivery
>>> delivery.get_zip()
871092