Package management systems

Package managers install and keep track of the different software packages (and their versions) that you use within an environment. There are quite a few to choose from, for example Yum, Zypper, dpkg, and Nix (which will be mentioned briefly later in the Binder section). We're going to focus on Conda, which has a number of useful features.

What does Conda do?

Conda lets users create any number of completely separate environments and switch between them quickly and easily. For example, say a researcher has a project, Project One, with its own Conda environment made up of a set of packages and specific versions of those packages:

Package name    Version
Package A       1.5.2
Package B       2.1.10
Package C       0.7.9

Later the researcher starts Project Two in its own environment:

Package name    Version
Package B       2.1.10
Package C       1.2.4
Package D       1.5.2
Package E       3.7.1

Note that the version of Package C used in Project Two has been updated from the version used in Project One. If these project environments were not separate, the researcher would have to choose between:

  • A) Using the older version of package C forever and not benefiting from updates and bugfixes in later versions.

  • B) Installing the updated version of the package and hoping that it doesn’t impact Project One.

  • C) Installing the updated version of the package for use in Project Two then uninstalling it and reinstalling the old one whenever they need to do work on Project One. This would be extremely annoying, and is a step that risks being forgotten.

All of these options are poor, which is why Conda's ability to create distinct, easily interchangeable environments is so useful.

Conda can also capture and export computational environments. It works in the other direction too: it can build an environment from a configuration file, which means it can be used to recreate someone else's environment.

Another benefit of Conda is that it gives much greater flexibility to users who do not have admin privileges on the machines they are working on, as is very common on high performance computing facilities. Without Conda it is typically very difficult to install required software on such machines. Because Conda creates and modifies its own environments rather than changing the machine's system-wide environment, admin privileges are not required.

Finally, although Conda is somewhat Python-centric, it also works well with other languages; for example, the base installation of Conda includes the C++ standard library.

Installing Conda

Note that these installation instructions are aimed at Linux systems. Instructions for installing Conda on Windows or Mac systems can be found in the official Conda installation documentation.

Go to https://repo.continuum.io/miniconda/ and download the latest Miniconda 3 installer for your system (32-bit or 64-bit), which will have a name like Miniconda_version_number.sh. Run the installer using:

bash Miniconda_version_number.sh

You can check that Conda has been installed successfully by typing

conda --version

which should output a version number.
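On machines where you lack admin rights, it can help to install Miniconda non-interactively into your home directory and then confirm that conda is reachable. The sketch below assumes an installer file called Miniconda3-latest-Linux-x86_64.sh and an install location of $HOME/miniconda3; both names are illustrative, so substitute the file you actually downloaded and the location you prefer. The -b (batch mode, no prompts) and -p (installation prefix) options are accepted by the Miniconda installer.

```shell
#!/usr/bin/env bash
# Sketch: unattended Miniconda install into the home directory, then a check.
installer="Miniconda3-latest-Linux-x86_64.sh"  # assumed file name
prefix="$HOME/miniconda3"                      # assumed install location

# Only run the installer if it has been downloaded and the prefix is unused.
if [ -f "$installer" ] && [ ! -d "$prefix" ]; then
    # -b: batch mode (no interactive prompts); -p: where to install
    bash "$installer" -b -p "$prefix"
fi

# Put conda on PATH for this shell if it lives under the chosen prefix.
if [ -x "$prefix/bin/conda" ]; then
    PATH="$prefix/bin:$PATH"
fi

# Report whether conda can now be found, and its version if so.
if command -v conda >/dev/null 2>&1; then
    conda_status="installed"
    conda --version
else
    conda_status="missing"
    echo "conda not found on PATH" >&2
fi
```

In batch mode the installer typically does not edit your shell start-up files, so you may afterwards want to run conda init bash so that conda is available in new terminals.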

Making and using environments

Conda automatically installs a base environment with some commonly used software packages. You can simply work in this base environment, but it is good practice to create a new environment for each project you start.

To create an environment, use conda create --name your_project_env_name followed by a list of packages to include. For example, to include the packages scipy and matplotlib, add them to the end of the command:

conda create --name Project_One scipy matplotlib

You can specify the versions of some (or all) packages by adding =version_number after the package name. For example, to specify scipy 1.2.1 in the above environment:

conda create --name Project_One scipy=1.2.1 matplotlib

When creating environments you can also specify versions of languages to install, for example to use Python 3.7.1 in the Project_One environment:

conda create --name Project_One python=3.7.1 scipy=1.2.1 matplotlib

Now that an environment has been created it’s time to activate (start using) it via conda activate environment_name, so in this example:

conda activate Project_One

If you are using an old version of Conda (earlier than 4.4), you may need to use source activate Project_One instead of conda activate Project_One.

Once an environment is activated you should see the environment name before each prompt in your terminal:

(Project_One) $ python --version
Python 3.7.1
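The commands above are meant for an interactive terminal. Inside a non-interactive bash script, conda activate usually fails because the conda shell function that conda init sets up is only loaded by interactive shells. A minimal sketch, assuming the Project_One environment from this section, is to load that function with Conda's shell hook before activating:

```shell
#!/usr/bin/env bash
# Sketch: using a Conda environment from inside a bash script.
env_name="Project_One"   # environment created earlier in this section

if command -v conda >/dev/null 2>&1; then
    # Define the `conda` shell function (normally done by ~/.bashrc).
    eval "$(conda shell.bash hook)"
    if conda activate "$env_name"; then
        echo "Active environment: ${CONDA_DEFAULT_ENV}"
        python --version
        conda deactivate
    else
        echo "Environment ${env_name} not found; create it first." >&2
    fi
else
    echo "conda not found on PATH" >&2
fi
```

For a single command, conda run --name Project_One python --version is an alternative that avoids activation altogether.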

Deactivating and deleting environments

You can deactivate (get out of) an environment using

conda deactivate

and remove (delete) an environment, here the Project_One environment, using

conda env remove --name Project_One

To check that an environment has been removed, you can list all of the Conda environments on the system using

conda env list

However, deleting an environment may not delete the package files that were associated with it, which can waste a lot of disk space on packages that are no longer required. Packages that are no longer used by any environment can be deleted using

conda clean --packages --tarballs

Alternatively, you can delete an environment (such as Project_One) by removing all of its packages at once; as with conda env remove, cached package files may remain until conda clean is run:

conda remove --name Project_One --all
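The removal and clean-up steps above can be combined into a small helper. This is a sketch only: the function names env_exists and remove_env are hypothetical, and the name check relies on conda env list printing one environment per line with the name in the first column and header lines starting with #.

```shell
#!/usr/bin/env bash
# Sketch: remove a Conda environment only if it exists, then tidy the cache.

# Succeed if an environment with the given name appears in `conda env list`.
env_exists() {
    # Skip '#' header lines and blank lines; the first field is the name.
    conda env list | awk '!/^#/ && NF {print $1}' | grep -qx -- "$1"
}

# Hypothetical helper: remove the environment, then drop unused package files.
remove_env() {
    local name="$1"
    if env_exists "$name"; then
        conda env remove --name "$name"
        conda clean --packages --tarballs --yes
    else
        echo "No environment called ${name}; nothing to remove." >&2
        return 1
    fi
}

# Example use:
#   remove_env Project_One
```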

Installing and removing packages within an environment

Within an environment you can install more packages using

conda install package_name

and similarly you can remove them via

conda remove package_name

This is the preferred way to install packages in a Conda environment, because it installs a version of the package built specifically for Conda. However, other tools can be used if no Conda package is available. For example, pip is commonly used to install Python packages, so a command like

pip install scipy

will install scipy into the environment, where it will also appear in the output of conda list, as long as pip itself is installed inside the currently active Conda environment. Unfortunately, using conda and pip together to build an environment can leave it in a state that is hard to reproduce. In particular, running conda after pip may overwrite or break packages installed via pip. One way to avoid this is to install as many requirements as possible with conda first, and only then use pip. More detail can be found in the post Using Pip in a Conda Environment.
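One way to follow the "conda first, then pip" advice is to declare both kinds of package in a single environment file, with the pip-only packages in a pip: sub-list at the end, and let Conda build the whole environment in one go. The sketch below writes such a file; the package name example-pypi-only-package is a hypothetical placeholder, and the versions simply reuse those from earlier in this section.

```shell
#!/usr/bin/env bash
# Sketch: an environment file listing Conda packages first and pip-only
# packages last. "example-pypi-only-package" is a hypothetical placeholder.
env_file="environment.yml"

cat > "$env_file" <<'EOF'
name: Project_One
channels:
  - defaults
dependencies:
  - python=3.7.1
  - scipy=1.2.1
  - matplotlib
  - pip
  - pip:
      - example-pypi-only-package
EOF

echo "Wrote ${env_file}; build it with: conda env create -f ${env_file}"
```

Listing pip itself as a Conda dependency makes sure the environment's own pip is used for the packages in the pip: sub-list.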

Although Python packages have been used in many of the examples here, Conda packages do not have to be Python packages. For example, here the R base language is installed along with the R package r-yaml:

conda create --name Project_One r-base r-yaml

To see all of the installed packages in the current environment

conda list

To check whether a particular package (here, scipy) is installed:

conda list scipy
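In a script it can be handy to turn this check into a yes/no answer. The sketch below uses a hypothetical helper, package_installed, and relies on conda list accepting a regular expression; anchoring it as ^scipy$ avoids also matching packages whose names merely contain scipy. It assumes the default conda list layout, with header lines starting with # and the package name in the first column.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the exact package name is installed in an environment.
package_installed() {
    local env_name="$1" pkg="$2"
    # Anchored regex limits conda's own filtering; grep -x then demands an
    # exact match on the name column (header lines starting with '#' skipped).
    conda list --name "$env_name" "^${pkg}\$" \
        | awk '!/^#/ && NF {print $1}' \
        | grep -qx -- "$pkg"
}

# Example use (assumes an environment called Project_One exists):
#   if package_installed Project_One scipy; then echo "scipy is installed"; fi
```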

A Conda channel is a location that Conda downloads packages from. Common channels include the defaults channel, provided by the company Anaconda, and conda-forge, a community-driven packaging endeavour. You can explicitly install a package from a particular channel like this:

conda install -c channel_name package_name

Exporting and reproducing computational environments

Conda environments can be exported easily to human-readable files in the YAML format. YAML files are discussed in more detail later in this chapter.

To export a Conda environment to a file called environment.yml, activate the environment and then run

conda env export > environment.yml

Similarly, Conda environments can be created from YAML files via

conda env create -f environment.yml

This allows researchers to easily reproduce one another's computational environments. Note that the exported list includes not only the packages you explicitly installed but also their dependencies, some of which may be OS-specific, so environment files may need some editing before they can be used on a different operating system.
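To reduce that editing, reasonably recent Conda versions (4.7.12 onwards, as far as we know) can export only the packages you explicitly requested via the --from-history option. The sketch below writes both kinds of file for a named environment; the helper name export_env and the output file names are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: export an environment twice, once exactly and once portably.
export_env() {
    local env_name="$1"
    # Every package with exact versions and builds: best for rebuilding
    # the environment on the same operating system.
    conda env export --name "$env_name" > "${env_name}_full.yml"
    # Only explicitly requested packages: more portable across operating
    # systems, but unpinned versions may resolve differently later.
    conda env export --name "$env_name" --from-history > "${env_name}_history.yml"
}

# Example use:
#   export_env Project_One
```

Either file can then be passed to conda env create -f.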

Environments can also be cloned. This is useful, for example, when a researcher starts a new project and wants a new environment to work on it in, but that environment (at least initially) needs the same packages as a previous project's environment.

For example to clone the Project_One environment, and give this new environment the name Project_Two:

conda create --name Project_Two --clone Project_One
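After cloning, you may want to confirm that the new environment really matches the original before the two start to diverge. A minimal sketch, using a hypothetical helper envs_match, compares the package lists that conda list --export prints for each environment (one name=version=build line per package, after comment lines starting with #).

```shell
#!/usr/bin/env bash
# Sketch: check that two environments contain exactly the same packages.
envs_match() {
    local first second
    # Drop '#' comment lines so only the package specifications are compared.
    first=$(conda list --name "$1" --export | grep -v '^#')
    second=$(conda list --name "$2" --export | grep -v '^#')
    if [ "$first" = "$second" ]; then
        echo "$1 and $2 contain the same packages"
    else
        echo "$1 and $2 differ" >&2
        return 1
    fi
}

# Example use after cloning:
#   envs_match Project_One Project_Two
```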