Understanding Package Management

Before configuring a custom package management solution in your environment, it helps to understand:

  • How package management tools work
  • Which tools come with which operating systems
  • Each tool's configuration files

How Do Packaging and Package Management Tools Interact?

Packages (rpm or deb files) help ensure that installations complete successfully by encoding each package's dependencies. When you request the installation of one package, everything it requires can be installed at the same time. For example, hadoop-0.20-hive depends on hadoop-0.20. Package management tools, such as yum (RedHat), zypper (SUSE), and apt-get (Debian/Ubuntu), find and install any required packages. For example, on RedHat you might enter yum install hadoop-0.20-hive. Yum would inform you that the hive package requires hadoop-0.20 and offer to complete that installation for you. Zypper and apt-get provide similar functionality.
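The dependency handling described above can be sketched in a few lines of Python. This is only an illustration of the idea (a depth-first walk that places dependencies before the packages that need them), not how yum, zypper, or apt-get are actually implemented. The DEPENDS table and the install_order function are hypothetical names introduced for this example; real tools read dependency metadata from the rpm or deb packages and the repository indexes.

```python
# Hypothetical dependency table for illustration only. Real package managers
# read this information from package metadata, not from a hand-written dict.
DEPENDS = {
    "hadoop-0.20-hive": ["hadoop-0.20"],
    "hadoop-0.20": [],
}


def install_order(package, depends, installed=()):
    """Return the packages to install, with dependencies listed first."""
    already_installed = set(installed)
    order = []
    visiting = set()

    def visit(name):
        if name in already_installed or name in order:
            return  # nothing to do for this package
        if name not in depends:
            raise KeyError(f"unknown package: {name}")
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        visiting.add(name)
        for dep in depends[name]:
            visit(dep)  # schedule requirements before the package itself
        visiting.discard(name)
        order.append(name)

    visit(package)
    return order


print(install_order("hadoop-0.20-hive", DEPENDS))
# prints ['hadoop-0.20', 'hadoop-0.20-hive']
```

If hadoop-0.20 is passed in the installed argument, only hadoop-0.20-hive is returned, which mirrors the way a package manager skips requirements that are already satisfied.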

How Do Package Management Tools Find All Available Packages?

Package management tools rely on a list of repositories. Information about a tool's repositories is stored in configuration files, whose location varies by package management tool.

  • Yum on RedHat/CentOS: /etc/yum.repos.d
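  • Zypper on SUSE: /etc/zypp/zypper.conf (Repositories are defined in *.repo files in the /etc/zypp/repos.d/ directory.)
  • Apt-get on Debian/Ubuntu: /etc/apt/apt.conf (Repositories are listed in /etc/apt/sources.list; additional repositories are specified using *.list files in the /etc/apt/sources.list.d/ directory.)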

For example, on a typical CentOS system, you might find:

[user@localhost ~]$ ls -l /etc/yum.repos.d/
total 24
-rw-r--r-- 1 root root 2245 Apr 25  2010 CentOS-Base.repo
-rw-r--r-- 1 root root  626 Apr 25  2010 CentOS-Media.repo

Inside those .repo files are pointers to one or more repositories. The configuration files for zypper and apt-get contain similar pointers. The following snippet from CentOS-Base.repo defines two repositories: one with the ID base and one with the ID updates. The mirrorlist parameter points to a URL that returns a list of mirrors from which the repository can be downloaded.

# ...
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5

#released updates
[updates]
name=CentOS-$releasever - Updates
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
#baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5
# ...
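Because .repo files use an INI-style layout, a snippet like the one above can be inspected with Python's standard configparser module. The following is a minimal sketch, not a description of how yum itself reads these files. The read_repos function is a hypothetical helper, and the values substituted for $releasever and $basearch (5 and x86_64) are assumptions chosen for the example; yum determines them from the running system.

```python
import configparser
from string import Template

# Abbreviated copy of the CentOS-Base.repo snippet shown above.
REPO_TEXT = """\
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1

[updates]
name=CentOS-$releasever - Updates
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
gpgcheck=1
"""


def read_repos(text, **variables):
    """Map each repository ID to its options, expanding $name variables."""
    # interpolation=None keeps configparser from treating '%' specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    repos = {}
    for repo_id in parser.sections():
        repos[repo_id] = {
            key: Template(value).safe_substitute(variables)
            for key, value in parser[repo_id].items()
        }
    return repos


# releasever and basearch are assumed values for illustration.
repos = read_repos(REPO_TEXT, releasever="5", basearch="x86_64")
for repo_id, options in repos.items():
    print(repo_id, options["mirrorlist"])
```

The commented-out baseurl lines are skipped, so each repository in this sketch exposes only the name, mirrorlist, and gpgcheck options.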

You can list the repositories you have enabled. The command varies according to operating system:

  • RedHat/CentOS: yum repolist
  • SUSE: zypper repos
  • Debian/Ubuntu: Apt-get does not include a command to display sources, but you can determine sources by reviewing the contents of /etc/apt/sources.list and any files contained in /etc/apt/sources.list.d/.
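For Debian/Ubuntu, reviewing the source files by hand can be approximated with a short script. The sketch below parses entries in the one-line deb/deb-src style and ignores comments and blank lines. The parse_sources function and the SAMPLE entries (which use example.com) are hypothetical and exist only for illustration; the parser is deliberately simple and does not cover every form that apt accepts.

```python
def parse_sources(text):
    """Return (type, uri, suite, components) tuples for deb and deb-src lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] not in ("deb", "deb-src"):
            continue
        rest = fields[1:]
        # Skip an optional "[option=value ...]" group before the URI.
        if rest and rest[0].startswith("["):
            while rest and not rest[0].endswith("]"):
                rest.pop(0)
            rest = rest[1:]
        if len(rest) < 2:
            continue  # not enough fields for a URI and a suite
        entries.append((fields[0], rest[0], rest[1], rest[2:]))
    return entries


# Hypothetical entries for illustration only.
SAMPLE = """\
# main archive
deb http://example.com/debian stable main contrib
deb-src http://example.com/debian stable main
deb [arch=amd64] http://example.com/extra stable main
"""

for kind, uri, suite, components in parse_sources(SAMPLE):
    print(kind, uri, suite, " ".join(components))
```

On a real system, the same function could be applied to the text of /etc/apt/sources.list and of each *.list file in /etc/apt/sources.list.d/.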

The following example shows what yum repolist might report on a CentOS system:

[root@localhost yum.repos.d]$ yum repolist
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * addons: mirror.san.fastserv.com
 * base: centos.eecs.wsu.edu
 * extras: mirrors.ecvps.com
 * updates: mirror.5ninesolutions.com
repo id                        repo name                                 status
addons                         CentOS-5 - Addons                         enabled:     0
base                           CentOS-5 - Base                           enabled: 3,434
extras                         CentOS-5 - Extras                         enabled:   296
updates                        CentOS-5 - Updates                        enabled: 1,137
repolist: 4,867
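The final repolist line is the sum of the package counts of the enabled repositories. As a quick check, the sketch below extracts the per-repository counts from the output shown above and adds them up. The enabled_counts function is a hypothetical helper written for this exact layout; other yum versions may format the status column differently.

```python
import re

# Table portion of the yum repolist output shown above.
REPOLIST = """\
repo id                        repo name                                 status
addons                         CentOS-5 - Addons                         enabled:     0
base                           CentOS-5 - Base                           enabled: 3,434
extras                         CentOS-5 - Extras                         enabled:   296
updates                        CentOS-5 - Updates                        enabled: 1,137
repolist: 4,867
"""


def enabled_counts(output):
    """Map each repo id to the package count in its 'enabled:' column."""
    counts = {}
    for line in output.splitlines():
        match = re.match(r"^(\S+)\s+.*\benabled:\s*([\d,]+)\s*$", line)
        if match:
            # Remove thousands separators before converting to int.
            counts[match.group(1)] = int(match.group(2).replace(",", ""))
    return counts


counts = enabled_counts(REPOLIST)
print(counts)
print(sum(counts.values()))  # 0 + 3434 + 296 + 1137 = 4867
```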