Upgrading to the Latest Version of CDH4 in a Cloudera Manager Deployment

  Important:

Use the right instructions: the following instructions describe how to upgrade to the latest CDH4 release from an earlier CDH4 release in a Cloudera Managed Deployment. If you are upgrading from a CDH3 release, use the instructions under Upgrading CDH3 to CDH4 in a Cloudera Managed Deployment instead.

As of Cloudera Manager 4.5, you can upgrade to CDH 4.1.2 (or later) within the Cloudera Manager Admin Console, using parcels and an upgrade wizard. This vastly simplifies the upgrade process. In addition, this will enable Cloudera Manager to automate the deployment and rollback of CDH versions. Electing to upgrade using packages means that future upgrades and rollbacks will still need to be done manually.

If you are running Cloudera Manager 4.5 or later, and want to upgrade to CDH 4.1.2 or later, see Upgrading to CDH4.1.2 or Later Using the Upgrade Wizard for instructions. If you want to upgrade to a version of CDH4 earlier than 4.1.2, you will still need to follow the package upgrade instructions below (see Upgrading Using Packages).

  Note:

Upgrading Impala: If you have CDH 4.1.x with a beta version of Cloudera Impala installed, and you plan to upgrade to CDH 4.2 or CDH 4.3, you must also upgrade Impala from the beta version to version 1.0. With a parcel installation you can download and activate both parcels before you proceed to restart the cluster.

You will need to change the remote parcel repo URL to point to the location of the released product. Instructions are included below.

Before You Begin

  • Before upgrading, be sure to read about the latest Incompatible Changes and Known Issues and Workarounds in the CDH4 Release Notes.
  • If you are upgrading a cluster that is part of a production system, plan ahead. As with any operational work, reserve a maintenance window with enough extra time allotted in case of complications. The Hadoop upgrade process is well understood, but it is best to be cautious. For production clusters, Cloudera recommends allocating up to a full-day maintenance window to perform the upgrade, depending on the number of hosts, the amount of experience you have with Hadoop and Linux, and the particular hardware you are using.
  Important:

Hive underwent a major version change between CDH 4.0 and 4.1, and again between CDH 4.1 and 4.2 (CDH 4.0 had Hive 0.8.0, CDH 4.1 used Hive 0.9.0, and 4.2 or later has 0.10.0). You must therefore manually back up and upgrade your Hive metastore database when upgrading across a major Hive version. If you are upgrading between major versions, you must follow the steps in Step 4. "Upgrading Your Metastore" in the Hive Installation section in the CDH4 Installation Guide for upgrading the metastore BEFORE you start the Hive service. This applies whether you are upgrading using packages or parcels.

Performing a Rolling Upgrade from CDH4.x to CDH4.1.2 or later

  Note:

This feature is available only with Cloudera Enterprise.

The feature described in this section is not available in Cloudera Manager with Cloudera Standard.

If you have been using the Cloudera Enterprise Trial Edition, this feature will no longer be available after your trial license expires.

To obtain a license for Cloudera Enterprise, please contact sales@cloudera.com. When you install your Enterprise license, this feature will be enabled.

If you are using Cloudera Enterprise, are performing an upgrade between CDH4 versions, and have enabled HDFS High Availability, you may optionally follow the Rolling Upgrade procedure. Rolling Upgrade is not available with Cloudera Standard.

Upgrading to CDH4.1.2 or Later Using the Upgrade Wizard

If you want to upgrade to CDH 4.1.2 or later, you can do the upgrade using parcels from within the Cloudera Manager Admin Console.

To upgrade Impala from the beta version to version 1.0:

You must point the Impala parcel repo URL to the released Impala parcel:

  1. From the Administration tab, select Properties.
  2. Go to the Parcels category.
  3. Under the Remote Parcel Repository URLs property, find the entry http://beta.cloudera.com/impala/parcels/ and replace it with http://archive.cloudera.com/impala/parcels/.
  4. Save your change.

Now you can proceed to upgrade your installation.

Step 1. Download, Distribute, and Activate the CDH4 (and Impala) Parcels.

  1. In the Cloudera Manager Admin Console, click the Parcels indicator in the top navigation bar to go to the Parcels page.
  2. In the parcels page, click Download for the version(s) you want to download. If you want to run both CDH and Cloudera Impala, you should download both the CDH and Impala parcels.
  3. When the download has completed, click Distribute for the version you downloaded.
  4. When the parcel has been distributed and unpacked, the button will change to say Activate.
  5. Click Activate. This will display a pop-up that will offer to restart your services. DO NOT RESTART services at this point – click Close to remove the pop-up.
      Important:

    If you are upgrading between major Hive versions (that is, from CDH 4.0 to 4.1 or 4.2, or from CDH 4.1 to 4.2), DO NOT restart the services; you must upgrade your Hive metastore before you restart Hive.

Step 2. Upgrade the Hive Metastore

Go to Step 4. Upgrade your Hive Metastore below and follow the instructions there to upgrade the Hive metastore.

Step 3. (If Upgrading to CDH 4.2) Upgrade the Oozie Sharelib

  1. In the Cloudera Manager Admin Console, select Oozie from the Services tab.
  2. From the Actions button, choose Stop.
  3. When the service has stopped, from the Actions button choose Install Oozie Sharelib. Cloudera Manager then runs the commands that install the sharelib.

Step 4. Restart the Services

  1. In the Cloudera Manager Admin Console, select All Services from the Services tab.
  2. Click the top Actions button that corresponds to the cluster and choose Restart. The Command Details window shows the progress of starting services.

Step 5. Deploy the new client configuration files

  1. Click the top Actions button that corresponds to the cluster and choose Deploy Client Configuration....
  2. Click the Deploy Client Configuration button in the confirmation pop-up that appears.

Step 6. Remove the previous CDH version packages.

If your previous installation of CDH 4 (4.0.x or 4.1.x) was done using packages, you must remove those packages and refresh the symlinks so that clients will run the new software versions.

  1. If Hue is configured to use SQLite as its database, back up the desktop.db to a temporary location before deleting the old Hue Common package. The location of the database can be found in the Hue service Configuration tab under Service > Database > Hue's Database Directory.
      Important: Removing the Hue Common package will remove your Hue database; if you do not back it up you may lose all your Hue user account information.

    Make sure the new Hue service is running before you remove the old packages.

  2. To uninstall the CDH packages (not including Impala):
    Operating System    Command
    RHEL                $ sudo yum remove hadoop hue-common bigtop-jsvc bigtop-tomcat
    SLES                $ sudo zypper remove hadoop hue-common bigtop-jsvc bigtop-tomcat
    Ubuntu or Debian    $ sudo apt-get purge hadoop hue-common bigtop-jsvc bigtop-tomcat
  3. To uninstall the CDH packages including Impala:
    Operating System    Command
    RHEL                $ sudo yum remove hadoop hue-common 'bigtop-*'
    SLES                $ sudo zypper remove hadoop hue-common 'bigtop-*'
    Ubuntu or Debian    $ sudo apt-get purge hadoop hue-common 'bigtop-*'
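
The Hue database backup described in step 1 could be scripted along the following lines. This is a minimal sketch: the function name backup_hue_db and both directory arguments are hypothetical, and you would pass the Hue's Database Directory value shown in Cloudera Manager as the first argument.

```shell
# Hypothetical helper: copy Hue's SQLite database to a backup directory
# before the hue-common package is removed.
# $1 = Hue's Database Directory (from the Hue service Configuration tab)
# $2 = directory that will hold the backup copy
backup_hue_db() {
    db_dir=$1
    backup_dir=$2
    if [ ! -f "$db_dir/desktop.db" ]; then
        echo "desktop.db not found in $db_dir" >&2
        return 1
    fi
    mkdir -p "$backup_dir" || return 1
    # -p preserves mode and timestamps (and ownership where permitted)
    cp -p "$db_dir/desktop.db" "$backup_dir/desktop.db" || return 1
    # Print the location of the backup copy
    echo "$backup_dir/desktop.db"
}
```

Run the function before any of the package removal commands, and confirm that the printed path exists before you continue.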

Step 7. Update symlinks for the newly installed components.

Restart all the Cloudera Manager agents to force an update of the symlinks to point to the newly installed components.

To restart the Cloudera Manager agents:

On each host:

$ sudo service cloudera-scm-agent restart
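
On a cluster with many hosts, the restart could be driven from one machine. The sketch below assumes passwordless SSH and sudo rights on every host; the function name restart_agents, the SSH_CMD variable, and the hosts file (one hostname per line) are hypothetical and not part of Cloudera Manager.

```shell
# Hypothetical helper: restart the Cloudera Manager agent on every host
# listed (one per line) in the file given as $1.
# Set SSH_CMD=echo for a dry run that only prints what would be executed.
restart_agents() {
    hosts_file=$1
    ssh_cmd=${SSH_CMD:-ssh}
    while IFS= read -r host; do
        # Skip blank lines and comment lines
        case $host in ''|'#'*) continue ;; esac
        # </dev/null stops ssh from consuming the rest of the hosts file
        $ssh_cmd "$host" sudo service cloudera-scm-agent restart </dev/null
    done < "$hosts_file"
}
```

A dry run such as SSH_CMD=echo restart_agents hosts.txt prints one command per host, so you can check the host list before restarting anything.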

Upgrading Using Packages

Upgrading Unmanaged Components

Upgrading unmanaged components is a process that is separate from upgrading managed components. Upgrade the unmanaged components before proceeding to upgrade managed components. For example, if you have unmanaged Flume installed, upgrade that before proceeding to upgrade managed components. Components that you might have installed that are not managed by Cloudera Manager include:

  • Flume 1.x
  • Sqoop
  • Pig
  • Whirr
  • Mahout

For information on upgrading these unmanaged components, see the CDH4 Installation Guide.

Step 1. Stop all the CDH Services on All Hosts

You must stop all Hadoop services before upgrading CDH. To stop all services:

  1. In the Cloudera Manager Admin Console, select Services > All Services.
  2. Click the top Actions button that corresponds to the cluster and choose Stop....

    Click Stop in the confirmation screen. The Command Details window shows the progress of stopping services.

    When All services successfully stopped appears, the task is complete and you may close the Command Details window.

  3. For each Cloudera Management Service entry, click Actions and click Stop.... Click Stop in the confirmation screen.

    The Command Details window shows the progress of stopping services.

    When All services successfully stopped appears, the task is complete and you may close the Command Details window.

Repeat this process for all clusters hosting CDH4 machines to be upgraded.

Step 2. Back up the HDFS Metadata on the NameNode

  Important:

Do the following when you are sure that all Hadoop services have been shut down. It is particularly important that the NameNode service is not running so that you can make a consistent backup.

  Note: Cloudera recommends backing up HDFS metadata on a regular basis, as well as before a major upgrade.
  1. On the Services page of Cloudera Manager, click the HDFS service, then the Configuration tab. Navigate to the NameNode category and find NameNode Data Directory.
  2. From the command line on the NameNode machine, back up that directory; for example, if the data directory is /mnt/hadoop/hdfs/name, do the following as root:
    # cd /mnt/hadoop/hdfs/name
    # tar -cvf /root/nn_backup_data.tar .

    You should see output like this:

    ./
    ./current/
    ./current/fsimage
    ./current/fstime
    ./current/VERSION
    ./current/edits
    ./image/
    ./image/fsimage
  3. Check the output.
      Warning:

    If you see a file containing the word lock, the NameNode is probably still running. Repeat the preceding steps, starting by shutting down the Hadoop services.
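
The backup and the lock check can be combined in one function. This is a sketch only: the function name backup_nn_metadata is hypothetical, and the check simply searches the archive listing for the word lock, as the warning above describes. Write the archive outside the NameNode data directory.

```shell
# Hypothetical helper: archive a NameNode data directory and warn if the
# archive contains a lock file, which suggests the NameNode is still running.
# $1 = NameNode Data Directory, $2 = path of the tar file to create
backup_nn_metadata() {
    nn_dir=$1
    archive=$2
    # Archive the directory contents relative to the directory itself
    tar -cf "$archive" -C "$nn_dir" . || return 1
    # List the archive and look for any entry whose name contains "lock"
    if tar -tf "$archive" | grep -q 'lock'; then
        echo "lock file found: stop the NameNode and repeat the backup" >&2
        return 2
    fi
    echo "backup written to $archive"
}
```

For the example directory used above, the call would be backup_nn_metadata /mnt/hadoop/hdfs/name /root/nn_backup_data.tar, run as root. A return status of 2 means a lock file was archived and the backup should be repeated after the services are stopped.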

Step 3. Upgrade Managed Components

There are a variety of strategies that you can use to upgrade to the latest version of CDH4.

  • You can use your operating system's package management tools to update all packages to the latest version using standard repositories. This approach works well because it minimizes the amount of configuration required and uses the simplest commands. Be aware that this can take a considerable amount of time if you have not upgraded the system recently.
  • You can target the cloudera.com repository that is added during a typical install, only updating Cloudera components. This limits the scope of updates to be completed, so the process takes less time. This will not work if you created and used a custom repository.
  • You can use a custom repository. This process can be more complicated, but enables updating Cloudera components for CDH machines that are not connected to the Internet.

Updating Everything

You can update all components on your system, including Cloudera components. Note that this may take a significant amount of time. To update all packages on your system, use the following command:

Operating System    Command
RHEL                $ sudo yum update
SLES                $ sudo zypper up
Ubuntu or Debian    $ sudo apt-get upgrade

Once you complete the process of updating all components, proceed to Start the Services you Stopped.

Updating Cloudera Components Using Default Repositories

To install the new version, you can upgrade from Cloudera's repository by adding an entry to your operating system's package management configuration file. The repository location varies by operating system.

Operating System    Configuration File Repository Entry
Red Hat             http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/
SLES                http://archive.cloudera.com/cdh4/sles/11/x86_64/cdh/4/
Debian Squeeze      [arch=amd64] http://archive.cloudera.com/cdh4/debian/squeeze squeeze-cdh4 contrib
Ubuntu Lucid        [arch=amd64] http://archive.cloudera.com/cdh4/ubuntu/lucid/amd64/cdh lucid-cdh4 contrib
Ubuntu Precise      [arch=amd64] http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh precise-cdh4 contrib

For example, under Red Hat, to upgrade from Cloudera's repository you can run commands such as the following on the CDH host to update only CDH:

$ sudo yum clean all
$ sudo yum update 'cloudera-*'  
  Note:

  • cloudera-cdh4 is the name of the repository on your system; the name is usually in square brackets on the first line of the repo file, in this example /etc/yum.repos.d/cloudera-cdh4.repo:
    [chris@ca727 yum.repos.d]$ more cloudera-cdh4.repo
    [cloudera-cdh4]
    ...
  • yum clean all cleans up yum's cache directories, ensuring that you download and install the latest versions of the packages.
  • If your system is not up to date, and any underlying system components need to be upgraded before this yum update can succeed, yum will tell you what those are.
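
If you are unsure of the repository name, it can be read from the repo file rather than by paging through it. The following is a small sketch; the function name list_repo_ids is hypothetical.

```shell
# Hypothetical helper: print the repository IDs (the names in square
# brackets) defined in the yum .repo file given as $1.
list_repo_ids() {
    # For lines that start with "[", strip the opening bracket and
    # everything from the closing bracket onward, then print the result
    sed -n '/^\[/{s/^\[//;s/].*$//;p;}' "$1"
}
```

For the example file above, list_repo_ids /etc/yum.repos.d/cloudera-cdh4.repo would print the name shown in square brackets.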

On a SLES system, use commands such as the following to clean cached repository information and then update only the CDH components:

$ sudo zypper clean --all
$ sudo zypper up -r http://archive.cloudera.com/cdh4/sles/11/x86_64/cdh/4

To verify the URL, open the Cloudera repo file in /etc/zypp/repos.d on your system (for example /etc/zypp/repos.d/cloudera-cdh4.repo) and look at the line beginning

baseurl=

Use that URL in your sudo zypper up -r command.
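
The URL lookup can also be scripted. This sketch assumes the repo file contains a single baseurl= line; the function name repo_baseurl is hypothetical.

```shell
# Hypothetical helper: print the baseurl value from the .repo file given as $1.
repo_baseurl() {
    # Print the text after "baseurl=" and keep only the first match
    sed -n 's/^baseurl=//p' "$1" | head -n 1
}
```

The result can then be passed to zypper, for example: sudo zypper up -r "$(repo_baseurl /etc/zypp/repos.d/cloudera-cdh4.repo)".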

On a Debian/Ubuntu system, clean cached repository information and then update only the CDH components. First, clean the cache:

$ sudo apt-get clean

After cleaning the cache, use one of the following upgrade commands to upgrade CDH.

Precise:

$ sudo apt-get upgrade -t precise-cdh4

Lucid:

$ sudo apt-get upgrade -t lucid-cdh4

Squeeze:

$ sudo apt-get upgrade -t squeeze-cdh4

At the end of this process you should have the most recent versions of the CDH packages installed on the host and you can now proceed to Start the Services you Stopped.

Updating Cloudera Components Using Custom Repositories

You can create your own repository, as described in Appendix A - Understanding Custom Installation Solutions. Creating your own repository is necessary if you are upgrading a cluster that does not have access to the Internet.

If you used a custom repository for the original installation and now want to update from a custom repository, the exact steps vary with how that repository is set up.

In general, begin by updating your existing custom repository with the installation files for the new version. This can be completed in a variety of ways. For example, you might use wget to copy the necessary installation files. Once the installation files have been updated, use the custom repository you established for the initial installation to update CDH.

OS Command

RHEL

Ensure you have a custom repo that is configured to use your internal repository. For example, you could have a custom repo file in /etc/yum.repos.d/ called cdh_custom.repo in which you specified a local repository. In such a case, you might use the following commands:

$ sudo yum clean all
$ sudo yum update 'cloudera-*'  

SLES

Use commands such as the following to clean cached repository information and then update only the CDH components:

$ sudo zypper clean --all
$ sudo zypper up -r http://internalserver.example.com/path_to_cdh_repo

Ubuntu or Debian

Use a command that targets upgrade of your CDH distribution using the custom repository specified in your apt configuration files. These files are typically either the /etc/apt/sources.list file or various files in the /etc/apt/sources.list.d/ directory. Information about your custom repository must be included in the repo files. The general form of entries in Debian/Ubuntu is:

deb http://server.example.com/directory/ dist-name pool

For example, the entry for the default repo is:

deb http://us.archive.ubuntu.com/ubuntu/ precise universe

On a Debian/Ubuntu system, use commands such as the following to clean cached repository information and then update only the CDH components:

$ sudo apt-get clean
$ sudo apt-get upgrade -t your_cdh_repo
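
For the RHEL case, a custom repo file such as cdh_custom.repo could be generated as sketched below. The function name write_custom_repo, the repository ID cdh_custom, and the gpgcheck=0 setting are assumptions made for illustration; on a real host the target directory would normally be /etc/yum.repos.d, which requires root, and you may prefer to enable GPG checking.

```shell
# Hypothetical helper: write a minimal yum repo definition.
# $1 = directory for the file (normally /etc/yum.repos.d)
# $2 = base URL of the internal repository
write_custom_repo() {
    repo_dir=$1
    base_url=$2
    # gpgcheck=0 is an assumption for an unsigned internal mirror
    cat > "$repo_dir/cdh_custom.repo" <<EOF
[cdh_custom]
name=Custom CDH4 repository
baseurl=$base_url
enabled=1
gpgcheck=0
EOF
}
```

For example, write_custom_repo /etc/yum.repos.d http://internalserver.example.com/path_to_cdh_repo, followed by sudo yum clean all, makes the internal repository available to yum.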

Step 4. Upgrade your Hive Metastore

If you are upgrading from CDH 4.2 to CDH 4.3, you do not need to perform this step. If you are upgrading from an earlier version of CDH to either 4.2 or 4.3, you must perform it.

  1. (Strongly recommended) Make a backup copy of your Hive metastore database.
  2. Follow the instructions at Step 4. "Upgrading Your Metastore" in the Hive Installation section in the CDH4 Installation Guide to run the metastore upgrade script.
    • If you are upgrading to packages, the upgrade script is at /usr/lib/hive/scripts/metastore/upgrade/
    • If you are upgrading to parcels, the upgrade script is located at /opt/cloudera/parcels/<parcel_name>/lib/hive/scripts/metastore/upgrade/<database>. <parcel_name> is the name of the parcel to which you have upgraded. <database> is the type of database you are running (for example, mysql or postgres). For example, if you are installing a CDH 4.2.0 parcel using the default location for the local repository, and using the default database (PostgreSQL), the script will be at: /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10-e16.parcel/lib/hive/scripts/metastore/upgrade/postgres
    • You must cd to the directory the scripts are in.
    • Execute the script in the appropriate DB command shell.
      Important:

    You must know the password for the database; if you installed Cloudera Manager using the default (embedded PostgreSQL) database, the password was displayed on the Database Setup page during the Cloudera Manager installation wizard. If you do not know the password for your Hive metastore database, you can find it as follows:

    • Run cat /etc/cloudera-scm-server/db.properties. This shows you Cloudera Manager's internal database credentials.
    • Run the following command:
      psql -p 7432 -U scm scm -c "select s.display_name as hive_service_name, s.name as hive_internal_name, c.value as metastore_password from configs c, services s where attr='hive_metastore_database_password' and c.service_id = s.service_id"
    • When prompted for a password, use the value of com.cloudera.cmf.db.password from db.properties. The command then outputs the metastore password for each Hive service, as follows:
       hive_service_name | hive_internal_name | metastore_password
      -------------------+--------------------+--------------------
       hive1             | hive1              | lF3Cv2zsvI
      (1 row)
  3. If you have multiple instances of Hive, run the upgrade script on each metastore database.
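
Rather than reading db.properties by eye, the Cloudera Manager database password used in the lookup above can be extracted with a short function. This is a sketch; the function name get_property is hypothetical, and it assumes plain key=value lines with no spaces around the equals sign.

```shell
# Hypothetical helper: print the value of one key from a Java-style
# properties file. $1 = file, $2 = key
get_property() {
    # Select the line whose key matches exactly, remove everything up to
    # the first "=", and print the remainder (values may contain "=")
    awk -F= -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$1"
}
```

For example, get_property /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.password prints the password to supply when psql asks for it. Reading this file typically requires root.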

Step 5. (If Upgrading to CDH 4.2) Upgrade the Oozie Sharelib

  1. In the Cloudera Manager Admin Console, select Oozie from the Services tab. The service should already be stopped.
  2. From the Actions button choose Install Oozie Sharelib. Cloudera Manager then runs the commands that install the sharelib.

Step 6. Start the Services you Stopped

You can now start the services that you stopped in Step 1. Proceed as follows:

  1. In the Cloudera Manager Admin Console, click the Services tab.
  2. Click the top Actions button that corresponds to the cluster and choose Start. The Command Details window shows the progress of starting services. When All services successfully started appears, the task is complete and you may close the Command Details window.

Repeat this process for all clusters that you previously stopped.

Step 7. Deploy client configurations

  1. Click the top Actions button that corresponds to the cluster and choose Deploy Client Configuration....
  2. Click the Deploy Client Configuration button in the confirmation pop-up that appears.