Configuring High Availability for the JobTracker
Starting with Cloudera Manager 4.6, you can use Cloudera Manager to configure CDH4.3 or later for JobTracker High Availability (HA). JobTracker High Availability is not supported for CDH3 clusters, or with CDH4 prior to CDH4.2.

JobTracker High Availability is available for MapReduce v1. It is not available for YARN (MR v2).
A JobTracker HA cluster is configured with an Active and a Standby JobTracker. Only one JobTracker can be active at any point in time. Failover between the two is handled by a Failover Controller, with the JobTracker High Availability state kept in ZooKeeper, so that in the event of a failure the Standby JobTracker can take over.
Cloudera Manager supports automatic failover of the JobTracker. It does not provide a mechanism to manually force a failover through the Cloudera Manager user interface.
For a more detailed introduction to JobTracker High Availability, see Configuring High Availability for the JobTracker in the CDH High Availability Guide.
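The Active/Standby arrangement described above can be pictured with a small toy model. This is only an illustrative sketch, not how Cloudera Manager or the Failover Controller is implemented: the class names, the `healthy` flag, and the host names are all hypothetical, and real failover is coordinated through ZooKeeper rather than an in-process check.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobTracker:
    host: str             # hypothetical host name, for illustration only
    healthy: bool = True  # stand-in for a real health check


@dataclass
class JobTrackerPair:
    """Toy model: exactly one Active and one Standby JobTracker."""
    active: JobTracker
    standby: JobTracker
    events: List[str] = field(default_factory=list)

    def check_and_failover(self) -> JobTracker:
        """Swap roles if the Active is unhealthy and the Standby is healthy.

        Returns whichever JobTracker is Active after the check.
        """
        if not self.active.healthy and self.standby.healthy:
            self.events.append(
                f"failover: {self.active.host} -> {self.standby.host}"
            )
            self.active, self.standby = self.standby, self.active
        return self.active


pair = JobTrackerPair(JobTracker("jt1.example.com"), JobTracker("jt2.example.com"))
pair.active.healthy = False           # simulate a failure of the Active JobTracker
current = pair.check_and_failover()   # the Standby is promoted automatically
print(current.host)
```

The model keeps the invariant from the text: at any moment there is a single Active JobTracker, and the role change happens automatically rather than on operator request.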

Enabling or disabling High Availability will shut down your HDFS service and the services that depend on it (MapReduce, YARN, and HBase). Therefore, you should not do this while you have jobs running on your cluster. Further, once HDFS has been restored, the services that depend on it must be restarted, and the client configurations for HDFS must be redeployed.

Enabling or disabling High Availability will cause the previous monitoring history to become unavailable.
Enabling JobTracker High Availability
After you have installed MapReduce on your CDH4.2 cluster, the Enable High Availability workflow leads you through adding a second (Standby) JobTracker and configuring JournalNodes.
- From the Services tab, select your MapReduce service.
- Pull down the Actions menu and select Enable High Availability. This starts the wizard for enabling HA. (The menu option does not appear if this is a CDH3 version of the MapReduce service.)
- The next screen shows the hosts that are eligible to run a Standby JobTracker. Select the host where you want the Standby JobTracker to be installed, and click Continue. The host where the current JobTracker is running is not available as a choice.
- Enter a directory location on the local filesystem for each JobTracker host. These directories will be used to store job configuration data.
- You may enter more than one directory, though it is not required. The names/paths do not need to be the same on both JobTrackers.
- The directories you specify must exist, should be empty, and must have the appropriate permissions.
- If the directories are not empty, Cloudera Manager will not delete the contents; however, in that case the data should be in sync across the edits directories of the JournalNodes and should have the same version data as the NameNodes.
- Optionally, use the checkbox under Advanced Options to force initialization of the ZooKeeper ZNode for automatic failover.
- Click Continue.
Cloudera Manager proceeds to execute the set of commands that stops the MapReduce service, adds a Standby JobTracker and Failover Controller, initializes the JobTracker High Availability state in ZooKeeper, creates the JobStatus directory, restarts MapReduce, and redeploys the relevant client configurations.
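Before running the wizard, it can help to confirm that each directory you plan to enter meets the requirements listed in the steps above (it exists, is empty, and has suitable permissions). The following is a minimal sketch of such a pre-check, run locally on each JobTracker host. The function name `check_jobtracker_dir` is hypothetical, and "appropriate permissions" is interpreted here as read, write, and traverse access for the user running the script, which is an assumption; the actual requirement depends on the user the MapReduce service runs as.

```python
import os
import tempfile


def check_jobtracker_dir(path: str) -> list:
    """Return a list of problems with a candidate directory (empty if none)."""
    problems = []
    if not os.path.isdir(path):
        problems.append("does not exist or is not a directory")
        return problems  # nothing else can be checked
    if os.listdir(path):
        problems.append("is not empty")
    # Assumption: check access for the current user only. Run the script as
    # the service user to make this check meaningful.
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        problems.append("is not readable, writable and traversable by the current user")
    return problems


# Usage on a throwaway directory (a real run would pass the planned paths).
with tempfile.TemporaryDirectory() as tmp:
    empty_result = check_jobtracker_dir(tmp)            # fresh, empty directory
    open(os.path.join(tmp, "leftover"), "w").close()    # add a stray file
    nonempty_result = check_jobtracker_dir(tmp)
# The temporary directory has been removed at this point.
missing_result = check_jobtracker_dir(os.path.join(tmp, "gone"))

for label, result in [("empty", empty_result),
                      ("non-empty", nonempty_result),
                      ("missing", missing_result)]:
    print(label, result if result else "OK")
```

A non-empty directory is reported rather than cleaned up, which matches the wizard's behavior of leaving existing contents in place.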
Disabling JobTracker High Availability
To disable JobTracker High Availability:
- From the Services tab, select your MapReduce service.
- Pull down the Actions menu and select Disable High Availability...
- Select which JobTracker (host) you want to remain as the single JobTracker, and click Continue.
Cloudera Manager proceeds to execute the set of commands that stops the MapReduce service, removes the Standby JobTracker and the Failover Controller, restarts the MapReduce service and redeploys client configurations.