Configuring HDFS High Availability
You can use Cloudera Manager to configure your CDH4 cluster for HDFS High Availability (HA). High Availability is not supported for CDH3 clusters.
An HDFS HA cluster is configured with two NameNodes: an Active NameNode and a Standby NameNode. Only one NameNode can be active at any point in time. HDFS High Availability depends on maintaining a log of all namespace modifications in a location available to both NameNodes, so that in the event of a failure the Standby NameNode has up-to-date information about the edits and the location of blocks in the cluster.
There are two implementations available for maintaining the copies of the edit logs:
- High Availability using Quorum-based Storage
- High Availability using an NFS-mounted shared edits directory
Quorum-based Storage relies upon a set of JournalNodes, each of which maintains a local edits directory that logs the modifications to the namespace metadata.
The alternative is to use an NFS-mounted shared edits directory (typically on a remote filer) to which both the Active and Standby NameNodes have read/write access.
Once you have enabled High Availability, you can enable Automatic Failover, which automatically fails over to the Standby NameNode if the Active NameNode fails. You can also initiate a manual failover from Cloudera Manager.
See the CDH4 High Availability Guide for a more detailed introduction to High Availability with CDH4.
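For reference, the state checks and manual failover that Cloudera Manager performs correspond to the hdfs haadmin command-line tool in CDH4. A minimal sketch, assuming NameNode IDs namenode1 and namenode2 (hypothetical; Cloudera Manager assigns the actual IDs when HA is enabled):
# Report whether each NameNode is currently active or standby
hdfs haadmin -getServiceState namenode1
hdfs haadmin -getServiceState namenode2
# Request a failover from namenode1 to namenode2
hdfs haadmin -failover namenode1 namenode2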

Enabling or Disabling High Availability will shut down your HDFS service, and the services that depend on it – MapReduce, YARN, and HBase. Therefore, you should not do this while you have jobs running on your cluster. Further, once HDFS has been restored, the services that depend upon it must be restarted, and the client configurations for HDFS must be redeployed.

Enabling or Disabling High Availability will cause the previous monitoring history to become unavailable.
Enabling High Availability with Quorum-based Storage
After you have installed HDFS on your CDH4 cluster, the Enable High Availability workflow leads you through adding a second (Standby) NameNode and configuring JournalNodes.
- From the Services tab, select your HDFS service.
- Click the Instances tab.
- Click Enable High Availability. (This button does not appear if this is a CDH3 version of the HDFS service.)
- The next screen shows the hosts that are eligible to run a Standby NameNode and the JournalNodes.
- Select Enable High Availability with Quorum-based Storage as the High Availability Type.
- Select the host where you want the Standby NameNode to be set up. The Standby NameNode cannot be on the same host as the Active NameNode, and the host that is chosen should have the same hardware configuration (RAM, Disk space, number of cores, etc.) as the Active NameNode.
- Select an odd number of hosts (a minimum of three) to act as JournalNodes. JournalNodes should be hosted on machines with hardware specifications similar to the NameNodes'. It is recommended that you place one JournalNode on each of the hosts running the Active and Standby NameNodes, and the third JournalNode on a host with similar hardware, such as the host running the JobTracker.
- Click Continue.
- Enter a directory location for the JournalNode edits directory into the fields for each JournalNode host.
- You may enter only one directory for each JournalNode, and the paths do not need to be the same on every JournalNode.
- The directories you specify should be empty and must have the appropriate permissions (a minimal sketch of preparing such a directory follows these steps).
- If a directory is not empty, Cloudera Manager will not delete its contents; in that case, however, the data must be in sync across the edits directories of the JournalNodes and must have the same version data as the NameNodes.
- You can choose whether the workflow will restart the dependent services and redeploy the client configuration for HDFS. To do this manually rather than have it done as part of the workflow, uncheck these extra options.
- Click Continue. Cloudera Manager executes a set of commands that stop the dependent services; delete, create, and configure roles and directories as appropriate; and restart the dependent services and deploy the new client configuration if you selected those options.
- There are some additional steps you must perform if you want to use Hive, Impala, or Hue in a cluster with High Availability configured. Follow the Post Setup Steps described below.
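If you prepare the JournalNode edits directories yourself before running the workflow, the following is a minimal sketch; the path /dfs/jn and the hdfs:hadoop owner and group are assumptions, so substitute values appropriate to your installation:
# On each JournalNode host, create an empty edits directory
# owned by the HDFS user and closed to other users
mkdir -p /dfs/jn
chown -R hdfs:hadoop /dfs/jn
chmod 700 /dfs/jn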
Enabling High Availability using NFS Shared Edits Directory
After you have installed HDFS on your CDH4 cluster, the Enable High Availability workflow leads you through adding a second (Standby) NameNode and configuring the shared edits directory.
The shared edits directory is what the Standby NameNode uses to stay up-to-date with all the file system changes the Active NameNode makes. Note that you must have a shared directory already configured to which both NameNode machines have read/write access. Typically, this is a remote filer which supports NFS and is mounted on each of the NameNode machines. This directory must be writable by the hdfs user, and must be empty before you run the Enable HA workflow.
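As an illustration only, such a mount might be set up on each NameNode host as follows; the filer hostname filer1.example.com, the export /exported/namenode, the mount point /dfs/shared, and the mount options are all assumptions to adapt to your environment:
# Mount the filer's exported directory on the NameNode host
mkdir -p /dfs/shared
mount -t nfs -o tcp,soft,intr filer1.example.com:/exported/namenode /dfs/shared
# The mounted directory must be empty and writable by the hdfs user
chown hdfs /dfs/shared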
You can enable High Availability from the Actions menu on the HDFS Service page in a CDH4 cluster, or from the HDFS Service Instances tab.
- From the Services tab, select your HDFS service.
- Click the Instances tab.
- Click Enable High Availability. (This button does not appear if this is a CDH3 version of the HDFS service.)
- The next screen shows the hosts that are eligible to run a Standby NameNode.
- Select Enable High Availability with NFS shared edits directory as the High Availability Type.
- Select the host where you want the Standby NameNode to be installed, and click Continue. The Standby NameNode cannot be on the same host as the Active NameNode, and the host that is chosen should have the same hardware configuration (RAM, disk space, number of cores, etc.) as the Active NameNode.
- Confirm or enter the directories to be used as the name directories for the NameNode.
- Enter the absolute path of the local directory, on each NameNode host, that is mounted to the remote shared edits directory. For example, if hostA has /dfs/sharedA mounted to nfs:///exported/namenode and hostB has /dfs/sharedB mounted to the same NFS location, enter /dfs/sharedA for hostA and /dfs/sharedB for hostB. (The two paths can be the same.) You should configure only one shared edits directory. It must be mounted read/write on both NameNode machines, must be writable by the hdfs user, and must be empty when you run the Enable HA workflow. (A write-access check sketch follows these steps.)
- You can choose whether the workflow will restart the dependent services and redeploy the client configuration for HDFS. To do this manually rather than have it done as part of the workflow, uncheck these extra options.
- Click Continue to proceed.
- Cloudera Manager will now perform the steps to set up the Active and Standby NameNodes.
- When all the steps have been completed, click Finish. If the workflow fails, inspect the error message and logs for the cause, address it, and click Retry to re-execute all the steps; alternatively, perform the remaining steps using the commands available in the Actions menu. Note that Retry will not work for workflows that fail after the "Bootstrapping Standby NameNode" step; to revert the changes made by a failed workflow, use the Disable High Availability action on the Instances tab. Once HA is enabled there is no longer a Secondary NameNode role on your cluster, although the Secondary NameNode's checkpoint directories are not deleted from its host. If you did not have the workflow restart your services and redeploy your client configurations, make sure you do so before you run jobs on your cluster.
- There are some additional steps you must perform if you want to use Hive, Impala, or Hue in a cluster with High Availability configured. Follow the Post Setup Steps for Hue and Hive described below.
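Before running the workflow, you can confirm that the hdfs user can write to the shared directory from both NameNode hosts. A quick check, using the hypothetical /dfs/sharedA path from the example above (run the equivalent on each host against its own mount point):
# Create and remove a test file as the hdfs user
sudo -u hdfs touch /dfs/sharedA/.ha_write_test
sudo -u hdfs rm /dfs/sharedA/.ha_write_test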

Post Setup Steps for Hue and Hive
There are several configuration changes you must make to complete your High Availability setup, whether you are using Quorum-based Storage or an NFS-mounted shared edits directory. After you enable HA, you must do the following:
- Configure the HDFS Web Interface Role for Hue to be an HttpFS role. See Configuring Hue to work with High Availability.
- Update the Hive Metastore to work with High Availability. You must do this for each Hive service in your cluster. See Updating the Hive Metastore for HDFS High Availability.
Configuring Hue to work with High Availability
- From the Services tab, select your HDFS service.
- Click the Instances tab.
- Click the Add button.
- Under the HttpFS column, select a host where you want to install the HttpFS role and click Continue.
- After you are returned to the Instances page, select the new HttpFS role.
- From the Actions for Selected menu, select Start (and confirm).
- After the command has completed, go to the Services tab and select your Hue service.
- From the Configuration menu, select View and Edit.
- The HDFS Web Interface Role property will now show the httpfs role you just added. Select it instead of the namenode role, and Save your changes. (The HDFS Web Interface Role property is under the Service-Wide Configuration category.)
- Restart the Hue service for the changes to take effect.
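After restarting Hue, you can verify that the HttpFS role is serving requests with a simple WebHDFS call. The hostname below is a placeholder, 14000 is the default HttpFS port, and the user.name parameter applies to clusters without Kerberos:
# Should return a JSON FileStatuses listing of the HDFS root directory
curl "http://httpfs-host.example.com:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"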
Updating the Hive Metastore for HDFS High Availability
To update the Hive metastore to work with High Availability, do the following:
- Go to the Services tab and select the Hive service.
- From the Actions menu, select Stop.... (You may want to stop the Hue and Impala services first, if present, as they depend on the Hive service.) Confirm that you want to stop the service.
- When the service has stopped, back up the Hive metastore database to persistent storage.
- From the Actions menu, click Update Hive Metastore NameNodes... and confirm the command.
- From the Actions menu on the Hive service page, select Start... to restart the Hive service. Also restart the Hue and Impala services if you stopped them before updating the metastore.
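What the backup step involves depends on the database backing your metastore. As a sketch, assuming a MySQL-backed metastore with a database named metastore and a /backup directory on persistent storage (all assumptions):
# Dump the Hive metastore database to a file
mysqldump -u root -p metastore > /backup/hive_metastore_backup.sql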
Enabling Automatic Failover
You must have HDFS High Availability enabled in order to enable Automatic Failover.

Enabling or Disabling Automatic Failover will shut down your HDFS service, and requires the services that depend on it to be shut down.
To enable Automatic Failover:
- From the Services tab, select your HDFS service.
- Click the Instances tab.
- Click Enable Automatic Failover...
- Confirm that you want to take this action. This will stop the NameNodes for the Nameservice, create and configure Failover Controllers for each NameNode, initialize the High Availability state in ZooKeeper, and start the NameNodes and Failover Controllers.

If you are using NFS-based High Availability, a fencing method must be configured in order for failover (either automatic or manual) to function — Cloudera Manager configures this automatically. This is not required with Quorum-based Storage. See Fencing Methods if you want more information.

If you started your services and re-deployed your client configurations after you enabled HA, you should not need to do so again now. If you did not start them after enabling HA, you must do so now, before you attempt to run any jobs on your cluster.
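For reference, the ZooKeeper initialization that Cloudera Manager performs corresponds to the following CDH4 command, run as the hdfs user on one of the NameNode hosts; this is shown only as a sketch, since Cloudera Manager normally runs it for you:
# Initialize (format) the HA state znode in ZooKeeper
sudo -u hdfs hdfs zkfc -formatZK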

Note: The Failover Controllers store the NameNode addresses in ZooKeeper. If you need to change the NameNode Service RPC Port after Automatic Failover has been enabled, you must reinitialize the High Availability state in ZooKeeper, as follows:
- Stop HDFS.
- Configure the service RPC port in the Service-Wide HDFS configuration:
- From the HDFS service, Configuration tab, select View and Edit.
- Search for "dfs.namenode.servicerpc" which should display the NameNode Service RPC Port property. (It is found under the NameNode (Default) role group, Ports and Addresses category).
- Change the port value as needed.
- On a ZooKeeper server host, run the ZooKeeper client CLI:
- Parcels - /opt/cloudera/parcels/CDH/lib/zookeeper/bin/zkCli.sh
- Packages - /usr/lib/zookeeper/bin/zkCli.sh
- Execute the following to remove the pre-configured nameservice. This example assumes the nameservice is named nameservice1; you can identify the nameservice from the "Federation and High Availability" section on the Instances tab of HDFS:
rmr /hadoop-ha/nameservice1
- Navigate to the HDFS Instances tab. From the Actions menu at the right of the nameservice in the "Federation and High Availability" section, select Initialize High Availability State in ZooKeeper.
- Start HDFS.
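Taken together, the zkCli steps above might look like the following session, assuming a parcel-based installation, a ZooKeeper server on the local host, and a nameservice named nameservice1:
# Connect to a ZooKeeper server
/opt/cloudera/parcels/CDH/lib/zookeeper/bin/zkCli.sh -server localhost:2181
# At the zkCli prompt: confirm the znode exists, then remove it
ls /hadoop-ha
rmr /hadoop-ha/nameservice1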
Disabling Automatic Failover

You must disable Automatic Failover before you can disable High Availability.
To disable Automatic Failover:
- From the Services tab, select your HDFS service.
- Click the Instances tab.
- Click Disable Automatic Failover...
- Confirm that you want to take this action. Cloudera Manager will stop the NameNodes, remove the Failover Controllers, and restart the NameNodes, transitioning one of them to be the Active NameNode.
- Execute the following to remove the pre-configured nameservice. This example assumes the nameservice is named nameservice1; you can identify the nameservice from the "Federation and High Availability" section on the Instances tab of HDFS:
rmr /hadoop-ha/nameservice1
Disabling High Availability

If you have enabled Automatic Failover, you must disable it before you can disable High Availability.
To disable High Availability:
- From the Services tab, select your HDFS service.
- Click the Instances tab.
- Click Disable High Availability...
- Confirm that you want to take this action. If you are using Quorum-based Storage, you have the option of disabling Quorum-based Storage or leaving it enabled; if you are also using NameNode Federation, you should consider leaving it enabled. Cloudera Manager ensures that one NameNode is active and saves the namespace; it then stops the Standby NameNode, creates a Secondary NameNode, removes the Standby NameNode role, and restarts all the HDFS services. Note that although the Standby NameNode role is removed, its name directories are not deleted; empty these directories after making a backup of their contents. As when you enabled High Availability, you can have your dependent services restarted and your client configuration redeployed as part of the Disable High Availability workflow; if you choose not to, you must do this manually.
- Update the Hive Metastore NameNode.
Fencing Methods
In order to ensure that only one NameNode is active at a time, a fencing method is required for the shared directory. During a failover, the fencing method is responsible for ensuring that the previous Active NameNode no longer has access to the shared edits directory, so that the new Active NameNode can safely proceed writing to it.
For details of the fencing methods supplied with CDH4, and how fencing is configured, see the Fencing Configuration section in the CDH4 High Availability Guide.
By default, Cloudera Manager configures HDFS to use a shell fencing method (shell(./cloudera_manager_agent_fencer.py)) that takes advantage of the Cloudera Manager agent. However, you can configure HDFS to use the sshfence method, or you can add your own shell fencing scripts, instead of or in addition to the one Cloudera Manager provides.
The fencing parameters are found in the Service-Wide section of the Configuration tab for your HDFS service.
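For illustration, a hypothetical dfs.ha.fencing.methods value that tries SSH fencing before falling back to a custom script (the script path is a placeholder) lists one method per line:
sshfence
shell(/path/to/custom_fencer.sh)
If you use sshfence, you must also set dfs.ha.fencing.ssh.private-key-files to the path of an SSH private key that can log in to the NameNode hosts.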
Converting from NFS-mounted shared edits directory to Quorum-based Storage
Converting your High Availability configuration from an NFS-mounted shared edits directory to Quorum-based Storage involves disabling your current High Availability configuration, then re-enabling High Availability using Quorum-based Storage.
- Disable High Availability (see Disabling High Availability).
- Although the Standby NameNode role is removed, its name directories are not deleted. Empty these directories.
- Enable High Availability with Quorum-based Storage (see Enabling High Availability with Quorum-based Storage).
Converting from Quorum-based Storage to NFS-mounted shared edits directory
To convert your High Availability configuration from Quorum-based Storage to an NFS-mounted shared edits directory, disable your current High Availability configuration, configure your NFS-mounted shared edits directory, and then enable High Availability using the NFS-mounted directory.
- Disable High Availability (see Disabling High Availability).
- Although the Standby NameNode role is removed, its name directories are not deleted. Empty these directories.
- Enable High Availability using the NFS-mounted directory. Note that you must have a shared directory already configured to which both NameNode machines have read/write access. See Enabling High Availability using NFS Shared Edits Directory for detailed instructions.