Setting Up Hive Authorization with Sentry

Cloudera Manager supports Sentry as of version 4.7, so Sentry can be configured entirely through the Cloudera Manager Admin Console. Sentry can be installed in a cluster managed by Cloudera Manager 4.5 or 4.6 with some manual configuration steps, but installation with Cloudera Manager 4.7 is strongly recommended.

Sentry enables role-based, fine-grained authorization for HiveServer2. It provides classic database-style authorization for Hive and Cloudera Impala.
  Important: When using Sentry, you must use Impala or HiveServer2 to access Hive tables. You cannot use the Hive CLI, Hue Beeswax, or WebHCat with Sentry.

For detailed information about Sentry, see the Sentry Guide.

Requirements for Sentry

The requirements for using Sentry for Hive and Impala authorization are:

  • CDH4.3.0 or later and, if using Impala, Impala 1.1 or later. Auditing of authentication failures is supported only with CDH4.4 and Impala 1.1.1 or later.
  • HiveServer2 running with strong authentication (Kerberos or LDAP).
  • A secure Hadoop cluster.
  Note: To use Sentry with CDH4.3, you must install Sentry manually before you begin the following procedure; it is not included in the CDH4.3 parcel or package. Sentry is included with CDH4.4.0.
In addition, make sure that the following are true:
  • The Hive warehouse directory (/user/hive/warehouse or the path you have specified as hive.metastore.warehouse.dir in your Hive configuration) must be owned by the hive user and the hive group.
  • Permissions on the warehouse directory must be set as follows:
    • 770 on the directory itself (for example, /user/hive/warehouse)
    • 770 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
  Important: If you enable Sentry authorization, you should enable it on both Hive and Impala, not just for one or the other.
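
The ownership and permission requirements above can be applied with the standard HDFS shell. The following is a minimal sketch, not a Cloudera-provided tool: it only builds the hdfs dfs command lines so they can be reviewed before being run as an HDFS superuser. The function name warehouse_commands is hypothetical, and the default path is the one given above; substitute your own hive.metastore.warehouse.dir value if it differs.

```python
def warehouse_commands(warehouse_dir="/user/hive/warehouse"):
    """Return hdfs dfs commands that set hive:hive ownership and mode 770.

    The commands are returned as argument lists and are NOT executed here.
    Note that -R applies the change to the directory, its subdirectories,
    and any files they contain.
    """
    return [
        ["hdfs", "dfs", "-chown", "-R", "hive:hive", warehouse_dir],
        ["hdfs", "dfs", "-chmod", "-R", "770", warehouse_dir],
    ]


# Print the commands for review instead of running them.
for argv in warehouse_commands():
    print(" ".join(argv))
```

Each list can be passed to subprocess.run once it has been reviewed, provided the calling user has HDFS superuser rights.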

Configuring Sentry Authorization

The following instructions assume that the Sentry parcel or package has been installed.

If you are using CDH4.3, or have upgraded from CDH4.3 to CDH4.4 and have not already installed the Sentry parcel, you can add it separately.
  • You do not need to do this for a new installation of CDH4.4, because the Sentry parcel is included.
  • If you have upgraded to CDH4.4 from CDH4.3 and had the separate Sentry parcel installed with CDH4.3, you must remove the stand-alone parcel.
To add the Sentry parcel for CDH4.3:
  1. Under the Administration tab, go to Settings, then Parcels.
  2. In the Remote Parcel Repository URLs property, click the Plus sign to add a remote repository location. The Sentry parcel for CDH4.3 can be found at http://archive.cloudera.com/sentry/parcels/latest/. Once the repository has been added, the Sentry parcel should appear on the Hosts > Parcels page.
  3. Download, Distribute, and Activate the parcel from the Hosts > Parcels page. See Upgrading or Adding Software on Your Cluster for details about adding a parcel.
  Note: To enable Sentry for Impala, follow the steps below, then Enable Sentry Authorization under the Impala configuration settings.

Sentry authorization is not set up automatically by the Cloudera Manager installation or upgrade wizards. To enable authorization, do the following:

  1. In the Cloudera Manager Admin Console, go to the HiveServer2 role configuration and disable impersonation.
    1. From the Admin Console, select the Hive service.
    2. Under the Configuration menu, select View and Edit.
    3. Under the HiveServer2 role group, uncheck the HiveServer2 Enable Impersonation property, and Save Changes.
  2. Create the policy file sentry-provider.ini as an HDFS file.

    Before creating the file, read the Policy file section of the Sentry Guide. The file must be owned by the hive user and the hive group, with permissions set to 640.

    By default Cloudera Manager assumes the file is in /user/hive/sentry. The path is configurable under the Configuration settings for the Hive service: under the Service-Wide category, select Sentry and modify the path in the Sentry Global Policy File property.

    The following is an example of a simple policy file:

    [groups]
    ann=default_admin
    bob=sample_reader
    joe=admin_role
    [roles]
    # can read both sample tables
    sample_reader = server=server1->db=default->table=sample_07->action=select, \
    server=server1->db=default->table=sample_08->action=select
    # implies everything on server1, default db
    default_admin = server=server1->db=default
    # implies everything on server1
    admin_role = server=server1
  3. Make sure the Hive warehouse directory ownership and permissions are as described in the requirements section above.
  4. For the TaskTracker role groups of the MapReduce service, the NodeManager role groups of the YARN service, or both, set the Minimum User ID for Job Submission to 0. If more than one such role group exists, you must do this for every TaskTracker or NodeManager role group of the MapReduce or YARN service that is associated with Hive.
    1. Select the MapReduce or YARN service and from the Configuration menu select View and Edit.
    2. Under a TaskTracker or NodeManager role group go to the Security category.
    3. Change the Minimum User ID for Job Submission to 0 (the default is 1000) and Save Changes.
    4. Repeat this for each TaskTracker or NodeManager role group. (TaskTracker or NodeManager roles colocated with the JobTracker or ResourceManager roles are often in a different role group from those running on slave nodes.)
  5. Restart your MapReduce or YARN service.
  6. In the configuration settings for your Hive service, go to the Sentry section of the Service-Wide category, check Enable Sentry Authorization, and Save Changes.
  7. Restart the Hive service.
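
The example policy file in step 2 can be checked offline before it is uploaded to HDFS. The following is a minimal sketch of a simplified reader, not Sentry's own parser: it handles section headers, # comments, backslash line continuation, and comma-separated values, then resolves a group to the privileges granted through its roles. The function names parse_policy and privileges_for_group are hypothetical.

```python
EXAMPLE_POLICY = r"""
[groups]
ann=default_admin
bob=sample_reader
joe=admin_role
[roles]
# can read both sample tables
sample_reader = server=server1->db=default->table=sample_07->action=select, \
server=server1->db=default->table=sample_08->action=select
# implies everything on server1, default db
default_admin = server=server1->db=default
# implies everything on server1
admin_role = server=server1
"""


def parse_policy(text):
    """Parse policy text into {section: {key: [comma-separated values]}}."""
    sections = {}
    current = None
    pending = ""  # accumulates lines joined by a trailing backslash
    for raw in text.splitlines():
        line = raw.strip()
        if not pending and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            pending += line[:-1].strip() + " "
            continue
        line = (pending + line).strip()
        pending = ""
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip(), {})
            continue
        if current is None or "=" not in line:
            raise ValueError("unexpected line: %r" % line)
        # Split on the first '=' only; privileges contain '=' themselves.
        key, _, value = line.partition("=")
        current[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return sections


def privileges_for_group(policy, group):
    """Return the privilege strings a group receives through its roles."""
    granted = []
    for role in policy.get("groups", {}).get(group, []):
        if role not in policy.get("roles", {}):
            raise KeyError("role %r is not defined in [roles]" % role)
        granted.extend(policy["roles"][role])
    return granted


policy = parse_policy(EXAMPLE_POLICY)
for privilege in privileges_for_group(policy, "bob"):
    print(privilege)
```

A check like this catches a role that is referenced under [groups] but never defined under [roles]. It does not validate the privilege strings themselves.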
  Note: To prevent users from accessing the Hive metastore and the Hive metastore database by any means other than HiveServer2, it is recommended that you add a firewall rule on the metastore service host that allows access to the metastore port only from the HiveServer2 host. You can do this using iptables.
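
The firewall rule described in the note might look like the following. This is a sketch under stated assumptions, not a tested configuration: it assumes the metastore listens on port 9083 (the usual Hive default; confirm the port in your configuration) and uses a hypothetical HiveServer2 address of 10.0.0.5. The function name metastore_firewall_rules is also hypothetical. The commands are only built for review, not executed, and they need to be reconciled with any rules already present on the host.

```python
def metastore_firewall_rules(hiveserver2_ip, metastore_port=9083):
    """Return iptables commands that limit the metastore port to one source.

    The commands are returned as argument lists and are NOT executed here.
    9083 is an assumed default port; pass your own value if it differs.
    """
    port = str(metastore_port)
    return [
        # Accept metastore connections that come from the HiveServer2 host.
        ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", port,
         "-s", hiveserver2_ip, "-j", "ACCEPT"],
        # Drop every other connection to the metastore port.
        ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", port,
         "-j", "DROP"],
    ]


# 10.0.0.5 is a placeholder address for the HiveServer2 host.
for argv in metastore_firewall_rules("10.0.0.5"):
    print(" ".join(argv))
```

Because iptables evaluates rules in order, the ACCEPT rule is appended before the DROP rule.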

Enabling Sentry for Impala

To enable Sentry authorization for Impala after completing the configuration steps above:

  1. Go to the Impala service, and from the Configuration menu select View and Edit.
  2. Under the Service-Wide category, go to the Sentry section.
  3. Check Enable Sentry Authorization, then Save Changes.
  4. Restart the Impala service.