Adding Cloudera Search

Cloudera Search Beta is supported by Cloudera Manager 4.6. You can install Search (Solr) with Cloudera Manager using one of the two following methods:

Note that Cloudera Manager installs the Cloudera Search Beta packages, but does not configure or start the service automatically. You must use the Add Service workflow to start the Solr service. The following sections guide you through adding a Solr service, adding a Flume service, configuring the Flume Morphline Solr Sink, and deploying Search with Hue.

Adding a Solr Service to the Cluster

Once the initial cluster has been set up, you can add the Cloudera Search service to this cluster.

  Note: HDFS and ZooKeeper services must be running before adding Cloudera Search.
  1. Connect to the Cloudera Manager Admin Console.
  2. Click the Services tab, then choose All Services.
  3. From the Actions menu, select Add a Service. A list of possible services is displayed. You can add one type of service at a time. Choose the Solr service.
  4. Follow the wizard for adding the Solr service to your cluster. Select the hosts on your cluster on which to add and configure the Solr Servers.

After completing the wizard, Cloudera Manager automatically initializes Solr home in ZooKeeper and HDFS.

Once you have set up the Solr service, you can create collections by following the instructions in the Cloudera Search Installation Guide, in the section "Deploying Cloudera Search in SolrCloud Mode", under the heading "Administering Solr with the solrctl Tool."

  Important: You must provide the ZooKeeper ensemble connection string using the "--zk" option with the "/solr" suffix for "solrctl" commands. For example:
  solrctl --zk <zkhost1>:2181,<zkhost2>:2181,<zkhost3>:2181/solr ...  
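
As a minimal sketch of the note above, the following shell fragment assembles the "--zk" connection string from a list of hosts and prints a few solrctl invocations as a dry run. The host names, the collection name collection1, and the $HOME/solr_configs directory are placeholders chosen for illustration; the instancedir and collection subcommands are assumed to be the ones described in the Cloudera Search Installation Guide, so check that guide for the exact options.

```shell
#!/bin/sh
# Join ZooKeeper hosts into the connection string that solrctl expects:
#   host1:2181,host2:2181,host3:2181/solr
zk_ensemble() {
  port=2181      # ZooKeeper client port used in the example above
  chroot=/solr   # suffix required for solrctl commands
  out=""
  for host in "$@"; do
    out="${out:+$out,}${host}:${port}"
  done
  printf '%s%s\n' "$out" "$chroot"
}

# Placeholder host names; substitute the hosts of your own ensemble.
ZK=$(zk_ensemble zk1.example.com zk2.example.com zk3.example.com)

# Dry run: print the solrctl invocations instead of executing them.
echo solrctl --zk "$ZK" instancedir --generate "$HOME/solr_configs"
echo solrctl --zk "$ZK" instancedir --create collection1 "$HOME/solr_configs"
echo solrctl --zk "$ZK" collection --create collection1 -s 1
```

Remove the leading echo from a line to run that command on a host where solrctl is installed.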

Adding a Flume Service to the Cluster

To use the Flume-NG Solr sink, the Flume service must be running on your cluster. You can add a Flume service from the All Services Actions menu in the same way you added the Solr service:

  1. Connect to the Cloudera Manager Admin Console.
  2. Click the Services tab, then choose All Services.
  3. From the Actions menu, select Add a Service. A list of possible services is displayed. You can add one type of service at a time. Choose the Flume service.
  4. Follow the wizard for adding the Flume service to your cluster. Select the hosts on your cluster on which to add and configure the Flume agents.

Configuring Flume Morphline Solr Sink for use with the Solr Service

See the Cloudera Search User Guide, specifically the section "Flume Near Real-Time Indexing Reference" for information about how to configure Flume Morphline Solr Sink.

Cloudera Manager provides a set of configuration settings under the Flume Service to help configure Flume Morphline Solr Sink. These settings are templates that you will need to modify for your deployment.
  1. Go to the Flume service, pull down the Configuration tab and select View and Edit.
  2. Under the Agent role group, find the Configuration File property that holds the flume.conf file. This is the primary configuration file for Flume agents. Modify this file (or paste your own version in here). Note that there could be more than one Agent role group -- if so, you will need to configure each one appropriately.
  3. Under the Agent role group, go to the Flume-NG Solr Sink category. Here you will find the following properties:
    • Morphlines File (morphlines.conf) - Configures Morphlines for Flume agents. Use $ZK_HOST in this file instead of specifying a ZooKeeper quorum. Cloudera Manager automatically replaces the $ZK_HOST variable with the correct value when it deploys the Flume configuration.
    • Custom MIME-types File (custom-mimetypes.xml) - For use with the detectMimeTypes command. See the Cloudera Morphlines Reference Guide for details on this command.
    • Grok Dictionary File (grok-dictionary.conf) - For use with the grok command. See the Cloudera Morphlines Reference Guide for details on this command.
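
To illustrate where $ZK_HOST typically appears, the sketch below holds a small morphlines.conf fragment and imitates the effect of the replacement with sed. This is an illustration only: the guide does not describe how Cloudera Manager performs the substitution, the collection name collection1 and the host names are placeholders, and render_morphline is a hypothetical helper.

```shell
#!/bin/sh
# Fragment of a morphlines.conf; "collection1" is a placeholder collection name.
template='SOLR_LOCATOR : {
  collection : collection1
  zkHost : "$ZK_HOST"
}'

# Imitate the effect of the deployment-time replacement (illustration only).
# $1 = template text, $2 = ZooKeeper connection string
render_morphline() {
  printf '%s\n' "$1" | sed "s|\\\$ZK_HOST|$2|g"
}

render_morphline "$template" 'zk1.example.com:2181,zk2.example.com:2181/solr'
```

In a real deployment you leave the literal $ZK_HOST in the file and let Cloudera Manager fill it in.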

Once configuration is complete, Cloudera Manager automatically deploys the required files to the Flume agent's process directory when it starts the Flume agent. Therefore, you can reference the files in the Flume agent's configuration file using only their (relative path) names. For example, in flume.conf you can use the name morphlines.conf to refer to the location of the morphline configuration file.
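
The relative reference described above might look like the sample written by the sketch below. The agent, sink, and channel names (agent, solrSink, memoryChannel) are hypothetical; the type and morphlineFile property keys are assumed to follow the Flume MorphlineSolrSink documentation. The helper non_relative_morphline_refs is also made up for this example: it lists any morphlineFile value that contains a path separator, which can serve as a quick check that a flume.conf uses bare file names.

```shell
#!/bin/sh
# Write a sample flume.conf fragment that refers to the morphline file
# by its bare name, relying on the agent's process directory.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
agent.sinks = solrSink
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.channel = memoryChannel
agent.sinks.solrSink.morphlineFile = morphlines.conf
EOF

# Print every morphlineFile value that contains a slash,
# that is, every value that is not a bare file name.
non_relative_morphline_refs() {
  grep -E '\.morphlineFile[[:space:]]*=' "$1" \
    | sed -E 's/^[^=]*=[[:space:]]*//' \
    | grep '/' || true
}

# Prints nothing for the sample above, because it uses a bare name.
non_relative_morphline_refs "$sample_conf"
```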

Deploying Search with Hue

In order to use Cloudera Search with Hue, you must update the URL for the Solr Server in the Hue Server safety valve.
  1. From the Services menu, select the Hue service.
  2. Click Configuration > View and Edit.
  3. Search for the word "safety". This displays a set of Hue Safety Valve properties.
  4. Add information about your Solr host to Hue Server (Default) / Advanced. For example, if your hostname is SOLR_HOST, you might add the following:
    [search]
    ## URL of the Solr Server
    solr_url=http://SOLR_HOST:8983/solr
  5. Restart the Hue Service.
  Important: If you are using parcels, you must register the "hue-search" application manually or access will fail.
  1. Stop the Hue service.
  2. From the command line do the following:
    1. cd /opt/cloudera/parcels/CDH4.3.0-1.cdh4.3.0.pXXX/share/hue
      (If your local parcel repository is not under /opt/cloudera/parcels, substitute your own path, and use the name of the CDH4.3 parcel that actually exists in your repository.)
    2. ./build/env/bin/python ./tools/app_reg/app_reg.py --install \
            /opt/cloudera/parcels/SOLR-0.9.0-1.cdh4.3.0.pXXX/share/hue/apps/search
    3. sed -i 's/\.\/apps/..\/..\/..\/..\/..\/apps/g' ./build/env/lib/python2.X/site-packages/hue.pth
      where python2.X is the Python version you are using (for example, python2.4).
  3. Start the Hue service.
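
The sed command in the parcel registration steps above rewrites every "./apps" prefix in hue.pth to "../../../../../apps". The sketch below applies the same expression to a single line of text so you can see its effect without touching a real file. The sample entry is invented for illustration and may not match the actual contents of your hue.pth; rewrite_pth_line is a hypothetical helper.

```shell
#!/bin/sh
# Apply the same substitution as the sed step to one line of text.
rewrite_pth_line() {
  printf '%s\n' "$1" | sed 's/\.\/apps/..\/..\/..\/..\/..\/apps/g'
}

# Hypothetical hue.pth entry, made up for illustration.
rewrite_pth_line './apps/search/src'
```

Lines that do not contain "./apps" pass through unchanged.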