Adding Flume

You must add the Flume NG service separately from the installation wizard. The wizard installs the Flume packages, but it does not configure or start the agents as part of First Run. When you add the Flume service, configure your Flume agents before you start their role instances.

For details of how to modify configurations and use configuration overrides in Cloudera Manager, see Modifying Service Configurations.

For detailed information about Flume agent configuration, see the Flume User Guide.

For a general discussion of adding a service, see Adding a Service.

To install Flume agents on your cluster:

  1. Click the Services tab, then choose All Services.
  2. From the Actions menu, select Add a Service. A list of available services is displayed. You can add one type of service at a time.
  3. Select the set of dependencies for the Flume service.
  4. Select the hosts on which you want Flume agents to be installed.
  5. Click Continue. The Flume agents are installed on the hosts you selected.

Configuring your Flume Agents

The Flume agents are not started automatically. You must first configure your agents appropriately before you start them, following the instructions below.

A default Flume flow configuration is provided as an example in the Configuration properties for the Flume agents; replace it with your own configuration. The default configuration, initially in the Agent (Default) role group, configures a single agent. If all your agents share the same configuration, you can make them all members of the default role group. Agent names do not have to be unique: different agent role instances can have the same name, which lets you simplify the configuration file further. This is the recommended way to configure Flume.

A single Flume configuration file can contain the configuration for multiple agents, because each configuration property is prefixed by the agent name. You can then use separate role groups to set each agent's name, which selects the configuration that applies to that agent.
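To make the agent-name prefix concrete, here is a minimal Python sketch that splits a two-agent flume.conf into one property map per agent. This mirrors how each agent reads only the properties under its own name. The agent names `agent1` and `agent2`, the component names, and the `split_by_agent` helper are hypothetical. The parser is also simplified: it handles only `key = value` lines and `#` comments, not the full Java properties syntax.

```python
"""Split a multi-agent flume.conf into per-agent property maps.

Sketch only: agent1/agent2 and the component names are hypothetical.
Relies on every property key starting with '<agent name>.'.
"""
from collections import defaultdict

FLUME_CONF = """
# agent1: netcat source -> memory channel -> logger sink
agent1.sources = r1
agent1.channels = c1
agent1.sinks = k1
agent1.sources.r1.type = netcat
agent1.sources.r1.bind = localhost
agent1.sources.r1.port = 44444
agent1.sources.r1.channels = c1
agent1.channels.c1.type = memory
agent1.sinks.k1.type = logger
agent1.sinks.k1.channel = c1

# agent2: same flow shape, different port
agent2.sources = r1
agent2.channels = c1
agent2.sinks = k1
agent2.sources.r1.type = netcat
agent2.sources.r1.bind = localhost
agent2.sources.r1.port = 55555
agent2.sources.r1.channels = c1
agent2.channels.c1.type = memory
agent2.sinks.k1.type = logger
agent2.sinks.k1.channel = c1
"""


def split_by_agent(conf_text: str) -> dict[str, dict[str, str]]:
    """Group 'agent.key = value' lines by their leading agent name."""
    agents: dict[str, dict[str, str]] = defaultdict(dict)
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # The text before the first dot is the agent name.
        agent, _, rest = key.strip().partition(".")
        agents[agent][rest] = value.strip()
    return dict(agents)


if __name__ == "__main__":
    for name, props in split_by_agent(FLUME_CONF).items():
        print(name, "listens on port", props["sources.r1.port"])
```

In this picture, two role instances with the Agent Name `agent1` would both use the first map. Overriding one of them to `agent2` selects the second.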

Flume NG can be installed on a cluster running either CDH3 or CDH4. However, monitoring of Flume is only supported if your cluster is running CDH4.1 or later, or CDH3u5 (refresh 2) or later.

To configure your Flume agents:

  1. Go to the Flume Service page (by selecting your Flume service from the Services menu or from the All Services page).
  2. Pull down the Configuration tab, and select View and Edit.
  3. Select the Agent (Default) role group in the left-hand column. The settings you make here apply to the default role group, and therefore to all agent instances, unless you associate those instances with a different role group or override the settings for specific agents.
  4. Set the Agent Name property to the name of the agent (or one of the agents) whose configuration is defined in your flume.conf. You can specify only one agent name here; it is used as the default for all Flume agent instances unless you override the name for specific agents. Multiple agents can have the same name; they then share the configuration defined for that name in your configuration file.
  5. Copy the contents of your flume.conf file, in its entirety, into the Configuration File field. Unless overridden for specific agent instances, this flume.conf file will apply to all your agents. You can provide multiple agent configurations in this file and use Agent Name overrides to determine which configurations to use for each agent. This is the recommended procedure.
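An Agent Name that does not match any prefix in the Configuration File leaves that agent with no sources, channels, or sinks defined. It can therefore help to check the name before you start the role instances. The following Python sketch is a hypothetical pre-flight helper, not a Cloudera Manager or Flume feature. It deliberately treats an agent as defined only if the file declares all three of that agent's top-level `sources`, `channels`, and `sinks` lists. The names `tier1` and `tier2` are placeholders.

```python
"""Check that an Agent Name is actually defined in a flume.conf.

Hypothetical helper; not part of Cloudera Manager or Flume.
"""

REQUIRED_LISTS = ("sources", "channels", "sinks")


def defined_agents(conf_text: str) -> set[str]:
    """Return agent names that declare sources, channels, and sinks lists."""
    seen: dict[str, set[str]] = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        parts = line.split("=", 1)[0].strip().split(".")
        # Top-level component lists look like '<agent>.sources'.
        if len(parts) == 2 and parts[1] in REQUIRED_LISTS:
            seen.setdefault(parts[0], set()).add(parts[1])
    return {name for name, lists in seen.items() if lists == set(REQUIRED_LISTS)}


def check_agent_name(agent_name: str, conf_text: str) -> None:
    """Raise ValueError if agent_name has no complete definition."""
    known = defined_agents(conf_text)
    if agent_name not in known:
        raise ValueError(
            f"Agent Name {agent_name!r} not found in flume.conf; "
            f"defined agents: {sorted(known)}"
        )


SAMPLE = """
tier1.sources = src1
tier1.channels = ch1
tier1.sinks = snk1
tier1.sources.src1.type = netcat
"""

if __name__ == "__main__":
    check_agent_name("tier1", SAMPLE)  # defined: no exception
    try:
        check_agent_name("tier2", SAMPLE)
    except ValueError as err:
        print(err)
```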

To override the agent name for one or more specific agents:

If you have specified multiple agent configurations in your flume.conf file, you must override the default agent name for each agent instance that should use a configuration other than the default.

  1. Pull down the Flume service Configuration tab, select View and Edit, and then select the Agent (Default) role group in the left-hand column.
  2. To override the Agent Name for one or more instances, move your cursor over the value area of the Agent Name property, and click Override Instances.
  3. Select the agent (role) instances you want to override.
  4. In the field labeled Change value of selected instances to: select "Other". (You can use the "Inherited Value" setting to return to the service-level value.)
  5. In the field that appears, type the agent name you want to use for the selected agents.
  6. Click Apply to have your change take effect.
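The resolution rule in the steps above is simple: an instance uses its override if one is set, and otherwise the inherited Agent Name. It can be modeled in a few lines of Python. This is a simplified, hypothetical model for reasoning about which configuration each host will load. It is not the Cloudera Manager API, and the host names are made up.

```python
"""Model how an agent instance's effective Agent Name is resolved.

Simplified, hypothetical model; not the Cloudera Manager API.
"""
from typing import Optional


def effective_agent_name(
    host: str,
    inherited_name: str,
    overrides: dict[str, Optional[str]],
) -> str:
    """Return the instance override if set, else the inherited Agent Name.

    A value of None models selecting "Inherited Value", which drops the
    override so the instance falls back to the inherited name.
    """
    override = overrides.get(host)
    return override if override is not None else inherited_name


if __name__ == "__main__":
    default_name = "agent1"  # Agent Name set on the Agent (Default) role group
    overrides = {
        "flume-03.example.com": "agent2",  # set via "Other"
        "flume-04.example.com": None,      # reset to "Inherited Value"
    }
    hosts = ["flume-01.example.com", "flume-03.example.com", "flume-04.example.com"]
    for h in hosts:
        print(h, "->", effective_agent_name(h, default_name, overrides))
```

Running it prints `agent1` for the two hosts without an active override and `agent2` for `flume-03.example.com`.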

After you have completed your configuration changes, you can start the Flume service, which will start all your Flume agents.

  Note:

If you need to modify your Flume configuration file after you have started the Flume service, use the Update Config... command from the Actions menu on the Flume Service Status page. It updates the configuration across all Flume agents without shutting down the Flume service.

Using Flume with HDFS or HBase sinks

If you want to use Flume with HDFS or HBase sinks, you can add a dependency on that service from the Flume configuration page. This automatically adds the correct client configurations to the Flume agent's classpath.
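After you add the dependency, you still define the sink itself in flume.conf. The Python sketch below renders a sink stanza for either sink type, using the agent-name prefix convention. The agent, sink, channel, path, and table names, and the `sink_stanza` helper, are hypothetical. The property names (`type`, `hdfs.path`, `hdfs.fileType`, `table`, `columnFamily`, `channel`) come from the Flume User Guide. The sketch assumes that a path without a scheme, such as `/user/flume/events`, is resolved against the cluster's default file system from the client configuration that the dependency provides.

```python
"""Render a flume.conf sink stanza for an HDFS or HBase sink.

Sketch only: agent, sink, channel, path, and table names are hypothetical.
"""


def sink_stanza(agent: str, sink: str, channel: str, kind: str, **opts: str) -> str:
    prefix = f"{agent}.sinks.{sink}"
    if kind == "hdfs":
        props = {
            "type": "hdfs",
            "hdfs.path": opts["path"],
            # Write a plain data stream instead of the default SequenceFile.
            "hdfs.fileType": "DataStream",
        }
    elif kind == "hbase":
        props = {
            "type": "hbase",
            "table": opts["table"],
            "columnFamily": opts["column_family"],
        }
    else:
        raise ValueError(f"unsupported sink kind: {kind}")
    # Every sink drains exactly one channel.
    props["channel"] = channel
    return "\n".join(f"{prefix}.{k} = {v}" for k, v in props.items())


if __name__ == "__main__":
    print(sink_stanza("agent1", "k1", "c1", "hdfs", path="/user/flume/events"))
    print(sink_stanza("agent1", "k2", "c1", "hbase", table="events", column_family="cf"))
```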

  Note:

If you are using Flume with HBase, make sure that the /etc/zookeeper/conf/zoo.cfg file on the host of each Flume agent that uses an HBase sink either does not exist or contains the correct ZooKeeper quorum.
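The check described in the note above can be scripted. This Python sketch is hypothetical and not a Cloudera Manager feature. It reads the `server.N=host:port:port` entries from zoo.cfg and compares their host names with the quorum you expect. The expected host names are placeholders, and requiring an exact match is a deliberately strict assumption.

```python
"""Check a local zoo.cfg against the expected ZooKeeper quorum.

Hypothetical pre-flight check for a Flume agent host with an HBase sink.
"""
from pathlib import Path

ZOO_CFG = Path("/etc/zookeeper/conf/zoo.cfg")


def quorum_hosts(cfg_text: str) -> set[str]:
    """Extract host names from 'server.N=host:peerPort:electionPort' lines."""
    hosts = set()
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("server.") and "=" in line:
            value = line.split("=", 1)[1].strip()
            # The host name is everything before the first colon.
            hosts.add(value.split(":", 1)[0])
    return hosts


def zoo_cfg_ok(expected: set[str], path: Path = ZOO_CFG) -> bool:
    """True if zoo.cfg is absent, or lists exactly the expected quorum hosts."""
    if not path.exists():
        return True
    return quorum_hosts(path.read_text()) == expected


if __name__ == "__main__":
    # Placeholder host names; substitute your cluster's ZooKeeper quorum.
    print(zoo_cfg_ok({"zk1.example.com", "zk2.example.com", "zk3.example.com"}))
```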

Using Flume with Solr

The Flume Solr Sink provides a flexible, scalable, fault-tolerant, transactional, Near Real-Time (NRT) oriented system for processing a continuous stream of records into live search indexes. The latency from the time data arrives to the time it appears in search query results is on the order of seconds, and is tunable. NRT indexing requires the Flume Solr Sink. Cloudera Manager provides a set of configuration settings under the Flume service to help you configure the Flume Morphline Solr Sink. See Configuring Flume Morphline Solr Sink for use with the Solr Service for detailed instructions.