| name | value | description |
| --- | --- | --- |
| dfs.namenode.logging.level | info | The logging level for the DFS namenode. Other values are "dir" (trace namespace mutations), "block" (trace block under/over replications and block creations/deletions), or "all". |
| dfs.secondary.http.address | 0.0.0.0:50090 | The secondary namenode http server address and port. If the port is 0 then the server will start on a free port. |
| dfs.datanode.address | 0.0.0.0:50010 | The datanode server address and port for data transfer. If the port is 0 then the server will start on a free port. |
| dfs.datanode.http.address | 0.0.0.0:50075 | The datanode http server address and port. If the port is 0 then the server will start on a free port. |
| dfs.datanode.ipc.address | 0.0.0.0:50020 | The datanode ipc server address and port. If the port is 0 then the server will start on a free port. |
| dfs.datanode.handler.count | 3 | The number of server threads for the datanode. |
| dfs.http.address | 0.0.0.0:50070 | The address and the base port on which the DFS namenode web UI will listen. If the port is 0 then the server will start on a free port. |
| dfs.https.enable | false | Whether HTTPS (SSL) is supported on HDFS. |
| dfs.https.need.client.auth | false | Whether SSL client certificate authentication is required. |
| dfs.https.server.keystore.resource | ssl-server.xml | Resource file from which ssl server keystore information will be extracted. |
| dfs.https.client.keystore.resource | ssl-client.xml | Resource file from which ssl client keystore information will be extracted. |
| dfs.datanode.https.address | 0.0.0.0:50475 | The datanode https server address and port. |
| dfs.https.address | 0.0.0.0:50470 | The namenode https server address and port. |
| dfs.datanode.dns.interface | default | The name of the Network Interface from which a data node should report its IP address. |
| dfs.datanode.dns.nameserver | default | The host name or IP address of the name server (DNS) which a DataNode should use to determine the host name used by the NameNode for communication and display purposes. |
| dfs.replication.considerLoad | true | Decide if chooseTarget considers the target's load or not. |
| dfs.default.chunk.view.size | 32768 | The number of bytes to view for a file on the browser. |
| dfs.datanode.du.reserved | 0 | Reserved space in bytes per volume. Always leave this much space free for non-DFS use. |
| dfs.name.dir | ${hadoop.tmp.dir}/dfs/name | Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy. |
| dfs.name.edits.dir | ${dfs.name.dir} | Determines where on the local filesystem the DFS name node should store the transaction (edits) file. If this is a comma-delimited list of directories then the transaction file is replicated in all of the directories, for redundancy. The default value is the same as dfs.name.dir. |
| dfs.web.ugi | webuser,webgroup | The user account used by the web interface. Syntax: USERNAME,GROUP1,GROUP2,... |
| dfs.permissions | true | If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories. |
| dfs.permissions.supergroup | supergroup | The name of the group of super-users. |
| dfs.block.access.token.enable | false | If "true", access tokens are used as capabilities for accessing datanodes. If "false", no access tokens are checked on accessing datanodes. |
| dfs.block.access.key.update.interval | 600 | Interval in minutes at which namenode updates its access keys. |
| dfs.block.access.token.lifetime | 600 | The lifetime of access tokens in minutes. |
| dfs.data.dir | ${hadoop.tmp.dir}/dfs/data | Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored. |
| dfs.datanode.data.dir.perm | 700 | Permissions for the directories on the local filesystem where the DFS data node stores its blocks. The permissions can either be octal or symbolic. |
| dfs.replication | 3 | Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time. |
| dfs.replication.max | 512 | Maximum block replication. |
| dfs.replication.min | 1 | Minimum block replication. |
| dfs.block.size | 67108864 | The default block size for new files. |
| dfs.df.interval | 60000 | Disk usage statistics refresh interval in msec. |
| dfs.client.block.write.retries | 3 | The number of retries for writing blocks to the data nodes, before we signal failure to the application. |
| dfs.blockreport.intervalMsec | 3600000 | Determines block reporting interval in milliseconds. |
| dfs.blockreport.initialDelay | 0 | Delay for first block report in seconds. |
| dfs.datanode.directoryscan.threads | 1 | Number of threads to use when scanning volumes to generate block reports. A value greater than one means volumes will be scanned in parallel. |
| dfs.heartbeat.interval | 3 | Determines datanode heartbeat interval in seconds. |
| dfs.namenode.handler.count | 10 | The number of server threads for the namenode. |
| dfs.safemode.threshold.pct | 0.999f | Specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.replication.min. Values less than or equal to 0 mean not to wait for any particular percentage of blocks before exiting safemode. Values greater than 1 will make safe mode permanent. |
| dfs.safemode.min.datanodes | 0 | Specifies the number of datanodes that must be considered alive before the name node exits safemode. Values less than or equal to 0 mean not to take the number of live datanodes into account when deciding whether to remain in safe mode during startup. Values greater than the number of datanodes in the cluster will make safe mode permanent. |
| dfs.safemode.extension | 30000 | Determines extension of safe mode in milliseconds after the threshold level is reached. |
| dfs.balance.bandwidthPerSec | 1048576 | Specifies the maximum amount of bandwidth, in bytes per second, that each datanode can utilize for balancing. |
| dfs.hosts | | Names a file that contains a list of hosts that are permitted to connect to the namenode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted. |
| dfs.hosts.exclude | | Names a file that contains a list of hosts that are not permitted to connect to the namenode. The full pathname of the file must be specified. If the value is empty, no hosts are excluded. |
| dfs.max.objects | 0 | The maximum number of files, directories and blocks dfs supports. A value of zero indicates no limit to the number of objects that dfs supports. |
| dfs.namenode.decommission.interval | 30 | Namenode periodicity in seconds to check if decommission is complete. |
| dfs.namenode.decommission.nodes.per.interval | 5 | The number of nodes the namenode checks for decommission completion during each dfs.namenode.decommission.interval. |
| dfs.replication.interval | 3 | The periodicity in seconds with which the namenode computes replication work for datanodes. |
| dfs.access.time.precision | 3600000 | The access time for an HDFS file is precise up to this value. The default value is 1 hour. Setting a value of 0 disables access times for HDFS. |
| dfs.support.append | | This option is no longer supported. HBase no longer requires that this option be enabled as sync is now enabled by default. See HADOOP-8230 for additional information. |
| dfs.datanode.plugins | | Comma-separated list of datanode plug-ins to be activated. |
| dfs.namenode.plugins | | Comma-separated list of namenode plug-ins to be activated. |
| dfs.datanode.failed.volumes.tolerated | 0 | The number of volumes that are allowed to fail before a datanode stops offering service. By default any volume failure will cause a datanode to shut down. |
| dfs.namenode.delegation.key.update-interval | 86400000 | The update interval, in milliseconds, for the master key for delegation tokens in the namenode. |
| dfs.namenode.delegation.token.max-lifetime | 604800000 | The maximum lifetime in milliseconds for which a delegation token is valid. |
| dfs.namenode.delegation.token.renew-interval | 86400000 | The renewal interval for delegation tokens in milliseconds. |
| dfs.namenode.name.dir.restore | false | If true, the NameNode will attempt to recover any failed dfs.name.dir directories at the next checkpoint time (triggered by the secondary NameNode). |
| dfs.datanode.readahead.bytes | 4193404 | While reading block files, if the Hadoop native libraries are available, the datanode can use the posix_fadvise system call to explicitly page data into the operating system buffer cache ahead of the current reader's position. This can improve performance especially when disks are highly contended. This configuration specifies the number of bytes ahead of the current read position which the datanode will attempt to read ahead. This feature may be disabled by configuring this property to 0. If the native libraries are not available, this configuration has no effect. |
| dfs.datanode.drop.cache.behind.reads | false | In some workloads, the data read from HDFS is known to be significantly large enough that it is unlikely to be useful to cache it in the operating system buffer cache. In this case, the DataNode may be configured to automatically purge all data from the buffer cache after it is delivered to the client. This behavior is automatically disabled for workloads which read only short sections of a block (e.g. HBase random-IO workloads). This may improve performance for some workloads by freeing buffer cache space for more cacheable data. If the Hadoop native libraries are not available, this configuration has no effect. |
| dfs.datanode.drop.cache.behind.writes | false | In some workloads, the data written to HDFS is known to be significantly large enough that it is unlikely to be useful to cache it in the operating system buffer cache. In this case, the DataNode may be configured to automatically purge all data from the buffer cache after it is written to disk. This may improve performance for some workloads by freeing buffer cache space for more cacheable data. If the Hadoop native libraries are not available, this configuration has no effect. |
| dfs.datanode.sync.behind.writes | false | If this configuration is enabled, the datanode will instruct the operating system to enqueue all written data to the disk immediately after it is written. This differs from the usual OS policy which may wait for up to 30 seconds before triggering writeback. This may improve performance for some workloads by smoothing the IO profile for data written to disk. If the Hadoop native libraries are not available, this configuration has no effect. |
| dfs.client.use.datanode.hostname | false | Whether clients should use datanode hostnames when connecting to datanodes. |
| dfs.datanode.use.datanode.hostname | false | Whether datanodes should use datanode hostnames when connecting to other datanodes for data transfer. |
| dfs.client.local.interfaces | | A comma-separated list of network interface names to use for data transfer between the client and datanodes. When creating a connection to read from or write to a datanode, the client chooses one of the specified interfaces at random and binds its socket to the IP of that interface. Individual names may be specified as either an interface name (e.g. "eth0"), a subinterface name (e.g. "eth0:0"), or an IP address (which may be specified using CIDR notation to match a range of IPs). |
| dfs.image.transfer.bandwidthPerSec | 0 | Specifies the maximum amount of bandwidth, in bytes per second, that can be utilized for image transfer. A default value of 0 indicates that throttling is disabled. |
| dfs.namenode.invalidate.work.pct.per.iteration | 0.32 | *Note*: Advanced property. Change with caution. This determines the percentage amount of block invalidations (deletes) to do over a single DN heartbeat deletion command. The final deletion count is determined by applying this percentage to the number of live nodes in the system. The resultant number is the number of blocks from the deletion list chosen for proper invalidation over a single heartbeat. Value should be a positive, non-zero percentage in float notation (X.Yf), with 1.0f meaning 100%. |
| dfs.namenode.replication.work.multiplier.per.iteration | 2 | *Note*: Advanced property. Change with caution. This determines the total amount of block transfers to begin in parallel at a DN, for replication, when such a command list is being sent over a DN heartbeat by the NN. The actual number is obtained by multiplying this multiplier with the total number of live nodes in the cluster. The result number is the number of blocks to begin transfers immediately for, per DN heartbeat. This number can be any positive, non-zero integer. |
| dfs.webhdfs.enabled | false | Enable WebHDFS (REST API) in Namenodes and Datanodes. |
| dfs.namenode.kerberos.internal.spnego.principal | ${dfs.web.authentication.kerberos.principal} | |
| dfs.secondary.namenode.kerberos.internal.spnego.principal | ${dfs.web.authentication.kerberos.principal} | |
| hadoop.fuse.connection.timeout | 300 | The minimum number of seconds that we'll cache libhdfs connection objects in fuse_dfs. Lower values will result in lower memory consumption; higher values may speed up access by avoiding the overhead of creating new connection objects. |
| hadoop.fuse.timer.period | 5 | The number of seconds between cache expiry checks in fuse_dfs. Lower values will result in fuse_dfs noticing changes to Kerberos ticket caches more quickly. |
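The values above are defaults; in practice they are overridden per cluster in a site-specific hdfs-site.xml. The sketch below only illustrates the standard `<property>` syntax with a few names and defaults taken from the table; the chosen override values and the /data/... paths are hypothetical, not recommendations.

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml (illustrative sketch): site-specific overrides of the
     defaults listed in the table above. Values and paths are examples only. -->
<configuration>

  <!-- Keep the default of three replicas per block, but raise the block size
       for new files from 64 MB (67108864) to 128 MB. -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>

  <!-- Store namenode metadata and datanode blocks outside hadoop.tmp.dir.
       /data/1 and /data/2 are hypothetical mount points. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data/1/dfs/name,/data/2/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/data,/data/2/dfs/data</value>
  </property>

  <!-- Reserve 10 GB per volume for non-DFS use instead of the default 0. -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10737418240</value>
  </property>

</configuration>
```

Daemons read this file at startup, so changes to most of these properties take effect only after the affected namenode or datanode processes are restarted.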
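The HTTPS- and WebHDFS-related properties from the table can be switched on together. The fragment below is only a sketch: it assumes the ssl-server.xml and ssl-client.xml keystore resource files named by dfs.https.server.keystore.resource and dfs.https.client.keystore.resource are already present on the classpath, and it simply restates the default HTTPS addresses for clarity.

```xml
<!-- hdfs-site.xml fragment (illustrative sketch): enable HTTPS and WebHDFS.
     Assumes ssl-server.xml / ssl-client.xml are already on the classpath. -->
<configuration>
  <property>
    <name>dfs.https.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.https.need.client.auth</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.https.address</name>
    <value>0.0.0.0:50470</value>
  </property>
  <property>
    <name>dfs.datanode.https.address</name>
    <value>0.0.0.0:50475</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
```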