name | value | description |
dfs.namenode.logging.level | info | The logging level for dfs namenode. Other values are "dir" (trace
namespace mutations), "block" (trace block under/over replications and block
creations/deletions), or "all". |
dfs.secondary.http.address | 0.0.0.0:50090 |
The secondary namenode http server address and port.
If the port is 0 then the server will start on a free port.
|
dfs.datanode.address | 0.0.0.0:50010 |
The datanode server address and port for data transfer.
If the port is 0 then the server will start on a free port.
|
dfs.datanode.http.address | 0.0.0.0:50075 |
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
|
dfs.datanode.ipc.address | 0.0.0.0:50020 |
The datanode ipc server address and port.
If the port is 0 then the server will start on a free port.
|
dfs.datanode.handler.count | 3 | The number of server threads for the datanode. |
dfs.http.address | 0.0.0.0:50070 |
The address and the base port on which the dfs namenode web ui will listen.
If the port is 0 then the server will start on a free port.
|
dfs.https.enable | false | Whether HTTPS (SSL) is supported on HDFS.
|
dfs.https.need.client.auth | false | Whether SSL client certificate authentication is required
|
dfs.https.server.keystore.resource | ssl-server.xml | Resource file from which ssl server keystore
information will be extracted
|
dfs.https.client.keystore.resource | ssl-client.xml | Resource file from which ssl client keystore
information will be extracted
|
dfs.datanode.https.address | 0.0.0.0:50475 | |
dfs.https.address | 0.0.0.0:50470 | |
dfs.datanode.dns.interface | default | The name of the Network Interface from which a data node should
report its IP address.
|
dfs.datanode.dns.nameserver | default | The host name or IP address of the name server (DNS)
which a DataNode should use to determine the host name used by the
NameNode for communication and display purposes.
|
dfs.replication.considerLoad | true | Whether chooseTarget considers the target's load.
|
dfs.default.chunk.view.size | 32768 | The number of bytes to view for a file on the browser.
|
dfs.datanode.du.reserved | 0 | Reserved space in bytes per volume. Always leave this much space free for non-DFS use.
|
dfs.name.dir | ${hadoop.tmp.dir}/dfs/name | Determines where on the local filesystem the DFS name node
should store the name table(fsimage). If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. |
dfs.name.edits.dir | ${dfs.name.dir} | Determines where on the local filesystem the DFS name node
should store the transaction (edits) file. If this is a comma-delimited list
of directories then the transaction file is replicated in all of the
directories, for redundancy. Default value is same as dfs.name.dir
|
dfs.web.ugi | webuser,webgroup | The user account used by the web interface.
Syntax: USERNAME,GROUP1,GROUP2, ...
|
dfs.permissions | true |
If "true", enable permission checking in HDFS.
If "false", permission checking is turned off,
but all other behavior is unchanged.
Switching from one parameter value to the other does not change the mode,
owner or group of files or directories.
|
dfs.permissions.supergroup | supergroup | The name of the group of super-users. |
dfs.block.access.token.enable | false |
If "true", access tokens are used as capabilities for accessing datanodes.
If "false", no access tokens are checked on accessing datanodes.
|
dfs.block.access.key.update.interval | 600 |
Interval in minutes at which namenode updates its access keys.
|
dfs.block.access.token.lifetime | 600 | The lifetime of access tokens in minutes. |
dfs.data.dir | ${hadoop.tmp.dir}/dfs/data | Determines where on the local filesystem a DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
|
dfs.datanode.data.dir.perm | 700 | Permissions for the directories on the local filesystem where
the DFS data node stores its blocks. The permissions can either be octal or
symbolic. |
dfs.replication | 3 | Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified at create time.
|
dfs.replication.max | 512 | Maximal block replication.
|
dfs.replication.min | 1 | Minimal block replication.
|
dfs.block.size | 67108864 | The default block size for new files. |
dfs.df.interval | 60000 | Disk usage statistics refresh interval in msec. |
dfs.client.block.write.retries | 3 | The number of retries for writing blocks to the data nodes,
before we signal failure to the application.
|
dfs.blockreport.intervalMsec | 3600000 | Determines block reporting interval in milliseconds. |
dfs.blockreport.initialDelay | 0 | Delay for first block report in seconds. |
dfs.datanode.directoryscan.threads | 1 | Number of threads to use when scanning volumes to
generate block reports. A value greater than one means
volumes will be scanned in parallel. |
dfs.heartbeat.interval | 3 | Determines datanode heartbeat interval in seconds. |
dfs.namenode.handler.count | 10 | The number of server threads for the namenode. |
dfs.safemode.threshold.pct | 0.999f |
Specifies the percentage of blocks that should satisfy
the minimal replication requirement defined by dfs.replication.min.
Values less than or equal to 0 mean not to wait for any particular
percentage of blocks before exiting safemode.
Values greater than 1 will make safe mode permanent.
|
dfs.safemode.min.datanodes | 0 |
Specifies the number of datanodes that must be considered alive
before the name node exits safemode.
Values less than or equal to 0 mean not to take the number of live
datanodes into account when deciding whether to remain in safe mode
during startup.
Values greater than the number of datanodes in the cluster
will make safe mode permanent.
|
dfs.safemode.extension | 30000 |
Determines extension of safe mode in milliseconds
after the threshold level is reached.
|
dfs.balance.bandwidthPerSec | 1048576 |
Specifies the maximum amount of bandwidth that each datanode
can use for balancing, in terms of
the number of bytes per second.
|
dfs.hosts | | Names a file that contains a list of hosts that are
permitted to connect to the namenode. The full pathname of the file
must be specified. If the value is empty, all hosts are
permitted. |
dfs.hosts.exclude | | Names a file that contains a list of hosts that are
not permitted to connect to the namenode. The full pathname of the
file must be specified. If the value is empty, no hosts are
excluded. |
dfs.max.objects | 0 | The maximum number of files, directories and blocks
dfs supports. A value of zero indicates no limit to the number
of objects that dfs supports.
|
dfs.namenode.decommission.interval | 30 | How often, in seconds, the namenode checks whether decommission is
complete. |
dfs.namenode.decommission.nodes.per.interval | 5 | The number of nodes the namenode checks for decommission completion
in each dfs.namenode.decommission.interval. |
dfs.replication.interval | 3 | The periodicity in seconds with which the namenode computes
replication work for datanodes. |
dfs.access.time.precision | 3600000 | The access time for an HDFS file is precise up to this value, in milliseconds.
The default value is 1 hour. Setting a value of 0 disables
access times for HDFS.
|
dfs.support.append | |
This option is no longer supported. HBase no longer requires that
this option be enabled as sync is now enabled by default. See
HADOOP-8230 for additional information.
|
dfs.datanode.plugins | | Comma-separated list of datanode plug-ins to be activated.
|
dfs.namenode.plugins | | Comma-separated list of namenode plug-ins to be activated.
|
dfs.datanode.failed.volumes.tolerated | 0 | The number of volumes that are allowed to
fail before a datanode stops offering service. By default
any volume failure will cause a datanode to shut down.
|
dfs.namenode.delegation.key.update-interval | 86400000 | The update interval for master key for delegation tokens
in the namenode in milliseconds.
|
dfs.namenode.delegation.token.max-lifetime | 604800000 | The maximum lifetime in milliseconds for which a delegation
token is valid.
|
dfs.namenode.delegation.token.renew-interval | 86400000 | The renewal interval for delegation token in milliseconds.
|
dfs.namenode.name.dir.restore | false | If true the NameNode will attempt to recover any failed
dfs.name.dir directories at the next checkpoint time (triggered by
the 2NN).
|
dfs.datanode.readahead.bytes | 4193404 |
While reading block files, if the Hadoop native libraries are available,
the datanode can use the posix_fadvise system call to explicitly
page data into the operating system buffer cache ahead of the current
reader's position. This can improve performance especially when
disks are highly contended.
This configuration specifies the number of bytes ahead of the current
read position which the datanode will attempt to read ahead. This
feature may be disabled by configuring this property to 0.
If the native libraries are not available, this configuration has no
effect.
|
dfs.datanode.drop.cache.behind.reads | false |
In some workloads, the data read from HDFS is known to be
large enough that it is unlikely to be useful to cache it in the
operating system buffer cache. In this case, the DataNode may be
configured to automatically purge all data from the buffer cache
after it is delivered to the client. This behavior is automatically
disabled for workloads which read only short sections of a block
(e.g. HBase random-IO workloads).
This may improve performance for some workloads by freeing buffer
cache space for more cacheable data.
If the Hadoop native libraries are not available, this configuration
has no effect.
|
dfs.datanode.drop.cache.behind.writes | false |
In some workloads, the data written to HDFS is known to be
large enough that it is unlikely to be useful to cache it in the
operating system buffer cache. In this case, the DataNode may be
configured to automatically purge all data from the buffer cache
after it is written to disk.
This may improve performance for some workloads by freeing buffer
cache space for more cacheable data.
If the Hadoop native libraries are not available, this configuration
has no effect.
|
dfs.datanode.sync.behind.writes | false |
If this configuration is enabled, the datanode will instruct the
operating system to enqueue all written data to the disk immediately
after it is written. This differs from the usual OS policy which
may wait for up to 30 seconds before triggering writeback.
This may improve performance for some workloads by smoothing the
IO profile for data written to disk.
If the Hadoop native libraries are not available, this configuration
has no effect.
|
dfs.client.use.datanode.hostname | false | Whether clients should use datanode hostnames when
connecting to datanodes.
|
dfs.datanode.use.datanode.hostname | false | Whether datanodes should use datanode hostnames when
connecting to other datanodes for data transfer.
|
dfs.client.local.interfaces | | A comma-separated list of network interface names to use
for data transfer between the client and datanodes. When creating
a connection to read from or write to a datanode, the client
chooses one of the specified interfaces at random and binds its
socket to the IP of that interface. Individual names may be
specified as either an interface name (e.g. "eth0"), a subinterface
name (e.g. "eth0:0"), or an IP address (which may be specified using
CIDR notation to match a range of IPs).
|
dfs.image.transfer.bandwidthPerSec | 0 |
Specifies the maximum amount of bandwidth that can be utilized for
image transfer in terms of the number of bytes per second. A default
value of 0 indicates that throttling is disabled.
|
dfs.namenode.invalidate.work.pct.per.iteration | 0.32 |
*Note*: Advanced property. Change with caution.
This determines the percentage of block
invalidations (deletes) to do over a single DN heartbeat
deletion command. The final deletion count is determined by applying this
percentage to the number of live nodes in the system.
The resultant number is the number of blocks from the deletion list
chosen for proper invalidation over a single heartbeat.
Value should be a positive, non-zero percentage in float notation (X.Yf),
with 1.0f meaning 100%.
|
dfs.namenode.replication.work.multiplier.per.iteration | 2 |
*Note*: Advanced property. Change with caution.
This determines the total amount of block transfers to begin in
parallel at a DN, for replication, when such a command list is being
sent over a DN heartbeat by the NN. The actual number is obtained by
multiplying this value by the total number of live nodes in the
cluster. The resulting number is the number of blocks for which transfers
begin immediately, per DN heartbeat. This number can be any positive,
non-zero integer.
|
dfs.webhdfs.enabled | false |
Enable WebHDFS (REST API) in Namenodes and Datanodes.
|
dfs.namenode.kerberos.internal.spnego.principal | ${dfs.web.authentication.kerberos.principal} | |
dfs.secondary.namenode.kerberos.internal.spnego.principal | ${dfs.web.authentication.kerberos.principal} | |
hadoop.fuse.connection.timeout | 300 |
The minimum number of seconds that we'll cache libhdfs connection objects
in fuse_dfs. Lower values will result in lower memory consumption; higher
values may speed up access by avoiding the overhead of creating new
connection objects.
|
hadoop.fuse.timer.period | 5 |
The number of seconds between cache expiry checks in fuse_dfs. Lower values
will result in fuse_dfs noticing changes to Kerberos ticket caches more
quickly.
|
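
The defaults above mix minutes, seconds, and milliseconds, and two of the advanced properties scale with the number of live nodes. The Python sketch below works through that arithmetic using values copied from the table. The helper names (default_in_seconds, invalidation_work, replication_work) are hypothetical and not part of Hadoop. Rounding the invalidation result up is an assumption, since the description does not say how fractional results are handled.

```python
import math

# Defaults copied from the table, paired with the unit each description states.
DURATION_DEFAULTS = {
    "dfs.block.access.key.update.interval": (600, "minutes"),
    "dfs.block.access.token.lifetime": (600, "minutes"),
    "dfs.heartbeat.interval": (3, "seconds"),
    "dfs.safemode.extension": (30000, "milliseconds"),
    "dfs.access.time.precision": (3600000, "milliseconds"),
    "dfs.namenode.delegation.token.max-lifetime": (604800000, "milliseconds"),
}

MS_PER_UNIT = {"milliseconds": 1, "seconds": 1000, "minutes": 60000}


def default_in_seconds(name):
    """Return a duration default from the table, converted to seconds."""
    value, unit = DURATION_DEFAULTS[name]
    return value * MS_PER_UNIT[unit] / 1000


def invalidation_work(live_nodes, pct=0.32):
    """Apply dfs.namenode.invalidate.work.pct.per.iteration to the live node count.

    Assumption: the fractional result is rounded up.
    """
    return math.ceil(live_nodes * pct)


def replication_work(live_nodes, multiplier=2):
    """Multiply dfs.namenode.replication.work.multiplier.per.iteration by the live node count."""
    return live_nodes * multiplier


if __name__ == "__main__":
    for name in DURATION_DEFAULTS:
        print(f"{name}: {default_in_seconds(name)} s")
    # Hypothetical 10-node cluster, using the table defaults.
    print("invalidation work per heartbeat:", invalidation_work(10))
    print("replication work per heartbeat:", replication_work(10))
```

Read this way, the 600-minute access token lifetime is 10 hours and the 604800000-millisecond delegation token lifetime is 7 days, which is easier to compare than the raw values.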