HBase Health Checks
HBase Active HBase Master Health
This is an HBase service-level health check that checks for the presence of an active, running and healthy HBase Master. The check returns "Bad" health if the service is running and a running, active Master cannot be found. In all other cases it returns the health of the running, active Master. A failure of this health check may indicate stopped or unhealthy Master roles, or it may indicate a problem with communication between the Cloudera Manager Service Monitor and the HBase service. Check the status of the HBase service's Master roles and look in the Cloudera Manager Service Monitor's log files for more information when this check fails. This test can be enabled or disabled using the Active Master Health Check HBase service-wide monitoring setting. In addition, the HBase Active Master Detection Window can be used to adjust the amount of time that the Cloudera Manager Service Monitor has to detect the active HBase Master before this health check fails.
Short Name: Active HBase Master Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Active Master Health Check | When computing the overall HBase cluster health, consider the active HBase Master's health. | hbase_master_health_enabled | true | no unit |
HBase Active Master Detection Window | The tolerance window that will be used in HBase service tests that depend on detection of the active HBase Master. | hbase_active_master_detecton_window | 3 | MINUTES |
HBase Backup HBase Master Health
This is an HBase service-level health check that checks for running, healthy HBase Masters in backup mode. The check is disabled if the HBase service is not configured with multiple HBase Masters. Otherwise, the check returns "Concerning" health if either of two conditions is met: no HBase Master is running in backup mode, or any HBase Master running in backup mode is in less than "Good" health. The second condition is included because a failure of the active HBase Master triggers a race among all backup HBase Masters to become the new active HBase Master. If a less than healthy backup HBase Master wins that race, the HBase service ends up with a less than healthy active HBase Master. A failure of this health check may indicate one or more stopped or unhealthy backup HBase Masters, or it may indicate a problem with communication between the Cloudera Manager Service Monitor and the HBase service. When this check fails, check the status of the HBase service's Master roles and look in the Cloudera Manager Service Monitor's log files for more information. This test can be enabled or disabled using the Backup Masters Health Check HBase service-wide monitoring setting. In addition, the HBase Active Master Detection Window can be used to adjust the amount of time that the Cloudera Manager Service Monitor has to detect the active HBase Master before this health check fails.
Short Name: Backup HBase Master Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Backup Masters Health Check | When computing the overall HBase cluster health, consider the health of the backup HBase Masters. | hbase_backup_masters_health_enabled | true | no unit |
HBase Active Master Detection Window | The tolerance window that will be used in HBase service tests that depend on detection of the active HBase Master. | hbase_active_master_detecton_window | 3 | MINUTES |
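
The decision rules described for the backup Master check can be sketched in a few lines of Python. This is only an illustration of the documented conditions, not Cloudera Manager's implementation: the function name, its parameters, the use of `None` for a disabled check, and the "Good" result when neither condition applies are all assumptions made for the example.

```python
from typing import Optional, Sequence


def backup_masters_health(
    configured_masters: int,
    backup_healths: Sequence[str],
    enabled: bool = True,
) -> Optional[str]:
    """Hypothetical model of the Backup HBase Master Health check.

    configured_masters: number of HBase Master roles configured for the service.
    backup_healths: health of each Master currently running in backup mode,
        each one of "Good", "Concerning", or "Bad".
    enabled: value of the Backup Masters Health Check setting.
    Returns None when the check does not apply (assumed convention).
    """
    # The check is disabled unless multiple Masters are configured.
    if not enabled or configured_masters < 2:
        return None
    # Condition 1: no Master is running in backup mode.
    if len(backup_healths) == 0:
        return "Concerning"
    # Condition 2: any backup Master is in less than "Good" health.
    if any(health != "Good" for health in backup_healths):
        return "Concerning"
    # Assumed: the check reports "Good" when neither condition is met.
    return "Good"


if __name__ == "__main__":
    print(backup_masters_health(3, ["Good", "Concerning"]))
```

Under these assumptions, a single unhealthy backup Master is enough to lower the result, which mirrors the reasoning that any backup Master could win the race to become active.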
HBase RegionServers Health
This is an HBase service-level health check that checks that enough of the RegionServers in the cluster are healthy. The check returns "Concerning" health if the number of healthy RegionServers falls below a warning threshold, expressed as a percentage of the total number of RegionServers. The check returns "Bad" health if the number of healthy and "Concerning" RegionServers falls below a critical threshold, expressed as a percentage of the total number of RegionServers. For example, if this check is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 RegionServers, it returns "Good" health if 95 or more RegionServers have "Good" health. It returns "Concerning" health if fewer than 95 RegionServers have "Good" health but at least 90 RegionServers have either "Good" or "Concerning" health. It returns "Bad" health if fewer than 90 RegionServers have either "Good" or "Concerning" health, that is, if more than 10 RegionServers have "Bad" health. A failure of this health check indicates unhealthy RegionServers. Check the status of the individual RegionServers for more information. This test can be configured using the Healthy HBase Region Servers Monitoring Thresholds HBase service-wide monitoring setting.
Short Name: RegionServers Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Healthy HBase Region Servers Monitoring Thresholds | The health check thresholds for the overall health of the HBase Region Servers. The check returns "Concerning" health if the percentage of "Healthy" HBase Region Servers falls below the warning threshold. The check returns "Bad" health if the combined percentage of "Healthy" and "Concerning" HBase Region Servers falls below the critical threshold. | hbase_regionservers_healthy_thresholds | critical:90.000000, warning:95.000000 | PERCENT |
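
The threshold arithmetic for the RegionServers check can be worked through with a short Python sketch. It is a minimal model of the rules described above, not Cloudera Manager's code: the function name and parameters are hypothetical, and evaluating the critical threshold before the warning threshold is an assumption. The default values match the documented defaults of `hbase_regionservers_healthy_thresholds`.

```python
def regionservers_health(
    good: int,
    concerning: int,
    total: int,
    warning_pct: float = 95.0,
    critical_pct: float = 90.0,
) -> str:
    """Hypothetical model of the RegionServers Health check.

    good / concerning: number of RegionServers in each health state.
    total: total number of RegionServers in the cluster.
    warning_pct / critical_pct: thresholds as percentages of total.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of RegionServers")
    if good < 0 or concerning < 0 or good + concerning > total:
        raise ValueError("health counts are inconsistent with total")

    # Compare count * 100 against pct * total to avoid a division step.
    # "Bad": Good plus Concerning RegionServers fall below the critical threshold.
    if (good + concerning) * 100 < critical_pct * total:
        return "Bad"
    # "Concerning": Good RegionServers alone fall below the warning threshold.
    if good * 100 < warning_pct * total:
        return "Concerning"
    return "Good"


if __name__ == "__main__":
    # The 100-RegionServer example from the description.
    for good, concerning in [(95, 0), (92, 0), (85, 5), (85, 4)]:
        print(good, concerning, regionservers_health(good, concerning, 100))
```

With the default thresholds and 100 RegionServers, 85 "Good" plus 5 "Concerning" still meets the 90% critical threshold and yields "Concerning", while 85 "Good" plus 4 "Concerning" leaves 11 in "Bad" health and yields "Bad".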