AWS OpenSearch metrics

Once you’ve integrated with AWS CloudWatch, you have access to all metrics for OpenSearch, a managed service that makes it easy to deploy, operate, and scale OpenSearch clusters in the AWS Cloud.

See all AWS integrations.

To verify metrics are reporting, search for the metrics in the Metric details section of the Settings page.

The following table shows the OpenSearch metrics ingested by Cloud Observability.

Metric Name Unit Description
aws.es.cluster_status_green integer 1 indicates that all index shards are allocated to nodes in the cluster. Relevant statistics: Maximum
aws.es.cluster_status_yellow integer 1 indicates that the primary shards for all indexes are allocated to nodes in the cluster, but replica shards for at least one index are not. Relevant statistics: Maximum
aws.es.cluster_status_red integer 1 indicates that the primary and replica shards for at least one index are not allocated to nodes in the cluster. For more information, see Red cluster status. Relevant statistics: Maximum
aws.es.shards_active count The total number of active primary and replica shards. Relevant statistics: Maximum, Sum
aws.es.shards_unassigned count The number of shards that are not allocated to nodes in the cluster. Relevant statistics: Maximum, Sum
aws.es.shards_delayed_unassigned count The number of shards whose node allocation has been delayed by the timeout settings. Relevant statistics: Maximum, Sum
aws.es.shards_active_primary count The total number of active primary shards. Relevant statistics: Maximum, Sum
aws.es.shards_initializing count The number of initializing shards. Relevant statistics: Sum
aws.es.shards_relocating count The number of relocating shards. Relevant statistics: Sum
aws.es.nodes count The total number of nodes in the OpenSearch Service cluster, including dedicated master nodes and UltraWarm nodes. Relevant statistics: Maximum
aws.es.searchable_documents count The total number of searchable documents in all data nodes inside the cluster. Relevant statistics: Minimum, Maximum, Average
aws.es.deleted_documents count The total number of documents marked for deletion in all data nodes in the cluster. Relevant statistics: Minimum, Maximum, Average
aws.es.cpu_utilization percentage The percentage of CPU usage for data nodes in the cluster. Maximum shows the node with the highest CPU usage. Average represents all nodes in the cluster. This metric is also available for individual nodes. Relevant statistics: Maximum, Average
aws.es.free_storage_space mebibyte The free space on data nodes in the cluster. Sum shows the total free space for the cluster. Minimum and Maximum show the nodes with the least and most free space, respectively. This metric is also available for individual nodes. Relevant statistics: Minimum, Maximum, Average, Sum
aws.es.cluster_used_space mebibyte The total used space for the cluster. You must leave the period at one minute to get an accurate value. Relevant statistics: Minimum, Maximum
aws.es.cluster_index_writes_blocked integer 0 if the cluster is accepting requests, 1 if it is blocking requests. Relevant statistics: Maximum
aws.es.jvm_memory_pressure percentage The maximum percentage of the Java heap used across all data nodes in the cluster. Relevant statistics: Maximum
aws.es.old_gen_jvm_memory_pressure percentage The maximum percentage of the Java heap used for the "old generation" across all data nodes in the cluster. This metric is also available at the node level. Relevant statistics: Maximum
aws.es.automated_snapshot_failure count The number of failed automated snapshots for the cluster. 1 shows that no automated snapshot was taken for the domain in the previous 36 hours. Relevant statistics: Minimum, Maximum
aws.es.cpu_credit_balance count The remaining CPU credits available for T2 data nodes inside the cluster. Relevant statistics: Minimum
aws.es.open_search_dashboards_healthy_nodes
(previously kibana_healthy_nodes)
count A health check for OpenSearch Dashboards. If the Minimum, Maximum, and Average are all 1, Dashboards is working as expected. Relevant statistics: Minimum, Maximum, Average
aws.es.kibana_reporting_failed_request_sys_err_count count The number of requests to generate OpenSearch Dashboards reports that failed due to server problems or feature limitations. Relevant statistics: Sum
aws.es.kibana_reporting_failed_request_user_err_count count The number of requests to generate OpenSearch Dashboards reports that failed due to client issues. Relevant statistics: Sum
aws.es.kibana_reporting_request_count count The number of all requests to generate OpenSearch Dashboards reports. Relevant statistics: Sum
aws.es.kibana_reporting_success_count count The number of successful requests to generate OpenSearch Dashboards reports. Relevant statistics: Sum
aws.es.kms_key_error count 1 shows that the AWS KMS key used to encrypt data at rest has been disabled and needs to be re-enabled. Relevant statistics: Minimum, Maximum
aws.es.kms_key_inaccessible integer 1 shows that the AWS KMS key used to encrypt data at rest has been deleted or its grants were revoked. This metric is available only for domains that encrypt data at rest. Relevant statistics: Minimum, Maximum
aws.es.invalid_host_header_requests count The number of HTTP requests with an invalid host header. Relevant statistics: Sum
aws.es.open_search_requests
(previously elasticsearch_requests)
count The number of requests made to the OpenSearch cluster. Relevant statistics: Sum
aws.es.2xx,_3xx,_4xx,_5xx count The number of requests that resulted in the given HTTP response code class (2xx, 3xx, 4xx, 5xx). Relevant statistics: Sum
aws.es.throughput_throttle integer 1 shows that some requests were throttled within the selected timeframe; 0 indicates normal behavior. Relevant statistics: Minimum, Maximum
aws.es.master_cpu_utilization percentage The maximum percentage of CPU resources used by the dedicated master nodes. Relevant statistics: Maximum
aws.es.master_jvm_memory_pressure percentage The maximum percentage of the Java heap used by all dedicated master nodes in the cluster. Relevant statistics: Maximum
aws.es.master_old_gen_jvm_memory_pressure percentage The maximum percentage of the Java heap used by the "old generation" per master node. Relevant statistics: Maximum
aws.es.master_cpu_credit_balance count The remaining CPU credits available for T2 dedicated master nodes in the cluster. Relevant statistics: Minimum
aws.es.master_reachable_from_node integer A health check for MasterNotDiscovered exceptions. 1 represents normal behavior, 0 shows that /_cluster/health/ is failing. Relevant statistics: Minimum
aws.es.master_sys_memory_utilization percentage The percentage of the master node's memory in use. Relevant statistics: Maximum
aws.es.read_latency second The latency, in seconds, for read operations on EBS volumes. This metric is also available for individual nodes. Relevant statistics: Minimum, Maximum, Average
aws.es.write_latency second The latency, in seconds, for write operations on EBS volumes. This metric is also available for individual nodes. Relevant statistics: Minimum, Maximum, Average
aws.es.read_throughput byte The throughput, in bytes per second, for read operations on EBS volumes. This metric is also available for individual nodes. Relevant statistics: Minimum, Maximum, Average
aws.es.write_throughput byte The throughput, in bytes per second, for write operations on EBS volumes. This metric is also available for individual nodes. Relevant statistics: Minimum, Maximum, Average
aws.es.disk_queue_depth count The number of pending input and output (I/O) requests for an EBS volume. Relevant statistics: Minimum, Maximum, Average
aws.es.read_iops count The number of input and output (I/O) operations per second for read operations on EBS volumes. This metric is also available for individual nodes. Relevant statistics: Minimum, Maximum, Average
aws.es.write_iops count The number of input and output (I/O) operations per second for write operations on EBS volumes. This metric is also available for individual nodes. Relevant statistics: Minimum, Maximum, Average
aws.es.burst_balance percentage The percentage of input and output (I/O) credits remaining in the burst bucket for an EBS volume. Relevant statistics: Minimum, Maximum, Average
aws.es.indexing_latency millisecond The average time that it takes a shard to complete an indexing operation. Relevant node statistics: Average. Relevant cluster statistics: Average, Maximum
aws.es.indexing_rate count The number of indexing operations per minute. Relevant node statistics: Average. Relevant cluster statistics: Average, Maximum, Sum
aws.es.search_latency millisecond The average time that it takes a shard to complete a search operation. Relevant node statistics: Average. Relevant cluster statistics: Average, Maximum
aws.es.search_rate count The total number of search requests per minute for all shards on a data node. Relevant node statistics: Average. Relevant cluster statistics: Average, Maximum, Sum
aws.es.segment_count count The number of segments on a data node. Relevant node statistics: Maximum, Average. Relevant cluster statistics: Sum, Maximum, Average
aws.es.sys_memory_utilization percentage The percentage of the instance's memory that is in use. Relevant node statistics: Minimum, Maximum, Average. Relevant cluster statistics: Minimum, Maximum, Average
aws.es.jvmgc_young_collection_count count The number of times that "young generation" garbage collection has run. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.jvmgc_young_collection_time millisecond The amount of time spent on "young generation" garbage collection. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.jvmgc_old_collection_count count The number of times that "old generation" garbage collection has run. In a cluster with sufficient resources, this number should remain small and grow infrequently. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.jvmgc_old_collection_time millisecond The amount of time that the cluster has spent on "old generation" garbage collection. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.open_search_dashboards_concurrent_connections
(previously kibana_concurrent_connections)
count The number of active concurrent connections to OpenSearch Dashboards. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.open_search_dashboards_healthy_node
(previously kibana_healthy_node)
integer A health check for the individual OpenSearch Dashboards node. 1 means normal behavior; 0 means Dashboards is inaccessible. Relevant node statistics: Minimum. Relevant cluster statistics: Minimum, Maximum, Average
aws.es.open_search_dashboards_heap_total
(previously kibana_heap_total)
mebibyte The amount of heap memory allocated to OpenSearch Dashboards. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.open_search_dashboards_heap_used
(previously kibana_heap_used)
mebibyte The absolute amount of heap memory used by OpenSearch Dashboards. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.open_search_dashboards_heap_utilization
(previously kibana_heap_utilization)
percentage The maximum percentage of available heap memory used by OpenSearch Dashboards. Relevant node statistics: Maximum. Relevant cluster statistics: Minimum, Maximum, Average
aws.es.open_search_dashboards_os_1_minute_load
(previously kibana_os_1_minute_load)
count The one-minute CPU load average for OpenSearch Dashboards. Ideally, this value should stay below 1.00. Relevant node statistics: Average. Relevant cluster statistics: Average, Maximum
aws.es.open_search_dashboards_request_total
(previously kibana_request_total)
count The total number of HTTP calls to OpenSearch Dashboards. Relevant node statistics: Sum. Relevant cluster statistics: Sum
aws.es.open_search_dashboards_response_times_max_in_millis
(previously kibana_response_times_max_in_millis)
millisecond The maximum OpenSearch Dashboards response time. Relevant node statistics: Maximum. Relevant cluster statistics: Maximum, Average
aws.es.threadpool_force_merge_queue count The number of tasks that have been queued in the force merge thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.threadpool_force_merge_rejected count The number of tasks that have been rejected in the force merge thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum
aws.es.threadpool_force_merge_threads count The number of items in the force merge thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.threadpool_index_queue count The number of tasks that have been queued in the index thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.threadpool_index_rejected count The number of tasks that have been rejected in the index thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum
aws.es.threadpool_index_threads count The number of items in the index thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.threadpool_search_queue count The number of tasks that have been queued in the search thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.threadpool_search_rejected count The number of tasks that have been rejected in the search thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum
aws.es.threadpool_search_threads count The number of items in the search thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.threadpoolsql_worker_queue count The number of tasks that have been queued in the SQL search thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.threadpoolsql_worker_rejected count The number of tasks that have been rejected in the SQL search thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum
aws.es.threadpoolsql_worker_threads count The number of items in the SQL search thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.threadpool_bulk_queue count The number of tasks that have been queued in the bulk thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.threadpool_bulk_rejected count The number of tasks that have been rejected in the bulk thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum
aws.es.threadpool_bulk_threads count The number of items in the bulk thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.threadpool_write_threads count The number of items in the write thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.threadpool_write_queue count The number of queued tasks in the write thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.threadpool_write_rejected count The number of rejected tasks in the write thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.coordinating_write_rejected count The total number of rejections that have occurred on the coordinating node due to indexing pressure since the last OpenSearch Service process startup. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.primary_write_rejected count The total number of rejections that have occurred on the primary shards due to indexing pressure since the last OpenSearch Service process startup. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.replica_write_rejected count The total number of rejections that have occurred on the replica shards due to indexing pressure since the last OpenSearch Service process startup. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.warm_cpu_utilization percentage The percentage of CPU usage for UltraWarm nodes in the cluster. Maximum shows the node with the highest CPU usage. Average represents all UltraWarm nodes in the cluster. This metric is also available for individual UltraWarm nodes. Relevant statistics: Maximum, Average
aws.es.warm_free_storage_space mebibyte The amount of free warm storage space. Because UltraWarm uses Amazon S3 rather than attached disks, Sum is the only relevant statistic. You must leave the period at one minute to get an accurate value. Relevant statistics: Sum
aws.es.warm_searchable_documents count The total number of searchable documents across all warm indexes in the cluster. You must leave the period at one minute to get an accurate value. Relevant statistics: Sum
aws.es.warm_search_latency millisecond The average time that it takes a shard on an UltraWarm node to complete a search operation. Relevant node statistics: Average. Relevant cluster statistics: Average, Maximum
aws.es.warm_search_rate count The total number of search calls per minute on an UltraWarm node for all shards. A single call to the _search API might return results from many different shards. If five of these shards are on one node, the node would report 5 for this metric, even though the client made only one request. Relevant node statistics: Average. Relevant cluster statistics: Average, Maximum, Sum
aws.es.warm_storage_space_utilization mebibyte The total amount of warm storage space that the cluster is using. Relevant statistics: Maximum
aws.es.hot_storage_space_utilization mebibyte The total amount of hot storage space that the cluster is using. Relevant statistics: Maximum
aws.es.warm_sys_memory_utilization percentage The percentage of the warm node's memory that is in use. Relevant statistics: Maximum
aws.es.hot_to_warm_migration_queue_size count The number of indexes currently waiting to migrate from hot to warm storage. Relevant statistics: Maximum
aws.es.warm_to_hot_migration_queue_size count The number of indexes currently waiting to migrate from warm to hot storage. Relevant statistics: Maximum
aws.es.hot_to_warm_migration_failure_count count The total number of failed hot to warm migrations. Relevant statistics: Sum
aws.es.hot_to_warm_migration_force_merge_latency second The average latency of the force merge stage of the migration process. Relevant statistics: Average
aws.es.hot_to_warm_migration_snapshot_latency second The average latency of the snapshot stage of the migration process. Relevant statistics: Average
aws.es.hot_to_warm_migration_processing_latency second The average latency of successful hot to warm migrations, not including time spent in the queue. Relevant statistics: Average
aws.es.hot_to_warm_migration_success_count count The total number of successful hot to warm migrations. Relevant statistics: Sum
aws.es.hot_to_warm_migration_success_latency second The average latency of successful hot to warm migrations, including time spent in the queue. Relevant statistics: Average
aws.es.warm_threadpool_search_threads count The number of items in the UltraWarm search thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Average, Sum
aws.es.warm_threadpool_search_rejected count The number of rejected tasks in the UltraWarm search thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum
aws.es.warm_threadpool_search_queue count The number of queued tasks in the UltraWarm search thread pool. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.warm_jvm_memory_pressure percentage The maximum percentage of the Java heap used for the UltraWarm nodes. Relevant statistics: Maximum
aws.es.warm_old_gen_jvm_memory_pressure percentage The maximum percentage of the Java heap used for the "old generation" per UltraWarm node. Relevant statistics: Maximum
aws.es.warm_jvmgc_young_collection_count count The number of times that "young generation" garbage collection has run on UltraWarm nodes. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.warm_jvmgc_young_collection_time millisecond The amount of time that the cluster has spent performing "young generation" garbage collection on UltraWarm nodes. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.warm_jvmgc_old_collection_count count The number of times that "old generation" garbage collection has run on UltraWarm nodes. Relevant node statistics: Maximum. Relevant cluster statistics: Sum, Maximum, Average
aws.es.cold_storage_space_utilization mebibyte The total amount of cold storage space that the cluster is using. Relevant statistics: Maximum
aws.es.cold_to_warm_migration_failure_count count The total number of failed cold to warm migrations. Relevant statistics: Sum
aws.es.cold_to_warm_migration_latency second The amount of time for successful cold to warm migrations to complete. Relevant statistics: Average
aws.es.cold_to_warm_migration_queue_size count The number of indexes currently waiting to migrate from cold to warm storage. Relevant statistics: Maximum
aws.es.cold_to_warm_migration_success_count count The total number of successful cold to warm migrations. Relevant statistics: Sum
aws.es.warm_to_cold_migration_failure_count count The total number of failed warm to cold migrations. Relevant statistics: Sum
aws.es.warm_to_cold_migration_latency second The amount of time for successful warm to cold migrations to complete. Relevant statistics: Average
aws.es.warm_to_cold_migration_queue_size count The number of indexes currently waiting to migrate from warm to cold storage. Relevant statistics: Maximum
aws.es.warm_to_cold_migration_success_count count The total number of successful warm to cold migrations. Relevant statistics: Sum
aws.es.alerting_degraded integer 1 indicates that either the alerting index is red or one or more nodes are not on schedule; 0 indicates normal behavior. Relevant statistics: Maximum
aws.es.alerting_index_exists integer 1 shows the .opensearch-alerting-config index exists, 0 indicates it does not. Until you use the alerting feature for the first time, this value remains 0. Relevant statistics: Maximum
aws.es.alerting_index_status_green integer The health of the index. 1 means green, 0 shows that the index either doesn't exist or isn't green. Relevant statistics: Maximum
aws.es.alerting_index_status_red integer The health of the index. 1 means red, 0 indicates that the index either doesn't exist or isn't red. Relevant statistics: Maximum
aws.es.alerting_index_status_yellow integer The health of the index. 1 means yellow, 0 indicates that the index either doesn't exist or isn't yellow. Relevant statistics: Maximum
aws.es.alerting_nodes_not_on_schedule integer 1 means some jobs are not running on schedule, 0 means that all alerting jobs are running on schedule or no alerting jobs exist. Relevant statistics: Maximum
aws.es.alerting_nodes_on_schedule integer 1 means that all alerting jobs are running on schedule or that no alerting jobs exist, 0 means some jobs are not running on schedule. Relevant statistics: Maximum
aws.es.alerting_scheduled_job_enabled integer 1 means that the opensearch.scheduled_jobs.enabled cluster setting is true, 0 means it is false, and scheduled jobs are disabled. Relevant statistics: Maximum
aws.es.ad_plugin_unhealthy integer 1 means that the anomaly detection plugin is not functioning properly, either because of a high number of failures or because one of the indexes that it uses is red, 0 indicates the plugin is working as expected. Relevant statistics: Maximum
aws.es.ad_execute_request_count count The number of requests to detect anomalies. Relevant statistics: Sum
aws.es.ad_execute_failure_count count The number of failed requests to detect anomalies. Relevant statistics: Sum
aws.es.adhc_execute_failure_count count The number of failed requests to detect anomalies for high cardinality detectors. Relevant statistics: Sum
aws.es.adhc_execute_request_count count The number of requests to detect anomalies for high cardinality detectors. Relevant statistics: Sum
aws.es.ad_anomaly_results_index_status_index_exists integer 1 means the index that the .opensearch-anomaly-results alias points to exists, 0 means it does not. Until you use anomaly detection, this value remains 0. Relevant statistics: Maximum
aws.es.ad_anomaly_results_index_status_red integer 1 means the index that the .opensearch-anomaly-results alias points to is red, 0 means it is not. Until you use anomaly detection, this value remains 0. Relevant statistics: Maximum
aws.es.ad_anomaly_detectors_index_status_index_exists integer 1 means that the .opensearch-anomaly-detectors index exists, 0 means it does not. Until you use anomaly detection, this value remains 0. Relevant statistics: Maximum
aws.es.ad_anomaly_detectors_index_status_red integer 1 means that the .opensearch-anomaly-detectors index is red, 0 means it is not. Until you use anomaly detection, this value remains 0. Relevant statistics: Maximum
aws.es.ad_models_checkpoint_index_status_index_exists integer 1 means that the .opensearch-anomaly-checkpoints index exists, 0 means it does not. Until you use anomaly detection, this value remains 0. Relevant statistics: Maximum
aws.es.ad_models_checkpoint_index_status_red integer 1 means that the .opensearch-anomaly-checkpoints index is red, 0 means it is not. Until you use anomaly detection, this value remains 0. Relevant statistics: Maximum
aws.es.asynchronous_search_submission_rate count The number of asynchronous searches submitted in the last minute.
aws.es.asynchronous_search_initialized_rate count The number of asynchronous searches initialized in the last minute.
aws.es.asynchronous_search_running_current count The number of asynchronous searches currently running.
aws.es.asynchronous_search_completion_rate count The number of asynchronous searches successfully completed in the last minute.
aws.es.asynchronous_search_failure_rate count The number of asynchronous searches that completed and failed in the last minute.
aws.es.asynchronous_search_persist_rate count The number of asynchronous searches that persisted in the last minute.
aws.es.asynchronous_search_persist_failed_rate count The number of asynchronous searches that failed to persist in the last minute.
aws.es.asynchronous_search_rejected count The total number of asynchronous searches rejected since the node started.
aws.es.asynchronous_search_cancelled count The total number of asynchronous searches cancelled since the node started.
aws.es.asynchronous_search_max_running_time second The duration of the longest-running asynchronous search on a node in the last minute.
aws.es.asynchronous_search_store_health count The health of the store in the persisted index (RED/non-RED) in the last minute.
aws.es.asynchronous_search_store_size count The size of the system index across all shards in the last minute.
aws.es.asynchronous_search_stored_response_count count The number of stored responses in the system index in the last minute.
aws.es.sql_failed_request_count_by_cus_err count The number of requests to the _sql API that failed due to a client issue. Relevant statistics: Sum
aws.es.sql_failed_request_count_by_sys_err count The number of requests to the _sql API that failed due to a server problem or feature limitation. Relevant statistics: Sum
aws.es.sql_request_count count The number of requests to the _sql API. Relevant statistics: Sum
aws.es.sql_default_cursor_request_count count Similar to aws.es.sql_request_count, but counts only pagination requests. Relevant statistics: Sum
aws.es.sql_unhealthy integer 1 means that, in response to certain requests, the SQL plugin is returning 5xx response codes or passing invalid query DSL to OpenSearch, 0 means no recent failures. Relevant statistics: Maximum
aws.es.knn_cache_capacity_reached count Per-node metric for whether cache capacity has been reached. This metric is only relevant to approximate k-NN search. Relevant statistics: Maximum
aws.es.knn_circuit_breaker_triggered count Per-cluster metric for whether the circuit breaker is triggered. If any node returns a value of 1 for aws.es.knn_cache_capacity_reached, this value also returns 1. This metric is only relevant to approximate k-NN search. Relevant statistics: Maximum
aws.es.knn_eviction_count count Per-node metric for the number of graphs that have been evicted from the cache due to memory constraints or idle time. Explicit evictions that occur because of index deletion are not counted. This metric is only relevant to approximate k-NN search. Relevant statistics: Sum
aws.es.knn_graph_index_errors count Per-node metric for the number of requests to add the knn_vector field of a document to a graph that produced an error. Relevant statistics: Sum
aws.es.knn_graph_index_requests count Per-node metric for the number of requests to add the knn_vector field of a document to a graph. Relevant statistics: Sum
aws.es.knn_graph_memory_usage kilobyte Per-node metric for the current cache size, total size of all graphs in memory. This metric is only relevant to approximate k-NN search. Relevant statistics: Average
aws.es.knn_graph_query_errors count Per-node metric for the number of graph queries that produced an error. Relevant statistics: Sum
aws.es.knn_graph_query_requests count Per-node metric for the number of graph queries. Relevant statistics: Sum
aws.es.knn_hit_count count Per-node metric for the number of cache hits. A cache hit occurs when a user queries a graph that is already loaded into memory. This metric is only relevant to approximate k-NN search. Relevant statistics: Sum
aws.es.knn_load_exception_count count Per-node metric for the number of times an exception occurred while trying to load a graph into the cache. This metric is only relevant to approximate k-NN search. Relevant statistics: Sum
aws.es.knn_load_success_count count Per-node metric for the number of times the plugin successfully loaded a graph into the cache. This metric is only relevant to approximate k-NN search. Relevant statistics: Sum
aws.es.knn_miss_count count Per-node metric for the number of cache misses. A cache miss occurs when a user queries a graph that is not yet loaded into memory. This metric is only relevant to approximate k-NN search. Relevant statistics: Sum
aws.es.knn_query_requests count Per-node metric for the number of query requests the k-NN plugin received. Relevant statistics: Sum
aws.es.knn_script_compilation_errors count Per-node metric for the number of errors during script compilation. This statistic is only relevant to k-NN score script search. Relevant statistics: Sum
aws.es.knn_script_compilations count Per-node metric for the number of times the k-NN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the k-NN script might be recompiled. This statistic is only relevant to k-NN score script search. Relevant statistics: Sum
aws.es.knn_script_query_errors count Per-node metric for the number of errors during script queries. This statistic is only relevant to k-NN score script search. Relevant statistics: Sum
aws.es.knn_script_query_requests count Per-node metric for the total number of script queries. This statistic is only relevant to k-NN score script search. Relevant statistics: Sum
aws.es.knn_total_load_time nanosecond The time that k-NN has taken to load graphs into the cache. This metric is only relevant to approximate k-NN search. Relevant statistics: Sum
aws.es.cross_cluster_outbound_connections count Number of connected nodes. If your response includes one or more skipped domains, use this metric to trace any unhealthy connections. If this number drops to 0, then the connection is unhealthy.
aws.es.cross_cluster_outbound_requests count Number of search requests sent to the destination domain.
aws.es.cross_cluster_inbound_requests count Number of incoming connection requests received from the source domain.
aws.es.replication_rate count The average rate of replication operations per second. This metric is similar to the aws.es.indexing_rate metric.
aws.es.leader_check_point count For a specific connection, the sum of leader checkpoint values across all replicating indexes.
aws.es.follower_check_point count For a specific connection, the sum of follower checkpoint values across all replicating indexes.
aws.es.replication_num_syncing_indices count The number of indexes that have a replication status of SYNCING.
aws.es.replication_num_bootstrapping_indices count The number of indexes that have a replication status of BOOTSTRAPPING.
aws.es.replication_num_paused_indices count The number of indexes that have a replication status of PAUSED.
aws.es.replication_num_failed_indices count The number of indexes that have a replication status of FAILED.
aws.es.auto_follow_num_success_start_replication count The number of follower indexes that have been successfully created by a replication rule for a specific connection.
aws.es.auto_follow_num_failed_start_replication count The number of follower indexes that failed to be created by a replication rule when there was a matching pattern.
aws.es.auto_follow_leader_call_failure integer Whether there have been any failed queries from the follower index to the leader index to pull new data. 1 means that one or more calls failed in the last minute.
aws.es.ltr_request_total_count count Total count of ranking requests.
aws.es.ltr_request_error_count count Total count of unsuccessful requests.
aws.es.ltr_status_red count Tracks if one of the indexes needed to run the plugin is red.
aws.es.ltr_memory_usage count Total memory used by the plugin.
aws.es.ltr_feature_memory_usage_in_bytes byte The amount of memory used by Learning to Rank feature fields.
aws.es.ltr_featureset_memory_usage_in_bytes byte The amount of memory used by all Learning to Rank feature sets.
aws.es.ltr_model_memory_usage_in_bytes byte The amount of memory used by all Learning to Rank models.
aws.es.ppl_failed_request_count_by_cus_err count The number of requests to the _ppl API that failed due to a client issue.
aws.es.ppl_failed_request_count_by_sys_err count The number of requests to the _ppl API that failed due to a server problem or feature limitation.
aws.es.ppl_request_count count The number of requests to the _ppl API.
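Several rows in the table describe how metrics relate to each other: the three cluster status flags, the 5xx response count versus aws.es.open_search_requests, aws.es.knn_hit_count versus aws.es.knn_miss_count, aws.es.leader_check_point versus aws.es.follower_check_point, and hot-to-warm migration latency with and without queue time. The following minimal Python sketch shows those relationships, assuming you have already retrieved each value for the same period using the statistic listed as relevant. The function names are hypothetical helpers for illustration, not part of Cloud Observability or any AWS SDK, and the derived values are not metrics that the integration reports.

```python
"""Hypothetical helpers that derive health signals from aws.es.* datapoints.

Assumption: every argument is a value already fetched for the same period,
using the statistic the table lists as relevant. Fetching is out of scope.
"""
from typing import Optional


def cluster_status(green: float, yellow: float, red: float) -> str:
    """Collapse the three aws.es.cluster_status_* flags (Maximum) into one label."""
    # Check the most severe state first: if the period saw red at any point,
    # Maximum of cluster_status_red is 1 even if green was also 1 later.
    if red >= 1:
        return "red"
    if yellow >= 1:
        return "yellow"
    if green >= 1:
        return "green"
    return "unknown"  # no flag set, for example no datapoint in the period


def safe_ratio(part: float, total: float) -> Optional[float]:
    """Return part / total, or None when there is nothing to divide by."""
    return None if total <= 0 else part / total


def server_error_rate(count_5xx: float, total_requests: float) -> Optional[float]:
    """Share of requests answered with 5xx (both values as Sum over one period)."""
    return safe_ratio(count_5xx, total_requests)


def knn_cache_hit_ratio(hits: float, misses: float) -> Optional[float]:
    """knn_hit_count / (knn_hit_count + knn_miss_count), both as Sum."""
    return safe_ratio(hits, hits + misses)


def replication_backlog(leader_checkpoint: float, follower_checkpoint: float) -> float:
    """Gap between leader and follower checkpoints for one connection."""
    # Clamp at zero: a negative gap can only come from datapoints sampled
    # at slightly different times, not from the follower being ahead.
    return max(0.0, leader_checkpoint - follower_checkpoint)


def hot_to_warm_queue_seconds(success_latency: float, processing_latency: float) -> float:
    """Approximate average queue wait for successful hot-to-warm migrations.

    hot_to_warm_migration_success_latency includes queue time and
    hot_to_warm_migration_processing_latency excludes it, so the difference
    of the two averages is the average time spent waiting in the queue.
    """
    return max(0.0, success_latency - processing_latency)
```

For example, `cluster_status(green=0, yellow=1, red=0)` returns `"yellow"`, and `server_error_rate(3, 600)` returns 0.005 (0.5% of requests). Returning `None` for an empty denominator keeps "no traffic" distinguishable from "no errors" when you build alerts on these ratios.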

See also

Ingest metrics from Amazon

Create and manage dashboards

Create alerts

Updated Jan 13, 2023