Once you’ve integrated with AWS CloudWatch, you have access to metrics from Amazon Managed Streaming for Apache Kafka (MSK), a fully managed service that makes it simple to build and run applications that use Apache Kafka to process streaming data.
See all AWS integrations.
The following table shows the MSK metrics ingested by Cloud Observability.
Metric Name | Unit | Description |
---|---|---|
aws.msk.active_controller_count | count | The number of controllers active in the cluster at any given moment. |
aws.msk.burst_balance | count | The remaining balance of input-output burst credits for EBS volumes in the cluster. |
aws.msk.bytes_in_per_sec | bytes/second | The rate at which clients send data. |
aws.msk.bytes_out_per_sec | bytes/second | The rate at which data is sent to clients. |
aws.msk.client_connection_count | count | The number of active authenticated client connections. |
aws.msk.connection_count | count | The number of active inter-broker connections, unauthenticated connections, and authenticated connections. |
aws.msk.cpu_credit_balance | count | The balance of earned CPU credits on the brokers. |
aws.msk.cpu_idle | percent | The percentage of CPU idle time. |
aws.msk.cpu_io_wait | percent | The percentage of CPU idle time during a pending disk operation. |
aws.msk.cpu_system | percent | The percentage of CPU time spent in kernel space. |
aws.msk.cpu_user | percent | The percentage of CPU time spent in user space. |
aws.msk.global_partition_count | count | The total number of partitions across all topics in the cluster, excluding replicas. |
aws.msk.global_topic_count | count | The total number of topics across all brokers in the cluster. |
aws.msk.estimated_max_time_lag | seconds | The estimated time, in seconds, needed to drain MaxOffsetLag. |
aws.msk.kafka_app_logs_disk_used | percent | The percentage of disk space used for application logs. |
aws.msk.kafka_data_logs_disk_used | percent | The percentage of disk space used for data logs. |
aws.msk.leader_count | count | The total number of partition leaders per broker, excluding replicas. |
aws.msk.max_offset_lag | count | The maximum offset lag across all partitions in a topic. |
aws.msk.memory_buffered | bytes | The size of the broker's buffered memory, in bytes. |
aws.msk.memory_cached | bytes | The size of the broker's cached memory, in bytes. |
aws.msk.memory_free | bytes | The amount of memory that is free and available to the broker, in bytes. |
aws.msk.heap_memory_after_gc | percent | The percentage of the heap's total memory that is still in use after garbage collection. |
aws.msk.memory_used | bytes | The amount of memory that the broker is currently using. |
aws.msk.messages_in_per_sec | count/second | The number of incoming messages per second for the broker. |
aws.msk.network_rx_dropped | count | The number of receive packets that were dropped. |
aws.msk.network_rx_errors | count | The number of network receive errors. |
aws.msk.network_rx_packets | count | The number of packets the broker has received. |
aws.msk.network_tx_dropped | count | The number of transmit packets that were dropped. |
aws.msk.network_tx_errors | count | The total number of network transmit errors. |
aws.msk.network_tx_packets | count | The number of packets the broker sent out. |
aws.msk.offline_partitions_count | count | The total number of offline partitions in the cluster. |
aws.msk.partition_count | count | The total number of topic partitions for each broker, including replicas. |
aws.msk.produce_total_time_ms_mean | milliseconds | The mean produce time, in milliseconds. |
aws.msk.request_bytes_mean | bytes | The average number of request bytes. |
aws.msk.request_time | milliseconds | The average amount of time that the broker's network and I/O threads spend processing requests. |
aws.msk.root_disk_used | percent | The proportion of the broker's root disk that is being used. |
aws.msk.sum_offset_lag | count | The total offset lag for all partitions in a topic. |
aws.msk.swap_free | bytes | The amount of swap memory that is free to the broker, measured in bytes. |
aws.msk.swap_used | bytes | The amount of swap memory that the broker is currently using, measured in bytes. |
aws.msk.traffic_shaping | count | The number of packets that were shaped (dropped or queued). |
aws.msk.under_min_isr_partition_count | count | The number of under-minIsr partitions for the broker. |
aws.msk.under_replicated_partitions | count | The number of under-replicated partitions for the broker. |
aws.msk.zoo_keeper_request_latency_ms_mean | milliseconds | The average response time for requests sent to Apache ZooKeeper from a broker. |
aws.msk.zoo_keeper_session_state | float | The connection status of the broker's ZooKeeper session. |
aws.msk.bw_in_allowance_exceeded | packets | The number of packets shaped because the inbound aggregate bandwidth exceeded the maximum for the broker. |
aws.msk.bw_out_allowance_exceeded | packets | The number of packets shaped because the outbound aggregate bandwidth exceeded the maximum for the broker. |
aws.msk.conn_track_allowance_exceeded | count | The number of packets shaped because connection tracking exceeded the maximum for the broker. |
aws.msk.connection_close_rate | count | The number of connections closed per second, per listener. |
aws.msk.connection_creation_rate | count | The number of new connections established per second, per listener. |
aws.msk.cpu_credit_usage | count | The number of CPU credits spent by the instances. |
aws.msk.fetch_consumer_local_time_ms_mean | milliseconds | The average amount of time needed for the leader to process a consumer request. |
aws.msk.fetch_consumer_request_queue_time_ms_mean | milliseconds | The average amount of time that a consumer request spends in the request queue. |
aws.msk.fetch_consumer_response_queue_time_ms_mean | milliseconds | The average amount of time that a consumer request spends in the response queue. |
aws.msk.fetch_consumer_response_send_time_ms_mean | milliseconds | The average amount of time spent sending the response to a consumer request. |
aws.msk.fetch_consumer_total_time_ms_mean | milliseconds | The average total time that consumers spend fetching data from the broker. |
aws.msk.fetch_follower_local_time_ms_mean | milliseconds | The average amount of time it takes the leader to complete a follower request. |
aws.msk.fetch_follower_request_queue_time_ms_mean | milliseconds | The average amount of time that a follower request spends in the request queue. |
aws.msk.fetch_follower_response_queue_time_ms_mean | milliseconds | The average amount of time that a follower request spends in the response queue. |
aws.msk.fetch_follower_response_send_time_ms_mean | milliseconds | The average amount of time spent sending the response to a follower request. |
aws.msk.fetch_follower_total_time_ms_mean | milliseconds | The average total time that followers spend fetching data from the broker. |
aws.msk.fetch_message_conversions_per_sec | count/second | The number of fetch message conversions per second for the broker. |
aws.msk.fetch_throttle_byte_rate | bytes/second | The number of bytes per second that were throttled. |
aws.msk.fetch_throttle_queue_size | count | The number of messages currently in the throttling queue. |
aws.msk.fetch_throttle_time | milliseconds | The average fetch throttle time. |
aws.msk.network_processor_avg_idle_percent | percent | The percentage of time when the network processors are not in use. |
aws.msk.pps_allowance_exceeded | count | The number of packets shaped because the bidirectional packets per second (PPS) exceeded the maximum for the broker. |
aws.msk.produce_local_time_ms_mean | milliseconds | The average amount of time it takes the leader to process the request. |
aws.msk.produce_message_conversions_per_sec | count/second | The number of produce message conversions per second for the broker. |
aws.msk.produce_message_conversions_time_ms_mean | milliseconds | The average amount of time spent on message format conversions. |
aws.msk.produce_request_queue_time_ms_mean | milliseconds | The average amount of time that request messages spend in the queue. |
aws.msk.produce_response_queue_time_ms_mean | milliseconds | The average amount of time that response messages spend in the queue. |
aws.msk.produce_response_send_time_ms_mean | milliseconds | The average amount of time spent sending response messages. |
aws.msk.produce_throttle_byte_rate | bytes/second | The number of bytes per second that were throttled. |
aws.msk.produce_throttle_queue_size | count | The number of messages currently in the throttling queue. |
aws.msk.produce_throttle_time | milliseconds | The average produce throttle time. |
aws.msk.produce_total_time_ms_mean | milliseconds | The mean produce time, in milliseconds. |
aws.msk.remote_bytes_in_per_sec | bytes | The total number of bytes transferred to tiered storage, including data from log segments, indexes, and other auxiliary files. |
aws.msk.remote_bytes_out_per_sec | bytes | The total number of bytes transferred from tiered storage in response to consumer fetches. |
aws.msk.remote_log_manager_tasks_avg_idle_percent | percent | The average percentage of time that the remote log manager spent idle. |
aws.msk.remote_log_reader_avg_idle_percent | percent | The average percentage of time that the remote log reader spent idle. |
aws.msk.remote_log_reader_task_queue_size | count | The number of tasks responsible for reading data from tiered storage that are waiting to be scheduled. |
aws.msk.remote_read_error_per_sec | count | The total rate of errors in response to read requests that the specified broker sent to tiered storage to retrieve data in response to consumer fetches. |
aws.msk.remote_read_requests_per_sec | count | The total number of read requests that the specified broker sent to tiered storage in response to consumer fetches. |
aws.msk.remote_write_error_per_sec | count | The total rate of errors in response to write requests that the specified broker sent to tiered storage to transfer data upstream. |
aws.msk.replication_bytes_in_per_sec | bytes/second | The number of bytes per second received from other brokers. |
aws.msk.replication_bytes_out_per_sec | bytes/second | The number of bytes per second sent to other brokers. |
aws.msk.request_exempt_from_throttle_time | milliseconds | The average amount of time that the broker's network and I/O threads spend processing requests that are exempt from throttling. |
aws.msk.request_handler_avg_idle_percent | percent | The percentage of time that the request handler threads are not in use. |
aws.msk.request_throttle_queue_size | count | The number of messages currently in the throttling queue. |
aws.msk.request_throttle_time | milliseconds | The average request throttle time. |
aws.msk.tcp_connections | count | The number of TCP segments with the SYN flag set for both incoming and outgoing traffic. |
aws.msk.total_tier_bytes_lag | bytes | The total amount of data that is eligible for tiering on the broker but hasn't yet been moved to tiered storage. |
aws.msk.traffic_bytes | bytes | The total amount of network traffic between clients (producers and consumers) and brokers in bytes. |
aws.msk.volume_queue_length | count | The number of read and write operation requests waiting to be completed in a specified period of time. |
aws.msk.volume_read_bytes | bytes | The number of bytes read in a specified period of time. |
aws.msk.volume_read_ops | count | The number of read operations in a specified period of time. |
aws.msk.volume_total_read_time | seconds | The total number of seconds spent by all read operations that completed in a specified period of time. |
aws.msk.volume_total_write_time | seconds | The total number of seconds spent by all write operations that completed in a specified period of time. |
aws.msk.volume_write_bytes | bytes | The number of bytes written in a specified period of time. |
aws.msk.volume_write_ops | count | The number of write operations in a specified period of time. |
aws.msk.fetch_message_conversions_per_sec | count/second | The number of fetched messages converted per second. |
aws.msk.messages_in_per_sec | count/second | The number of messages received per second. |
aws.msk.produce_message_conversions_per_sec | count/second | The number of conversions per second for produced messages. |
aws.msk.remote_bytes_in_per_sec | bytes | The number of bytes transferred to tiered storage for the specified topic and broker. |
aws.msk.remote_bytes_out_per_sec | bytes | The number of bytes transferred from tiered storage in response to consumer fetches for the specified topic and broker. |
aws.msk.remote_read_error_per_sec | count | The frequency of errors in response to read requests. |
aws.msk.remote_read_requests_per_sec | count | The number of read requests sent to tiered storage in response to consumer fetches. |
aws.msk.remote_write_error_per_sec | count | The frequency of errors in response to write requests. |
aws.msk.estimated_time_lag | seconds | The estimated time, in seconds, needed to drain the partition offset lag. |
aws.msk.offset_lag | count | The partition-level consumer lag, in number of offsets. |
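
The metric names in the table appear to be the CloudWatch metric names (such as `MaxOffsetLag`, which the table itself mentions) converted to snake case and prefixed with `aws.msk.`. The following Python sketch illustrates that apparent convention. The helper `cloudwatch_to_metric_name` is hypothetical and is not part of any Cloud Observability or AWS API, and the mapping rule is an assumption inferred from the names above rather than documented behavior.

```python
import re

# Zero-width positions where a new word starts in a CamelCase name:
#   1. a lowercase letter or digit followed by an uppercase letter ("BytesIn")
#   2. the last capital of an acronym followed by a capitalized word ("CPUCredit")
_WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def cloudwatch_to_metric_name(cloudwatch_name: str, prefix: str = "aws.msk") -> str:
    """Convert a CloudWatch-style name to the dotted snake_case form.

    Assumption: names follow the pattern seen in the table; this is a
    hypothetical helper, not an official mapping.
    """
    snake = _WORD_BOUNDARY.sub("_", cloudwatch_name).lower()
    return f"{prefix}.{snake}"


if __name__ == "__main__":
    for name in ("MaxOffsetLag", "BytesInPerSec", "ZooKeeperRequestLatencyMsMean"):
        print(name, "->", cloudwatch_to_metric_name(name))
```

A helper like this can be handy when translating an existing CloudWatch alarm or dashboard query into the names listed in the table, but any generated name should be checked against the table before use.
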
Updated Dec 7, 2022