AWS MSK metrics

Once you’ve integrated with Amazon CloudWatch, you have access to metrics from Amazon Managed Streaming for Apache Kafka (MSK), a fully managed service that makes it simple to build and run applications that process streaming data using Apache Kafka.

See all AWS integrations.

The following table shows the MSK metrics ingested by Cloud Observability.

Metric Name Unit Description
aws.msk.active_controller_count count The number of active controllers in the cluster. Only one controller should be active at any given time.
aws.msk.burst_balance count The remaining balance of input-output burst credits for EBS volumes in the cluster.
aws.msk.bytes_in_per_sec bytes/second The rate at which clients send data.
aws.msk.bytes_out_per_sec bytes/second The rate at which data is sent to clients.
aws.msk.client_connection_count count The number of active authenticated client connections.
aws.msk.connection_count count The number of active inter-broker connections, unauthenticated connections, and authenticated connections.
aws.msk.cpu_credit_balance count The CPU credit balance on the brokers.
aws.msk.cpu_idle percent The proportion of idle CPU time.
aws.msk.cpu_io_wait percent The proportion of CPU idle time during a pending disk operation.
aws.msk.cpu_system count The CPU time spent in kernel space.
aws.msk.cpu_user count The CPU time spent in user space.
aws.msk.global_partition_count count The total number of partitions across all topics in the cluster, excluding replicas.
aws.msk.global_topic_count count The total number of topics across all brokers in the cluster.
aws.msk.estimated_max_time_lag milliseconds The estimated time (in seconds) needed to drain MaxOffsetLag.
aws.msk.kafka_app_logs_disk_used percent The proportion of disk space used for application logs.
aws.msk.kafka_data_logs_disk_used percent The proportion of disk space used for data logs.
aws.msk.leader_count count The total number of partition leaders per broker, excluding replicas.
aws.msk.max_offset_lag count The maximum offset lag across all partitions in a topic.
aws.msk.memory_buffered bytes The size of the broker's buffered memory, in bytes.
aws.msk.memory_cached bytes The size of the broker's cached memory, in bytes.
aws.msk.memory_free bytes The amount of memory that is free and available to the broker, in bytes.
aws.msk.heap_memory_after_gc percent The percentage of the heap's total memory that is still in use after garbage collection.
aws.msk.memory_used bytes The amount of memory that the broker is currently using.
aws.msk.messages_in_per_sec count/second The number of incoming messages per second for the broker.
aws.msk.network_rx_dropped count The number of receive packets that were dropped.
aws.msk.network_rx_errors count The number of network receive errors.
aws.msk.network_rx_packets count The number of packets the broker has received.
aws.msk.network_tx_dropped count The number of transmit packets that were dropped.
aws.msk.network_tx_errors count The total number of network transmit errors.
aws.msk.network_tx_packets count The number of packets the broker sent out.
aws.msk.offline_partitions_count count The total number of offline partitions in the cluster.
aws.msk.partition_count count The total number of topic partitions per broker, including replicas.
aws.msk.produce_total_time_ms_mean milliseconds The mean produce time, in milliseconds.
aws.msk.request_bytes_mean bytes The average number of request bytes.
aws.msk.request_time milliseconds The amount of time that the broker's I/O and network threads spend processing requests.
aws.msk.root_disk_used percent The proportion of the broker's root disk that is being used.
aws.msk.sum_offset_lag count The total offset lag for all partitions in a topic.
aws.msk.swap_free bytes The amount of swap memory that is free to the broker, measured in bytes.
aws.msk.swap_used bytes The amount of swap memory that the broker is currently using, measured in bytes.
aws.msk.traffic_shaping count The number of packets that were shaped (dropped or queued).
aws.msk.under_min_isr_partition_count count The number of under-minIsr partitions for the broker.
aws.msk.under_replicated_partitions count The number of under-replicated partitions for the broker.
aws.msk.zoo_keeper_request_latency_ms_mean milliseconds The average response time for requests sent to Apache ZooKeeper from a broker.
aws.msk.zoo_keeper_session_state float The connection status of the broker's ZooKeeper session, encoded as a number.
aws.msk.bw_in_allowance_exceeded packets The number of packets shaped because the inbound aggregate bandwidth exceeded the broker's maximum.
aws.msk.bw_out_allowance_exceeded packets The number of packets shaped because the outbound aggregate bandwidth exceeded the broker's maximum.
aws.msk.conn_track_allowance_exceeded count The number of packets shaped because connection tracking exceeded the broker's maximum.
aws.msk.connection_close_rate count The number of connections closed per second, per listener.
aws.msk.connection_creation_rate count The number of new connections established per second, per listener.
aws.msk.cpu_credit_usage count The number of CPU credits used by the broker instances.
aws.msk.fetch_consumer_local_time_ms_mean milliseconds The average amount of time the leader takes to process a consumer request.
aws.msk.fetch_consumer_request_queue_time_ms_mean milliseconds The average amount of time that a consumer request spends in the request queue.
aws.msk.fetch_consumer_response_queue_time_ms_mean milliseconds The average amount of time that the consumer request sits in the response queue.
aws.msk.fetch_consumer_response_send_time_ms_mean milliseconds The average amount of time that it takes a consumer to send a response.
aws.msk.fetch_consumer_total_time_ms_mean milliseconds The average total time consumers spend fetching data from the broker.
aws.msk.fetch_follower_local_time_ms_mean milliseconds The average amount of time it takes the leader to complete a follower request.
aws.msk.fetch_follower_request_queue_time_ms_mean milliseconds The average amount of time a follower request spends in the request queue.
aws.msk.fetch_follower_response_queue_time_ms_mean milliseconds The average amount of time a follower request spends in the response queue.
aws.msk.fetch_follower_response_send_time_ms_mean milliseconds The average amount of time it takes a follower to send a response.
aws.msk.fetch_follower_total_time_ms_mean milliseconds The average total time followers spend fetching data from the broker.
aws.msk.fetch_message_conversions_per_sec count/second The number of fetch message conversions per second for the broker.
aws.msk.fetch_throttle_byte_rate bytes/second The number of bytes per second that were throttled.
aws.msk.fetch_throttle_queue_size count The number of messages currently in the throttling queue.
aws.msk.fetch_throttle_time milliseconds The average fetch throttle time.
aws.msk.network_processor_avg_idle_percent percent The percentage of time when the network processors are not in use.
aws.msk.pps_allowance_exceeded count The number of packets shaped because the bidirectional PPS (packets per second) exceeded the broker's maximum.
aws.msk.produce_local_time_ms_mean milliseconds The average amount of time the leader takes to process the request.
aws.msk.produce_message_conversions_per_sec count/second The number of produce message conversions per second for the broker.
aws.msk.produce_message_conversions_time_ms_mean milliseconds The average amount of time spent on message format conversions.
aws.msk.produce_request_queue_time_ms_mean milliseconds The average amount of time that request messages spend in the queue.
aws.msk.produce_response_queue_time_ms_mean milliseconds The average amount of time that response messages spend in the queue.
aws.msk.produce_response_send_time_ms_mean milliseconds The average amount of time spent sending response messages.
aws.msk.produce_throttle_byte_rate bytes The number of bytes per second that were throttled.
aws.msk.produce_throttle_queue_size count The number of messages currently in the throttling queue.
aws.msk.produce_throttle_time milliseconds The average produce throttle time.
aws.msk.produce_total_time_ms_mean milliseconds The average produce time.
aws.msk.remote_bytes_in_per_sec bytes The total amount of bytes moved to tiered storage, including information from log segments, indexes, and other auxiliary files.
aws.msk.remote_bytes_out_per_sec bytes The total amount of data that was moved from tiered storage in response to consumer fetches.
aws.msk.remote_log_manager_tasks_avg_idle_percent percent The proportion of time the remote log manager spent idle.
aws.msk.remote_log_reader_avg_idle_percent percent The proportion of time the remote log reader spent idle.
aws.msk.remote_log_reader_task_queue_size count The number of tasks responsible for reading data from tiered storage that are waiting to be scheduled.
aws.msk.remote_read_error_per_sec count The overall error rate for read requests made by the given broker to tiered storage to obtain data in response to consumer fetches.
aws.msk.remote_read_requests_per_sec count The total number of read requests that the specified broker sends to tiered storage in response to consumer fetches.
aws.msk.remote_write_error_per_sec percent The overall rate of failed write requests that the specified broker sends to tiered storage to move data upstream.
aws.msk.replication_bytes_in_per_sec bytes The rate of data transmission from other brokers, expressed in bytes per second.
aws.msk.replication_bytes_out_per_sec bytes The rate of data sent to other brokers, expressed in bytes per second.
aws.msk.request_exempt_from_throttle_time milliseconds The amount of time that broker network and I/O threads take to execute requests that are not subject to throttling.
aws.msk.request_handler_avg_idle_percent percent The percentage of time that the request handler threads are not in use.
aws.msk.request_throttle_queue_size count The number of messages currently in the throttling queue.
aws.msk.request_throttle_time milliseconds The average request throttle time, in milliseconds.
aws.msk.tcp_connections count The number of TCP segments with the SYN flag set for both incoming and outgoing traffic.
aws.msk.total_tier_bytes_lag bytes The total amount of data that is eligible for tiering on the broker but hasn't yet been moved to tiered storage.
aws.msk.traffic_bytes bytes The total amount of network traffic between clients (producers and consumers) and brokers in bytes.
aws.msk.volume_queue_length count The number of read and write operation requests waiting to be completed in a specified time period.
aws.msk.volume_read_bytes bytes The number of bytes read in a specified time period.
aws.msk.volume_read_ops count The number of read operations performed in a specified time period.
aws.msk.volume_total_read_time count The total time, in seconds, spent by all read operations that completed in a specified time period.
aws.msk.volume_total_write_time count The total time, in seconds, spent by all write operations that completed in a specified time period.
aws.msk.volume_write_bytes bytes The number of bytes written in a specified time period.
aws.msk.volume_write_ops count The number of write operations performed in a specified time period.
aws.msk.fetch_message_conversions_per_sec count/second The rate at which messages were transformed after being fetched.
aws.msk.messages_in_per_sec count/second The number of messages received per second.
aws.msk.produce_message_conversions_per_sec count/second The number of produced messages converted per second.
aws.msk.remote_bytes_in_per_sec bytes The amount of data transferred to tiered storage for the specified topic and broker.
aws.msk.remote_bytes_out_per_sec bytes The amount of data transferred from tiered storage in response to consumer fetches for the specified topic and broker.
aws.msk.remote_read_error_per_sec count The frequency of errors in response to read requests.
aws.msk.remote_read_requests_per_sec count The number of read requests sent to tiered storage in response to consumer fetches.
aws.msk.remote_write_error_per_sec count The frequency of errors in response to write requests.
aws.msk.estimated_time_lag milliseconds The estimated amount of time (in seconds) needed to drain the partition offset lag.
aws.msk.offset_lag count The partition-level consumer lag, in number of offsets.
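
The metric names in the table appear to follow one pattern: the CloudWatch metric name (published under the AWS/Kafka namespace) converted to snake case and prefixed with aws.msk. The sketch below illustrates that mapping in Python. It assumes the convention holds for every metric, which this page does not state, and the helper to_observability_name is hypothetical, not part of any AWS or Cloud Observability API.

```python
import re

# Prefix shared by every metric name in the table above.
PREFIX = "aws.msk."


def to_observability_name(cloudwatch_name: str) -> str:
    """Hypothetical helper: convert a CloudWatch metric name to the table's style.

    Assumption: names are the CloudWatch name in snake case plus the prefix.
    """
    # Split before a capitalized word that follows any character,
    # for example "CPUCredit" becomes "CPU_Credit".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", cloudwatch_name)
    # Split where a lowercase letter or digit is followed by a capital,
    # for example "MemoryAfter" becomes "Memory_After".
    snake = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial)
    return PREFIX + snake.lower()


if __name__ == "__main__":
    for name in ("BytesInPerSec", "CPUCreditBalance", "ZooKeeperSessionState"):
        print(name, "->", to_observability_name(name))
```

A mapping like this can help when cross-referencing a row in the table with the corresponding entry in the AWS documentation, but treat the result as a guess and confirm the name against the table.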

See also

Ingest metrics from Amazon

Create and manage dashboards

Create alerts

Updated Dec 7, 2022