AWS MSK metrics

Once you’ve integrated with AWS CloudWatch, you have access to metrics from AWS Managed Streaming for Apache Kafka (MSK), a fully managed service for building and running applications that process streaming data using Apache Kafka.

See all AWS integrations.

The following table shows the MSK metrics ingested by Cloud Observability.

Metric Name	Unit	Description
aws.msk.active_controller_count	count	The number of controllers active in the cluster at any given moment. Only one controller per cluster should be active at a time.
aws.msk.burst_balance	count	The remaining balance of input-output burst credits for the cluster's EBS volumes.
aws.msk.bytes_in_per_sec	bytes/second	The rate at which clients send data.
aws.msk.bytes_out_per_sec	bytes/second	The rate at which data is sent to clients.
aws.msk.client_connection_count	count	The number of open connections from authenticated clients.
aws.msk.connection_count	count	The number of active inter-broker connections, unauthenticated connections, and authenticated connections.
aws.msk.cpu_credit_balance	count	The CPU credit balance on the brokers.
aws.msk.cpu_idle	percent	The proportion of idle CPU time.
aws.msk.cpu_io_wait	percent	The proportion of CPU idle time while a disk operation is pending.
aws.msk.cpu_system	percent	The proportion of CPU time spent in kernel space.
aws.msk.cpu_user	percent	The proportion of CPU time spent in user space.
aws.msk.global_partition_count	count	The total number of partitions across all topics in the cluster, excluding replicas.
aws.msk.global_topic_count	count	The total number of topics across all brokers in the cluster.
aws.msk.estimated_max_time_lag	seconds	The estimated time, in seconds, needed to drain `MaxOffsetLag`.
aws.msk.kafka_app_logs_disk_used	percent	The proportion of disk space used for application logs.
aws.msk.kafka_data_logs_disk_used	percent	The proportion of disk space used for data logs.
aws.msk.leader_count	count	The total number of partition leaders per broker, excluding replicas.
aws.msk.max_offset_lag	count	The maximum offset lag across all partitions in a topic.
aws.msk.memory_buffered	bytes	The size of the broker's buffered memory, in bytes.
aws.msk.memory_cached	bytes	The size of the broker's cached memory, in bytes.
aws.msk.memory_free	bytes	The amount of memory that is unoccupied and accessible for the broker, measured in bytes.
aws.msk.heap_memory_after_gc	percent	The percentage of the heap's total memory that is still in use after garbage collection.
aws.msk.memory_used	bytes	The amount of memory that the broker is currently using.
aws.msk.messages_in_per_sec	count/second	The number of incoming messages per second for the broker.
aws.msk.network_rx_dropped	count	The number of received packets that were dropped.
aws.msk.network_rx_errors	count	The number of network receive errors.
aws.msk.network_rx_packets	count	The number of packets the broker has received.
aws.msk.network_tx_dropped	count	The number of transmit packets that were dropped.
aws.msk.network_tx_errors	count	The total number of network transmit errors.
aws.msk.network_tx_packets	count	The number of packets the broker sent out.
aws.msk.offline_partitions_count	count	The total number of offline partitions in the cluster.
aws.msk.partition_count	count	The total number of topic partitions per broker, including replicas.
aws.msk.produce_total_time_ms_mean	milliseconds	The mean produce time, in milliseconds.
aws.msk.request_bytes_mean	bytes	The average number of request bytes.
aws.msk.request_time	milliseconds	The amount of time that the broker's I/O and network threads spend processing requests.
aws.msk.root_disk_used	percent	The proportion of the broker's root disk that is being used.
aws.msk.sum_offset_lag	count	The total offset lag for all partitions in a topic.
aws.msk.swap_free	bytes	The amount of swap memory that is free to the broker, measured in bytes.
aws.msk.swap_used	bytes	The amount of swap memory that the broker is currently using, measured in bytes.
aws.msk.traffic_shaping	count	The number of packets that were shaped (dropped or queued).
aws.msk.under_min_isr_partition_count	count	The number of under-minIsr partitions for the broker.
aws.msk.under_replicated_partitions	count	The number of under-replicated partitions for the broker.
aws.msk.zoo_keeper_request_latency_ms_mean	milliseconds	The average response time for requests sent to Apache ZooKeeper from a broker.
aws.msk.zoo_keeper_session_state	float	The connection status of the broker's ZooKeeper session, reported as a numeric value.
aws.msk.bw_in_allowance_exceeded	packets	The number of packets shaped because the inbound aggregate bandwidth exceeded the broker's maximum.
aws.msk.bw_out_allowance_exceeded	packets	The number of packets shaped because the outbound aggregate bandwidth exceeded the broker's maximum.
aws.msk.conn_track_allowance_exceeded	count	The number of packets shaped because connection tracking exceeded the broker's maximum.
aws.msk.connection_close_rate	count	The number of connections closed per second, per listener.
aws.msk.connection_creation_rate	count	The number of new connections established per second, per listener.
aws.msk.cpu_credit_usage	count	The number of CPU credits spent by the broker.
aws.msk.fetch_consumer_local_time_ms_mean	milliseconds	The average amount of time needed for the leader to process a consumer request.
aws.msk.fetch_consumer_request_queue_time_ms_mean	milliseconds	The average amount of time that a consumer request spends in the request queue.
aws.msk.fetch_consumer_response_queue_time_ms_mean	milliseconds	The average amount of time that the consumer request sits in the response queue.
aws.msk.fetch_consumer_response_send_time_ms_mean	milliseconds	The average amount of time that it takes a consumer to send a response.
aws.msk.fetch_consumer_total_time_ms_mean	milliseconds	The average amount of time consumers spend fetching data from the broker.
aws.msk.fetch_follower_local_time_ms_mean	milliseconds	The average amount of time it takes the leader to complete a follower request.
aws.msk.fetch_follower_request_queue_time_ms_mean	milliseconds	The average amount of time a follower request spends in the request queue.
aws.msk.fetch_follower_response_queue_time_ms_mean	milliseconds	The average amount of time a follower request spends in the response queue.
aws.msk.fetch_follower_response_send_time_ms_mean	milliseconds	The average amount of time that it takes a follower to send a response.
aws.msk.fetch_follower_total_time_ms_mean	milliseconds	The average amount of time followers spend obtaining data from the broker.
aws.msk.fetch_message_conversions_per_sec	count/second	The number of fetch message conversions per second for the broker.
aws.msk.fetch_throttle_byte_rate	bytes/second	The number of bytes per second that were throttled.
aws.msk.fetch_throttle_queue_size	count	The number of messages currently in the throttling queue.
aws.msk.fetch_throttle_time	milliseconds	The average fetch throttle time.
aws.msk.network_processor_avg_idle_percent	percent	The percentage of time when the network processors are not in use.
aws.msk.pps_allowance_exceeded	count	The number of packets shaped because the bidirectional PPS exceeded the broker's maximum.
aws.msk.produce_local_time_ms_mean	milliseconds	The average amount of time it takes the leader to process the request.
aws.msk.produce_message_conversions_per_sec	count/second	The number of produce message conversions per second for the broker.
aws.msk.produce_message_conversions_time_ms_mean	milliseconds	The average amount of time spent on message format conversions.
aws.msk.produce_request_queue_time_ms_mean	milliseconds	The average amount of time that request messages spend in the queue.
aws.msk.produce_response_queue_time_ms_mean	milliseconds	The average amount of time that response messages spend in the queue.
aws.msk.produce_response_send_time_ms_mean	milliseconds	The average amount of time spent sending response messages.
aws.msk.produce_throttle_byte_rate	bytes/second	The number of bytes per second that were throttled.
aws.msk.produce_throttle_queue_size	count	The number of messages currently in the throttling queue.
aws.msk.produce_throttle_time	milliseconds	The average produce throttle time.
aws.msk.produce_total_time_ms_mean	milliseconds	The mean produce time, in milliseconds.
aws.msk.remote_bytes_in_per_sec	bytes	The total number of bytes transferred to tiered storage, including data from log segments, indexes, and other auxiliary files.
aws.msk.remote_bytes_out_per_sec	bytes	The total number of bytes transferred from tiered storage in response to consumer fetches.
aws.msk.remote_log_manager_tasks_avg_idle_percent	percent	The average proportion of time the remote log manager spent idle.
aws.msk.remote_log_reader_avg_idle_percent	percent	The average proportion of time the remote log reader spent idle.
aws.msk.remote_log_reader_task_queue_size	count	The number of tasks responsible for reading from tiered storage that are waiting to be scheduled.
aws.msk.remote_read_error_per_sec	count	The total rate of errors for read requests that the specified broker sent to tiered storage to retrieve data in response to consumer fetches.
aws.msk.remote_read_requests_per_sec	count	The total number of read requests that the specified broker sent to tiered storage to retrieve data in response to consumer fetches.
aws.msk.remote_write_error_per_sec	count	The total rate of errors for write requests that the specified broker sent to tiered storage to transfer data upstream.
aws.msk.replication_bytes_in_per_sec	bytes/second	The rate of data received from other brokers, in bytes per second.
aws.msk.replication_bytes_out_per_sec	bytes/second	The rate of data sent to other brokers, in bytes per second.
aws.msk.request_exempt_from_throttle_time	milliseconds	The amount of time that broker network and I/O threads take to execute requests that are not subject to throttling.
aws.msk.request_handler_avg_idle_percent	percent	The percentage of time that the request handler threads are not in use.
aws.msk.request_throttle_queue_size	count	The number of messages currently in the throttling queue.
aws.msk.request_throttle_time	milliseconds	The average request throttle time.
aws.msk.tcp_connections	count	The number of TCP segments with the SYN flag set for both incoming and outgoing traffic.
aws.msk.total_tier_bytes_lag	bytes	The total amount of data that is eligible for tiering on the broker but hasn't yet been moved to tiered storage.
aws.msk.traffic_bytes	bytes	The total amount of network traffic between clients (producers and consumers) and brokers in bytes.
aws.msk.volume_queue_length	count	The number of read and write operation requests waiting to be completed in a specified period of time.
aws.msk.volume_read_bytes	bytes	The number of bytes read in a specified period of time.
aws.msk.volume_read_ops	count	The number of read operations performed in a specified period of time.
aws.msk.volume_total_read_time	seconds	The total time, in seconds, spent by all read operations that completed in a specified period of time.
aws.msk.volume_total_write_time	seconds	The total time, in seconds, spent by all write operations that completed in a specified period of time.
aws.msk.volume_write_bytes	bytes	The number of bytes written in a specified period of time.
aws.msk.volume_write_ops	count	The number of write operations performed in a specified period of time.
aws.msk.fetch_message_conversions_per_sec	count/second	The number of fetched messages converted per second for the specified topic and broker.
aws.msk.messages_in_per_sec	count/second	The number of messages received per second for the specified topic and broker.
aws.msk.produce_message_conversions_per_sec	count/second	The number of produced messages converted per second for the specified topic and broker.
aws.msk.remote_bytes_in_per_sec	bytes	The number of bytes transferred to tiered storage for the specified topic and broker.
aws.msk.remote_bytes_out_per_sec	bytes	The number of bytes transferred from tiered storage in response to consumer fetches for the specified topic and broker.
aws.msk.remote_read_error_per_sec	count	The rate of errors in response to read requests sent to tiered storage for the specified topic.
aws.msk.remote_read_requests_per_sec	count	The number of read requests sent to tiered storage in response to consumer fetches on the specified topic.
aws.msk.remote_write_error_per_sec	count	The rate of errors in response to write requests sent to tiered storage for the specified topic.
aws.msk.estimated_time_lag	seconds	The estimated time, in seconds, needed to drain the partition offset lag.
aws.msk.offset_lag	count	The partition-level consumer lag, in number of offsets.
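
The `aws.msk.*` names in the table appear to mirror the CloudWatch metric names for MSK (for example, `ActiveControllerCount`), converted to snake_case. The sketch below shows one way to do that conversion when matching metrics between the two systems. The helper name `to_metric_name` is hypothetical, and the conversion rule is an assumption inferred from the table rather than a documented mapping.

```python
import re

PREFIX = "aws.msk."  # prefix shared by every metric in the table above


def to_metric_name(cloudwatch_name: str) -> str:
    """Convert a CamelCase CloudWatch name to an aws.msk.* snake_case name.

    Hypothetical helper: the rule is inferred from the table, not documented.
    """
    # Split before a capitalized word: "CPUCredit" -> "CPU_Credit".
    step1 = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", cloudwatch_name)
    # Split between a lowercase letter or digit and an uppercase letter:
    # "AfterGC" -> "After_GC".
    step2 = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step1)
    return PREFIX + step2.lower()


if __name__ == "__main__":
    for name in ("ActiveControllerCount", "CPUCreditBalance", "HeapMemoryAfterGC"):
        print(name, "->", to_metric_name(name))
```

Some names may not convert cleanly under this rule, so check each converted name against the table before relying on it.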
