Getting started with performance tuning
When running a single OpenTelemetry Collector instance for ingesting Prometheus metrics, there are two main areas of configuration for tuning: machine resources and Collector processor settings. Because the Collector is more sensitive to memory limits than CPU limits, this topic provides guidance on how to manage the memory effectively. It also recommends how to best configure the Collector processor settings.
An example configuration, along with load test settings and recorded results, is also provided.
Prerequisites
- Familiarity with the OpenTelemetry Collector.
- A deployed OpenTelemetry Collector with a Prometheus Receiver. See our guide for deploying a single Collector.
Tuning guidance
In general, the OpenTelemetry Collector is more sensitive to memory limits than CPU limits, so high-memory instances are ideal. We strongly recommend enabling the memory_limiter processor. While it is enabled, if memory usage rises above the default “soft limit” of 80% of the limit_mib setting, the Collector will start dropping data and applying back pressure to the pipeline. If memory rises above the “hard limit” of 100% of limit_mib, the Collector will repeatedly force garbage collection. While this can prevent out-of-memory situations, ongoing dropped data and frequent garbage collection are not ideal conditions. If you see dropped points or unusually frequent GCs in the dashboard, the Collector needs more memory and a higher limit_mib setting.
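As a reference point, a minimal memory_limiter entry might look like the sketch below. The numbers are placeholders to size for your own instance, not recommendations; if spike_limit_mib is omitted it defaults to 20% of limit_mib, which is where the 80% soft limit described above comes from.

```yaml
processors:
  memory_limiter:
    # How often memory usage is checked.
    check_interval: 1s
    # Hard limit; placeholder value, size this for your instance.
    limit_mib: 8000
    # Soft limit = limit_mib - spike_limit_mib. Shown here at the
    # default 20% of limit_mib, i.e. an 80% soft limit.
    spike_limit_mib: 1600
```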
The batch processor is also highly recommended. It batches outgoing data for better compression and fewer network connections. The processor has three parameters: two determine when batches are sent, and the third caps how large batches can be.
- send_batch_size (default 8192 items): A batch will be sent once there are at least this many items (spans, metric points, or logs) in the processor’s queue.
- timeout (default 200ms): A batch will be sent after at most this much time if there are any items in the queue.
- send_batch_max_size (no default): If set, batches will contain no more than this many items. By default, there is no maximum batch size.
In general, larger batches and longer timeouts lead to better compression (and therefore less network usage) but also require more memory. If the Collector is experiencing memory pressure, try lowering the batch size and/or timeout settings. If you need to decrease Collector traffic, try increasing the batch size. Finally, if the Collector logs show messages being rejected for being too large (for example, “grpc: received message larger than max”), try setting or lowering send_batch_max_size.
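For illustration, a batch processor entry with all three settings written out might look like the following sketch; the first two values are the documented defaults and the cap is an assumed example, not a tuned recommendation.

```yaml
processors:
  batch:
    # Send a batch once at least this many items are queued (default 8192).
    send_batch_size: 8192
    # Send whatever is queued after at most this long (default 200ms).
    timeout: 200ms
    # Optional cap on batch size; assumed value, useful when an exporter
    # rejects messages for being too large.
    send_batch_max_size: 10000
```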
Example load test settings
Load tests for the data below were performed in Google Kubernetes Engine (GKE) with a single OpenTelemetry Collector instance running the Prometheus receiver. We used Avalanche to generate metrics.
If you are attempting to replicate this load test in Lightstep, consider creating a separate Lightstep project for this purpose to isolate the auto-generated Avalanche metrics from other “real” metric data.
Collector machine configuration
- Running isolated on an e2-standard-4 node
- Memory: 13 Gi max
- CPU: 4000m max
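If the Collector runs as a Kubernetes pod rather than directly on the node, roughly equivalent container resources would look like this sketch; the limits mirror the numbers above, while the request values are assumptions.

```yaml
# Container resources for the Collector, mirroring the machine
# configuration above. The request values are assumptions.
resources:
  requests:
    cpu: 2000m
    memory: 8Gi
  limits:
    cpu: 4000m
    memory: 13Gi
```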
Collector pipeline configuration
See Tuning guidance above for more information about the memory_limiter and batch processors, which are recommended for basic performance. We ran both the resourcedetection and resource processors to mimic real-life scenarios where label enrichment would likely also be occurring on incoming metrics within a Kubernetes environment.
- Receiver: prometheusreceiver configured with scrape_targets copied from a running Prometheus server’s configuration.
- Processors (applied in the following sequence):
  - memory_limiter
    - Configuration options: limit_mib: 8000
  - resourcedetection
  - resource
  - batch
    - Configuration options: send_batch_size: 1000, send_batch_max_size: 1500, timeout: 1s
- Exporter: otlp
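Putting these pieces together, the pipeline used for the load test would look roughly like the sketch below. The scrape job, scrape targets, OTLP endpoint, access-token header, and the resourcedetection/resource settings are placeholders or assumptions, not the exact values used in the test.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        # scrape_targets copied from the running Prometheus server go here.
        - job_name: copied-prometheus-job      # placeholder
          static_configs:
            - targets: ["scrape-target:9090"]  # placeholder

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 8000
  resourcedetection:
    detectors: [env, gcp]                      # assumed detectors for a GKE environment
  resource:
    attributes:
      - key: deployment.environment            # illustrative enrichment label
        value: loadtest
        action: upsert
  batch:
    send_batch_size: 1000
    send_batch_max_size: 1500
    timeout: 1s

exporters:
  otlp:
    endpoint: ingest.example.com:443           # placeholder endpoint
    headers:
      "lightstep-access-token": "${LS_TOKEN}"  # placeholder credential

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, resourcedetection, resource, batch]
      exporters: [otlp]
```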
Load test configuration
- Avalanche instances were configured to record 10,000 to 100,000 distinct active timeseries (ATS) for different tests.
- The total number of instances (scrape targets) was adjusted to achieve different total timeseries counts.
- Each load test was run for approximately one hour to get stable active timeseries readings.
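Scaling the test up or down amounts to changing the list of Avalanche scrape targets handed to the Prometheus receiver. A sketch of that scrape block, with hypothetical pod addresses and Avalanche’s default metrics port (assumed here to be 9001), might look like:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: avalanche            # placeholder job name
          scrape_interval: 15s
          static_configs:
            # Add or remove targets to change the total active timeseries count.
            # Hostnames and port are hypothetical.
            - targets:
                - avalanche-0.avalanche:9001
                - avalanche-1.avalanche:9001
                - avalanche-2.avalanche:9001
                - avalanche-3.avalanche:9001
```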
Recorded Performance
When testing the OpenTelemetry Collector running with the Prometheus Receiver, we observed the following performance:

| ATS per scrape target | # of scrape targets | CPU usage (cores) | Memory usage |
|---|---|---|---|
| 100,000 | 4 | 1 | 3.5 GB |
| 100,000 | 7 | 1.7 | 5 GB |
| 100,000 | 10 | 3.2 | 7 GB |
| 20,000 | 50 | 1.3 | 2.5 GB |