Getting started with performance tuning
When running a single OpenTelemetry Collector instance for ingesting Prometheus metrics, there are two main areas of configuration for tuning: machine resources and Collector processor settings. Because the Collector is more sensitive to memory limits than CPU limits, this topic provides guidance on how to manage the memory effectively. It also recommends how to best configure the Collector processor settings.
An example configuration, along with load test settings and recorded results, is also provided.
Prerequisites
- Familiarity with the OpenTelemetry Collector.
- A deployed OpenTelemetry Collector with a Prometheus Receiver. See our guide for deploying a single Collector.
Tuning guidance
In general, the OpenTelemetry Collector is more sensitive to memory limits than CPU limits, so high-memory instances are ideal. We strongly recommend enabling the memory_limiter processor. While it is enabled, if memory usage rises above the default “soft limit” of 80% of the limit_mib setting, the Collector will start dropping data and applying back pressure to the pipeline. If memory rises above the “hard limit” of 100% of limit_mib, the Collector will repeatedly force garbage collection. While this can prevent out-of-memory situations, ongoing dropped data and frequent garbage collection are not ideal conditions. If you see dropped points or unusually frequent GCs in the dashboard, the Collector needs more memory and a higher limit_mib setting.
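As a reference point, a minimal memory_limiter entry might look like the sketch below. The numbers are placeholders to size for your own instance, not recommendations; if spike_limit_mib is omitted it defaults to 20% of limit_mib, which is where the 80% soft limit described above comes from.

```yaml
processors:
  memory_limiter:
    # How often memory usage is checked.
    check_interval: 1s
    # Hard limit; placeholder value, size this for your instance.
    limit_mib: 8000
    # Soft limit = limit_mib - spike_limit_mib. Shown here at the
    # default 20% of limit_mib, i.e. an 80% soft limit.
    spike_limit_mib: 1600
```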
The batch processor is also highly recommended. It batches outgoing data for better compression and fewer network connections. The processor has three parameters: two determine when batches are sent, and the third caps how large batches can be.
- send_batch_size (default 8192 items): A batch will be sent once there are at least this many items (spans, metric points, or logs) in the processor’s queue.
- timeout (default 200ms): A batch will be sent after at most this much time if there are any items in the queue.
- send_batch_max_size (no default): If set, batches will contain no more than this many items. By default, there is no maximum batch size.
In general, larger batches and longer timeouts lead to better compression (and therefore less network usage) but also require more memory. If the Collector is experiencing memory pressure, try lowering the batch size and/or timeout settings. If you need to decrease Collector traffic, try increasing the batch size. Finally, if the Collector logs show messages being rejected for being too large (for example, “grpc: received message larger than max”), try setting or lowering send_batch_max_size.
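For illustration, a batch processor entry with all three settings written out might look like the following sketch; the first two values are the documented defaults and the cap is an assumed example, not a tuned recommendation.

```yaml
processors:
  batch:
    # Send a batch once at least this many items are queued (default 8192).
    send_batch_size: 8192
    # Send whatever is queued after at most this long (default 200ms).
    timeout: 200ms
    # Optional cap on batch size; assumed value, useful when an exporter
    # rejects messages for being too large.
    send_batch_max_size: 10000
```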
Example load test settings
Load tests for the data below were performed in Google Kubernetes Engine (GKE) with a single OpenTelemetry Collector instance running the Prometheus receiver. We used Avalanche to generate metrics.
If you are attempting to replicate this load test in Lightstep, consider creating a separate Lightstep project for this purpose to isolate the auto-generated Avalanche metrics from other “real” metric data.
Collector machine configuration
- Running isolated on an e2-standard-4 node
- Memory: 13 Gi max
- CPU: 4000m max
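If the Collector runs as a Kubernetes pod rather than directly on the node, roughly equivalent container resources would look like this sketch; the limits mirror the numbers above, while the request values are assumptions.

```yaml
# Container resources for the Collector, mirroring the machine
# configuration above. The request values are assumptions.
resources:
  requests:
    cpu: 2000m
    memory: 8Gi
  limits:
    cpu: 4000m
    memory: 13Gi
```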
Collector pipeline configuration
See Tuning guidance above for more information about the memory_limiter and batch processors, which are recommended for basic performance. We ran both the resourcedetection and resource processors to mimic real-life scenarios where label enrichment would likely also be occurring on incoming metrics within a Kubernetes environment.
- Receiver: prometheusreceiver configured with scrape_targets copied from a running Prometheus server’s configuration.
- Processors (applied in the following sequence):
  - memory_limiter
    - Configuration options: limit_mib: 8000
  - resourcedetection
  - resource
  - batch
    - Configuration options: send_batch_size: 1000, send_batch_max_size: 1500, timeout: 1s
- Exporter: otlp
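Putting these pieces together, the pipeline used for the load test would look roughly like the sketch below. The scrape job, scrape targets, OTLP endpoint, access-token header, and the resourcedetection/resource settings are placeholders or assumptions, not the exact values used in the test.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        # scrape_targets copied from the running Prometheus server go here.
        - job_name: copied-prometheus-job      # placeholder
          static_configs:
            - targets: ["scrape-target:9090"]  # placeholder

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 8000
  resourcedetection:
    detectors: [env, gcp]                      # assumed detectors for a GKE environment
  resource:
    attributes:
      - key: deployment.environment            # illustrative enrichment label
        value: loadtest
        action: upsert
  batch:
    send_batch_size: 1000
    send_batch_max_size: 1500
    timeout: 1s

exporters:
  otlp:
    endpoint: ingest.example.com:443           # placeholder endpoint
    headers:
      "lightstep-access-token": "${LS_TOKEN}"  # placeholder credential

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, resourcedetection, resource, batch]
      exporters: [otlp]
```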
Load test configuration
- Avalanche instances were configured to record 10,000 to 100,000 distinct active timeseries (ATS) for different tests.
- The total number of instances (scrape targets) was adjusted to achieve different total timeseries counts.
- Each load test was run for approximately one hour to get stable active timeseries readings.
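Scaling the test up or down amounts to changing the list of Avalanche scrape targets handed to the Prometheus receiver. A sketch of that scrape block, with hypothetical pod addresses and Avalanche’s default metrics port (assumed here to be 9001), might look like:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: avalanche            # placeholder job name
          scrape_interval: 15s
          static_configs:
            # Add or remove targets to change the total active timeseries count.
            # Hostnames and port are hypothetical.
            - targets:
                - avalanche-0.avalanche:9001
                - avalanche-1.avalanche:9001
                - avalanche-2.avalanche:9001
                - avalanche-3.avalanche:9001
```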
Recorded Performance
When testing the OpenTelemetry Collector running with the Prometheus Receiver, we observed the following performance:

| ATS per scrape target | # of scrape targets | CPU usage (cores) | Memory usage |
|---|---|---|---|
| 100,000 | 4 | 1 | 3.5 GB |
| 100,000 | 7 | 1.7 | 5 GB |
| 100,000 | 10 | 3.2 | 7 GB |
| 20,000 | 50 | 1.3 | 2.5 GB |