Planning a Collector Deployment in Kubernetes
Lightstep recommends running the OpenTelemetry Collector with the Prometheus receiver to ingest infrastructure metrics. This ensures the highest data quality and completeness, and also allows the Collector to leverage the Prometheus ecosystem of exporters to scrape targets.
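As a minimal sketch, a Collector configuration with the Prometheus receiver wired to an OTLP exporter might look like the following. The job name, scrape target, and access-token environment variable are hypothetical placeholders; verify the ingest endpoint for your environment.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "example-app"            # hypothetical job name
          scrape_interval: 30s
          static_configs:
            - targets: ["example-app:9090"]  # hypothetical scrape target

exporters:
  otlp:
    endpoint: ingest.lightstep.com:443       # verify for your environment
    headers:
      lightstep-access-token: "${LS_ACCESS_TOKEN}"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```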
Choosing a deployment type
There are three approaches to deploying the Collector in Kubernetes:
- As a single replica Kubernetes deployment (one Collector within a Kubernetes cluster)
- As a Kubernetes DaemonSet deployment alongside a single replica deployment (adds one Collector per Kubernetes node)
- As a sharded multi-replica Kubernetes deployment (multiple Collectors within a Kubernetes cluster) - coming soon
When first deploying the OpenTelemetry Collector, you can start with a single replica deployment within a Kubernetes cluster, or for additional scalability, deploy Collectors as a DaemonSet to scrape application metrics. Both modes can be combined. For example, you can use the DaemonSet to scrape application metrics along with a single replica deployment to scrape infrastructure metrics and static targets.
Once the Collector has been deployed for a subset of services, it's much easier to estimate metrics traffic for the remaining services, and then plan and scale the other deployments appropriately. In any deployment mode, the Collector is configured to scrape Prometheus metrics using scrape targets. Compare and contrast the deployment types below:
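Rather than listing static targets, scrape targets are typically discovered dynamically with Prometheus's Kubernetes service discovery. The sketch below, with a hypothetical job name, keeps only pods carrying the conventional `prometheus.io/scrape: "true"` annotation:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "kubernetes-pods"   # hypothetical job name
          kubernetes_sd_configs:
            - role: pod                 # discover every pod in the cluster
          relabel_configs:
            # Keep only pods annotated with prometheus.io/scrape: "true"
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: "true"
```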
Single Replica Deployment
Use Case: Trying out the OpenTelemetry Collector for the first time for a subset of services. Enables scraping application metrics, infrastructure metrics, and static targets in a Kubernetes cluster.
- Simplest to deploy.
- Straightforward to scale vertically (increasing resources for the single collector pod).
- Can be resource efficient compared with a DaemonSet deployment - nodes can have very different workloads, and since there is one Collector pod, you can provision resources for the total workload.
- Single point of failure - if the Collector fails, metrics from all nodes are lost.
- Harder to scale horizontally to multiple replicas, since the replicas would all scrape the same targets, unless the Collector deployment is sharded (coming soon).
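A single replica deployment can be sketched as an ordinary Kubernetes Deployment with `replicas: 1`. The image tag, resource figures, and ConfigMap name below are illustrative assumptions, not prescribed values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1                     # single replica: one Collector for the whole cluster
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest  # pin a version in production
          args: ["--config=/etc/otel/config.yaml"]
          resources:
            limits:               # provision for the cluster's total scrape workload
              cpu: "1"
              memory: 2Gi
          volumeMounts:
            - name: config
              mountPath: /etc/otel
      volumes:
        - name: config
          configMap:
            name: otel-collector-config  # hypothetical ConfigMap holding the Collector config
```

Scaling vertically then means raising the resource limits on this one pod, which matches the trade-offs listed above.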
DaemonSet Deployment
Use case: Scraping application metrics on each node. Best to use when the workload on each node is constant or has some fixed limit, or when the resources needed by each node are similar. Pods in a DaemonSet share the same configuration, so if all nodes have similar resource needs, resources are used more efficiently.
- No single point of failure for the Collector - if one of the pods in the DaemonSet fails, only a subset of metric data is lost.
- Reduced node-to-node network traffic as the Collector is scraping metrics locally.
- DaemonSet pods can't be scaled individually.
- Can be resource inefficient - if the nodes have different workloads, the shared pod spec for the DaemonSet has to be sized for the most demanding node in the cluster.
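A DaemonSet deployment can be sketched as follows. The node name is exposed through the downward API so the Collector config can restrict scraping to pods on the local node; the image tag and resource figures are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-agent
spec:
  selector:
    matchLabels:
      app: otel-collector-agent
  template:
    metadata:
      labels:
        app: otel-collector-agent
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest  # pin a version in production
          args: ["--config=/etc/otel/config.yaml"]
          env:
            # Expose the node name so the scrape config can target only local pods
            - name: K8S_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          resources:
            limits:      # sized for the most demanding node, since every pod shares this spec
              cpu: 500m
              memory: 512Mi
```

Note there is no `replicas` field: Kubernetes schedules exactly one of these pods per node, which is what keeps scraping local and avoids node-to-node traffic.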