This topic is about our Classic Satellites. If you installed Satellites after 4/06/2021, you are probably running Microsatellites.
What is a Satellite?
Lightstep uses Satellites to collect 100% of the performance data that your tracing instrumentation generates. Satellites collect telemetry data generated by instrumented clients and servers, and then process and temporarily store that data during trace assembly. The SaaS platform records aggregate information about the spans, directs the trace assembly process, and then stores traces durably, all for display in the Lightstep UI.
Learn more about Lightstep’s architecture.
Lightstep offers three Satellite types, used at different times during the development lifecycle:
- Local Developer Mode Satellite installed on your machine: Used like a sandbox when developing your service(s).
- Public Satellites: A Lightstep-managed shared pool of Satellites.
- On-premise Satellites: Installed and run in your environment. You maintain and tune these to suit your application’s needs.
Typically during development, especially when instrumenting a single service, you only want to see traces from your local environment, rather than having to wait to deploy to see results. Lightstep’s Developer Mode includes a Satellite you run on your machine. Because this Satellite communicates only with your local code, you see only your traces, speeding up instrumentation, testing, and debugging.
For lower-throughput environments where you don’t want to maintain Microsatellites, you can use the Public shared Satellite pool. Not having to install and maintain On-premise Satellites lets you start producing meaningful traces sooner.
Before using the shared pool, note the following:
- All span data is sent to Public Satellites, including possible personally identifiable information (PII). Even though the data is sent over an encrypted connection, this may or may not meet your organization’s security requirements. We recommend running On-premise Satellites if you need to remove PII.
- You can’t control the Satellite recall on Public Satellites. Recall depends on the rate and size of incoming spans, which means Public Satellites may not store enough of your data to generate complete traces 100% of the time.
- Public Satellites are rate-limited. Community tier customers are rate-limited to 4 MB per minute, and we recommend that Enterprise customers send less than 600 MB per minute.
- Lightstep offers no SLA or guarantee regarding uptime or availability.
Learn more about Public Satellites.
For production environments, you want complete control over your Satellites. You can download, install, and tune Satellites to fit your exact needs. Satellites are straightforward to deploy using a Docker image, AWS AMI, or Debian package.
If you’re using public Satellites or working in Developer Mode, then that’s all you really need to know about Satellites, as Lightstep installs and maintains them. If you’re using on-premises Satellites, read further to understand how they work, how many to use, and how to install, monitor, and maintain them.
Each Satellite holds 100% of unsampled recent spans in a temporary buffer. When a Satellite receives spans from a client, it places them in that buffer, discarding older spans as newer spans arrive. The length of time between the newest and the oldest span currently held in that buffer is the recall or recall window. This window shows how far back into the past Lightstep can look to find spans while assembling a trace. For example, if the current recall is 5 minutes, then when an application error occurs, any spans reported in the last 5 minutes can be assembled as part of a trace for that error.
At any given time, each Satellite has its own recall value, and each Satellite pool has a distribution of recall values.
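The relationship between the buffer and the recall window can be illustrated with a toy model. This is a minimal sketch, not Lightstep's implementation: the `SpanBuffer` class, its byte-based capacity, and the span sizes are all hypothetical and chosen only to show how recall falls out of "newest timestamp minus oldest timestamp still held".

```python
from collections import deque


class SpanBuffer:
    """Toy fixed-capacity span buffer (hypothetical, for illustration only)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.spans = deque()  # (timestamp_seconds, size_bytes), oldest first

    def add(self, timestamp, size_bytes):
        """Store a span, discarding the oldest spans until the buffer fits."""
        self.spans.append((timestamp, size_bytes))
        self.used_bytes += size_bytes
        while self.used_bytes > self.capacity_bytes and len(self.spans) > 1:
            _, evicted_size = self.spans.popleft()
            self.used_bytes -= evicted_size

    def recall_seconds(self):
        """Time between the newest and the oldest span currently held."""
        if not self.spans:
            return 0.0
        return self.spans[-1][0] - self.spans[0][0]


# One 250-byte span per second into a 1000-byte buffer.
buffer = SpanBuffer(capacity_bytes=1000)
for second in range(10):
    buffer.add(timestamp=float(second), size_bytes=250)

print(buffer.recall_seconds())  # 3.0: only the spans from seconds 6-9 remain
```

In this model, sending larger spans or sending them faster evicts older spans sooner, which shortens the recall window without any configuration change.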
Satellite recall is mainly affected by the following factors:
- Number of Satellites
- bytes-per-project setting on the Satellite
- Rate and size (including logs) of spans reported by the tracer
- Number and uniqueness of span attributes
You can’t directly configure the recall window; it is proportional to the amount of span traffic sent from all tracer clients and the available Satellite memory. You can achieve longer recall by either reducing the amount of span traffic or increasing the available memory of Satellites in the pool (either by increasing the available memory per instance, or the overall number of instances).
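As a rough sketch of this proportionality, the snippet below estimates recall as the memory available for spans divided by the incoming byte rate. The function name and the traffic figures are hypothetical, and the linear model is a simplifying assumption; real recall also depends on factors such as attribute uniqueness and load balance.

```python
def estimated_recall_seconds(span_memory_bytes, spans_per_second, bytes_per_span):
    """Estimate recall as span memory divided by incoming bytes per second.

    Simplified linear model (an assumption, not Lightstep's exact behavior).
    """
    if spans_per_second <= 0 or bytes_per_span <= 0:
        raise ValueError("spans_per_second and bytes_per_span must be positive")
    return span_memory_bytes / (spans_per_second * bytes_per_span)


# Illustrative numbers only: 4e9 bytes of span memory in the pool,
# 50,000 spans per second at 500 bytes each.
baseline = estimated_recall_seconds(4e9, 50_000, 500)
less_traffic = estimated_recall_seconds(4e9, 25_000, 500)
more_memory = estimated_recall_seconds(8e9, 50_000, 500)

print(baseline, less_traffic, more_memory)  # 160.0 320.0 320.0
```

Under this model, halving span traffic and doubling pool memory have the same effect on recall, which matches the two options described above.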
Significant variance in recall figures across Satellites within a Satellite pool can be a symptom of load imbalance. Load imbalance can limit your ability to tune the recall window to the desired length, and indicates suboptimal resource usage, as the useful capacity of the overall pool is limited by the lower bound Satellites. See Load Balance Satellites for recommendations on tuning your Satellites.
Insufficient recall happens when your pool of on-premise Satellites is under-provisioned. It is a key indicator that you should scale your pool by increasing the number of Satellites you are running, choosing machines with more memory, or both.
Remember that insufficient recall does not equal insufficient trace retention. Satellite recall is only relevant to spans that have not yet been assembled as part of a trace. Once Lightstep assembles a trace, it retains all spans in that trace for as long as your Data Retention policy.
How Many Satellites Do I Need?
This depends on the rate at which spans are sent to the Satellites and the number and sizes of logs associated with those spans. Available memory is often the limiting resource. Lightstep recommends that the total amount of memory among all Satellite instances is at least as large as the memory required.
You can use this calculation to get started (it only considers spans, so more memory will likely be needed):
memory_required = requests/second * spans/request * bytes/span * seconds_of_recall
requests/second: The number of requests your software handles.
spans/request: The number of spans created during each request.
bytes/span: This depends on the number of attributes that you use and a few other factors, but the baseline is about 100 bytes per span. Most users can assume less than 500 bytes per span with typical attributes, etc.
seconds_of_recall: Depends on the duration of the longest traces you expect to generate; a good rule of thumb is to use at least 60 seconds + (2 * duration of the longest expected trace).
To complete the calculation, determine the following about your system:
- What is your longest expected trace duration in seconds?
- How many requests per second do you anticipate making to the Satellites?
- How many spans per request do you anticipate?
- How many bytes per span do you expect? (500 is the usual amount with typical attributes)
- What is the recommended/desired recall? (At least 60s + 2x longest trace duration)
Based on this, multiplying the values from the second through fifth bullet points produces the recommended total memory required across the Satellite pool, in bytes. Satellites use most of their allocated memory for internal operation, leaving about one-quarter of the RAM for indexing spans. Because each 16 GB Satellite has around 4 GB available for span indexing, divide the recommended total memory by 4e9 to get the number of 16 GB Satellites you should provision to start with. As you start to receive data in Lightstep, you’ll be better able to tune and load balance the pool.
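The calculation above can be sketched in a few lines. The function names and the sample workload (10,000 requests per second, 20 spans per request, a 30-second longest trace) are hypothetical; the formula, the 500 bytes-per-span figure, the recall rule of thumb, and the 4e9 divisor come from the text. Rounding up to a whole Satellite is an assumption added here.

```python
import math

# About one-quarter of a 16 GB Satellite's RAM is available for span indexing.
SPAN_MEMORY_PER_SATELLITE_BYTES = 4e9


def recommended_recall_seconds(longest_trace_seconds):
    """Rule of thumb: at least 60 seconds + 2 * longest expected trace."""
    return 60 + 2 * longest_trace_seconds


def memory_required_bytes(requests_per_second, spans_per_request,
                          bytes_per_span, seconds_of_recall):
    """memory_required = requests/second * spans/request * bytes/span * seconds_of_recall"""
    return (requests_per_second * spans_per_request
            * bytes_per_span * seconds_of_recall)


def satellites_needed(memory_bytes):
    """Number of 16 GB Satellites, rounded up (rounding is an assumption)."""
    return max(1, math.ceil(memory_bytes / SPAN_MEMORY_PER_SATELLITE_BYTES))


# Hypothetical workload, for illustration only.
recall = recommended_recall_seconds(longest_trace_seconds=30)
memory = memory_required_bytes(
    requests_per_second=10_000,
    spans_per_request=20,
    bytes_per_span=500,
    seconds_of_recall=recall,
)
count = satellites_needed(memory)

print(recall, memory, count)  # 120 12000000000 3
```

Because the formula only considers spans, treat the result as a starting point and expect to provision more memory once logs and attributes are accounted for.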
Lightstep recommends using machines with 2 CPU and 16 GB of memory each.
Note that tracer client libraries and Satellites will degrade gracefully if the pool is under-provisioned, so the only impact of making your pool too small is lower quality traces.
Learn more about installing and configuring Satellites here.
Find our recommendations for load balancing your Satellites here.
Depending on your application and production environment, you may choose to set up several Satellite pools. A pool is a group of one or more Satellites that share a single configuration. Pools are isolated from each other, so issues in one pool won’t affect the Satellites in other pools. Using several pools increases both the reliability and manageability of your Satellites. You might also set up separate pools to isolate non-production and production traffic.
You should set up at least one Satellite in each region where you run a backend, as this will limit cross-region traffic.
Satellites use a Satellite key associated with your organization. Unlike your access token (which provides only the ability to report spans from your tracer), the Satellite key allows the Satellite to read a small amount of configuration associated with projects in your organization, and the current list of operations that appear on dashboards. Access to this data allows Satellites to compute aggregated statistics for these operations. These credentials don’t provide read access to any spans or traces.
Lightstep periodically publishes new Satellite versions to improve the quality of collection and reporting. We offer versions for Docker, AWS, and Debian. You receive notifications when a new version is available.
Using Public Satellites for Development and On-Premise for Production
You may decide to use public Satellites for the development environment for convenience and use on-premise Satellites for production to ensure better performance and isolation from other organizations. You can do this by creating separate projects for each environment.
You can require a production project to use only on-premise Satellites from the Project Settings page.
To enforce on-premise Satellite use:
- Open the project that should use only on-premise Satellites.
- From the navigation bar, click Project Settings.
- In the Satellites area, select Private Satellite Pool Only. Any spans sent to this project from public Satellites will be rejected.