LightStep uses Satellites to collect 100% of the performance data that your tracing instrumentation generates. Satellites collect spans generated by instrumented clients and servers, and then process and temporarily store that data during trace assembly. The LightStep Engine, the remote SaaS component, queries the Satellites, records aggregate information about the spans, directs the trace assembly process, and then stores traces durably, all for display in the LightStep UI.
LightStep does not assemble every trace. From the 100% of spans collected, only traces that match Streams or serve as examples of application errors, high latency, or other interesting events are assembled. A Stream is persistent time-series trace data that matches a predicate such as a combination of service (component) name, operation, and tag values. Streams allow the analysis of specific facets of the generated tracing data, so you create Streams based on data you know you always want to query.
LightStep offers three Satellite types, used at different times during the development lifecycle:
- Local Developer Mode Satellite installed on your machine
- Public shared Satellite pool
- On-premise Satellite pool
Typically during development, especially when instrumenting a single service, you only want to see traces from your service, rather than having to wait to deploy to see results. LightStep's Developer Mode includes a Satellite you run on your machine. Because this Satellite communicates only with your local code, you see only your traces, speeding up instrumentation, testing, and debugging.
Once several services are instrumented, you want to see how the instrumentation is working throughout your system. At this point, you can configure your tracers to communicate with LightStep's public shared Satellite pool. Because you don't have to install and maintain on-premise Satellites, you can start producing meaningful traces right away. However, you should use the shared Satellite pool only during development. Production traffic must always be sent to on-premise Satellites for the following reasons:
- Because the public Satellites are meant for development only, LightStep offers no SLA or guarantee regarding uptime or availability.
- Public Satellites cannot be scrubbed for personally identifiable information (PII). LightStep is not responsible for any PII sent to these Satellites.
- You can't control the Satellite recall on public Satellites. They have a fixed recall, which means they may not store enough of your data to generate complete traces 100% of the time.
- Public Satellites are rate-limited. Only a certain number are allocated to each project and your project may require more Satellites for 100% coverage.
- Because public Satellites are remote, you may experience network latency.
For production environments, you want complete control over your Satellites. You can download, install, and tune Satellites to fit your exact needs. Satellites are straightforward to deploy using a Docker image, AWS AMI, or Debian package.
Using public Satellites or working in Developer Mode?
Then that's all you really need to know about Satellites, as LightStep installs and maintains them. If you're using on-premise Satellites, read further to understand how they work, how many to use, and how to install, monitor, and maintain them.
Each Satellite holds 100% of unsampled recent spans in a temporary buffer. When a Satellite receives spans from a client, it places them in that buffer, discarding older spans as newer spans arrive. The length of time between the newest and the oldest span currently held in that buffer is the recall or recall window. This window shows how far back into the past LightStep can look to find spans while assembling a trace. For example, if the current recall is 5 minutes, then when an application error occurs, any spans reported in the last 5 minutes can be assembled as part of a trace for that error.
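The buffer behavior described above can be illustrated with a toy model. This is only a sketch, not how Satellites are implemented: the `SpanBuffer` class, its byte-based capacity, and the assumption that spans arrive in timestamp order are all hypothetical choices made for illustration.

```python
from collections import deque


class SpanBuffer:
    """Toy model of a Satellite's temporary span buffer (illustrative only)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.spans = deque()  # (timestamp_seconds, size_bytes), oldest first

    def add(self, timestamp, size_bytes):
        """Store a new span, discarding the oldest spans to make room."""
        self.spans.append((timestamp, size_bytes))
        self.used_bytes += size_bytes
        while self.used_bytes > self.capacity_bytes and len(self.spans) > 1:
            _, evicted_size = self.spans.popleft()
            self.used_bytes -= evicted_size

    def recall_seconds(self):
        """Time between the newest and the oldest span currently held."""
        if not self.spans:
            return 0.0
        return self.spans[-1][0] - self.spans[0][0]


# Assumed traffic: one 100-byte span per second into a 1,000-byte buffer.
buffer = SpanBuffer(capacity_bytes=1_000)
for second in range(60):
    buffer.add(timestamp=float(second), size_bytes=100)

recall = buffer.recall_seconds()
print(f"spans held: {len(buffer.spans)}, recall: {recall} seconds")
```

In this model the buffer settles at ten spans, so the recall is the gap between the tenth-newest span and the newest one. Larger spans or a higher arrival rate would shrink that gap, and a larger buffer would widen it.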
At any given time, each Satellite has its own recall value, and each Satellite pool has a distribution of recall values.
Satellite recall is mainly affected by the following factors:
- Number of Satellites
- bytes-per-project setting on the Satellite
- Rate and size (including logs) of spans reported by the tracer
- Number and uniqueness of span tags
You can't directly configure the recall window; it is determined by the amount of span traffic sent from all tracer clients and the available Satellite memory. More traffic shortens the window and more memory lengthens it. You can achieve longer recall by either reducing the amount of span traffic or increasing the available memory of Satellites in the pool (either by increasing the available memory per instance, or the overall number of instances).
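As a rough way to reason about this relationship, recall can be approximated as the memory available for spans divided by the incoming span byte rate. The function below is a simplified sketch under that assumption; the function name and the sample numbers are hypothetical, and real recall also depends on the other factors listed above.

```python
def estimated_recall_seconds(span_memory_bytes, spans_per_second, bytes_per_span):
    """Approximate recall as buffer size divided by the incoming byte rate."""
    incoming_bytes_per_second = spans_per_second * bytes_per_span
    if incoming_bytes_per_second <= 0:
        raise ValueError("span traffic must be positive")
    return span_memory_bytes / incoming_bytes_per_second


# Assumed pool: 4e9 bytes available for spans, 100,000 spans/s at 500 bytes each.
baseline = estimated_recall_seconds(4e9, 100_000, 500)

# Either lever lengthens recall: less span traffic, or more memory.
half_traffic = estimated_recall_seconds(4e9, 50_000, 500)
double_memory = estimated_recall_seconds(8e9, 100_000, 500)

print(baseline, half_traffic, double_memory)
```

Under this model, halving the span traffic and doubling the memory have the same effect on recall.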
Seeing a variance in the recall window for your Satellites?
Significant variance in recall figures across Satellites within a Satellite pool can be a symptom of load imbalance. Load imbalance can limit your ability to tune the recall window to the desired length, and indicates suboptimal resource usage, because the useful capacity of the overall pool is limited by the Satellites with the lowest recall. See Load Balance Satellites for recommendations on tuning your Satellites.
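One simple way to spot this kind of variance is to compare the highest and lowest recall in the pool. The sketch below assumes you have already collected a recall value per Satellite; the Satellite names, the sample values, and the `max_ratio` threshold of 2.0 are illustrative assumptions, not LightStep recommendations.

```python
def recall_spread(recall_seconds_by_satellite):
    """Summarize how widely recall varies across a Satellite pool."""
    values = list(recall_seconds_by_satellite.values())
    lowest, highest = min(values), max(values)
    ratio = highest / lowest if lowest > 0 else float("inf")
    return {"min": lowest, "max": highest, "ratio": ratio}


def looks_imbalanced(recall_seconds_by_satellite, max_ratio=2.0):
    """Flag a pool whose best recall exceeds its worst by more than max_ratio."""
    return recall_spread(recall_seconds_by_satellite)["ratio"] > max_ratio


# Hypothetical recall readings, in seconds, for two three-Satellite pools.
skewed_pool = {"satellite-a": 300, "satellite-b": 290, "satellite-c": 60}
even_pool = {"satellite-a": 300, "satellite-b": 290, "satellite-c": 280}

skewed_summary = recall_spread(skewed_pool)
print(skewed_summary, looks_imbalanced(skewed_pool), looks_imbalanced(even_pool))
```

In the skewed pool, the useful recall is effectively the 60 seconds offered by the weakest Satellite, even though the other two hold five times as much history.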
Insufficient recall happens when your pool of on-premise Satellites is under-provisioned and is a key indicator that you should scale your pool by increasing the number of Satellites you are running and/or choosing machines with more memory.
Remember that insufficient recall does not equal insufficient trace retention. Satellite recall is only relevant to spans that have not yet been assembled as part of a trace. Once LightStep assembles a trace, it retains all spans in that trace for as long as your Data Retention policy specifies.
The number of Satellites you need depends on the rate at which spans are sent to the Satellites and the number and sizes of logs associated with those spans. Available memory is often the limiting resource. LightStep recommends that the total amount of memory among all Satellite instances is at least as large as the memory required.
You can use this calculation to get started (it only considers spans, so more memory will likely be needed):
memory_required = requests/second * spans/request * bytes/span * seconds_of_recall
requests/second: The number of requests your software handles.
spans/request: The number of spans created during each request.
bytes/span: This depends on the number of tags that you use and a few other factors, but the baseline is about 100 bytes per span. Most users can assume less than 500 bytes per span with typical tags, etc.
seconds_of_recall: Depends on the duration of the longest traces you expect to generate; a good rule of thumb is to use at least 60 seconds + (2 * duration of the longest expected trace).
To complete the calculation, determine the following about your system:
- What is your longest expected trace duration in seconds?
- How many requests per second do you anticipate making to the Satellites?
- How many spans per request do you anticipate?
- How many bytes per span do you expect? (With typical tags, 500 is a safe upper estimate.)
- What is the recommended/desired recall? (At least 60s + 2x longest trace duration)
Multiplying the values from the second through fifth questions gives the recommended total memory, in bytes, across the Satellite pool. Satellites use most of their allocated memory for internal operation, leaving about one-quarter of the RAM for indexing spans. Because each 16 GB Satellite has around 4 GB available for span indexing, divide the recommended total memory by 4e9 to get the number of 16 GB Satellites to provision initially. As you start to receive data in LightStep, you'll be better able to tune and load balance your Satellite pool.
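The sizing calculation can be worked through in a few lines of Python. This is a sketch of the arithmetic described above, not a LightStep tool: the function names are made up for illustration, and the workload figures (5,000 requests per second, 20 spans per request, a 30-second longest trace) are assumed example values.

```python
import math

# About one-quarter of a 16 GB Satellite is available for indexing spans.
SPAN_MEMORY_PER_SATELLITE_BYTES = 4e9


def recommended_recall_seconds(longest_trace_seconds):
    """Rule of thumb: at least 60 seconds plus twice the longest expected trace."""
    return 60 + 2 * longest_trace_seconds


def memory_required_bytes(requests_per_second, spans_per_request,
                          bytes_per_span, seconds_of_recall):
    """memory_required = requests/second * spans/request * bytes/span * seconds_of_recall"""
    return requests_per_second * spans_per_request * bytes_per_span * seconds_of_recall


def satellites_needed(memory_bytes):
    """Number of 16 GB Satellites to start with, rounded up."""
    return max(1, math.ceil(memory_bytes / SPAN_MEMORY_PER_SATELLITE_BYTES))


# Assumed example workload.
recall = recommended_recall_seconds(longest_trace_seconds=30)
memory = memory_required_bytes(
    requests_per_second=5_000,
    spans_per_request=20,
    bytes_per_span=500,
    seconds_of_recall=recall,
)
count = satellites_needed(memory)

print(f"recall: {recall} s, memory: {memory / 1e9:.1f} GB, satellites: {count}")
```

Because the formula considers only spans, treat the result as a starting point and expect to add capacity once logs and real traffic patterns are taken into account.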
LightStep recommends using machines with 2 CPU and 16 GB of memory each.
Note that tracer client libraries and Satellites will degrade gracefully if the pool is under-provisioned, so the only impact of making your pool too small is lower quality traces.
Learn more about installing and configuring Satellites here.
Find our recommendations for load balancing your Satellites here.
Depending on your application and production environment, you may choose to set up several Satellite pools. You should set up at least one Satellite in each region where you run a backend, as this will limit cross-region traffic. You might also set up separate pools to isolate developer and production traffic.
Satellites use a Satellite key associated with your organization. Unlike your access token (which provides only the ability to report spans from your tracer), the Satellite key allows the Satellite to read a small amount of configuration associated with projects in your organization, and the current list of operations that appear on dashboards. Access to this data allows Satellites to compute aggregated statistics for these operations. These credentials don't provide read access to any spans or traces.
LightStep continuously publishes new Satellite versions to improve the quality of collection and reporting. We offer versions for Docker, AWS, and Debian. You receive notifications when a new version is available.
You may decide to use public Satellites for the development environment for convenience and use on-premise Satellites for production to ensure better performance and isolation from other organizations. You can do this by creating separate projects for each environment.
From the Project Settings page, you can require a production project to use only on-premise Satellites.
To enforce on-premise Satellite use:
- Open the project that should use on-premise Satellites.
- From the navigation bar, click Project Settings.
- In the Satellites area, select Private Satellite Pool Only.
Any spans sent to this project from public Satellites will be rejected.