A span can have zero or more key/value tags. Tags allow you to create metadata about the span. For example, you might create tags that hold a customer ID, or information about the environment that the request is operating in, or an app’s release. Tags do not reflect any time-based event (logs handle events). The OpenTracing spec defines several standard tags. For example, here are the tags available using the Java-based tracer.
You can also implement your own tags and logs. The following are tags and logs that work very well with Lightstep in addition to the OpenTracing semantic conventions. Any tag that you add to your span data will enable more segmentation, making it easier to find, filter, and group your span data in Lightstep. Lightstep doesn’t have cardinality limitations, so the more tags you use, the greater your insights will be.
In particular, tags that allow you to segment user pathways are useful. Tags for “parameters” (params.count) that correspond to the operation on the span, and that tell the operation which path to take depending on user input, are also very helpful for grouping, filtering, and segmenting. Otherwise, you may optimize for one use case without noticing an outlier use case that is triggered only a quarter of the time. Correlations can also spot the outliers from these tag values.
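As a rough illustration of the tags described above, the sketch below collects span metadata as plain key/value pairs. The helper name `build_request_tags` and the tag names `customer.id` and `environment` are hypothetical. Prefixing every call parameter with `params.` is one reading of the params.count example, not a documented convention.

```python
from typing import Dict, Union

TagValue = Union[str, int, float, bool]


def build_request_tags(customer_id: str, environment: str,
                       release: str, params: Dict[str, TagValue]) -> Dict[str, TagValue]:
    """Collect span metadata as key/value pairs (hypothetical helper)."""
    tags: Dict[str, TagValue] = {
        "customer.id": customer_id,   # hypothetical tag name
        "environment": environment,   # hypothetical tag name
        "service.version": release,
    }
    # Assumption: each call parameter becomes a "params.<name>" tag,
    # so a "count" parameter is recorded as "params.count".
    for name, value in params.items():
        tags[f"params.{name}"] = value
    return tags


tags = build_request_tags("cust-123", "production", "1.4.2", {"count": 50})

# With an OpenTracing tracer, each pair would be applied to the active
# span with span.set_tag(key, value).
for key, value in sorted(tags.items()):
    print(f"{key}={value}")
```

Keeping the tags in a dictionary until the span exists makes the set easy to test and to reuse across operations.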
Best Practices When Creating Tags and Logs
- Standardized tags and logs help ensure efficient root-cause analysis. Make sure your tag and log names are clear, descriptive, and apply to the entirety of the resource they are describing.
- Use semantic names, for example
- Define namespaces, for example
This is especially important when multiple service teams have their own tags and logs.
- Keep names short and sweet
- Set error tags on error spans, for example
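One possible way to enforce the naming practices above is a small helper that builds namespaced keys and rejects names that break the agreed format. Everything here is an assumption for illustration: the helper names, the lowercase dotted naming rule, and the `checkout` namespace. Only the boolean `error` tag comes from the OpenTracing semantic conventions.

```python
import re
from typing import Dict, Union

# Assumed convention: lowercase, dot-separated segments, underscores allowed.
_TAG_NAME = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)*$")


def namespaced_tag(namespace: str, name: str) -> str:
    """Return '<namespace>.<name>', rejecting keys that break the convention."""
    key = f"{namespace}.{name}"
    if not _TAG_NAME.match(key):
        raise ValueError(f"tag name does not follow the convention: {key!r}")
    return key


def error_tags(namespace: str, exc: BaseException) -> Dict[str, Union[bool, str]]:
    """Tags to set on a span whose operation failed."""
    return {
        "error": True,  # standard OpenTracing error tag
        namespaced_tag(namespace, "error_type"): type(exc).__name__,  # hypothetical tag
    }


print(namespaced_tag("checkout", "cart_size"))
print(error_tags("checkout", KeyError("missing item")))
```

Validating names at the point of creation keeps tags from different service teams consistent without a separate review step.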
The following are recommended tags (other than the OpenTracing tags) that provide greater visibility into your span data.
Use the OpenTracing semantic tags whenever possible.
User-related attributes provide context about your application’s users.
- Customer segments:
- Anonymous identifiers of transactions:
- Hardware versions
- Identifier of the user’s hardware:
Software-related tags provide context about your application’s software.
- Parameters an operation was called with:
- Production code versions:
The service.version tag allows you to monitor deploys in Lightstep.
- Status codes:
http.status_code_group, such as 4xx, 5xx, 2xx.
- Boolean error types:
internal.error (differentiating when an error is caused by a user, for example a 404 or 400, versus an internal failure such as a 500)
These help to quickly figure out the magnitude of exceptions or specific error types that are occurring.
- Entity IDs for the entity being fetched from the database or worked with:
- gRPC calls:
- Retry attempts:
- Feature flags:
- A/B tests:
canary: true/false, or other A/B test
pubsub.message_id, and other tags corresponding to
- Stack traces:
- Application flow: a human-readable name of common flows represented by traces, like
- Service name-spacing for service-specific tags:
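The status-code and error tags in this list can be derived from a single HTTP status code. The sketch below shows one way to do that. It assumes that internal.error should be true for 5xx responses and false for 4xx responses, which is an interpretation of the description above. The function names are hypothetical.

```python
from typing import Dict, Union


def status_code_group(status_code: int) -> str:
    """Map an HTTP status code to its group, for example 404 -> '4xx'."""
    if not 100 <= status_code <= 599:
        raise ValueError(f"not a valid HTTP status code: {status_code}")
    return f"{status_code // 100}xx"


def status_tags(status_code: int) -> Dict[str, Union[int, str, bool]]:
    """Build status-related tags for a span (hypothetical helper)."""
    tags: Dict[str, Union[int, str, bool]] = {
        "http.status_code": status_code,  # standard OpenTracing tag
        "http.status_code_group": status_code_group(status_code),
    }
    if status_code >= 400:
        tags["error"] = True
        # Assumption: 5xx means the service failed, 4xx means the caller did.
        tags["internal.error"] = status_code >= 500
    return tags


for code in (200, 404, 503):
    print(code, status_tags(code))
```

Grouping by 4xx versus 5xx makes it quick to see whether a spike in errors comes from callers or from the service itself.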
Data-related tags provide context about the data in your application.
payload.size, or other size tags when sending and receiving data.
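A payload.size value has to be measured against some concrete encoding. The minimal sketch below assumes the payload is sent as compact JSON encoded as UTF-8. Both the helper name and that encoding choice are assumptions; a service using another wire format would measure its own serialized bytes instead.

```python
import json
from typing import Any, Dict


def payload_size_tag(payload: Any) -> Dict[str, int]:
    """Return a payload.size tag holding the encoded size in bytes."""
    # Assumption: compact JSON over UTF-8 is the wire format.
    encoded = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return {"payload.size": len(encoded)}


print(payload_size_tag({"id": 1}))
```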
Infrastructure-related tags provide context about your application’s infrastructure.
region, or any sort of regional, zone, or geographical tag.
- Container management:
node.id, to show when a problem is isolated to a particular cluster, pod or node.
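Infrastructure tags are often available from the process environment. The sketch below reads them from environment variables. The variable names `DEPLOY_REGION` and `NODE_ID` are hypothetical, and a real deployment would use whatever its orchestrator exposes.

```python
import os
from typing import Dict, Mapping, Optional


def infrastructure_tags(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect region and node.id tags, skipping values that are not set."""
    env = os.environ if env is None else env
    candidates = {
        "region": env.get("DEPLOY_REGION"),  # hypothetical variable name
        "node.id": env.get("NODE_ID"),       # hypothetical variable name
    }
    return {key: value for key, value in candidates.items() if value}


print(infrastructure_tags({"DEPLOY_REGION": "us-east-1", "NODE_ID": "node-7"}))
```

Skipping unset values avoids tagging spans with empty strings, which would otherwise show up as a separate group when filtering.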
- Sanitized payload of a request and a response (clear any personally identifiable information).
- Events that are occurring within the span, for example,
sanitized payload for request, forwarding to <xyz>.
- Stack trace or exception messages and error messages.
- When things are returning, processing, or waiting, for example,
context deadline exceeded. An operation may go for a few seconds and logging can add context on what it’s doing or what it’s waiting for.
- Any additional context. If a user hits a certain flow and it’s non-obvious by the operations, a simple log message can be helpful, for example, “user entered flow x”.
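For the sanitized-payload logs described above, one simple approach is to redact known sensitive keys before the payload reaches a log. The sketch below assumes JSON-like payloads of nested dicts and lists. The key list, the `[REDACTED]` marker, and the log field names other than `event` are hypothetical choices.

```python
from typing import Any, Dict

# Hypothetical list of keys treated as personally identifiable information.
SENSITIVE_KEYS = {"email", "password", "ssn", "phone"}


def sanitize(payload: Any) -> Any:
    """Return a copy of a JSON-like payload with sensitive values redacted."""
    if isinstance(payload, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else sanitize(value)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [sanitize(item) for item in payload]
    return payload


def request_log_fields(payload: Any, destination: str) -> Dict[str, Any]:
    """Fields for a span log describing a forwarded request."""
    return {
        "event": "forwarding request",
        "destination": destination,    # hypothetical field name
        "payload": sanitize(payload),  # hypothetical field name
    }


fields = request_log_fields(
    {"user": {"email": "a@example.com", "plan": "pro"}, "items": [{"sku": "A1"}]},
    "billing-service",
)
# With an OpenTracing tracer, the fields would be recorded with span.log_kv(fields).
print(fields)
```

A key-based filter only catches fields it knows about, so it is a starting point rather than a guarantee that no personal data is logged.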