A span can have zero or more key/value attributes. Attributes let you attach metadata to the span. For example, you might create attributes that hold a customer ID, information about the environment that the request is operating in, or an app’s release. Attributes do not capture time-based events (log events handle those). The OpenTelemetry spec defines several standard attributes.
You can also implement your own attributes and events. The following are attributes and log events that work very well with Lightstep Observability in addition to the OpenTelemetry semantic conventions. Any attribute that you add to your span data will enable more segmentation, making it easier to find, filter, and group your span data in Lightstep Observability. Lightstep doesn’t have cardinality limitations, so the more attributes you use, the greater your insights will be.
Lightstep Observability can analyze your instrumentation and recommend ways to improve it. If there is metadata that you’d like all services to report to Lightstep Observability (like a customer ID or Kubernetes region), you can register the corresponding attributes and Lightstep Observability will check for those when determining the IQ score.
In particular, attributes that allow you to segment user pathways are useful. Adding “parameters” (params.name, params.count) that correspond to the operation on the span, and that tell the operation which path to take depending on user input, is also very helpful for grouping, filtering, and segmenting. Otherwise, you may optimize for one use case without noticing an outlier use case that is triggered only a quarter of the time. Correlations will also be able to spot the outliers from these attribute values.
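As a rough illustration of the params.* idea, the sketch below flattens the arguments an operation was called with into attribute keys. The helper name `params_attributes` and the choice to stringify non-primitive values are assumptions made for this example, not part of any Lightstep or OpenTelemetry API.

```python
from typing import Any, Dict


def params_attributes(params: Dict[str, Any]) -> Dict[str, Any]:
    """Turn call parameters into params.<name> attribute keys (hypothetical helper)."""
    attrs: Dict[str, Any] = {}
    for name, value in params.items():
        key = f"params.{name}"
        if isinstance(value, (str, bool, int, float)):
            # Primitive values can be stored as-is.
            attrs[key] = value
        else:
            # Assumption: anything else is stored as its string form.
            attrs[key] = str(value)
    return attrs


attrs = params_attributes({"name": "widgets", "count": 3})
```

The resulting dictionary could then be attached to the span with whatever attribute-setting call your tracer provides.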
Best practices when creating attributes and log events
- Standardized attributes and event logs help ensure efficient root-cause analysis. Make sure your attribute and event names are clear, descriptive, and apply to the entirety of the resource they are describing.
- Use semantic names, for example app.service.version.
- Define namespaces, for example app.component.name. This is especially important when multiple service teams have their own attributes and logs.
- Keep names short and sweet.
- Set error attributes on error spans, for example client.error.
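One way to keep names namespaced and consistent across teams is to build them from segments with a small helper. The sketch below is an assumption for illustration: the function `attribute_name` and the lowercase, underscore-friendly segment rule are hypothetical conventions, not requirements of Lightstep Observability or OpenTelemetry.

```python
import re

# Assumed convention: each segment is lowercase letters, digits, or underscores.
_SEGMENT = re.compile(r"^[a-z][a-z0-9_]*$")


def attribute_name(*segments: str) -> str:
    """Join validated segments into a dotted, namespaced attribute name."""
    if not segments:
        raise ValueError("at least one segment is required")
    for segment in segments:
        if not _SEGMENT.match(segment):
            raise ValueError(f"invalid attribute name segment: {segment!r}")
    return ".".join(segments)


name = attribute_name("app", "component", "name")
```

Rejecting malformed segments early keeps near-duplicate names (for example, differing only in case) from fragmenting your span data.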
Useful attributes
The following recommended attributes (other than the OpenTelemetry attributes) provide greater visibility into your span data.
Use the OpenTelemetry semantic attributes whenever possible.
User-related attributes
User-related attributes provide context about your application’s users.
- Customer segments: support.level or user.type
- Anonymous identifiers of transactions: request.id, uuid
- Hardware versions
- Identifier of the user’s hardware: platform, ios.version
Software-related attributes
Software-related attributes provide context about your application’s software.
- Parameters an operation was called with: params.count, params.name, params.type
- Production code versions: version, library.version, api.version. The service.version attribute allows you to monitor deploys in Lightstep Observability.
- Status codes: http.status_code_group, such as 4xx, 5xx, 2xx.
- Boolean error types: client.error versus internal.error (differentiating when an error is caused by a user, for example a 404 or 400 versus a 500).
- Errors: exception.class, exception.message, unified_error_code. These help to quickly figure out the magnitude of exceptions or specific error types that are occurring.
- Entity IDs for the entity being fetched from the database or worked with: project.id, user.id
- gRPC calls: grpc.method, grpc.status_code
- Retry attempts: retry_attempt, max_retry_attempts
- Feature flags: feature_flag.<feature_flag_name>: true/false
- A/B tests: canary: true/false, or other A/B test
- Pub/Sub: pubsub.topic, pubsub.message_id, and other attributes corresponding to pubsub mechanisms.
- Stack traces: stack_trace_hash
- Application flow: a human-readable name of common flows represented by traces, like checkout or search.
- Service name-spacing for service-specific attributes: svc.<service_name>.<thing>, svc.users.table, svc.users.database, svc.users.index_name
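Several of the software-related attributes above can be derived mechanically. The following is a minimal sketch, assuming HTTP semantics where 4xx means a client-caused error and 5xx an internal one, and assuming a stack trace hash built from the file and function names of each frame. The helper names and the hashing scheme are hypothetical choices for this example.

```python
import hashlib
import traceback
from typing import Any, Dict


def status_code_group(status_code: int) -> str:
    """Map 404 -> '4xx', 503 -> '5xx', and so on."""
    return f"{status_code // 100}xx"


def error_attributes(status_code: int) -> Dict[str, Any]:
    """Derive the status group and the boolean error-type attributes."""
    return {
        "http.status_code_group": status_code_group(status_code),
        "client.error": 400 <= status_code < 500,
        "internal.error": status_code >= 500,
    }


def exception_attributes(exc: BaseException) -> Dict[str, str]:
    """Describe an exception, including a short hash of its stack trace."""
    frames = traceback.extract_tb(exc.__traceback__)
    # Assumption: hash only file and function names so the hash stays
    # stable when messages or line numbers change.
    signature = "|".join(f"{frame.filename}:{frame.name}" for frame in frames)
    digest = hashlib.sha256(signature.encode("utf-8")).hexdigest()[:16]
    return {
        "exception.class": type(exc).__name__,
        "exception.message": str(exc),
        "stack_trace_hash": digest,
    }
```

Grouping on a short hash rather than the full trace keeps the attribute value compact while still letting you filter for one specific failure path.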
Data-related attributes
Data-related attributes provide context about the data in your application.
- Payload: payload.size, or other size attributes when sending and receiving data.
- Request: request.bytes, request.size_bucket
- Response: response.size, response.size_bucket
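A size bucket turns a raw byte count into a small set of values that are easy to group on. The sketch below shows one possible bucketing; the boundaries, the bucket labels, and the helper names are assumptions for illustration and are not defined by the source.

```python
from typing import Dict, Union

# Hypothetical bucket boundaries: (exclusive upper limit in bytes, label).
_BUCKETS = [
    (1024, "lt_1kb"),
    (10 * 1024, "1kb_10kb"),
    (100 * 1024, "10kb_100kb"),
    (1024 * 1024, "100kb_1mb"),
]


def size_bucket(num_bytes: int) -> str:
    """Return the label of the first bucket whose limit exceeds num_bytes."""
    if num_bytes < 0:
        raise ValueError("size cannot be negative")
    for limit, label in _BUCKETS:
        if num_bytes < limit:
            return label
    return "gte_1mb"


def request_size_attributes(body: bytes) -> Dict[str, Union[int, str]]:
    """Build request.bytes and request.size_bucket for a request body."""
    return {
        "request.bytes": len(body),
        "request.size_bucket": size_bucket(len(body)),
    }
```

Recording both the exact size and the bucket lets you group by bucket first, then inspect exact sizes within an outlier group.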
Infrastructure-related attributes
Infrastructure-related attributes provide context about your application’s infrastructure.
- Area: host.dc, zone.name, zone.id, region, or any sort of regional, zone, or geographical attribute.
- Container management: kubernetes_cluster, pod.id, node.id, to show when a problem is isolated to a particular cluster, pod or node.
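Infrastructure attributes often come from the deployment environment rather than from application code. As a sketch, the snippet below reads them from environment variables. The variable names (`REGION`, `ZONE_NAME`, and so on) are hypothetical; substitute whatever your platform actually exposes.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping from environment variable to attribute name.
ENV_TO_ATTRIBUTE = {
    "REGION": "region",
    "ZONE_NAME": "zone.name",
    "KUBERNETES_CLUSTER": "kubernetes_cluster",
    "POD_ID": "pod.id",
    "NODE_ID": "node.id",
}


def infrastructure_attributes(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Collect infrastructure attributes, skipping unset or empty variables."""
    env = os.environ if environ is None else environ
    return {
        attribute: env[variable]
        for variable, attribute in ENV_TO_ATTRIBUTE.items()
        if env.get(variable)
    }
```

Passing the mapping as a parameter keeps the function easy to test without modifying the real process environment.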
Useful events to log
- Sanitized payload of a request and a response (clear any personally identifiable information).
- Events that are occurring within the span, for example, “sanitized payload for request, forwarding to <xyz>”.
- Stack trace or exception messages and error messages.
- When things are returning, processing, or waiting, for example, “context deadline exceeded”. An operation may go for a few seconds and logging can add context on what it’s doing or what it’s waiting for.
- Any additional context. If a user hits a certain flow and it’s non-obvious by the operations, a simple log message can be helpful, for example, “user entered flow x”.
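Before logging a payload as an event, personally identifiable information needs to be cleared. The sketch below redacts a fixed set of keys recursively and builds an event record. The key list, the `[REDACTED]` marker, and the event dictionary shape are assumptions for illustration; a real deployment would need its own definition of sensitive fields.

```python
import json
from typing import Any, Dict

# Hypothetical set of keys treated as personally identifiable information.
SENSITIVE_KEYS = {"email", "password", "ssn", "phone"}


def sanitize(payload: Any) -> Any:
    """Return a copy of the payload with sensitive values redacted."""
    if isinstance(payload, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else sanitize(value)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [sanitize(item) for item in payload]
    return payload


def request_log_event(payload: Dict[str, Any], destination: str) -> Dict[str, Any]:
    """Build an event record carrying the sanitized request payload."""
    return {
        "name": f"sanitized payload for request, forwarding to {destination}",
        "attributes": {"payload": json.dumps(sanitize(payload), sort_keys=True)},
    }
```

Sanitizing a copy rather than mutating the original payload means the request itself continues through the operation unchanged.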