Lightstep Observability’s Service diagram shows the hierarchy of your services and an aggregate view of trace data as it flows through your system. Here’s the Service diagram for the Hipster Shop:

[Figure: Service diagram of the services in Hipster Shop]

You can see that the frontend service talks directly to 6 other services, but there are two more that might affect it indirectly. It would be good to test how introducing latency affects communication with those services lower in the stack. The Service diagram is a great way to find potential issues like this.

The Lightstep SDK that you’ll run takes a span ID from the service whose dependencies you’re interested in testing, and uses the Lightstep Observability API to traverse the service hierarchy, based on the request path of the associated trace. It then determines the duration of requests between the services and multiplies that latency by 10 to create the chaos attack.
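
To make the arithmetic concrete, here is a minimal POSIX shell sketch of the scaling step only. The function name scaled_latency_ms, the service pairs, and the durations are hypothetical illustrations, not output from the SDK or the Lightstep Observability API. The multiplier is passed in as a parameter rather than hard-coded.

```shell
# Sketch only: scale an observed request duration by a multiplier to get the
# latency a chaos attack would inject. All names and numbers are hypothetical.

# scaled_latency_ms DURATION_MS FACTOR -> prints DURATION_MS * FACTOR
scaled_latency_ms() {
  # $1 = observed duration in milliseconds, $2 = integer multiplier
  echo $(( $1 * $2 ))
}

# Hypothetical edges taken from a trace: "caller callee observed_duration_ms"
edges="frontend checkoutservice 40
checkoutservice paymentservice 12"

# Print the latency to inject on each edge, using a multiplier of 10.
printf '%s\n' "$edges" | while read -r caller callee duration_ms; do
  echo "$caller -> $callee: $(scaled_latency_ms "$duration_ms" 10) ms"
done
```

With these made-up inputs, an edge observed at 40 ms would receive 400 ms of injected latency, and one observed at 12 ms would receive 120 ms.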

To run the attack, you need to find a span ID and then set that, along with other values, as environment variables.

Find the span ID

  1. In Lightstep Observability, click Explorer in the navigation bar. In the Explorer view, click the Service Diagram tab to open it.

  2. Choose the node in the diagram for the service whose hierarchy you want to test. If you’re using the Hipster Shop, make sure the frontend service is selected (has a blue center).

    The panel on the left shows you span data from each operation on the service.

  3. Click on any span to open it in the Trace view.

    The panel on the right shows you the span metadata.

  4. Click the Details tab. Find the Span ID and copy the value.

Set environment variables

You need to set environment variables for the following:

  • The name of your Lightstep Observability organization and project. You can find your project name and organization name in Project settings and Account management.
  • The Span ID
  • Your Lightstep Observability API key
  • Your Gremlin API key


# configuration values
export LIGHTSTEP_API_KEY=your-api-key
export LIGHTSTEP_ORG=your-org
export LIGHTSTEP_PROJECT=your-project
export LIGHTSTEP_SPAN=span-for-attack
export GREMLIN_API_KEY=gremlin-api-key
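
Before starting the attack, it can help to confirm that all five variables are set. The following is a minimal POSIX shell sketch. The missing_vars helper is a hypothetical name introduced here for illustration; it is not part of the Lightstep SDK or Gremlin.

```shell
# Sketch only: report which of the required variables are unset or empty.
# missing_vars is a hypothetical helper, not part of any SDK.
missing_vars() {
  missing=""
  for name in LIGHTSTEP_API_KEY LIGHTSTEP_ORG LIGHTSTEP_PROJECT \
              LIGHTSTEP_SPAN GREMLIN_API_KEY; do
    # Read the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  # Print the collected names without the leading space.
  echo "${missing# }"
}

result="$(missing_vars)"
if [ -n "$result" ]; then
  echo "Missing environment variables: $result" >&2
else
  echo "All required environment variables are set."
fi
```

The check only verifies that each variable is non-empty. It does not validate the keys or the span ID against either service.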

Run the SDK to start the Chaos attack

Run the following from a command line to start the attack. By default, this script creates a “2x” latency attack that doubles latency between the service you specify and all of that service’s downstream dependencies.

To run this on an app other than the Hipster Shop, replace frontend with the service you are targeting for Gremlin latency attacks.

npx -p git:// lightstep gremlin --project $LIGHTSTEP_PROJECT --trace-id $LIGHTSTEP_SPAN frontend

The console shows Gremlin creating the attack.

In Gremlin, you can see the attacks (defined by the trace from Lightstep Observability) as they inject latency between the services along the request path. As the attacks start running together, you can see the impact of cascading latency in the system.

Now it’s time to see what the increased latency does to the app.

What did we learn?

  • The Service diagram in Lightstep Observability is a great tool for discovering service dependencies and where things might go wrong.
  • The Lightstep SDK uses the Lightstep Observability API to retrieve trace data and build a graph for Gremlin to use to generate attacks on dependent services.
  • To run this on an app other than the Hipster Shop, you just need to specify the service you are targeting for Gremlin latency attacks.