Once you’ve integrated with AWS CloudWatch, you have access to all metrics for the AWS Elastic Inference service, which lets you attach GPU-powered acceleration to SageMaker and EC2 instances.

Elastic Inference is a resource you can attach to your Amazon SageMaker instances, AWS Deep Learning Containers, and Amazon Elastic Compute Cloud (EC2) CPU instances.


To verify that metrics are reporting, search for them in the Metric details section of the Project Settings page.

The following table shows the Elastic Inference metrics ingested by Lightstep.

| Metric name | Unit | Description |
|---|---|---|
| aws.elasticinference.accelerator_health_check_failed | integer | Indicates whether the most recent status health check on the Elastic Inference accelerator failed. A value of 0 means the check passed; 1 means it failed. |
| aws.elasticinference.connectivity_check_failed | count | Indicates whether connectivity to the Elastic Inference accelerator is currently active or has recently failed. A value of 0 means the connection is active; 1 means it failed. |
| aws.elasticinference.accelerator_memory_usage | bytes | The accelerator's most recent memory usage. |
| aws.elasticinference.accelerator_utilization | percent | The percentage of the Elastic Inference accelerator in use during the last minute. |
| aws.elasticinference.accelerator_total_inference_count | count | The number of inference requests that reached the Elastic Inference accelerator in the last minute. |
| aws.elasticinference.accelerator_successful_inference_count | count | The number of successful inference requests that reached the Elastic Inference accelerator in the last minute. |
| aws.elasticinference.accelerator_inference_with_client_error_count | count | The number of inference requests that reached the Elastic Inference accelerator in the last minute and resulted in a 4xx error. |
| aws.elasticinference.accelerator_inference_with_server_error_count | count | The number of inference requests that reached the Elastic Inference accelerator in the last minute and resulted in a 5xx error. |
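Because the count metrics all describe the same one-minute window, you can combine them into rates when building dashboards or alerts. The following is a minimal sketch, not part of Lightstep or AWS: the `ElasticInferenceSample` class and `summarize` function are hypothetical helpers, and it assumes the two check metrics report 0 (passed or active) or 1 (failed) and that you have already retrieved one minute of values for each metric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElasticInferenceSample:
    """One minute of Elastic Inference metric values (hypothetical container).

    Field names mirror the metric names in the table, without the
    aws.elasticinference. prefix.
    """
    accelerator_health_check_failed: int
    connectivity_check_failed: int
    accelerator_total_inference_count: int
    accelerator_successful_inference_count: int
    accelerator_inference_with_client_error_count: int
    accelerator_inference_with_server_error_count: int


def summarize(sample: ElasticInferenceSample) -> dict:
    """Derive health flags and per-minute rates from one sample.

    Assumes the check metrics are 0 (passed/active) or 1 (failed).
    Rates are None when no requests arrived, to avoid dividing by zero.
    """
    total = sample.accelerator_total_inference_count

    def rate(count: int) -> Optional[float]:
        # Fraction of all requests that reached the accelerator this minute.
        return count / total if total > 0 else None

    return {
        "healthy": sample.accelerator_health_check_failed == 0,
        "connected": sample.connectivity_check_failed == 0,
        "success_rate": rate(sample.accelerator_successful_inference_count),
        "client_error_rate": rate(sample.accelerator_inference_with_client_error_count),
        "server_error_rate": rate(sample.accelerator_inference_with_server_error_count),
    }


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    sample = ElasticInferenceSample(0, 0, 200, 190, 6, 4)
    print(summarize(sample))
```

Returning `None` for an idle minute keeps "no traffic" distinct from "0% success", which matters if you alert on the success rate.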