Once you’ve integrated with AWS CloudWatch, you have access to all metrics for the AWS Elastic Inference service, which lets you attach GPU-powered acceleration to SageMaker and EC2 instances.
Elastic Inference is a resource you can attach to your SageMaker instances, Amazon Deep Learning Containers, and Amazon Elastic Compute Cloud (EC2) CPU instances.
See all AWS integrations.
To verify metrics are reporting, search for the metrics in the Metric details section of the Settings page.
The following table shows the Elastic Inference metrics ingested by Cloud Observability.
Metric Name | Unit | Description |
---|---|---|
aws.elasticinference.accelerator_health_check_failed | integer | Indicates whether a recent status health check on the Elastic Inference accelerator was successful. |
aws.elasticinference.connectivity_check_failed | count | Indicates whether or not connectivity to the Elastic Inference accelerator is currently active or has recently failed. |
aws.elasticinference.accelerator_memory_usage | bytes | The most recent accelerator memory usage. |
aws.elasticinference.accelerator_utilization | percent | The percentage of the Elastic Inference accelerator that was most recently used. |
aws.elasticinference.accelerator_total_inference_count | count | The number of inference requests that reached the Elastic Inference accelerator in the last minute. |
aws.elasticinference.accelerator_successful_inference_count | count | The number of successful inference requests that reached the Elastic Inference accelerator in the last minute. |
aws.elasticinference.accelerator_inference_with_client_error_count | count | The number of inference requests that reached the Elastic Inference accelerator in the last minute and resulted in a 4xx error. |
aws.elasticinference.accelerator_inference_with_server_error_count | count | The number of inference requests that reached the Elastic Inference accelerator in the last minute and resulted in a 5xx error. |
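
The four count metrics in the table can be combined into per-minute success and error ratios. The following is a minimal sketch of that arithmetic, not a Cloud Observability API: the function name `inference_error_rates` and the sample values are hypothetical, and it assumes you have already retrieved one minute of values keyed by the metric names listed above.

```python
from typing import Dict, Mapping

# Metric name prefix as it appears in the table above.
PREFIX = "aws.elasticinference."


def inference_error_rates(counts: Mapping[str, float]) -> Dict[str, float]:
    """Return success, client-error, and server-error ratios for one minute of counts.

    `counts` maps metric names from the table to their values for the same minute.
    Metrics missing from the mapping are treated as 0.
    """
    total = counts.get(PREFIX + "accelerator_total_inference_count", 0)
    if total <= 0:
        # No requests reached the accelerator, so the ratios are undefined;
        # this sketch reports zeros rather than raising an error.
        return {"success": 0.0, "client_error": 0.0, "server_error": 0.0}

    success = counts.get(PREFIX + "accelerator_successful_inference_count", 0)
    client = counts.get(PREFIX + "accelerator_inference_with_client_error_count", 0)
    server = counts.get(PREFIX + "accelerator_inference_with_server_error_count", 0)
    return {
        "success": success / total,
        "client_error": client / total,
        "server_error": server / total,
    }


# Hypothetical values for a single minute, for illustration only.
sample = {
    PREFIX + "accelerator_total_inference_count": 200,
    PREFIX + "accelerator_successful_inference_count": 190,
    PREFIX + "accelerator_inference_with_client_error_count": 6,
    PREFIX + "accelerator_inference_with_server_error_count": 4,
}

if __name__ == "__main__":
    print(inference_error_rates(sample))
```

A sustained rise in the server-error ratio is generally more useful for alerting than the raw 5xx count, because it stays comparable as request volume changes.
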
Updated Dec 6, 2022