Monitoring AWS Batch Jobs with CloudWatch Custom Metrics

By Vibhuti Sharma, via Dev.to

AWS Batch is used for many kinds of compute workloads, such as data processing pipelines, background jobs, and scheduled compute tasks. AWS publishes many infrastructure-level metrics for Batch in CloudWatch, but there is a significant gap in job status monitoring. The number of jobs in the RUNNABLE, RUNNING, FAILED, or SUCCEEDED state, for example, is not available in CloudWatch by default. These counts are visible on the AWS Batch dashboard, but they do not exist as CloudWatch metrics.

This makes it difficult to answer operational questions such as: Are jobs accumulating in the RUNNABLE state? Are jobs failing frequently? Is the system keeping up with the workload? Without these metrics, building meaningful dashboards or alerts for Batch workloads becomes challenging.

This post shows how to close that observability gap by exporting custom AWS Batch job status metrics to CloudWatch, where they can also be consumed by third-party observability tools that read from CloudWatch. …
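The export idea can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: it assumes Python with boto3 (the AWS SDK for Python), run periodically (for example, on a schedule), and it counts jobs per state with the Batch `list_jobs` call, then pushes the counts with CloudWatch `put_metric_data`. The namespace `Custom/AWSBatch`, the metric name `JobCount`, and the dimension names `JobQueue` and `JobStatus` are hypothetical choices. The clients are passed in as arguments so the logic can be exercised without AWS access.

```python
"""Sketch: export AWS Batch job-status counts as CloudWatch custom metrics.

Hypothetical names (not from the original article): namespace "Custom/AWSBatch",
metric "JobCount", dimensions "JobQueue" and "JobStatus".
"""
from typing import Any, Dict, List

# Job states accepted by the Batch ListJobs `jobStatus` filter.
JOB_STATUSES = ["SUBMITTED", "PENDING", "RUNNABLE", "STARTING",
                "RUNNING", "SUCCEEDED", "FAILED"]

NAMESPACE = "Custom/AWSBatch"  # hypothetical custom namespace


def count_jobs(batch_client: Any, job_queue: str, status: str) -> int:
    """Count jobs in one state, following ListJobs pagination via nextToken."""
    total = 0
    kwargs: Dict[str, Any] = {"jobQueue": job_queue, "jobStatus": status}
    while True:
        page = batch_client.list_jobs(**kwargs)
        total += len(page.get("jobSummaryList", []))
        token = page.get("nextToken")
        if not token:
            return total
        kwargs["nextToken"] = token  # request the next page


def build_metric_data(job_queue: str, counts: Dict[str, int]) -> List[Dict[str, Any]]:
    """Turn {status: count} into the MetricData list expected by PutMetricData."""
    return [
        {
            "MetricName": "JobCount",  # hypothetical metric name
            "Dimensions": [
                {"Name": "JobQueue", "Value": job_queue},
                {"Name": "JobStatus", "Value": status},
            ],
            "Value": float(count),
            "Unit": "Count",
        }
        for status, count in counts.items()
    ]


def publish_job_metrics(batch_client: Any, cloudwatch_client: Any,
                        job_queue: str) -> Dict[str, int]:
    """Collect counts for every status and push them to CloudWatch in one call."""
    counts = {s: count_jobs(batch_client, job_queue, s) for s in JOB_STATUSES}
    cloudwatch_client.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=build_metric_data(job_queue, counts),
    )
    return counts
```

In a real deployment you would pass `boto3.client("batch")` and `boto3.client("cloudwatch")` along with your queue name. Because every job state becomes a separate series of one metric, a single CloudWatch alarm on the RUNNABLE series (for example, one created with `put_metric_alarm`) could flag a growing backlog. Counting SUCCEEDED and FAILED jobs reflects only the job history that `list_jobs` returns and may require paging through many results on busy queues.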
