✓ Verified 💻 Development ✓ Enhanced Data

Aws Ecs Monitor

AWS ECS production health monitoring with CloudWatch.

Rating
4.1 (21 reviews)
Downloads
42,272 downloads
Version
1.0.0

Overview

AWS ECS production health monitoring with CloudWatch.

Key Features

1

Health Checks: HTTP probes against your domain, ECS service status (desired vs running), ALB target group health, SSL certificate expiry

2

Log Analysis: Pulls CloudWatch logs, categorizes errors (panics, fatals, OOM, timeouts, 5xx), detects container restarts, filters health check noise

3

Auto-Diagnosis: Reads health status and automatically investigates failing services via log analysis

Complete Documentation

View Source →

AWS ECS Monitor

Production health monitoring and log analysis for AWS ECS services.

What It Does

  • Health Checks: HTTP probes against your domain, ECS service status (desired vs running), ALB target group health, SSL certificate expiry
  • Log Analysis: Pulls CloudWatch logs, categorizes errors (panics, fatals, OOM, timeouts, 5xx), detects container restarts, filters health check noise
  • Auto-Diagnosis: Reads health status and automatically investigates failing services via log analysis

Prerequisites

  • aws CLI configured with appropriate IAM permissions:
  • ecs:ListServices, ecs:DescribeServices
  • elasticloadbalancing:DescribeTargetGroups, elasticloadbalancing:DescribeTargetHealth
  • logs:FilterLogEvents, logs:DescribeLogGroups
  • curl for HTTP health checks
  • python3 for JSON processing and log analysis
  • openssl for SSL certificate checks (optional)

Configuration

All configuration is via environment variables:

VariableRequiredDefaultDescription
ECS_CLUSTERYesECS cluster name
ECS_REGIONNous-east-1AWS region
ECS_DOMAINNoDomain for HTTP/SSL checks (skip if unset)
ECS_SERVICESNoauto-detectComma-separated service names to monitor
ECS_HEALTH_STATENo./data/ecs-health.jsonPath to write health state JSON
ECS_HEALTH_OUTDIRNo./data/Output directory for logs and alerts
ECS_LOG_PATTERNNo/ecs/{service}CloudWatch log group pattern ({service} is replaced)
ECS_HTTP_ENDPOINTSNoComma-separated name=url pairs for HTTP probes

Directories Written

  • ECS_HEALTH_STATE (default: ./data/ecs-health.json) — Health state JSON file
  • ECS_HEALTH_OUTDIR (default: ./data/) — Output directory for logs, alerts, and analysis reports

Scripts

scripts/ecs-health.sh — Health Monitor

bash
# Full check
ECS_CLUSTER=my-cluster ECS_DOMAIN=example.com ./scripts/ecs-health.sh

# JSON output only
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --json

# Quiet mode (no alerts, just status file)
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --quiet

Exit codes: 0 = healthy, 1 = unhealthy/degraded, 2 = script error

scripts/cloudwatch-logs.sh — Log Analyzer

bash
# Pull raw logs from a service
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh pull my-api --minutes 30

# Show errors across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh errors all --minutes 120

# Deep analysis with error categorization
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose --minutes 60

# Detect container restarts
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh restarts my-api

# Auto-diagnose from health state file
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh auto-diagnose

# Summary across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh summary --minutes 120

Options: --minutes N (default: 60), --json, --limit N (default: 200), --verbose

Auto-Detection

When ECS_SERVICES is not set, both scripts auto-detect services from the cluster:

bash
aws ecs list-services --cluster $ECS_CLUSTER

Log groups are resolved by pattern (default /ecs/{service}). Override with ECS_LOG_PATTERN:

bash
# If your log groups are /ecs/prod/my-api, /ecs/prod/my-frontend, etc.
ECS_LOG_PATTERN="/ecs/prod/{service}" ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose

Integration

The health monitor can trigger the log analyzer for auto-diagnosis when issues are detected. Set ECS_HEALTH_OUTDIR to a shared directory and run both scripts together:

bash
export ECS_CLUSTER=my-cluster
export ECS_DOMAIN=example.com
export ECS_HEALTH_OUTDIR=./data

# Run health check (auto-triggers log analysis on failure)
./scripts/ecs-health.sh

# Or run log analysis independently
./scripts/cloudwatch-logs.sh auto-diagnose --minutes 30

Error Categories

The log analyzer classifies errors into:

  • panic — Go panics
  • fatal — Fatal errors
  • oom — Out of memory
  • timeout — Connection/request timeouts
  • connection_error — Connection refused/reset
  • http_5xx — HTTP 500-level responses
  • python_traceback — Python tracebacks
  • exception — Generic exceptions
  • auth_error — Permission/authorization failures
  • structured_error — JSON-structured error logs
  • error — Generic ERROR-level messages
Health check noise (GET/HEAD /health from ALB) is automatically filtered from error counts and HTTP status distribution.

Installation

Terminal bash

openclaw install aws-ecs-monitor
    
Copied!

💻Code Examples

ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --quiet

ecsclustermy-cluster-scriptsecs-healthsh---quiet.txt
Exit codes: `0` = healthy, `1` = unhealthy/degraded, `2` = script error

### `scripts/cloudwatch-logs.sh` — Log Analyzer

ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh summary --minutes 120

ecsclustermy-cluster-scriptscloudwatch-logssh-summary---minutes-120.txt
Options: `--minutes N` (default: 60), `--json`, `--limit N` (default: 200), `--verbose`

## Auto-Detection

When `ECS_SERVICES` is not set, both scripts auto-detect services from the cluster:

ECS_LOG_PATTERN="/ecs/prod/{service}" ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose

ecslogpatternecsprodservice-ecsclustermy-cluster-scriptscloudwatch-logssh-diagnose.txt
## Integration

The health monitor can trigger the log analyzer for auto-diagnosis when issues are detected. Set `ECS_HEALTH_OUTDIR` to a shared directory and run both scripts together:
example.sh
# Full check
ECS_CLUSTER=my-cluster ECS_DOMAIN=example.com ./scripts/ecs-health.sh

# JSON output only
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --json

# Quiet mode (no alerts, just status file)
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --quiet
example.sh
# Pull raw logs from a service
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh pull my-api --minutes 30

# Show errors across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh errors all --minutes 120

# Deep analysis with error categorization
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose --minutes 60

# Detect container restarts
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh restarts my-api

# Auto-diagnose from health state file
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh auto-diagnose

# Summary across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh summary --minutes 120
example.sh
export ECS_CLUSTER=my-cluster
export ECS_DOMAIN=example.com
export ECS_HEALTH_OUTDIR=./data

# Run health check (auto-triggers log analysis on failure)
./scripts/ecs-health.sh

# Or run log analysis independently
./scripts/cloudwatch-logs.sh auto-diagnose --minutes 30

⚙️Configuration Options

Option Type Default Description
ECS_CLUSTERstring**Yes**
ECS_REGIONstringNo`us-east-1`
ECS_DOMAINstringNo
ECS_SERVICESstringNoauto-detect
ECS_HEALTH_STATEstringNo`./data/ecs-health.json`
ECS_HEALTH_OUTDIRstringNo`./data/`
ECS_LOG_PATTERNstringNo`/ecs/{service}`
ECS_HTTP_ENDPOINTSstringNo

Tags

#devops_and-cloud #monitoring

Quick Info

Category Development
Model Claude 3.5
Complexity One-Click
Author briancolinger
Last Updated 3/10/2026
🚀
Optimized for
Claude 3.5
🧠

Ready to Install?

Get started with this skill in seconds

openclaw install aws-ecs-monitor