24 min read

AWS Observability Stack: Complete Setup Guide with CloudWatch, X-Ray, and OpenTelemetry

Build a comprehensive observability platform on AWS using CloudWatch for logs and metrics, X-Ray for distributed tracing, Container Insights for EKS, and ADOT for OpenTelemetry integration. Includes complete Terraform examples.

AWS Observability Architecture

AWS-native observability with CloudWatch and X-Ray

Key Takeaways

  • CloudWatch provides unified logs, metrics, and alarms for AWS workloads
  • X-Ray enables distributed tracing with service maps and trace analysis
  • Container Insights provides deep visibility into EKS cluster performance
  • ADOT enables vendor-neutral instrumentation with AWS backend integration
  • Terraform IaC ensures reproducible, version-controlled observability setup

AWS Observability Services

CloudWatch Logs

Centralised log collection, storage, and analysis

  • Log Insights queries
  • Metric filters
  • Subscription filters
  • Cross-account

CloudWatch Metrics

Time-series metrics with dashboards and alarms

  • Custom metrics
  • Metric Math
  • Anomaly detection
  • Contributor Insights

AWS X-Ray

Distributed tracing for request flow analysis

  • Service maps
  • Trace analysis
  • Annotations
  • Sampling rules

Container Insights

EKS/ECS performance monitoring

  • Pod metrics
  • Node metrics
  • Cluster dashboard
  • Log correlation

ADOT (AWS Distro for OpenTelemetry)

OpenTelemetry distribution for AWS

  • Vendor-neutral
  • X-Ray integration
  • CloudWatch integration
  • Multi-backend

AWS Observability Overview

AWS provides a comprehensive suite of observability tools that integrate natively with AWS services. CloudWatch serves as the foundation for logs, metrics, and alarms, while X-Ray provides distributed tracing capabilities.

When to Use AWS-Native vs Third-Party

  • AWS-Native: Deep AWS integration, pay-as-you-go, no additional infrastructure
  • Third-Party: Multi-cloud support, advanced features, unified billing

Cost Consideration

AWS observability costs can escalate quickly with high log volumes. Plan retention policies and sampling strategies from the start.

CloudWatch Fundamentals

CloudWatch Logs Setup

CloudWatch Logs collects and stores log data from AWS services and applications. Logs are organised into log groups and log streams.

HCL
# Terraform: CloudWatch Log Group with retention
resource "aws_cloudwatch_log_group" "app_logs" {
  name              = "/aws/app/my-service"
  retention_in_days = 30
}

# Log stream for specific instance/container
resource "aws_cloudwatch_log_stream" "app_stream" {
  name           = "container-1"
  log_group_name = aws_cloudwatch_log_group.app_logs.name
}
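
Metric filters (listed above as a CloudWatch Logs feature) turn matching log events into metrics you can alarm on, with no application changes. A minimal sketch, reusing the illustrative app_logs group from the example above:

HCL
# Sketch: extract an error-count metric from log events containing "ERROR"
resource "aws_cloudwatch_log_metric_filter" "errors" {
  name           = "app-error-count"
  log_group_name = aws_cloudwatch_log_group.app_logs.name
  pattern        = "ERROR"

  metric_transformation {
    name      = "ErrorCount"
    namespace = "MyApp"
    value     = "1"
  }
}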

CloudWatch Agent Installation

BASH
# Install CloudWatch Agent on EC2 (Amazon Linux 2)
sudo yum install -y amazon-cloudwatch-agent

# Configure and start the agent from a local config file
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json \
  -s

CloudWatch Logs Insights

SQL
# Find errors in the last hour
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100

# Find slow requests (duration in milliseconds from structured logs)
fields @timestamp, @message, duration
| filter duration > 1000
| sort duration desc
| limit 50
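
Queries you reach for repeatedly can live in version control too. A sketch that saves the slow-requests query as a Logs Insights query definition, reusing the log group from the earlier example:

HCL
# Sketch: save the query so it appears under "Saved queries" in Logs Insights
resource "aws_cloudwatch_query_definition" "slow_requests" {
  name            = "app/slow-requests"
  log_group_names = [aws_cloudwatch_log_group.app_logs.name]

  query_string = <<-EOT
    fields @timestamp, @message, duration
    | filter duration > 1000
    | sort duration desc
    | limit 50
  EOT
}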

Custom Metrics

TYPESCRIPT
// Node.js: publishing custom metrics via the AWS SDK v3 PutMetricData API
import { CloudWatchClient, PutMetricDataCommand, StandardUnit } from '@aws-sdk/client-cloudwatch';

const cloudwatch = new CloudWatchClient({});

async function publishMetric(name: string, value: number, unit: StandardUnit,
                             dimensions: { name: string; value: string }[]) {
  await cloudwatch.send(new PutMetricDataCommand({
    Namespace: 'MyApp',
    MetricData: [{ MetricName: name, Value: value, Unit: unit,
      Dimensions: dimensions.map((d) => ({ Name: d.name, Value: d.value })) }],
  }));
}

// Publish order value metric
await publishMetric('OrderValue', 99.99, 'None',
  [{ name: 'Environment', value: 'production' }]);

AWS X-Ray for Distributed Tracing

X-Ray traces requests as they travel through your application, providing service maps, trace analysis, and performance insights.

X-Ray Concepts

  • Segments: Work done by a service for a request
  • Subsegments: Granular timing for specific operations
  • Annotations: Indexed key-value pairs for filtering
  • Metadata: Non-indexed additional data

X-Ray SDK Integration (Node.js)

JAVASCRIPT
// X-Ray SDK setup
const AWSXRay = require('aws-xray-sdk');
const express = require('express');

// Capture all AWS SDK calls
const AWS = AWSXRay.captureAWS(require('aws-sdk'));

// Capture outbound HTTP(S) requests
AWSXRay.captureHTTPsGlobal(require('https'));

const app = express();

// Middleware to create segments for incoming requests
app.use(AWSXRay.express.openSegment('MyApp'));

app.get('/api/orders/:id', async (req, res) => {
  const segment = AWSXRay.getSegment();
  
  // Add annotation (indexed, searchable)
  segment.addAnnotation('orderId', req.params.id);
  
  // Add metadata (not indexed)
  segment.addMetadata('request', { headers: req.headers });
  
  // Create subsegment for database call
  const subsegment = segment.addNewSubsegment('DynamoDB-GetOrder');
  try {
    const order = await getOrderFromDynamoDB(req.params.id);
    subsegment.close();
    res.json(order);
  } catch (error) {
    subsegment.addError(error);
    subsegment.close();
    throw error;
  }
});

app.use(AWSXRay.express.closeSegment());

Sampling Rules

HCL
# Terraform: X-Ray sampling rule with higher sampling for errors
resource "aws_xray_sampling_rule" "error_sampling" {
  rule_name      = "error-requests"
  priority       = 100     # Higher priority
  version        = 1
  reservoir_size = 50
  fixed_rate     = 1.0     # Sample all errors
  url_path       = "*"
  host           = "*"
  http_method    = "*"
  service_type   = "*"
  service_name   = "*"
  resource_arn   = "*"
  
  attributes = {
    "http.status_code" = "5*"
  }
}
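
An error-focused rule like this only makes sense alongside a cheaper default. A sketch of a baseline rule that samples a small fraction of healthy traffic (the reservoir and rate values are illustrative):

HCL
# Sketch: baseline rule for healthy traffic, evaluated after more specific rules
resource "aws_xray_sampling_rule" "baseline" {
  rule_name      = "baseline"
  priority       = 9000    # low priority: specific rules match first
  version        = 1
  reservoir_size = 1       # always keep at least one trace per second
  fixed_rate     = 0.05    # then sample 5% of remaining requests
  url_path       = "*"
  host           = "*"
  http_method    = "*"
  service_type   = "*"
  service_name   = "*"
  resource_arn   = "*"
}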

Container Insights for EKS

Container Insights provides performance monitoring for EKS clusters, collecting metrics at the cluster, node, pod, and container level.

Enable Container Insights

BASH
# Enable Container Insights on EKS: deploy the CloudWatch agent and Fluent Bit as a DaemonSet
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml
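
If the cluster itself is managed with Terraform, the same agents can be installed via the EKS managed add-on instead of raw manifests (this is the add-on checked in the Troubleshooting section below). A sketch, assuming an existing aws_eks_cluster.main resource:

HCL
# Sketch: enable Container Insights via the managed observability add-on
resource "aws_eks_addon" "container_insights" {
  cluster_name = aws_eks_cluster.main.name
  addon_name   = "amazon-cloudwatch-observability"
}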

Fluent Bit Configuration for EKS

YAML
# fluent-bit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: amazon-cloudwatch
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush                     5
        Log_Level                 info
        Daemon                    off
        HTTP_Server               On
        HTTP_Listen               0.0.0.0
        HTTP_Port                 2020

    [INPUT]
        Name                      tail
        Tag                       application.*
        Path                      /var/log/containers/*.log
        Parser                    docker
        DB                        /var/fluent-bit/state/flb_container.db
        Mem_Buf_Limit             50MB
        Skip_Long_Lines           On
        Refresh_Interval          10

    [FILTER]
        Name                      kubernetes
        Match                     application.*
        Kube_URL                  https://kubernetes.default.svc:443
        Kube_Tag_Prefix           application.var.log.containers.
        Merge_Log                 On
        K8S-Logging.Parser        On
        K8S-Logging.Exclude       Off

    [OUTPUT]
        Name                      cloudwatch_logs
        Match                     application.*
        region                    eu-west-1
        log_group_name            /aws/eks/my-cluster/application
        log_stream_prefix         ${HOSTNAME}-
        auto_create_group         true

AWS Distro for OpenTelemetry (ADOT)

ADOT is AWS's distribution of OpenTelemetry, providing vendor-neutral instrumentation that integrates with X-Ray and CloudWatch.

ADOT Collector Configuration

YAML
# adot-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 30s
    send_batch_size: 8192
  
  resource:
    attributes:
      - key: cloud.provider
        value: aws
        action: upsert

exporters:
  awsxray:
    region: eu-west-1
    
  awsemf:
    region: eu-west-1
    namespace: MyApp
    log_group_name: '/aws/otel/metrics'
    dimension_rollup_option: NoDimensionRollup

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [awsemf]

Deploy ADOT on EKS

YAML
# Deploy ADOT Collector as DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: adot-collector
  namespace: observability
spec:
  selector:
    matchLabels:
      app: adot-collector
  template:
    metadata:
      labels:
        app: adot-collector
    spec:
      serviceAccountName: adot-collector
      containers:
        - name: collector
          image: amazon/aws-otel-collector:latest
          args:
            - --config=/etc/otel/config.yaml
          env:
            - name: AWS_REGION
              value: eu-west-1
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
            requests:
              cpu: 200m
              memory: 256Mi
          volumeMounts:
            - name: config
              mountPath: /etc/otel
      volumes:
        - name: config
          configMap:
            name: adot-config
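
The collector's ServiceAccount needs AWS permissions, typically granted via IRSA. A sketch of the managed-policy attachments, assuming a hypothetical aws_iam_role.adot already bound to the adot-collector ServiceAccount:

HCL
# Sketch: managed policies the collector's IRSA role needs for X-Ray and EMF metrics
resource "aws_iam_role_policy_attachment" "adot_xray" {
  role       = aws_iam_role.adot.name
  policy_arn = "arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess"
}

resource "aws_iam_role_policy_attachment" "adot_cloudwatch" {
  role       = aws_iam_role.adot.name
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}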

Infrastructure as Code Setup

Complete Terraform Module

HCL
# main.tf - AWS Observability Infrastructure

# CloudWatch Log Groups
resource "aws_cloudwatch_log_group" "app" {
  name              = "/aws/app/${var.service_name}"
  retention_in_days = 30
}

# SNS Topic for Alerts
resource "aws_sns_topic" "alerts" {
  name = "${var.service_name}-alerts"
}

# CloudWatch Alarms (thresholds are illustrative)
resource "aws_cloudwatch_metric_alarm" "high_error_rate" {
  alarm_name          = "${var.service_name}-high-error-rate"
  namespace           = "AWS/ApiGateway"
  metric_name         = "5XXError"
  dimensions          = { ApiName = var.api_name }
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 2
  comparison_operator = "GreaterThanThreshold"
  threshold           = 10
  alarm_actions       = [aws_sns_topic.alerts.arn]
}

resource "aws_cloudwatch_metric_alarm" "high_latency" {
  alarm_name          = "${var.service_name}-high-latency"
  namespace           = "AWS/ApiGateway"
  metric_name         = "Latency"
  dimensions          = { ApiName = var.api_name }
  extended_statistic  = "p99"
  period              = 300
  evaluation_periods  = 2
  comparison_operator = "GreaterThanThreshold"
  threshold           = 2000
  alarm_actions       = [aws_sns_topic.alerts.arn]
}

# IAM Role for X-Ray: attach AWSXRayDaemonWriteAccess to workload roles

# CloudWatch Dashboard
resource "aws_cloudwatch_dashboard" "main" {
  dashboard_name = "${var.service_name}-dashboard"
  
  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["AWS/ApiGateway", "Count", "ApiName", var.api_name],
            [".", "5XXError", ".", "."],
            [".", "4XXError", ".", "."]
          ]
          period = 300
          stat   = "Sum"
          region = var.region
          title  = "API Requests"
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["AWS/ApiGateway", "Latency", "ApiName", var.api_name, { stat = "p50" }],
            ["...", { stat = "p95" }],
            ["...", { stat = "p99" }]
          ]
          period = 300
          region = var.region
          title  = "API Latency"
        }
      }
    ]
  })
}

CloudWatch Dashboards

Dashboard Best Practices

  • Summary view: Key metrics at a glance (error rates, latency, throughput)
  • Service view: Per-service metrics and health
  • Infrastructure view: EC2, RDS, Lambda resource utilisation

Dashboard Tip

Use anomaly detection bands on dashboards to quickly spot unusual behaviour. CloudWatch can automatically calculate expected ranges.
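
Anomaly detection can also drive alarms, not just dashboard bands. A sketch of a Terraform alarm that fires when latency leaves the expected band (the API name and band width are illustrative):

HCL
# Sketch: alarm when latency exceeds the upper anomaly detection band
resource "aws_cloudwatch_metric_alarm" "latency_anomaly" {
  alarm_name          = "api-latency-anomaly"
  comparison_operator = "GreaterThanUpperThreshold"
  evaluation_periods  = 2
  threshold_metric_id = "e1"

  metric_query {
    id          = "e1"
    expression  = "ANOMALY_DETECTION_BAND(m1, 2)"
    label       = "Latency (expected range)"
    return_data = true
  }

  metric_query {
    id          = "m1"
    return_data = true
    metric {
      metric_name = "Latency"
      namespace   = "AWS/ApiGateway"
      period      = 300
      stat        = "Average"
      dimensions  = { ApiName = "my-api" }
    }
  }
}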

Alerting and Incident Response

Composite Alarms

HCL
# Composite alarm: Alert only when multiple conditions are met
resource "aws_cloudwatch_composite_alarm" "service_degraded" {
  alarm_name = "${var.service_name}-service-degraded"
  
  alarm_rule = "ALARM(\"${aws_cloudwatch_metric_alarm.high_error_rate.alarm_name}\") AND ALARM(\"${aws_cloudwatch_metric_alarm.high_latency.alarm_name}\")"
  
  alarm_actions = [aws_sns_topic.alerts.arn]
  
  alarm_description = "Service is degraded: both error rate and latency are high"
}

EventBridge Integration

HCL
# Trigger automation on alarm state change
resource "aws_cloudwatch_event_rule" "alarm_trigger" {
  name        = "alarm-state-change"
  description = "Trigger on CloudWatch alarm state changes"
  
  event_pattern = jsonencode({
    source      = ["aws.cloudwatch"]
    detail-type = ["CloudWatch Alarm State Change"]
    detail = {
      alarmName = [{ prefix = var.service_name }]
      state     = { value = ["ALARM"] }
    }
  })
}

resource "aws_cloudwatch_event_target" "lambda" {
  rule      = aws_cloudwatch_event_rule.alarm_trigger.name
  target_id = "incident-response"
  arn       = aws_lambda_function.incident_response.arn
}
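
One easy-to-miss piece: EventBridge also needs permission to invoke the target function, or the rule will match events and silently fail. A sketch of the Lambda resource policy:

HCL
# Sketch: allow the EventBridge rule to invoke the incident-response Lambda
resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.incident_response.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.alarm_trigger.arn
}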

Cost Optimisation

CloudWatch Pricing Breakdown (us-east-1; prices vary by region)

  • Logs: $0.50/GB ingested + $0.03 per GB-month stored (after the first 5GB free)
  • Metrics: $0.30 per custom metric/month (first 10,000 metrics)
  • Dashboards: $3.00/dashboard/month (first three free)
  • Alarms: $0.10/alarm/month (standard resolution)

Cost Reduction Strategies

  • Set appropriate log retention periods
  • Use log filters to reduce stored volume
  • Archive old logs to S3 via subscription filters (example below)
  • Use X-Ray sampling to reduce trace costs
  • Prefer standard-resolution (1-minute) metrics over high-resolution (1-second) where possible
HCL
# Archive logs to S3 via subscription filter
resource "aws_cloudwatch_log_subscription_filter" "archive" {
  name            = "archive-to-s3"
  log_group_name  = aws_cloudwatch_log_group.app.name
  filter_pattern  = ""
  destination_arn = aws_kinesis_firehose_delivery_stream.logs.arn
  role_arn        = aws_iam_role.cloudwatch_to_firehose.arn
}

resource "aws_kinesis_firehose_delivery_stream" "logs" {
  name        = "logs-archive"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn           = aws_iam_role.firehose.arn
    bucket_arn         = aws_s3_bucket.logs_archive.arn
    prefix             = "logs/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/"
    error_output_prefix = "errors/"
    buffering_size     = 128
    buffering_interval = 300
    compression_format = "GZIP"
  }
}

Troubleshooting

Common issues and solutions when setting up AWS observability.

CloudWatch Agent Not Sending Metrics

Symptom: Custom metrics not appearing in CloudWatch despite agent running.

Common causes:

  • IAM role missing CloudWatch permissions
  • Agent configuration file syntax errors
  • Wrong region configured
  • Metric namespace or dimensions exceeding limits

Solution:

# Check CloudWatch agent status
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a status

# View agent logs for errors
sudo tail -f /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log

# Verify IAM permissions (attach CloudWatchAgentServerPolicy)
aws iam list-attached-role-policies --role-name EC2-CloudWatch-Role

# Validate configuration file
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -c file:/opt/aws/amazon-cloudwatch-agent/etc/config.json -s

X-Ray Traces Missing or Incomplete

Symptom: Some services appear in traces but others are missing, or traces are incomplete.

Common causes:

  • X-Ray daemon not running or unreachable
  • Sampling rate too low
  • Missing instrumentation in downstream services
  • VPC endpoints not configured for private subnets

Solution:

# Check X-Ray daemon status
sudo systemctl status xray

# Verify daemon can reach X-Ray API
curl -X POST http://127.0.0.1:2000/GetSamplingRules

# Configure higher sampling rate for debugging
{
  "version": 2,
  "rules": [
    {
      "description": "Debug sampling",
      "service_name": "*",
      "http_method": "*",
      "url_path": "*",
      "fixed_target": 10,
      "rate": 1.0
    }
  ]
}

# Ensure trace context propagation headers are forwarded
# X-Amzn-Trace-Id header must be passed between services

Log Group Reaching Retention or Storage Limits

Error: “ResourceLimitExceededException” or unexpectedly high CloudWatch costs.

Common causes:

  • No retention policy set (logs kept indefinitely)
  • Excessive debug logging in production
  • Log events exceeding size limits
  • Too many unique log streams

Solution:

# Set retention policy on log groups
aws logs put-retention-policy \
  --log-group-name /aws/lambda/my-function \
  --retention-in-days 30

# List log groups without retention (potential cost issue)
aws logs describe-log-groups \
  --query 'logGroups[?!retentionInDays].logGroupName'

# Export old logs to S3 for cheaper storage
aws logs create-export-task \
  --log-group-name /aws/ecs/my-service \
  --from 1609459200000 --to 1612137600000 \
  --destination my-log-archive-bucket \
  --destination-prefix exports/

Container Insights Not Showing EKS Metrics

Symptom: Container Insights enabled but no metrics appearing for pods or nodes.

Common causes:

  • CloudWatch agent DaemonSet not deployed
  • IRSA not configured for the agent ServiceAccount
  • Fluent Bit not forwarding logs
  • Cluster name mismatch in configuration

Solution:

# Verify Container Insights addon is enabled
aws eks describe-addon --cluster-name my-cluster \
  --addon-name amazon-cloudwatch-observability

# Check CloudWatch agent pods are running
kubectl get pods -n amazon-cloudwatch -l app=cloudwatch-agent

# Verify IRSA ServiceAccount annotation
kubectl describe sa cloudwatch-agent -n amazon-cloudwatch

# Check agent logs
kubectl logs -n amazon-cloudwatch -l app=cloudwatch-agent --tail=50

CloudWatch Alarms Not Triggering

Symptom: Alarm stays in OK or INSUFFICIENT_DATA despite threshold being breached.

Common causes:

  • Metric dimensions don't match alarm configuration
  • Evaluation period longer than expected
  • Missing datapoints treated as “not breaching”
  • Alarm in wrong region

Solution:

# Check alarm configuration and state
aws cloudwatch describe-alarms --alarm-names MyAlarm

# Verify metric data exists with exact dimensions
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 300 \
  --statistics Average

# Set treat-missing-data appropriately
# (put-metric-alarm replaces the whole alarm, so re-specify the full definition)
aws cloudwatch put-metric-alarm \
  --alarm-name MyAlarm \
  --treat-missing-data breaching

Conclusion

AWS provides a comprehensive observability stack that integrates seamlessly with AWS services. CloudWatch serves as the central hub for logs, metrics, and alarms, while X-Ray provides distributed tracing capabilities essential for debugging microservices.

For organisations standardising on OpenTelemetry, ADOT provides a path to vendor-neutral instrumentation while still leveraging AWS-native backends. Container Insights extends observability into EKS workloads with minimal configuration.

Start with the basics of logs, metrics, and alarms, then progressively add tracing and custom metrics as your observability maturity grows. Use Infrastructure as Code to ensure your observability setup is reproducible and version-controlled.

Frequently Asked Questions

What is AWS CloudWatch?

AWS CloudWatch is a monitoring and observability service that provides data and actionable insights for AWS resources and applications. It collects and tracks metrics, collects and monitors log files, sets alarms, and automatically reacts to changes in your AWS resources. CloudWatch serves as the central hub for logs, metrics, and alarms in AWS observability.

What is AWS X-Ray?

AWS X-Ray is a distributed tracing service that helps developers analyse and debug production applications. It traces requests as they travel through your application, providing service maps, trace analysis, and performance insights. X-Ray uses concepts like segments (work done by a service), subsegments (granular timing), annotations (indexed key-value pairs), and metadata to capture detailed request information.

What is ADOT (AWS Distro for OpenTelemetry)?

ADOT (AWS Distro for OpenTelemetry) is AWS's distribution of OpenTelemetry, providing vendor-neutral instrumentation that integrates with AWS-native backends like X-Ray and CloudWatch. It allows you to collect traces and metrics using OpenTelemetry protocols (OTLP) and export them to AWS services, enabling standardised instrumentation while leveraging AWS's observability infrastructure.

How do I set up Container Insights for EKS?

To set up Container Insights for EKS, first enable cluster logging via AWS CLI, then deploy the CloudWatch agent as a DaemonSet using the quickstart manifest. You'll also need to configure Fluent Bit for log collection. The agent collects metrics at the cluster, node, pod, and container level, providing deep visibility into EKS workloads. Ensure IRSA (IAM Roles for Service Accounts) is configured for the agent ServiceAccount.

How can I reduce CloudWatch costs?

To reduce CloudWatch costs: set appropriate log retention periods (logs without retention are kept indefinitely), use log filters to reduce stored volume, archive old logs to S3 via subscription filters with Kinesis Firehose, use X-Ray sampling rules to reduce trace volume, and choose standard resolution (1 minute) metrics instead of high resolution (1 second) where appropriate. Also review log groups without retention policies as they can be a major cost driver.

What is the difference between CloudWatch Logs and CloudWatch Metrics?

CloudWatch Logs handles centralised log collection, storage, and analysis with features like Log Insights queries, metric filters, and subscription filters. CloudWatch Metrics provides time-series data with dashboards and alarms, supporting custom metrics, Metric Math, anomaly detection, and Contributor Insights. Logs capture detailed event data while metrics track numerical measurements over time. You can create metric filters to extract metrics from log data.


Ayodele Ajayi

Principal Engineer based in Kent, UK. Specialising in cloud infrastructure, DevSecOps, and platform engineering. Passionate about building observable, reliable systems and sharing knowledge through technical writing.