AWS X-Ray Integration Documentation

1. Purpose of This Document

This document outlines the setup, configuration, and usage of AWS X-Ray for distributed tracing in an ECS-based microservices architecture. It is written from a Cloud Engineer perspective, where infrastructure is prepared in advance and application-level instrumentation is handled by development teams.


2. What is AWS X-Ray?

AWS X-Ray is a distributed tracing service that helps analyze, debug, and monitor applications by tracking requests as they travel through AWS services.

Key Capabilities

  • End-to-end request tracing across services

  • Visual Service Maps showing service dependencies

  • Detailed traces for latency and error analysis

  • Identification of faults, errors, and throttling

  • Native integration with Amazon CloudWatch


3. When to Use AWS X-Ray

AWS X-Ray is especially useful when:

  • Applications are built using microservices

  • Services span multiple AWS services or regions

  • Latency issues require root-cause analysis

  • ❌ You need to pinpoint where errors occur in a request lifecycle


4. High-Level Architecture Overview

Client Request → Application Load Balancer (ALB) → ECS Service (Application Container) → X-Ray Daemon (Sidecar Container) → AWS X-Ray Service → Service Maps & Traces


5. Cloud Engineer Responsibilities

As a Cloud Engineer, your responsibilities include:

  • Provisioning and validating X-Ray infrastructure

  • Adding the X-Ray daemon as a sidecar container

  • Configuring required IAM permissions

  • Ensuring compatibility with AWS managed services

  • Integrating observability with CloudWatch


6. ECS X-Ray Setup (Infrastructure Side)

This section provides complete infrastructure-level steps for enabling AWS X-Ray on ECS services.

6.1 Prerequisites

  • ✅ ECS services must be running behind an Application Load Balancer

  • ✅ Tasks need an outbound network path to the X-Ray API endpoint (internet gateway, NAT gateway, or an X-Ray VPC interface endpoint)

  • ✅ AWS X-Ray must be supported in the selected region

No account-level feature enablement is required.


6.2 X-Ray and Application Load Balancer

  • ❌ There is no manual “Enable X-Ray” option in the ALB configuration

  • ✅ ALB automatically supports X-Ray by injecting the X-Amzn-Trace-Id header

  • The ALB itself does not send segments to X-Ray and does not appear as a node on the Service Map; the header lets instrumented backend services continue the same trace

No ALB configuration changes are required.
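To make the header concrete, here is a minimal sketch of how a backend could split an X-Amzn-Trace-Id value into its fields. The function names parse_trace_header and root_timestamp are hypothetical helpers for illustration, not part of any AWS SDK; the sample value follows the documented Root/Parent/Sampled layout.

```python
def parse_trace_header(value: str) -> dict:
    """Split an X-Amzn-Trace-Id header value into its key/value fields."""
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty or malformed fragments
        key, _, val = part.partition("=")
        fields[key.strip()] = val.strip()
    return fields


def root_timestamp(root: str) -> int:
    """Return the epoch seconds encoded in a Root trace ID (1-<8 hex>-<24 hex>)."""
    version, epoch_hex, unique = root.split("-")
    if version != "1" or len(epoch_hex) != 8 or len(unique) != 24:
        raise ValueError(f"unexpected trace ID format: {root!r}")
    return int(epoch_hex, 16)


# Sample header value in the documented format (illustrative only).
header = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
fields = parse_trace_header(header)
print(fields["Root"], fields.get("Parent"), fields.get("Sampled"))
print(root_timestamp(fields["Root"]))
```

In practice the X-Ray SDK middleware reads this header for you; the sketch only shows what information the ALB passes along.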


6.3 ECS Task Definition – X-Ray Daemon (Sidecar)

For each ECS service (backend, worker, etc.):

  • Add the X-Ray daemon as a sidecar container

  • Container image: public.ecr.aws/xray/aws-xray-daemon:latest

  • Expose port 2000/UDP

  • The sidecar shares the task's network mode with the application container (network mode is set per task, not per container)

  • ⚙ Runs alongside the application container within the same task
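The bullets above can be sketched as a container definition entry for the task definition's containerDefinitions list. The helper name xray_daemon_container and the cpu and memoryReservation values are assumptions for illustration; size them for your own workload. The image and port come from this document.

```python
import json


def xray_daemon_container(name: str) -> dict:
    """Build a containerDefinitions entry for the X-Ray daemon sidecar."""
    return {
        "name": name,
        "image": "public.ecr.aws/xray/aws-xray-daemon:latest",
        "cpu": 32,                 # assumed sizing, adjust as needed
        "memoryReservation": 256,  # assumed sizing (MiB), adjust as needed
        "portMappings": [
            {"containerPort": 2000, "protocol": "udp"},
        ],
    }


# Container name taken from the backend example later in this document.
sidecar = xray_daemon_container("helium-backend-xray")
print(json.dumps(sidecar, indent=2))
```

The printed JSON can be merged into an existing task definition file next to the application container entry.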

Responsibilities of the X-Ray Daemon:

  • Receives trace data from application containers

  • Forwards trace data to AWS X-Ray

The daemon itself does not generate traces.
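To illustrate what "receives trace data" means, the sketch below builds a minimal segment document and the UDP payload the daemon accepts: a JSON header line followed by the segment JSON. The helper names (new_trace_id, build_segment, encode_for_daemon, send_segment) are hypothetical; in real services the X-Ray SDK does this work. The address 127.0.0.1:2000 is assumed as the daemon endpoint.

```python
import json
import secrets
import socket
import time


def new_trace_id(now=None) -> str:
    """Trace ID: version 1, epoch seconds as 8 hex digits, 96 random bits as 24 hex digits."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{secrets.token_hex(12)}"


def build_segment(name: str, start: float, end: float, trace_id=None) -> dict:
    """Build a minimal segment document with the required fields."""
    return {
        "name": name,
        "id": secrets.token_hex(8),  # 16 hex digits
        "trace_id": trace_id or new_trace_id(start),
        "start_time": start,
        "end_time": end,
    }


def encode_for_daemon(segment: dict) -> bytes:
    """Prefix the segment with the daemon header line, separated by a newline."""
    header = json.dumps({"format": "json", "version": 1})
    return (header + "\n" + json.dumps(segment)).encode("utf-8")


def send_segment(segment: dict, host: str = "127.0.0.1", port: int = 2000) -> None:
    """Send one segment to the daemon over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_for_daemon(segment), (host, port))


segment = build_segment("helium-backend", 1700000000.0, 1700000000.25)
payload = encode_for_daemon(segment)
print(payload.decode("utf-8").split("\n")[0])
```

send_segment is defined but not called here, so the example runs without a daemon. Because the transport is UDP, the application does not block or fail when the daemon is unavailable; the data is simply lost.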


6.4 Environment Variables Configuration

Environment variables allow applications to communicate with the X-Ray daemon and identify the tracing context.

Where to Add

  • ECS Console → Task Definition → Container → Environment Variables

AWS_XRAY_DAEMON_ADDRESS is read by the X-Ray SDK, so it needs to be set only on the application container. The daemon container does not read this variable.

Required Variables

Backend X-Ray daemon container name: helium-backend-xray

  • Set the variable on helium-backend (application container)

  • Key: AWS_XRAY_DAEMON_ADDRESS

  • Value: 127.0.0.1:2000 with awsvpc networking (including Fargate); helium-backend-xray:2000 with bridge networking and a container link to the daemon

Worker X-Ray daemon container name: helium-worker-xray

  • Set the variable on helium-worker (application container)

  • Key: AWS_XRAY_DAEMON_ADDRESS

  • Value: 127.0.0.1:2000 with awsvpc networking (including Fargate); helium-worker-xray:2000 with bridge networking and a container link to the daemon

Additional Notes

  • Port 2000/UDP must be exposed on the daemon container

  • With awsvpc networking, containers in the same task share a network namespace and reach each other over localhost; container names do not resolve as hostnames. 127.0.0.1:2000 is also the SDK default, so the variable can be omitted in that mode

  • With bridge networking, the daemon's container name resolves only when the application container defines a link to it
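A minimal sketch of how an application could resolve the daemon address from AWS_XRAY_DAEMON_ADDRESS, falling back to 127.0.0.1:2000 when the variable is unset. The function name resolve_daemon_address is hypothetical; the two accepted shapes (host:port, and separate tcp:host:port and udp:host:port entries) follow the X-Ray SDK conventions, and the SDKs already implement this parsing themselves.

```python
import os

DEFAULT_ADDRESS = "127.0.0.1:2000"


def resolve_daemon_address(env=None) -> dict:
    """Return the UDP and TCP endpoints described by AWS_XRAY_DAEMON_ADDRESS."""
    env = os.environ if env is None else env
    raw = env.get("AWS_XRAY_DAEMON_ADDRESS", DEFAULT_ADDRESS).strip()
    udp = tcp = None
    for token in raw.split():
        parts = token.split(":")
        if len(parts) == 2:
            # "host:port" applies to both protocols
            udp = tcp = (parts[0], int(parts[1]))
        elif len(parts) == 3 and parts[0] in ("udp", "tcp"):
            endpoint = (parts[1], int(parts[2]))
            if parts[0] == "udp":
                udp = endpoint
            else:
                tcp = endpoint
        else:
            raise ValueError(f"invalid daemon address: {raw!r}")
    if udp is None or tcp is None:
        raise ValueError(f"invalid daemon address: {raw!r}")
    return {"udp": udp, "tcp": tcp}


print(resolve_daemon_address({}))
print(resolve_daemon_address({"AWS_XRAY_DAEMON_ADDRESS": "helium-backend-xray:2000"}))
```

Passing a plain dict instead of os.environ keeps the function easy to check in isolation.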


6.5 IAM Configuration (Task Role)

Attach the AWS managed policy AWSXrayWriteOnlyAccess to the ECS Task Role:

  • ✍ Allows sending trace segments and subsegments

  • Enables communication with the X-Ray service

The task execution role does not require X-Ray permissions.
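For reference, the sketch below writes out the two policy documents involved: the trust policy that lets ECS tasks assume the task role, and an inline policy roughly equivalent to what the managed policy grants. The action list reflects my understanding of AWSXrayWriteOnlyAccess and should be verified against the current policy in the IAM console; attaching the managed policy remains the simpler option.

```python
import json

# Trust policy: allows ECS tasks to assume the task role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Assumed equivalent of the managed policy; verify before relying on it.
XRAY_WRITE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "xray:PutTraceSegments",
                "xray:PutTelemetryRecords",
                "xray:GetSamplingRules",
                "xray:GetSamplingTargets",
                "xray:GetSamplingStatisticSummaries",
            ],
            "Resource": "*",
        }
    ],
}

MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess"

print(json.dumps(TRUST_POLICY, indent=2))
print(json.dumps(XRAY_WRITE_POLICY, indent=2))
```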


6.6 Deploy Updated ECS Services

  • Register the updated task definition

  • Deploy to backend and worker ECS services

  • Confirm X-Ray daemon container is RUNNING
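The last check can be automated. The sketch below inspects a response shaped like the output of the ECS DescribeTasks API (tasks, containers, name, lastStatus) and reports the daemon container's status per task. The function names and the sample response, including its ARNs, are hypothetical; in practice the dict would come from the AWS CLI or an SDK call.

```python
def daemon_status(describe_tasks_response: dict, daemon_name: str) -> dict:
    """Map each task ARN to the lastStatus of the named daemon container."""
    statuses = {}
    for task in describe_tasks_response.get("tasks", []):
        for container in task.get("containers", []):
            if container.get("name") == daemon_name:
                statuses[task.get("taskArn", "unknown")] = container.get("lastStatus")
    return statuses


def all_running(statuses: dict) -> bool:
    """True only if at least one task was found and every daemon is RUNNING."""
    return bool(statuses) and all(s == "RUNNING" for s in statuses.values())


# Hypothetical response trimmed to the fields used above.
sample_response = {
    "tasks": [
        {
            "taskArn": "arn:aws:ecs:us-east-1:111122223333:task/demo/aaaa",
            "containers": [
                {"name": "helium-backend", "lastStatus": "RUNNING"},
                {"name": "helium-backend-xray", "lastStatus": "RUNNING"},
            ],
        },
        {
            "taskArn": "arn:aws:ecs:us-east-1:111122223333:task/demo/bbbb",
            "containers": [
                {"name": "helium-backend", "lastStatus": "RUNNING"},
                {"name": "helium-backend-xray", "lastStatus": "PENDING"},
            ],
        },
    ]
}

statuses = daemon_status(sample_response, "helium-backend-xray")
print(statuses)
print(all_running(statuses))
```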


6.7 Validate X-Ray Daemon Logs

Expected log messages:

  • Successful initialization

  • Region detection confirmation

  • Get instance id metadata failed warnings (expected in ECS/Fargate)


6.8 Expected State Before Application Instrumentation

  • ❌ No Service Map visible

  • ❌ No traces appear in X-Ray console

This is normal until application code is instrumented and traffic flows.


7. Developer Responsibilities

  • Add AWS X-Ray SDKs (Java, Node.js, Python, etc.)

  • Enable tracing in application code

  • Use meaningful service and subsegment names

Traces will only appear once application instrumentation is complete.


8. Understanding Service Maps

What is a Service Map?

  • Visual representation of connected services

  • Request flow paths

  • ❌ Locations of errors or faults

  • ⏱ Latency between components

How Service Maps Are Generated

  • Applications send trace data to X-Ray

  • X-Ray automatically builds the map

  • Map updates dynamically with traffic

No traffic means no Service Map.
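As a rough intuition for how a map can be derived from trace data, the sketch below aggregates caller-to-callee edges from a list of simplified segment records. This is an illustration only, not X-Ray's actual algorithm; the record fields (id, parent_id, name, fault) and the "client" placeholder for requests without a parent are assumptions.

```python
from collections import defaultdict


def build_service_map(segments: list) -> dict:
    """Aggregate (caller, callee) edges with call count, fault count, and total latency."""
    by_id = {s["id"]: s for s in segments}
    edges = defaultdict(lambda: {"count": 0, "faults": 0, "total_ms": 0.0})
    for s in segments:
        parent = by_id.get(s.get("parent_id"))
        source = parent["name"] if parent else "client"
        edge = edges[(source, s["name"])]
        edge["count"] += 1
        edge["faults"] += 1 if s.get("fault") else 0
        edge["total_ms"] += (s["end_time"] - s["start_time"]) * 1000
    return dict(edges)


# Hypothetical records for two requests; one of them calls the worker and faults there.
segments = [
    {"id": "a1", "parent_id": None, "name": "helium-backend",
     "start_time": 0.0, "end_time": 0.5, "fault": False},
    {"id": "b2", "parent_id": "a1", "name": "helium-worker",
     "start_time": 0.0, "end_time": 0.25, "fault": True},
    {"id": "c3", "parent_id": None, "name": "helium-backend",
     "start_time": 1.0, "end_time": 1.5, "fault": False},
]

service_map = build_service_map(segments)
for (source, target), stats in service_map.items():
    print(f"{source} -> {target}: {stats}")
```

With an empty segment list the function returns an empty map, which mirrors the point above: no traffic, no Service Map.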


9. Viewing Traces and Errors

  • AWS Console → X-Ray → Traces

  • AWS Console → X-Ray → Service Map

What You Can Analyze

  • ❌ Error (4xx), fault (5xx), and throttle (429) traces

  • Slow or degraded requests

  • Latency breakdown by service

  • Exact failure points in the request flow
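X-Ray groups HTTP outcomes into error (4xx client errors), fault (5xx server errors), and throttle (429). A minimal sketch of that mapping, using a hypothetical helper named classify_status:

```python
def classify_status(status: int) -> dict:
    """Return the X-Ray style flags implied by an HTTP status code."""
    return {
        "error": 400 <= status <= 499,     # client errors
        "throttle": status == 429,         # too many requests (also a 4xx error)
        "fault": 500 <= status <= 599,     # server errors
    }


for code in (200, 404, 429, 503):
    print(code, classify_status(code))
```

Keeping the distinction in mind helps when filtering traces: a spike in faults points at the service itself, while a spike in errors usually points at its callers.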


10. Common Issues and Observations

Daemon Running but No Traces

  • ❌ Application not instrumented

  • ❌ No incoming traffic

  • ❌ Viewing wrong AWS region

IMDS Errors in Logs

  • ⚠ Expected in ECS

  • ✅ Safe to ignore

  • ❌ Do not impact trace collection


11. Integration with Amazon CloudWatch

  • X-Ray integrates natively with CloudWatch

  • Metrics can be used for alarms

  • Logs, metrics, and traces can be correlated

CloudWatch provides metrics; X-Ray provides request-level visibility.


12. Best Practices

  • ✅ Always run X-Ray daemon as a sidecar

  • Use clear and consistent service names

  • Combine X-Ray with CloudWatch alarms

  • Enable tracing early in lower environments


13. X-Ray Daemon Container Image

Recommended Image

  • public.ecr.aws/xray/aws-xray-daemon:latest

  • Official AWS-maintained image on AWS Public ECR

  • Actively maintained and updated

  • No Docker Hub pull rate limits

  • Optimized for ECS, EKS, and Fargate workloads

Recommended for all new deployments.


14. Conclusion

AWS X-Ray provides deep visibility into distributed systems, showing where requests travel, how long they take, and where failures occur. By preparing infrastructure in advance, Cloud Engineers enable development teams to activate tracing seamlessly once application changes are implemented.

End of Document
