AWS X-Ray
Distributed tracing — analyse and debug production applications by tracing requests as they travel through microservices, APIs, and AWS resources.
Overview
AWS X-Ray is a distributed tracing service for analysing and debugging applications. It traces requests as they flow through microservices, identifies performance bottlenecks, and renders a map of service dependencies.
Core Concepts
| Concept | Description |
|---|---|
| Trace | End-to-end record of a single request as it travels through all services |
| Segment | A named block representing work done by a single service (e.g., one Lambda function) |
| Subsegment | A more granular unit within a segment (e.g., an external HTTP call or DB query) |
| Trace ID | Unique identifier propagated across services to correlate segments into one trace |
| Service Map | Visual graph showing service dependencies, latency, and error rates |
| Annotation | Indexed key-value pair on a segment or subsegment; used for filtering and searching traces |
| Metadata | Non-indexed key-value data on a segment or subsegment; useful for debugging but not searchable |
| Sampling Rule | Controls the percentage of requests traced to manage cost and overhead |
| X-Ray Daemon | Background process that buffers segments and sends them to the X-Ray API |
| X-Ray SDK | Library added to application code to capture trace data automatically |
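To make the concepts in the table concrete, the sketch below builds a segment document by hand with the Python standard library instead of the X-Ray SDK. The trace ID layout (`1-`, eight hex digits of epoch seconds, 24 random hex digits), the field names, and the daemon header line follow the documented segment format as understood here. The helper names (`new_trace_id`, `build_segment`, `to_daemon_datagram`, `send_to_daemon`) and all sample values are hypothetical.

```python
import json
import os
import socket
import time

# Header line the daemon expects before each JSON segment document.
DAEMON_HEADER = b'{"format": "json", "version": 1}\n'


def new_trace_id(now=None):
    """Trace ID: version 1, 8 hex digits of epoch seconds, 24 random hex digits."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def new_id():
    """Segment or subsegment ID: 16 random hex digits."""
    return os.urandom(8).hex()


def build_subsegment(name, start, end, namespace="remote"):
    """Granular unit of work inside a segment, e.g. one downstream call."""
    return {"name": name, "id": new_id(), "start_time": start,
            "end_time": end, "namespace": namespace}


def build_segment(name, trace_id, start, end,
                  annotations=None, metadata=None, subsegments=None):
    """Work done by one service; annotations are indexed, metadata is not."""
    doc = {"name": name, "id": new_id(), "trace_id": trace_id,
           "start_time": start, "end_time": end}
    if annotations:
        doc["annotations"] = annotations
    if metadata:
        doc["metadata"] = metadata
    if subsegments:
        doc["subsegments"] = subsegments
    return doc


def to_daemon_datagram(segment):
    """Frame a segment for the daemon: header line, then the JSON document."""
    return DAEMON_HEADER + json.dumps(segment).encode("utf-8")


def send_to_daemon(segment, address=("127.0.0.1", 2000)):
    """Send one segment to a local daemon over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(to_daemon_datagram(segment), address)


trace_id = new_trace_id()
now = time.time()
segment = build_segment(
    "orders-service", trace_id, now - 0.2, now,   # hypothetical service name
    annotations={"customer_tier": "gold"},         # indexed, searchable
    metadata={"debug": {"cart_items": 3}},         # stored, not searchable
    subsegments=[build_subsegment("DynamoDB", now - 0.15, now - 0.05, "aws")],
)
```

In practice the SDK generates these documents automatically. `send_to_daemon` is deliberately not called here, since it only makes sense when a daemon is listening on UDP port 2000.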
How X-Ray Works
Client Request
  → API Gateway (Segment A)
    → Lambda (Segment B)
      → DynamoDB (Subsegment B.1)
      → External API (Subsegment B.2)
      → SQS (Segment C)
        → EC2 Consumer (Segment D)
          → RDS (Subsegment D.1)
All segments share the same Trace ID → assembled into one trace
Instrumentation Flow
Application Code + X-Ray SDK
→ Capture segments/subsegments
→ X-Ray Daemon (UDP port 2000)
→ Batch send to X-Ray API
→ Service Map + Trace Timeline
Integration with AWS Services
| Service | Integration Method |
|---|---|
| Lambda | Enable active tracing in function config — no daemon needed |
| API Gateway | Enable tracing on stage — automatic segment creation |
| ECS / EKS | Run X-Ray daemon as a sidecar container |
| EC2 | Install and run X-Ray daemon; instrument app with SDK |
| Elastic Beanstalk | Enable X-Ray in environment configuration |
| App Runner | Built-in X-Ray integration |
| SNS / SQS | Automatic trace header propagation (active tracing) |
| Step Functions | Built-in tracing with state-level visibility |
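The integrations above end up in one trace because each hop forwards the trace context, which over HTTP travels in the `X-Amzn-Trace-Id` header. As a rough sketch in standard-library Python (`parse_trace_header` and `child_header` are hypothetical helper names, and the IDs are sample values), a service reads the incoming header and passes on the same root with its own segment ID as the parent:

```python
def parse_trace_header(value):
    """Split 'Root=...;Parent=...;Sampled=1' into a dict of its fields."""
    fields = {}
    for item in value.split(";"):
        key, sep, val = item.strip().partition("=")
        if sep:  # skip malformed items that have no '='
            fields[key] = val
    return fields


def child_header(incoming, segment_id):
    """Header to send downstream: same root, this service's segment as parent."""
    fields = parse_trace_header(incoming)
    parts = [f"Root={fields['Root']}", f"Parent={segment_id}"]
    if "Sampled" in fields:  # carry the upstream sampling decision forward
        parts.append(f"Sampled={fields['Sampled']}")
    return ";".join(parts)


incoming = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
outgoing = child_header(incoming, "aaaabbbbccccdddd")  # hypothetical segment ID
```

The managed integrations and the SDK handle this automatically; the sketch only shows why every segment ends up with the same `Root` value.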
Sampling Rules
| Parameter | Description | Default |
|---|---|---|
| Reservoir | Fixed number of requests traced per second | 1/s |
| Rate | Percentage of additional requests beyond the reservoir | 5% |
| Service name | Filter rule to specific services | * (all) |
| URL path | Filter rule to specific API paths | * (all) |
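Custom sampling rules reduce cost and noise by tracing fewer routine requests while sampling specific services or endpoints at a higher rate. Rules match on request attributes such as service name and URL path, and the decision is made when the request arrives, so a standard rule cannot select requests by outcome (for example, only those that fail).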
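The interaction between reservoir and rate can be sketched as a small simulation. This is a simplified single-process model, not the SDK's sampler: in X-Ray the reservoir is coordinated across instances of a service, which this sketch ignores. The class and function names are hypothetical.

```python
import random


class ReservoirSampler:
    """Trace the first `reservoir` requests each second, then a fixed rate."""

    def __init__(self, reservoir=1, rate=0.05, rng=None):
        self.reservoir = reservoir
        self.rate = rate
        self.rng = rng or random.Random()
        self._second = None  # wall-clock second currently being counted
        self._used = 0       # reservoir slots consumed in that second

    def should_trace(self, now):
        second = int(now)
        if second != self._second:  # a new second refills the reservoir
            self._second = second
            self._used = 0
        if self._used < self.reservoir:
            self._used += 1
            return True
        # Beyond the reservoir, sample the configured fraction of requests.
        return self.rng.random() < self.rate


def expected_traced_per_second(requests_per_second, reservoir=1, rate=0.05):
    """Average number of traced requests per second under steady load."""
    in_reservoir = min(requests_per_second, reservoir)
    beyond = max(requests_per_second - reservoir, 0)
    return in_reservoir + beyond * rate
```

With the defaults from the table, a service receiving 100 requests per second would record roughly 1 + 99 × 0.05 ≈ 6 traces per second on average.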
Service Map
The service map provides a real-time visual topology of the application:
- Nodes — Each service or resource (Lambda, DynamoDB, external HTTP)
- Edges — Request flow between nodes with latency and error stats
- Colour coding — Green (successful calls), yellow (client errors, 4xx), red (server faults, 5xx), purple (throttling, 429)
- Drill-down — Select a node to view traces, latency distribution, and error details
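As an illustration of what the map aggregates, the sketch below groups individual calls into edges with an average latency and a problem rate, and maps a status code to a node colour. It assumes the usual X-Ray classification (4xx as errors, 5xx as faults, 429 as throttling, shown in purple); the function names and the input tuple shape are hypothetical and unrelated to the X-Ray API.

```python
from collections import defaultdict


def status_colour(status):
    """Map an HTTP status code to the colour used for that outcome."""
    if status == 429:
        return "purple"  # throttling
    if 500 <= status <= 599:
        return "red"     # server fault
    if 400 <= status <= 499:
        return "yellow"  # client error
    return "green"       # successful call


def summarise_edges(calls):
    """calls: iterable of (source, target, latency_seconds, http_status)."""
    stats = defaultdict(lambda: {"count": 0, "total_latency": 0.0, "problems": 0})
    for source, target, latency, status in calls:
        edge = stats[(source, target)]
        edge["count"] += 1
        edge["total_latency"] += latency
        if status >= 400:  # errors, faults, and throttles all count as problems
            edge["problems"] += 1
    return {
        key: {
            "count": e["count"],
            "avg_latency": e["total_latency"] / e["count"],
            "problem_rate": e["problems"] / e["count"],
        }
        for key, e in stats.items()
    }


calls = [
    ("api-gateway", "lambda", 0.25, 200),
    ("api-gateway", "lambda", 0.75, 500),
    ("lambda", "dynamodb", 0.125, 200),
]
edges = summarise_edges(calls)
```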
X-Ray vs CloudWatch
| Aspect | X-Ray | CloudWatch |
|---|---|---|
| Focus | Distributed tracing (request-level) | Metrics, logs, alarms (resource-level) |
| Question answered | "Where is the bottleneck in this request?" | "How is this resource performing?" |
| Granularity | Per-request, per-service | Aggregate metrics over time |
| Visualisation | Service map + trace timeline | Dashboards + metric graphs |
| Complementary use | Debug specific slow requests | Monitor overall health and set alarms |
Common Use Cases
- Latency analysis — Identify which downstream service or database query is causing slow response times.
- Error root cause — Trace a failed request through multiple microservices to find the exact failing component.
- Dependency mapping — Visualise all service-to-service interactions in a microservice architecture.
- Performance baseline — Establish normal latency distributions and detect regressions.
- Cold start impact — Measure Lambda cold start duration as a distinct subsegment in the trace.
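The latency-analysis use case comes down to walking a trace and comparing durations. A minimal sketch, assuming a segment is available as a dictionary with `name`, `start_time`, `end_time`, and nested `subsegments` (the sample data and helper names are hypothetical):

```python
def iter_units(node, path=()):
    """Yield (path, duration_seconds) for a node and all nested subsegments."""
    here = path + (node["name"],)
    yield " > ".join(here), node["end_time"] - node["start_time"]
    for child in node.get("subsegments", []):
        yield from iter_units(child, here)


def slowest_subsegment(segment):
    """Return (path, duration) of the longest subsegment, or None if there are none."""
    candidates = [
        item
        for child in segment.get("subsegments", [])
        for item in iter_units(child, (segment["name"],))
    ]
    return max(candidates, key=lambda item: item[1]) if candidates else None


segment = {
    "name": "checkout",
    "start_time": 100.0,
    "end_time": 101.0,
    "subsegments": [
        {"name": "DynamoDB", "start_time": 100.0, "end_time": 100.125},
        {"name": "payments-api", "start_time": 100.25, "end_time": 101.0},
    ],
}
print(slowest_subsegment(segment))
```

Here the external `payments-api` call accounts for 0.75 s of the 1 s request, which is the kind of finding the trace timeline surfaces visually.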
SAA/SAP Exam Tips
SAA Tip: "Debug latency in a microservice application" or "trace requests across services" → AWS X-Ray. CloudWatch is for metrics/logs; X-Ray is for distributed tracing.
SAP Tip: X-Ray sampling rules control cost — the default is 1 req/s + 5% of additional requests. Adjust sampling for high-traffic services to avoid excessive tracing costs.
Cross-Cloud Equivalents
| Provider | Service / Solution | Notes |
|---|---|---|
| AWS | AWS X-Ray | Baseline |
| Azure | Azure Application Insights (distributed tracing) | Part of Azure Monitor; richer APM features |
| GCP | Google Cloud Trace | Distributed tracing with latency analysis |
| Self-managed / Third-party | Jaeger, Zipkin, Datadog APM, New Relic | Open-source (Jaeger, Zipkin) or SaaS (Datadog APM, New Relic) tracing platforms |
Pricing Model
| Dimension | Unit | Notes |
|---|---|---|
| Traces recorded | Per million traces | First 100,000 traces/month free |
| Traces retrieved | Per million traces | First 1 M retrievals/month free |
| Traces scanned | Per million traces | For trace summary and analytics queries |
| X-Ray Insights | Per Insight generated | Automated anomaly detection (additional charge) |
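A back-of-the-envelope estimate can be sketched from the first two rows. The free-tier allowances come from the table; the per-million prices are passed in as parameters because they vary and should be taken from the current pricing page. The values used in the example call are placeholders, not quoted prices, and `monthly_cost` is a hypothetical helper that ignores scanned traces and Insights.

```python
def monthly_cost(recorded, retrieved, price_recorded, price_retrieved,
                 free_recorded=100_000, free_retrieved=1_000_000):
    """Estimate monthly cost from trace counts and per-million prices."""
    # Only usage above the monthly free allowance is billable.
    billable_recorded = max(recorded - free_recorded, 0)
    billable_retrieved = max(retrieved - free_retrieved, 0)
    return (billable_recorded / 1_000_000 * price_recorded
            + billable_retrieved / 1_000_000 * price_retrieved)


# Placeholder prices per million traces; substitute the published values.
estimate = monthly_cost(
    recorded=1_100_000,
    retrieved=500_000,
    price_recorded=5.0,
    price_retrieved=0.5,
)
```

With these placeholder inputs, only the one million recorded traces above the free allowance are billed, and the retrievals stay within the free tier.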
Related Services / See Also
- Amazon CloudWatch — metrics, logs, and alarms (complementary to X-Ray tracing)
- AWS Lambda — enable active tracing for automatic X-Ray instrumentation
- Amazon API Gateway — enable X-Ray tracing at the stage level
- Amazon ECS and EKS — run X-Ray daemon as a sidecar container