Amazon Timestream
Amazon Timestream — AWS's purpose-built serverless time-series database. Covers time-series data model, storage tiers, query engine, and cross-cloud equivalents.
Overview
Amazon Timestream is AWS's fully managed, serverless time-series database — optimized for storing and querying data where time is the primary axis, such as IoT sensor readings, application metrics, server telemetry, and financial tick data.
What is Time-Series Data?
Time-series data is a sequence of values recorded at successive points in time. Every record has three components:
[ timestamp ]      [ dimension(s) ]       [ measure(s) ]
10:00:01           server="web-01"        cpu=72.4
10:00:01           server="web-02"        cpu=45.1
10:00:02           server="web-01"        cpu=73.1

| Component | Description | Example |
|---|---|---|
| Timestamp | When the data point was recorded | 2024-01-15 10:00:01.000 |
| Dimension | Metadata that identifies the source — does not change frequently | server="web-01", region="us-east-1" |
| Measure | The actual value being recorded — changes every interval | cpu=72.4, temperature=21.3 |
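In code, the three components map naturally onto a small record type. A minimal Python sketch, illustrative only — the class and field names here are ours, not an AWS API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class TimeSeriesRecord:
    """One time-series data point: when, from where, and what was measured."""
    timestamp: datetime            # when the point was recorded
    dimensions: dict[str, str]     # identifies the source; rarely changes
    measures: dict[str, float]     # the measured values; change every interval

record = TimeSeriesRecord(
    timestamp=datetime(2024, 1, 15, 10, 0, 1, tzinfo=timezone.utc),
    dimensions={"server": "web-01", "region": "us-east-1"},
    measures={"cpu": 72.4},
)
print(record.dimensions["server"], record.measures["cpu"])  # web-01 72.4
```

Note the split: dimensions have low cardinality per source and identify the series, while measures are the per-interval payload.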
Why not just use a regular relational database?
A general-purpose database can store time-series data, but time-series workloads have characteristics that make general-purpose storage inefficient:
| Characteristic | Impact on a regular DB | Timestream's approach |
|---|---|---|
| Write volume is extremely high (millions of points/second) | Table grows unboundedly; INSERT performance degrades | Append-only ingestion; no indexes on writes |
| Queries almost always filter by time range | Full table scans or manual partitioning by date needed | Time is a native, first-class query dimension |
| Recent data is hot; old data is cold | Developer manages partitioning + archival manually | Automatic two-tier storage (memory → S3) |
| Data is naturally ordered by time | B-tree indexes waste space on monotonically increasing timestamps | Columnar storage optimized for time ordering |
Architecture
Two-Tier Automatic Storage
Timestream automatically manages a two-tier storage model — retention policies are configured and data moves between tiers automatically:
Write          Memory Store                                 Magnetic Store
  │            recent, hot data                             older, cold data
  └──────▶     (in-memory)      ──auto-moves after TTL──▶   (S3-backed columnar)
               ms latency                                   ms–seconds latency
               hours to days                                months to years

| Tier | Storage Type | Query Latency | Typical Retention | Cost |
|---|---|---|---|---|
| Memory Store | In-memory | Milliseconds | Hours to days | Higher |
| Magnetic Store | S3-backed columnar | Milliseconds to seconds | Months to years | Much lower |
Each table is configured with a memory store retention (e.g. 24 hours) and a magnetic store retention (e.g. 1 year). Data older than the memory retention threshold is automatically moved to magnetic storage; no manual partitioning or archival jobs are needed.
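Setting those retentions is a single API call at table creation. A hedged sketch using boto3's timestream-write client — the `RetentionProperties` keys are the actual Timestream API field names, while the database and table names are placeholders:

```python
# Build the retention configuration for a Timestream table.
# The two keys below are Timestream API field names; 24 h / 365 d are
# example values matching the text above.
def retention_properties(memory_hours: int, magnetic_days: int) -> dict:
    return {
        "MemoryStoreRetentionPeriodInHours": memory_hours,
        "MagneticStoreRetentionPeriodInDays": magnetic_days,
    }

props = retention_properties(memory_hours=24, magnetic_days=365)

# With boto3 (requires AWS credentials; not executed in this sketch):
# import boto3
# client = boto3.client("timestream-write")
# client.create_table(
#     DatabaseName="infrastructure",
#     TableName="server_metrics",
#     RetentionProperties=props,
# )
```

Once set, the tier transition is entirely Timestream's job — there is no archival code to write.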
SAA/SAP Tip: The two-tier model is the key exam differentiator for Timestream. Recent data stays fast and expensive (memory); historical data becomes cheap and slightly slower (magnetic). This maps directly to the concept of hot/warm/cold storage tiering.
Serverless Scaling
- No cluster to provision — Timestream automatically scales read and write throughput
- Billing is based on data written, data stored (per tier), and data queried
- No capacity planning required
Data Model
Timestream organizes data into databases → tables. Each table stores time-series records.
A record must have:
- A timestamp (nanosecond precision)
- One or more dimensions (string key-value pairs identifying the source)
- One or more measures (the numeric or string values being tracked)
Database: "infrastructure"
  Table: "server_metrics"
    Record:
      time:       2024-01-15 10:00:01.000000000
      dimensions: [server="web-01", region="us-east-1", az="us-east-1a"]
      measures:   [cpu_utilization=72.4, memory_used_gb=6.2, disk_io_ops=1240]

Dimensions are automatically indexed. Measures are stored as columnar data optimized for range queries over time.
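A record like the one above maps onto the shape the WriteRecords API expects. A sketch assuming boto3's timestream-write client; single-measure records only (multi-measure records also exist but are omitted here), with the example's dimension and measure names:

```python
import time

def make_record(dimensions: dict, measure_name: str, value: float) -> dict:
    """Build one Timestream record in the shape write_records expects."""
    return {
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "MeasureName": measure_name,
        "MeasureValue": str(value),              # values are sent as strings
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),    # epoch milliseconds
        "TimeUnit": "MILLISECONDS",
    }

record = make_record(
    {"server": "web-01", "region": "us-east-1", "az": "us-east-1a"},
    "cpu_utilization",
    72.4,
)

# Sending it (requires AWS credentials; not executed in this sketch):
# import boto3
# boto3.client("timestream-write").write_records(
#     DatabaseName="infrastructure",
#     TableName="server_metrics",
#     Records=[record],
# )
```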
Query Engine
Timestream uses a purpose-built SQL dialect with time-series specific functions that would be complex to write in standard SQL:
-- Average CPU per server over the last hour, in 5-minute buckets
SELECT server,
       bin(time, 5m) AS time_bucket,
       avg(cpu_utilization) AS avg_cpu
FROM "infrastructure"."server_metrics"
WHERE time BETWEEN ago(1h) AND now()
GROUP BY server, bin(time, 5m)
ORDER BY server, time_bucket;

Built-in time-series functions:
| Function | What It Does |
|---|---|
| ago(duration) | Returns a timestamp N time units in the past; e.g. ago(1h) |
| bin(time, interval) | Groups timestamps into fixed-size buckets (downsampling) |
| interpolate_linear() | Fills gaps in sparse time-series with linear interpolation |
| derivative() | Computes rate of change between consecutive data points |
| smooth() | Applies moving average to reduce noise |
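What bin() does can be reproduced locally: floor each timestamp to the start of a fixed-width bucket, then aggregate within the bucket. A Python sketch of 5-minute downsampling — our own helper for illustration, not the Timestream engine:

```python
from collections import defaultdict
from datetime import datetime, timezone

def bin_ts(ts: datetime, interval_s: int) -> datetime:
    """Floor a timestamp to its bucket start, like SQL bin(time, interval)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % interval_s, tz=timezone.utc)

points = [  # (time, cpu) samples for one server
    (datetime(2024, 1, 15, 10, 0, 1, tzinfo=timezone.utc), 72.4),
    (datetime(2024, 1, 15, 10, 2, 30, tzinfo=timezone.utc), 73.1),
    (datetime(2024, 1, 15, 10, 6, 0, tzinfo=timezone.utc), 45.1),
]

buckets = defaultdict(list)
for ts, cpu in points:
    buckets[bin_ts(ts, 300)].append(cpu)   # 300 s = 5-minute buckets

avg_cpu = {b: sum(v) / len(v) for b, v in buckets.items()}
# The 10:00 bucket averages the first two samples; 10:05 holds the third.
```

This is the same grouping the sample query performs with `GROUP BY server, bin(time, 5m)`, done server-side over columnar data instead of in application code.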
Common Use Cases
| Use Case | Example |
|---|---|
| IoT telemetry | Temperature, pressure, vibration from factory sensors every 100ms |
| Infrastructure monitoring | CPU, memory, network from thousands of EC2 instances every minute |
| Application metrics | Request latency, error rates, queue depth from microservices |
| Financial tick data | Stock prices, trade volumes at millisecond granularity |
| DevOps / observability | Custom application metrics feeding dashboards (pairs with Amazon Managed Grafana) |
Timestream vs. Other AWS Databases
| Scenario | Use |
|---|---|
| Store millions of sensor readings per second, query by time range | Amazon Timestream |
| Store user profile data, session records | Amazon DynamoDB |
| Store orders, transactions | Amazon RDS / Aurora |
| Store large historical datasets for BI reports | Amazon Redshift |
| Store server logs for ad-hoc analysis | Amazon S3 + Athena |
Exam Trap: Timestream is not a general-purpose database. Do not use it for relational data, document storage, or workloads where time is not the primary query dimension. The exam will present IoT or metrics scenarios specifically to test whether the candidate knows Timestream exists.
SAA/SAP Tip: Any exam scenario mentioning IoT sensor data, DevOps metrics, time-series, or monitoring data at high write volume that needs efficient time-range queries → Amazon Timestream.
Integration with AWS Services
| Service | How It Integrates |
|---|---|
| AWS IoT Core | Route IoT device messages directly to Timestream via IoT Rules |
| Amazon Kinesis Data Streams | Stream high-throughput events into Timestream via Lambda |
| Amazon Managed Grafana | Native Timestream data source; visualize metrics as dashboards |
| Amazon SageMaker | Query Timestream for ML feature engineering on time-series data |
| AWS Lambda | Serverless ingestion layer between event sources and Timestream |
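The Kinesis → Lambda → Timestream path in the table amounts to decoding each stream record and reshaping it into a Timestream write. A hedged handler sketch — the event shape is the standard Kinesis Lambda trigger payload, but the JSON fields (`server`, `cpu`) are an assumed producer format:

```python
import base64
import json
import time

def to_timestream_record(payload: dict) -> dict:
    """Reshape one decoded event into a Timestream record.
    Assumes the producer sends {"server": ..., "cpu": ...} JSON."""
    return {
        "Dimensions": [{"Name": "server", "Value": payload["server"]}],
        "MeasureName": "cpu_utilization",
        "MeasureValue": str(payload["cpu"]),
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),
        "TimeUnit": "MILLISECONDS",
    }

def handler(event, context):
    # Kinesis delivers data base64-encoded under record["kinesis"]["data"].
    records = [
        to_timestream_record(json.loads(base64.b64decode(r["kinesis"]["data"])))
        for r in event["Records"]
    ]
    # boto3.client("timestream-write").write_records(
    #     DatabaseName="infrastructure", TableName="server_metrics",
    #     Records=records)   # requires AWS credentials; commented in this sketch
    return {"written": len(records)}
```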
Cross-Cloud Equivalents
| Provider | Service / Solution | Notes |
|---|---|---|
| AWS | Amazon Timestream | Baseline purpose-built time-series DB |
| Azure | Azure Data Explorer (ADX) | More powerful and flexible; used for logs + time-series; steeper learning curve |
| GCP | Google Cloud Bigtable / BigQuery | Bigtable for high-write time-series; BigQuery for analytical queries; neither is purpose-built |
| On-Premises | InfluxDB / TimescaleDB / Prometheus | InfluxDB = purpose-built time-series OSS; TimescaleDB = PostgreSQL extension; Prometheus = metrics only, no long-term storage |
Pricing Model
- Writes: per million time-series data points written
- Memory store: per GB-hour stored
- Magnetic store: per GB-month stored (significantly cheaper than memory)
- Queries: per GB of data scanned
- No charge for the server/cluster — serverless
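The billing dimensions above can be combined into a back-of-envelope estimate. The rates below are placeholder numbers, not real AWS prices — only the structure of the formula is the point:

```python
def monthly_estimate(
    writes_millions: float,   # million data points written per month
    memory_gb: float,         # average GB resident in the memory store
    magnetic_gb: float,       # GB resident in the magnetic store
    query_gb_scanned: float,  # GB scanned by queries per month
    rates: dict,              # per-unit rates (hypothetical placeholders)
) -> float:
    hours_per_month = 730
    return (
        writes_millions * rates["per_million_writes"]
        + memory_gb * hours_per_month * rates["memory_gb_hour"]
        + magnetic_gb * rates["magnetic_gb_month"]
        + query_gb_scanned * rates["query_gb"]
    )

# Placeholder rates purely to exercise the formula:
rates = {"per_million_writes": 0.5, "memory_gb_hour": 0.036,
         "magnetic_gb_month": 0.03, "query_gb": 0.01}
cost = monthly_estimate(100, 10, 500, 200, rates)
```

The memory-store term dominates quickly because it is billed per GB-hour, which is why keeping the memory retention window short matters for cost.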
Related Services / See Also
- Amazon Kinesis and Managed Flink — stream IoT events into Timestream in real time
- Amazon DynamoDB and DocumentDB — NoSQL for non-time-series workloads
- Database Performance Fundamentals — OLTP/OLAP workload types
- Amazon Managed Grafana — visualization layer for Timestream metrics
Amazon RDS and Aurora
Amazon Relational Database Service (RDS) and Amazon Aurora — managed relational databases on AWS. Covers deployment options, Multi-AZ HA, Read Replicas, Aurora architecture, and cross-cloud equivalents.
Amazon API Gateway
Managed API front door — create, publish, and secure REST, HTTP, and WebSocket APIs at any scale with throttling, caching, and authorization.