
Amazon Timestream

Amazon Timestream — AWS's purpose-built serverless time-series database. Covers time-series data model, storage tiers, query engine, and cross-cloud equivalents.

Overview

Amazon Timestream is AWS's fully managed, serverless time-series database — optimized for storing and querying data where time is the primary axis, such as IoT sensor readings, application metrics, server telemetry, and financial tick data.


What is Time-Series Data?

Time-series data is a sequence of values recorded at successive points in time. Every record has three components:

[ timestamp ]  [ dimension(s) ]  [ measure(s) ]
  10:00:01       server="web-01"   cpu=72.4
  10:00:01       server="web-02"   cpu=45.1
  10:00:02       server="web-01"   cpu=73.1
| Component | Description | Example |
| --- | --- | --- |
| Timestamp | When the data point was recorded | 2024-01-15 10:00:01.000 |
| Dimension | Metadata that identifies the source; does not change frequently | server="web-01", region="us-east-1" |
| Measure | The actual value being recorded; changes every interval | cpu=72.4, temperature=21.3 |

Why not just use a regular relational database?

A general-purpose database can store time-series data, but time-series workloads have characteristics that make that approach inefficient:

| Characteristic | Impact on a regular DB | Timestream's approach |
| --- | --- | --- |
| Write volume is extremely high (millions of points/second) | Table grows unboundedly; INSERT performance degrades | Append-only ingestion; no indexes on writes |
| Queries almost always filter by time range | Full table scans or manual partitioning by date needed | Time is a native, first-class query dimension |
| Recent data is hot; old data is cold | Developer manages partitioning + archival manually | Automatic two-tier storage (memory → S3) |
| Data is naturally ordered by time | B-tree indexes waste space on monotonically increasing timestamps | Columnar storage optimized for time ordering |

Architecture

Two-Tier Automatic Storage

Timestream automatically manages a two-tier storage model — retention policies are configured and data moves between tiers automatically:

Write                     Memory Store                    Magnetic Store
  │       recent, hot data    │        older, cold data        │
  └──────▶  (in-memory)       │  ──auto-moves after TTL──▶  (S3-backed columnar)
            ms latency        │                               ms–seconds latency
            hours to days     │                               months to years
| Tier | Storage Type | Query Latency | Typical Retention | Cost |
| --- | --- | --- | --- | --- |
| Memory Store | In-memory | Milliseconds | Hours to days | Higher |
| Magnetic Store | S3-backed columnar | Milliseconds to seconds | Months to years | Much lower |

Each table is configured with a memory store retention (e.g. 24 hours) and a magnetic store retention (e.g. 1 year). Data older than the memory retention threshold is automatically moved to magnetic storage; no manual partitioning or archival jobs are needed.
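In boto3, these per-table retention periods are set via the RetentionProperties parameter of the timestream-write CreateTable call. A minimal sketch (the database and table names are the illustrative ones used below; the actual call requires AWS credentials, so it is shown commented out):

```python
def retention_properties(memory_hours: int, magnetic_days: int) -> dict:
    """Build the RetentionProperties payload for the CreateTable API."""
    return {
        "MemoryStoreRetentionPeriodInHours": memory_hours,
        "MagneticStoreRetentionPeriodInDays": magnetic_days,
    }

# import boto3
# client = boto3.client("timestream-write")
# client.create_table(
#     DatabaseName="infrastructure",
#     TableName="server_metrics",
#     RetentionProperties=retention_properties(24, 365),  # 24h hot, 1y cold
# )
```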

SAA/SAP Tip: The two-tier model is the key exam differentiator for Timestream. Recent data stays fast and expensive (memory); historical data becomes cheap and slightly slower (magnetic). This maps directly to the concept of hot/warm/cold storage tiering.

Serverless Scaling

  • No cluster to provision — Timestream automatically scales read and write throughput
  • Billing is based on data written, data stored (per tier), and data queried
  • No capacity planning required

Data Model

Timestream organizes data into databases, which contain tables. Each table stores time-series records.

A record must have:

  • A timestamp (nanosecond precision)
  • One or more dimensions (string key-value pairs identifying the source)
  • One or more measures (the numeric or string values being tracked)
Database: "infrastructure"
  Table: "server_metrics"
    Record:
      time:       2024-01-15 10:00:01.000000000
      dimensions: [server="web-01", region="us-east-1", az="us-east-1a"]
      measures:   [cpu_utilization=72.4, memory_used_gb=6.2, disk_io_ops=1240]

Dimensions are automatically indexed. Measures are stored as columnar data optimized for range queries over time.
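A record like the one above maps onto the timestream-write WriteRecords format. A sketch of a helper that builds one single-measure record (the payload shape follows the API; the commented call is a hypothetical usage):

```python
import time

def make_record(dimensions: dict, measure_name: str, value: float) -> dict:
    """Build one Timestream record in WriteRecords format (single measure)."""
    return {
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "MeasureName": measure_name,
        "MeasureValue": str(value),          # values are sent as strings
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),  # epoch milliseconds
        "TimeUnit": "MILLISECONDS",
    }

record = make_record({"server": "web-01", "region": "us-east-1"},
                     "cpu_utilization", 72.4)
# boto3.client("timestream-write").write_records(
#     DatabaseName="infrastructure", TableName="server_metrics",
#     Records=[record])
```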


Query Engine

Timestream uses a purpose-built SQL dialect with time-series specific functions that would be complex to write in standard SQL:

-- Average CPU per server over the last hour, in 5-minute buckets
SELECT server,
       bin(time, 5m) AS time_bucket,
       avg(cpu_utilization) AS avg_cpu
FROM "infrastructure"."server_metrics"
WHERE time BETWEEN ago(1h) AND now()
GROUP BY server, bin(time, 5m)
ORDER BY server, time_bucket;
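The same query can be run programmatically with the timestream-query client, which returns paginated results. A sketch, assuming the example database and table exist (the boto3 call needs AWS credentials, so it is shown commented out):

```python
QUERY = """
SELECT server, bin(time, 5m) AS time_bucket, avg(cpu_utilization) AS avg_cpu
FROM "infrastructure"."server_metrics"
WHERE time BETWEEN ago(1h) AND now()
GROUP BY server, bin(time, 5m)
ORDER BY server, time_bucket
"""

def run_query(client, query: str):
    """Yield each result row as a list of scalar values, across all pages."""
    paginator = client.get_paginator("query")
    for page in paginator.paginate(QueryString=query):
        for row in page["Rows"]:
            yield [datum.get("ScalarValue") for datum in row["Data"]]

# import boto3
# for row in run_query(boto3.client("timestream-query"), QUERY):
#     print(row)  # e.g. ['web-01', '2024-01-15 10:00:00.000000000', '72.4']
```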

Built-in time-series functions:

| Function | What It Does |
| --- | --- |
| ago(duration) | Returns a timestamp N time units in the past; e.g. ago(1h) |
| bin(time, interval) | Groups timestamps into fixed-size buckets (downsampling) |
| interpolate_linear() | Fills gaps in sparse time-series with linear interpolation |
| derivative() | Computes rate of change between consecutive data points |
| smooth() | Applies moving average to reduce noise |

Common Use Cases

| Use Case | Example |
| --- | --- |
| IoT telemetry | Temperature, pressure, vibration from factory sensors every 100ms |
| Infrastructure monitoring | CPU, memory, network from thousands of EC2 instances every minute |
| Application metrics | Request latency, error rates, queue depth from microservices |
| Financial tick data | Stock prices, trade volumes at millisecond granularity |
| DevOps / observability | Custom application metrics feeding dashboards (pairs with Amazon Managed Grafana) |

Timestream vs. Other AWS Databases

| Scenario | Use |
| --- | --- |
| Store millions of sensor readings per second, query by time range | Amazon Timestream |
| Store user profile data, session records | Amazon DynamoDB |
| Store orders, transactions | Amazon RDS / Aurora |
| Store large historical datasets for BI reports | Amazon Redshift |
| Store server logs for ad-hoc analysis | Amazon S3 + Athena |

Exam Trap: Timestream is not a general-purpose database. Do not use it for relational data, document storage, or workloads where time is not the primary query dimension. The exam will present IoT or metrics scenarios specifically to test whether the exam-taker knows Timestream exists.

SAA/SAP Tip: Any exam scenario mentioning IoT sensor data, DevOps metrics, time-series, or monitoring data at high write volume that needs efficient time-range queries → Amazon Timestream.


Integration with AWS Services

| Service | How It Integrates |
| --- | --- |
| AWS IoT Core | Route IoT device messages directly to Timestream via IoT Rules |
| Amazon Kinesis Data Streams | Stream high-throughput events into Timestream via Lambda |
| Amazon Managed Grafana | Native Timestream data source; visualize metrics as dashboards |
| Amazon SageMaker | Query Timestream for ML feature engineering on time-series data |
| AWS Lambda | Serverless ingestion layer between event sources and Timestream |
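As one concrete instance of the Kinesis + Lambda ingestion pattern, a handler sketch that decodes Kinesis records and reshapes them into WriteRecords format. The JSON payload shape ({"server", "cpu", "ts_ms"}) and all names are assumptions for illustration:

```python
import base64
import json

def kinesis_to_records(event: dict) -> list:
    """Decode base64 JSON Kinesis payloads into Timestream records."""
    records = []
    for r in event["Records"]:
        payload = json.loads(base64.b64decode(r["kinesis"]["data"]))
        records.append({
            "Dimensions": [{"Name": "server", "Value": payload["server"]}],
            "MeasureName": "cpu_utilization",
            "MeasureValue": str(payload["cpu"]),
            "MeasureValueType": "DOUBLE",
            "Time": str(payload["ts_ms"]),   # producer-supplied epoch ms
            "TimeUnit": "MILLISECONDS",
        })
    return records

def handler(event, context):
    """Lambda entry point (sketch); the write call is commented out."""
    # boto3.client("timestream-write").write_records(
    #     DatabaseName="infrastructure", TableName="server_metrics",
    #     Records=kinesis_to_records(event))
    return {"written": len(event["Records"])}
```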

Cross-Cloud Equivalents

| Provider | Service / Solution | Notes |
| --- | --- | --- |
| AWS | Amazon Timestream | Baseline purpose-built time-series DB |
| Azure | Azure Data Explorer (ADX) | More powerful and flexible; used for logs + time-series; steeper learning curve |
| GCP | Google Cloud Bigtable / BigQuery | Bigtable for high-write time-series; BigQuery for analytical queries; neither is purpose-built |
| On-Premises | InfluxDB / TimescaleDB / Prometheus | InfluxDB = purpose-built time-series OSS; TimescaleDB = PostgreSQL extension; Prometheus = metrics only, no long-term storage |

Pricing Model

  • Writes: per million time-series data points written
  • Memory store: per GB-hour stored
  • Magnetic store: per GB-month stored (significantly cheaper than memory)
  • Queries: per GB of data scanned
  • No charge for the server/cluster — serverless
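These four billing dimensions combine into a simple back-of-envelope estimate. A sketch with placeholder unit prices — these are NOT real AWS rates; substitute current figures from the Timestream pricing page:

```python
def monthly_cost(points_millions: float, mem_gb: float,
                 mag_gb: float, query_gb: float,
                 write_rate=0.50,    # $/million points written (placeholder)
                 mem_rate=0.036,     # $/GB-hour in memory store (placeholder)
                 mag_rate=0.03,      # $/GB-month in magnetic store (placeholder)
                 query_rate=0.01):   # $/GB scanned by queries (placeholder)
    hours_per_month = 730  # approximate
    return (points_millions * write_rate
            + mem_gb * hours_per_month * mem_rate  # memory billed per GB-hour
            + mag_gb * mag_rate                    # magnetic billed per GB-month
            + query_gb * query_rate)
```

The estimate makes the tiering trade-off concrete: a GB kept in the memory store for a month costs hours_per_month times the hourly rate, while the same GB in magnetic storage is billed once per month.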

Built by Fred Cheung @CookedRicer · Powered by Fumadocs & Github Copilot
