AWS Auto Scaling

Automatic capacity management — target tracking, step, scheduled, and predictive scaling policies for EC2, ECS, DynamoDB, and more.

Overview

AWS Auto Scaling automatically adjusts compute capacity to maintain performance and minimise cost — it launches instances when demand increases and terminates them when demand drops.

Auto Scaling operates at two levels: EC2 Auto Scaling (manages EC2 instance fleets within Auto Scaling Groups) and AWS Application Auto Scaling (scales ECS services, DynamoDB tables, Aurora replicas, and other resources).

Core Concepts

Concept	Description
Auto Scaling Group (ASG)	Collection of EC2 instances managed as a unit with min, max, and desired capacity settings
Launch Template	Defines the instance configuration (AMI, instance type, key pair, security groups, user data)
Desired Capacity	The number of instances the ASG tries to maintain at any given time
Minimum / Maximum	Hard bounds on the number of instances; ASG never scales below min or above max
Scaling Policy	Rules that define when and how to add or remove instances
Cooldown Period	Wait time after a simple scaling action before another can start, preventing rapid oscillation (target tracking and step scaling use instance warm-up instead)
Health Check	EC2 status checks or ELB health checks determine if an instance should be replaced
Capacity Provider	(ECS) Links an ASG to an ECS cluster for managed scaling of EC2-backed tasks
Warm Pool	Pre-initialised instances kept stopped, hibernated, or running, which can enter service faster than newly launched instances

Scaling Policy Types

Policy Type	How It Works	Use Case
Target Tracking	Adjusts capacity to keep a metric at a target value (e.g., CPU at 50%)	Most workloads — simple and self-tuning
Step Scaling	Adds/removes capacity in steps based on CloudWatch alarm thresholds	Different scale-out rates for different alarm levels
Simple Scaling	Single scaling adjustment per alarm; waits for cooldown	Legacy — prefer Target Tracking or Step
Scheduled Scaling	Sets desired capacity at predefined times (cron-like)	Predictable traffic patterns (business hours, events)
Predictive Scaling	Uses ML to forecast demand and pre-provisions capacity ahead of time	Cyclical traffic patterns; pairs with dynamic policies

SAA/SAP Tip: Target Tracking is the recommended default. It is self-correcting, adds and removes capacity automatically, and requires minimal configuration. Step Scaling is useful only when fine-grained, multi-threshold responses are needed.
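The dynamic policy types can be sketched in plain Python to show how they differ. The proportional formula below is a simplified approximation of how target tracking behaves, not AWS's internal algorithm, and the step ranges use absolute metric values for readability, whereas real step adjustments are defined as offsets from the alarm threshold. The last function builds keyword arguments in the shape of boto3's `put_scaling_policy` for a CPU target; the group and policy names are hypothetical and nothing is sent to AWS.

```python
import math


def clamp(desired: int, min_size: int, max_size: int) -> int:
    """Keep a proposed capacity inside the ASG's hard min/max bounds."""
    return max(min_size, min(max_size, desired))


def target_tracking_capacity(current: int, metric: float, target: float,
                             min_size: int, max_size: int) -> int:
    """Approximate target tracking: resize so the per-instance metric lands near the target.

    Assumes the metric (e.g. average CPU %) scales inversely with instance count,
    and rounds up so the group errs toward availability.
    """
    proposed = math.ceil(current * metric / target)
    return clamp(proposed, min_size, max_size)


def step_adjustment(metric: float, steps: list[tuple[float, float, int]]) -> int:
    """Return the capacity change of the first step whose [lower, upper) range holds the metric."""
    for lower, upper, change in steps:
        if lower <= metric < upper:
            return change
    return 0  # metric is outside every step: no change


def target_tracking_policy_request(asg_name: str, target_cpu: float) -> dict:
    """Keyword arguments for boto3 autoscaling put_scaling_policy (built here, not sent)."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
            "TargetValue": target_cpu,
        },
    }


if __name__ == "__main__":
    # 4 instances averaging 75% CPU against a 50% target.
    print(target_tracking_capacity(4, 75.0, 50.0, min_size=2, max_size=10))
    steps = [(70.0, 85.0, 1), (85.0, 95.0, 2), (95.0, math.inf, 4)]
    print(step_adjustment(90.0, steps))
```

In the first call, 4 instances at 75% against a 50% target gives 4 × 75 / 50 = 6 instances; the `clamp` step is what enforces the min/max bounds from the Core Concepts table.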

How Auto Scaling Works

CloudWatch Alarm (CPU > 70%)
         │
         ▼
   Scaling Policy
         │
         ▼
Auto Scaling Group
├── min: 2, desired: 3, max: 10
│
├── Launch Template → new EC2 instance
│
├── Health Check (EC2 / ELB)
│   └── Unhealthy → terminate + replace
│
└── AZ Rebalancing
    └── Distributes instances evenly across AZs

Scaling Process

CloudWatch detects a metric breach, or a scheduled action or predictive forecast triggers scaling
The scaling policy determines the adjustment (add or remove N instances)
The ASG launches or terminates instances; for simple scaling policies, it first waits for any cooldown to elapse
New instances register with the target group (if ELB-integrated)
ELB health checks validate the new instances before routing traffic
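To make the cooldown step concrete, here is a small simulation of a simple scaling policy. It is a simplification under stated assumptions: the cooldown clock starts the moment the adjustment is applied, instance launch time is ignored, and the `SimpleScalingGroup` class is hypothetical rather than part of any AWS SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleScalingGroup:
    """Toy model of an ASG driven by a simple scaling policy (hypothetical class)."""
    desired: int
    min_size: int
    max_size: int
    cooldown: int = 300                    # seconds; matches the ASG default cooldown
    last_activity: Optional[float] = None  # time of the last applied adjustment

    def on_alarm(self, now: float, adjustment: int) -> bool:
        """Handle one alarm; return True if desired capacity changed."""
        if self.last_activity is not None and now - self.last_activity < self.cooldown:
            return False  # still cooling down, so the alarm is ignored
        new_desired = max(self.min_size, min(self.max_size, self.desired + adjustment))
        if new_desired == self.desired:
            return False  # already at min or max: nothing to launch or terminate
        self.desired = new_desired
        self.last_activity = now  # start the cooldown clock
        return True


if __name__ == "__main__":
    group = SimpleScalingGroup(desired=3, min_size=2, max_size=10)
    for t in (0, 120, 300, 700):
        changed = group.on_alarm(now=t, adjustment=2)
        print(t, changed, group.desired)
```

Alarms arriving during the cooldown are dropped rather than queued, so the group only reacts again once the previous adjustment has had time to show up in the metric.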

Termination Policies

When scaling in, the ASG decides which instances to terminate. Default behaviour:

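Select the AZ with the most instances that also has at least one instance not protected from scale-in (this keeps AZs balanced)
Within that AZ, prefer the unprotected instances that use the oldest launch template or launch configuration
If several remain, terminate the one closest to the next billing hour; if still tied, choose one at random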

Policy	Behaviour
Default	AZ rebalance → oldest launch template/configuration → closest to next billing hour
OldestInstance	Terminate the oldest running instance
NewestInstance	Terminate the newest running instance
OldestLaunchTemplate	Terminate the instance using the oldest launch template
ClosestToNextInstanceHour	Terminate the instance closest to the next billing hour
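The default termination sequence can be expressed as a chain of filters. This sketch assumes each instance record carries an AZ, a numeric age for its launch template or configuration, and the minutes until its next billing hour; those fields and the `Instance` class are hypothetical, and AZ ties are broken alphabetically here, which may differ from what AWS does.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    instance_id: str
    az: str
    template_age: int          # hypothetical: larger = older launch template/configuration
    minutes_to_next_hour: int  # hypothetical: minutes until the next billing hour
    protected: bool = False    # scale-in protection


def choose_instance_to_terminate(instances: List[Instance],
                                 rng: Optional[random.Random] = None) -> Optional[Instance]:
    """Default policy: busiest AZ, oldest template, closest billing hour, then random."""
    rng = rng or random.Random()
    per_az = Counter(i.az for i in instances)
    # Only AZs with at least one unprotected instance are candidates.
    eligible_azs = sorted({i.az for i in instances if not i.protected})
    if not eligible_azs:
        return None
    az = max(eligible_azs, key=lambda a: per_az[a])  # ties: first alphabetically
    pool = [i for i in instances if i.az == az and not i.protected]
    oldest = max(i.template_age for i in pool)
    pool = [i for i in pool if i.template_age == oldest]
    closest = min(i.minutes_to_next_hour for i in pool)
    pool = [i for i in pool if i.minutes_to_next_hour == closest]
    return rng.choice(pool)


if __name__ == "__main__":
    fleet = [
        Instance("i-a1", "us-east-1a", 1, 40),
        Instance("i-a2", "us-east-1a", 2, 50),
        Instance("i-a3", "us-east-1a", 2, 10),
        Instance("i-b1", "us-east-1b", 3, 5),
    ]
    print(choose_instance_to_terminate(fleet).instance_id)
```

In this fleet, `i-b1` has the oldest template but survives, because AZ balancing is applied before template age.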

Mixed Instances Policy

An ASG can combine multiple instance types and purchase options in a single group.

Setting	Description
On-Demand base capacity	Minimum number of On-Demand instances (handles baseline load)
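On-Demand percentage above base	Percentage of capacity beyond the On-Demand base that is fulfilled by On-Demand (the rest is Spot)
Spot allocation strategy	Options include `lowest-price`, `capacity-optimized`, `capacity-optimized-prioritized`, and `price-capacity-optimized`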
Instance type overrides	List of instance types the ASG can launch (for flexibility)

SAA/SAP Tip: Use a mixed instances policy with the price-capacity-optimized Spot allocation strategy for the best balance of cost and availability. Set an On-Demand base to cover the minimum steady-state load.
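The On-Demand/Spot split implied by these settings can be computed directly. The sketch fills the On-Demand base first and then applies the percentage to the remainder; rounding the On-Demand share up is an assumption, so check the EC2 Auto Scaling documentation for the exact rounding rule. The second function builds a `MixedInstancesPolicy` value in the shape accepted by boto3's `create_auto_scaling_group`; the template name and instance types are placeholders.

```python
import math


def on_demand_spot_split(desired: int, od_base: int, od_pct_above_base: int) -> tuple[int, int]:
    """Return (on_demand, spot) instance counts for a mixed instances ASG.

    The On-Demand base is filled first; od_pct_above_base percent of the remaining
    capacity is On-Demand and the rest is Spot. Rounding up is an assumption.
    """
    base = min(desired, od_base)
    above_base = desired - base
    on_demand = base + math.ceil(above_base * od_pct_above_base / 100)
    return on_demand, desired - on_demand


def mixed_instances_policy(template_name: str, instance_types: list[str],
                           od_base: int, od_pct_above_base: int) -> dict:
    """MixedInstancesPolicy value for boto3 create_auto_scaling_group (not sent)."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {"LaunchTemplateName": template_name,
                                            "Version": "$Latest"},
            # Several interchangeable types give the Spot allocator more capacity pools.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": od_base,
            "OnDemandPercentageAboveBaseCapacity": od_pct_above_base,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


if __name__ == "__main__":
    print(on_demand_spot_split(desired=10, od_base=2, od_pct_above_base=25))
    print(mixed_instances_policy("web-template", ["m5.large", "m5a.large", "m6i.large"], 2, 25))
```

With 10 instances, a base of 2, and 25% above base, 4 instances run On-Demand and 6 run on Spot.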

Application Auto Scaling

Beyond EC2, AWS Application Auto Scaling supports these resources:

Resource	Scalable Dimension	Common Metric
ECS Service	Desired task count	CPU/memory utilisation, request count
DynamoDB Table/GSI	Read/write capacity units	Consumed capacity / provisioned
Aurora Replicas	Number of read replicas	Average connections
Lambda (Provisioned)	Provisioned concurrency	Utilisation percentage
SageMaker Endpoint	Instance count	Invocations per instance
Spot Fleet	Target capacity	CPU utilisation
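Every resource in the table is scaled through the same two Application Auto Scaling calls: `register_scalable_target` declares the min/max bounds, and `put_scaling_policy` attaches a policy. The sketch builds keyword arguments for both calls on boto3's `application-autoscaling` client without sending them. The dimension and predefined-metric pairs are one reasonable choice per resource (Aurora could equally track reader CPU, for example), and the cluster and service names in the usage are hypothetical.

```python
# (scalable dimension, predefined target-tracking metric) per service namespace.
SCALABLE_TARGETS = {
    "ecs": ("ecs:service:DesiredCount", "ECSServiceAverageCPUUtilization"),
    "dynamodb": ("dynamodb:table:WriteCapacityUnits", "DynamoDBWriteCapacityUtilization"),
    "rds": ("rds:cluster:ReadReplicaCount", "RDSReaderAverageDatabaseConnections"),
    "lambda": ("lambda:function:ProvisionedConcurrency", "LambdaProvisionedConcurrencyUtilization"),
    "sagemaker": ("sagemaker:variant:DesiredInstanceCount", "SageMakerVariantInvocationsPerInstance"),
    "ec2": ("ec2:spot-fleet-request:TargetCapacity", "EC2SpotFleetRequestAverageCPUUtilization"),
}


def target_tracking_requests(namespace: str, resource_id: str, min_cap: int,
                             max_cap: int, target: float) -> tuple[dict, dict]:
    """Keyword arguments for register_scalable_target and put_scaling_policy (not sent)."""
    dimension, metric = SCALABLE_TARGETS[namespace]
    target_ref = {"ServiceNamespace": namespace, "ResourceId": resource_id,
                  "ScalableDimension": dimension}
    register = {**target_ref, "MinCapacity": min_cap, "MaxCapacity": max_cap}
    policy = {
        **target_ref,
        "PolicyName": f"{namespace}-target-tracking",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target,
            "PredefinedMetricSpecification": {"PredefinedMetricType": metric},
        },
    }
    return register, policy


if __name__ == "__main__":
    # Hypothetical ECS cluster "prod" and service "web" (ResourceId: service/<cluster>/<service>).
    register, policy = target_tracking_requests("ecs", "service/prod/web", 2, 20, 50.0)
    # With boto3 installed and credentials configured:
    #   client = boto3.client("application-autoscaling")
    #   client.register_scalable_target(**register)
    #   client.put_scaling_policy(**policy)
    print(register)
    print(policy)
```

Only the namespace, resource ID, and dimension change between resources, which is why the same target tracking approach carries over from ECS to DynamoDB (resource IDs like `table/<name>`) and the other services.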

Warm Pools

Warm pools keep pre-initialised instances in a stopped, hibernated, or running state, reducing scale-out time for applications with long boot or initialisation times.

State	Behaviour	Cost
Stopped	Instance is stopped; EBS volumes persist; faster than launching from the AMI	EBS charges only
Hibernated	RAM state preserved on the encrypted EBS root volume	EBS charges, including storage for the saved RAM contents
Running	Fully running in the pool but not yet in service; fastest to bring into service	Full EC2 charges
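A warm pool is attached to an existing ASG with the `put_warm_pool` call. The helper below builds its keyword arguments for boto3's `autoscaling` client and rejects pool states other than the three in the table; the group name and sizes are placeholders.

```python
VALID_POOL_STATES = ("Stopped", "Hibernated", "Running")


def warm_pool_request(asg_name: str, pool_state: str = "Stopped",
                      min_size: int = 0, reuse_on_scale_in: bool = False) -> dict:
    """Keyword arguments for boto3 autoscaling put_warm_pool (built here, not sent)."""
    if pool_state not in VALID_POOL_STATES:
        raise ValueError(f"pool_state must be one of {VALID_POOL_STATES}, got {pool_state!r}")
    return {
        "AutoScalingGroupName": asg_name,
        "PoolState": pool_state,  # determines the cost profile listed in the table
        "MinSize": min_size,      # minimum number of instances kept in the pool
        # When True, scaled-in instances return to the pool instead of being terminated.
        "InstanceReusePolicy": {"ReuseOnScaleIn": reuse_on_scale_in},
    }


if __name__ == "__main__":
    print(warm_pool_request("web-asg", pool_state="Stopped", min_size=2))
```

Stopped is the cheapest state in the table; Running trades full EC2 charges for the quickest transition into service.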

Common Use Cases

Web application tier — Scale web servers behind an ALB based on request count or CPU utilisation.
Batch processing — Scale workers based on SQS queue depth (custom CloudWatch metric) to drain queues cost-efficiently.
Scheduled scaling — Pre-provision capacity before a known traffic event (product launch, sale).
Cost optimisation — Scale in during off-hours with scheduled policies; use Spot with mixed instances.
Microservices (ECS) — Scale ECS service task count using Application Auto Scaling with target tracking on CPU.
Database capacity — Auto scale DynamoDB provisioned capacity to handle traffic spikes without manual intervention.
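For the scheduled-scaling and off-hours use cases, a pair of recurring actions is often enough. This sketch builds keyword arguments for boto3's `put_scheduled_update_group_action`; the action names, times, capacities, and time zone are illustrative, and the desired capacities must fall within the group's min and max.

```python
def business_hours_schedule(asg_name: str, peak: int, off_peak: int,
                            time_zone: str = "UTC") -> list[dict]:
    """Two recurring scheduled actions: scale out weekday mornings, scale in weekday evenings."""
    common = {"AutoScalingGroupName": asg_name, "TimeZone": time_zone}
    return [
        {**common, "ScheduledActionName": "weekday-scale-out",
         "Recurrence": "0 8 * * 1-5",   # 08:00 Monday to Friday (cron syntax)
         "DesiredCapacity": peak},
        {**common, "ScheduledActionName": "weekday-scale-in",
         "Recurrence": "0 20 * * 1-5",  # 20:00 Monday to Friday
         "DesiredCapacity": off_peak},
    ]


if __name__ == "__main__":
    # With boto3: for action in ...: boto3.client("autoscaling").put_scheduled_update_group_action(**action)
    for action in business_hours_schedule("web-asg", peak=8, off_peak=2):
        print(action)
```

Dynamic policies can still run alongside these actions, so the schedule sets a sensible baseline while target tracking handles deviations from it.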

SAA/SAP Exam Tips

SAA/SAP Tip: For "SQS queue depth" scaling, create a custom CloudWatch metric for backlog per instance (ApproximateNumberOfMessagesVisible divided by the number of InService instances) and use Target Tracking with that metric. This is a common exam pattern.
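The arithmetic behind this pattern is short. The metric is backlog per instance, and the target value is the backlog one instance can clear within an acceptable latency, i.e. acceptable latency divided by the average processing time per message. The function names below are hypothetical helpers; publishing the metric to CloudWatch is left out.

```python
import math


def backlog_per_instance(visible_messages: int, in_service_instances: int) -> float:
    """Metric to publish: ApproximateNumberOfMessagesVisible / InService instance count."""
    return visible_messages / max(in_service_instances, 1)  # avoid dividing by zero at 0 instances


def acceptable_backlog(max_latency_s: float, processing_time_s: float) -> float:
    """Target value: messages one instance can absorb within the latency budget."""
    return max_latency_s / processing_time_s


def instances_needed(visible_messages: int, max_latency_s: float, processing_time_s: float) -> int:
    """Fleet size at which backlog per instance falls to the target value."""
    return math.ceil(visible_messages / acceptable_backlog(max_latency_s, processing_time_s))


if __name__ == "__main__":
    print(backlog_per_instance(1500, 10))
    print(acceptable_backlog(60.0, 0.5))
    print(instances_needed(1500, 60.0, 0.5))
```

With 1,500 visible messages and 10 InService instances the backlog is 150 per instance; a 60-second latency budget at 0.5 s per message gives a target of 120, so target tracking would grow the fleet towards 13 instances. Raw queue depth alone would not work as well, because it does not fall as instances are added until the queue actually drains.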

Exam Trap: Auto Scaling health checks default to EC2 status checks only. To replace instances that fail ELB health checks, explicitly enable ELB health checks on the ASG — this is a frequently tested configuration.
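The difference between the two health check types comes down to which signals can mark an instance for replacement. The first function is a simplified, hypothetical model of that decision that ignores the grace period; the second builds boto3 `update_auto_scaling_group` keyword arguments that switch an existing group to ELB health checks, with a placeholder group name and grace period.

```python
def should_replace(ec2_status_ok: bool, elb_health_ok: bool, health_check_type: str) -> bool:
    """Simplified replacement decision (ignores the health check grace period)."""
    if not ec2_status_ok:
        return True  # EC2 status check failures count for every health check type
    # ELB results only matter once the group's HealthCheckType is "ELB".
    return health_check_type == "ELB" and not elb_health_ok


def enable_elb_health_checks(asg_name: str, grace_period_s: int = 300) -> dict:
    """Keyword arguments for boto3 autoscaling update_auto_scaling_group (not sent)."""
    return {
        "AutoScalingGroupName": asg_name,
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": grace_period_s,  # seconds before health checks start counting
    }


if __name__ == "__main__":
    # An instance whose app is hung but whose EC2 status checks still pass:
    print(should_replace(True, False, "EC2"), should_replace(True, False, "ELB"))
```

The hung-application case is exactly the trap: with the default EC2 type the instance keeps running while the load balancer stops sending it traffic.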

SAA/SAP Tip: Predictive Scaling uses ML to forecast traffic and pre-provisions instances before demand arrives. It pairs well with dynamic scaling for handling both predicted and unexpected spikes.
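A predictive policy is attached with the same `put_scaling_policy` call as a dynamic one, using `PolicyType` set to `PredictiveScaling`. The sketch builds keyword arguments for a CPU-based forecast; starting in `ForecastOnly` mode lets you review forecasts before the policy acts, and the target, buffer time, and names are illustrative assumptions.

```python
def predictive_policy_request(asg_name: str, target_cpu: float = 50.0,
                              forecast_only: bool = True) -> dict:
    """Keyword arguments for boto3 autoscaling put_scaling_policy (built here, not sent)."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-predictive-cpu",  # hypothetical naming scheme
        "PolicyType": "PredictiveScaling",
        "PredictiveScalingConfiguration": {
            "MetricSpecifications": [{
                "TargetValue": target_cpu,
                # Pairs the load metric with the matching scaling metric for CPU.
                "PredefinedMetricPairSpecification": {"PredefinedMetricType": "ASGCPUUtilization"},
            }],
            # ForecastOnly produces forecasts without acting; ForecastAndScale acts on them.
            "Mode": "ForecastOnly" if forecast_only else "ForecastAndScale",
            "SchedulingBufferTime": 300,  # seconds to launch ahead of forecast demand
        },
    }


if __name__ == "__main__":
    print(predictive_policy_request("web-asg", forecast_only=False))
```

Keeping a target tracking policy on the same group covers spikes the forecast does not anticipate.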

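Exam Trap: The cooldown period (default 300 seconds) applies to simple scaling policies: it prevents the ASG from launching or terminating additional instances before the effects of the previous scaling activity are visible. Setting it too low causes flapping. Target tracking and step scaling policies rely on instance warm-up rather than cooldowns.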

Cross-Cloud Equivalents

Provider	Service / Solution	Notes
AWS	AWS Auto Scaling	Baseline
Azure	Azure Virtual Machine Scale Sets / Azure Autoscale	Separate autoscale for App Service, VMSS, etc.
GCP	Google Cloud Managed Instance Groups (MIG) autoscaler	Integrated with load balancing
On-Premises	Kubernetes Horizontal Pod Autoscaler (HPA), VMware DRS	Requires manual cluster capacity planning

Pricing Model

Dimension	Unit	Notes
Auto Scaling	Free	No charge for the Auto Scaling service itself
EC2 instances	Per use	Standard EC2 pricing for launched instances (On-Demand, Spot, RI)
CloudWatch alarms	Per alarm/month	Custom metrics and alarms may incur CloudWatch charges
Warm pool instances	Per use	EBS charges while stopped; full charges while running

Related Services

Amazon EC2 — instances managed by Auto Scaling Groups
Elastic Load Balancing — distributes traffic to scaled instances
Amazon CloudWatch — metrics and alarms that trigger scaling policies
Amazon ECS and EKS — container services with Application Auto Scaling
AWS Lambda — inherent auto scaling with no capacity management
