AWS Auto Scaling
Automatic capacity management — target tracking, step, scheduled, and predictive scaling policies for EC2, ECS, DynamoDB, and more.
Overview
AWS Auto Scaling automatically adjusts compute capacity to maintain performance and minimise cost — it launches instances when demand increases and terminates them when demand drops.
Auto Scaling operates at two levels: EC2 Auto Scaling (manages EC2 instance fleets within Auto Scaling Groups) and AWS Application Auto Scaling (scales ECS services, DynamoDB tables, Aurora replicas, and other resources).
Core Concepts
| Concept | Description |
|---|---|
| Auto Scaling Group (ASG) | Collection of EC2 instances managed as a unit with min, max, and desired capacity settings |
| Launch Template | Defines the instance configuration (AMI, instance type, key pair, security groups, user data) |
| Desired Capacity | The number of instances the ASG tries to maintain at any given time |
| Minimum / Maximum | Hard bounds on the number of instances; ASG never scales below min or above max |
| Scaling Policy | Rules that define when and how to add or remove instances |
| Cooldown Period | Wait time after a scaling action before another can occur — prevents rapid oscillation |
| Health Check | EC2 status checks or ELB health checks determine if an instance should be replaced |
| Capacity Provider | (ECS) Links an ASG to an ECS cluster for managed scaling of EC2-backed tasks |
| Warm Pool | Pre-initialised stopped/hibernated instances that can launch faster than cold instances |
Scaling Policy Types
| Policy Type | How It Works | Use Case |
|---|---|---|
| Target Tracking | Adjusts capacity to keep a metric at a target value (e.g., CPU at 50%) | Most workloads — simple and self-tuning |
| Step Scaling | Adds/removes capacity in steps based on CloudWatch alarm thresholds | Different scale-out rates for different alarm levels |
| Simple Scaling | Single scaling adjustment per alarm; waits for cooldown | Legacy — prefer Target Tracking or Step |
| Scheduled Scaling | Sets desired capacity at predefined times (cron-like) | Predictable traffic patterns (business hours, events) |
| Predictive Scaling | Uses ML to forecast demand and pre-provisions capacity ahead of time | Cyclical traffic patterns; pairs with dynamic policies |
SAA/SAP Tip: Target Tracking is the recommended default. It is self-correcting, adds and removes capacity automatically, and requires minimal configuration. Step Scaling is useful only when fine-grained, multi-threshold responses are needed.
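The proportional idea behind Target Tracking can be sketched in a few lines. AWS does not publish its exact algorithm, so the formula below is an assumption for illustration only: scale capacity by the ratio of observed metric to target, round up, then clamp to the group's bounds. `target_tracking_capacity` is a hypothetical helper, not an AWS API.

```python
import math


def target_tracking_capacity(current: int, metric: float, target: float,
                             min_size: int, max_size: int) -> int:
    """Estimate the capacity that would bring `metric` back to `target`.

    Assumes the metric falls as instances are added (e.g. average CPU).
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    # Proportional estimate, rounded up so the group is never under-provisioned.
    proposed = math.ceil(current * metric / target)
    # Min and max are hard bounds on the result.
    return max(min_size, min(max_size, proposed))


# 4 instances averaging 75% CPU against a 50% target.
print(target_tracking_capacity(4, 75.0, 50.0, min_size=2, max_size=10))  # 6
```

The same function covers scale-in: a metric below the target yields a smaller proposal, which is then held at the minimum size.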
How Auto Scaling Works
CloudWatch Alarm (CPU > 70%)
        │
        ▼
Scaling Policy
        │
        ▼
Auto Scaling Group
  ├── min: 2, desired: 3, max: 10
  │
  ├── Launch Template → new EC2 instance
  │
  ├── Health Check (EC2 / ELB)
  │     └── Unhealthy → terminate + replace
  │
  └── AZ Rebalancing
        └── Distributes instances evenly across AZs
Scaling Process
- CloudWatch detects a metric breach (or schedule triggers, or predictive forecast fires)
- Scaling policy evaluates and determines adjustment (add/remove N instances)
- If cooldown has elapsed, ASG launches or terminates instances
- New instances register with the target group (if ELB-integrated)
- ELB health checks validate the new instances before routing traffic
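The evaluation step in the process above can be sketched as a step-scaling lookup. This is a simplified model, not AWS's implementation: the `Step` class, the `evaluate` function, and the step sizes are hypothetical, and the example reuses the 70% alarm threshold and the min 2 / max 10 bounds from the diagram.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    lower: float            # breach above the alarm threshold, inclusive
    upper: Optional[float]  # exclusive upper bound; None means unbounded
    adjustment: int         # instances to add (positive) or remove (negative)


def evaluate(metric: float, threshold: float, steps: List[Step],
             desired: int, min_size: int, max_size: int) -> int:
    """Return the new desired capacity after one policy evaluation."""
    if metric <= threshold:
        return desired  # alarm not breached, nothing to do
    breach = metric - threshold
    for step in steps:
        if breach >= step.lower and (step.upper is None or breach < step.upper):
            # Apply the matching step, then clamp to the group's hard bounds.
            return max(min_size, min(max_size, desired + step.adjustment))
    return desired


# Hypothetical steps: small breach adds 1 instance, large breach adds 3.
steps = [Step(0, 15, 1), Step(15, None, 3)]
print(evaluate(78, 70, steps, desired=3, min_size=2, max_size=10))  # 4
print(evaluate(95, 70, steps, desired=9, min_size=2, max_size=10))  # 10 (capped at max)
```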
Termination Policies
When scaling in, the ASG decides which instances to terminate. Default behaviour:
- Select the AZ with the most instances (rebalance)
- Within that AZ, terminate the instance with the oldest Launch Template/Configuration
- If tied, terminate the instance closest to the next billing hour
| Policy | Behaviour |
|---|---|
| Default | AZ rebalance → oldest launch config → closest billing hour |
| OldestInstance | Terminate the oldest running instance |
| NewestInstance | Terminate the newest running instance |
| OldestLaunchTemplate | Terminate instance using the oldest launch template |
| ClosestToNextInstanceHour | Terminate instance nearest to billing cycle end |
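The default ordering described above can be modelled as three successive filters. This is a sketch under stated assumptions: the `Instance` fields and `pick_for_termination` are hypothetical, AZ ties keep every tied AZ as a candidate, and a final tie is broken by list order (the real service may choose differently).

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    instance_id: str
    az: str
    template_version: int         # lower number = older launch template version
    minutes_to_billing_hour: int  # time left until the next billing hour


def pick_for_termination(instances: List[Instance]) -> Instance:
    """Apply the default policy: busiest AZ, oldest template, closest billing hour."""
    if not instances:
        raise ValueError("no instances to choose from")
    # 1. Keep only instances in the AZ(s) holding the most instances.
    per_az = Counter(i.az for i in instances)
    busiest = max(per_az.values())
    candidates = [i for i in instances if per_az[i.az] == busiest]
    # 2. Among those, keep the ones on the oldest launch template version.
    oldest = min(i.template_version for i in candidates)
    candidates = [i for i in candidates if i.template_version == oldest]
    # 3. Break remaining ties by proximity to the next billing hour.
    return min(candidates, key=lambda i: i.minutes_to_billing_hour)


fleet = [
    Instance("i-a1", "us-east-1a", 2, 40),
    Instance("i-a2", "us-east-1a", 1, 50),
    Instance("i-a3", "us-east-1a", 1, 10),
    Instance("i-b1", "us-east-1b", 1, 5),
    Instance("i-b2", "us-east-1b", 2, 30),
]
print(pick_for_termination(fleet).instance_id)  # i-a3
```

Note that `i-b1` is closest to its billing hour overall, but it is never considered because `us-east-1a` holds more instances.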
Mixed Instances Policy
An ASG can combine multiple instance types and purchase options in a single group.
| Setting | Description |
|---|---|
| On-Demand base capacity | Minimum number of On-Demand instances (handles baseline load) |
| On-Demand percentage | Percentage of additional capacity fulfilled by On-Demand |
| Spot allocation strategy | lowest-price, capacity-optimized, or price-capacity-optimized |
| Instance type overrides | List of instance types the ASG can launch (for flexibility) |
SAA/SAP Tip: Use a mixed instances policy with price-capacity-optimized Spot allocation strategy for the best balance of cost and availability. Set an On-Demand base for minimum steady-state load.
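The interaction between On-Demand base capacity and On-Demand percentage can be worked through with a small calculation. The helper name `split_capacity` is hypothetical, and rounding the On-Demand share up is an assumption made here; the exact rounding AWS applies is not stated in this document.

```python
import math
from typing import Tuple


def split_capacity(desired: int, on_demand_base: int,
                   on_demand_pct_above_base: int) -> Tuple[int, int]:
    """Return (on_demand, spot) counts for a desired capacity."""
    # The base is filled with On-Demand first, up to the desired capacity.
    base = min(desired, on_demand_base)
    above_base = desired - base
    # Assumption: fractional On-Demand shares are rounded up.
    on_demand_above = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand = base + on_demand_above
    return on_demand, desired - on_demand


# Base of 2 On-Demand, then 25% On-Demand for the remaining 8 instances.
print(split_capacity(10, on_demand_base=2, on_demand_pct_above_base=25))  # (4, 6)
```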
Application Auto Scaling
Beyond EC2, AWS Application Auto Scaling supports these resources:
| Resource | Scalable Dimension | Common Metric |
|---|---|---|
| ECS Service | Desired task count | CPU/memory utilisation, request count |
| DynamoDB Table/GSI | Read/write capacity units | Consumed capacity / provisioned |
| Aurora Replicas | Number of read replicas | Average connections |
| Lambda (Provisioned) | Provisioned concurrency | Utilisation percentage |
| SageMaker Endpoint | Instance count | Invocations per instance |
| Spot Fleet | Target capacity | CPU utilisation |
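As an illustration of the ECS row, the sketch below builds the two request payloads that Application Auto Scaling expects: one to register the scalable target and one to attach a target tracking policy. It only constructs dictionaries and makes no AWS calls. The function name, policy name, and the cluster and service names are hypothetical; with boto3 the payloads would typically be passed to `register_scalable_target` and `put_scaling_policy` on an `application-autoscaling` client.

```python
from typing import Any, Dict, Tuple


def ecs_target_tracking_requests(cluster: str, service: str, min_tasks: int,
                                 max_tasks: int, cpu_target: float
                                 ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build (register_scalable_target, put_scaling_policy) keyword arguments."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    # Bounds on the task count, analogous to min/max on an ASG.
    register = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    policy = {
        **common,
        "PolicyName": f"{service}-cpu-target",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": cpu_target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
        },
    }
    return register, policy


register, policy = ecs_target_tracking_requests("demo-cluster", "web", 2, 10, 50.0)
print(register["ResourceId"])  # service/demo-cluster/web
```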
Warm Pools
Warm pools keep pre-initialised instances in a stopped or hibernated state, reducing scale-out time.
| State | Behaviour | Cost |
|---|---|---|
| Stopped | Instance is stopped; EBS volumes persist; starts faster than a cold launch from the AMI | EBS charges only |
| Hibernated | RAM state preserved on encrypted EBS root volume | EBS charges (slightly more) |
| Running | Fully running but not yet in service (warm-up period) | Full EC2 charges |
Common Use Cases
- Web application tier — Scale web servers behind an ALB based on request count or CPU utilisation.
- Batch processing — Scale workers based on SQS queue depth (custom CloudWatch metric) to drain queues cost-efficiently.
- Scheduled scaling — Pre-provision capacity before a known traffic event (product launch, sale).
- Cost optimisation — Scale in during off-hours with scheduled policies; use Spot with mixed instances.
- Microservices (ECS) — Scale ECS service task count using Application Auto Scaling with target tracking on CPU.
- Database capacity — Auto scale DynamoDB provisioned capacity to handle traffic spikes without manual intervention.
SAA/SAP Exam Tips
SAA/SAP Tip: For "SQS queue depth" scaling, create a custom CloudWatch metric (approximate messages visible / fleet size) and use Target Tracking with that metric. This is a common exam pattern.
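The queue-depth metric from the tip above is simple arithmetic. The sketch below computes backlog per instance (messages visible divided by fleet size) and, as an added assumption not stated in the tip, derives a target value from an acceptable latency and an average processing time. All function names and sample numbers are hypothetical.

```python
import math


def backlog_per_instance(visible_messages: int, in_service: int) -> float:
    """Custom metric: approximate messages visible divided by fleet size."""
    return visible_messages / max(in_service, 1)  # avoid dividing by zero


def acceptable_backlog(max_latency_s: float, avg_processing_s: float) -> float:
    """Target value: how many queued messages one instance can absorb in time."""
    return max_latency_s / avg_processing_s


def instances_needed(visible_messages: int, max_latency_s: float,
                     avg_processing_s: float) -> int:
    """Fleet size that brings backlog per instance down to the target."""
    return math.ceil(visible_messages / acceptable_backlog(max_latency_s, avg_processing_s))


# 1500 visible messages on 10 instances; 0.5 s per message, 50 s acceptable latency.
print(backlog_per_instance(1500, 10))   # 150.0
print(acceptable_backlog(50, 0.5))      # 100.0
print(instances_needed(1500, 50, 0.5))  # 15
```

Publishing `backlog_per_instance` as the custom metric and using the acceptable backlog as the target value lets Target Tracking grow the fleet from 10 towards 15 in this example.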
Exam Trap: Auto Scaling health checks default to EC2 status checks only. To replace instances that fail ELB health checks, explicitly enable ELB health checks on the ASG — this is a frequently tested configuration.
SAA/SAP Tip: Predictive Scaling uses ML to forecast traffic and pre-provisions instances before demand arrives. It pairs well with dynamic scaling for handling both predicted and unexpected spikes.
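Exam Trap: The cooldown period (default 300 seconds) prevents the ASG from launching or terminating additional instances before the previous scaling activity's effect is visible. Setting it too low causes flapping. The default cooldown applies to simple scaling policies; target tracking and step scaling rely on instance warm-up instead.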
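The cooldown behaviour can be modelled as a simple gate on scaling activities. This is a simplified sketch, not how the service is implemented: `CooldownGate` is a hypothetical class, timestamps are caller-supplied seconds, and the cooldown is assumed to start at the moment a scaling activity is accepted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownGate:
    cooldown_s: float = 300.0            # default cooldown from the text above
    last_activity_at: Optional[float] = None

    def try_scale(self, now: float) -> bool:
        """Return True and start a new cooldown if scaling is allowed at `now`."""
        in_cooldown = (self.last_activity_at is not None
                       and now - self.last_activity_at < self.cooldown_s)
        if in_cooldown:
            return False  # previous activity has not had time to take effect
        self.last_activity_at = now
        return True


gate = CooldownGate()
print(gate.try_scale(0))    # True: first activity is always allowed
print(gate.try_scale(120))  # False: still inside the 300 s cooldown
print(gate.try_scale(300))  # True: cooldown has elapsed
```

Lowering `cooldown_s` lets alarms that are still reacting to the old capacity trigger further adjustments, which is the flapping the trap warns about.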
Cross-Cloud Equivalents
| Provider | Service / Solution | Notes |
|---|---|---|
| AWS | AWS Auto Scaling | Baseline |
| Azure | Azure Virtual Machine Scale Sets / Azure Autoscale | Separate autoscale for App Service, VMSS, etc. |
| GCP | Google Cloud Managed Instance Groups (MIG) autoscaler | Integrated with load balancing |
| On-Premises | Kubernetes Horizontal Pod Autoscaler (HPA), VMware DRS | Requires manual cluster capacity planning |
Pricing Model
| Dimension | Unit | Notes |
|---|---|---|
| Auto Scaling | Free | No charge for the Auto Scaling service itself |
| EC2 instances | Per use | Standard EC2 pricing for launched instances (On-Demand, Spot, RI) |
| CloudWatch alarms | Per alarm/month | Custom metrics and alarms may incur CloudWatch charges |
| Warm pool instances | Per use | EBS charges while stopped; full charges while running |
Related Services / See Also
- Amazon EC2 — instances managed by Auto Scaling Groups
- Elastic Load Balancing — distributes traffic to scaled instances
- Amazon CloudWatch — metrics and alarms that trigger scaling policies
- Amazon ECS and EKS — container services; ECS services scale through Application Auto Scaling, while EKS relies on Kubernetes-native autoscalers
- AWS Lambda — inherent auto scaling with no capacity management