Xoxoftware - XOXO Creative Studio | Web & Mobile App Development | Fred Cheung | Hong Kong

AWS Auto Scaling

Automatic capacity management — target tracking, step, scheduled, and predictive scaling policies for EC2, ECS, DynamoDB, and more.

Overview

AWS Auto Scaling automatically adjusts compute capacity to maintain performance and minimise cost — it launches instances when demand increases and terminates them when demand drops.

Auto Scaling operates at two levels: EC2 Auto Scaling (manages EC2 instance fleets within Auto Scaling Groups) and AWS Application Auto Scaling (scales ECS services, DynamoDB tables, Aurora replicas, and other resources).


Core Concepts

| Concept | Description |
| --- | --- |
| Auto Scaling Group (ASG) | Collection of EC2 instances managed as a unit with min, max, and desired capacity settings |
| Launch Template | Defines the instance configuration (AMI, instance type, key pair, security groups, user data) |
| Desired Capacity | The number of instances the ASG tries to maintain at any given time |
| Minimum / Maximum | Hard bounds on the number of instances; ASG never scales below min or above max |
| Scaling Policy | Rules that define when and how to add or remove instances |
| Cooldown Period | Wait time after a scaling action before another can occur; prevents rapid oscillation |
| Health Check | EC2 status checks or ELB health checks determine if an instance should be replaced |
| Capacity Provider | (ECS) Links an ASG to an ECS cluster for managed scaling of EC2-backed tasks |
| Warm Pool | Pre-initialised stopped/hibernated instances that can launch faster than cold instances |

Scaling Policy Types

| Policy Type | How It Works | Use Case |
| --- | --- | --- |
| Target Tracking | Adjusts capacity to keep a metric at a target value (e.g., CPU at 50%) | Most workloads; simple and self-tuning |
| Step Scaling | Adds/removes capacity in steps based on CloudWatch alarm thresholds | Different scale-out rates for different alarm levels |
| Simple Scaling | Single scaling adjustment per alarm; waits for cooldown | Legacy; prefer Target Tracking or Step |
| Scheduled Scaling | Sets desired capacity at predefined times (cron-like) | Predictable traffic patterns (business hours, events) |
| Predictive Scaling | Uses ML to forecast demand and pre-provisions capacity ahead of time | Cyclical traffic patterns; pairs with dynamic policies |
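To make the Step Scaling row concrete, here is a minimal Python sketch of how a step could be chosen from the size of an alarm breach. The thresholds, step sizes, and the name `step_adjustment` are illustrative assumptions, not values or code from AWS; the bounds are modelled as offsets from the alarm threshold, which is how step adjustment intervals are expressed.

```python
from typing import List, Optional, Tuple

# (lower_bound, upper_bound, adjustment). Bounds are offsets from the alarm
# threshold; an upper bound of None means "no upper limit".
# These example values are assumptions for illustration only.
Step = Tuple[float, Optional[float], int]

SCALE_OUT_STEPS: List[Step] = [
    (0.0, 10.0, 1),   # breach of 0 up to (not including) 10 points: add 1
    (10.0, 20.0, 2),  # breach of 10 up to (not including) 20 points: add 2
    (20.0, None, 4),  # breach of 20 points or more: add 4
]


def step_adjustment(metric: float, threshold: float, steps: List[Step]) -> int:
    """Return the capacity change for the step whose interval contains the breach."""
    breach = metric - threshold
    for lower, upper, adjustment in steps:
        if breach >= lower and (upper is None or breach < upper):
            return adjustment
    return 0  # metric is below the threshold, so no scale-out step applies
```

With a 70% CPU alarm threshold, a reading of 85% falls in the second interval and would add two instances, while a reading of 95% would add four.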

SAA/SAP Tip: Target Tracking is the recommended default. It is self-correcting, adds and removes capacity automatically, and requires minimal configuration. Step Scaling is useful only when fine-grained, multi-threshold responses are needed.
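The self-correcting behaviour of Target Tracking can be approximated with a proportional rule. The sketch below is an assumption for intuition only: AWS does not publish the exact algorithm, and the function name `target_tracking_capacity` is hypothetical. It assumes the metric is a fleet average (such as average CPU) that falls as instances are added.

```python
import math


def target_tracking_capacity(current: int, metric: float, target: float,
                             minimum: int, maximum: int) -> int:
    """Estimate the capacity that would bring an averaged metric back to target."""
    if current <= 0 or target <= 0:
        raise ValueError("current capacity and target must be positive")
    # Scale capacity in proportion to how far the metric is from the target,
    # rounding up so the fleet is never sized below the estimate.
    proposed = math.ceil(current * metric / target)
    # The group's min/max bounds always win over the policy's proposal.
    return max(minimum, min(maximum, proposed))
```

For example, four instances averaging 75% CPU against a 50% target would be resized to six under this rule, and the same fleet at 25% CPU would shrink to the minimum of two.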


How Auto Scaling Works

CloudWatch Alarm (CPU > 70%)
        │
        ▼
   Scaling Policy
        │
        ▼
Auto Scaling Group
├── min: 2, desired: 3, max: 10
├── Launch Template → new EC2 instance
├── Health Check (EC2 / ELB)
│   └── Unhealthy → terminate + replace
└── AZ Rebalancing
    └── Distributes instances evenly across AZs

Scaling Process

  1. CloudWatch detects a metric breach (or schedule triggers, or predictive forecast fires)
  2. Scaling policy evaluates and determines adjustment (add/remove N instances)
  3. If cooldown has elapsed, ASG launches or terminates instances
  4. New instances register with the target group (if ELB-integrated)
  5. ELB health checks validate the new instances before routing traffic
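Steps 2 and 3 above can be sketched as a small simulation. This is a simplified model under stated assumptions, not the service's implementation: the `Group` class and `apply_scaling` function are hypothetical names, time is passed in as plain seconds, and only the cooldown check and the min/max bounds are modelled.

```python
from dataclasses import dataclass


@dataclass
class Group:
    minimum: int
    maximum: int
    desired: int
    cooldown_seconds: int = 300               # default cooldown from the text
    last_scaling_time: float = float("-inf")  # no scaling activity yet


def apply_scaling(group: Group, adjustment: int, now: float) -> bool:
    """Apply a policy's adjustment if the cooldown has elapsed.

    Returns True when the desired capacity actually changed.
    """
    if now - group.last_scaling_time < group.cooldown_seconds:
        return False  # still cooling down from the previous activity
    # Clamp the new desired capacity to the group's hard bounds.
    new_desired = max(group.minimum,
                      min(group.maximum, group.desired + adjustment))
    if new_desired == group.desired:
        return False  # already at a bound, nothing to launch or terminate
    group.desired = new_desired
    group.last_scaling_time = now
    return True
```

Starting from `Group(minimum=2, maximum=10, desired=3)`, a +2 adjustment at t=0 raises desired capacity to 5, a second +2 at t=100 is ignored because the 300-second cooldown has not elapsed, and a +10 at t=400 is capped at the maximum of 10.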

Termination Policies

When scaling in, the ASG decides which instances to terminate. Default behaviour:

  1. Select the AZ with the most instances (rebalance)
  2. Within that AZ, terminate the instance with the oldest Launch Template/Configuration
  3. If tied, terminate the instance closest to the next billing hour

| Policy | Behaviour |
| --- | --- |
| Default | AZ rebalance → oldest launch config → closest billing hour |
| OldestInstance | Terminate the oldest running instance |
| NewestInstance | Terminate the newest running instance |
| OldestLaunchTemplate | Terminate instance using the oldest launch template |
| ClosestToNextInstanceHour | Terminate instance nearest to billing cycle end |
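The default selection order described above can be expressed as a short Python sketch. The `Instance` fields and the `pick_for_termination` function are hypothetical names for illustration. One simplification is an assumption of this sketch: ties between equally loaded AZs are broken alphabetically here so the result is deterministic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    instance_id: str
    az: str
    template_version: int         # lower number = older launch template
    seconds_to_billing_hour: int  # time remaining until the next billing hour


def pick_for_termination(instances: List[Instance]) -> Instance:
    """Pick one instance using the default order: AZ, template age, billing hour."""
    if not instances:
        raise ValueError("no instances to choose from")
    # 1. Find the AZ with the most instances (alphabetical tie-break, an assumption).
    per_az = Counter(i.az for i in instances)
    busiest_az = max(sorted(per_az), key=lambda az: per_az[az])
    candidates = [i for i in instances if i.az == busiest_az]
    # 2. Prefer the oldest launch template; 3. then the nearest billing hour.
    return min(candidates,
               key=lambda i: (i.template_version, i.seconds_to_billing_hour))
```

Given three instances in one AZ and one in another, the function looks only at the larger AZ, narrows to the instances on the oldest template, and returns the one closest to its billing hour.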

Mixed Instances Policy

An ASG can combine multiple instance types and purchase options in a single group.

| Setting | Description |
| --- | --- |
| On-Demand base capacity | Minimum number of On-Demand instances (handles baseline load) |
| On-Demand percentage | Percentage of additional capacity fulfilled by On-Demand |
| Spot allocation strategy | lowest-price, capacity-optimized, or price-capacity-optimized |
| Instance type overrides | List of instance types the ASG can launch (for flexibility) |

SAA/SAP Tip: Use a mixed instances policy with the price-capacity-optimized Spot allocation strategy for the best balance of cost and availability. Set an On-Demand base capacity to cover the minimum steady-state load.
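As a worked example of how the On-Demand base and On-Demand percentage interact, the following Python sketch splits a desired capacity into On-Demand and Spot counts. The function name `split_capacity` is hypothetical, and rounding the On-Demand share up is an assumption of this sketch, so treat the output as an estimate.

```python
import math
from typing import Tuple


def split_capacity(desired: int, on_demand_base: int,
                   on_demand_pct_above_base: int) -> Tuple[int, int]:
    """Return (on_demand_count, spot_count) for a mixed instances group."""
    if not 0 <= on_demand_pct_above_base <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # The base is filled with On-Demand first, up to the desired capacity.
    base = min(desired, on_demand_base)
    above_base = desired - base
    # Capacity above the base is split by percentage (rounded up: assumption).
    extra_on_demand = math.ceil(above_base * on_demand_pct_above_base / 100)
    return base + extra_on_demand, above_base - extra_on_demand
```

With a desired capacity of 10, a base of 2, and 25% On-Demand above the base, the sketch yields 4 On-Demand and 6 Spot instances.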


Application Auto Scaling

Beyond EC2, AWS Application Auto Scaling supports these resources:

| Resource | Scalable Dimension | Common Metric |
| --- | --- | --- |
| ECS Service | Desired task count | CPU/memory utilisation, request count |
| DynamoDB Table/GSI | Read/write capacity units | Consumed capacity / provisioned |
| Aurora Replicas | Number of read replicas | Average connections |
| Lambda (Provisioned) | Provisioned concurrency | Utilisation percentage |
| SageMaker Endpoint | Instance count | Invocations per instance |
| Spot Fleet | Target capacity | CPU utilisation |
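Each row in the table corresponds to a service namespace and scalable dimension string when a resource is registered with Application Auto Scaling. The sketch below only builds the request parameters as a dictionary and makes no AWS call. The keys of `SCALABLE_TARGETS`, the helper name, and the example resource ID are hypothetical; the namespace and dimension strings are written from memory and should be checked against the current API reference before use.

```python
from typing import Dict, Tuple

# Hypothetical short names mapped to (ServiceNamespace, ScalableDimension).
SCALABLE_TARGETS: Dict[str, Tuple[str, str]] = {
    "ecs_service": ("ecs", "ecs:service:DesiredCount"),
    "dynamodb_table_read": ("dynamodb", "dynamodb:table:ReadCapacityUnits"),
    "dynamodb_table_write": ("dynamodb", "dynamodb:table:WriteCapacityUnits"),
    "aurora_replicas": ("rds", "rds:cluster:ReadReplicaCount"),
    "lambda_provisioned": ("lambda", "lambda:function:ProvisionedConcurrency"),
    "sagemaker_variant": ("sagemaker", "sagemaker:variant:DesiredInstanceCount"),
    "spot_fleet": ("ec2", "ec2:spot-fleet-request:TargetCapacity"),
}


def scalable_target_request(kind: str, resource_id: str,
                            min_capacity: int, max_capacity: int) -> dict:
    """Build the parameters for registering a scalable target."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    namespace, dimension = SCALABLE_TARGETS[kind]  # KeyError for unknown kinds
    return {
        "ServiceNamespace": namespace,
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
```

Assuming boto3 is installed and credentials are configured, the resulting dictionary could be passed as keyword arguments to the `register_scalable_target` call of an `application-autoscaling` client, for example with a resource ID shaped like `service/demo-cluster/demo-service` for an ECS service.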

Warm Pools

Warm pools keep pre-initialised instances in a stopped, hibernated, or running state, reducing scale-out time.

| State | Behaviour | Cost |
| --- | --- | --- |
| Stopped | Instance is stopped; EBS persists; faster than a cold launch from an AMI | EBS charges only |
| Hibernated | RAM state preserved on encrypted EBS root volume | EBS charges (slightly more) |
| Running | Fully running but not yet in service (warm-up period) | Full EC2 charges |

Common Use Cases

  • Web application tier — Scale web servers behind an ALB based on request count or CPU utilisation.
  • Batch processing — Scale workers based on SQS queue depth (custom CloudWatch metric) to drain queues cost-efficiently.
  • Scheduled scaling — Pre-provision capacity before a known traffic event (product launch, sale).
  • Cost optimisation — Scale in during off-hours with scheduled policies; use Spot with mixed instances.
  • Microservices (ECS) — Scale ECS service task count using Application Auto Scaling with target tracking on CPU.
  • Database capacity — Auto scale DynamoDB provisioned capacity to handle traffic spikes without manual intervention.

SAA/SAP Exam Tips

SAA/SAP Tip: For "SQS queue depth" scaling, create a custom CloudWatch metric (approximate messages visible / fleet size) and use Target Tracking with that metric. This is a common exam pattern.
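The backlog-per-instance metric from the tip above is simple arithmetic, sketched here in Python. The function names are hypothetical and the numbers in the usage note are made-up inputs. In practice the message count would come from the queue's approximate-visible-messages attribute and the result would be published as a custom CloudWatch metric; neither step is shown.

```python
import math


def backlog_per_instance(visible_messages: int, in_service_instances: int) -> float:
    """Custom metric: queued messages divided by fleet size."""
    # Guard against division by zero when the fleet is empty.
    return visible_messages / max(in_service_instances, 1)


def acceptable_backlog(max_latency_seconds: float,
                       avg_processing_seconds: float) -> float:
    """Target value: how many messages one instance can hold within the latency goal."""
    return max_latency_seconds / avg_processing_seconds


def instances_needed(visible_messages: int, max_latency_seconds: float,
                     avg_processing_seconds: float) -> int:
    """Fleet size that brings backlog per instance down to the target."""
    target = acceptable_backlog(max_latency_seconds, avg_processing_seconds)
    return math.ceil(visible_messages / target)
```

For instance, 1,500 visible messages on 10 instances is a backlog of 150 per instance. If each message takes 0.5 seconds and a 50-second wait is acceptable, the target is 100 per instance, so the fleet would need to grow to 15.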

Exam Trap: Auto Scaling health checks default to EC2 status checks only. To replace instances that fail ELB health checks, explicitly enable ELB health checks on the ASG — this is a frequently tested configuration.
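The trap above comes down to one conditional, shown in this minimal Python sketch. The function `should_replace` is a hypothetical model of the decision, not AWS code; `"EC2"` and `"ELB"` are the two health check type values discussed in the text.

```python
def should_replace(ec2_status_ok: bool, elb_healthy: bool,
                   health_check_type: str = "EC2") -> bool:
    """Decide whether the ASG would replace an instance."""
    if health_check_type not in ("EC2", "ELB"):
        raise ValueError("health_check_type must be 'EC2' or 'ELB'")
    # A failed EC2 status check always marks the instance unhealthy.
    if not ec2_status_ok:
        return True
    # ELB health check results only count when ELB checks are enabled on the ASG.
    return health_check_type == "ELB" and not elb_healthy
```

With the default type, an instance that passes its EC2 status checks but fails the load balancer's health check is left running; switching the type to `"ELB"` is what makes the same instance eligible for replacement.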

SAA/SAP Tip: Predictive Scaling uses ML to forecast traffic and pre-provisions instances before demand arrives. It pairs well with dynamic scaling for handling both predicted and unexpected spikes.

Exam Trap: The cooldown period (default 300 seconds) prevents the ASG from launching or terminating additional instances before the previous scaling activity's effect is visible. Setting it too low causes flapping.


Cross-Cloud Equivalents

| Provider | Service / Solution | Notes |
| --- | --- | --- |
| AWS | AWS Auto Scaling | Baseline |
| Azure | Azure Virtual Machine Scale Sets / Azure Autoscale | Separate autoscale for App Service, VMSS, etc. |
| GCP | Google Cloud Managed Instance Groups (MIG) autoscaler | Integrated with load balancing |
| On-Premises | Kubernetes Horizontal Pod Autoscaler (HPA), VMware DRS | Requires manual cluster capacity planning |

Pricing Model

| Dimension | Unit | Notes |
| --- | --- | --- |
| Auto Scaling | Free | No charge for the Auto Scaling service itself |
| EC2 instances | Per use | Standard EC2 pricing for launched instances (On-Demand, Spot, RI) |
| CloudWatch alarms | Per alarm/month | Custom metrics and alarms may incur CloudWatch charges |
| Warm pool instances | Per use | EBS charges while stopped; full charges while running |

Built by Fred Cheung @CookedRicer · Powered by Fumadocs & Github Copilot
