CloudFormation and auto scaling, from the concepts to the details
Before you start
Not every environment is ready for auto scaling. To fully utilize these techniques, your infrastructure must comply with the following:
- No local state on the disks of application servers.
- Redeployable images: an AMI for an EC2 Auto Scaling group or Spot Fleet, or Docker images in the case of ECS. If you want something to get started with, I recommend the “hello world” Docker image linked below to get familiar with this way of working.
Horizontal vs Vertical scaling
The basic definition which can be applied in almost all cases:
Vertical scaling: you increase the compute power of the node your code runs on. This type of scaling is pretty limited and should be avoided if possible.
Horizontal scaling: you increase the number of nodes handling your traffic while keeping the per-node size small. This increases your resilience as your traffic grows, and if a node fails the impact is lower. This is the type of scaling we will take a look at.
Why would you add automated scaling
The answer most people will give is “to handle peak workloads”, but this is not the most important part of it. After thinking it over, I came to these two key reasons why auto scaling is the way to go:
- Predictability
- Flexibility
Predictability example: When your application in its most minimal, predictable and stable form can handle a maximum of X req/s, adding more nodes will theoretically give you N·X req/s maximum throughput. Of course, other bottlenecks will appear, like database throughput; hold that thought, as I will go into detail about automatically scaling your database later in this article.
Predictability example: As your traffic increases, you automatically add more fault tolerance; scaling out with small instances decreases the impact of a node failure. Node failure is a given: it will happen, you just don’t know when. If you don’t have automated scaling, you might turn off a failing node and put its load permanently on the rest of the cluster until the node is fixed.
Flexibility example: If you optimize your application, you can easily change the machine size. Since you are already adding and removing machines regularly, this is nothing out of the ordinary and could happen many times a day. This makes machine-size selection based on trial-and-measure iterations shorter and leads to faster results.
Flexibility example: As your application and requirements change, the need may arise to deploy to a different part of the world for lower latency to your end users, or to migrate to a new location for whatever reason. It is pretty easy to add a new auto scaling resource in a different location and reroute your traffic to it.
What to scale
You can scale many parts of your infrastructure; in this write-up I focus on the following:
- ECS
- EC2
- RDS
Types of scaling triggers
Step-based scaling
You define a set threshold; if it is crossed, a scaling action is taken.
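As a minimal sketch, a step-based policy on an EC2 Auto Scaling group could look like the following CloudFormation fragment. The resource names (myStepScalingPolicy, myASGroup) are hypothetical, and the CloudWatch alarm that crosses the threshold and triggers this policy is defined separately:

```yaml
myStepScalingPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref myASGroup
    PolicyType: StepScaling
    AdjustmentType: ChangeInCapacity
    StepAdjustments:
      # Metric between the alarm threshold and threshold + 10: add 1 instance
      - MetricIntervalLowerBound: 0
        MetricIntervalUpperBound: 10
        ScalingAdjustment: 1
      # Metric more than 10 above the threshold: add 2 instances
      - MetricIntervalLowerBound: 10
        ScalingAdjustment: 2
```

The step bounds are relative to the alarm threshold, so one alarm can drive increasingly aggressive scale-out as the breach grows.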
Target tracking
You define a target value and a metric that should rise or fall as application servers are added or removed, and let auto scaling follow your desired number. In the background this creates two step-based scaling rules: one to scale out and another to scale in.
What to base the scaling on
- CPU Usage
- ALB traffic
- Memory usage
- CloudWatch metrics
What does scaling look like in a graph
No scaling
Perfect scaling
Scaling parameters
This scaling example was based on the following spreadsheet calculation: B3 = IF(A2 < 2*D2, 3, ROUNDUP(A2/D2, 0))
This roughly translates to the following:
Track 40 req/node with a minimum of 3 nodes.
Auto scaling is reactive, not predictive: you first see the traffic increase, and in the next increment you see the instances respond. The example just calculates the desired amount based on the previous data point; in AWS you have quite a bit more control over the triggers, if you want it. Scaling steps are based on CloudWatch alarms, so if you can create a CloudWatch alarm for something, you can use it as the basis for your scaling.
- Period is the length of time to evaluate the metric or expression to create each individual data point for an alarm. It is expressed in seconds. If you choose one minute as the period, there is one data point every minute.
- Evaluation Period is the number of the most recent periods, or data points, to evaluate when determining alarm state.
- Datapoints to Alarm is the number of data points within the evaluation period that must be breaching to cause the alarm to go to the ALARM state. The breaching data points don’t have to be consecutive, they just must all be within the last number of data points equal to Evaluation Period.
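To sketch how these three parameters fit together, here is a hedged CloudFormation fragment for a CPU alarm; the group name and the step scaling policy it triggers (myASGroup, myStepScalingPolicy) are hypothetical placeholders:

```yaml
HighCPUAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/EC2
    MetricName: CPUUtilization
    Dimensions:
      - Name: AutoScalingGroupName
        Value: !Ref myASGroup
    Statistic: Average
    Period: 60             # one data point per minute
    EvaluationPeriods: 5   # evaluate the 5 most recent data points
    DatapointsToAlarm: 3   # 3 breaching points out of those 5 trigger ALARM
    Threshold: 80
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref myStepScalingPolicy   # hypothetical step scaling policy
```

With these settings, short one-minute CPU spikes are tolerated; only a sustained breach (3 of the last 5 minutes) causes a scaling action.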
How to write the scaling rules in Cloudformation
The most straightforward auto scaling can be found in EC2 Auto Scaling groups; let’s first take a look at that and later dive into more complex setups.
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  AMI:
    Type: String
  Subnets:
    Type: CommaDelimitedList
  AZs:
    Type: CommaDelimitedList
  PolicyTargetValue:
    Type: String
Resources:
  # What the instances should look like
  myLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: !Ref AMI
      InstanceType: t3.micro
  # Group the instances
  myASGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MaxSize: '2'
      AvailabilityZones: !Ref AZs
      VPCZoneIdentifier: !Ref Subnets
      MinSize: '1'
      LaunchConfigurationName: !Ref myLaunchConfig
  # Scale the group by this rule
  myCPUPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref myASGroup
      PolicyType: TargetTrackingScaling
      TargetTrackingConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ASGAverageCPUUtilization
        TargetValue: !Ref PolicyTargetValue
This is probably the most compact way to write an auto scaling application stack based on machine CPU load as a variable. It uses three parts:
- Resource to scale
- Grouping those resources
- Scaling rules for the group
Unfortunately, this is not the case for everything we want to scale. The other type of scaling is based on Application Auto Scaling, which consists of the following components:
- Resource to scale
- A scalable target that registers that resource
- Scaling policies attached to the target
Let’s go through one of my most elaborate self-scaling templates, piece by piece.
We start off with an ECS service; I left out boilerplate resources like the task definition and IAM roles. For the full templates, check out the links at the bottom of this article.
Service:
  Type: AWS::ECS::Service
  Properties:
    TaskDefinition: !Ref TaskDefinition
    DesiredCount: !Ref DesiredCount
    HealthCheckGracePeriodSeconds: 600
    LoadBalancers:
      - TargetGroupArn: !Ref TargetGroup
        ContainerPort: !Ref ContainerPort
        ContainerName: !Ref ContainerName
    Cluster: !Ref ContainerCluster
    LaunchType: FARGATE
    NetworkConfiguration:
      AwsvpcConfiguration:
        AssignPublicIp: DISABLED
        SecurityGroups:
          - !Ref ContainerSG
        Subnets: !Ref Subnets
  DependsOn: ListenerRule
This is the service that we want to scale. It runs on Fargate, so aside from the soft limit on containers per account there are not many scaling restrictions. I won’t go deeper into building ECS services, as that is a bit outside the scope of this article.
By default the ECS service resource has no scaling mechanism, so we create one with the AWS::ApplicationAutoScaling::ScalableTarget resource. This puts a virtual handle on a dimension of your resource, which then becomes controllable by auto scaling.
ServiceScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    MinCapacity: !Ref MinCount
    MaxCapacity: !Ref MaxCount
    ResourceId:
      # Example: service/MyECSCluster-AB12CDE3F4GH/MyECSService-AB12CDE3F4GH
      Fn::Join:
        - ''
        - - service/
          - Ref: ContainerCluster
          - "/"
          - Fn::GetAtt:
              - Service
              - Name
    RoleARN:
      Fn::GetAtt:
        - ApplicationAutoScalingRole
        - Arn
    # The all-important scaling dimension, in this case the number of containers running in an ECS service.
    ScalableDimension: ecs:service:DesiredCount
    ServiceNamespace: ecs
    ScheduledActions:
      Fn::If:
        - NonProdCondition
        - - ScalableTargetAction:
              MinCapacity: 0
              MaxCapacity: 0
            Schedule: "cron(0 20 * * ? *)"
            ScheduledActionName: NightlyDown
          - ScalableTargetAction:
              MinCapacity: !Ref MinCount
              MaxCapacity: !Ref MaxCount
            Schedule: "cron(30 4 * * ? *)"
            ScheduledActionName: MorningUp
        - Ref: AWS::NoValue
This ScalableTarget gives us a few more options to scale our service. As you can see in the example above, I use a cron-like scheduled action to turn off the service outside working hours if it isn’t a production deployment. Of course, you can go all out with the cron function and scale your application that way, but that’s not really automatic. It can help, however, if you have no trust in your auto scaling, or to give your auto scaling reasonable boundaries to operate in.
So let’s break it down into its components and see where they get their data from.
ServiceScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    MinCapacity: !Ref MinCount
    MaxCapacity: !Ref MaxCount
    ResourceId:
      Fn::Join:
        - ''
        - - service/
          - Ref: ContainerCluster
          - "/"
          - Fn::GetAtt:
              - Service
              - Name
- Capacity: pretty straightforward, these are the upper and lower boundaries; I import them from a parameter with !Ref.
- ResourceId: this is, in our case, a “pointer” to the ECS service, but instead of a simple !Ref we need a ResourceId string.
Unfortunately we can’t just ask for this string; we need to construct it ourselves. So let’s start with the end result and work our way backwards on how to construct that string.
service/container-cluster/container-service
   |    \                                 /
   |     +------ Unique Identifier ------+
   |
   +-- resource type
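As a side note, the same string can arguably be built more compactly with Fn::Sub, since ${Service.Name} resolves like a Fn::GetAtt; this assumes the same ContainerCluster parameter and Service resource as above:

```yaml
ResourceId: !Sub service/${ContainerCluster}/${Service.Name}
```

Both forms produce the same value, so which one you use is mostly a matter of taste.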
RoleARN:
  Fn::GetAtt:
    - ApplicationAutoScalingRole
    - Arn
ScalableDimension: ecs:service:DesiredCount
ServiceNamespace: ecs
ScheduledActions:
  Fn::If:
    - NonProdCondition
    - - ScalableTargetAction:
          MinCapacity: 0
          MaxCapacity: 0
        Schedule: "cron(0 20 * * ? *)"
        ScheduledActionName: NightlyDown
      - ScalableTargetAction:
          MinCapacity: !Ref MinCount
          MaxCapacity: !Ref MaxCount
        Schedule: "cron(30 4 * * ? *)"
        ScheduledActionName: MorningUp
    - Ref: AWS::NoValue
Now that we have the groundwork in place, let’s add some automation to scale based on target tracking.
ServiceScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: !Sub ${AWS::StackName}
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref ServiceScalingTarget
    TargetTrackingScalingPolicyConfiguration:
      TargetValue: !Ref AutoscalingCpuTarget
      ScaleInCooldown: 180
      ScaleOutCooldown: 30
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageCPUUtilization
This is the most basic auto scaling for ECS services and is pretty effective, especially when you consider how easy it is to implement. Let’s take it one step further and scale on request count instead of CPU.
ServiceScalingPolicyReqPerM:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: 'req_per_minute'
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref ServiceScalingTarget
    TargetTrackingScalingPolicyConfiguration:
      TargetValue: !Ref AutoscalingReqTarget
      ScaleInCooldown: 180
      ScaleOutCooldown: 30
      PredefinedMetricSpecification:
        PredefinedMetricType: ALBRequestCountPerTarget
        ResourceLabel: !Join [ '/', [!ImportValue GulbFullName, !GetAtt TargetGroup.TargetGroupFullName] ]
        # https://docs.aws.amazon.com/en_pv/AWSCloudFormation/latest/UserGuide/aws-properties-applicationautoscaling-scalingpolicy-predefinedmetricspecification.html
        # ResourceLabel format:
        #   app/<load-balancer-name>/<load-balancer-id>/targetgroup/<target-group-name>/<target-group-id>
        # Load balancer ARN:
        #   arn:aws:elasticloadbalancing:eu-west-1:716268079250:loadbalancer/app/gulb/8a5d787993154763
        #   -> app/gulb/8a5d787993154763 (!ImportValue GulbFullName)
        # Target group ARN:
        #   arn:aws:elasticloadbalancing:eu-west-1:716268079250:targetgroup/iris-Targe-2ITUDAYHYT85/17d40dfaff1a60e6
        #   -> targetgroup/iris-Targe-2ITUDAYHYT85/17d40dfaff1a60e6 (!GetAtt TargetGroup.TargetGroupFullName)
        # Resulting ResourceLabel:
        #   app/gulb/8a5d787993154763/targetgroup/iris-Targe-2ITUDAYHYT85/17d40dfaff1a60e6