CloudFormation and auto scaling, from the concepts to the details

Before you start

Not every environment is ready for auto scaling. To fully benefit from these techniques, your infrastructure must meet the following requirements.

  • No local state on the disks of application servers.
  • Redeployable images: EC2 Auto Scaling groups and Spot Fleets need an AMI; ECS needs Docker images. If you want something to get started with, I recommend the “hello world” Docker image linked below to get familiar with this way of working.

Hello container

Horizontal vs Vertical scaling

The basic definition which can be applied in almost all cases:

Vertical scaling: you increase the compute power of the node your code runs on. This type of scaling is fairly limited and should be avoided if possible.

Horizontal scaling: you increase the number of nodes handling your traffic while keeping each node small. This increases your resilience: as your traffic grows, the impact of a single node failing shrinks. This is the type of scaling we will take a look at.

Why would you add automated scaling

The answer most people will give you is “to handle peak workloads”, but that is not the most important part. After thinking it over, I arrived at these two key reasons why auto scaling is the way to go.

  • Predictability
  • Flexibility

Predictability example: when your application, in its most minimal, predictable and stable form, can handle a maximum of X req/s, adding more nodes will theoretically give you N·X req/s of maximum throughput. Of course other bottlenecks will appear, such as database throughput; hold that thought, I will go into detail about automatically scaling your database later in this article.

Predictability example: as your traffic increases you automatically add more fault tolerance; scaling out with small instances decreases the impact of a node failure. Node failure is a given: it will happen, you just don't know when. Without automated scaling, you might turn off a misbehaving node and put its load permanently on the rest of the cluster until the node is fixed.

Flexibility example: once you optimize your application, you can easily change the machine size. Since you are already adding and removing machines regularly, possibly many times a day, this is nothing out of the ordinary. It shortens the trial-and-measure iterations for picking an instance size and leads to faster results.

Flexibility example: as your application and requirements change, the need may arise to deploy to a different part of the world for lower end-user latency, or to migrate to a new location for whatever reason. It is fairly easy to add a new auto scaling resource in a different location and reroute your traffic to it.

What to scale

You can scale many parts of your infrastructure; in this writeup I focus on the following:

  • ECS
  • EC2
  • RDS

Types of scaling triggers

Step based scaling

You define a threshold; when it is crossed, a scaling action is taken.

Target tracking

You define a target value and a metric that should go down or up as application servers are added or removed, and let auto scaling chase that target. Behind the scenes this creates two step-based scaling rules: one to scale out and another to scale in.

What to base the scaling on

  • CPU Usage
  • ALB traffic
  • Memory usage
  • CloudWatch metrics

What does scaling look like in a graph

No scaling

Perfect scaling

Scaling parameters

This scaling example was based on the following calculation: B3=IF(A2 < 2*D2, 3, ROUNDUP(A2/D2)), which roughly translates to:

Track 40 req/node with a minimum of 3 nodes
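In Python, that spreadsheet logic looks roughly like this (a sketch; I use a max() guard for the 3-node floor, a slight simplification of the IF form):

```python
import math

def desired_nodes(req_per_s, target_per_node=40, min_nodes=3):
    """Desired node count: track ~target_per_node req/s per node,
    never dropping below min_nodes."""
    return max(min_nodes, math.ceil(req_per_s / target_per_node))

print(desired_nodes(50))   # -> 3, the minimum wins at low traffic
print(desired_nodes(200))  # -> 5, i.e. ceil(200 / 40)
```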

As auto scaling is reactive, not predictive, you first see the traffic increase, and in the next increment you see the instances respond. The example just calculates the desired amount based on the previous data point; in AWS you have quite a bit more control over the triggers, if you want. Scaling steps are based on CloudWatch alarms, so if you can create a CloudWatch alarm for something, you can base your scaling on it.

  • Period is the length of time to evaluate the metric or expression to create each individual data point for an alarm. It is expressed in seconds. If you choose one minute as the period, there is one data point every minute.
  • Evaluation Period is the number of the most recent periods, or data points, to evaluate when determining alarm state.
  • Datapoints to Alarm is the number of data points within the evaluation period that must be breaching to cause the alarm to go to the ALARM state. The breaching data points don't have to be consecutive, they just must all be within the last number of data points equal to Evaluation Period.
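These three knobs map directly onto a CloudWatch alarm resource. A minimal sketch (the logical name, metric and threshold here are made-up examples):

```yaml
  ScaleOutAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: AWS/EC2
      MetricName: CPUUtilization
      Statistic: Average
      Period: 60              # one data point per minute
      EvaluationPeriods: 5    # evaluate the 5 most recent data points
      DatapointsToAlarm: 3    # any 3 breaching out of those 5 triggers ALARM
      Threshold: 70
      ComparisonOperator: GreaterThanThreshold
```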

How to write the scaling rules in Cloudformation

The most straightforward auto scaling can be found in EC2 Auto Scaling groups, so let's first take a look at that and later dive into more complex setups.

AWSTemplateFormatVersion: 2010-09-09
Parameters:
  AMI:
    Type: String
  Subnets:
    Type: CommaDelimitedList
  AZs:
    Type: CommaDelimitedList
  PolicyTargetValue:
    Type: String
Resources:
  # what the instances should look like
  myLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: !Ref AMI
      InstanceType: t3.micro

  # group the instances
  myASGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MaxSize: '2'
      AvailabilityZones: !Ref AZs
      VPCZoneIdentifier: !Ref Subnets
      MinSize: '1'
      LaunchConfigurationName: !Ref myLaunchConfig

  # Scale the group by this rule
  myCPUPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref myASGroup
      PolicyType: TargetTrackingScaling
      TargetTrackingConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ASGAverageCPUUtilization
        TargetValue: !Ref PolicyTargetValue

This is probably the most compact way to write an auto scaling application stack based on machine CPU load; it uses three parts:

  • Resource to scale
  • Grouping those resources
  • Scaling rules for the group

Unfortunately, this is not the case for everything we want to scale. The other type of scaling is based on Application Auto Scaling, which builds on a similar set of components:

  • Resource to scale
  • A scalable target that registers the resource
  • Scaling rules attached to that target

Let's go through one of my most elaborate self-scaling templates, piece by piece.

We start off with an ECS service; I left out boilerplate resources like the task definition and IAM roles. For the full templates, check out the links at the bottom of this article.

  Service:
    Type: AWS::ECS::Service
    Properties:
      TaskDefinition: !Ref TaskDefinition
      DesiredCount: !Ref DesiredCount
      HealthCheckGracePeriodSeconds: 600
      LoadBalancers:
      - TargetGroupArn: !Ref TargetGroup
        ContainerPort: !Ref ContainerPort
        ContainerName: !Ref ContainerName
      Cluster: !Ref ContainerCluster
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: DISABLED
          SecurityGroups:
          - !Ref ContainerSG
          Subnets: !Ref Subnets
    DependsOn: ListenerRule

This is the service that we want to scale. It runs on Fargate, so aside from the soft limit on containers per account, there are few scaling restrictions. I won't go deeper into building ECS services, as that is a bit outside the scope of this article.

By default the ECS service resource has no scaling mechanism, so we create one with the AWS::ApplicationAutoScaling::ScalableTarget resource. This puts a virtual handle on one dimension of your resource, which auto scaling can then control.

  ServiceScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MinCapacity: !Ref MinCount
      MaxCapacity: !Ref MaxCount
      ResourceId:
        # Example: service/MyECSCluster-AB12CDE3F4GH/MyECSService-AB12CDE3F4GH
        Fn::Join:
        - ''
        - - service/
          - Ref: ContainerCluster
          - "/"
          - Fn::GetAtt:
            - Service
            - Name
      RoleARN:
        Fn::GetAtt:
        - ApplicationAutoScalingRole
        - Arn
      ScalableDimension: ecs:service:DesiredCount
      # The all important scaling Dimension, in this case the amount of containers running in an ECS service.
      ServiceNamespace: ecs
      ScheduledActions:
        Fn::If:
        - NonProdCondition
        -
          - ScalableTargetAction:
              MinCapacity: 0
              MaxCapacity: 0
            Schedule: "cron(0 20 * * ? *)"
            ScheduledActionName: NightlyDown
          - ScalableTargetAction:
              MinCapacity: !Ref MinCount
              MaxCapacity: !Ref MaxCount
            Schedule: "cron(30 4 * * ? *)"
            ScheduledActionName: MorningUp
        - Ref: AWS::NoValue

This ScalableTarget gives us a few more options to scale our service. As you can see in the example above, I use a cron-like scheduled action to turn off the service outside working hours if it isn't a production deployment. You can of course go all out with the cron function and scale your application that way, but that's not really automatic. It can help, though, if you don't trust your auto scaling yet, or to give your auto scaling reasonable boundaries to operate in.

So let's break it down into its components and where they get their data from.

  ServiceScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MinCapacity: !Ref MinCount
      MaxCapacity: !Ref MaxCount
      ResourceId:
        Fn::Join:
        - ''
        - - service/
          - Ref: ContainerCluster
          - "/"
          - Fn::GetAtt:
            - Service
            - Name
  • Capacity: pretty straightforward, these are the upper and lower boundaries; I import them from parameters with !Ref.
  • ResourceId: in our case this is a “pointer” to the ECS service, but instead of a simple !Ref we need a ResourceId string.

Unfortunately we can't just ask for this string; we need to construct it ourselves. Let's start with the end result and work our way backwards to see how that string is built.

service/container-cluster/container-service
  |     \                                /
  |      +--- Unique Identifier --------+
  |
  +-- resource type
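Once you know the shape of that string, the same ResourceId can also be written more compactly with !Sub (a sketch, assuming the same ContainerCluster parameter and Service resource as in the template above):

```yaml
      # ${ContainerCluster} substitutes like !Ref, ${Service.Name} like !GetAtt
      ResourceId: !Sub service/${ContainerCluster}/${Service.Name}
```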
      RoleARN:
        Fn::GetAtt:
        - ApplicationAutoScalingRole
        - Arn
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs
      ScheduledActions:
        Fn::If:
        - NonProdCondition
        -
          - ScalableTargetAction:
              MinCapacity: 0
              MaxCapacity: 0
            Schedule: "cron(0 20 * * ? *)"
            ScheduledActionName: NightlyDown
          - ScalableTargetAction:
              MinCapacity: !Ref MinCount
              MaxCapacity: !Ref MaxCount
            Schedule: "cron(30 4 * * ? *)"
            ScheduledActionName: MorningUp
        - Ref: AWS::NoValue

Now that we have the groundwork in place, let's add some automation to scale based on target tracking.

  ServiceScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: !Sub ${AWS::StackName}
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ServiceScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: !Ref AutoscalingCpuTarget
        ScaleInCooldown: 180
        ScaleOutCooldown: 30
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization

This is the most basic auto scaling for ECS services and is pretty effective, especially considering how easy it is to implement. A more advanced variant tracks the request count per target on the load balancer:

  ServiceScalingPolicyReqPerM:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: 'req_per_minute'
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ServiceScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: !Ref AutoscalingReqTarget
        ScaleInCooldown: 180
        ScaleOutCooldown: 30
        PredefinedMetricSpecification:
          PredefinedMetricType: ALBRequestCountPerTarget
          ResourceLabel: !Join [ '/', [!ImportValue GulbFullName, !GetAtt TargetGroup.TargetGroupFullName] ]

          # https://docs.aws.amazon.com/en_pv/AWSCloudFormation/latest/UserGuide/aws-properties-applicationautoscaling-scalingpolicy-predefinedmetricspecification.html
          # ResourceLabel:
          # app/<load-balancer-name>/<load-balancer-id>/targetgroup/<target-group-name>/<target-group-id>

          # arn:aws:elasticloadbalancing:eu-west-1:716268079250:loadbalancer/app/gulb/8a5d787993154763
          # app/gulb/8a5d787993154763
          # !ImportValue GulbFullName

          # arn:aws:elasticloadbalancing:eu-west-1:716268079250:targetgroup/iris-Targe-2ITUDAYHYT85/17d40dfaff1a60e6
          # targetgroup/iris-Targe-2ITUDAYHYT85/17d40dfaff1a60e6
          # !GetAtt TargetGroup.TargetGroupFullName

          # ResourceLabel:
          # app/gulb/8a5d787993154763/targetgroup/iris-Targe-2ITUDAYHYT85/17d40dfaff1a60e6
          # !Join [ '/', [!ImportValue GulbFullName, !GetAtt TargetGroup.TargetGroupFullName] ]
Stein van Broekhoven
Cloud & Open-Source magician 🧙‍♂️

I try to find the KISS in complex systems and share it with the world.
