Autoscale architecture for AWS

The following diagram illustrates the architecture of the autoscaling feature with DNS as the traffic distributor.


The following diagram illustrates the architecture of the autoscaling feature with NLB as the traffic distributor.


NetScaler Console

NetScaler Console is a web-based solution for managing all NetScaler deployments, whether they run on-premises or in the cloud. You can use this cloud solution to manage, monitor, and troubleshoot the entire global application delivery infrastructure from a single, unified, centralized cloud-based console. NetScaler Console provides the capabilities required to quickly set up, deploy, and manage application delivery in NetScaler deployments, along with rich analytics of application health, performance, and security.

Autoscale groups are created in NetScaler Console, and the NetScaler VPX instances are provisioned from NetScaler Console. The application is then deployed through StyleBooks in NetScaler Console.

Traffic distributors (NLB or DNS/Route53)

NLB or DNS/Route53 is used to distribute traffic across all the nodes in an Autoscale group. See Autoscale traffic distribution modes for more information.

NetScaler Console communicates with the traffic distributor to update the application domain and the IP addresses of the load balancing virtual servers that front-end the application.

NetScaler Console Autoscale group

An Autoscale group is a group of NetScaler instances that load balance applications as a single entity and trigger autoscaling based on the configured threshold parameter values.

NetScaler clusters

A NetScaler cluster is a group of NetScaler VPX instances, each of which is called a node. The client traffic is distributed across the nodes to provide high availability, high throughput, and scalability.


  • Autoscaling decisions are made at the cluster level and not at the node level.
  • Independent clusters are hosted in different availability zones, and therefore support for some shared-state features is limited.

    Persistence types other than cookie-based persistence, such as source IP persistence, cannot be shared across clusters. However, all stateless features, such as load balancing methods, work as expected across multiple availability zones.
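To make the persistence limitation concrete, the following Python sketch models two independent clusters as plain objects. It is a conceptual illustration only: the `Cluster` class, its round-robin cursor, and the idea that a persistence cookie simply names the back-end server are assumptions, not NetScaler internals. Source IP persistence lives in a per-cluster table, so a client that reaches another availability zone can land on a different server, whereas a cookie travels with the client and any cluster can honor it.

```python
from typing import Dict, List, Optional


class Cluster:
    """Hypothetical model of one independent cluster in one availability zone."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self._cursor = 0                             # round-robin position (stateless method)
        self.source_ip_table: Dict[str, str] = {}    # persistence state, local to this cluster

    def _round_robin(self) -> str:
        server = self.servers[self._cursor % len(self.servers)]
        self._cursor += 1
        return server

    def pick_by_source_ip(self, client_ip: str) -> str:
        # Only this cluster's table is consulted; other clusters never see it.
        if client_ip not in self.source_ip_table:
            self.source_ip_table[client_ip] = self._round_robin()
        return self.source_ip_table[client_ip]

    def pick_by_cookie(self, cookie: Optional[str]) -> str:
        # The cookie names the server and travels with the client, so any cluster can honor it.
        if cookie is not None and cookie in self.servers:
            return cookie
        return self._round_robin()


servers = ["app-1", "app-2"]
az_a, az_b = Cluster(servers), Cluster(servers)
az_b.pick_by_source_ip("198.51.100.7")          # another client already served by az_b

first = az_a.pick_by_source_ip("203.0.113.10")  # "app-1"
moved = az_b.pick_by_source_ip("203.0.113.10")  # "app-2": persistence not shared
cookie_hit = az_b.pick_by_cookie(first)         # "app-1": cookie honored in another zone
```

The round-robin method itself needs no shared state, which is why stateless features behave the same in every zone.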

AWS auto scaling groups

An AWS Auto Scaling group is a collection of EC2 instances that share similar characteristics and are treated as a logical grouping for instance scaling and management.

AWS availability zones

An AWS availability zone is an isolated location inside a region. Each region consists of several availability zones, and each availability zone belongs to a single region.

Traffic distribution modes

As you move your application deployment to the cloud, autoscaling becomes part of the infrastructure. As applications scale out or scale in, these changes must be propagated to clients. This propagation is achieved through DNS-based or NLB-based autoscaling.

NLB based autoscaling

In NLB-based deployment mode, the distribution tier to the cluster nodes is the AWS network load balancer.

In NLB-based autoscaling, only one static IP address is offered per availability zone. This public IP address is added to Route53, and the back-end IP addresses can be private. As a result, any new NetScaler instance provisioned during autoscaling operates with private IP addresses and does not require an additional public IP address.

You can use NLB-based autoscaling to manage both TCP and UDP traffic.

DNS based autoscaling

In DNS-based autoscaling, DNS acts as the distribution layer to the NetScaler cluster nodes. Scaling changes are propagated to clients by updating the DNS records of the domain name that corresponds to the application. Currently, the DNS provider is AWS Route53.
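As an illustration of the DNS update, the following Python sketch builds a request body that would publish the current node IP addresses under the application domain. The dictionary follows the `ChangeBatch` structure of the Route53 `ChangeResourceRecordSets` API (exposed in boto3 as `change_resource_record_sets`), but the function name, domain, and addresses are hypothetical, and this is not how NetScaler Console is implemented. Using the Autoscale group's TTL timeout as the record TTL is also an assumption.

```python
from typing import Any, Dict, List


def build_change_batch(domain: str, node_ips: List[str], ttl_seconds: int) -> Dict[str, Any]:
    """Return a Route53-style ChangeBatch that publishes node_ips as the domain's A record."""
    if not node_ips:
        raise ValueError("an A record needs at least one IP address")
    return {
        "Comment": "Autoscale membership update (hypothetical)",
        "Changes": [
            {
                # UPSERT creates the record or replaces its whole value set, so the
                # record mirrors the current node list after every scale event.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": domain,
                    "Type": "A",
                    "TTL": ttl_seconds,
                    "ResourceRecords": [{"Value": ip} for ip in sorted(node_ips)],
                },
            }
        ],
    }


# Hypothetical scale-out from two nodes to three: the new public IP joins the record.
before = build_change_batch("app.example.com", ["203.0.113.10", "203.0.113.11"], 60)
after = build_change_batch(
    "app.example.com", ["203.0.113.10", "203.0.113.11", "203.0.113.12"], 60
)
```

Passing such a batch, together with a hosted zone ID, to the boto3 Route53 client's `change_resource_record_sets` method would apply it; the sketch stops short of that call so that it runs without AWS credentials.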


In DNS based autoscaling, each NetScaler instance requires a public IP address.
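The two rules above (one static public IP per availability zone in NLB mode, one public IP per instance in DNS mode) can be compared with a small Python function. The function name and the `nodes_per_zone` input are illustrative assumptions.

```python
from typing import Dict


def public_ips_needed(mode: str, nodes_per_zone: Dict[str, int]) -> int:
    """Count the public IP addresses a deployment needs in each distribution mode."""
    if mode == "nlb":
        # One static public IP per availability zone; nodes behind the NLB stay private.
        return len(nodes_per_zone)
    if mode == "dns":
        # Every NetScaler instance is published in DNS, so each needs its own public IP.
        return sum(nodes_per_zone.values())
    raise ValueError(f"unknown mode: {mode!r}")


zones = {"zone-a": 3, "zone-b": 5}     # hypothetical node counts per availability zone
print(public_ips_needed("nlb", zones))  # 2
print(public_ips_needed("dns", zones))  # 8
```

Scaling out from 8 to 9 nodes leaves the NLB figure at 2 but raises the DNS figure to 9, which is why NLB mode avoids allocating public IP addresses for new instances.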

How autoscaling works

The following flowchart illustrates the autoscaling workflow.


NetScaler Console collects statistics (CPU usage, memory usage, and throughput) from the Autoscale-provisioned clusters every minute.

The statistics are evaluated against the configured thresholds. If the statistics exceed the maximum threshold, a scale-out is triggered; if they fall below the minimum threshold, a scale-in is triggered.
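A minimal Python sketch of this evaluation follows. It assumes, based on the example later in this article, that a scale event fires only when every one-minute sample within the watch time breaches the same threshold, and that the sample window restarts after an event. `ThresholdEvaluator` and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, Optional


class ThresholdEvaluator:
    """Hypothetical evaluator: one sample per minute, watch time measured in samples."""

    def __init__(self, min_limit: float, max_limit: float, watch_minutes: int) -> None:
        if not min_limit < max_limit:
            raise ValueError("min_limit must be lower than max_limit")
        if watch_minutes < 1:
            raise ValueError("watch_minutes must be at least 1")
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.watch = watch_minutes
        self.window: Deque[float] = deque(maxlen=watch_minutes)

    def add_sample(self, value: float) -> Optional[str]:
        """Record one per-minute sample; return 'scale-out', 'scale-in', or None."""
        self.window.append(value)
        if len(self.window) < self.watch:
            return None                      # not enough history for a full watch time
        if all(v > self.max_limit for v in self.window):
            self.window.clear()              # start a fresh window after the event
            return "scale-out"
        if all(v < self.min_limit for v in self.window):
            self.window.clear()
            return "scale-in"
        return None


evaluator = ThresholdEvaluator(min_limit=40, max_limit=85, watch_minutes=3)
decisions = [evaluator.add_sample(v) for v in [90, 88, 70, 86, 87, 91]]
# [None, None, None, None, None, 'scale-out']
```

The single dip to 70 at the third sample resets the streak, so only the last three samples together trigger a scale-out.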

  • If a scale-out is triggered:

    • New nodes are provisioned.
    • The nodes are attached to the cluster, and the configuration is synchronized from the cluster to the new nodes.
    • The nodes are registered with NetScaler Console.
    • The new node IP addresses are updated in DNS/NLB.

When the application is deployed, an IPset is created on the cluster in each availability zone, and the domain and the instance IP addresses are registered with DNS/NLB.

  • If a scale-in is triggered:
    • The IP addresses of the nodes identified for removal are removed from DNS/NLB.
    • The nodes are detached from the cluster, deprovisioned, and then deregistered from NetScaler Console.

When the application is removed, the domain and the instance IP addresses are deregistered from DNS/NLB and the IPset is deleted.
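The step order of both workflows can be sketched in Python with sets standing in for the cluster membership, the NetScaler Console registry, and the IP addresses published in DNS/NLB. All names are hypothetical, no AWS or NetScaler API is called, and node IP strings double as node identifiers for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AutoscaleZone:
    """Hypothetical in-memory stand-in for one availability zone's Autoscale state."""

    cluster: Set[str] = field(default_factory=set)           # nodes attached to the cluster
    console_registry: Set[str] = field(default_factory=set)  # nodes registered with Console
    distributor_ips: Set[str] = field(default_factory=set)   # IPs published in DNS/NLB
    log: List[str] = field(default_factory=list)

    def scale_out(self, new_ips: List[str]) -> None:
        for ip in new_ips:
            self.log.append(f"provision {ip}")
        for ip in new_ips:
            self.cluster.add(ip)  # configuration would sync from the cluster here
            self.log.append(f"attach {ip} and sync configuration")
        for ip in new_ips:
            self.console_registry.add(ip)
            self.log.append(f"register {ip} with NetScaler Console")
        for ip in new_ips:
            # Publishing last presumably keeps clients away from unconfigured nodes.
            self.distributor_ips.add(ip)
            self.log.append(f"add {ip} to DNS/NLB")

    def scale_in(self, old_ips: List[str]) -> None:
        for ip in old_ips:
            # Withdraw first, so no new client traffic reaches the departing nodes.
            self.distributor_ips.discard(ip)
            self.log.append(f"remove {ip} from DNS/NLB")
        # The drain connection timeout (plus TTL in DNS mode) would elapse here.
        for ip in old_ips:
            self.cluster.discard(ip)
            self.log.append(f"detach {ip}")
            self.log.append(f"deprovision {ip}")
            self.console_registry.discard(ip)
            self.log.append(f"deregister {ip} from NetScaler Console")


zone = AutoscaleZone()
zone.scale_out(["10.0.1.10", "10.0.1.11"])
zone.scale_in(["10.0.1.11"])
```

After these calls, only 10.0.1.10 remains in the cluster, the Console registry, and the published IP set, and `zone.log` records each step in the order listed above.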


Suppose that you have created an Autoscale group named asg_arn in a single availability zone with the following configuration:

  • Threshold parameter: Memory usage
  • Minimum limit: 40
  • Maximum limit: 85
  • Watch time: 3 minutes
  • Cooldown period: 10 minutes
  • Drain connection timeout: 10 minutes
  • TTL timeout: 60 seconds
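The same configuration can be captured as a typed Python record, which makes the relationship between the watch time and the one-minute collection interval explicit. The field names are hypothetical, not NetScaler Console parameters; the values are copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscaleGroupConfig:
    """Hypothetical field names; values in the example mirror the asg_arn list."""

    name: str
    threshold_parameter: str
    min_limit: float
    max_limit: float
    watch_time_min: int
    cooldown_min: int
    drain_timeout_min: int
    ttl_s: int

    def __post_init__(self) -> None:
        # The limits must leave a band in which neither scale-out nor scale-in fires.
        if not self.min_limit < self.max_limit:
            raise ValueError("min_limit must be lower than max_limit")
        if min(self.watch_time_min, self.cooldown_min, self.drain_timeout_min, self.ttl_s) <= 0:
            raise ValueError("all durations must be positive")

    def samples_per_watch(self, interval_s: int = 60) -> int:
        """Statistics samples that fall within one watch time at the given interval."""
        return self.watch_time_min * 60 // interval_s


asg_arn = AutoscaleGroupConfig(
    name="asg_arn",
    threshold_parameter="memory usage",
    min_limit=40,
    max_limit=85,
    watch_time_min=3,
    cooldown_min=10,
    drain_timeout_min=10,
    ttl_s=60,
)
print(asg_arn.samples_per_watch())  # 3
```

At the one-minute collection interval, a three-minute watch time corresponds to three consecutive samples.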

After the Autoscale group is created, statistics are collected from the Autoscale group. The Autoscale policy also checks whether an Autoscale event is in progress and, if so, waits for that event to complete before collecting statistics.


Sequence of events:

  • T1 and T2: Memory usage exceeds the maximum threshold limit.
  • T3: Memory usage falls below the maximum threshold limit.
  • T4, T5, T6: Memory usage exceeds the maximum threshold limit for three consecutive minutes, that is, for the entire watch time.

    • A scale-out is triggered.
    • Nodes are provisioned.
    • The cooldown period is in effect.
  • T7 – T16: Autoscale evaluation is skipped for this availability zone from T7 through T16 because the cooldown period is in effect.

  • T18, T19, T20: Memory usage stays below the minimum threshold limit for three consecutive minutes, that is, for the entire watch time.
    • A scale-in is triggered.
    • The drain connection timeout is in effect.
    • The IP addresses are removed from DNS/NLB.
  • T21 – T30: Autoscale evaluation is skipped for this availability zone from T21 through T30 because the drain connection timeout is in effect.

  • T31
    • For DNS based autoscaling, TTL is in effect.
    • For NLB based autoscaling, deprovisioning of the instances occurs.
  • T32
    • For NLB based autoscaling, evaluation of the statistics starts.
    • For DNS based autoscaling, deprovisioning of the instances occurs.
  • T33: For DNS based autoscaling, evaluation of the statistics starts.
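The timeline above can be replayed with a small Python simulation. The memory values are hypothetical, chosen only to reproduce the breaches described, and the scheduling rules (evaluation skipped during cooldown and drain, one extra TTL minute before deprovisioning in DNS mode) are assumptions inferred from the sequence of events rather than NetScaler Console internals.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

MIN_LIMIT, MAX_LIMIT = 40, 85                # memory usage limits from the example
WATCH, COOLDOWN, DRAIN, TTL = 3, 10, 10, 1   # minutes (TTL timeout 60 s = 1 minute)


def replay(memory: Dict[int, float], mode: str, last_minute: int) -> List[Tuple[int, str]]:
    """Return (minute, event) pairs for one availability zone, one sample per minute."""
    if mode not in ("nlb", "dns"):
        raise ValueError(f"unknown mode: {mode!r}")
    events: List[Tuple[int, str]] = []
    window: Deque[float] = deque(maxlen=WATCH)
    resume_at = 1                          # next minute at which statistics are evaluated
    ttl_at: Optional[int] = None           # DNS mode: first minute spent waiting out the TTL
    deprovision_at: Optional[int] = None   # minute at which scaled-in nodes are removed
    for t in range(1, last_minute + 1):
        if t == ttl_at:
            events.append((t, "TTL in effect"))
        if t == deprovision_at:
            events.append((t, "deprovision nodes"))
        if t < resume_at:
            continue                       # cooldown, drain, TTL, or deprovisioning
        if t == resume_at and t > 1:
            events.append((t, "evaluation resumes"))
        window.append(memory[t])
        if len(window) < WATCH:
            continue
        if all(v > MAX_LIMIT for v in window):
            events.append((t, "scale-out"))
            window.clear()
            resume_at = t + COOLDOWN + 1
        elif all(v < MIN_LIMIT for v in window):
            events.append((t, "scale-in"))
            window.clear()
            deprovision_at = t + DRAIN + 1
            if mode == "dns":
                ttl_at = deprovision_at    # clients may still hold the old record
                deprovision_at += TTL
            resume_at = deprovision_at + 1
    return events


# Hypothetical memory usage values per minute, matching the breaches in the example.
MEMORY: Dict[int, float] = {t: 60.0 for t in range(1, 34)}
MEMORY.update({1: 90, 2: 88, 3: 70, 4: 86, 5: 87, 6: 91, 18: 35, 19: 30, 20: 25})

for mode in ("nlb", "dns"):
    print(mode, replay(MEMORY, mode, 33))
```

For NLB mode the replay yields a scale-out at T6, evaluation resuming at T17, a scale-in at T20, deprovisioning at T31, and evaluation resuming at T32. In DNS mode the TTL occupies T31, deprovisioning moves to T32, and evaluation resumes at T33.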