February 07 2023 03:34 am

An In-Depth Guide to Autoscaling on Heroku with 123 Dyno


Server scaling is the process of adding compute power to accommodate increasing traffic and load; autoscaling adjusts that compute power dynamically so you only pay for it when it's needed. This is an in-depth guide to autoscaling Heroku for maximum efficiency with the add-on 123 Dyno, whether you're new to the topic or already experienced. Together they provide more autoscaling options and faster autoscaling than any other cloud provider, achieving rates like 800+ req/second on a single Standard Dyno.

Autoscaling is a nuanced task that requires awareness of multiple layers of your stack. Dynos can be scaled both vertically and horizontally to handle load beyond what a single dyno can serve quickly. Vertically, a dyno can be given more compute power and memory, from low-cost Eco dynos for prototyping up to higher-end Performance dynos with dedicated CPU and memory.

Aside from vertical scaling ("making a server bigger"), Heroku also supports the now more standard approach of horizontal scaling: load balancing requests across multiple machines ("spin up a bunch of servers and distribute load across all of them"). Heroku automates this with its built-in autoscaling for Performance dynos, which is driven by the 95th percentile response time (M95).

It scales up when server response times become too high and scales back down when load returns to normal, evaluating roughly every 60 seconds. All that's needed is a few button clicks and a response time target; Heroku then conservatively autoscales to keep the application at that target.
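The single-target idea can be sketched in a few lines of Python. This is a minimal illustration, not Heroku's actual algorithm: the nearest-rank percentile, the one-dyno step, the rule "scale down when p95 falls below half the target", and the `desired_dynos` helper and its parameters are all assumptions made for the example.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of response times in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def desired_dynos(current, samples_ms, target_ms, min_dynos=1, max_dynos=10):
    """Hypothetical single-target rule, not Heroku's published algorithm.

    Add one dyno when p95 is above the target; remove one when p95 is
    comfortably below it (assumed here to mean under half the target).
    """
    observed = p95(samples_ms)
    if observed > target_ms:
        proposed = current + 1
    elif observed < target_ms / 2:
        proposed = current - 1
    else:
        proposed = current
    # Keep the formation within its configured bounds.
    return max(min_dynos, min(max_dynos, proposed))


print(desired_dynos(2, [600] * 100, target_ms=500))  # → 3 (p95 600 ms > 500 ms)
```

Note how a single target forces an arbitrary choice about when to scale back down; the lower threshold here is an invented stand-in for that choice.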

Extending and Optimizing Heroku Autoscaling with Add-On 123 Dyno

While Heroku's M95 method is easy to set up and works for a typical web application, it leaves much to be desired for those who want more options for varying situations or more efficient autoscaling tailored to their application.

123 Dyno extends Heroku with autoscaling for Standard dynos and autoscaling by CPU, memory, queue times, and response times, along with options for a 12x boost to autoscaling speed and pruning of URL paths. Instead of a single target value, 123 Dyno uses the concept of a Goldilocks range: it scales up when a metric exceeds the upper threshold and scales down when it drops below the lower threshold.
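A Goldilocks range amounts to a three-way decision. The sketch below assumes a hypothetical one-dyno step per autoscaling event and minimum/maximum formation limits; 123 Dyno's actual step size and interface are not described in this guide, and the function names are illustrative.

```python
def goldilocks_decision(value, lower, upper):
    """Return +1 to scale up, -1 to scale down, or 0 to hold."""
    if lower >= upper:
        raise ValueError("lower threshold must be below upper threshold")
    if value > upper:
        return 1   # metric above the range: too hot
    if value < lower:
        return -1  # metric below the range: too cold
    return 0       # inside the range: just right


def next_dyno_count(current, value, lower, upper, min_dynos=1, max_dynos=10):
    """Apply one hypothetical one-dyno step, clamped to the formation limits."""
    step = goldilocks_decision(value, lower, upper)
    return max(min_dynos, min(max_dynos, current + step))


# Example: a p95 response time range of 200-600 ms, starting from 3 dynos
for p95_ms in (150, 400, 750):
    print(p95_ms, next_dyno_count(3, p95_ms, lower=200, upper=600))
```

Unlike a single target, the gap between the two thresholds gives the formation a stable "hold" zone, which keeps it from flapping up and down around one value.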

1. Choosing an Autoscaling Metric

Applications respond to load differently: some are CPU bound while others are memory bound. Choosing the correct autoscaling metric is crucial to ensuring that your stack scales efficiently to handle load. Autoscaling by a less suitable metric can leave you with a formation that is too slow to serve traffic or with resource costs piling up on your bill. 123 Dyno provides a wide array of metrics to fit any web or worker dyno.

Response Times - 95th Percentile, Average, and Maximum

Out of the box, 123 Dyno uses the same M95 autoscaling found in Heroku, which suits a general application before any fine-tuning. This method autoscales when the M95 leaves the threshold range. Because it cuts out most statistical outliers, it scales by how quickly the application is responding to incoming load.

Autoscaling by average response time is available for applications with high variance in response times (e.g. manipulating files of different sizes), and autoscaling by maximum response time suits applications that critically need to keep every response fast (e.g. live editing applications).
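To see why the choice of aggregation matters, here is a small comparison on made-up data (not measurements): 98 fast requests plus two slow outliers such as large file uploads. The `summarize` helper is hypothetical and uses a simple nearest-rank 95th percentile as a stand-in for whatever percentile method a given tool uses.

```python
from statistics import mean


def summarize(samples_ms):
    """Return the three response time aggregations discussed above."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # nearest-rank 95th percentile
    return {
        "p95": ordered[rank - 1],
        "average": mean(ordered),
        "max": ordered[-1],
    }


# Made-up sample: 98 fast requests plus 2 slow outliers (e.g. large uploads)
samples = [100] * 98 + [5000, 8000]
print(summarize(samples))
# p95 stays at 100 ms, the average rises to 228 ms, the max jumps to 8000 ms
```

M95 ignores the two outliers, the average shifts moderately, and the maximum is set entirely by the slowest request, which is why it only suits applications that must react to every slow response.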

CPU Load - Average and Maximum

Response times depend on a variety of factors, from overloaded downstream resources to cleared caches. For those seeking cost efficiency or maximum use of each dyno's compute power, CPU load/utilization can be used to autoscale instead. This method scales by the amount of CPU used rather than by a response time range.

A typical use case is non-blocking I/O applications such as Node.js and Tornado that mostly wait on other services; their event loop architecture can achieve high throughput at relatively low CPU cost. If response times start to rise while throughput stays low, the bottleneck typically lies in downstream resources. Autoscaling by CPU ensures that the Dyno layer of the stack scales both independently from other parts of the stack and at maximum efficiency.

Heroku measures CPU usage with a metric called load, which closely corresponds to the CPU cores available on the underlying machines: Standard Dynos max out at about 4 load with 4 CPU cores, and Performance Dynos similarly at about 8 load. Unlike any other Heroku add-on, 123 Dyno can be configured to autoscale so that dynos run at maximum CPU usage.
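Because load tracks the number of busy cores, it can be turned into a utilization fraction by dividing by the core count; for example, 3.5 load on a 4-core Standard Dyno is 87.5%. The sketch below assumes the core counts given above (4 for Standard, 8 for Performance). The 50% to 75% Goldilocks range and the `cpu_decision` helper are illustrative choices, not 123 Dyno defaults.

```python
def load_utilization(load, cores):
    """Express a load value as a fraction of the dyno's CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load / cores


def cpu_decision(load, cores, lower=0.5, upper=0.75):
    """Goldilocks check on CPU utilization; thresholds are illustrative."""
    utilization = load_utilization(load, cores)
    if utilization > upper:
        return 1   # scale up
    if utilization < lower:
        return -1  # scale down
    return 0       # hold


print(load_utilization(3.5, 4))  # → 0.875 (3.5 load on a 4-core Standard Dyno)
print(cpu_decision(7.2, 8))      # → 1 (Performance Dyno at about 90% utilization)
```

Normalizing by core count lets the same thresholds apply to both dyno tiers.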

Queue Times - Average and Maximum

123 Dyno queue times represent the Heroku connect time: how long a request waits before it is processed by the dyno's server. Higher queue times indicate that the dyno handling requests may be overloaded and cannot yet accommodate incoming requests. This style of autoscaling works well for frameworks and services built around concurrent connections and threaded processes, such as Flask behind Gunicorn and Ruby on Rails with its easy-to-read synchronous code.
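Heroku's router log lines include a `connect=` field (alongside `service=`), so a rough view of queue time can be pulled from your own logs. This is a sketch under the assumption that your router lines follow the usual `connect=<n>ms` format. It is not how 123 Dyno collects its metrics, and the sample lines below are made up for illustration.

```python
import re
from statistics import mean

# Router lines carry "connect=<n>ms"; application log lines do not.
CONNECT_RE = re.compile(r"\bconnect=(\d+)ms\b")


def connect_times_ms(log_lines):
    """Extract connect times (ms) from Heroku router log lines."""
    times = []
    for line in log_lines:
        match = CONNECT_RE.search(line)
        if match:
            times.append(int(match.group(1)))
    return times


# Made-up sample lines in the router's key=value style
lines = [
    'heroku[router]: at=info method=GET path="/" dyno=web.1 connect=1ms service=20ms status=200',
    'heroku[router]: at=info method=GET path="/report" dyno=web.2 connect=35ms service=240ms status=200',
    'app[web.1]: Started GET "/" for 10.0.0.1',
]
times = connect_times_ms(lines)
print(mean(times), max(times))  # average and maximum queue time in ms
```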

Memory Utilization

Memory utilization is the amount of memory used divided by the total memory allowed for your Dyno plan, expressed as a percentage. Dyno memory starts at 512MB and goes up to 14GB at the highest tier. For memory-dependent applications and workers this can be an excellent way to autoscale. In particular, it's possible to autoscale a worker or web Dyno that handles large files and avoid out of memory (OOM) errors caused by overages.

OOM errors can be avoided by using memory-based autoscaling in conjunction with other configuration, such as setting a server's maximum accepted file size, compressing images or files, and passing images into queues to be processed by workers that autoscale on memory.
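Memory utilization as defined above is a simple ratio, so a Goldilocks decision on it can be sketched directly. The 40%/80% thresholds and the `memory_decision` helper below are hypothetical; the quota should be your plan's memory limit (from 512MB up to 14GB per the figures above).

```python
def memory_utilization(used_mb, quota_mb):
    """Fraction of the dyno plan's memory quota currently in use."""
    if quota_mb <= 0:
        raise ValueError("quota_mb must be positive")
    return used_mb / quota_mb


def memory_decision(used_mb, quota_mb, lower=0.4, upper=0.8):
    """Illustrative Goldilocks check: scale up well before the quota is hit."""
    utilization = memory_utilization(used_mb, quota_mb)
    if utilization > upper:
        return 1
    if utilization < lower:
        return -1
    return 0


print(memory_decision(450, 512))         # about 88% of a 512MB dyno: scale up
print(memory_decision(4096, 14 * 1024))  # about 29% of a 14GB dyno: scale down
```

Keeping the upper threshold comfortably below 100% leaves headroom for new dynos to come online before any single dyno runs out of memory.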

2. Pruning URL Paths

Response-time-based autoscaling depends on good response time data to know when to scale. 123 Dyno provides the option to prune long-running paths from the autoscaling calculation so that your application only scales when it is under real load. For example, a server's /upload path can skew metrics because its requests are typically slow even when the server isn't under load, usually because it is just waiting on a downstream file service. This can trigger scaling when it isn't needed and increase your bill. URL blocklisting lets you exclude such paths from the metric calculation, which can save you a lot.
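Pruning can be pictured as filtering samples before the percentile is computed. In this sketch the `(path, ms)` tuples, the glob-style patterns (matched with Python's `fnmatch.fnmatchcase`), and the `pruned_p95` helper are assumptions for illustration; 123 Dyno's blocklist syntax may differ.

```python
from fnmatch import fnmatchcase


def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def pruned_p95(requests, blocklist):
    """p95 response time over (path, ms) pairs, skipping blocklisted paths.

    Returns None if every request was pruned.
    """
    kept = [
        ms
        for path, ms in requests
        if not any(fnmatchcase(path, pattern) for pattern in blocklist)
    ]
    return p95(kept) if kept else None


# Made-up traffic: 18 fast API calls and 2 slow uploads waiting on file storage
requests = [("/api/items", 120)] * 18 + [("/upload", 4000)] * 2
print(pruned_p95(requests, blocklist=[]))            # → 4000 (uploads dominate)
print(pruned_p95(requests, blocklist=["/upload*"]))  # → 120 (uploads excluded)
```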

3. Adjusting Autoscaling Speed

123 Dyno lets you make autoscaling up to 12x faster, with intervals of 5, 10, 15, 30 and 60 seconds for both upscaling and downscaling. This matters for applications with bursty traffic, as Heroku's default 60s autoscaling interval can leave your application overloaded and slow until the next interval. The result is full control over speed, letting you balance robustness against efficiency.

By default, 123 Dyno waits 60s between autoscaling events for both upscaling and downscaling, and the two intervals can be adjusted independently to fit your application. Upscaling can be sped up to every 5 seconds, a 12x speed boost, so servers adapt to changing and increasing load faster and keep your application snappy. Downscaling, in turn, can be slowed down so that server resources stay online through periods of high traffic.
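Independent upscale and downscale intervals behave like two separate cooldown timers. The `CooldownScaler` class below is a hypothetical sketch, assuming each direction only tracks the time since its own last event. It shows why a 5s upscale interval allows up to 12 upscaling events in the minute where a 60s interval allows one.

```python
import math
from dataclasses import dataclass


@dataclass
class CooldownScaler:
    """Hypothetical scaler with independent upscale/downscale intervals (s)."""

    up_interval: float = 60.0
    down_interval: float = 60.0
    last_up: float = -math.inf
    last_down: float = -math.inf

    def allow(self, direction, now):
        """Return True if a scaling event in `direction` may fire at `now`."""
        if direction > 0 and now - self.last_up >= self.up_interval:
            self.last_up = now
            return True
        if direction < 0 and now - self.last_down >= self.down_interval:
            self.last_down = now
            return True
        return False


def upscale_events(up_interval, burst_seconds=60, tick=5):
    """Count upscaling events during a burst where every check says 'scale up'."""
    scaler = CooldownScaler(up_interval=up_interval)
    return sum(scaler.allow(+1, t) for t in range(0, burst_seconds, tick))


print(upscale_events(5), upscale_events(60))  # → 12 1
```

Pairing a short upscale interval with a longer downscale interval reacts quickly to bursts while avoiding removing dynos too eagerly once traffic dips.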

4. Monitoring Response to Load and Adjusting

The guidelines above for choosing an autoscaling method are a good starting point, but monitoring your application in practice is still necessary to ensure that resources are used to their fullest, and to reconfigure when they are not.

You may find during load testing or in production that an application fits one method better than another, especially as it changes to meet future demands. For instance, an application may start out CPU bound but later evolve into a memory-dependent service.

123 Dyno provides monitoring for all Dyno metrics along with path-specific response times, so you can quickly find the metric that correlates most directly with required resource usage. This view shows both which metric to autoscale by and the normal range that metric stays within. You may find, for example, that an application's response times do not track load while its CPU usage does, and decide to autoscale by CPU.

Further, you may find that your Dyno layer isn't being stressed at all in CPU or memory, yet response times are still highly variable. This can be a sign that downstream resources such as databases, caches, or API services must be scaled up. In cases like these, check those resources' usage and scale them accordingly.

5. Alerting and Production Monitoring

After initial setup you may want to monitor your dynos to ensure they are provisioned to meet all incoming load. By default, 123 Dyno alerts on web dynos that exceed 4000ms M95 response times, as well as on CPU at 3.5 load (87.5% utilization) for Dynos up to Standard-2X. These alerts tell you when resources are being stressed, so you can review analytics and adjust dyno formations as necessary.

In Conclusion

Autoscaling an application is a nuanced task that requires a holistic view of your system, experience, and continuous monitoring to ensure your application is fit for the task. Heroku and 123 Dyno make this easy.

It may often appear that Heroku is costly, but many services, Strapi for instance, can conservatively achieve ~800 req/s on a single Standard Dyno, and likely much more depending on the use case. 123 Dyno provides the toolset, analytics, monitoring, and alerting to do this easily, PaaS style. It installs with a button click, has no dependencies, and intelligently handles any load thrown at your application.
