Creating applications that are both scalable and resilient is a crucial requirement. Applications must be scalable as user numbers change. In this context, scalability is capability of an application to maintain performance while servicing different numbers of users. Applications must run well whether supporting 100 users, 1000 users, 1 million users or even more. Most cloud providers automatically scale configurations to ensure smooth running as user numbers increase or decrease with minimum manual effort required to manage the scalability process. Unfortunately, for applications using on-premise infrastructure, this process can require significant extra effort, and is sometimes only triggered by performance degradation due to rapid, unpredicted increases in demand. Such changes to configurations often require a downtime period and require complex configuration changes depending on how many clusters or servers are involved.
This type of manual scalability is rarely enough for today’s applications. Time is money, and when our ecommerce application’s database or web server crashes, the application may not run properly (or may even be completely inaccessible). Significant losses due to lost sales or reduced productivity are a real concern. Applications must be engineered to be resilient enough to recover from certain types of failure and yet remain functional from a customer perspective. Such resilient applications will keep providing acceptable service even in the face of failures such as database or web server crashes.
Scalable & Resilient Application Infrastructure
Scalability must be taken into consideration when designing an application’s infrastructure. While it may not be related to code development activities directly, scalability is tightly related to application infrastructure design, and the application infrastructure design will certainly affect the stacks and techniques that will be used in the application development process.
Two different kinds of technique can be employed to achieve scalability:
1. Horizontal Scaling
In this method, as the workload increases, we add one or more additional servers to scale the application. Let’s look at the n-tier diagram below:
The first diagram above is a basic scalability infrastructure for web-based applications. We place a load balancer between the tier levels to distribute the workload among the servers, making the application infrastructure more resilient. In horizontal scaling, instead of upgrading our hardware server like memory, processor and storage to increase scalability level, we add more servers to our infrastructure as shown in the second diagram above.
2. Vertical Scaling
Whereas in horizontal scaling we add more servers to achieve scalability, in vertical scaling we elect to upgrade the server’s hardware rather than add more servers to our infrastructure. You can see this in the diagram below. In the first diagram we have basic scalability infrastructure with 32GB memory for each web server and application server. To increase the scalability level of the application, we upgrade the memory available to the web application and application server to 64GB for each web server and application server.
There are several other popular techniques to make infrastructure scalable & resilient:
This technique allows server duplication in our infrastructure, giving it more resources to handle more workload. One of the popular redundancy techniques for database servers is database replication. Database replication is a process to copy data from one database to another database server. There are two kinds of replication configuration: master-master and master-slave configurations.
A master-master configuration will keep each database server synchronised with each other for each update/insert operation in either database server. You can see this in the following diagram. In a master-slave configuration, the slave database will only be synchronised to the master database in response to any update or insert operation in the master database server. Any update or insert in the slave database will not reflected back to the master database. A master-slave configuration is usually used in operation-separation infrastructures, with the master database used for write-something operations and the slave databases for read-something operations.
Replication at database level will result in High Availability (HA) for application data. If any database server goes down, the data will still be available. This term is also applicable when building resilient applications. Operation-Separation Infrastructure (as we can see on pic 1.6) will increase application performance, and will ensure that under high transaction volume, the whole read operations are not getting delayed by any read operation (and vice-versa).
The other form of replication that can be used in an application server or web server layer is known as server clustering. A server cluster is a group of servers that communicate with one another, work together, perform the same task. A server cluster is seen as a single unit server by any outsider. We can see the sample of server clustering in the following diagram:
In the above example, the application infrastructure uses a load balancer to distribute workload among several Apache Web Servers, and there are multiple instances of Apache Httpd in 4 Apache Web Servers to handle the load of load balancing and to manage the tomcat instances. This architecture will route a request from client to the running instance only. If any instance goes down, this will have a minimal impact on service availability. Several cloud providers like AWS and Google Cloud Provider provide load balancer services. In AWS, two different kinds of load balancer services are provided - Nginx Plus and Elastic Load Balancer (ELB). AWS Elastic Load Balancer does not have as large a set of Layer 7 features (like HTTP/2, Web Socket and UDP) as Nginx Plus.
One crucial requirement to achieve resilience is redundancy in multiple Availability Zones. For example, in an on-premise infrastructure, servers must be located in separate datacentres. For critical or very demanding applications like ecommerce, infrastructure must be located in more than two availability zones and database servers should be located on different servers away from web and application servers.
Installing a Web or Application server on the same machine with the Database server will affect the high availability and resilience of the application. If we lose one server due to an electrical outage or hardware failure, we are completely lose all 3 resources at the same time (Web Server, Application Server and Database Server). We can see an example of resilient and scalable infrastructure in on-premise infrastructure that use 2 different data centres in the following diagram below:
2. Distributing Workload
Scalability, Resilience and High Availability can be hard to achieve in a simple way if there is no dedicated system that can distribute workload among the servers and manage a single entry point to the database server. This dedicated system can be a hardware (a Hardware Load Balancer or HLB) like F5, or a software load balancer like HAProxy, MaxScale, ProxySQL, MySQL Router, or Nginx.
Without a load balancer, we must maintain multiple database connections for more than one database server, and those database connections must be managed by the application. This results in a significant increase in code complexity. Scalable and resilient applications always require redundancy in database servers, web servers and application servers. We can see an example of how to use MariaDB, and MaxScale combined with a keepalived application to create scalable and high availability data infrastructure in image 1.9 below. To make the database infrastructure resilient and high availability, redundancy must be located in different datacentres from the application server or web application infrastructure.
In the diagram above, the database infrastructure has one database cluster configured with master-slave replication configuration. The cluster consists of 1 master database and 3 slave databases. The infrastructure also has 2 machines using Software Load Balancers (MaxScale), 1 machine active and 1 machine on standby as a backup machine. These two machines will be configured and managed by MaxCtrl that is installed on a different machine. The keepalived application will be installed on both machines where MaxScale is installed. Keepalived functions as a failover handler, and both MaxScale machine continuously broadcast their statuses. If the active MaxScale machine goes down and stops sending its status, the backup machine will take the role as active load balancer and all database connections will direct to the backup machine. Cloud providers like AWS is also provide Multi Availability Zone Deployment and failover support for DB instance like Oracle, PostgreSQL, MySQL and Maria DB (as well for their Amazon RDS services) to achieve High Availability Data.
Distributing workload for the web server layer or application server layer is almost the same as using a load balancer in database layer, as we can see in picture 1.7.
Resilient in Application Design
Providing a scalable and high availability infrastructure to host our web application is not enough to achieve resiliency in our application. We must design both our application and infrastructure to ensure resilience. There are several patterns to achieve resiliency in our application design, especially if we are going to implement a microservice architecture (as we will be discussing later in this blog). One of the patterns to achieve resiliency is known as client-side resiliency. This pattern is focused on protecting a remote resource's client (like a microservice call or database lookup) from crashing when the remote resource is down/unreachable or performing poorly/long response. There are four client-side resiliency patterns:
- Client-side load balancing
- Circuit breakers
Client-side resiliency patterns are located between a services consumer and microservices as you can see in the following picture:
1. Client-side Load Balancing
In microservice architecture, client-side load balancing will look up all of microservice instances from a service discovery agent, caching the physical location of the service instances from a service discovery into its services location’s pool. If there is more than one instance of a microservice, the client-side load balancing use a round-robin algorithm to choose one instance whenever a service consumer needs to call the service instance. Client-side load balancing will detect any error or problem (service unavailable or performing poorly) that occurs in a service instance; client-side load balancing will remove the service instance location in its pool to prevent any service call from hitting that service instance.
2. Circuit Breaker
This pattern is adopted from circuit breaker behaviour in an electrical infrastructure. A circuit breaker will protect all components located behind it from any damage due to excess current flow through the wire. In a microservice architecture, a circuit breaker will monitor for any remote services called. If the calls take too long waiting for the response (due to the service instance being down or performing poorly), the circuit breaker will intercede and kill the call. It will also prevent any future calls to that failing service instance. Time out configuration will be used by a circuit breaker to decide how long a service consumer should wait for a response before the circuit breaker intercedes and kills the call.
3. Fallback Processing
With this pattern, when a remote service call takes too long to respond, rather than throw an exception, the service consumer will execute an alternative code path. This may be looking for data from another resources or queueing the process for future processing if the failing service call is processing something else.
An example of fallback processing is in the new-arrivals service call in an e-commerce application. When a service consumer wants to return all goods arriving this week, this is called a new-arrivals service. If, for some reason, a service consumer waits too long for a service response, the service consumer will usually throw an exception. If we implemented fallback processing in this case, the fallback method will be executed after the amount of time assigned in the time-out variable. The fallback method will then invoke services from other resources to get all new arrivals rather than generating an exception.
With this pattern, each remote service call in a service consumer will have its own thread pool that is segregated from others service calls’ thread pool. Without this pattern, if a service consumer’s call to remote resources takes too long (or is even stuck waiting an infinite time for a service response) and this service call is running concurrently, this will lead to many threads unable to be released and re-used by other threads context. In other words, the service consumer’s resources (like thread availability in the thread pool) may run out, and this situation will impact other remote service calls or other functionality in service consumers. In the worst situation, this case could even down the service consumer instance or stop processing requests. In a monolithic application, this case could cause the whole application to stop processing requests or fail as resources run out. However, if we implemented bulkheads in our service consumer, other remote service calls or functionalities will not be affected by that failing service call.
I. Designing Resilient Applications with Client-Side Resiliency Pattern
Before we discuss further how to make our application resilient, we need to look at an example scenario of inter-service communication as shown here:
Both application A and application B are making remote service calls to service A. Service A gets the data from its database and from a service call to service B and service C as well. Service B gets data from its database, whereas service C gets its from its database and from service call to third-party services that are hosted in the cloud. On the weekend, the cloud provider upgrades their infrastructure and install several patches, and on Monday morning service C starts to call third party services. Unfortunately, service C experiences long waiting times for service responses. The developer of service C never anticipated that service calls to the third party would slow down this much.
Like the developer of service C, the developer of service A also never anticipated service calls to service C slowing so much. Both developers wrote their code to handle database and service calls to the third-party service occurring within the same transaction. So, when third party services start running slowly, not only will service C’s thread pool be running out to back up a service call to third party services, but the database’s connection pool quickly becomes exhausted as these connections are held open because the calls to the third party services never complete. Eventually, service C will start running out of resources and may even stop processing requests. The same thing will happen to Service A, as it too runs out of resources and stops processing request from application A and application B.
The whole scenario above could be avoided if we implemented circuit breakers and fallback processing at each point where we call remote services within service consumer. Look at the following picture:
If the third-party services have the same experience as in the previous scenario, they will either run slowly or perform poorly. With a circuit breaker implemented in service C, it will never call the third-party services directly. Instead, when service C needs to call third party services, the service call process will be delegated to a circuit breaker. The circuit breaker will take the call and wrap it in a thread that is independent from the originating caller. The originating caller is no longer directly waiting for service call to complete. Instead, the circuit breaker will monitor the calling process and can kill the call if the thread that is wrapping the call runs too long. If the thread wait time does not exceed the timer (time-out variable), the service response will be sent to originating caller. But if the thread runs too long due to the third-party services either running slowly or performing poorly and the timer runs out, the circuit breaker will count this as happening for a certain amount of time, If these failures occur enough, the circuit breaker will trip the call and fallback processing will be executed to get an alternative solution for this failure rather than throwing exceptions. With fail-fast from a circuit breaker, this failing call will not be eating up resources like threads. There are several libraries we can use to implement circuit breakers, fallback processing and bulkheads in our project, like the Hystrix library that we use in spring-based projects.
In addition to implementing circuit breakers, fallback processing and bulkheads, we need to implement client-side load balancing if we have more than one instance for a microservices (and, of course, to achieve high availability we need to deploy more than one instance for each microservice). In microservice architecture, the Client-side load balancing looks up all service instances from a service registry and will cache the physical location of service instances within its local cache. Client-side load balancing will detect if there are any instances down, and will remove it from the cache to prevent the instance address being hit by any service consumer. So, with client-side load balancing, if we have 10 instances and 2 instances experience poor performance or failure, the services can still process the request from any service consumer. Client-side load balancing uses a round-robin algorithm to distribute the workload. Client-side load balancing is located within the service consumer. To implement client-side load balancing in spring-based projects, we can use the Netflix ribbon library.
II. Designing Resilient Application with Message Oriented Middleware
In addition to implementing the client resiliency pattern, we can use a message broker to implement asynchronous communication between services to achieve even more resiliency in our application. When a service consumer needs to invoke a microservice, it does not directly call the microservices but publishes the request into the message broker channel. The microservices subscribe to the channel to process that request and publish the result to the message broker. In turn, the service consumer subscribe to the channel to get the resultant message. Each service consumer has one channel to publish and subscribe the message. Each request and each response has a correlation id to identify the request –response.
The message broker can persist the message (request-response), so if after the service consumer publishes the request successfully and the microservices is goes down, the request itself still exists in the message broker and will be ready for further processing once the microservice is back up.
In this article we have discussed the steps necessary to make an application scalable and resilient from the point of view of infrastructure and application design. Both of these aspects need to be taken into consideration when building a resilient application, and we should ensure that both are addressed. Not doing so introduces the risk of producing a less than resilient product.
Syarif Hidayat - Analyst