A high availability architecture is an increasingly common requirement for modern applications that need to minimize the impact of outages on business functionality.
High availability is expressed as a percentage below 100 that measures how long a service is available relative to its operational time. For example, an availability of 99.99 means that the service should be available for 99.99% of its designated operational time.
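To make the percentage concrete, the following sketch converts an availability target into the downtime it allows. It assumes 24/7 operation over a 365-day year; the function name is illustrative.

```python
# Converts an availability percentage into the maximum downtime it allows.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year, 24/7 operation


def allowed_downtime_minutes(availability_percent: float,
                             operational_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime budget, in minutes, for the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return operational_minutes * (100 - availability_percent) / 100


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
```

Under these assumptions, 99.99% leaves roughly 52.6 minutes of downtime per year, while 99.9% leaves about 8.8 hours.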
A concept similar to high availability is redundancy: in this case, when a fault occurs in one part of the system, the service is not migrated anywhere, because the redundancy itself guarantees its availability. It's a stronger concept that aims to reinforce availability.
Returning to the concept of high availability, it is generally achieved with different approaches:
- Orchestrator: software such as Kubernetes or Docker Swarm manages the execution of the services and makes sure that everything is always up and running according to the deployment requirements.
- Partitioning: generally used in databases and in-memory key-value stores. In this case the database is split into different partitions, and every partition is replicated and distributed across the nodes of the cluster. For every set of replicas a leader can be elected using a consensus algorithm such as Raft, used for example by etcd.
- Software cluster: a clustering software installed on the nodes continuously monitors the state of the services and intervenes when a node of the cluster crashes or a service stops working correctly. In this scenario some time is needed to restart the service or, if necessary, to migrate it from one node of the cluster to another.
An orchestrator has the goal of making sure that every service, generally intended as a running process, runs according to the requirements defined, for example, in a YAML configuration file.
An orchestrator generally has a control plane, usually separated from the data plane where the processes run. Its main task is to make sure that the defined processes are running and can scale according to their definition.
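The control plane's job can be pictured as a reconciliation loop: compare the desired state with the observed state and emit corrective actions. The sketch below shows one iteration of such a loop under simplifying assumptions (services identified by name, replicas counted as integers); `reconcile` and `Action` are hypothetical names, not the Kubernetes API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Action:
    kind: str     # "start" or "stop"
    service: str


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> List[Action]:
    """Return the actions that bring the running replica counts to the desired ones.

    Services that are running but no longer declared get a desired count of 0.
    """
    actions: List[Action] = []
    for service in sorted(set(desired) | set(running)):
        diff = desired.get(service, 0) - running.get(service, 0)
        kind = "start" if diff > 0 else "stop"
        actions.extend(Action(kind, service) for _ in range(abs(diff)))
    return actions


# One iteration of the control loop; a real controller repeats this continuously.
desired_state = {"web": 3, "worker": 1}
observed_state = {"web": 1, "worker": 2, "legacy": 1}
for action in reconcile(desired_state, observed_state):
    print(action.kind, action.service)
```

Because the loop only compares states, it handles crashes, scale-up requests and removed services the same way.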
Kubernetes is the most famous container orchestrator. It manages the execution of containers inside a pool of worker nodes, providing network and storage services that allow containers to communicate with each other, to be reachable from outside and to reach the outside, and to mount storage volumes (directories, block devices and file systems) directly inside the containers.
Availability is provided by keeping a configurable number of processes running, distributed across all the nodes of the cluster. Different strategies are available for distributing these processes, called pods, across the worker nodes.
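One simple distribution strategy is to place each new replica on the node that currently runs the fewest replicas. The sketch below illustrates that idea with a min-heap; it is a simplified, hypothetical policy, not the actual Kubernetes scheduler, which scores nodes on many more criteria.

```python
import heapq
from typing import Dict, List, Optional


def spread_replicas(replicas: int, nodes: List[str],
                    existing: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Assign each new replica to the node currently running the fewest of them."""
    if not nodes:
        raise ValueError("no worker nodes available")
    counts = {node: (existing or {}).get(node, 0) for node in nodes}
    heap = [(count, node) for node, count in counts.items()]
    heapq.heapify(heap)  # ties are broken by node name
    for _ in range(replicas):
        count, node = heapq.heappop(heap)
        counts[node] = count + 1
        heapq.heappush(heap, (count + 1, node))
    return counts


print(spread_replicas(5, ["node-a", "node-b", "node-c"]))
```

Spreading replicas this way means the loss of a single worker node takes down only a fraction of them.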
The following figure shows a Kubernetes cluster architecture:
Other orchestrators similar to Kubernetes are Docker Swarm and Apache Mesos.
An orchestrator can also provide the functionality to run ETL jobs on a configured schedule across distributed worker nodes.
In databases, generally NoSQL databases such as MongoDB or Cassandra, replication provides redundancy and increases data availability. With multiple copies of the data on different servers, replication provides a level of fault tolerance against the loss of a single node.
Each copy of the data can be split into multiple partitions, or chunks, to provide better performance for reads and writes. Every partition can have a leader node, responsible for writes, and several standby nodes that can serve reads. If the server storing a partition goes down, the partition can be resynchronized from the other copies. In this way the architecture ensures both availability and scalability.
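The partition/leader logic can be sketched as follows, assuming keys are mapped to partitions with a stable hash and each partition's copies are placed round-robin, with the first copy acting as leader. All names are hypothetical; MongoDB and Cassandra use their own, more elaborate schemes.

```python
import hashlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (used only to spread keys)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def place_partitions(num_partitions: int, nodes: List[str],
                     replication_factor: int) -> Dict[int, List[str]]:
    """Put each partition on distinct nodes; the first node in the list is the leader."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the number of nodes")
    return {
        p: [nodes[(p + i) % len(nodes)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def write_node(key: str, placement: Dict[int, List[str]]) -> str:
    """Writes go to the leader of the key's partition."""
    return placement[partition_for(key, len(placement))][0]


def read_nodes(key: str, placement: Dict[int, List[str]]) -> List[str]:
    """Reads can be served by the standby copies (or by the leader if there are none)."""
    replicas = placement[partition_for(key, len(placement))]
    return replicas[1:] or replicas[:1]


def fail_node(placement: Dict[int, List[str]], dead: str) -> Dict[int, List[str]]:
    """Drop a failed node; the next copy in each list becomes the new leader."""
    return {p: [n for n in replicas if n != dead] for p, replicas in placement.items()}


placement = place_partitions(4, ["n1", "n2", "n3"], replication_factor=2)
print(write_node("customer:42", placement), read_nodes("customer:42", placement))
print(fail_node(placement, "n1"))
```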
MongoDB and Cassandra implement a high availability architecture by splitting the database into different partitions and replicating them to different nodes.
etcd, instead, does not partition its data: the whole data set is replicated to every member, and a single leader, elected with the Raft algorithm, handles the writes.
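Raft needs a majority of members to elect a leader and commit a write, so the cluster size determines how many failures it tolerates. The arithmetic is shown below; the function names are illustrative.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum_size(members)


for n in (1, 3, 4, 5, 7):
    print(f"{n} members: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 members raises the quorum without tolerating more failures, which is why etcd clusters are usually sized with an odd number of members.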
There are different types of software cluster depending on the layer where they provide their functionality. The goal of this article is to organize and explain the meaning of all the types of software cluster commonly used in an IT infrastructure.
The most widespread software clusters are:
- Application cluster: the cluster is a software layer installed on all the nodes participating in the cluster. It's a resource manager that monitors a set of interdependent resources and restarts them when needed. Typical application clusters are Veritas Cluster, Oracle Cluster, Pacemaker, etc.
- Web application cluster: the cluster is provided by an application server such as Tomcat, JBoss, WLS, GlassFish, etc., which presents all the web applications running on the nodes of the cluster as a single unit. Session persistence, load balancing of services and messaging, failover and data replication are the services commonly provided.
- Virtualization cluster: the cluster lives inside the virtual infrastructure, such as KVM or VMware, and provides automatic failover of a virtual machine from a failed node to another live node.
- Database cluster: the cluster lets the database managed by a DB engine running on several nodes be seen as a single logical data unit. In some scenarios there is only one physical copy of the data (as in Oracle RAC); in others the database is replicated between different nodes, as in MariaDB Cluster or in clusters built with the DRBD driver.
- Network cluster: the cluster lets a router, firewall or load balancer, in general a network element, be seen as a single logical unit. If the active node of the cluster goes down, the standby node transparently becomes the active unit, so that, ideally, users perceive no network outage.
Let's now look at the application cluster in detail.
An application cluster provides high availability to applications by automatically restarting the managed software resources from one node to another. Every resource depends on others, and together they form a logical group that can be started as a single unit respecting the dependencies between resources.
When a node or a monitored application fails, the cluster engine automatically restarts the service on another node of the cluster. The services organized in groups are restarted respecting their dependencies. The cluster provides resource agents specialized for different jobs: mounting a file system, enabling a virtual IP address, restarting an application, etc.
A possible application cluster configured to manage a logical group in active/standby mode is shown below:
Every time the logical group is migrated from one node of the cluster to another, this is what happens behind the scenes:
- The DRBD data volume is switched to the primary state and a virtual IP is created.
- The file system is mounted from the primary DRBD volume.
- The NFS server resource is started on top of the other resources.
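The start sequence above is a dependency ordering: each resource starts only after the resources it depends on, and the group stops in reverse order. A minimal sketch using a topological sort from Python's standard `graphlib` (3.9+); the resource names and the exact dependencies are assumptions modeled on the NFS example, not a Pacemaker configuration.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Hypothetical resource group: each resource maps to the resources it depends on.
NFS_GROUP: Dict[str, Set[str]] = {
    "drbd_primary": set(),
    "virtual_ip": set(),
    "filesystem": {"drbd_primary"},
    "nfs_server": {"filesystem", "virtual_ip"},
}


def start_order(group: Dict[str, Set[str]]) -> List[str]:
    """Order in which the cluster brings resources up on the new node."""
    return list(TopologicalSorter(group).static_order())


def stop_order(group: Dict[str, Set[str]]) -> List[str]:
    """Resources are released in reverse, so nothing stops under a dependent."""
    return list(reversed(start_order(group)))


print("start:", start_order(NFS_GROUP))
print("stop: ", stop_order(NFS_GROUP))
```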
A logical group can also be configured in active/active mode. Suppose a database is split across two different nodes. Two logical groups can be created, each containing the corresponding database resource. Both groups must always be up: the cluster manages them in active/active mode, with each group normally running on its own node.
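The active/active placement can be sketched as an ordered list of preferred nodes per group: each group runs on the first live node in its list, so both nodes are busy in normal conditions and the survivor takes both groups after a failure. This is similar in spirit to location preferences in Pacemaker, but the names and structure here are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical preferences: each group has its own "home" node.
PREFERENCES: Dict[str, List[str]] = {
    "db_group_1": ["node1", "node2"],
    "db_group_2": ["node2", "node1"],
}


def place_groups(preferences: Dict[str, List[str]], alive: Set[str]) -> Dict[str, str]:
    """Run each group on its first preferred node that is alive."""
    placement: Dict[str, str] = {}
    for group, nodes in preferences.items():
        for node in nodes:
            if node in alive:
                placement[group] = node
                break
        else:
            raise RuntimeError(f"no live node available for {group}")
    return placement


print(place_groups(PREFERENCES, {"node1", "node2"}))  # active/active
print(place_groups(PREFERENCES, {"node1"}))           # node2 failed
```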
Pacemaker is a free application cluster: you can find more information about it in my article https://www.securityandit.com/system/pacemaker-cluster-with-nfs-and-drbd/. A commercial example is Veritas Cluster from Symantec.
A classic problem that an application cluster must solve is split brain. A split brain occurs when two independent systems configured in a cluster each believe they have exclusive access to a given resource, usually a file system or volume. This can lead to data corruption if, for example, both nodes mount in write mode a remote volume configured on an external SAN/NAS.
Different approaches can be implemented to prevent this issue. Most solutions use two communication channels: the first is the network, the other is a volume on a remote SAN/NAS reached via the iSCSI protocol (using a different network path) or via Fibre Channel.
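The role of the shared volume can be illustrated with a simplified quorum model: two nodes and a quorum disk each hold one vote, only one node can reserve the disk, and a node runs the resources only with a strict majority of votes. This is a hypothetical sketch, not a specific product's algorithm; real clusters typically combine it with fencing (for example STONITH in Pacemaker).

```python
from typing import Optional

TOTAL_VOTES = 3  # assumption: node1, node2 and the quorum disk have one vote each


class QuorumDisk:
    """Stand-in for a reservation on a shared SAN volume: only one node can hold it."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def try_reserve(self, node: str) -> bool:
        if self.owner is None:
            self.owner = node
        return self.owner == node


def has_quorum(sees_peer: bool, holds_disk: bool) -> bool:
    """A node may run the resources only if it sees a strict majority of the votes."""
    votes = 1 + int(sees_peer) + int(holds_disk)  # its own vote + peer + disk
    return 2 * votes > TOTAL_VOTES


# Network partition: the nodes no longer see each other over the network,
# so both race for the quorum disk reached over the storage path.
disk = QuorumDisk()
node1_has_disk = disk.try_reserve("node1")  # node1 gets there first
node2_has_disk = disk.try_reserve("node2")  # reservation already held
print("node1 keeps the resources:", has_quorum(False, node1_has_disk))
print("node2 must stop them:", not has_quorum(False, node2_has_disk))
```

Because only one partition can reach a majority, at most one node mounts the volume in write mode.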
Let's now describe the web application cluster.
Web Application Cluster
The web application cluster is one of the best features that application servers provide to the web applications deployed on them. The most important services offered to web applications are:
- Centralized deployment of web applications.
- Session replication, used to replicate session data on all cluster nodes. If one node goes down, users can continue browsing the web application without losing their session data.
- A unified, cluster-wide view of the JNDI tree: Java objects bound in this tree on any cluster node can be looked up.
- JMS clustering: the queues and topics used by web applications and Java clients to exchange messages are accessible from any node of the cluster.
- Centralized data source and JVM configuration, automatically propagated to all cluster nodes.
- Load balancing of queue and topic messages.
- Load balancing of EJB and RMI Java objects.
A possible solution is shown in the image below:
As you can see, load balancing is performed by an external system such as HAProxy (you could also use nginx or Apache): HTTP requests are not balanced by the cluster. The cluster replicates session data between its two nodes and provides a clustered queue where messages can be put and then read by an external client without knowing how many nodes the cluster has. Discovery is automatic via a multicast protocol.
If one of the nodes goes down, HAProxy sends the TCP connections to the other node, and the session data is still accessible because the cluster replicated it. The failed node is transparent to the Java client.
The important thing to understand is that the cluster needs an HTTP load balancer in front of it to balance all the HTTP traffic. The load balancing provided by the cluster concerns JMS messages, JNDI lookups, EJB method calls and similar, and the discovery of these services (JMS queues/topics, JNDI Java objects, etc.) relies on multicast.
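The split of responsibilities (external balancer for HTTP, cluster for session replication) can be modeled with a toy example. It assumes every session write is copied to all live nodes; real application servers may instead replicate to a single backup node depending on configuration. All class names are hypothetical.

```python
import itertools
from typing import Dict, List


class ClusterNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.sessions: Dict[str, dict] = {}


class ReplicatedCluster:
    """Cluster role: every session write is copied to all live nodes."""

    def __init__(self, nodes: List[ClusterNode]) -> None:
        self.nodes = nodes

    def put(self, session_id: str, data: dict) -> None:
        for node in self.nodes:
            if node.alive:
                node.sessions[session_id] = dict(data)


class RoundRobinBalancer:
    """External balancer role (HAProxy in the picture): skips nodes that are down."""

    def __init__(self, nodes: List[ClusterNode]) -> None:
        self.nodes = nodes
        self._cycle = itertools.cycle(nodes)

    def pick(self) -> ClusterNode:
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node.alive:
                return node
        raise RuntimeError("no backend available")


n1, n2 = ClusterNode("node1"), ClusterNode("node2")
cluster, balancer = ReplicatedCluster([n1, n2]), RoundRobinBalancer([n1, n2])
cluster.put("abc", {"cart": ["book"]})
n1.alive = False                      # node1 crashes
node = balancer.pick()                # the balancer routes to node2
print(node.name, node.sessions["abc"])
```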
All the application servers widespread on the market provide an administration server for managing the cluster functionality, with a GUI that allows configuring every feature of the cluster services. As a free product, I suggest WildFly (http://wildfly.org/), the application server that succeeded JBoss AS.
The clustering functionality provided by modern hypervisors such as VMware vSphere or KVM aims to restart a virtual machine from a failed node on another node of the cluster. The restart is transparent and happens automatically when the physical node hosting the machine goes down.
The prerequisite for this functionality is a shared datastore, accessible from all the nodes of the cluster, where the virtual disks of the virtual machines are stored. The switchover is very slow compared to a classic application cluster because the virtual machine must be rebooted on another physical node.
Virtual machines are dynamically allocated to any host in the cluster and can be migrated between hosts without service disruption. The migration can happen manually or automatically depending on the load (CPU and RAM) of the node where the virtual machine is running.
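Automatic, load-based migration boils down to a placement rule. The sketch below assumes a hypothetical policy: if the source host's CPU or RAM usage exceeds a threshold (0.8 here, chosen arbitrarily), move a VM to the other host with the lowest peak usage. oVirt scheduling policies and vSphere DRS are considerably more sophisticated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    cpu_used: float  # fraction of CPU in use, 0..1
    ram_used: float  # fraction of RAM in use, 0..1


def pick_migration_target(hosts: List[Host], source: str,
                          threshold: float = 0.8) -> Optional[Host]:
    """Return the least-loaded other host if the source is overloaded, else None."""
    by_name = {h.name: h for h in hosts}
    src = by_name[source]
    if max(src.cpu_used, src.ram_used) < threshold:
        return None  # source is fine, no migration needed
    candidates = [h for h in hosts if h.name != source]
    if not candidates:
        return None
    return min(candidates, key=lambda h: max(h.cpu_used, h.ram_used))


cluster_hosts = [Host("kvm1", 0.9, 0.5), Host("kvm2", 0.3, 0.4), Host("kvm3", 0.2, 0.7)]
target = pick_migration_target(cluster_hosts, "kvm1")
print(target.name if target else "no migration")
```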
All the nodes of the cluster can reach the same VLANs and datastores, which can be configured centrally through a GUI. The best free solution available on Linux is oVirt on top of KVM. As a commercial solution there is the VMware vSphere cluster.
The reference architecture in this case is the following (I refer to a KVM cluster managed by oVirt):
This type of cluster is not optimal because it cannot detect operating system or application issues: if an application hangs, the virtualization cluster does nothing and the outage is not resolved.
The application cluster solves this last problem with monitor scripts that frequently check the service availability and tell the cluster agent when it's time to switch all the resources over to another node.
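A monitor check can be as simple as opening a TCP connection to the service port, with a failure counter so that a single slow probe does not trigger a switchover. The sketch below is hypothetical: the threshold of 3 consecutive failures and the class names are assumptions, and a real cluster would wrap such a check in its own resource-agent interface (Pacemaker resource agents, for instance, implement a monitor action).

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


class ServiceMonitor:
    """Requests a switchover only after `max_failures` consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], max_failures: int = 3) -> None:
        self.probe = probe
        self.max_failures = max_failures
        self.failures = 0

    def check(self) -> bool:
        """Run one probe; return True if the cluster should switch the group over."""
        if self.probe():
            self.failures = 0
        else:
            self.failures += 1
        return self.failures >= self.max_failures


# Example: monitor a hypothetical NFS server on the local host (port 2049).
monitor = ServiceMonitor(lambda: tcp_probe("127.0.0.1", 2049))
print("switch over:", monitor.check())
```

Probing the application itself, rather than the virtual machine, is what lets the cluster react to a hung service.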
Another limit of this approach is that it cannot manage resource dependencies inside the virtual machine: they must be managed internally by the operating system.
Because of all these limits, I prefer to use this type of cluster for live migration or for management purposes, but not to guarantee the high availability of our services. For that, it's better to use an application cluster or a database cluster, which is explained in the next paragraph.
A database cluster is a solution where several DB instances form a single logical unit. With this approach it's possible to store, update and retrieve data from multiple DB instances running on different nodes.
Basically there are two different approaches for this type of software cluster:
- In the first solution, the data volumes containing the database must reside on shared storage accessible from all the cluster servers. The shared storage is not mounted as a file system but accessed as raw devices over the iSCSI or Fibre Channel protocol. A commercial solution that implements this logic is Oracle RAC.
- The second solution is an extension of replication. The servers of the cluster are configured in a synchronous multi-master way, where every data change is replicated to the others. It's possible to read from and write to any cluster node. A good free solution is MariaDB Galera Cluster.
It's important to understand that a database cluster is not data replication. In data replication there are different database instances that replicate the database asynchronously, in master/master mode (if every instance accepts writes) or in master/standby mode (if only one instance accepts writes). For MariaDB you can find all the details in my article https://www.securityandit.com/security/how-to-set-up-mariadb-master-slave/.
The main mechanism used in replication is a log file where all the updates to the database are written. Slaves read this log from each master to get the data to replicate. In MariaDB the log is called the binary log, in PostgreSQL the WAL (write-ahead log), in Oracle the archive logs.
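The log-based mechanism can be sketched in a few lines: the primary appends every change to a log, and each replica remembers its position and applies pending entries when it pulls. This is a toy model (key/value writes, in-memory log) that only illustrates why asynchronous replicas can lag; it does not reproduce the binary log or WAL formats.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str]  # (key, value) written on the primary


class Primary:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.log: List[Event] = []  # plays the role of the binary log / WAL

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.position = 0  # index of the next log entry to apply

    def pull(self, primary: Primary, max_events: Optional[int] = None) -> int:
        """Apply pending log entries in order; return how many were applied."""
        pending = primary.log[self.position:]
        if max_events is not None:
            pending = pending[:max_events]
        for key, value in pending:
            self.data[key] = value
        self.position += len(pending)
        return len(pending)

    def lag(self, primary: Primary) -> int:
        """Number of log entries not yet applied."""
        return len(primary.log) - self.position


p, r = Primary(), Replica()
p.write("a", "1"); p.write("b", "2"); p.write("a", "3")
r.pull(p, max_events=2)
print(r.data, "lag:", r.lag(p))  # the replica is still one event behind
```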
The cluster configuration, instead, is an active-active solution where all the instances accept writes and together behave like a single database server. All the nodes appear identical, without lag, and any update made on one master is immediately visible on the other masters.
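By contrast, a synchronous multi-master cluster applies each write on every node before acknowledging it, and must reject concurrent conflicting writes made on different nodes. The toy model below uses a per-key version check as the conflict test; this is an assumed simplification, only similar in spirit to how Galera aborts conflicting transactions.

```python
from typing import Dict, List, Tuple

Versioned = Tuple[object, int]  # (value, version)


class CertificationConflict(Exception):
    """Raised when a write is based on a version that another node already changed."""


class SyncCluster:
    """Toy synchronous multi-master cluster: a write commits on every node or on none."""

    def __init__(self, node_names: List[str]) -> None:
        self.nodes: Dict[str, Dict[str, Versioned]] = {n: {} for n in node_names}

    def read(self, node: str, key: str) -> Versioned:
        return self.nodes[node].get(key, (None, 0))

    def write(self, node: str, key: str, value: object, read_version: int) -> int:
        current_version = self.read(node, key)[1]
        if current_version != read_version:
            # Another transaction committed first: reject instead of diverging.
            raise CertificationConflict(f"{key}: read v{read_version}, now v{current_version}")
        new_entry = (value, current_version + 1)
        for store in self.nodes.values():  # applied everywhere before acknowledging
            store[key] = new_entry
        return new_entry[1]


cluster = SyncCluster(["db1", "db2", "db3"])
_, v1 = cluster.read("db1", "stock")
_, v2 = cluster.read("db2", "stock")
cluster.write("db1", "stock", 9, v1)      # commits on all three nodes
try:
    cluster.write("db2", "stock", 8, v2)  # based on a stale version
except CertificationConflict as err:
    print("rejected:", err)
```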
In Oracle's cluster (which is not a free solution) the approach is more interesting: there is no replication at all, and all the instances of the cluster access a single database, generally shared in raw mode. The disks where the database resides are not mounted on any node of the cluster but shared from a common SAN using the iSCSI or Fibre Channel protocol.
After talking about these clusters, let's move to a kind of cluster whose service underpins all the others.
A network cluster composed of two network elements prevents a network outage when one system becomes unavailable because of a software or hardware problem.
The network elements, firewalls or load balancers, are connected through a dedicated interface used for configuration synchronization and for sharing the TCP connection state between all the cluster nodes. If one node of the cluster goes down, the network traffic is handled by the other. If the cluster is configured to share TCP connections, no outage is perceived from the user's point of view.
Three solutions are possible in this scenario:
- Only one node of the cluster processes the network traffic at any given moment. One node is called primary, the other secondary. The primary is in active state, while the secondary is in standby, ready to become active if the primary node goes down or a network problem occurs on one of its interfaces. At any time only one node is active. This approach is the most widespread and is implemented in Cisco ASA, F5 load balancers, Check Point (in Legacy High Availability mode) and pfSense clusters https://doc.pfsense.org/index.php/Configuring_pfSense_Hardware_Redundancy_(CARP).
- Only one node of the cluster processes the network traffic for a given VLAN at any moment. Every VLAN has a virtual IP address active on only one node of the cluster. The nodes coordinate through a redundancy protocol such as CARP, VRRP or HSRP, which exchanges multicast advertisements on each VLAN. Cisco routers and Catalyst switches implement this solution with HSRP: http://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst3750x_3560x/software/release/12-2_55_se/configuration/guide/3750xscg/swhsrp.html
- All the cluster nodes are active and every node is able to process any network packet. This is possible because all the nodes share the same virtual IP address. This solution is more complex, and the extra traffic it needs to work increases the load. This approach is implemented in Check Point in Load Sharing Multicast mode https://sc1.checkpoint.com/documents/R76/CP_R76_ClusterXL_AdminGuide/7292.htm.
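In the load-sharing case every member receives every packet, so each one needs a deterministic rule to decide whether a connection is its own. The sketch below assumes a hash of the connection endpoints modulo the number of members, with the endpoints sorted so both directions of a connection land on the same member. It is a hypothetical illustration, not Check Point's actual decision function.

```python
import hashlib
from typing import Tuple

# (source IP, source port, destination IP, destination port, protocol)
FiveTuple = Tuple[str, int, str, int, str]


def owner_of(conn: FiveTuple, members: int) -> int:
    """Deterministically map a connection to one cluster member (0-based)."""
    src_ip, src_port, dst_ip, dst_port, proto = conn
    # Sort the endpoints so request and reply packets map to the same member.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}-{b[0]}:{b[1]}-{proto}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % members


def should_process(conn: FiveTuple, my_id: int, members: int) -> bool:
    """Each member runs this on every packet and drops the ones it does not own."""
    return owner_of(conn, members) == my_id


conn = ("10.0.0.5", 51000, "192.168.1.10", 443, "tcp")
print("handled by member", owner_of(conn, members=3))
```

Since every member computes the same result independently, no per-packet coordination is needed, but all members still receive the full traffic, which is where the extra load comes from.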
A typical network cluster architecture is shown in the picture below (the picture is taken from https://doc.pfsense.org/index.php/Configuring_pfSense_Hardware_Redundancy_(CARP)).
In the picture above, one pfSense firewall is in active state and the other is in standby state. Every VLAN has a virtual IP address that is active on the primary firewall. The secondary standby node has all the virtual IP addresses in backup state. If the primary goes down, all the VIPs of every VLAN are activated on the secondary node, which becomes the new active firewall.
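The failover of the VIPs can be sketched as a per-VLAN election in which the live node with the highest priority owns the virtual IP, as in VRRP-style setups. The priorities and node names below are hypothetical. Giving a VLAN different priorities would reproduce the per-VLAN split of the second approach.

```python
from typing import Dict, Optional, Set

# Hypothetical per-VLAN priorities (higher wins), with the primary preferred everywhere.
PRIORITIES: Dict[str, Dict[str, int]] = {
    "lan": {"fw-primary": 200, "fw-secondary": 100},
    "wan": {"fw-primary": 200, "fw-secondary": 100},
    "dmz": {"fw-primary": 200, "fw-secondary": 100},
}


def vip_owners(priorities: Dict[str, Dict[str, int]],
               alive: Set[str]) -> Dict[str, Optional[str]]:
    """For each VLAN, the live node with the highest priority holds the virtual IP."""
    owners: Dict[str, Optional[str]] = {}
    for vlan, prios in priorities.items():
        candidates = {node: p for node, p in prios.items() if node in alive}
        owners[vlan] = max(candidates, key=candidates.get) if candidates else None
    return owners


print(vip_owners(PRIORITIES, {"fw-primary", "fw-secondary"}))
print(vip_owners(PRIORITIES, {"fw-secondary"}))  # primary failed
```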
The most common solution is the first one, where the secondary node stays in standby until the primary fails because of a software or hardware fault. It is very simple to implement and, compared to the others, does not require extra load to manage it.
High availability is a key requirement for modern applications, which increasingly run critical services. As I explained, there are different approaches depending on the layer where high availability is implemented.
Choosing the right solution is the first step, and one where a mistake is not allowed: it's not smart to use a virtualization cluster to achieve high network availability, because the service recovery times might not be sufficient for the managed service.
At the same time, new technologies such as the cloud and virtualization don't solve the underlying problem.
I hope this article has clarified several doubts on the subject.
Contact me for any clarification.