In this article I would like to share my experience over the last few years managing microservices architectures, comparing them with service oriented architecture, commonly called SOA, from the point of view of service operations, which is where my experience lies.
I’m very passionate about the system and network concepts behind the scenes – kernel namespaces, union file systems, network tunneling, iptables rules, etc. – involved in the implementation of microservices, but I will try not to be influenced by this passion and to be as objective as possible.
I will give some tips and suggest some tools with the aim of making the management of a microservice infrastructure as effective and robust as possible.
Service oriented architecture
Service oriented architecture, commonly called SOA, is a way to design and implement services where the business functionality is achieved by calling distributed web services provided by software packages normally deployed inside an application server – like tomcat, jboss, wls, etc.
There has been an evolution in the communication protocols used in this type of infrastructure. While in the past web services used to communicate with their clients over SOAP, on top of HTTP or SMTP, in recent years the REST approach has become more popular for its simplicity and lightness, even if it offers less functionality than SOAP.
It’s possible to decouple the individual pieces of the architecture and design more available and scalable solutions by using load balancers for synchronous transactions, and message brokers or NoSQL databases for asynchronous transactions. Application clusters must also be considered when designing either active-active or active-standby solutions.
Even if it’s possible to split the software into different packages – war or ear – deployed on different virtual machines, this approach has a natural limit: to avoid confusion in managing the architecture, it’s better not to increase the number of virtual machine types too much.
Consequently, the individual packages deployed on the application servers tend to become too big, and this leads to issues in scaling the service: if a single part of a package performs poorly, the whole package must be scaled by adding new virtual machines. This scaling approach is heavy, expensive and slow.
The large size of the packages leads to centralizing different functionality, which means that a fault in one part of a subsystem can impact the whole service: it often happens that the entire functionality of a system is compromised by an issue in a small part.
Management, on the other hand, is very simple. Looking at the application server logs of a few systems is enough to understand what’s happening. It’s simple to control the network flows between the systems, but the application servers themselves must be managed: some are very light, like tomcat and jboss, while others are very complicated and heavy.
Regarding the database, it’s generally shared by all the web applications as one big database or schema. This approach can cause performance issues because many sessions are connected to the same database, which becomes very big and heavy.
However, there are benefits to having a single big database, and they relate to the implementation of disaster recovery procedures: the data recovery point is easily determined, compared with having the data distributed across different databases, where it is more difficult to synchronize all the data at the same recovery point in time.
If we could break these big and heavy services into smaller, lighter and more manageable microservices, without increasing too much the number of virtual machines needed, we would resolve many of the issues above.
Fortunately, this became possible with the advent of containers, which allow web services to run in isolated sandboxes, thanks to kernel namespace features, in the user space of the operating system, with a context switch that is very fast compared to a hypervisor running virtual machines.
With containers it’s possible to split the components of the web application into smaller parts that are lighter and faster to manage, replacing heavy application servers with simpler web frameworks like spring boot, a go web server or a node js server. The software solution can become polyglot, even if the experience and knowledge of the developers and the manageability of the code do not make that easy.
Many of the limits of SOA are overcome, as long as the containers respect good development practices – lightness, resilience, replicability, speed, monitorability and security – on which I wrote this article: Best Practises for designing docker containers.
One major benefit is the possibility to scale only the most performance-sensitive parts, which makes better use of hardware resources than the SOA approach. Scaling up and down is faster than in SOA because containers are very light, so the delivery process can be easily automated. It’s easier to implement a continuous delivery pipeline, with jenkins or ansible for example, that deploys packages using the helm package manager.
The deployment process is very fast and, if well planned, causes no outage, because it works on light processes that the container orchestrator automatically adds to the load balancing. For example, kubernetes implements these features through its readiness and liveness probes.
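To make the probe mechanism more concrete, here is a minimal sketch of the two endpoints a container could expose, using only the Python standard library. The paths /healthz and /ready, the default port and the `ready` flag are my assumptions for illustration; the real paths are whatever is declared in the pod's probe configuration.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once the (hypothetical) warm-up work of the service is finished.
ready = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    """Answers the two probe paths; the path names are assumptions."""

    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: the process is up and able to answer.
            self._reply(200, b"alive")
        elif self.path == "/ready":
            # Readiness: accept traffic only after the warm-up.
            if ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"warming up")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep probe traffic out of the application log.
        pass


def start_probe_server(host="0.0.0.0", port=8080):
    """Start the probe server in a background thread and return it."""
    server = ThreadingHTTPServer((host, port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The service would call start_probe_server() at startup and ready.set() once it can really serve requests. While /ready answers 503 the pod receives no traffic, which is what allows a deployment without outage.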
The use of heavy application servers is discouraged because the container must be as small as possible, but a container orchestrator must be used, and kubernetes has become the de facto standard. In the past I tested docker swarm with very bad results (I wrote these two articles about the differences between them: Docker swarm Vs Kubernetes part 1 and Docker Swarm Vs Kubernetes part 2).
A container orchestrator like kubernetes, which provides availability and scalability to the pieces of the software, removes the need for application clusters, which are no longer useful. It’s also possible to do without separate reverse proxies and load balancers by using the functionality provided by the orchestrator. For example, in kubernetes it’s possible to expose services directly to the outside through an ingress, also with SSL offloading, and to use the kubernetes service for internal load balancing.
Managing this type of architecture well requires good basic network knowledge, because the network traffic inside the mesh network becomes difficult to control and understand. Let’s look at an example that shows how much the number of HTTP requests involved in a single web service request increases in the mesh network of a microservice architecture: first an HTTP GET request handled by an application server in SOA, then the same request in a microservice architecture where I assume the application has been split into only two pieces:
There is an increase of 300% in the HTTP requests involved compared to SOA. Under the hood things are more complicated, because every HTTP request is handled by a virtual service generally implemented with iptables rules. In other words, controlling the network traffic is more difficult than in SOA.
It’s possible to use a service mesh like istio, which helps to monitor all the HTTP traffic inside the mesh in both the ingress and egress chains. Istio is a good way to monitor and control a microservices architecture, but in my view it’s very challenging to run in a production environment.
Remember that the containers running inside a cluster such as kubernetes are reachable through a virtual service that does simple round robin balancing, and this can cause problems with some types of containers, for example those that are not stateless. In these cases, a way to handle this type of service must be considered; in my case, I used haproxy to balance the HTTP traffic to stateful backend pods running inside kubernetes. The architecture with this haproxy is the following:
I described all the details in this article: Haproxy for Service Discovery in Kubernetes.
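The difference between plain round robin and session-aware balancing can be sketched in a few lines. This is only an illustration of the idea, not how haproxy or kubernetes implement it; the function names and the use of a session id as the balancing key are my assumptions.

```python
import hashlib
from itertools import cycle


def round_robin(backends):
    """Return a picker that ignores the client: fine for stateless pods."""
    ring = cycle(backends)
    return lambda session_id: next(ring)


def sticky(backends):
    """Return a picker that always maps a session id to the same backend."""
    def pick(session_id):
        digest = hashlib.sha256(session_id.encode("utf-8")).digest()
        # Use the first 8 bytes of the hash as a stable integer key.
        index = int.from_bytes(digest[:8], "big") % len(backends)
        return backends[index]
    return pick
```

With round_robin, two consecutive requests of the same user usually land on different pods, so any state kept in the pod is lost. With sticky, the same session id always reaches the same pod, as long as the list of backends does not change; when pods are added or removed the mapping moves, which is one reason why a balancer that follows the pod list is needed.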
This may not be enough if it’s necessary, for example, to discover not only the IP addresses of all the pods but also the APIs provided by every container. In these cases a discovery service can be used, and I suggest implementing it with an etcd cluster inside or outside the kubernetes cluster.
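As a rough sketch of what such a discovery service stores, here is an in-memory registry where every instance announces its address and its APIs with a time to live. It is a stand-in I wrote for illustration, not etcd client code: the Registry class, its methods and the key layout are hypothetical, and in a real setup the entries would be etcd keys attached to a lease.

```python
import time


class Registry:
    """In-memory stand-in for an etcd-backed registry (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (service, address) -> (apis, expires_at)

    def register(self, service, address, apis, ttl=10.0):
        """Announce an instance; call again before ttl expires to renew."""
        self._entries[(service, address)] = (tuple(apis), self._clock() + ttl)

    def lookup(self, service, api=None):
        """Return live addresses of a service, optionally filtered by API."""
        now = self._clock()
        found = []
        for (name, address), (apis, expires_at) in self._entries.items():
            if name != service or expires_at <= now:
                continue  # other service, or instance that stopped renewing
            if api is None or api in apis:
                found.append(address)
        return sorted(found)
```

A pod would call register() periodically, and a client would call lookup("orders", api="/v2/orders") to get only the instances that expose that API. An instance that dies simply stops renewing and disappears from the results after its ttl.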
Another disadvantage is harder troubleshooting because, in addition to the complex network flows, there are many log files to analyze. A way to manage this better is to collect all the logs in a single place, through a syslog server, for easier analysis.
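On the application side, sending the logs to a central syslog server can be as simple as the following sketch, based on the SysLogHandler of the Python standard library. The host, the port and the choice of prefixing each line with the service name are assumptions of mine.

```python
import logging
import logging.handlers


def build_logger(service, host="localhost", port=514):
    """Return a logger that ships its records to a syslog server over UDP."""
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix every line with the logger name (the service) so that the
    # central server can tell the microservices apart.
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

For example, build_logger("orders", "syslog.example.internal").info("order created") would send one line to the (hypothetical) central server. UDP is the default transport of this handler, so lines can be lost under load; it is a trade-off to evaluate case by case.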
A monitoring solution, elasticsearch or prometheus based, must be used not only for the kubernetes metrics but also for the service metrics useful for business or monitoring analysis. To have metrics for every pod or container, it may be necessary to install in every pod a sidecar whose only job is to collect all the logs of the pod, with logstash for example, and send them to an external elasticsearch/prometheus.
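To give an idea of what a service metric looks like, here is a small sketch that counts requests per endpoint and renders them in the text format that prometheus scrapes. The metric name app_requests_total and the RequestMetrics class are hypothetical, label values are not escaped, and in a real service a client library would normally do this work.

```python
from collections import Counter


class RequestMetrics:
    """Counts requests per endpoint and renders them as plain text."""

    def __init__(self):
        self._requests = Counter()

    def observe(self, endpoint):
        """Record one handled request for the given endpoint."""
        self._requests[endpoint] += 1

    def render(self):
        """Return the counters in the prometheus text exposition format."""
        lines = [
            "# HELP app_requests_total Requests handled, by endpoint.",
            "# TYPE app_requests_total counter",
        ]
        for endpoint, count in sorted(self._requests.items()):
            lines.append('app_requests_total{endpoint="%s"} %d' % (endpoint, count))
        return "\n".join(lines) + "\n"
```

The output of render() would be served on an HTTP path of the pod and collected from there, next to the kubernetes metrics.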
Regarding the database, it’s generally split into different sources because every microservice can have its own data schema in a different engine. This can improve manageability because there are many small databases, but it can be a problem for disaster recovery because it can become very difficult to have a single restore point.
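The restore point problem can be shown with a small sketch. It assumes, for simplicity, that every database only has discrete snapshots and that timestamps are comparable values such as epoch seconds; the restore_plan function is hypothetical.

```python
def restore_plan(backups):
    """backups maps a database name to the timestamps of its snapshots.

    Returns (target, plan). target is the newest instant that every
    database can reach, that is the oldest among the latest snapshots.
    plan maps each database to the newest snapshot not after target,
    or None if it has no snapshot that old.
    """
    target = min(max(times) for times in backups.values())
    plan = {
        name: max((t for t in times if t <= target), default=None)
        for name, times in backups.items()
    }
    return target, plan
```

Whenever the snapshots chosen in plan differ from target, the databases are restored to slightly different moments, and data that crosses microservices may be inconsistent. With a single database this problem does not exist.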
Moreover, we need to remember that containers run on an ephemeral file system, destroyed every time the container is restarted or moved to another node of the cluster. It’s necessary to configure network data storage – like glusterfs, nfs, ceph, etc. – in order to have persistent volumes to mount inside the containers for the data that cannot be lost. This complicates the infrastructure because an external network storage cluster must be configured and managed.
Another decision to make is whether the database should be external to the container cluster or configured inside it using the features provided by the orchestrator. For example, kubernetes provides the StatefulSet, which makes it possible to configure a highly available database service running inside the platform. See this example: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/. I prefer, if possible, to put all the databases – mariadb, postgres, mongo, etcd, redis – on virtual machines external to the kubernetes/swarm cluster and to configure them in high availability following the official documentation.
In the end, microservice software runs on an infrastructure like this, where the docker nodes can be thought of as the worker nodes of a kubernetes cluster.
I can say, from my personal experience, that a microservice architecture adds value to a managed service in terms of scalability, availability, elasticity and robustness but, as I have shown, it’s very challenging from a management point of view.
The benefits are not immediate, and the most important thing is to share the same vision with the development team and to take a synergistic approach to creating the best possible solution.