Discover how BDR and Kubernetes let you achieve very high yearly uptime for a database solution thanks to fast failover. Watch the demo!
In my previous article we went through the deployment of a BDR database in a Kubernetes cluster using our Cloud Native BDR Operator, focusing in particular on declarative configuration and multi-master capability. We demonstrated how it is possible to create a 3-node BDR group in a few dozen seconds and showcased DDL replication as well as writes on multiple nodes.
This post is about high availability, understood here as one of the fundamental components of a system’s business continuity. High availability is about restoring a service after a failure in the shortest possible time and is usually measured in terms of recovery time objective (RTO). Thanks to its self-healing capabilities, Kubernetes is designed for, and well suited to, providing high availability for a given service.
High availability with PostgreSQL
In the database world, high availability is normally associated with the concept of read-only or passive replicas. For example, with a single-primary database system like PostgreSQL, we rely on standby servers, usually kept synchronised through physical streaming replication of Write-Ahead Log (WAL) data. In case of failure, one of the standby servers is selected and promoted to primary. Technically, this operation requires the standby to exit recovery mode and start serving write operations, so it might not be immediate. Consider for example a PostgreSQL cluster with a very high workload, where standby servers have a natural lag when replaying the REDO log data contained in WAL files.
It is also important that cluster manager systems (such as repmgr or Patroni, to name a few) ensure that the old primary server is really down and that applications are correctly handled during this process, which is commonly known as failover. Failover requires proper monitoring and can be fully automated (including failure detection), automated with a manual trigger, or entirely manual.
Even though implementation details may vary, failover procedures in single-primary database management systems like PostgreSQL require a transition from recovery mode to primary (promotion) that is not always immediate or deterministic.
High availability with BDR and Kubernetes
A technology like BDR, on the other hand, allows us to implement multi-master architectures. Consider for example the WriteAnywhere architecture, available with our operator for Kubernetes, in which a BDR group has 3 or more masters and exposes two services to your Cloud Native applications: one pointing to any available node, and one pointing to the selected “Lead master”.
As mentioned earlier, Kubernetes provides an entire framework specifically designed for high availability. We have programmed our operator to integrate itself with the Kubernetes API and to properly react after voluntary/involuntary disruptions that involve a BDR group (self-healing).
For example, if the Pod of any BDR node fails, Kubernetes removes that specific endpoint from all services so that applications won’t use it. Moreover, if the lead master fails, Kubernetes immediately transfers the lead master role to the next available BDR node.
The added value of BDR is that, as you have probably already noticed, no promotion is involved: all nodes are always active and accepting write operations. The switch is nearly instantaneous, a matter of milliseconds in case of disruption of a specific Pod. This is why with BDR we use the term fast failover, or even rapid switchover, to emphasise that the operation is primarily a switch without promotion.
It is worth noting that our CI/CD pipeline runs several end-to-end tests for the operator, one of which systematically measures fast failover performance: if, after a Pod is killed, failover does not complete in less than a second, the test fails.
Of course, depending on how you have configured your liveness and readiness probes as well as timeouts in your Kubernetes cluster, other kinds of disruption, such as worker node failures, might have slightly higher recovery times.
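As a rough, operator-agnostic illustration of why probe settings matter, the worst-case time for Kubernetes to notice an unresponsive container can be approximated from the standard probe parameters (`periodSeconds`, `timeoutSeconds`, `failureThreshold`); the helper below is a back-of-the-envelope sketch, not something taken from the operator:

```python
# Approximate worst-case detection time for a failing liveness/readiness
# probe: the probe must fail `failure_threshold` consecutive times, probes
# run every `period_seconds`, and the last attempt may hang for up to
# `timeout_seconds` before being counted as failed.
def worst_case_detection_seconds(period_seconds: int,
                                 timeout_seconds: int,
                                 failure_threshold: int) -> int:
    return failure_threshold * period_seconds + timeout_seconds

# With the Kubernetes defaults (periodSeconds=10, timeoutSeconds=1,
# failureThreshold=3) an unresponsive container can take on the order of
# half a minute to be marked as failed.
print(worst_case_detection_seconds(10, 1, 3))  # → 31
```

This is why a Pod deletion (observed immediately by the API server) recovers faster than, say, a worker node that silently stops responding to probes.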
In the following “Hard to kill” video I walk through a demonstration of the self-healing capabilities of a BDR database in Kubernetes, measuring the high availability of the cluster from the application’s point of view.
I will use “kind” (Kubernetes IN Docker) on my laptop.
The failure we have selected is a common one in a database context: a persistent volume that becomes unusable.
Our BDR operator allows us to annotate a PVC as unusable: we will use this technique to simulate this kind of failure in our test. Specifically, we will simulate the problem on the pod where the lead master is running, and then delete that pod.
Kubernetes and our BDR operator should detect the issue and react by routing the lead-master service to the next BDR node, parting the deleted node from the BDR group, and creating a new pod to restore the desired state of 3 masters. In short: a fast failover, or, if you prefer, a rapid switchover.
The HTTP load generator should be transparently redirected to the new lead-master.
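The load generator used in the video is specific to the demo, but its core idea can be sketched as a write loop that keeps retrying and records how long writes kept failing. Everything below (the `write_row` callback, the timings) is a hypothetical stand-in for the actual tool:

```python
import time

def measure_downtime(write_row, duration_s=5.0, interval_s=0.01):
    """Attempt a write every `interval_s` seconds for `duration_s` seconds
    and return the longest stretch during which writes kept failing."""
    longest_gap = 0.0
    failing_since = None
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        now = time.monotonic()
        try:
            # In the demo this would INSERT a timestamped row through the
            # lead-master service; here it is an injected callback.
            write_row()
            if failing_since is not None:
                # Service is back: close the current outage window.
                longest_gap = max(longest_gap, now - failing_since)
                failing_since = None
        except Exception:
            if failing_since is None:
                failing_since = now  # outage window starts
        time.sleep(interval_s)
    return longest_gap
```

During a fast failover the loop would observe a brief burst of failed writes, and `longest_gap` approximates the downtime seen by the application.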
As a final step, we will execute a query in the database that reports the largest gap between consecutive records in the table, based on their timestamps. This is a pessimistic estimate of the downtime experienced by the frontend application.
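The exact query from the video is not reproduced here, but in PostgreSQL this kind of measurement is typically written with the `lag()` window function, as shown in the docstring below (table and column names are illustrative). The equivalent logic in Python:

```python
from datetime import datetime, timedelta

def biggest_gap(timestamps):
    """Largest interval between consecutive timestamps; the Python
    equivalent of a PostgreSQL query along the lines of:

        SELECT max(diff) FROM (
          SELECT ts - lag(ts) OVER (ORDER BY ts) AS diff FROM writes
        ) AS gaps;
    """
    ts = sorted(timestamps)
    return max((b - a for a, b in zip(ts, ts[1:])), default=timedelta(0))

base = datetime(2021, 1, 1)
# Writes at t=0s, 1s and 3s: the longest silence is 2 seconds.
print(biggest_gap([base,
                   base + timedelta(seconds=1),
                   base + timedelta(seconds=3)]))  # → 0:00:02
```

Since every row is timestamped at write time, the biggest gap between consecutive rows bounds the outage from above, which is why we call it a pessimistic estimate.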
We will also make sure that the self-healing process completes by restoring the desired configuration of 3 masters, showing how Kubernetes promptly detects status changes and reacts by correctly updating the services used by the applications.