BDR 1.0
I’m pleased to say that we’ve just released Postgres-BDR 1.0, based on PostgreSQL 9.4.9.
This release contains significant improvements to DDL replication locking, global sequences, documentation, performance, and more. It also removes the deprecated UDR component in favour of pglogical.
- The release announcement on the pgsql-announce mailing list
- Postgres-BDR 1.0 release notes
- Installation instructions
- Upgrade instructions for BDR 0.9.x users
- Git repository
It’s taken a lot of work to get to this point. This release sets the foundation to port BDR to PostgreSQL 9.6 and to enhance its high-availability capabilities, and I’m excited to be pushing BDR forward.
About Postgres-BDR
Bi-Directional Replication for PostgreSQL (Postgres-BDR, or BDR) is an asynchronous multi-master replication system for PostgreSQL, specifically designed to allow geographically distributed clusters. Supporting more than 48 nodes, BDR is a low overhead, low maintenance technology for distributed databases.
Great work. So BDR can support more than 48 nodes, what is the maximum, roughly? Thanks.
There is no fixed maximum number of nodes; we use the 48 figure mainly because that's the largest node count with which we did extensive testing. The practical limits depend mostly on what your network and hardware can handle (BDR uses a mesh topology, so every node is connected to every other node).
It works on anything from 2 nodes up to the resource limits imposed by the network and hosts involved. There's no fixed number. We use 48 because that's what we tested to, but it's a fairly arbitrary value.
For a group of n nodes, each BDR node maintains 2(n-1) connections to other nodes, half inbound and half outbound, so the total number of interconnections across the cluster is n(n-1). As with normal PostgreSQL connections there's some overhead for each BDR connection, but the actual impact depends heavily on the workload. I would be surprised if clusters of several hundred nodes didn't work, at least under light loads, though I couldn't say whether we'd start hitting scaling limitations in various internal logic. It isn't a particularly interesting use case anyway; for high node counts, I suspect one would be better off looking at a star topology with pglogical.
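To make the arithmetic concrete, here's a small illustrative sketch (plain Python, nothing BDR-specific) of how the per-node and cluster-wide connection counts grow with cluster size; the 48-node case is the tested configuration mentioned above.

```python
# Illustrative sketch only: connection counts in a full BDR mesh of n nodes.
# Each node keeps one inbound and one outbound connection to every other node.
def mesh_connections(n):
    per_node = 2 * (n - 1)   # inbound + outbound links for a single node
    total = n * (n - 1)      # all directed links across the whole cluster
    return per_node, total

for n in (2, 10, 48, 200):
    per_node, total = mesh_connections(n)
    print(f"{n:>3} nodes: {per_node:>3} connections per node, {total:>5} total")
```

At 48 nodes that's 94 connections per node and 2,256 across the cluster, which is why very large clusters start to look more attractive as a star topology than as a full mesh.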
If there’s appropriate interest and funding in future we may be able to introduce non-mesh topologies to BDR, which will alleviate the connection overheads and likely let us scale to higher numbers of nodes. For example, I can see a hub-and-spoke model where a small core of mesh connected nodes serve numerous satellite nodes being useful. This isn’t on the current roadmap, though, and we have a lot of other things to tackle first.
Super stoked about the project. Long time coming. Would love some more details on various patterns.
Fantastic!
Is there any documentation on how it behaves in case of network partitions and other important failure modes?
Was there any sort of testing conducted to verify that the implementation handles these cases gracefully/correctly? (I am thinking Jepsen)
The manual covers network partitions as they apply to global sequences and DDL locking, but doesn’t describe them in any dedicated section.
A network partition is basically just an extended replication delay as far as BDR is concerned and isn’t any different to slow, lagging replication. There’s no global lock manager or global transaction manager to upset. The application must always be aware of the possible anomalies that can arise from asynchronous multimaster replication, whether or not the network is currently suffering a disruption.
Essentially, BDR runs as if it’s always on a potentially incomplete or partitioned network. It doesn’t switch modes in any way. The failure mode is the normal case.
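To illustrate the point that a partition looks like nothing more than growing replication lag, here's a hedged sketch in Python with psycopg2. It uses only core PostgreSQL 9.4 catalog views (pg_replication_slots and the pg_xlog_location_diff function), not any BDR-specific API, and the connection string is a placeholder. While a peer is unreachable its slot shows as inactive and its retained WAL keeps growing; once the network heals, the peer reconnects and the backlog drains.

```python
# Hedged sketch: observing replication backlog on a PostgreSQL 9.4 / BDR node.
# Uses only core catalog views, not any BDR-specific API; the DSN is a placeholder.
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string

with conn.cursor() as cur:
    cur.execute("""
        SELECT slot_name, active,
               pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)
                   AS retained_wal_bytes
        FROM pg_replication_slots
    """)
    for slot_name, active, retained in cur.fetchall():
        # A partitioned peer shows up as an inactive slot whose retained WAL
        # keeps growing; after reconnection the backlog is replayed and shrinks.
        print(f"{slot_name}: active={active}, retained WAL bytes={retained}")

conn.close()
```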
I don’t recall seeing documented failure modes for any other database product, even Postgres core. Contributions to this open source project are always welcome, so a write-up contrasting failure modes against alternative products would be appreciated.
As I mention in a later blog post, Postgres-BDR has now been in production use for 2 years, following formal testing.