BDR 1.0
I’m pleased to say that we’ve just released Postgres-BDR 1.0, based on PostgreSQL 9.4.9.
This release contains significant improvements to DDL replication locking, global sequences, documentation, performance, and more. It also removes the deprecated UDR component in favour of pglogical.
- The release announcement on the pgsql-announce mailing list
- Postgres-BDR 1.0 release notes
- Installation instructions
- Upgrade instructions for BDR 0.9.x users
- Git repository
It’s taken a lot of work to get to this point. This release sets the foundation to port BDR to PostgreSQL 9.6 and to enhance its high-availability capabilities, and I’m excited to be pushing BDR forward.
About Postgres-BDR
Bi-Directional Replication for PostgreSQL (Postgres-BDR, or BDR) is an asynchronous multi-master replication system for PostgreSQL, specifically designed to allow geographically distributed clusters. Supporting more than 48 nodes, BDR is a low overhead, low maintenance technology for distributed databases.
Great work. So BDR can support more than 48 nodes, what is the maximum, roughly? Thanks.
There is no fixed maximum number of nodes; we use the 48 figure mainly because that's the largest node count with which we did extensive testing. The practical limits depend mostly on what your network and hardware can handle (BDR uses a mesh topology, so every node is connected to every other node).
It works on anything from 2 nodes up to the resource limits imposed by the network and hosts involved. There's no fixed number. We use 48 because that's what we tested to, but it's a fairly arbitrary value.
For a group of n nodes, each BDR node maintains 2(n-1) connections to other nodes, half inbound and half outbound, so the total number of interconnections across the cluster is n(n-1). As with normal PostgreSQL connections there's some overhead for each BDR connection, but the actual impact depends heavily on the workload. I would be surprised if clusters of several hundred nodes didn't work, at least under light loads, though I couldn't say whether we'd start hitting scaling limitations in various internal logic. It isn't a particularly interesting use case anyway; for high node counts, I suspect one would be better off looking at a star topology with pglogical.
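To make the arithmetic concrete, here's a small illustrative sketch (plain Python, nothing BDR-specific) of how the per-node and cluster-wide connection counts grow with cluster size; the 48-node case is the tested configuration mentioned above.

```python
# Illustrative sketch only: connection counts in a full BDR mesh of n nodes.
# Each node keeps one inbound and one outbound connection to every other node.
def mesh_connections(n):
    per_node = 2 * (n - 1)   # inbound + outbound links for a single node
    total = n * (n - 1)      # all directed links across the whole cluster
    return per_node, total

for n in (2, 10, 48, 200):
    per_node, total = mesh_connections(n)
    print(f"{n:>3} nodes: {per_node:>3} connections per node, {total:>5} total")
```

At 48 nodes that's 94 connections per node and 2,256 across the cluster, which is why very large clusters start to look more attractive as a star topology than as a full mesh.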
If there’s appropriate interest and funding in future we may be able to introduce non-mesh topologies to BDR, which will alleviate the connection overheads and likely let us scale to higher numbers of nodes. For example, I can see a hub-and-spoke model where a small core of mesh connected nodes serve numerous satellite nodes being useful. This isn’t on the current roadmap, though, and we have a lot of other things to tackle first.
Super stoked about the project. Long time coming. Would love some more details on various patterns.
Fantastic!
Is there any documentation on how it behaves in case of network partitions and other important failure modes?
Was there any sort of testing conducted to verify that the implementation handles these cases gracefully/correctly? (I am thinking Jepsen)
The manual covers network partitions as they apply to global sequences and DDL locking, but doesn’t describe them in any dedicated section.
A network partition is basically just an extended replication delay as far as BDR is concerned and isn’t any different to slow, lagging replication. There’s no global lock manager or global transaction manager to upset. The application must always be aware of the possible anomalies that can arise from asynchronous multimaster replication, whether or not the network is currently suffering a disruption.
Essentially, BDR runs as if it’s always on a potentially incomplete or partitioned network. It doesn’t switch modes in any way. The failure mode is the normal case.
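To illustrate the point that a partition looks like nothing more than growing replication lag, here's a hedged sketch in Python with psycopg2. It uses only core PostgreSQL 9.4 catalog views (pg_replication_slots and the pg_xlog_location_diff function), not any BDR-specific API, and the connection string is a placeholder. While a peer is unreachable its slot shows as inactive and its retained WAL keeps growing; once the network heals, the peer reconnects and the backlog drains.

```python
# Hedged sketch: observing replication backlog on a PostgreSQL 9.4 / BDR node.
# Uses only core catalog views, not any BDR-specific API; the DSN is a placeholder.
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string

with conn.cursor() as cur:
    cur.execute("""
        SELECT slot_name, active,
               pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)
                   AS retained_wal_bytes
        FROM pg_replication_slots
    """)
    for slot_name, active, retained in cur.fetchall():
        # A partitioned peer shows up as an inactive slot whose retained WAL
        # keeps growing; after reconnection the backlog is replayed and shrinks.
        print(f"{slot_name}: active={active}, retained WAL bytes={retained}")

conn.close()
```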
I don’t recall seeing documented failure modes for any other database product, even Postgres core. Contributions to this open source project are always welcome, so a write-up contrasting failure modes against alternative products would be appreciated.
As I mention in a later blog post, Postgres-BDR has now been in production use for 2 years, following formal testing.