Sharding: Bringing back Postgres-XL technology into core PostgreSQL

March 15, 2016 / 4 Comments / in Pavan's PlanetPostgreSQL / by Pavan Deolasee

Sharding, or horizontal scalability, is a popular topic, discussed widely on the PostgreSQL mailing lists these days. When we started the Postgres-XC project back in 2010, not everyone was convinced that we needed a multi-node PostgreSQL cluster that scales with increasing demand. More importantly, we, the PostgreSQL community, were skeptical about whether we had enough resources to design and implement a complex sharding architecture. There was also still plenty of work that could be done on vertical scalability, and for many the obvious choice was to focus on that aspect of performance.

With the great work that Robert Haas, Andres Freund, and others have done over the last few releases, the law of diminishing returns is now catching up with the vertical scalability enhancements. Now, almost 6 years later, it’s becoming quite clear that horizontal scalability is a must to meet the demands of users who very much want to use an RDBMS to manage and query their terabytes of data. Fortunately, the Postgres-XC and Postgres-XL communities have already proven the merits of a sharding solution based on the chosen design. These products have also matured over the last few years and are ready for public consumption. Of course, in the long run we all, including the Postgres-XL community, would like to push most of these features back into core PostgreSQL. But this is a multi-year, multi-release effort, and until then the Postgres-XL community is committed to supporting, maintaining, and even enhancing the product.

Here is a short list of features, in no particular order, that IMHO any sharding or horizontal scalability solution should support; a rough sketch of how a couple of them can already be approximated on stock PostgreSQL follows the list.

  • Transparent and efficient placement of data for query optimisation
  • On-demand addition and removal of nodes
  • Scalable components
  • Guaranteeing transaction ACID properties all the time on all the nodes
  • Parallel execution of queries on nodes
  • Offloading execution to the remote nodes
  • Connection pooling between various nodes
  • UI to configure, monitor and manage the shards and other components
  • High availability
  • Node membership

Anything else?
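
For illustration only, here is a minimal sketch of how two of these pieces (placement of data and offloading execution to remote nodes) can be approximated today with postgres_fdw and table inheritance, which is roughly the direction the in-core sharding discussions have been exploring. This is not the Postgres-XL architecture; the server names, table layout, and range split below are hypothetical.

    -- Sketch: FDW-based sharding with postgres_fdw and table inheritance
    -- (hypothetical hosts, database, and id ranges).
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- One foreign server per shard; scans of these tables run on the remote nodes.
    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'node1.example.com', dbname 'app');
    CREATE SERVER shard2 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'node2.example.com', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard1 OPTIONS (user 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard2 OPTIONS (user 'app');

    -- Empty parent table on the coordinator; the data lives on the shards.
    CREATE TABLE measurements (
        id         bigint      NOT NULL,
        created_at timestamptz NOT NULL,
        payload    jsonb
    );

    -- The CHECK constraints describe the placement, so the planner can skip
    -- shards that cannot hold matching rows (with constraint_exclusion = partition,
    -- the default).
    CREATE FOREIGN TABLE measurements_shard1 (
        CHECK (id >= 0 AND id < 1000000)
    ) INHERITS (measurements)
      SERVER shard1 OPTIONS (table_name 'measurements');

    CREATE FOREIGN TABLE measurements_shard2 (
        CHECK (id >= 1000000)
    ) INHERITS (measurements)
      SERVER shard2 OPTIONS (table_name 'measurements');

    -- A query against the parent fans out to the shards; simple quals are
    -- pushed down by postgres_fdw, and here shard2 is excluded entirely.
    SELECT * FROM measurements WHERE id = 42;

Even this toy setup hints at how much from the list above is still missing: there is no transparent redistribution when a node is added, no cross-node transaction guarantees, no parallel execution across shards, and no management UI, which is exactly the gap Postgres-XL fills today.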

A few of these features have already made it into core PostgreSQL, albeit perhaps in a different form. Some of them may have been inspired by work previously done as part of the Postgres-XC or Postgres-XL projects. But there is still a whole bunch of things that we need to work on and get into committable shape. I’m sincerely hoping that we start working in that direction sooner rather than later, leveraging the knowledge and the technology developed as part of various sharding solutions.

Tags: horizontal scalability, postgres-xl, PostgreSQL, sharding, vertical scalability
4 replies
  1. Jamey says:
    March 15, 2016 at 1:16 pm

    I suggest a new data type concatenating (timestamp + UUID), used for independent key generation on different nodes. MongoDB (and others) use this approach because it provides naturally sortable, unique keys without needing a coordinator.
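
    (For illustration, a minimal sketch of such a key, assuming a big-endian millisecond timestamp prefix followed by random bytes; the function name and byte layout are hypothetical, not an actual proposal.)

        -- Hypothetical sketch, not a real data type: 8 bytes of big-endian
        -- millisecond timestamp followed by 10 random bytes, so keys sort
        -- roughly by creation time with no central coordinator involved.
        CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- for gen_random_bytes()

        CREATE OR REPLACE FUNCTION gen_time_ordered_key()
        RETURNS bytea LANGUAGE sql AS $$
            SELECT int8send((extract(epoch FROM clock_timestamp()) * 1000)::bigint)
                   || gen_random_bytes(10);
        $$;

        -- e.g. CREATE TABLE events (id bytea PRIMARY KEY DEFAULT gen_time_ordered_key(), ...);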

    I also suggest you relax the 100% ACID compliance requirement, or at least make it configurable. Per the CAP theorem, that requirement means that no transaction can be committed if any node is unavailable.

    Finally, connection pooling between nodes sounds like you have already committed to an architecture in which the client connects to each node vs. an architecture in which a client connects to a query-coordinator, which connects to each node and aggregates the results. I have no idea which is better, but it seems like that should be an explicit decision rather than an implied one.

    Nice work! Cheers.

    • craig.ringer says:
      March 15, 2016 at 2:44 pm

      I have to agree re “Guaranteeing transaction ACID properties all the time on all the nodes”. That’s very useful and very important for transparently moving an application from a single node to a horizontally scaled cluster, but it has limits and won’t meet all needs. XL is a fairly tightly coupled cluster system that favours transparently “just working” at the cost of latency- and partition-tolerance.

      That’s why BDR implements a different strategy, with asynchronous multi-master replication. It doesn’t have the other pieces of the puzzle though – pooling, cross-node query execution, sharding, etc – and there are some complexities when it comes to doing those correctly in an asynchronous, loosely coupled system.

      There’s value to both approaches, but XL meets the great majority of needs. Despite having been heavily involved in BDR development I spend quite a bit of time trying to convince people they don’t need BDR or shouldn’t use it because they just see “multi-master, we need that”. They don’t consider the complexities of asynchronous multi-master replication when it comes to consistency, lost-update issues, etc. People tend to assume they can just point their app at multiple nodes and expect things to work. With BDR that’s very far from the case, apps need to be very aware of the replication/clustering platform and behave differently to how they would on a standalone server.

      With XL what you expect is what you get – it “just works”. It’s what most people who need scale-out should be looking at using.

    • Simon Riggs says:
      March 21, 2016 at 5:39 pm

      “Per the CAP theorem, that requirement means that no transaction can be committed if any node is unavailable.”
      XL has High Availability so there is no single point of failure, so CAP doesn’t cut in easily.

      “Finally, connection pooling between nodes sounds like you have already committed to an architecture in which the client connects to each node vs. an architecture in which a client connects to a query-coordinator, which connects to each node and aggregates the results. I have no idea which is better, but it seems like that should be an explicit decision rather than an implied one.”
      You’ve interpreted the architecture incorrectly; it doesn’t work like that. Also, the architecture is based upon explicit decisions; this wasn’t hacked together quickly, it’s a trusted design.

  2. Brian says:
    May 6, 2016 at 10:00 am

    “XL has High Availability so there is no single point of failure, so CAP doesn’t cut in easily.”

    Can you explain this? Because you’re claiming both consistency and availability, which is definitely within the domain of the CAP theorem.
