2ndQuadrant is now part of EDB

Bringing together some of the world's top PostgreSQL experts.

2ndQuadrant | PostgreSQL
Mission Critical Databases
  • Contact us
  • EN
    • FR
    • IT
    • ES
    • DE
    • PT
  • Support & Services
  • Products
  • Downloads
    • Installers
      • Postgres Installer
      • 2UDA – Unified Data Analytics
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
  • Postgres Learning Center
    • Webinars
      • Upcoming Webinars
      • Webinar Library
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Blog
    • Training
      • Course Catalogue
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
    • Books
      • PostgreSQL 11 Administration Cookbook
      • PostgreSQL 10 Administration Cookbook
      • PostgreSQL High Availability Cookbook – 2nd Edition
      • PostgreSQL 9 Administration Cookbook – 3rd Edition
      • PostgreSQL Server Programming Cookbook – 2nd Edition
      • PostgreSQL 9 Cookbook – Chinese Edition
    • Videos
    • Events
    • PostgreSQL
      • PostgreSQL – History
      • Who uses PostgreSQL?
      • PostgreSQL FAQ
      • PostgreSQL vs MySQL
      • The Business Case for PostgreSQL
      • Security Information
      • Documentation
  • About Us
    • About 2ndQuadrant
    • 2ndQuadrant’s Passion for PostgreSQL
    • News
    • Careers
    • Team Profile
  • Blog
  • Menu Menu
You are here: Home1 / Blog2 / Simon's PlanetPostgreSQL3 / Joins Don’t Scale!
Simon Riggs

Joins Don’t Scale!

December 22, 2015/0 Comments/in Simon's PlanetPostgreSQL /by Simon Riggs

“Joins Don’t Scale”. Well, that’s what I heard MongoDB said anyway.

My response was “Huh? Yeh, they do”. So what gives? Who is right? Why the mixup?

Well, first thing to realise is that the implicit topic we are talking about is massively parallel (MPP) databases, so what they are talking about is the scalability of a join between two tables that are spread across multiple nodes of an MPP database, such as Postgres-XL.

All of the MPP databases I’ve worked with allow you to spread data across multiple nodes, typically using a simple hashing method, using one or more columns. That’s often called the Distribution Key, Primary Index or similar. If you have two tables with the same DK, then the join happens evenly across all nodes. That scales very nicely, meaning that if you have twice as many nodes the join happens twice as quickly. No data is passed between nodes at all; well, apart from the result set going back to the user.

If the Distribution Keys don’t match then the join plan requires a step called a “redistribution”, which puts the data in the right place so it can be joined. That step involves re-hashing the data and then sending it to the correct nodes. Which then requires each node to send data to all other nodes. Which needs N*(N-1) sessions between nodes and can be a lengthy operation. That sounds like it’s a problem, but redistributions are still scalable, as long as your network doesn’t hit limits. The more nodes you have the less work each node has to do.

OMG! O(N^2) operation detected! PANIC?!? Redistribution is an extra operation that is best avoided, much the same as a large sort or sequential scan is best avoided. Having said that, I don’t see any need to fear them. The physics of databases means that every MPP database needs to cope with this problem, so its not isolated to XL or any other MPP database. Even MongoDB suffers from it – just because you have only one table doesn’t mean you never need joins.

Anyway, that’s what I believe is the source of this weird meme that joins don’t scale. There’s also been some strange questions asked about Postgres-XL, as if XL has an issue here that other databases don’t, which it definitely doesn’t.

Still worried? Don’t be. The main purpose of the Star Schema data model is to reduce Business Intelligence problems down to One Big Table plus Lots of Small Ones. In that case, we simply avoid the redistribution task altogether. In the case of a Star Schema, joins take place by copying the smaller tables in the join to all nodes.

Some MPP databases, such as Postgres-XL are designed with star schema joins in mind, providing the ability to replicate reference data tables to all nodes. If the data already exists on all nodes, then we can just use it immediately, speeding up smaller joins.

If you have a more complex data model then you may need to do redistribution joins, which happen pretty much the same way on every MPP database.

Share this entry
  • Share on Facebook
  • Share on Twitter
  • Share on WhatsApp
  • Share on LinkedIn
0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Search

Get in touch with us!

Recent Posts

  • Random Data December 3, 2020
  • Webinar: COMMIT Without Fear – The Beauty of CAMO [Follow Up] November 13, 2020
  • Full-text search since PostgreSQL 8.3 November 5, 2020
  • Random numbers November 3, 2020
  • Webinar: Best Practices for Bulk Data Loading in PostgreSQL [Follow Up] November 2, 2020

Featured External Blogs

Tomas Vondra's Blog

Our Bloggers

  • Simon Riggs
  • Alvaro Herrera
  • Andrew Dunstan
  • Craig Ringer
  • Francesco Canovai
  • Gabriele Bartolini
  • Giulio Calacoci
  • Ian Barwick
  • Marco Nenciarini
  • Mark Wong
  • Pavan Deolasee
  • Petr Jelinek
  • Shaun Thomas
  • Tomas Vondra
  • Umair Shahid

PostgreSQL Cloud

2QLovesPG 2UDA 9.6 backup Barman BDR Business Continuity community conference database DBA development devops disaster recovery greenplum Hot Standby JSON JSONB logical replication monitoring OmniDB open source Orange performance PG12 pgbarman pglogical PG Phriday postgres Postgres-BDR postgres-xl PostgreSQL PostgreSQL 9.6 PostgreSQL10 PostgreSQL11 PostgreSQL 11 PostgreSQL 11 New Features postgresql repmgr Recovery replication security sql wal webinar webinars

Support & Services

24/7 Production Support

Developer Support

Remote DBA for PostgreSQL

PostgreSQL Database Monitoring

PostgreSQL Health Check

PostgreSQL Performance Tuning

Database Security Audit

Upgrade PostgreSQL

PostgreSQL Migration Assessment

Migrate from Oracle to PostgreSQL

Products

HA Postgres Clusters

Postgres-BDR®

2ndQPostgres

pglogical

repmgr

Barman

Postgres Cloud Manager

SQL Firewall

Postgres-XL

OmniDB

Postgres Installer

2UDA

Postgres Learning Center

Introducing Postgres

Blog

Webinars

Books

Videos

Training

Case Studies

Events

About Us

About 2ndQuadrant

What does 2ndQuadrant Mean?

News

Careers 

Team Profile

© 2ndQuadrant Ltd. All rights reserved. | Privacy Policy
  • Twitter
  • LinkedIn
  • Facebook
  • Youtube
  • Mail
The End of MongoDB 2UDA RC1 – New features in Orange (Part 1)
Scroll to top
×