Pavan Deolasee

Postgres-XL Scalability for Loading Data

October 24, 2016 / in Pavan's PlanetPostgreSQL / by Pavan Deolasee

In my last blog post, we looked at benchmark results from a bulk-load test on a Postgres-XL database cluster. Using a 16-datanode, 2-coordinator cluster running on EC2 instances, we could easily clock an ingestion rate of 9M rows/sec, or 3TB/hr. That’s a significant number in itself. In this post, we’ll see whether the ingestion rate is scalable in Postgres-XL. In particular, we’ll try to answer whether adding more nodes to the cluster results in a linear increase in performance.

Let’s use the same lineitem table from the TPC-H benchmark for these tests. We’ll increase the cluster size from 16 datanodes to 20 and then to 24. We’ll also repeat the tests with 1, 2 and 3 coordinators. For all these tests, we use an i2.xlarge EC2 instance for each datanode and a c3.8xlarge EC2 instance for each coordinator. We also keep the number of concurrent COPY processes at 24 throughout.

Sustained Bulk-load Rate

To measure whether the bulk-load rate can be sustained over a longer period of time, including when the data significantly overflows the combined RAM of all machines, let’s measure the time to load 3 billion, 6 billion and 9 billion rows on a 16-datanode cluster. The following chart shows the data load time and a steady ingestion rate even for a very large table.

[Chart: data load time and ingestion rate for 3, 6 and 9 billion rows on a 16-datanode cluster]

At the end of loading 9 billion rows, the table is close to 1TB in size, but that does not seem to have any negative impact on the load rate.
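As a rough sanity check on what a steady rate implies, the sketch below computes the load time for each table size at a fixed ingestion rate. The 9M rows/sec figure is the rounded rate quoted at the top of this post for a 16-datanode, 2-coordinator cluster; it is used here as an assumption, so the resulting times are projections and not the values measured in the chart.

```python
# Projected load times at a constant ingestion rate.
# ASSUMED_ROWS_PER_SEC is the rounded figure quoted in the post,
# not a value read from the chart.
ASSUMED_ROWS_PER_SEC = 9_000_000


def projected_load_seconds(rows, rows_per_sec=ASSUMED_ROWS_PER_SEC):
    """Seconds needed to load `rows` rows if the rate never drops."""
    return rows / rows_per_sec


for billions in (3, 6, 9):
    rows = billions * 1_000_000_000
    seconds = projected_load_seconds(rows)
    print(f"{billions} billion rows: ~{seconds / 60:.1f} minutes")
```

If the rate really is sustained, the load time grows in direct proportion to the row count, which is the shape a flat ingestion-rate line corresponds to.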

Scaling with Coordinators

Let’s first see if adding more coordinators to the cluster has any positive impact on the load time. To measure the impact of additional coordinators, we keep the number of datanodes constant at 16 and vary the number of coordinators from 1 to 3.

[Chart: ingestion rate with 1, 2 and 3 coordinators on a 16-datanode cluster]

So there is a significant improvement in the ingestion rate when a second coordinator is added to the cluster. The rate remains almost the same with 3 coordinators, suggesting that the bottleneck shifts elsewhere, so adding a third coordinator does not help much.

Scaling with Datanodes

This is the most interesting bit: it should tell us whether a Postgres-XL cluster can scale, and how well, with the addition of more datanodes, as far as data ingestion is concerned. Keeping the number of coordinators constant at 2, we vary the number of datanodes from 16 to 20 and then to 24. The following results were obtained by loading 3 billion rows with 24 COPY processes split equally between the two coordinators.
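One way to split a fixed number of COPY processes evenly between coordinators is a simple round-robin assignment. The sketch below is only an illustration of that idea: the host names, database name, input file names and COPY format options are all hypothetical, and the post does not say how the loader was actually driven.

```python
from collections import Counter

# Hypothetical coordinator host names; the post does not name them.
COORDINATORS = ["coord1", "coord2"]
NUM_COPY_PROCESSES = 24


def assign_copy_processes(num_processes, coordinators):
    """Assign COPY process i to a coordinator in round-robin order."""
    return [coordinators[i % len(coordinators)] for i in range(num_processes)]


def copy_command(host, chunk_file, table="lineitem", dbname="tpch"):
    # Build a psql invocation that loads one input chunk through the
    # client-side \copy meta-command. Table, database and format options
    # are assumptions made for this example.
    copy_sql = (
        f"\\copy {table} FROM '{chunk_file}' "
        "WITH (FORMAT csv, DELIMITER '|')"
    )
    return ["psql", "-h", host, "-d", dbname, "-c", copy_sql]


assignment = assign_copy_processes(NUM_COPY_PROCESSES, COORDINATORS)
print(Counter(assignment))
print(copy_command(assignment[0], "lineitem.part0"))
```

Each command list could then be handed to something like `subprocess.Popen` so that all 24 loads run concurrently. With three coordinators the same function gives 8 processes per coordinator.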

[Chart: ingestion rate with 16, 20 and 24 datanodes and 2 coordinators, loading 3 billion rows]

These results are very interesting: they show linear scalability in ingestion rate as more nodes are added to the cluster. With 24 datanodes we could load at a rate of 13.5M rows/sec, or nearly 4.5TB/hr, but what’s more interesting is that the per-datanode ingestion rate stays almost constant at about 560K rows/sec. Of course, it may not stay the same if many more datanodes are added, since at some point the coordinators may become a bottleneck, or too many concurrent sessions at the datanodes may cause performance degradation. A nice thing about Postgres-XL’s architecture is that it allows us to add as many coordinators as we like, so a coordinator-side bottleneck is easy to handle, assuming the application can split the load between coordinators.
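The per-datanode figure can be checked with a few lines of arithmetic. The sketch below uses only the rounded cluster-wide rates quoted in this post (9M rows/sec with 16 datanodes and 13.5M rows/sec with 24), so it is a consistency check on those numbers and not a re-analysis of the raw measurements. The `scaling_efficiency` helper is introduced here purely for illustration.

```python
# Rounded cluster-wide rates quoted in the post, keyed by datanode count.
QUOTED_ROWS_PER_SEC = {16: 9_000_000, 24: 13_500_000}


def per_datanode_rate(total_rows_per_sec, datanodes):
    """Average ingestion rate handled by each datanode."""
    return total_rows_per_sec / datanodes


def scaling_efficiency(base_nodes, base_rate, nodes, rate):
    """Observed speedup divided by the ideal linear speedup.

    A value of 1.0 means the rate grew exactly in proportion
    to the number of datanodes.
    """
    return (rate / base_rate) / (nodes / base_nodes)


for nodes, rate in QUOTED_ROWS_PER_SEC.items():
    print(f"{nodes} datanodes: {per_datanode_rate(rate, nodes):,.0f} rows/sec each")

print(scaling_efficiency(16, 9_000_000, 24, 13_500_000))
```

Both configurations work out to 562,500 rows/sec per datanode, which matches the "about 560K" figure, and the efficiency between 16 and 24 datanodes comes out at 1.0 for these rounded inputs.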

I believe we demonstrated a few important things with these benchmarks.

  1. Postgres-XL can scale linearly with an increasing number of datanodes.
  2. For large, concurrent bulk loads, the coordinator may become a bottleneck, so it’s advisable to have more than one coordinator for such workloads.
  3. Postgres-XL can sustain the ingestion rate for very large data. While the maximum rate we achieved in these tests was 13.5M rows/sec, or about 4.5TB/hr, the trend clearly shows that this is nowhere close to the upper limit of Postgres-XL’s ability to handle a highly concurrent ingestion load.

 

Tags: COPY, ingestion, load, performance, postgres-xl, PostgreSQL
2 replies
  1. adam says:
    October 28, 2016 at 4:45 pm

    Thank you for the post. Did you manage to identify the bottleneck after adding the 2nd coordinator?

    • Pavan Deolasee says:
      October 28, 2016 at 6:18 pm

      It’s quite evident that the single coordinator was a bottleneck. But after we added the second coordinator, the bottleneck shifted somewhere else, because adding a third coordinator did not help much. It wasn’t clear whether the AWS network became the bottleneck at this point or whether the datanodes were fully saturated. Disk utilisation was quite high at the datanodes with 2 coordinators, but it’s hard to completely rule out network saturation.

      At peak, datanodes were receiving data at about 500 Mbps. FWIW, I’d observed that a simple scp between AWS nodes could only go up to 560 Mbps.
