Álvaro Herrera

Column Store Plans

April 25, 2016 / 5 Comments / in 2ndQuadrant, Alvaro's PlanetPostgreSQL, PostgreSQL / by Álvaro Herrera

Over at pgsql-general, Bráulio Bhavamitra asks:

I wonder if there is any plans to move postgresql entirely to a columnar store (or at least make it an option), maybe for version 10?

This is a pretty interesting question. Completely replacing the current row-based store wouldn’t be a good idea: it has served us extremely well and I’m pretty sure that replacing it entirely with a columnar store would be disastrous performance-wise for OLTP use cases.

Some columns. Picture courtesy of Yiming Sun on Flickr

That doesn’t mean columnar stores are a bad idea in general — because they aren’t. They just have a more limited use case than “the whole database”. For analytical queries on append-mostly data, a columnar store is a much more appropriate representation than the regular row-based store, but not all databases are analytical.
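To make the difference between the two representations concrete, here is a minimal Python sketch. It is not PostgreSQL code: the table, column names, and values are invented for illustration. A query that sums one column has to step over every whole row in the row-based layout, but reads a single contiguous array in the columnar layout.

```python
from array import array

# Hypothetical sales table; names and values are made up for illustration.
rows = [
    (1, "north", 10.0),
    (2, "south", 12.5),
    (3, "north", 7.5),
]

# Row-based layout: each record is stored together, which suits OLTP-style
# access (fetch or update one whole row at a time).
def total_from_rows(table):
    total = 0.0
    for _id, _region, amount in table:  # every field of every row is visited
        total += amount
    return total

# Columnar layout: each column is stored contiguously on its own.
columns = {
    "id": array("q", (r[0] for r in rows)),
    "region": [r[1] for r in rows],
    "amount": array("d", (r[2] for r in rows)),
}

def total_from_columns(cols):
    # Only the one column the query needs is read.
    return sum(cols["amount"])

print(total_from_rows(rows), total_from_columns(columns))
```

The flip side is visible in the same sketch: changing one record means touching one tuple in `rows`, but one entry in each of the three arrays in `columns`, which is part of why the columnar form fits append-mostly data better.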

However, to attain interesting performance gains you need to do a lot more than just change the underlying storage: the rest of the system must be able to take advantage of the changed representation so that it can execute queries optimally. For instance, you may want aggregates that operate in a SIMD mode rather than one value at a time, as they do today. This, in itself, is a large undertaking, and there are other challenges too.
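As a rough illustration of what that change means for an aggregate, the Python sketch below contrasts a transition step that is called once per value with one that is called once per batch of values. Python cannot express SIMD instructions, so this only shows the change of interface; the class names, function names, and batch size are assumptions made for the example and are not taken from PostgreSQL or from our patch.

```python
from array import array

class SumValueAtATime:
    """Aggregate whose transition step is called once per input value."""
    def __init__(self):
        self.state = 0.0
        self.calls = 0

    def step(self, value):
        self.state += value
        self.calls += 1

class SumBatchAtATime:
    """Aggregate whose transition step is called once per batch of values."""
    def __init__(self):
        self.state = 0.0
        self.calls = 0

    def step(self, batch):
        # A compiled implementation could process this contiguous batch
        # with SIMD instructions; here we simply sum it.
        self.state += sum(batch)
        self.calls += 1

def run_value_at_a_time(values):
    agg = SumValueAtATime()
    for v in values:
        agg.step(v)
    return agg

def run_batch_at_a_time(values, batch_size=4):
    agg = SumBatchAtATime()
    for start in range(0, len(values), batch_size):
        agg.step(values[start:start + batch_size])
    return agg

# Hypothetical column of values, stored contiguously.
amounts = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
one_by_one = run_value_at_a_time(amounts)
batched = run_batch_at_a_time(amounts)
print(one_by_one.state, one_by_one.calls, batched.state, batched.calls)
```

Both variants compute the same sum; the batched one just makes far fewer calls, and each call sees a contiguous run of values from one column, which is the shape of input a columnar store can hand over cheaply.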

As it turns out, there’s a team at 2ndQuadrant working precisely on these matters. We posted a patch last year, but it wasn’t terribly interesting — it only made a single-digit percentage improvement in TPC-H scores; not enough to bother the development community with (it was a fairly invasive patch). We want more than that.

In our design, columnar or not is going to be an option: you’re going to be able to say “Dear server, for this table kindly set up columnar storage for me, would you? Thank you very much.” And then you’re going to get a table which may be slower for regular usage but which will rock for analytics. For most of your tables the current row-based store will still likely be the best option, because row-based storage is much better suited to the more general cases.

We don’t have a timescale yet. Stay tuned.

Tags: column store, community, development, open source
5 replies
  1. vdp says:
    April 25, 2016 at 3:22 pm

    Isn’t that essentially what cstore_fdw already provides? Or does its FDW nature make it hard to optimize? What about a table with some fields stored in columns and the other fields stored in rows?
  2. Pete says:
    April 26, 2016 at 1:14 am

    Oh man, what a tease. I’ll just sit here daydreaming about an optimized columnar store integrated with parallel query…
  3. Anon says:
    May 6, 2016 at 10:37 am

    So I watched Mark Wong’s YouTube clip summarising the patched and CS comparisons.

    From what I could tell, CS1 and CS2 were 3-4x faster for queries. XL was 160 faster than PG, so therefore XL is about half the speed of the CS DBs?
    I also assume that XL was based on 9.5 and doesn’t have any of the 9.6 perf improvements, which look like approximately a 100% read improvement; 2-CPU parallel query also looks like a fairly linear 100% gain (not sure if XL would improve the stats if it did have the 9.6 perf code in it?)

    If the 9.6 code would improve XL with a similar perf gain (e.g. a 100% gain), then it would be similar to CS for queries, would it not?

    So the only benefit of a column store would be the write benefits.
    So is there much benefit to a column store? Wouldn’t focusing on improving PG as it is be a better option?

    Also, I’m not sure I understand the full PG history, but if cstore_fdw already exists, why not incorporate that and just improve it rather than reinvent the wheel? I assume it may not be easy to utilise SIMD, but surely tweaking cstore is better than starting from scratch, especially from a time-to-market (well, get-it-into-a-release) perspective.

    I could be well off, and since I’m not technical, any insight, corrections and thoughts would be appreciated 🙂
  4. Dave Sisk says:
    June 26, 2018 at 9:20 pm

    I’ve worked with several of the columnar analytics RDBMSs…namely, Vertica (the leader in this space, and an off-shoot of the academic work done on the CSTORE project…also very Postgres-like on the surface…*hint hint*), Infobright (MySQL variant), Amazon Redshift (aka ParAccel, a fork of Postgres), and MariaDB ColumnStore (now called MariaDB AX and formerly Calpont InfiniDB). In doing analytical benchmarks with various row-oriented and column-oriented SQL RDBMSs, it quickly becomes apparent that the folks at Vertica have the equation right in terms of performance, high availability, and manageability. I’d pit Vertica against Oracle Enterprise Edition any time and know that it would come out the winner for large-scale analytics. I would highly, highly suggest that, instead of re-inventing the wheel, you dig deep into the CSTORE project to gain an understanding of a foundation that works in practice…the open-source code for that academic project is still readily available.
    • Dave Sisk says:
      June 26, 2018 at 9:24 pm

      One additional note: Column-oriented technology comes with a penalty, and that penalty (as you’ve already noted) is difficulty handling updates to existing rows. I’ve looked at column-oriented tech such as cstore_fdw; however, a 20% gain in performance just isn’t worth that penalty in practice. When you look at gains from Vertica by comparison, a 50X improvement is unquestionably worth handling the penalty around updates and other row-centric operations.
