Umair Shahid

Big Data Analytics: Tablesample, Orange, 2UDA

October 12, 2015 / 1 Comment / in Uncategorized / by Umair Shahid

A lot is being said about the new ‘tablesample’ feature of PostgreSQL 9.5. The ability to retrieve a random sample of data from a very large table in a short amount of time makes tablesample a natural fit for big data analytics. To demonstrate my point, I created 6 tables using the same DDL and then inserted varying amounts of data into them. I ended up with tables of 1k, 100k, 1m, 5m, 10m, and 100m rows. I turned on timing in psql and then ran a simple select count(*) query against each of these tables. The time taken by each query, along with the size of the respective table, is given below:

Number of Rows     Time Taken (ms)     Size on Disk (MB)
1k                 219.706             0.23
100k               1302.135            24
1m                 7696.386            195
5m                 40691.603           951
10m                60012.457           1923
100m               801493.319          19456

These numbers are from a PostgreSQL 9.5 server running locally on my laptop. As any big data expert will tell you, 100 million rows isn’t exactly a huge amount of data by today’s standards. Even then, I had to wait more than 13 minutes before a simple count query returned its result. Just imagine what would happen if I wanted statistics and some mathematical calculations to be done as part of my data analytics routine.

Enter tablesample. As Gulcin mentions in her very informative blog post, tablesample can be used to retrieve a random sample of data from a potentially very large table. In conjunction with the extension tsm_system_time, you can also bound the query by time … i.e. tell it to return whatever number of rows it has gathered (random sampling, of course!) within a given amount of time.

Now imagine using this feature to create visualizations on big data very quickly without having to write SQL code. Enter Orange. It is exactly this feature that the team at the University of Ljubljana has been able to exploit while creating the 3rd version of their data visualization tool, Orange. What Orange essentially does is fetch whatever random sample of data it can retrieve from a large table within a second, and then create statistical visualizations from that sample. This lets a data analyst visualize statistical patterns in the data very quickly, no matter how big the table is.

Now imagine bringing together Orange & PostgreSQL 9.5 in a crisp, easy-to-use installation package that laymen can easily install on Windows, OSX, or Linux. Enter 2UDA (pronounced tudor). 2ndQuadrant’s Unified Data Analytics tool combines PostgreSQL 9.5 with Orange3 & LibreOffice5 in a nice, easy-to-use, GUI-based installer available for Windows, OSX, & Linux. It has Orange & PostgreSQL pre-configured to work seamlessly with each other and to use the tablesample feature to its fullest to create useful visualizations quickly and efficiently.
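
To make the tablesample and tsm_system_time ideas above a little more concrete, here is a minimal Python sketch that builds the kind of sampling queries being described. It is only an illustration under stated assumptions: the table name `big_table` and all of the helper functions are hypothetical and not from the original experiment, and this is not how Orange itself is implemented. The SQL relies on the `TABLESAMPLE` clause with the built-in `SYSTEM` and `BERNOULLI` methods (which take a percentage), and on the `SYSTEM_TIME` method from the tsm_system_time extension (which takes a time limit in milliseconds).

```python
import re

# Hypothetical table name, used only for illustration.
TABLE = "big_table"

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_ident(name: str) -> str:
    """Reject anything that is not a plain SQL identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    return name


def sample_count_query(table: str, percent: float, method: str = "SYSTEM") -> str:
    """Build a query that counts rows in a sample of roughly `percent` percent."""
    if method not in ("SYSTEM", "BERNOULLI"):
        raise ValueError("built-in sampling methods are SYSTEM and BERNOULLI")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return (
        f"SELECT count(*) FROM {_check_ident(table)} "
        f"TABLESAMPLE {method} ({percent:g})"
    )


def time_bound_query(table: str, millis: int) -> str:
    """Build a query that returns whatever sample can be read in about `millis` ms.

    Assumes the extension is installed: CREATE EXTENSION tsm_system_time;
    """
    if millis < 0:
        raise ValueError("millis must not be negative")
    return f"SELECT * FROM {_check_ident(table)} TABLESAMPLE SYSTEM_TIME ({millis})"


def estimate_total(sample_count: int, percent: float) -> float:
    """Scale a sampled row count up to a rough estimate for the whole table."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    return sample_count * 100.0 / percent


if __name__ == "__main__":
    print(sample_count_query(TABLE, 1))   # count over a ~1% sample
    print(time_bound_query(TABLE, 1000))  # sample bounded to about one second
```

The resulting strings can be pasted into psql or passed to any PostgreSQL driver. Keep in mind that a count scaled up from a sample is only an estimate; the trade-off is accuracy in exchange for a much shorter wait.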
2ndQuadrant announced today the Beta release of 2UDA, which is available for download. Go ahead, take a look … it’s free to download, with unlimited use. I am sure you will enjoy it 🙂

1 reply
  1. Waiz says:
    October 28, 2015 at 9:02 pm

    Very informative Umair. Some useful insights.
