Lan Zagar

pgpredict – Predictive analytics in PostgreSQL

April 14, 2016 / in 2ndQuadrant, Data Mining, Lan's PlanetPostgreSQL, PostgreSQL / by Lan Zagar

We all realize how important it is to be able to analyze the data we gather and extract useful information from it. 2UDA is a step in that direction and aims to bring together data storage and management (PostgreSQL) with data mining and analysis (Orange).
pgpredict, a project still in development, aims to be the next step that brings this full circle. Starting with data (in our case stored in a database), we first need to give experts access to it so they can analyze it with specialized tools and methods. Afterwards, when they have, for example, trained a predictive model that solves an important problem for us, they need a way to convey those results back so we can put them to use. This is precisely the problem pgpredict tries to solve: deploying predictive models directly inside the database for efficient, real-time execution.

The project started as a continuation of 2UDA, which already lets Orange work with data stored in a PostgreSQL database. What was missing was a way to export trained predictive models, transfer them to where they are needed (e.g. the production server) and deploy them there. The project is therefore split into two extensions: one for Orange, which exports models to .json files, and one for PostgreSQL, which loads and runs those models. Because the models are stored as text files, they can be tracked in a version control system. The JSON format also makes it easy to store them in the database after loading, taking advantage of PostgreSQL's JSON capabilities.
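To make the export/load round trip concrete, here is a minimal Python sketch for a linear model. The JSON layout (`type`, `features`, `coefficients`, `intercept`) and the `LinearModel` class are hypothetical illustrations, not the actual pgpredict or Orange export format: one side serializes the trained parameters to text, the other parses that text back and evaluates the model on a row.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Hypothetical in-memory form of an exported linear (e.g. ridge) model."""
    features: list[str]
    coefficients: list[float]
    intercept: float

    def to_json(self) -> str:
        """Serialize the model to JSON text (the 'export' side)."""
        return json.dumps(
            {
                "type": "linear_regression",  # assumed tag, not a real pgpredict field
                "features": self.features,
                "coefficients": self.coefficients,
                "intercept": self.intercept,
            },
            indent=2,
        )

    @classmethod
    def from_json(cls, text: str) -> LinearModel:
        """Parse previously exported JSON text (the 'load' side)."""
        doc = json.loads(text)
        if doc.get("type") != "linear_regression":
            raise ValueError(f"unsupported model type: {doc.get('type')!r}")
        return cls(doc["features"], doc["coefficients"], doc["intercept"])

    def predict(self, row: dict[str, float]) -> float:
        """Evaluate intercept + sum(coefficient * feature) for one row."""
        return self.intercept + sum(
            c * row[f] for f, c in zip(self.features, self.coefficients)
        )


if __name__ == "__main__":
    # Made-up coefficients, purely for illustration.
    model = LinearModel(["age", "wage", "visits"], [0.5, 0.01, 2.0], 10.0)
    exported = model.to_json()  # plain text that could be committed to git
    loaded = LinearModel.from_json(exported)
    print(loaded.predict({"age": 30, "wage": 1000, "visits": 4}))
```

Because the exported document is plain text, it can be versioned as-is, and after loading it could equally be kept in a PostgreSQL `json` or `jsonb` column.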


A working implementation currently exists for a limited number of predictive models. It has not yet undergone thorough optimization, but it already shows great promise.
To test it, I generated a table of 10M imaginary customers with a few independent random variables (age, wage, visits) and an output variable (spent). Orange was then used to load the table and obtain a predictive model. Because Orange makes use of TABLESAMPLE (a feature introduced in PostgreSQL 9.5), trying different parameters and settings is quick, even for data much larger than in this test. The data scientist can therefore interactively try different solutions, evaluate them, and arrive at a good model. The final ridge regression model was then exported and loaded into the database, where it can be used in real time to predict the amount spent by new customers as they appear.
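The workflow above (sample the table, fit a ridge regression on the sample) can be sketched in plain Python. This is a minimal illustration under several assumptions: `make_customers` generates a scaled-down synthetic stand-in for the customers table with an assumed linear relationship, `bernoulli_sample` only mimics the row-level behaviour of PostgreSQL's `TABLESAMPLE BERNOULLI` method instead of querying a database, and `fit_ridge` uses the textbook closed form, which is not necessarily how Orange fits its models.

```python
import random


def make_customers(n, seed=0):
    """Synthetic stand-in for the imaginary customers table (hypothetical).

    age, wage and visits are independent random inputs; spent is the output.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(18, 80)
        wage = rng.uniform(1000, 5000)
        visits = rng.randint(0, 20)
        # Assumed ground truth plus Gaussian noise; the post gives no formula.
        spent = 5 + 0.3 * age + 0.002 * wage + 1.5 * visits + rng.gauss(0, 5)
        rows.append(((age, wage, visits), spent))
    return rows


def bernoulli_sample(rows, percent, seed=None):
    """Keep each row independently with probability percent/100,
    mimicking the row-level behaviour of TABLESAMPLE BERNOULLI (percent)."""
    rng = random.Random(seed)
    p = percent / 100.0
    return [r for r in rows if rng.random() < p]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(xs, ys, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y.

    A leading column of ones provides the intercept, which is not penalized.
    Returns (intercept, [coefficients]).
    """
    X = [[1.0, *x] for x in xs]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    for i in range(1, k):  # skip index 0 so the intercept is not shrunk
        xtx[i][i] += alpha
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    w = solve(xtx, xty)
    return w[0], w[1:]


if __name__ == "__main__":
    table = make_customers(100_000, seed=1)  # scaled down from the post's 10M rows
    sample = bernoulli_sample(table, percent=10, seed=2)
    intercept, coefs = fit_ridge([x for x, _ in sample], [y for _, y in sample], alpha=1.0)
    print(f"{len(sample)} sampled rows, intercept={intercept:.2f}, coefficients={coefs}")
```

Fitting on a sample keeps each experiment cheap; changing `percent` or `alpha` and refitting is the kind of interactive iteration described above.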
Using pgbench, selecting an existing column for a single customer took 0.086 ms, while fetching the independent variables and predicting the value of spent took only slightly longer: 0.134 ms.
Predicting the amount spent for 10^6 customers does not take 10^6 times as long (which would be 134 s), because the model is initialized on first use and then reused. It actually took 13.6 s, roughly 10x faster than that naive estimate.
These numbers were obtained for a simple model, on my laptop, with code that still has plenty of room for optimization. Expect a more rigorous evaluation soon, as we get ready to release pgpredict to the public. But even now, I think the efficiency and ease of use it exhibits would be a great advantage for the large majority of potential users looking for predictive analytics in their PostgreSQL-powered data warehouses.

Tags: 2UDA, data mining, Orange, pgpredict, predictive analytics