How to use Logistic Regression Machine Learning model with 2UDA – PostgreSQL and Orange (Part 5)

May 21, 2020 / by Semab Tariq

This article gives a step-by-step guide to using the Machine Learning capabilities that ship with 2UDA. As a working example, we will use a dataset of animals and predict whether each one is a Mammal, Bird, Fish or Insect.

Software versions

We are going to use 2UDA version 11.6-1 to implement the Machine Learning model. This version combines:

  1. PostgreSQL 11.6
  2. Orange 3.23.0

You can find the latest version of 2UDA here.

Step 1:

Load training dataset into PostgreSQL

The sample dataset that is used to train our model is available at the official Orange GitHub repository here.

Follow these steps to load the training data into PostgreSQL tables:

1. Connect to PostgreSQL via psql, OmniDB or any other tool that you are familiar with.

2. Create a table to store the training data. Here it is named training_data.

CREATE TABLE training_data (
    name VARCHAR(100),
    hair integer, feathers integer, eggs integer, milk integer,
    airborne integer, aquatic integer, predator integer, toothed integer,
    backbone integer, breathes integer, venomous integer, fins integer,
    legs integer, tail integer, domestic integer, catsize integer,
    type VARCHAR(100)
);

3. Insert the training data into the table with a COPY command. Before executing it, make sure that PostgreSQL has read permission on the data file, otherwise the COPY operation will fail (a client-side alternative that avoids this is sketched below).

NOTE: The delimiter is a tab character; writing it as E'\t' avoids having to type a literal tab between the single quotes. Rows two and three of the training dataset in the .tab file contain some meta information; since it is not needed at this point, it has been removed from the file.

COPY training_data FROM 'Path_to_training_data_file' WITH DELIMITER E'\t' CSV HEADER;
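If the PostgreSQL server cannot read the file directly, a client-side load avoids the permission problem entirely: the file is streamed over the connection, so the server never opens it. Here is a minimal sketch using Python and psycopg2; the connection parameters and file name are placeholders for your own:

# Client-side alternative to server-side COPY: the cleaned .tab file
# (metadata rows removed, as described above) is streamed over the
# connection, so PostgreSQL needs no read permission on it.
import psycopg2

# Placeholder credentials -- substitute your own.
conn = psycopg2.connect(dbname="postgres", user="postgres",
                        password="secret", host="localhost")

with conn, conn.cursor() as cur, open("zoo.tab") as f:
    cur.copy_expert(
        r"COPY training_data FROM STDIN WITH (FORMAT csv, DELIMITER E'\t', HEADER)",
        f,
    )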

A screenshot of the training dataset is shown below.

[Screenshot: training_data table contents]

Step 2:

Create a workflow with Orange

  1. Go to the desktop and double click on the Orange icon.
  2. Create a blank project by clicking on New from the menu.

Now you are ready to apply the Machine Learning model to the dataset.

Step 3:

Select a Machine Learning model to train the data

For this article, the Logistic Regression model is used to train the data. Once the training process is complete, test data is passed to the Prediction widget to check the accuracy of its predictions.

Step 4:

Import training data from PostgreSQL into Orange

This training dataset will be used to train the Machine Learning model.

  1. Drag and drop the SQL Table widget from the Data menu.
  2. Rename the widget (optional):
    1. Right-click on the SQL Table widget.
    2. Select Rename.
  3. Connect with PostgreSQL to load the training dataset:
    1. Double click on the Training data widget.
    2. Enter credentials to connect to the PostgreSQL database.
    3. Press the reload button to load all the available tables from the given database.
    4. Select the training_data table from the drop-down menu and close the pop-up.
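The SQL Table widget talks to PostgreSQL through the psycopg2 Python driver, so if the widget cannot connect, a quick way to isolate the problem is to try the same credentials from Python directly. A small sketch, with placeholder credentials:

# Connectivity check using the same credentials as the widget.
import psycopg2

conn = psycopg2.connect(dbname="postgres", user="postgres",
                        password="secret", host="localhost", port=5432)
print(conn.server_version)  # e.g. 110006 for PostgreSQL 11.6
conn.close()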

Step 5:

Add Target column

This step is important because the Machine Learning model will try to predict values of this target variable/column:

  1. Drag and drop the Select Columns widget from the Data menu.
  2. Draw a link line from the Training data widget to the Select Columns widget.
  3. Double click on the Select Columns widget.
  4. Find your target column under the Features label. Here, type is used as the target variable because we want to predict what type a given animal is.
  5. Drag and drop it into the Target Variable box and close the pop-up.

Step 6:

Column ranking

You can rank or score the training columns according to their correlation with the target column (a scripted analogue is sketched after the list below).

  1. Drag and drop the Rank widget from the Data menu.
  2. Draw a link line from the Select Columns widget to the Rank widget.
  3. Double click on the Rank widget to see the most relevant columns in the training data table. The top 5 columns are selected by default.
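The Rank widget scores attributes with measures such as information gain. As a rough scripted analogue (an assumption — not necessarily the exact scorer the widget uses), you can compute each attribute's mutual information with the target using scikit-learn, reusing the conn object opened in the Step 1 sketch:

# Rough scripted analogue of the Rank widget: score each attribute by its
# mutual information with the target column, then list the top five.
from sklearn.feature_selection import mutual_info_classif

FEATURES = ["hair", "feathers", "eggs", "milk", "airborne", "aquatic",
            "predator", "toothed", "backbone", "breathes", "venomous",
            "fins", "legs", "tail", "domestic", "catsize"]

with conn.cursor() as cur:
    cur.execute("SELECT {}, type FROM training_data".format(", ".join(FEATURES)))
    rows = cur.fetchall()

X = [list(r[:-1]) for r in rows]  # the 16 numeric attributes
y = [r[-1] for r in rows]         # the target column: type

scores = mutual_info_classif(X, y, discrete_features=True)
for name, score in sorted(zip(FEATURES, scores), key=lambda p: -p[1])[:5]:
    print(name, round(score, 3))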


Step 7:

Data training

In this step, the Logistic Regression model is trained on the training dataset (a one-line scripted equivalent follows the list). Follow these steps:

  1. Drag and drop the Logistic Regression widget from the Model menu.
  2. Draw a link line from the Rank widget to the Logistic Regression widget.
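Under the hood, Orange's Logistic Regression widget wraps scikit-learn's LogisticRegression, so the scripted equivalent of this step is a single fit call, continuing from the Step 6 sketch (max_iter is raised just to avoid convergence warnings on small datasets):

# Fit a multiclass logistic regression on the training rows loaded above.
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000).fit(X, y)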

Step 8:

Load test dataset into PostgreSQL

A separate test dataset is used to perform predictions. Follow these steps to load the test dataset into a PostgreSQL table.

1. Create a table to store the test data. Here it is named test_data.

CREATE TABLE test_data (
    name VARCHAR(100),
    hair integer, feathers integer, eggs integer, milk integer,
    airborne integer, aquatic integer, predator integer, toothed integer,
    backbone integer, breathes integer, venomous integer, fins integer,
    legs integer, tail integer, domestic integer, catsize integer,
    type VARCHAR(100)
);

2. Insert the test data into the test table with a COPY command. As before, make sure that PostgreSQL has read permission on the data file, otherwise the COPY operation will fail (or use the client-side load sketched below).

NOTE: As in Step 1, E'\t' specifies the tab delimiter. A question mark is intentionally placed in the type column of the test dataset, because the type of each animal is exactly what our Machine Learning model needs to figure out.

COPY test_data FROM 'Path_to_test_data_file' WITH DELIMITER E'\t' CSV HEADER;
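The client-side load from Step 1 works unchanged for the test table; a sketch reusing the same connection, with the test file name as a placeholder:

# Stream the test file through the client connection, as in Step 1.
with conn, conn.cursor() as cur, open("zoo_test.tab") as f:
    cur.copy_expert(
        r"COPY test_data FROM STDIN WITH (FORMAT csv, DELIMITER E'\t', HEADER)",
        f,
    )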

A screenshot of the test dataset is shown below.

[Screenshot: test_data table contents]

Step 9:

Import the test data from PostgreSQL into Orange

  1. Drag and drop the SQL Table widget from the Data menu.
  2. Rename the widget (optional):
    1. Right-click on the SQL Table widget.
    2. Select Rename.
  3. Connect with PostgreSQL to load the test data:
    1. Double click on the Test data widget.
    2. Connect it with the test_data table from PostgreSQL.

Now we are ready to perform predictions.

Step 10:

Predictions

The Prediction widget predicts the type of each test animal using the Logistic Regression model trained on the training data (a scripted equivalent follows the list).

  1. Drag and drop the Prediction widget from the Evaluate menu.
  2. Draw a link line from the Test data widget to the Prediction widget.
  3. Draw a link line from the Logistic Regression widget to the Prediction widget.
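In script form, the Prediction widget's job is predict_proba: one probability per animal type, with the predicted type being the one with the highest probability. Continuing the earlier sketches (conn, FEATURES and model are defined above):

# Score the test rows and print, for each animal, the predicted type and
# its probability -- the numbers the Prediction widget shows per row.
with conn.cursor() as cur:
    cur.execute("SELECT name, {} FROM test_data".format(", ".join(FEATURES)))
    test_rows = cur.fetchall()

names = [r[0] for r in test_rows]
X_test = [list(r[1:]) for r in test_rows]

probs = model.predict_proba(X_test)
for name, p in zip(names, probs):
    best = p.argmax()
    print(name, model.classes_[best], round(float(p[best]), 2))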

Step 11:

Results

Double click on the Prediction widget to view the results.

Understanding the Results

You will see two main tables in the prediction window. The table on the left shows the predicted results, while the table on the right shows the original test data that was provided for prediction.

Since the Logistic Regression model was used to train the data, you will see one column named Logistic Regression that lists the results.

As we know:

  1. Horse is a Mammal
  2. Trout is a Fish
  3. Turkey is a Bird

So the Logistic Regression model was able to determine all the types correctly.

Prediction Probabilities

In the table on the left side of the Prediction widget's output, some numbers appear before each predicted type, e.g. 0.02, 0.05, 0.04. These numbers are the probabilities the model assigns to each possible type.

We used 7 types of animals in the training dataset, so the widget shows 7 probability columns, each representing one type of animal; the probabilities in each row sum to 1, and the predicted type is the one with the highest probability. You can check which column represents which type of animal by looking at the list on the left side of your screen under Predicted probabilities for.

The 1st row says Turkey is a Bird: its probability is 0.82 (from the 2nd column), the highest in that row.

The 2nd row says Trout is a Fish: its probability is 0.57 (from the 3rd column), the highest in that row.

The 3rd row says Horse is a Mammal: its probability is 0.68 (from the 6th column), the highest in that row.

In this article, we used the Logistic Regression model to make predictions with 2UDA and Orange. In the next blog, we will use the AdaBoost model.

For any questions or comments, please get in touch using the contact form here.

Previous blog posts:

How to use the KNN Machine Learning Model with 2UDA – PostgreSQL and Orange (Part 1)

How to use the SVM Machine Learning Model with 2UDA – PostgreSQL and Orange (Part 2)

How to use the Random Forest Machine Learning Model with 2UDA – PostgreSQL and Orange (Part 3)

How to use Tree Machine Learning model with 2UDA – PostgreSQL and Orange (Part 4)
