2ndQuadrant is now part of EDB

Bringing together some of the world's top PostgreSQL experts.

2ndQuadrant | PostgreSQL
Mission Critical Databases
  • Contact us
  • EN
    • FR
    • IT
    • ES
    • DE
    • PT
  • Support & Services
  • Products
  • Downloads
    • Installers
      • Postgres Installer
      • 2UDA – Unified Data Analytics
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
  • Postgres Learning Center
    • Webinars
      • Upcoming Webinars
      • Webinar Library
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Blog
    • Training
      • Course Catalogue
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
    • Books
      • PostgreSQL 11 Administration Cookbook
      • PostgreSQL 10 Administration Cookbook
      • PostgreSQL High Availability Cookbook – 2nd Edition
      • PostgreSQL 9 Administration Cookbook – 3rd Edition
      • PostgreSQL Server Programming Cookbook – 2nd Edition
      • PostgreSQL 9 Cookbook – Chinese Edition
    • Videos
    • Events
    • PostgreSQL
      • PostgreSQL – History
      • Who uses PostgreSQL?
      • PostgreSQL FAQ
      • PostgreSQL vs MySQL
      • The Business Case for PostgreSQL
      • Security Information
      • Documentation
  • About Us
    • About 2ndQuadrant
    • 2ndQuadrant’s Passion for PostgreSQL
    • News
    • Careers
    • Team Profile
  • Blog
  • Menu Menu
You are here: Home1 / Blog2 / Haroon's PlanetPostgreSQL3 / Using Window Functions for Time Series IoT Analytics in Postgres-BDR
Muhammad Haroon

Using Window Functions for Time Series IoT Analytics in Postgres-BDR

June 13, 2018/0 Comments/in Haroon's PlanetPostgreSQL, Internet of Things, Time Series Data /by Muhammad Haroon

Internet of Things tends to generate large volumes of data at a great velocity. Often times this data is collected from geographically distributed sources and aggregated at a central location for data scientists to perform their magic i.e. find patterns, trends and make predictions.

Let’s explore what the IoT Solution using Postgres-BDR has to offer for Data Analytics. Postgres-BDR is offered as an extension on top of PostgreSQL 10 and above. It is not a fork. Therefore, we get the power of complete set of analytic functions that PostgreSQL has to offer. In particular, I am going to play around with PostgreSQL’s Window Functions here to analyze a sample of time series data from temperature sensors.

Let’s take an example of IoT temperature sensor time series data spread over a period of 7 days. In a typical scenario, temperature sensors are sending readings every minute. Some cases could be even more frequent. For the sake of simplicity, however, I am going to use one reading per day. The objective is to use PostgreSQL’s Window Functions for running analytics and that would not change with increasing the frequency of data points.

Here’s our sample data. Again, the number of table fields in a real world IoT temperature sensor would be higher, but our fields of interest in this case are restricted to timestamp of the temperature recording, device that reported it and the actual reading.

CREATE TABLE iot_temperature_sensor_data (
    ts timestamp without time zone,
    device_id text,
    reading float
);

and we add some random timeseries temperature sensor reading data for seven consecutive days

         ts          |            device_id             | reading 

---------------------+----------------------------------+---------

 2017-06-01 00:00:00 | ff0d1c4fd33f8429b7e3f163753a9cb0 |      10

 2017-06-02 00:00:00 | d125efa43a62af9f50c1a1edb733424d |       9

 2017-06-03 00:00:00 | 0b4ee949cc1ae588dd092a23f794177c |       1

 2017-06-04 00:00:00 | 8b9cef086e02930a808ee97b87b07b03 |       3

 2017-06-05 00:00:00 | d94d599467d1d8d3d347d9e9df809691 |       6

 2017-06-06 00:00:00 | a9c1e8f60f28935f3510d7d83ba54329 |       9

 2017-06-07 00:00:00 | 6fb333bd151b4bcc684d21b248c06ca3 |       9

Some of the very obvious questions of course:

  • What was the lowest temperature reading and when?
  • What was the highest temperature reading and when?

Quite understandably, we can run Min() and MAX() on ‘reading’ column to get lowest and highest temperature readings:

SELECT
    MIN(reading),
    MAX(reading)
FROM
    iot_temperature_sensor_data;
 min | max 
-----+-----  
   1 |  10 
(1 row)

While it is useful, it doesn’t tell us which corresponding dates and/or devices reported lowest and highest temperatures.

SELECT
    ts,
    device_id,
    reading
FROM
    iot_temperature_sensor_data
WHERE
    reading = (
        SELECT
            MIN(reading)
        FROM
            iot_temperature_sensor_data);
          ts          |            device_id             | reading 
 ---------------------+----------------------------------+--------- 
 2017-06-03 00:00:00 | 0b4ee949cc1ae588dd092a23f794177c |       1 (1 row)

So far so good!

However what about the questions like:
– Where does a given day rank in terms of temperature reading for the given week in a specified order?
– Let’s take a look to see how low or high the temperature was compared to previous or following day. So essentially we are looking to find the delta in temperature readings for a given day compared to one previous and/or following day.

This is where Window Functions come into play allowing us to compare and contrast values in relation to current row.

So if we wanted to find where a given day ranks in terms of its temperature reading with lowest temperature ranked at the top:

SELECT
    ts,
    device_id,
    reading,
    rank()
    OVER (
    ORDER BY
        reading)
FROM
    iot_temperature_sensor_data;
          ts          |            device_id             | reading | rank
 ---------------------+----------------------------------+---------+------
  2017-06-03 00:00:00 | 0b4ee949cc1ae588dd092a23f794177c |       1 |    1
  2017-06-04 00:00:00 | 8b9cef086e02930a808ee97b87b07b03 |       3 |    2
  2017-06-05 00:00:00 | d94d599467d1d8d3d347d9e9df809691 |       6 |    3
  2017-06-02 00:00:00 | d125efa43a62af9f50c1a1edb733424d |       9 |    4
  2017-06-06 00:00:00 | a9c1e8f60f28935f3510d7d83ba54329 |       9 |    4
  2017-06-07 00:00:00 | 6fb333bd151b4bcc684d21b248c06ca3 |       9 |    4
  2017-06-01 00:00:00 | ff0d1c4fd33f8429b7e3f163753a9cb0 |      10 |    7
 (7 rows) 

If we looked at the rank column, it looks good except that we notice rank 7 right after 4. This is because the next rank gets skipped because we have three rows with identical temperature reading of 9. What if I wanted not to skip ? PostgreSQL provides rank_dense() exactly to serve the same purpose.

SELECT
    ts,
    device_id,
    reading,
    dense_rank()
    OVER (
    ORDER BY
        reading)
FROM
    iot_temperature_sensor_data;
         ts          |            device_id             | reading | dense_rank
---------------------+----------------------------------+---------+------------
 2017-06-03 00:00:00 | 0b4ee949cc1ae588dd092a23f794177c |       1 |          1 
 2017-06-04 00:00:00 | 8b9cef086e02930a808ee97b87b07b03 |       3 |          2 
 2017-06-05 00:00:00 | d94d599467d1d8d3d347d9e9df809691 |       6 |          3 
 2017-06-02 00:00:00 | d125efa43a62af9f50c1a1edb733424d |       9 |          4 
 2017-06-06 00:00:00 | a9c1e8f60f28935f3510d7d83ba54329 |       9 |          4 
 2017-06-07 00:00:00 | 6fb333bd151b4bcc684d21b248c06ca3 |       9 |          4 
 2017-06-01 00:00:00 | ff0d1c4fd33f8429b7e3f163753a9cb0 |      10 |          5 
(7 rows)

How about which days saw the maximum rise in temperature compared to previous day ?

SELECT
    ts,
    device_id,
    reading,
    reading - lag(reading, 1)
    OVER (
    ORDER BY
        reading) AS diff
FROM
    iot_temperature_sensor_data;

         ts          |            device_id             | reading | diff
---------------------+----------------------------------+---------+------
  2017-06-03 00:00:00 | 0b4ee949cc1ae588dd092a23f794177c |       1 |
  2017-06-04 00:00:00 | 8b9cef086e02930a808ee97b87b07b03 |       3 |    2
  2017-06-05 00:00:00 | d94d599467d1d8d3d347d9e9df809691 |       6 |    3
  2017-06-02 00:00:00 | d125efa43a62af9f50c1a1edb733424d |       9 |    3
  2017-06-06 00:00:00 | a9c1e8f60f28935f3510d7d83ba54329 |       9 |    0
  2017-06-07 00:00:00 | 6fb333bd151b4bcc684d21b248c06ca3 |       9 |    0
  2017-06-01 00:00:00 | ff0d1c4fd33f8429b7e3f163753a9cb0 |      10 |    1
 (7 rows)

 We see that the last column shows the rise in temperature vs previous day. A quick visual inspection would show that the maximum rise in temperature is 3 degrees on 2017-06-02 00:00:00 and then again on 2017-06-05 00:00:00. With a little bit of CTE magic, we could list the days that saw the maximum rise in temperature.

WITH temperature_data AS (
    SELECT
        ts,
        device_id,
        reading,
        reading - lag(reading, 1)
        OVER (
        ORDER BY
            reading) AS diff
    FROM
        iot_temperature_sensor_data
)
SELECT
    ts,
    diff
FROM
    temperature_data
WHERE
    diff = (
        SELECT
            MAX(diff)
        FROM
            temperature_data);
         ts          | diff 

---------------------+------

 2017-06-05 00:00:00 |    3

 2017-06-02 00:00:00 |    3

Here is a list of Window Functions that PostgreSQL and Postgres-BDR support:

  • row_number()
  • percent_rank()
  • cume_dist()
  • ntile(num_buckets integer)
  • lead(value anyelement [, offset integer [, defaultanyelement ]])
  • first_value(value any)
  • last_value(value any)
  • nth_value(value any, nthinteger)

For further reading, please refer to the PostgreSQL documentation on Window Functions in PostgreSQL.

Tags: analytics, BDR, Internet of things, Postgres-BDR, PostgreSQL10, temperature sensor data, window functions
Share this entry
  • Share on Facebook
  • Share on Twitter
  • Share on WhatsApp
  • Share on LinkedIn
0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Support & Services

24/7 Production Support

Developer Support

Remote DBA for PostgreSQL

PostgreSQL Database Monitoring

PostgreSQL Health Check

PostgreSQL Performance Tuning

Database Security Audit

Upgrade PostgreSQL

PostgreSQL Migration Assessment

Migrate from Oracle to PostgreSQL

Products

HA Postgres Clusters

Postgres-BDR®

2ndQPostgres

pglogical

repmgr

Barman

Postgres Cloud Manager

SQL Firewall

Postgres-XL

OmniDB

Postgres Installer

2UDA

Postgres Learning Center

Introducing Postgres

Blog

Webinars

Books

Videos

Training

Case Studies

Events

About Us

About 2ndQuadrant

What does 2ndQuadrant Mean?

News

Careers 

Team Profile

© 2ndQuadrant Ltd. All rights reserved. | Privacy Policy
  • Twitter
  • LinkedIn
  • Facebook
  • Youtube
  • Mail
Postgres Installer – A GUI based, user-friendly installer for Postgre... Road test your patch in one command
Scroll to top
×