Muhammad Haroon

PostgreSQL for IoT Data Retention and Archiving

September 11, 2018 / in Haroon's PlanetPostgreSQL, Internet of Things / by Muhammad Haroon

The IoT revolution is producing enormous amounts of data. With brisk data growth, where the data is mostly append-only time series, relational databases and DBAs face a rather tough task: storing, maintaining, archiving, and in some cases efficiently getting rid of old data. In my previous posts, I talked about various strategies and techniques for better data storage scalability using PostgreSQL and the Postgres-BDR extension. Data retention is becoming ever more important, so let's see what PostgreSQL 10 and above have to offer to efficiently manage your data retention needs.

PostgreSQL has supported time-based partitioning in some form for quite a while. However, it wasn't part of core PostgreSQL. PostgreSQL 10 made a major improvement in this area by introducing declarative partitioning.

As Postgres-BDR runs as an extension on top of PostgreSQL 10 and above, we get all the partitioning features and improvements of PostgreSQL 10 and above in Postgres-BDR as well. So, from an implementation perspective in the IoT domain, you can now create and manage partitions over time and space in PostgreSQL for multi-master environments using the Postgres-BDR extension.

So let’s see how easy it is to implement data retention policies with PostgreSQL’s partitioning improvements.

CREATE TABLE iot_temperature_sensor_data (
    ts        timestamp without time zone,
    device_id text,
    reading   float
) PARTITION BY RANGE (ts);

PARTITION BY RANGE (ts) tells PostgreSQL that we are partitioning this table by range, using the ts column.

Let's create some partitions:

CREATE TABLE iot_temperature_sensor_data_2018_february PARTITION OF iot_temperature_sensor_data
    FOR VALUES FROM ('2018-02-01') TO ('2018-03-01');

CREATE TABLE iot_temperature_sensor_data_2018_march PARTITION OF iot_temperature_sensor_data
    FOR VALUES FROM ('2018-03-01') TO ('2018-04-01');

CREATE TABLE iot_temperature_sensor_data_2018_april PARTITION OF iot_temperature_sensor_data
    FOR VALUES FROM ('2018-04-01') TO ('2018-05-01');

CREATE TABLE iot_temperature_sensor_data_2018_may PARTITION OF iot_temperature_sensor_data
    FOR VALUES FROM ('2018-05-01') TO ('2018-06-01');

CREATE TABLE iot_temperature_sensor_data_2018_june PARTITION OF iot_temperature_sensor_data
    FOR VALUES FROM ('2018-06-01') TO ('2018-07-01');
Alright, so our sensor data storage table is now partitioned over time.
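
To see tuple routing in action, we can insert a couple of sample readings; PostgreSQL routes each row to the partition whose range covers its timestamp (the device IDs below are made up for illustration):

-- Each row lands in the partition covering its timestamp
INSERT INTO iot_temperature_sensor_data (ts, device_id, reading) VALUES
    ('2018-02-14 09:30:00', 'sensor-42', 21.5),
    ('2018-05-03 17:45:00', 'sensor-17', 24.1);

-- Querying a child table directly shows only that month's rows
SELECT * FROM iot_temperature_sensor_data_2018_february;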

\d+ iot_temperature_sensor_data
                                    Table "public.iot_temperature_sensor_data"
  Column   |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description
-----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 ts        | timestamp without time zone |           |          |         | plain    |              |
 device_id | text                        |           |          |         | extended |              |
 reading   | double precision            |           |          |         | plain    |              |
Partition key: RANGE (ts)
Partitions: iot_temperature_sensor_data_2018_april FOR VALUES FROM ('2018-04-01 00:00:00') TO ('2018-05-01 00:00:00'),
            iot_temperature_sensor_data_2018_february FOR VALUES FROM ('2018-02-01 00:00:00') TO ('2018-03-01 00:00:00'),
            iot_temperature_sensor_data_2018_june FOR VALUES FROM ('2018-06-01 00:00:00') TO ('2018-07-01 00:00:00'),
            iot_temperature_sensor_data_2018_march FOR VALUES FROM ('2018-03-01 00:00:00') TO ('2018-04-01 00:00:00'),
            iot_temperature_sensor_data_2018_may FOR VALUES FROM ('2018-05-01 00:00:00') TO ('2018-06-01 00:00:00')

So what happens when data grows and we need to either remove old data or archive it?

The simplest option for removing old data is to drop the partition that is no longer necessary:

DROP TABLE iot_temperature_sensor_data_2018_april;

This can very quickly delete millions of records because it doesn’t have to individually delete every record. Note however that the above command requires taking an ACCESS EXCLUSIVE lock on the parent table.

While this approach can help you get rid of old data quickly, an often preferable option is to remove the partition from the partitioned table but retain access to it as a table in its own right:

ALTER TABLE iot_temperature_sensor_data DETACH PARTITION iot_temperature_sensor_data_2018_april;

This allows further operations to be performed on the data before it is dropped. For example, this is often a useful time to backup the data using COPY, pg_dump, or similar tools. It might also be a useful time to aggregate data into smaller formats, perform other data manipulations, or run reports.
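
For example, assuming a writable /archive directory (the path is illustrative), the detached table could be exported with psql's \copy and then dropped:

-- Archive the detached partition to CSV (path is illustrative)
\copy iot_temperature_sensor_data_2018_april TO '/archive/iot_2018_april.csv' WITH (FORMAT csv, HEADER)

-- Once the data is safely archived, drop the standalone table
DROP TABLE iot_temperature_sensor_data_2018_april;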

Partition drop and detach tasks can be automated via scheduled jobs to make sure data archival and removal are taken care of automatically. So, if your retention policy requires archival or removal of data older than 3 months, a cron job could run DROP TABLE or ALTER TABLE ... DETACH PARTITION on partitions older than 3 months.
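
A minimal shell sketch of such a job, assuming a database named iotdb, GNU date, and the month-named partitions created above (all assumptions; error handling omitted):

#!/bin/sh
# Hypothetical retention script: detach, archive, and drop the partition
# that has just aged past the retention window. Names and paths are illustrative.
PART="iot_temperature_sensor_data_$(date -d '-4 months' '+%Y_%B' | tr '[:upper:]' '[:lower:]')"
psql -d iotdb -c "ALTER TABLE iot_temperature_sensor_data DETACH PARTITION ${PART};"
psql -d iotdb -c "\copy ${PART} TO '/archive/${PART}.csv' WITH (FORMAT csv, HEADER)"
psql -d iotdb -c "DROP TABLE ${PART};"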

Enter pg_partman

If you need further ease of use for your data retention policies, pg_partman makes it easier to create and maintain partitions. For example, to achieve something similar to what we did above:

SELECT partman.create_parent('iot.iot_temperature_sensor_data', 'ts', 'native', 'monthly');

By default, pg_partman will create one time partition for the current month, four partitions for the past four months, and four more partitions for four months into the future. You can tweak p_start_partition and p_premake to override those defaults, for cases where you need additional partitions for backfilling old data or want to create more partitions into the future, respectively.
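
For instance, the defaults can be overridden with named parameters (the values below are illustrative; check your pg_partman version's create_parent signature):

-- Pre-create 6 future partitions and allow backfill partitions from Jan 2017
SELECT partman.create_parent(
    p_parent_table    := 'iot.iot_temperature_sensor_data',
    p_control         := 'ts',
    p_type            := 'native',
    p_interval        := 'monthly',
    p_premake         := 6,
    p_start_partition := '2017-01-01'
);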

\d+ iot.iot_temperature_sensor_data
                                     Table "iot.iot_temperature_sensor_data"
  Column   |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description
-----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 ts        | timestamp without time zone |           |          |         | plain    |              |
 device_id | text                        |           |          |         | extended |              |
 reading   | double precision            |           |          |         | plain    |              |
Partition key: RANGE (ts)
Partitions: iot_temperature_sensor_data_p2018_03 FOR VALUES FROM ('2018-03-01 00:00:00') TO ('2018-04-01 00:00:00'),
            iot_temperature_sensor_data_p2018_04 FOR VALUES FROM ('2018-04-01 00:00:00') TO ('2018-05-01 00:00:00'),
            iot_temperature_sensor_data_p2018_05 FOR VALUES FROM ('2018-05-01 00:00:00') TO ('2018-06-01 00:00:00'),
            iot_temperature_sensor_data_p2018_06 FOR VALUES FROM ('2018-06-01 00:00:00') TO ('2018-07-01 00:00:00'),
            iot_temperature_sensor_data_p2018_07 FOR VALUES FROM ('2018-07-01 00:00:00') TO ('2018-08-01 00:00:00'),
            iot_temperature_sensor_data_p2018_08 FOR VALUES FROM ('2018-08-01 00:00:00') TO ('2018-09-01 00:00:00'),
            iot_temperature_sensor_data_p2018_09 FOR VALUES FROM ('2018-09-01 00:00:00') TO ('2018-10-01 00:00:00'),
            iot_temperature_sensor_data_p2018_10 FOR VALUES FROM ('2018-10-01 00:00:00') TO ('2018-11-01 00:00:00'),
            iot_temperature_sensor_data_p2018_11 FOR VALUES FROM ('2018-11-01 00:00:00') TO ('2018-12-01 00:00:00')

Now, if we wanted to implement a 3-month data retention policy with pg_partman, it would be as simple as:

UPDATE partman.part_config
SET retention = '3 month'
WHERE parent_table = 'iot.iot_temperature_sensor_data';

and then a cron job to execute the retention task:

SELECT partman.run_maintenance(p_analyze := false);
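
A minimal crontab sketch for that job, assuming a database named iotdb and pg_partman installed in the partman schema (both assumptions):

# Run pg_partman's retention maintenance every hour
0 * * * * psql -d iotdb -c "SELECT partman.run_maintenance(p_analyze := false);"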

The cron job will make sure that partitions older than 3 months are detached automatically every time it runs. Note that this will only detach those partitions; if you want to remove them altogether, you need to specify that when setting the retention policy:

UPDATE partman.part_config SET retention_keep_table = false, retention = '3 month' 
WHERE parent_table = 'iot.iot_temperature_sensor_data';

retention_keep_table is a boolean that determines whether child tables removed by the retention policy are kept as standalone tables or actually dropped.
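
You can confirm the configured policy by querying pg_partman's configuration table:

SELECT parent_table, retention, retention_keep_table
FROM partman.part_config
WHERE parent_table = 'iot.iot_temperature_sensor_data';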

For any questions or comments, please get in touch with us using the contact form here.

Tags: BDR, Internet of things, IOT, Postgres-BDR
1 reply

Anders says:
September 12, 2018 at 5:00 pm

It is too bad that you need to create the partitions. If you could just tell the database how to calculate the partition key, either declaratively (truncate this date field to whole months) or by supplying an expression, the database could create partitions as needed. That PostgreSQL already does DDL inside transactions might make this easier to implement.

