PostgreSQL anti-patterns: Unnecessary json/hstore dynamic columns

PostgreSQL has json support – but you shouldn’t use it for the great majority of what you’re doing. This goes for hstore too, and the new jsonb type. These types are useful tools where they’re needed, but should not be your first choice when modelling your data in PostgreSQL, as it’ll make querying and manipulating it harder.

Some readers will already be familiar with the entity-attribute-value (EAV) anti-pattern. EAV has been a necessary evil in database schemas for a long time. It’s something we all advise against and look down on, then resort to when reality and business requirements mean not everything fits into rigid fields in properly modelled relations. For those who don’t know it, it’s a schema design where you turn a relational database into a poor man’s key/value store using a table with object_id (“entity”), key (“attribute”) and value columns, as sketched below. Querying it usually involves lots and lots of self-joins.
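
Concretely, a minimal sketch of the pattern (table and column names here are illustrative, not from any particular schema):

CREATE TABLE eav (
    object_id integer NOT NULL,   -- the "entity"
    key       text    NOT NULL,   -- the "attribute"
    value     text    NOT NULL
);

-- Even fetching two attributes of the same entity needs a self-join:
SELECT n.value AS name, p.value AS phone
FROM eav n
INNER JOIN eav p ON (p.object_id = n.object_id)
WHERE n.key = 'name'
  AND p.key = 'phone';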

json is the new EAV – a great tool when you need it, but not something you should use as a first choice.

(Most of this also applies to PostgreSQL arrays, which are great tools for building queries, but not usually an appropriate way to store authoritative data.)

JSON blob tables

Since the release of PostgreSQL 9.3, and even more since jsonb in 9.4, I’m seeing more and more Stack Overflow questions where people ask how to do joins on json objects, how to query for a nested key inside any array element, etc. All reasonable things to do, but when asked for schema and example data they’ll often post something like this:

CREATE TABLE people(
    id serial primary key,
    data jsonb not null
);

INSERT INTO people(data) VALUES ($$
{
    "name": "Bob",
    "addresses": [
        {
            "street": "Centre",
            "streetnumber": 24,
            "town": "Thornlie",
            "state": "WesternAustralia",
            "country": "Australia"
        },
        {
            "street": "Example",
            "streetnumber": "4/311-313",
            "town": "Auckland",
            "country": "NewZealand"
        }
    ],
    "phonenumbers": [
        {
            "type": "mobile",
            "number": "12345678"
        }
    ]
}
$$);

INSERT INTO people(data) VALUES ($$
{
  "name": "Fred",
  "phonenumbers": [
    { "type": "mobile", "number": "12345678" }
  ]
}
$$);

INSERT INTO people(data) VALUES ($$
{
  "name": "John Doe"
}
$$);

and ask “how do I find people who have the same phone number?”.

Experienced relational database users will already be wincing… but remember, not everybody is used to inverting their thinking to follow foreign keys backwards from child to parent and decomposing composite entities into normal form. When you think about it, relational databases are weird.

Most of PostgreSQL’s power comes from that relational model, though. When you store your data as json blobs you deny the query planner the ability to make sensible decisions based on table and column statistics, you lose most of the indexing features and scan types, and are generally restricted to quite primitive operations. Doing anything interesting tends to involve lots of self-joins and filters.

Baroque queries

For example, the request to find people with matching phone numbers could be turned into a query like:

select 
  p1.id AS person1,
  p2.id AS person2,
  p1.data ->> 'name' AS "p1 name",
  p2.data ->> 'name' AS "p2 name",
  pns1 ->> 'type' AS "type", 
  pns1 ->> 'number' AS "number" 
from people p1 
  inner join people p2 
    on (p1.id > p2.id)
  cross join lateral jsonb_array_elements(p1.data -> 'phonenumbers') pns1
  inner join lateral jsonb_array_elements(p2.data -> 'phonenumbers') pns2 
    on (pns1 -> 'type' = pns2 -> 'type' AND pns1 -> 'number' = pns2 -> 'number');

Isn’t that just a thing of readable beauty?

There’s nothing particularly wrong with the json support functions and operators; it’s just awkward taking nested object-like data and querying it using relational joins and predicates.

At least PostgreSQL supports LATERAL; without that, this would be a nightmare of a query to write.

As it happens, an equivalent query on a relational schema for this particular question isn’t that much nicer:

SELECT
  p1.id AS "person1",
  p2.id AS "person2",
  p1.name AS "p1 name",
  p2.name AS "p2 name",
  pns1.phoneno AS "p1 phone",
  pns2.phoneno AS "p2 phone"
FROM
  person p1
  INNER JOIN person p2 ON (p1.id < p2.id)
  INNER JOIN phone pns1 ON (p1.id = pns1.person_id)
  INNER JOIN phone pns2 ON (p2.id = pns2.person_id)
WHERE
  pns1."type" = pns2."type"
  AND pns1.phoneno = pns2.phoneno;

... but it's likely to go a *lot* faster with an index on (phone."type", phone.phoneno) than the json-based query would, and read a lot less data off disk in the process.
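
For reference, the relational query above assumes a schema along these lines (the table definitions aren't shown in the post, so this is a sketch inferred from the query):

CREATE TABLE person (
    id   serial PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE phone (
    person_id integer NOT NULL REFERENCES person(id),
    "type"    text    NOT NULL,
    phoneno   text    NOT NULL
);

-- The index suggested above:
CREATE INDEX phone_type_phoneno_idx ON phone ("type", phoneno);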

No fixed data typing

The json-based query given above is also buggy: jsonb comparisons are sensitive to data type, so the jsonb values "12345678" and 12345678 are unequal:

regress=> SELECT '12345678'::jsonb, '"12345678"'::jsonb, '12345678'::jsonb = '"12345678"'::jsonb AS "isequal";
  jsonb   |   jsonb    | isequal 
----------+------------+---------
 12345678 | "12345678" | f

so this:

insert into people (data) values ('{"phonenumbers": [{"type":"mobile","number":12345678}]}');

won't be identified as a duplicate even though it should be.

Note that this is arguably a PostgreSQL limitation, since a JavaScript interpreter's loose typing means the two values compare as equal:

> 12345678 == "12345678"
true

To get the numbers to match using a PostgreSQL query, we have to change our query so that this:

    on (pns1 -> 'type' = pns2 -> 'type' AND pns1 -> 'number' = pns2 -> 'number');

uses the ->> json-value-as-text operator instead:

    on (pns1 -> 'type' = pns2 -> 'type' AND pns1 ->> 'number' = pns2 ->> 'number');

since the text form of the numbers is the same.
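
A quick illustration of the difference (this query isn't from the original post; the object key n is arbitrary):

SELECT ('{"n": 12345678}'::jsonb -> 'n')  = ('{"n": "12345678"}'::jsonb -> 'n')  AS "jsonb equal",
       ('{"n": 12345678}'::jsonb ->> 'n') = ('{"n": "12345678"}'::jsonb ->> 'n') AS "text equal";

 jsonb equal | text equal
-------------+------------
 f           | t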

No constraints

Of course, someone might insert:

insert into people (data) values ('{"phonenumbers": [{"type":"Mobile","number":"1234 5678"}]}');

... which won't match because of the capital M and the space in the number.

Because everything is in a json blob, we can't easily add CHECK constraints on the value, use ENUM types or DOMAINs, create trigger-maintained shadow normalised columns, or apply any of the usual approaches used to handle data normalisation like this. It'd be necessary to write a json validator/transform trigger in a procedural language like pl/v8, or do it all in the application.
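
For illustration, here's what a minimal validation trigger might look like in plain plpgsql (which is enough for simple structural checks, though pl/v8 is the better fit for anything complex). The rules enforced here, an assumed whitelist of phone types and digits-only numbers, are purely illustrative:

CREATE FUNCTION people_validate() RETURNS trigger AS $$
DECLARE
    ph jsonb;
BEGIN
    FOR ph IN SELECT jsonb_array_elements(coalesce(NEW.data -> 'phonenumbers', '[]'::jsonb))
    LOOP
        -- Enforce an assumed whitelist of phone types
        IF ph ->> 'type' NOT IN ('mobile', 'home', 'work') THEN
            RAISE EXCEPTION 'unknown phone type: %', ph ->> 'type';
        END IF;
        -- Require digits-only numbers so text comparisons behave predictably
        IF ph ->> 'number' !~ '^[0-9]+$' THEN
            RAISE EXCEPTION 'invalid phone number: %', ph ->> 'number';
        END IF;
    END LOOP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER people_validate
    BEFORE INSERT OR UPDATE ON people
    FOR EACH ROW EXECUTE PROCEDURE people_validate();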

So when should json be used?

From the above you might be thinking that I'm against using json types in the database. That's not the case at all. They're often better than EAV when you have dynamic-column data that just won't fit into a traditional relational model, and better than trying to co-ordinate with an external MongoDB for the json storage, as I've seen people attempt in the past.

For example, sometimes application users want to be able to add arbitrary attributes to an entity. It's a business requirement. The client doesn't like it when you say that the database doesn't do that so they can't have notes in their application, and they're muttering things about "just doing it in the application" or "we didn't have these problems with MongoDB".

So you might model the typical stuff relationally (albeit with indiscriminate use of surrogate keys), but add a json field in person just for those extras so the app can fetch them when displaying a person.
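
A minimal sketch of that hybrid model (names here are illustrative):

CREATE TABLE person (
    id     serial PRIMARY KEY,
    name   text NOT NULL,
    -- free-form, user-defined attributes only; anything the application
    -- needs to query, join on or constrain gets a real column instead
    extras jsonb NOT NULL DEFAULT '{}'
);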

How to decide when to use json

Use json if your data won't fit in the database using normal relational modelling. If you're choosing between using EAV, serializing a Java/Ruby/Python object into a bytea field, or storing a key to look up an external structured object somewhere else ... that's when you should be reaching for json fields.

At this point it's probably worth replacing hstore use with jsonb in all new applications.

Note that plain json fields are also useful - if you're not going to be indexing the json and querying within it, they're usually more compact and faster to send and receive.
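
Conversely, if you do need to query within jsonb documents, a GIN index supports containment searches; a quick sketch against the people table from earlier:

CREATE INDEX people_data_gin ON people USING gin (data);

SELECT data ->> 'name'
FROM people
WHERE data @> '{"phonenumbers": [{"number": "12345678"}]}';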
