2ndQuadrant is now part of EDB

Bringing together some of the world's top PostgreSQL experts.

2ndQuadrant | PostgreSQL
Mission Critical Databases
  • Contact us
  • EN
    • FR
    • IT
    • ES
    • DE
    • PT
  • Support & Services
  • Products
  • Downloads
    • Installers
      • Postgres Installer
      • 2UDA – Unified Data Analytics
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
  • Postgres Learning Center
    • Webinars
      • Upcoming Webinars
      • Webinar Library
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Blog
    • Training
      • Course Catalogue
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
    • Books
      • PostgreSQL 11 Administration Cookbook
      • PostgreSQL 10 Administration Cookbook
      • PostgreSQL High Availability Cookbook – 2nd Edition
      • PostgreSQL 9 Administration Cookbook – 3rd Edition
      • PostgreSQL Server Programming Cookbook – 2nd Edition
      • PostgreSQL 9 Cookbook – Chinese Edition
    • Videos
    • Events
    • PostgreSQL
      • PostgreSQL – History
      • Who uses PostgreSQL?
      • PostgreSQL FAQ
      • PostgreSQL vs MySQL
      • The Business Case for PostgreSQL
      • Security Information
      • Documentation
  • About Us
    • About 2ndQuadrant
    • 2ndQuadrant’s Passion for PostgreSQL
    • News
    • Careers
    • Team Profile
  • Blog
  • Menu Menu
You are here: Home1 / Blog2 / Marco's PlanetPostgreSQL3 / JSONB type performance in PostgreSQL 9.4
Marco Nenciarini

JSONB type performance in PostgreSQL 9.4

March 3, 2015/15 Comments/in Marco's PlanetPostgreSQL /by Marco Nenciarini


The 9.4 version of PostgreSQL introduces the JSONB data type, a specialised representation of the JSON data, allowing PostgreSQL to be competitive in managing the “lingua franca” of the moment for the exchange of data via web services. It is useful to perform a number of tests to verify its actual performance.

Test dataarticolo-json-marco

We will run our tests using the customer reviews data from Amazon for the year 1998 in JSON format. The file customer_reviews_nested_1998.json.gz can be downloaded from the website of Citus Data.

The file, once unzipped, takes up 209MB and contains approximately 600k of records in JSON format with a structure similar to the following:

{

    "customer_id": "ATVPDKIKX0DER",
    "product": {
        "category": "Arts & Photography",
        "group": "Book",
        "id": "1854103040",
        "sales_rank": 72019,
        "similar_ids": [
            "1854102664",
            "0893815381",
            "0893816493",
            "3037664959",
            "089381296X"
        ],
        "subcategory": "Art",
        "title": "The Age of Innocence"
    },
    "review": {
        "date": "1995-08-10",
        "helpful_votes": 5,
        "rating": 5,
        "votes": 12
    }
}

Dimensions

The data can be loaded into a PostgreSQL database using the JSONB data type with the following commands:

CREATE TABLE reviews(review jsonb);
\copy reviews FROM 'customer_reviews_nested_1998.json'
VACUUM ANALYZE reviews;

The resulting table will take up approximately 268MB, with an additional cost of disk storage of around 28%. If we try to load the same data using the JSON type, which stores it as text, the result will be a table of 233MB, with an increase in space of roughly 11%. The reason for this difference is that the internal structures of JSONB, which are used to access the data without analysing the entire document each time, have a cost in terms of space.

Data access

Once the data is stored in the database, it is necessary to create an index in order to access it efficiently. Before the 9.4 version of PostgreSQL, the only method of indexing the contents of a JSON field was to use a B-tree index on a specific search expression. For example, if we want to perform a search by product category we will use:

CREATE INDEX on reviews ((review #>> '{product,category}'));

The newly created index takes up 21MB, or approximately 10% of the original data, and will be used for queries that contain within the WHERE clause the exact search term “review #>> {product,category}”, such as:

SELECT
    review #>> '{product,title}' AS title,
    avg((review #>> '{review,rating}')::int)
FROM reviews
WHERE review #>> '{product,category}' = 'Fitness & Yoga'
GROUP BY 1 ORDER BY 2;
                       title                       |        avg
---------------------------------------------------+--------------------
 Kathy Smith - New Yoga Challenge                  | 1.6666666666666667
 Pumping Iron 2                                    | 2.0000000000000000
 Kathy Smith - New Yoga Basics                     | 3.0000000000000000
 Men Are from Mars, Women Are from Venus           | 4.0000000000000000
 Kathy Smith - Functionally Fit - Peak Fat Burning | 4.5000000000000000
 Kathy Smith - Pregnancy Workout                   | 5.0000000000000000
(6 rows)

The query takes approximately 0.180ms to be performed on the test machine, but the index that has been created is highly specific and cannot be used for different searches.

Starting with version 9.4, the JSONB data type supports the use of inverted indexes (GIN, or General inverted Indexes), which allow indexing of the components of a complex object.

Let’s create a GIN index on our table reviews with the following command:

CREATE INDEX on reviews USING GIN (review);

The resulting index takes up 64MB of disk, which is approximately 30% of the size of the original table. This index can be used to speed up searches using the following operators:

  • JSON @> JSON is a subset
  • JSON ? TEXT contains a value
  • JSON ?& TEXT[] contains all the values
  • JSON ?| TEXT[] contains at least one value

The above query shall thus be rewritten using the operator @> to search for rows that contain '{"product": {"category": "Fitness & Yoga"}}':

SELECT
    review #>> '{product,title}' AS title,
    avg((review #>> '{review,rating}')::int)
FROM reviews
WHERE review @> '{"product": {"category": "Fitness & Yoga"}}'
GROUP BY 1 ORDER BY 2;

The query takes approximately 1.100ms to be performed on the test machine, and the index that has been created is flexible and can be used for any search within the JSON data.

In fact, it is often the case that the only operation used in the applications is the search for a subset. In this case it is possible to use a different GIN index which only supports the @> operator and is therefore considerably smaller. The syntax for creating this type of “optimised” index is as follows:

 

CREATE INDEX on reviews USING GIN (review jsonb_path_ops);

The resulting index occupies 46MB – only 22% of the size of the original data – and, thanks to its smaller size, it is used more efficiently by PostgreSQL.

This allows the above query to be run in just 0.167ms, with a performance increase of 650% compared to the original GIN index and 8% compared to the specific B-tree index initially used: all this without loss of generality with regards to the possible search operation.

Conclusions

With the introduction of the JSONB type and the GIN indexes
using jsonb_path_ops operator class, PostgreSQL combines the elasticity of the JSON format at an amazing data access speed.

Today it is thus possible to store and process data in JSON format with high performance while enjoying the robustness and flexibility that PostgreSQL has habitually provided us with over the years.

Tags: 9.4, b-tree, database, dbms, GIN, javascript, JSON, JSONB, jsonb_path_ops, key-value, mongodb, NOSQL, object, postgres 9.4, PostgreSQL, PostgreSQL 9.4
Share this entry
  • Share on Facebook
  • Share on Twitter
  • Share on WhatsApp
  • Share on LinkedIn
15 replies
  1. greg
    greg says:
    March 10, 2015 at 3:54 pm

    Thanks a lot for this article 😉

    Reply
  2. janko
    janko says:
    March 30, 2015 at 7:23 pm

    This was a very interesting article. I have a couple of questions:

    1. What does “\copy” exactly do? How did it know that you want to store the data in the “review” column? Was Postgres able to detect that it was the only column in the table?

    2. What does “VACUUM ANALYZE” do in your case? I read the Postgres documentation, and I understood from there that it cleans the deleted rows, but you didn’t delete anything. Does the “ANALYZE” part make future queries faster?

    3. If the “json_path_ops” index is focused on the @> operator only, does it mean that your SELECT statement is slower, since it still uses the #>> operator? Or it’s that fast because it’s actually doing the SELECT only after the WHERE filtering?

    Reply
    • Marco Nenciarini
      Marco Nenciarini says:
      March 30, 2015 at 8:36 pm

      1) \copy is a psql meta command which invokes the SQL commands “COPY … FROM STDIN” or “COPY .. TO STDOUT”, and then fetches/stores the data in a file accessible to the psql client.

      If you don’t specify any column list, PostgreSQL will load/dump all the columns of the table.

      2) It’s always a good idea to “VACUUM ANALYZE” a table after a bulk load of data. It ensures that all the internal structures used by PostgreSQL are up to date (hint bits, visibility map, etc…) and collects statistics about the content of the table, so the query planner can use these statistics to help determine the most efficient execution plans for queries.

      3) The “json_path_ops” GIN index is used to quickly find the data during the WHERE filtering, then the #>> operator is applied in the output phase to display only the required portion of data. It makes the access very fast.

      Reply
      • kustodian
        kustodian says:
        December 27, 2016 at 4:40 pm

        Just a small correction. After bulk inserting of data, VACUUM is not needed, only ANALYZE, so that it updates the query planner for that table/database. Running VACUUM ANALYZE won’t hurt, but it will just take longer, which makes a big difference on big tables/databases.

        Thanks for the article. I didn’t know that you can make a specialized GIN index which will only work for some operators.

        Reply
  3. dwilkins
    dwilkins says:
    April 6, 2015 at 6:48 pm

    Marco,
    Thanks for putting this information together. I’ve used the json/jsonb functionality a bit and it’s really nice. Accessing the `similar_ids` array is a little more challenging. I started out doing this:

    SELECT review #>> ‘{product,title}’ AS title
    FROM reviews
    WHERE exists(select 1 from jsonb_array_elements(“reviews”.review#>'{product,similar_ids}’) where value = ‘”0910627312″‘);

    which works for both json / jsonb by changing the name of the function to `json_array_elements`

    I’m going to adopt your syntax (and better use of the index) with this as soon as I can get my production db upgraded to 9.4.x:

    SELECT review #>> ‘{product,title}’ AS title
    FROM reviews
    WHERE “reviews”.review#>'{product,similar_ids}’ @> ‘”0910627312″‘;

    Thanks again!
    dwilkins

    Reply
    • Marco Nenciarini
      Marco Nenciarini says:
      April 7, 2015 at 12:48 pm

      Hi Dwilkins,

      the query you pasted will not use the jsonb_path_ops index. To take advantage of it you need to write the query as follows:

      SELECT review #>> ‘{product,title}’ AS title
      FROM reviews
      WHERE review @> ‘{“product”: {“similar_ids”: [“0910627312”]}}’;

      Reply
      • dwilkins
        dwilkins says:
        April 7, 2015 at 1:28 pm

        Thanks Marco – I’m gradually getting more familiar with Postgres – It’s obvious now that I explain each query that mine wasn’t using the index.

        Reply
  4. alexcabezas1
    alexcabezas1 says:
    April 14, 2015 at 6:39 pm

    What is the size limit of JSONB data type?

    Reply
    • Marco Nenciarini
      Marco Nenciarini says:
      April 15, 2015 at 4:37 pm

      It has the same limitations of other TOASTable fields, so it should work with maximum one GB of data. However, in my experience, it’s better to not exceed few hundred megabytes of data.

      Reply
      • alexcabezas1
        alexcabezas1 says:
        April 21, 2015 at 3:47 pm

        Thanks Marco. Very helpful this data.

        Reply
  5. sriram
    sriram says:
    August 18, 2015 at 6:29 am

    really useful article. Thanks a ton marco.

    Reply
  6. vincvan
    vincvan says:
    December 16, 2015 at 11:09 am

    Great Article thanks! Have you any suggestions how I could speed up a jsonb query, when I would like to compare floats which are saved as values in a jsonb column?
    Such as:

    SELECT “products”.* FROM “products”
    WHERE (COALESCE(CAST( nullif(technical_details#>>’{weight,value}’,”) AS FLOAT),0) BETWEEN 500 AND 1500)

    This query takes 30ms in my db. I added a GIN index on the column technical_details, but this did not really help.

    Thanks for your help!

    Reply
  7. Username*
    Username* says:
    August 20, 2016 at 6:18 pm

    In my experience
    review #>> ‘{product,title}’ is slower than review->product->>title

    Reply
  8. mika
    mika says:
    September 6, 2016 at 10:37 pm

    I want to use use jsonb but I want to find out if using gin index or jsonb_path_ops index described in your article reduces the poor performance that occurs when querying jsonb since PostgreSQL doesn’t know how to keep statistics on the values of fields within JSONB columns as described in the article below.

    http://blog.heapanalytics.com/when-to-avoid-jsonb-in-a-postgresql-schema/

    Thanks for your response

    Reply
  9. David
    David says:
    November 9, 2019 at 2:49 pm

    Is there a way to benchmark Postgres JSONb implementation as directly as possible? I mean measuring the serialization/deserialization time for some json documents, as well as measuring the time to access specific values?

    Reply

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Search

Get in touch with us!

Recent Posts

  • Random Data December 3, 2020
  • Webinar: COMMIT Without Fear – The Beauty of CAMO [Follow Up] November 13, 2020
  • Full-text search since PostgreSQL 8.3 November 5, 2020
  • Random numbers November 3, 2020
  • Webinar: Best Practices for Bulk Data Loading in PostgreSQL [Follow Up] November 2, 2020

Featured External Blogs

Tomas Vondra's Blog

Our Bloggers

  • Simon Riggs
  • Alvaro Herrera
  • Andrew Dunstan
  • Craig Ringer
  • Francesco Canovai
  • Gabriele Bartolini
  • Giulio Calacoci
  • Ian Barwick
  • Marco Nenciarini
  • Mark Wong
  • Pavan Deolasee
  • Petr Jelinek
  • Shaun Thomas
  • Tomas Vondra
  • Umair Shahid

PostgreSQL Cloud

2QLovesPG 2UDA 9.6 backup Barman BDR Business Continuity community conference database DBA development devops disaster recovery greenplum Hot Standby JSON JSONB logical replication monitoring OmniDB open source Orange performance PG12 pgbarman pglogical PG Phriday postgres Postgres-BDR postgres-xl PostgreSQL PostgreSQL 9.6 PostgreSQL10 PostgreSQL11 PostgreSQL 11 PostgreSQL 11 New Features postgresql repmgr Recovery replication security sql wal webinar webinars

Support & Services

24/7 Production Support

Developer Support

Remote DBA for PostgreSQL

PostgreSQL Database Monitoring

PostgreSQL Health Check

PostgreSQL Performance Tuning

Database Security Audit

Upgrade PostgreSQL

PostgreSQL Migration Assessment

Migrate from Oracle to PostgreSQL

Products

HA Postgres Clusters

Postgres-BDR®

2ndQPostgres

pglogical

repmgr

Barman

Postgres Cloud Manager

SQL Firewall

Postgres-XL

OmniDB

Postgres Installer

2UDA

Postgres Learning Center

Introducing Postgres

Blog

Webinars

Books

Videos

Training

Case Studies

Events

About Us

About 2ndQuadrant

What does 2ndQuadrant Mean?

News

Careers 

Team Profile

© 2ndQuadrant Ltd. All rights reserved. | Privacy Policy
  • Twitter
  • LinkedIn
  • Facebook
  • Youtube
  • Mail
NoSQL with PostgreSQL 9.4 and JSONB What’s new about Barman 1.4.0?
Scroll to top
×