Those darn Large Objects

September 20, 2016 / in PostgreSQL / by Martín Marqués

Intro

PostgreSQL gives developers the choice between two storage facilities for large binary data: bytea and Large Objects.

Large Objects have been around for a long time, and PostgreSQL has a smart way of storing large binary data: it splits it into chunks of LOBLKSIZE (a fourth of BLCKSZ), so that the tuples in pg_largeobject don't spill into the TOAST table.
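As a quick illustration, here is a minimal sketch, assuming the default BLCKSZ of 8192 bytes (which makes LOBLKSIZE 2048 bytes) and PostgreSQL 9.4 or later for lo_from_bytea; note that reading pg_largeobject directly requires superuser privileges:

-- Create a 5000-byte large object; the server splits it
-- into LOBLKSIZE-sized chunks inside pg_largeobject.
SELECT lo_from_bytea(0, decode(repeat('ab', 5000), 'hex'));

-- Inspect the chunks: expect rows of 2048, 2048 and 904 bytes.
SELECT loid, pageno, octet_length(data) AS chunk_bytes
       FROM pg_largeobject
       ORDER BY loid, pageno;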

Bytea, on the other hand, stores the binary data directly in the tuple, which may lead to poor performance depending on how your schema looks.

This sounds great if you have an intelligent interface for manipulating these binary files, especially if updates modify just a small portion of the whole binary file.

But normally we don't bother writing code that takes advantage of this, and instead we rewrite the whole binary data.
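For completeness, here is a sketch of the kind of partial write that large objects make cheap; lo_put has been available as a server-side function since PostgreSQL 9.4, and the OID below is hypothetical:

-- Overwrite 4 bytes at offset 1024 of large object 16393.
-- Only the touched 2KB chunk(s) of pg_largeobject get rewritten,
-- rather than the whole value, as would happen with a bytea column.
SELECT lo_put(16393, 1024, '\xdeadbeef');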

One of the things that I believe makes people adopt large objects is the set of functions available for importing and exporting files directly between the database server and its filesystem. There's a con to this: if the application lives on a different server, you'll need more code to move the file to the location where it's needed.
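For reference, the server-side import/export functions look like this; they read and write the database server's filesystem (hence requiring superuser privileges), and the paths below are hypothetical:

-- Import a file from the server's filesystem;
-- returns the OID of the new large object.
SELECT lo_import('/var/data/report.pdf');

-- Export large object 16393 to a file on the server's filesystem.
SELECT lo_export(16393, '/tmp/report_copy.pdf');

psql also provides the client-side \lo_import and \lo_export meta-commands, which move the data over the client connection instead.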

A problem you might face

Over the past few days I had to examine a database used to store information about user sessions from a Java CAS system. I found there were almost 100 million large objects in the database, none of them very big.

I went over the user tables checking for columns of type oid, and then cross-referenced the values in those columns with the pg_largeobject_metadata table. I found that 96% of those large objects were orphans: large objects not referenced by any tuple in the user tables.
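A minimal sketch of the first step, listing user-table columns of type oid:

-- Find all user-table columns whose declared type is oid.
SELECT n.nspname AS schema_name, c.relname AS table_name, a.attname AS column_name
       FROM pg_attribute a
       JOIN pg_class c ON c.oid = a.attrelid
       JOIN pg_namespace n ON n.oid = c.relnamespace
       WHERE a.atttypid = 'oid'::regtype
         AND a.attnum > 0
         AND c.relkind = 'r'
         AND n.nspname NOT IN ('pg_catalog', 'information_schema');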

Further investigation concluded that Hibernate didn't purge the large objects it created when deleting or updating tuples with oid fields. It was thus generating a great amount of bloat which could not be cleaned up by vacuuming, but had to be purged from the pg_largeobject table manually.

In the particular case of the CAS database, this query served to identify the large objects still in use:

SELECT unnest(array[expiration_policy,
                    authentication,
                    services_granted_access_to])
       FROM public.ticketgrantingticket
UNION
SELECT unnest(array[expiration_policy, 
                    service])
       FROM public.serviceticket;

That query can then be used to exclude the still-referenced large objects from the list of those to remove. Something like this:

SELECT lo_unlink(pg_largeobject_metadata.oid)
       FROM pg_largeobject_metadata
       WHERE pg_largeobject_metadata.oid NOT IN (
             SELECT unnest(array[expiration_policy,
                                 authentication,
                                 services_granted_access_to])
             FROM public.ticketgrantingticket
             UNION
             SELECT unnest(array[expiration_policy, 
                                 service])
             FROM public.serviceticket
);

Conclusion

Large objects have their issues, just like any other way of storing large binary data. It's up to the developers and database administrators to take advantage of the pros and mitigate the cons.

We gave a possible query to perform the clean-up, but there's also a nice extension which cleans up orphaned large objects with triggers: Large Object Manager.
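If that extension follows the same pattern as the contrib lo module, usage looks roughly like this (the table and column names here are hypothetical):

-- The lo type plus a trigger that unlinks the old large object
-- whenever the referencing row is updated or deleted.
CREATE EXTENSION lo;

CREATE TABLE image (title text, raster lo);

CREATE TRIGGER t_raster
       BEFORE UPDATE OR DELETE ON image
       FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster);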

Some people might prefer running a purge query during quiet hours instead of executing a trigger on every UPDATE and DELETE. On systems with a very, very low UPDATE and/or DELETE rate, a trigger on each table that has an oid field seems the more elegant solution, and any performance loss from having to execute the trigger function would be negligible.

In any case, large objects still have great fans, most likely because of the internal functions provided to import and export binary data directly to the local filesystem. With bytea you'd normally use more memory at the application tier: it's a very common procedure to read the binary field completely into a variable and then process it.

In a future blog post, I might write about the bytea-based approach I used in one of my past projects.

3 replies
  1. Greg says:
    September 23, 2016 at 8:33 am

    We had a case of developers using Hibernate where large objects were created because the developers wanted CLOB support, as that is what they had in MySQL. It turned out that the combination of the developers and Hibernate chose to create the large objects and reference them with their own integer id instead of an OID. This caused every large object to be orphaned at creation. However, there was no indication of anything wrong, as the developers were able to reference the large objects using these ids.

    Eventually, we were seeing serious bloat. A vacuumlo run ended up deleting all of the large objects, which told us something was wrong. In the end, altering the table to make the large object a text field instead fixed the issue.

    In the end, it’s really important that developers understand how PostgreSQL handles large objects when using them, especially when they get them from a framework like Hibernate.

  2. Guillermo says:
    September 8, 2017 at 2:25 pm

    Martín, I think I understand the procedure you mention for cleaning up the largeobject table, but there are many catalog tables that have the oid type, or am I mistaken? In that case, would I have to take into account that many rows in largeobject may hold catalog data, and not necessarily data from other schemas?

    • Martín Marqués says:
      September 19, 2017 at 3:02 pm

      Hi Guillermo,
      I think you're confusing the hidden oid *column* of the catalog tables with the oid *data type* used by Large Objects.
      That said, the safest and most elegant way to clean orphaned files out of the pg_largeobject table is to use vacuumlo.

      Good luck,

