Oracle to PostgreSQL – Binary Objects
Kirk Roybal

July 2, 2020

Binary Large OBject (BLOB) storage is a topic that comes up in almost every conversation about Oracle to PostgreSQL migration. I will discuss it here in enough detail to inform architectural choices for readers coming from an Oracle perspective.

A basic understanding of the data type is available in the PostgreSQL documentation. A somewhat more detailed discussion is also in the documentation. I suggest you read those articles lightly, and come back here for further discussion.

Ready? Ok, let’s continue.

Those articles touch on why the binaries are placed in the database in the first place. In addition to the argument that binaries may need atomicity with the rest of the data, I’d also like to suggest some practical reasons.

  1. They get backed up with appropriate transaction integrity.
  2. They get transported with physical streaming replication.
  3. They do not require a separate connection stream to the caller.

So, that’s settled then, right? We’ll just stuff all the binaries into the database.

Whoa, hoss.

Here are a few reasons why that may not be such a great idea.

  1. PostgreSQL does not actually UPDATE rows in place. Because of its concurrency semantics (MVCC), it writes a new row containing the new data and marks it as the current version. Whoopsie, you just doubled your storage requirement for blobs.
  2. PostgreSQL can’t really do any rational SQL things with the blob other than store and retrieve it. None of the operators, castings, aggregations… well, you get the idea. INSERT and SELECT are all you get, and UPDATE is effectively limited to wholesale replacement (a minimal sketch follows this list).
  3. Blobs can have a terrible impact on garbage collection operations (VACUUM).
  4. Blobs have the highest potential to make you unexpectedly run out of space on a volume, which is insanely critical for PostgreSQL.
  5. Blob storage functions and libraries stink a little bit like 1965.
  6. Blobs make other operations on the row run slowly, even if they don’t have anything to do with the blob data. This is mostly due to how PostgreSQL likes to ferry things around in 8k blocks from memory to disk to memory.
  7. Blobs don’t cache, and will discourage row caching.
  8. Blobs don’t participate in Logical replication.
  9. Blobs make the backups slow.
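
To make point 2 concrete, here is a minimal sketch of roughly all the SQL you can usefully aim at a bytea column. The table and column names (documents, content) are hypothetical, invented for illustration:

    -- Hypothetical table: about the only useful things to do with a
    -- bytea column are store it and fetch it back.
    CREATE TABLE documents (
        id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        filename text NOT NULL,
        content  bytea
    );

    -- Store a binary value (a hex literal here; a bind parameter in practice).
    INSERT INTO documents (filename, content)
    VALUES ('hello.bin', '\x48656c6c6f');

    -- Retrieve it. Checking the length is roughly the extent of
    -- server-side manipulation that makes sense on opaque binary data.
    SELECT filename, octet_length(content) FROM documents WHERE id = 1;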

Ok, so by now you’ve decided that blobs are horrendous, and should never be used for anything.

Hang on there, pardner.

These arguments basically boil down to the intellectual vs. the practical. In an ideal world, we would just store everything in PostgreSQL, and the world would be at peace again. Selah.

But the truth is that there are physical limitations on how this storage should be handled to provide the expected experience. Let me give you a few more suggestions than the official answers provide.

  1. Store blobs in an associative table, with a referential id pointing back to the corresponding row data. Include the additional columns recommended in the article linked above. Put the blob column last in this table. Do this even if it is a 1:1 cardinality table (see the sketch after this list). Just do it. Don’t argue with me or yourself. You’ll thank me later.
  2. In fact, always put blob and clob columns dead last in any table, ever.
  3. Don’t update the blob unless it actually changes. PostgreSQL will obediently duplicate the row due to the concurrency semantics, and is not intelligent enough to turn it into a “no-op” update; the sketch after this list shows one way to guard against this.
  4. Provide a huge amount of temp space, and put it on an amazingly fast NVMe LVM drive array or SAN. Put the Write Ahead Log (WAL) files on this storage as well. Make it big. Did you wince when you opened your wallet? No? Make it bigger.
  5. Increase your work_mem setting on a per-connection basis to help prevent thrashing blobs to disk.
  6. Increase the amount of RAM until you can cover work_mem multiplied by the number of connections you plan to support.
  7. Write your application to do separate calls to the database for the row data and the blob data, which can then be loaded lazily. It doesn’t really help, but it makes people feel like it helped, because they get a little instant gratification.
  8. Be realistic about whether the atomicity of the blob data is truly critical in a practical way for your application.
  9. Use XFS, ZFS, or a similar filesystem that handles large files well.
  10. Keep hoping that Moore’s Law holds out for about 2 or 3 more iterations. With enough hardware, a lot of expensive things become trivial.
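
Here is the sketch promised in points 1 and 3 above: an associative side table that keeps the blob out of the main row, with the bytea column dead last, plus a guarded UPDATE that skips the rewrite when nothing changed. This is a minimal illustration, and every name in it (invoices, invoice_scans, and so on) is hypothetical:

    -- Row data in one table; the blob in an associative side table
    -- keyed by the row it belongs to.
    CREATE TABLE invoices (
        id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        issued date NOT NULL
    );

    CREATE TABLE invoice_scans (
        invoice_id bigint PRIMARY KEY REFERENCES invoices (id),
        mime_type  text NOT NULL,
        uploaded   timestamptz NOT NULL DEFAULT now(),
        scan       bytea       -- the blob goes last, even in a 1:1 table
    );

    -- Guarded update: only rewrite the row (and re-TOAST the blob)
    -- when the content actually changed.
    PREPARE replace_scan (bytea, bigint) AS
        UPDATE invoice_scans
           SET scan = $1
         WHERE invoice_id = $2
           AND scan IS DISTINCT FROM $1;

With IS DISTINCT FROM in the WHERE clause, a call that passes the same bytes again matches zero rows, so PostgreSQL never creates the duplicate row version.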

Now, with all of these mitigating strategies in place, a moderate amount of blob retention can be handled well by PostgreSQL. This requires that you think like the trickster, able to hold opposing points of view in your own head at the same time without going crazy.

The alternative is another kind of database. We call it a file system, and it’s been around for a very long time. PostgreSQL has get() and put() style functions for direct file operations. You could use those in a transaction to manipulate the binary files with transactional integrity. Some (a lot of) assembly required, batteries not included. The files don’t participate in the backup unless you customize it. May I suggest Barman?
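
As one illustration of the read side, here is a hedged sketch that pulls a server-side file into the hypothetical invoice_scans table from the earlier sketch, atomically with the rest of a transaction. pg_read_binary_file() is a real server function, but it requires superuser privileges or membership in the pg_read_server_files role, and the path here is invented. The write direction (getting a bytea back out to an arbitrary file) has no comparable built-in, which is where the assembly comes in:

    -- Read a file from the server's filesystem into a bytea column,
    -- inside a transaction (assumes invoice 42 already exists).
    BEGIN;
    INSERT INTO invoice_scans (invoice_id, mime_type, scan)
    VALUES (42, 'application/pdf',
            pg_read_binary_file('/srv/scans/invoice-42.pdf'));
    COMMIT;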

I hope this puts a few more ideas into your thinking about how to handle binary data in PostgreSQL effectively. This is one of those areas where having a database directly tied to a piece of hardware would help in a practical sense. I’m confident that Moore’s Law will catch up, and this article will eventually become stale. Somebody please wake me up when that happens, and give me my evening whiskey.

Tags: oracle migration
6 replies
  1. Isaac Morland says:
    July 2, 2020 at 4:41 pm

    Can you be more specific about the get() and put() functions? I know about pg_read_binary_file() for reading a server file into a bytea but as far as I know there is no way to write an arbitrary bytea out as a file on the server.

    • Kirk Roybal says:
      July 2, 2020 at 10:40 pm

      Probably the easiest way to go about it is the file_fdw foreign data wrapper extension. There’s also COPY FROM/TO file. The harder way to do it would be to create your own extension, where you can do anything you want.

  2. Hans says:
    July 2, 2020 at 9:56 pm

    > Blobs don’t participate in Logical replication.

    The manual states that only “large objects” (OID) are excluded from logical replication, but it does not mention that bytea is not replicated. So the claim that “blobs don’t participate” isn’t true.

    • Kirk Roybal says:
      July 2, 2020 at 10:39 pm

      Correct. I should have been more specific. Thank you for the edit.

    • Peter Eisentraut says:
      July 16, 2020 at 9:18 pm

      When people talk about “blobs” in PostgreSQL, they usually mean “large objects”.

      The terminology is a bit ambiguous without context. But your understanding of the capabilities is correct.

  3. Kirk Roybal says:
    July 6, 2020 at 8:57 pm

    It has been called to my attention that a couple of points about TOAST storage that were made in this article are not true.

    https://www.postgresql.org/message-id/[email protected]

    They have been removed to protect the guilty.

