Oracle to PostgreSQL: Basic Architecture - 2ndQuadrant

Summary

One of the most frustrating parts of an Oracle to PostgreSQL migration is the fact that both products use the same terminology, but with vastly different meanings. It is very easy to get caught up in a terminology misunderstanding and spend quite some time arguing across one another, only to find out that there wasn’t any disagreement at all. This rather strange argument becomes even more confusing as other add-on technologies are involved. Seeking to reproduce the same functionality across the two systems by adding components can make the transition easier, but strangely makes the dialogue harder.

This article seeks to describe the two architectures in somewhat comparative terms. Along with that comes the understanding that the comparison cannot be made exactly, as the architectures are fundamentally different at the implementation level. At least we will seek to alleviate the largest misunderstandings, and possibly give our new novice PostgreSQL DBA some basic images in their head to think with.

Components

The confusion starts nice and early in the process. Let’s talk about “instances”. In Oracle terms, a system consists of a host (hardware/virtual machine) that has a set of database processes providing access to data. This one-to-one-to-one relationship of hardware to processes to databases encourages the Oracle administrator to speak about the instance at 3 levels fairly glibly. This could mean the hardware, the virtual machine, the processes serving the data, or the data itself as if this were all one big happy installation system. There is such a thing as a shared server configuration, but for the sake of this article, we are going to skip that additional complexity for the sake of brevity.

In the PostgreSQL environment, a host (hardware/virtual machine) may have any number of instances (processes serving databases), which may in turn have multiple databases (directories with data). The PostgreSQL DBA must differentiate between the hardware providing the platform, the instances providing service, and the data. This is a one-to-many-to-many-to-many relationship. In rough terms you may look at the relationship thusly:

Component	Oracle Term	PostgreSQL Term
Platform	Instance or database	Host or Virtual Machine
Service	Instance or database	Instance
Repository	Instance or database	Data Directory or Cluster
Storage	Instance or database	Database

And this brings us to our next disambiguation. The word “cluster”. Unfortunately, the word cluster in PostgreSQL is not unique to any given concept in the PostgreSQL architecture. It can mean “data ordered by an index”, “a collection of databases”, “the act of reorganizing the table data according to the index”, and “a collection of services participating in replication”. In this context, we are referring to a collection of databases. This collection of databases starts at a top level directory. In PostgreSQL terms, the “data directory”. Further directories under this top level are fully complete and separate databases. So, we find out that a single master process (commonly referred to as the “postmaster”) serves any number of user data directories “databases”.

Of course the diagram above is an oversimplification of the Oracle vernacular. It is presented in this form to make the point that the common language is used very loosely and that this habit causes PostgreSQL users to constantly interrupt Oracle users for disambiguation.

Services

The next level of the architecture provides for some more confusion. When we look at the service layer (“instance” in both vernaculars), we see that the processes that make up the service delivery are again similar, as they solve similar conceptual problems. Unfortunately, again, they are not the same, as they solve these problems using different implementations.

TASK	Oracle Term	PostgreSQL Term
Apply Changes	Database Writer	Background Writer
Journaling	Log Writer	WAL Writer WAL
Flush to disk	Checkpoint	Checkpointer
Health Check	System Monitor Process Monitor	External to PG
Distributed Recovery	Recoverer	Recovery Mode
Journal Retention	Archiver	Archiver
MVCC Cleanup	Table Shrink*	Autovacuum
Table Compaction	Table Redefinition	VACUUM FULL
Listener	TNS Listener	postgres (Postmaster)
Statistics Refresh	Table Redefinition	Stats Collector ANALYZE
Cluster Communication	None	Replication
Fulfill User requests	Session	session or backend or planner

Online Table Shrink and Online Table Redefinition turned on with automatic space management turned on.
Statistics Refresh invoked after Table Redefinition on Oracle in the same process
Health check and monitoring processes are left to external utilities like Nagios, Zabbix, Cacti, Munin, Data Dog, etc.
“Recovery” is a mode of the Postmaster process on startup that replays journal files to catch up with source systems.
“Session” is a software term that refers to a transaction envelope. “Backend” is a system term for a process, and “planner” is a euphemism for the PostgreSQL query engine. To be a bit more exacting, “Session” is the proper direct translation.

This list of caveats could be infinitely expanded. And these equivalents are not exact by any means. In some cases, the Oracle process may map to several PostgreSQL processes, and vice versa. This diagram should at least serve as a reference for where to get further information on the same general concepts.

The Data

The databases consist of file components that provide the persistence of the data. These file components map roughly like this:

Component	Oracle Term	PostgreSQL Term
semaphore	control files	recovery.signal standby.signal promote.signal
past rows	REDO log	contained in tables
journal archive	directory	user configurable
parameters	parameter file	postgresql.conf pg_hba.conf pg_ident.conf
database	data files	subdirectories of data dir
data	data files	files/dirs in database directory

The only process controlled by file semaphore for PostgreSQL 12+ is crash recovery. recovery.signal will process WAL, standby.signal will allow queries while in crash recovery mode, and promote.signal instructs PostgreSQL to exit recovery mode and proceed to accept write connections.
Again, these are rough equivalents. The concept here is that you will begin to understand where to look for PostgreSQL data on disk.
Within the database, the structure of the data is completely different from the Oracle implementation. Any equivalency at this level becomes misleading at best and is better studied as a separate concept, rather than attempt to cross train.

Tablespaces

A quick word about “tablespaces” here. They do not have anywhere near the significance in PostgreSQL that they do in Oracle. In PostgreSQL, tablespaces are just a glorified symbolic link to a directory. That’s it. The only reason to maintain them in the database is so that PostgreSQL can move the data appropriately across containers. This could be useful to put critical data such as indexes or lookup tables on faster storage, for example.

PostgreSQL does not heavily rely on these tablespaces. The common Oracle technique of preemptively defining tablespaces for each object is not a best practice for the PostgreSQL community.

Cleanup

PostgreSQL does not move old rows out of the table data. The old version of the row is left in the pages and leaves for a background cleanup process to deal with. The cleanup strategy necessitates the “autovacuum” process, which kicks off occasionally and marks the old rows as recoverable space. The reclaimed space is then reused by subsequent write transactions such as INSERT and UPDATE.

This strategy for cleanup leaves PostgreSQL without the basic functionality of the REDO log. Current development is underway to remedy this situation. At the time of this writing, PostgreSQL 12.3 is current, and there is hope that this feature will appear in a coming major release.

Futures

PostgreSQL 13 will ship in the future with a glossary of terms. Hopefully, as this glossary is expanded, many of the misconceptions surrounding architectural misunderstandings can be alleviated.

Reference

There is also a basic description of some of the major processes in an article on the main PostgreSQL website.

2ndQuadrant is now part of EDB

Summary

Components

Services

The Data

Tablespaces

Cleanup

Futures

Reference

Support & Services

Products

Postgres Learning Center

About Us