PostgreSQL 9.1 Features
This article takes a look at the features from the recent 9.1 release of PostgreSQL. For a broader overview of PostgreSQL features and history, look at “PostgreSQL - The story so far”.
The release of PostgreSQL 9.1 in September 2011 was groundbreaking for the PostgreSQL community, the PostgreSQL developers in general and 2ndQuadrant in particular. 9.1 saw the introduction of a number of world first features, showing the quality of both innovation and implementation in PostgreSQL.
Most notably, 9.1 introduced Synchronous Replication.
Designed and developed by 2ndQuadrant CTO Simon Riggs, Synchronous Replication addresses a problem quite different to the problems that Asynchronous Streaming Replication addressed in 9.0, but does so in a way that similarly builds upon Simon’s earlier work on Hot Standby. Synchronous Replication offers the ability to confirm that all changes made by a transaction have been transferred to one synchronous standby server, extending the standard level of durability offered by a transaction commit. Durability, the ‘D’ in ACID (an acronym for a set of concepts critical to reliable transaction processing, and widely followed by database systems), refers to the requirement that committed transactions will survive permanently, despite, for example, a loss of power to the server immediately after the commit.
"Heroku runs the largest PostgreSQL database-as-a-service in the world"..."The release of synchronous data replication with 9.1 provides our customers with innovative new ways of protecting mission-critical data, and validates PostgreSQL as one of the fastest-moving datastores available."
James Lindenbaum, Heroku co-founder.
Traditionally, Synchronous Replication as a broad concept has addressed use-cases where there was both a requirement for an extremely high level of durability, and where the performance penalty could be easily paid; financial trading systems, that often use expensive low-latency switches, are a classic example.
Synchronous Replication, by its very nature, implies a performance penalty as compared to Asynchronous Replication; the master must wait for remote confirmation of the transaction having committed. This property makes Synchronous Replication extremely sensitive to increasing latency over long distances simply because of the fundamental limit of the speed of light. In a basic implementation, Performance and Durability would be traded off against each other.
In a world first, PostgreSQL 9.1 allows the application using Synchronous Replication to do so selectively; individual users, connections, or even transactions can dynamically select their own level of durability. They can do so in deference to the actual value of the data to the business. For example, critical customer details or financial transactions can have the full level of super-durability offered by Synchronous Replication. Less critical data, such as instant messaging archives, can avoid paying the innate performance penalty of Synchronous Replication where that doesn’t make business sense.
When one considers that typically, only a small but critical selection of database transactions actually require this super-durability, it becomes apparent just how valuable this incredible innovation is. With PostgreSQL 9.1, Synchronous Replication has broken out of its traditional confines as an all-or-nothing high-availability solution for a select few that can afford to throw money at the latency problem.
Synchoronous Replication requires the user to nominate a prioritised list of one or more synchronous standbys; in the event of the current synchronous standby failing, the next in the list is promoted to the role of synchronous standby. In this way, the cluster maintains its durability guarantees robustly and reliably.
Another huge feature in PostgreSQL 9.1 is the addition of Extension support by 2ndQuadrant’s Dimitri Fontaine. Extensions offer users the ability to install a third party extension with a single SQL statement, and to uninstall or otherwise manage that extension with the same ease. Extensions are analogous to a package, and represent a set of functionality that solves a particular problem, for example, the installation of a new datatype that represents ISBNs, or a set of common cryptographic functions.
While PostgreSQL has always been highly-extensible, and while that has already been exploited in a myriad of innovative ways, it was not previously possible to manage the resulting database objects as a logical whole. The newly-developed capability to do just that will form the basis of the PGXN project, a community-led effort to create a central repository of PostgreSQL extension libraries. PGXN is anticipated to provide the PostgreSQL community with facilities that are similar to those CPAN provides the Perl community: a purpose-built automated software installer with a central, authoritative repository.
PostgreSQL 9.1 sees the continuation of 2ndQuadrant consultant Gregory Smith’s incremental improvements to write performance and server configurability, informed by vast experience in optimising for write-heavy workloads. Greg worked on the removal of duplicate fysnc() requests (requests to the operating system to flush dirtied buffers to permanent storage, in order to satisfy the aforementioned durability requirement), greatly improving performance under heavy write loads. Greg also simplified the configuration of the server, with the addition of a simple though highly efficacious heuristic for automatically setting the value of wal_buffers.
Greg displays an article he wrote for PostgreSQL magazine, while attending the 2011 PostgreSQL developer's meeting (Picture: O. Bartunov)
The introduction of an SQL/MED (Management of External Data) implementation in PostgreSQL 9.1, a part of the SQL standard that described how databases should interact with data stored outside the database, has many useful applications. It allows the user to define “Foreign Tables” - tables that can be queried like any other, but in fact represent arbitrary external sources of data such as CSV files, tables in some other RDBMS, or even a Twitter feed.
True serialisability, another world first, will be introduced in PostgreSQL 9.1. As of and from that release, the SQL standard SERIALIZABLE isolation level (the highest level available) in PostgreSQL will behave serially according to the strict mathematical definition: there will be apparent serial execution of statements. In particular, the “leaks” in the illusion of serial execution that could previously occur at that isolation level and still may occur in other MVCC databases like Oracle® Database, in the form of write-skew anomalies are no longer seen. Thanks to the work of Kevin Grittner, database administrator for the Wisconsin State Courts, MIT graduate student Dan Ports, and the progressive research of Michael James Cahill, the PostgreSQL implementation accomplishes this feat remarkably efficiently, without causing additional blocking or deadlocks in transactions as compared to the previous SERIALIZABLE isolation level implementation.
WITH queries, or queries with Common Table Expressions, are enhanced in a way not seen in or anticipated by any other RDBMS. WITH queries are a useful, standard SQL construct through which it is possible to define intermediary tables that do not persists beyond a single statement, to break up a complicated query into multiple parts. They can be used to perform sophisticated computations on tree structures through the use of recursion, and have found many novel applications.
2ndQuadrant’s Marko Tikkaja has enhanced WITH queries to be writeable too, by introducing Writeable CTEs. This enables various useful patterns, such as moving records from one table to another within a single, atomic statement/snapshot. Techniques like these have found wide applicability, as for example in partition management, where swathes of records must be moved to a new child table.
