Last week at the CHAR(10) conference we had a workshop on “Cloud Databases”. To put it simply: what to do when the use case requirements exceed the resources available in the database server.
This was a main topic of the whole conference, and several solutions have been illustrated during the day. A common theme has been that no solution fits all the use cases, and that each solution comes with its cost; hence you have to choose the solution that your use case can afford.
Another common (albeit implicit) point has been the focus on “high-level” solutions, that is: connecting several database servers at a higher level to emulate a single server with larger resources.
An obvious advantage is that you don’t need to alter the well-scrutinised PostgreSQL code; a drawback is that using multiple database servers with their independent timelines you are losing some useful properties. Two examples: the partial loss of transactional semantics generates conflicts; pre-parsing each query outside the database introduces limitations on the accepted queries.
The discussion was quite interesting, and when Dimitri Fontaine mentioned remote tablespaces I started wondering around a related but distinct idea, namely: whether a lower-level approach to the problem of resource pooling would really be impractical. Before I could elaborate on the details the workshop ended, and I could only sketch the idea to some of the people that were around the whiteboard (among which Gabriele Bartolini, Nic Ferrier, Marko Kreen, Hannu Krosing, Greg Smith) together with the basic questions “does it look feasible?” and “does that resemble something you already know?”.
A brief sketch: an application stack can be represented in this way
(application) --> (connection) --> (db server) --> (resources)
where the resources used by the database include storage, RAM and CPUs. The purpose is to allow the application to command more resources in order to increase capacity and speed. “Clever” applications that manage several databases can be represented as
(application) --> (connection) --> (db server) --> (resources) | +---------> (connection) --> (db server) --> (resources)
while “connection pooling” solutions can be represented as
(application) --> (connection) --> (db server) --> (resources) | +---------> (db server) --> (resources)
by “lower-level” solutions I mean something like
(application) --> (connection) --> (db server) --> (resources) | +---------> (resources)
which might resemble something familiar, but it is not what I am proposing here. To explain the difference I can increase the detail and write
(resources) = (virtual resources) --> (physical resources)
to represent the fact that at the lowest level you can have a non-trivial mapping between physical objects and virtual ones. For instance, SAN storage or RAID striping can provide larger virtual disks by joining together smaller physical disks. Such cases could be pictured as
(application) --> (connection) --> (db server) --> (virt.res.) --> (ph.res.) | +--------> (ph.res.)
My proposal is to pool resources at the database server level, so that we can have a more efficient “virtualisation” by using the knowledge of the specific use cases for each resource (CPU, RAM, disk), and at the same time we can avoid may of the difficulties of the transactional paradigm. The picture would be:
(application) --> (connection) --> (db server) --> (virt.res.) --> (ph.res.) | +--------> (virt.res.) --> (ph.res.)
The advantage is that we don’t need to manage all the possible use cases for each virtual resource; we just have to manage (and optimise for) the use cases that are actually needed by PostgreSQL. For instance: WAL should still be written in local “unvirtualised” storage, the bgwriter will access local and remote resources (RAM and disk), etc.
Some final words about reliability. To operate properly the whole system needs each subsystem; partial failures are not managed, because this architecture is not redundant. It is a distributed system, but not shared. If this architecture could provide cheap and simple scalability via a virtual database server which is functionally equivalent to a physical server with larger resources, then high availability could be obtained in the standard way by setting up two identical virtual servers in a Hot Standby configuration.
Network quality has a large impact on the overall performance; this design might be useful only if you have an array of machines in the same LAN, not only for speed reasons but also because a network failure would actually be a system failure. Even with these restrictions, my opinion is that having this option would be quite useful.
This is still a sketch, to be used as a reference for further discussion. Next possible steps:
- to make a detailed list of the resource use cases
- to decide which technologies can help best in each use case
- to estimate the actual performance/development costs