Heads in the cloud at CHAR(10)

Whether or not you made it to our CHAR(10) conference last month, you can now relive part of the experience by downloading the conference slides. Some of those were posted live during the conference and some showed up later, but almost everything is there now. Sadly, Nic Ferrier’s entertaining presentation about how WooMe (acquired by Zoosk) was scaled up using Londiste and Django wasn’t available in a form we could easily replay. For that one, you certainly did have to be there, in more ways than one.
The two talks I found the most informative were the updates on the state of pgpool-II and pgmemcache. Both of those tools have that slightly frustrating combination of being really useful and a bit underdocumented relative to how complicated they are (in English at least!), so getting additional insight into them from the people actually working on the code was great.
Markus’s discussion of MVCC and clustering also had a fun twist to it. His talk ended with a performance analysis comparing his Postgres-R against pgpool-II, Postgres-XC, and PostgreSQL 9 using Streaming Replication plus Hot Standby, each set up in a cluster configuration to accelerate dbt2 test results. I don’t quite agree with his premise that network congestion is the most vital factor in a cluster because “overall computing power, memory and storage capacity scale easily” (that’s not always true), but it was satisfying to see that the PG9 HS/SR pairing is efficient in that regard.
The conference set aside two sessions to discuss general clustering topics in a relatively unstructured way. The more heated of the two covered what would make PostgreSQL deployments on cloud computing infrastructure easier to deal with. That stirred up enough ideas to generate two blog entries from my coworkers already.
One of the ideas from that session I found particularly interesting was this: if you have a deployment where nodes are added in the “elastic” way people like to discuss in relation to the cloud concept, there’s currently a manageability gap in making it easy for applications to talk to that node set. If you put pgpool-II or pgBouncer between your application and the set of nodes, you can already abstract away some of the details of what’s behind them. But now you’ve added another layer, and therefore a potential bottleneck, to the whole thing. That’s the opposite of what elastic cloud deployments are supposed to be about: just adding capacity as needed with minimal management work.
One suggested approach was making it easier to build a database routing directory at the application level, so that apps can just ask for the type of node they need and get one to connect to directly. Nodes can register themselves with the directory as they are brought online, and drop out as they are taken down. This has similarities to some components that are already floating around. The directory lookup part you might put into LDAP, and PostgreSQL servers can already announce themselves via ZeroConf, AKA Bonjour. It’s not hard to imagine bolting those two together: an application layer that does LDAP lookups, connected to a routing backend that tracks available nodes via any number of protocols. As usual, the devil’s in the details. Timing out failed nodes, distinguishing between read and write traffic (pgpool-II does it by actually parsing the SQL, which is expensive), and caching the resulting directory broadcasts for high performance while still supporting cache invalidation are all tricky implementation details to get right.
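To make the directory idea a little more concrete, here is a minimal sketch, assuming a single in-process registry rather than a real LDAP or ZeroConf backend. Everything in it (`NodeDirectory`, `register`, `lookup`, the "primary"/"standby" role strings, and the heartbeat TTL) is a hypothetical name chosen for illustration, not an existing PostgreSQL, pgpool-II, or pgBouncer API. Nodes re-register periodically as a heartbeat, entries that miss the TTL are treated as failed, and the application states whether it needs a writable node instead of having its SQL parsed.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class NodeEntry:
    host: str
    port: int
    role: str          # hypothetical roles: "primary" (writable) or "standby" (read-only)
    last_seen: float   # clock value of the most recent registration/heartbeat


class NodeDirectory:
    """Hypothetical in-memory routing directory (not a real PostgreSQL API)."""

    def __init__(self, ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.nodes: Dict[Tuple[str, int], NodeEntry] = {}
        # Bumped on every membership change, so a client holding cached
        # lookups can tell when they may have gone stale.
        self.generation = 0

    def register(self, host: str, port: int, role: str) -> None:
        """Called when a node comes online, then repeatedly as a heartbeat."""
        key = (host, port)
        existing = self.nodes.get(key)
        if existing is None or existing.role != role:
            self.generation += 1
        self.nodes[key] = NodeEntry(host, port, role, self.clock())

    def deregister(self, host: str, port: int) -> None:
        """Called by a node that is being taken down cleanly."""
        if self.nodes.pop((host, port), None) is not None:
            self.generation += 1

    def _expire(self) -> None:
        """Treat nodes whose last heartbeat is older than the TTL as failed."""
        now = self.clock()
        stale = [k for k, n in self.nodes.items() if now - n.last_seen > self.ttl]
        for key in stale:
            del self.nodes[key]
        if stale:
            self.generation += 1

    def current_generation(self) -> int:
        """Membership version after expiring failed nodes."""
        self._expire()
        return self.generation

    def lookup(self, want_write: bool) -> Optional[Tuple[str, int]]:
        """Return a (host, port) the application can connect to directly."""
        self._expire()
        live: List[NodeEntry] = list(self.nodes.values())
        if want_write:
            candidates = [n for n in live if n.role == "primary"]
        else:
            # Prefer standbys for reads; fall back to any live node.
            candidates = [n for n in live if n.role == "standby"] or live
        if not candidates:
            return None
        chosen = random.choice(candidates)
        return (chosen.host, chosen.port)
```

The application only talks to the directory to find a node and then connects to that node directly, so the directory is not in the query path the way a pooling proxy would be. Having the caller declare read versus write intent sidesteps SQL parsing at the cost of trusting the application to ask correctly. The generation counter is one possible hook for cache invalidation; a real deployment would also need the directory itself to be replicated, which this sketch ignores.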
With PostgreSQL 9.0 offering more ways than ever to scale up a database architecture, this problem isn’t going away. I’m not sure yet what form people’s solutions will take, but it’s a common enough problem that it’s worth solving.
