Improving PostgreSQL performance on AWS EC2
Questions periodically come up on the PostgreSQL mailing list regarding Amazon EC2 and how to get PostgreSQL to perform well on it. The general feeling on the list appears to have been that EC2 performs very poorly for database workloads, at least with PostgreSQL, and it doesn’t seem to be taken particularly seriously as a platform. I certainly thought of it as a last resort myself, for when other constraints prevent you from using a proper VPS or real hardware.
I had the chance to meet with a high level AWS support engineer last week. It’s prompted me to write up the basics of configuring EC2 instances for decent PostgreSQL performance. I haven’t had the chance to back the following advice with hard numbers and benchmarks yet, so remember: Always test everything with a simulation of your workload.
Before I can get into the configuration details, I need to outline how EC2 storage works.
EC2 storage types
EC2 instances have two very different storage options. These are explained quite well in the storage documentation, so I won’t repeat the details. It’s sufficient to say that the Instance Store is local to the VM and is unrecoverably lost if the VM is stopped. It is backed by local hard drive or SSD storage. EBS by contrast is durable and isn’t lost when the VM is stopped. It is more like NBD or iSCSI than local storage; it’s a network block device protocol with the corresponding latency and throughput issues that entails.
If you’ve been facing performance issues, you might’ve seen the High I/O instance types and thought “That sounds ideal for my database workload”. I thought so myself – but they won’t actually make any difference if your database storage is on EBS volumes. The High I/O instances have fast instance store storage, but aren’t any different in terms of EBS. So if you’re been using a High I/O instance with a database that’s on EBS volumes you’re not gaining any benefit from the enhanced instance store I/O you’re paying for, and are better off on an EBS-optimized large instance.
For a durable database where you care about your data, what you want instead of a high I/O instance is an EBS Optimized instance, which has guaranteed network bandwidth to the EBS storage servers. Use EBS volumes with provisioned IOPs and, for best results, stripe a group of EBS volumes into a RAID10 array. See increasing EBS performance.
You might also want to consider putting your temporary tablespaces on the instance store, rather than EBS. That way things like on-disk sorts won’t compete with other data for EBS bandwidth, and will get to use probably-faster local storage. (See also tablespaces).
As always, if you care about your data, back it up and use replication and/or PITR.
This is just the basics – there’s a lot more tuning, testing and performance work that can be done from here, not to mention work on HA, automated backup, and other data protection measures. Just using the right instance type will get you to vaguely reasonable performance; you won’t get the most out of your (virtual) iron without a lot more work.
If you have data you can re-generate, or you’re running streaming replicas from a master that you can just re-clone from the master, you don’t necessarily need durable storage.
In this case, consider using the instance store, which is available on every instance type except Micro.
Since you have no hope of recovering the database if the instance terminates or stops anyway, you might as well use
postgresql.conf, saving a lot of I/O and synchronization. This is one of the few situations where turning fsync off is acceptable; never do it for data you expect to preserve, as you’re effectively giving the database permission to corrupt your data if the power fails or the host OS crashes.
Since you’re using the instance store, you can also potentially benefit from one of the high I/O instance types, using an array of SSDs for extremely fast seeks and high IOPS.
Do not use the instance store for data you want to preserve!. Use it only for analytics work on data sets you can re-generate, or for replicas of data where the master lives elsewhere.
A non-durable storage based setup needs extensive testing and is likely to need some different tuning and configuration to a normal PostgreSQL install. Don’t expect amazing performance out of the box, you’ll need to do more than just fire up a High I/O instance and set up a default Pg install on it.
The next step, time permitting, is to quantify the above information with hard numbers. How does a High I/O instance with the DB on instance store perform compared to an EBS optimized x.large with a RAID10 striped set of provisioned I/O EBS volumes? Or the ubiquitous micro instance?
I’d like to look at integrating S3 and Glacier support into barman and repmgr down the track, as it’d be really interesting to have basebackups and WAL automatically stored in S3, archived to Glacier, used to provision new instance-store based replica servers, and more.
I’ll be doing more with EC2 in the coming months, so I expect to be able to offer more insight into future performance issues.
Gotta disagree about not using instance storage for data you care about. If you have one or two servers receiving WALs from the master you are covered in case of the master instance going down. Use something like WAL-E to be safer, which will put the WALs and base backups on S3. https://github.com/heroku/WAL-E
EBS isn’t very reliable (in terms of performance and reliability). Amazon’s last two major outages have screwed people who rely on EBS.
I’ve found instance storage to have much more consistent/steady performance than EBS as well, though I haven’t looked at provisioned IOPs yet.
If you do any testing, would you mind testing how disabling fsync and full_page_writes performs on instance storage as well?
I was interested in the idea of archiving WAL to S3; I’m glad to see that tools to take care of that already exist.
What I’d really like is to have the feature integrated into barman and/or repmgr, so base backups and datadir clones could be done through S3.
As for instance store: Yes, with proper replication and backup I’m sure it’s fine, but it’s really important to get that right. People often provision things shoddily, don’t test their replication and backups, etc. I’m reluctant to recommend instance store to people who don’t know exactly what they’re doing and why.
Joe, I think you’re quite right about instance store. A properly configured setup should be fine with a couple of replica servers in different AZ’s plus good backups. It takes the EBS system out of the possible set of failure causes, too. Nonetheless, new users are probably best staying on EBS for that extra safety net.
Instance store is also massively cheaper than EBS if you want good performance. PIOPS is great, but costs the earth.
The take away from your post seems to be that EC2 is not a good option for PG hosting. If that’s the case, can you point to what vendors you would recommend for high(er) performance PG hosting?
Joe’s comment may be informative to you there: He’s essentially suggesting that EC2 is OK for hosting Pg, you just have to re-think how you use it. If you rely on replication and backups for data protection you can use instance store – and since the instance store vanishes if the host crashes you can use fsync=off and disable full_page_writes, the data can’t be protected even if Pg tries.
This should result in much more acceptable performance, but it requires very careful consideration of your replication, backups, and fail-over. You’ll need to test and monitor the system very carefully to avoid catastrophic data loss, so it’s not an option for basic users.
As for alternative hosting, I’ve done some recent testing on LunaCloud and found their systems’ performance to be immensely greater than that of EC2 for a host of equivalent or lesser price. (The downside is it’s much less featureful in terms of API control, multilevel security, role based access control, etc etc). I’m not recommending them specifically, in that I’m sure many other hosting providers offer options with good I/O performance too. If I were doing an evaluation of the current options a couple of others I’d try (far from an exhaustive list) are Linode and Hetzner.de. Rackspace too, though their prices are very steep. Honestly, you should do your own testing as your requirements almost certainly encompass more than price and performance, so everyone’s different. Most cloud-y hosts offer free trials.
No matter what you do, assume your host can just go away at any time. Replicate to somewhere else and also take regular backups. Test your backups. I wrote a bit about data protection for Pg a while ago on my old blog, just before I joined 2ndQuadrant.
Thanks for your insight Craig.
I’m spending this year reading through your Stackoverflow posts.
I tried with fsync=off and the performance was 10 times better. , but I my db got corrupted twice in a week. Had to turn on fsync and it is super slow again
I am using IOPS
Any notes on Amazon RDS for Postgres? http://aws.amazon.com/rds/postgresql/
There isn’t much you can do to tune RDS, because you don’t get any control at this low level. That’s part of the point – you let Amazon deal with the low level stuff, and in exchange you pay a premium on top of the underlying EC2 instance pricing for a service that theoretically “just works”.
I haven’t done much with RDS, but what I’ve seen so far suggests that you can’t really do much beyond twiddling the I/O cost parameters, setting work_mem per-session, etc.