Hinting at PostgreSQL
This week’s flame war on the pgsql-performance list once again revolves around the fact that PostgreSQL doesn’t have the traditional hint syntax available in other databases. There are a mix of technical and pragmatic reasons behind why that is:
- Introducing hints is a common source of later problems, because fixing a query place once in a special case isn’t a very robust approach. As your data set grows, and possibly changes distribution as well, the idea you hinted toward when it was small can become an increasingly bad idea.
- Adding a useful hint interface would complicate the optimizer code, which is difficult enough to maintain as it is. Part of the reason PostgreSQL works as well as it does running queries is because feel-good code (“we can check off hinting on our vendor comparison feature list!”) that doesn’t actually pay for itself, in terms of making the database better enough to justify its continued maintenance, is rejected by policy. If it doesn’t work, it won’t get added. And when evaluated objectively, hints are on average a problem rather than a solution.
- The sort of problems that hints work can be optimizer bugs. The PostgreSQL community responds to true bugs in the optimizer faster than anyone else in the industry. Ask around and you don’t have to meet many PostgreSQL users before finding one who has reported a bug and watched it get fixed by the next day.
Now, the main completely valid response to finding out hints are missing, normally from DBAs who are used to them, is “well how do I handle an optimizer bug when I do run into it?” Like all tech work nowadays, there’s usually huge pressure to get the quickest possible fix when a bad query problem pops up.
If PostgreSQL didn’t have some ways to deal with that situation, there would be no serious production PostgreSQL databases. The difference is that the things you adjust in this database are more rooted in influencing the decisions the optimizer already makes in a fairly subtle way, rather than just you telling it what to do. These are hints in the literal sense of the word, they just don’t have the some user interface for hinting that users of other databases new to PostgreSQL go looking for.
With that in mind, let’s take a look at what you can do in PostgreSQL to work around bad query plans and optimizer bugs, particularly the things that many people seem to think can only be solved with hints:
- join_collapse_limit: This adjusts how much flexibility the optimizer has to reorder joins of multiple tables. Normally it tries every possible combination when joins can be rearranged (which is most of the time, unless you’re using an outer join). Lowering join_collapse_limit, perhaps even to 1, removes some or all of this flexibility. With it set to 1, you’ll get the joins in the order you wrote them in, period. Planning large numbers of joins is one of the hardest things for the optimizer to do; each join magnifies errors in estimates, and increases query planning time. If the underlying nature of your data makes it obvious what order joins should happen, and you don’t expect that to ever change, once you figure out the right order you can lock it down using this parameter.
- random_page_cost: Defaulting to 4.0, this parameter sets how expensive seeking to disk to find a random page on disk is, relative to a reference value of 1.0. Now, in reality, if you measure the ratio of random to sequential I/O on regular hard drives, you’ll find this number is closer to 50. So why 4.0? First, because it’s worked out better than larger values in community testing. Second, in many cases index data in particular will be cached in memory, making the effective cost of reading those values lower. If, for example, your index is 90% cached in RAM, that means that 10% of the time you’ll do the operation that is 50X as expensive; that would make your effective random_page_cost about 5. This sort of real-world situation is why the default makes sense where it’s at. I normally see popular indexes get >95% cache in memory. If your index is actually much more likely than that to all be in RAM, reducing random_page_cost all the way down to just above 1.0 can be a reasonable choice, to reflect that it’s no more expensive than any other read. At the same time, random seeks on a really busy systems can be much more expensive than the expectation you’d have from just looking at single-user simulations. I’ve had to set random_page_cost as high as 60 to get the database to stop using indexes when the planner was mis-estimating how expensive they would be. Typically that situation comes from a sensitivity estimate error on the planner’s part–if you are scanning more than about 20% of a table, the planner knows using a Sequential Scan will be much more efficient than an Index Scan. The ugly situation where I had to force that behavior to happen much earlier than that came about when the planner expected 1% of the rows to be returned, but it was actually closer to 15%.
- work_mem: Adjusts how much memory is available to queries doing sorting, hashing, and similar memory-based operations. This is only a rough guideline to queries, not a hard limit, and a single client can end up using multiples of work_mem when running a query. Accordingly, you need to be careful not to set this value too high in the postgresql.conf file. What you can do instead, though, it set it before running a query that really benefits from having extra memory to hold sorting or hash data. You can sometimes find these queries from logging slow ones using log_min_duration_statement. You can also find them by turning on log_temp_files, which will log every time work_mem is too small, and therefore sorting operations spill to disk instead of happening in memory.
- OFFSET 0: PostgreSQL will rearrange subqueries into the form of a join, so it can then use the regular join order logic to optimize it. In some cases, that decision can be a really bad one, as the sort of thing people tend to write as subqueries seem a bit more difficult to estimate for some reason (I say that based on the number of such troublesome queries I see). One sneaky trick you can do to prevent this logic is put OFFSET 0 onto the end of the subquery. This doesn’t change the results any, but inserting the type of Limit query node used to execute OFFSET will prevent rearrangement. The subquery will then always execute in the way most people expect it to–as its own isolated query node.
- enable_seqscan, enable_indexscan, enable_bitmapscan: Turning off one of these features for looking up rows in a table is a fairly large hammer to highly recommend avoiding that type of scan (not always preventing it–if there’s no way to execute your plan but a seqscan, you’ll get a seqscan even if the parameters is turned off). The main thing I recommend these for is not to fix queries, it’s to experiement with EXPLAIN and see just why the other type of scan was preferred.
- enable_nestloop, enable_hashjoin, enable_mergejoin: If you suspect your problem is the type of join being used rather than how the tables are being read, try turning off the type you are seeing in your plan using one of these parameters, then run EXPLAIN again. Errors in sensitivity estimates can easily make a join seem more or less efficient than it really is. And, again, seeing how the plan changes with the current join method disabled can be very informative for why it decided on that one in the first place.
- enable_hashagg, enable_material: These features are relatively new to PostgreSQL. Using Hash Aggregation aggressively was introduced in version 8.4, and more aggressive materialization in 9.0. If you are seeing those types of nodes in your EXPLAIN
output and they seem to be doing something wrong, because this code is so much newer it’s a bit more likely to have a limitation or bug than some of the older features. If you had a plan that worked fine in older versions of PostgreSQL, but uses one of these node types and seems to perform much worse as a result, disabling these features can sometimes return you to the earlier behavior–as well as shine some light on why the optimizer did the wrong thing as useful feedback. Note that this is generally the way more advanced features tend to get introduced into PostgreSQL: with the option to turn it off for troubleshooting purposes, if there proves to be a plan regression relative to how earlier versions executed things.
- cursor_tuple_fraction: If you don’t intend to read all of the rows back from a query, you should be using a cursor to implement that. In that case, the optimizer tries to prioritize whether it gives you the first row back quickly, or whether it prefers to optimize the whole query, based on this parameter. By default the database assumes that you’ll be reading 10% of the query back again when you use a cursor. Adjusting this parameter lets you bias it toward expecting you to read less or more than that.
All of these parameters and query tweaks should be consider triage adjustments. You don’t want to run with these in place forever (except perhaps for join_collapse_limit). You use them to get out of a jam, and then hopefully you’ll figure out what the real underlying cause of the bad plan–bad statistics, optimizer limitation/bug, or something else–and then address the problem from that direction. The more you are pushing optimizer behavior in a direction, the more exposed you are to future changes in your data making that push no longer a correct one. If you use them right, as a way to study why you got the wrong plan (the approach I used in the query optimization chapter of PostgreSQL 9.0 High Performance), the way you hint at things in PostgreSQL should result in you leaving every run-in with bad optimizer behavior a little more saavy about how to avoid that class of problem in the future
Given that many people quickly assume the optimizer is at fault (rather than their own bad queries and/or indexing) the PostgreSQL party line seems completely justified. Realistically if you could not work around using different indexing or one of the tricks you or Jeff Davis suggested, most users can open up the code again even though this is a last resort. (Really bad query performance tends to show up Monday morning after a weekend upgrade.) Query hints are such an unmitigated evil in applications that this is a fair trade-off for any organization I know of. Cheers, Robert
Why the F&#% Doesn’t Postgres Have Hints?!?!
Once again someone has brought up the idea of having a hints system in Postgres, which means once again we’re all subjected to watching people trod out the same tired, faulty, and even self-contradictory reasons to try to justify the idea that Postgres do
Hints are very helpful because they enable us to deal with one off situations without much pain. All other major databases have hints. Postgresql is the only database that doesn’t. Being cynical and experienced DBA, my explanation for that is that the absence of hints sells consulting hours. I simply stopped using PostgreSQL and am warning everybody about the absence of an expected application tuning feature. Optimizer hints are not “bad”, they are necessary. If a database doesn’t have it, I will simply not use it, period.
Other DBMSs have hints. They being commercial concerns, it would seem the suspicion should be of hints (consequence of faulty optimisers) helping selling consulting hours. PostgreSQL wants to solve the optimiser issues, so refuses to patch over them with hints.
Even if you accept the team’s assertions at face value, there is a difference between “wants to” and “has the time to”.
Some issues, such as poor usage of Gin indexes for full-text search (the planner prefers a filter instead, which is hideously expensive due to @@/ts_match_vq and toasting) have come up repeatedly for years and remain unresolved. In this case, it is possible to trick the planner into using the query by adding ‘offset 0’ on the query, but if you could just say “use this index!” it might not be necessary.
Do have to agree with MGogalia
@MGogala That makes no sense. PostgreSQL is an open source tool, with the body responsible for it not even selling support or even training services as far as I’m aware. The way you sell support dollars is by making your system so impossible to use that no one can be effective at using it without spending money on you. The best industry example of that would have to be Oracle, which hides tons of resources behind a paid subscription service and is so difficult to figure out that the classes and certifications that cost thousands of dollars might even seem worth it. PostgreSQL, on the other hand, provides an incredibly intuitive database that does what you would expect it to do in the vast majority of cases as well as some of the best documentation I’ve seen for ANY software. They also provide an entirely free user mailing list where you can pose your own questions, with major developers who know the system inside and out (Tom Lane comes to mind) actually participating! For FREE! If PostgreSQL is trying to sell consulting dollars, they’re doing the worst job ever. I’ve managed to teach myself quite a bit about using a database effectively just by reading around about PostgreSQL on blogs, the mail lists, and StackOverflow. No training or consulting required.
This particular feature is missing for a reason: a quick, dirty solution becomes a pain point (read: costs dollars) later down the line. That’s pretty sound reasoning. Instead, PostgreSQL gives you tools to figure out the real problem. If you don’t have time to track down the real problem, you can perform a quick diagnosis using the options presented here, and 99% of the time, the diagnosing process will produce a quick hack to fix the problem in the short term until you can find the real cause. You should be trying to diagnose, anyway, not just using short term hacks as your go-to answer.
If you don’t want to use PostgreSQL, that’s your choice, but it’s your loss, not ours.