Why PostgreSQL is better than MySQL
Someone recently tweeted about the fantastic news that MySQL fixed a bug.
Now in my world, bugs get fixed quickly and well. Bugs happen and they need to be fixed. It never occurred to me that we should ever tweet or blog about the fixing of a bug. I guess I assume it’s just quality: bugs get fixed, no drama – people depend upon us to do just that so that the (literally) millions of PostgreSQL servers out there run well. That’s our job and I’m happy and proud to do that job alongside my colleagues at 2ndQuadrant and my colleagues in other PostgreSQL companies and in the wider community.
So the bug in question was “number 199”… check this out
http://lefred.be/content/bye-bye-bug-199/
There’s always been a big argument in the PostgreSQL community about whether we need a bug tracker or not. Obviously, if you fix bugs, why track them? And if you don’t fix bugs, why track them? Hmmm, not sure whether that’s a hugely rational argument, but let’s look at the MySQL bug tracker for bug 199
http://bugs.mysql.com/bug.php?id=199
Yes, that bug was filed 14.5 years ago. And now it’s fixed.
Yay?
Yay? 14.5 years, guys! That’s long enough for my pre-school children to have gone to school, grown up and started university. OMG – or, if I were crueler, LMFAO.
Maybe it’s wrong to start a flame war on this, but I need to say, at a personal level, that this is exactly the reason why I have spent years contributing to PostgreSQL and never once contributed to MySQL.
PostgreSQL 10 just came out, if you didn’t hear. I’m certain it will contain bugs, but not that many. I’m also equally certain that we will fix those bugs and that you can rely on the quality and robustness of PostgreSQL.
If you use PostgreSQL in the cloud or from a services company, make sure you ask them what they have contributed to PostgreSQL, what their capability is to fix bugs, and under what SLA. The answer to that question is what keeps open source well funded, both for these latest developments and for future ones.
If you’re interested in learning more about the robustness and security of PostgreSQL, specifically as it applies to enterprise usage, we will be covering that and more at the 2ndQuadrant PostgreSQL Conference in New York (Nov. 7) and Chicago (Nov. 9). The full schedule of topics can be found here. Or, if you’re interested in getting more hands-on, we are offering trainings in New York on Nov. 6 in the following areas: PostgreSQL Database Security, PostgreSQL-BDR, and PostgreSQL-XL 10.
Sounds a little self-serving, no?
Sure – bugs happen. No, they shouldn’t exist for 14+ years before being fixed. MySQL (through various owners over time) is still, more or less, an open-source project. Obviously, that bug wasn’t considered “bad” enough to worry about (IOW – there were higher-priority issues for well over a decade that kept pushing that one down the list).
But it *sounds* like you’re saying, “we don’t track bugs” – *and* that tracking bugs adds to a reason for someone to dislike a product.
Tracking bugs highlights that there are some. Having bugs and not tracking them would be much worse. But clearly fixing bugs is the best option and then you don’t have as much to track, just a commit log.
On the point that it was not a serious bug, did you read the nature of the bug?
—
Description:
Innodb autoincrement counter is lost on restart which is very confusing as
after server restart server starts to get different auto_increment values than
if it was not restarted.
This can possibly result in replication breakage.
—
Then, feel free to read the numerous data integrity issues users report in the bug comments over the next 14 years. The fact that it wasn’t a high enough priority to be addressed in a reasonable time is one of myriad reasons I would never choose MySQL.
It does not result in replication problems, but it can result in problems from the combination of a crash and poor data design – e.g. the case of having a mirrored table, say an “archive” table, that you shuffle rows to from another “main” table. After a crash you may end up with the main table’s primary key counter being lower than the highest primary key in the archive table, preventing you from shuffling further rows over to the archive table.
Arguably, fixing a 14-year-old bug is something to shout about. Does this imply that ALL of the higher-priority bugs that kept this one at the bottom of the list have also been quietly fixed?
For my part and as a product owner, I’d have closed it off after a few years. Fixing it is likely to impact all the workarounds that have been implemented in the interim.
“Fixing it is likely to impact all the workarounds that have been implemented in the interim” – that nicely sums up everything that is wrong with current software development, across languages and applications. It is the reason you still get no reliable character count in JavaScript by using `str.length`. You can now do `Array.from(str).length`, which has the added value of introducing un-obvious, subtle semantic differences into constructs that shouldn’t show them. In principle, Python does it right: there, `7/2` used to give `3` because of integer division, which was widely regarded as a mistake. For some intermediate versions, you could say `from __future__ import division` to make `7/2` equal to `7.0/2`; now the latter is the standard behavior. Some say it’s nasty having to introduce switches into your source, and yes, that can cause subtle bugs when copy-pasting. But isn’t that much better than to mandate that any system anywhere has to keep propagating problematic behavior just because old code might break otherwise?
Great! Now when can we expect ‘select count (*)’ to be faster than O(n) in PostgreSQL? I’m even willing to lock the whole table to make things easy. Give me some way to get an exact count of rows in a table at some time under some set of conditions without having to scan the table. Please.
The speed of SELECT count(*) on PostgreSQL is not a bug; it’s a conscious choice for the benefit of most users, given that people who actually want to do that in production are rare. PostgreSQL could choose to keep such a count for fast access, but that would require us to effectively unpick the performance and concurrency advantages of MVCC that most users want. If you want an exact count, just add a trigger that does +1 on INSERT and -1 on DELETE and you’ll soon understand why we don’t do that by default. PostgreSQL also supports approximate methods of determining the count for various use cases.
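As a sketch of that trigger approach (table and function names here are illustrative, not from the post), plus the approximate alternative:

```sql
-- Hypothetical single-row counter for a tracked table "items".
CREATE TABLE items_count (n bigint NOT NULL);
INSERT INTO items_count VALUES (0);

CREATE FUNCTION items_count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE items_count SET n = n + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE items_count SET n = n - 1;
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_count_maint
AFTER INSERT OR DELETE ON items
FOR EACH ROW EXECUTE PROCEDURE items_count_trig();

-- Approximate count, maintained by VACUUM/ANALYZE, no table scan:
SELECT reltuples::bigint AS approx FROM pg_class WHERE relname = 'items';
```

Note that every writing transaction must update the same single counter row, so all writers queue behind its row lock – which is exactly the serialization problem discussed below.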
Care to give a pointer on why that trigger is so bad for performance? It does look like it should be a very fast operation.
Sure. Anything that serializes access to a table constrains performance. The trigger shown will cause the total value to be locked until commit, causing all other concurrent sessions to wait – so, just one writer at a time. PostgreSQL goes to very great lengths to ensure scalability, so we avoid such things; the default is always multiple concurrent users able to perform reads and/or writes. Application developers frequently serialize things explicitly and then wonder why things perform poorly.
In addition to the locking issues, you’ll also generate some bloat (dead rows) in your counter table.
There’s definitely a time and place for doing it, but it’s not going to be to the net benefit of most users most of the time. Maybe it’d be nice if PostgreSQL had a table option like track_row_counts that automatically did the required locking and counter maintenance, but I suspect it’d be hard to make a convincing argument for it.
There are various fancy ways to optimise this with a counter that contains all known-committed rows, plus extra count-adjustment rows maintained by each transaction that’s modifying the table. You sum the counts table to see the current row count. This can be done in a more concurrency friendly way, but requires periodic cleanup to “roll up” committed count adjustments into the main counter. Perhaps we could, again, do this in core PostgreSQL with help from autovacuum and autoanalyse. But I find it telling that nobody seems to have found this to be enough of a problem to dedicate themselves to solving it.
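A minimal sketch of that adjustment-row scheme (all names are illustrative):

```sql
-- One "base" row holding the rolled-up count, plus per-transaction deltas.
CREATE TABLE items_count (
    is_base boolean NOT NULL,
    delta   bigint  NOT NULL
);
INSERT INTO items_count VALUES (true, 0);

-- Each writing transaction INSERTs its own delta row instead of updating a
-- shared row, so concurrent writers don't block each other:
INSERT INTO items_count VALUES (false, 3);   -- e.g. after inserting 3 rows
INSERT INTO items_count VALUES (false, -1);  -- e.g. after deleting 1 row

-- Current count = base + all committed adjustments:
SELECT sum(delta) FROM items_count;

-- Periodic roll-up from a maintenance job: fold deltas into the base row.
WITH gone AS (
    DELETE FROM items_count WHERE NOT is_base RETURNING delta
)
UPDATE items_count
SET delta = delta + (SELECT coalesce(sum(delta), 0) FROM gone)
WHERE is_base;
```

The trade-off is visible in the query: reading the count now means summing a small table rather than reading one value, and the roll-up job plays the part described above for autovacuum/autoanalyse.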
There are a number of other techniques for improving aggregation over large data sets. For example, if you can do append-only workloads, you can store the count at a particular point and roll forward from there. This is nice because you can pre-aggregate a bunch of specific conditions and adjust from there. Also this avoids both locking and bloat issues but adds a lot of complexity to queries.
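For an append-only table, the roll-forward idea might look like this (illustrative schema, assuming a monotonically increasing `id`):

```sql
-- Periodically record a checkpoint: the count as of some id watermark.
CREATE TABLE events_count_snap (
    max_id bigint NOT NULL,
    n      bigint NOT NULL
);

-- At query time, take the latest snapshot and roll forward, scanning only
-- the rows that arrived after it (cheap with an index on events.id):
SELECT s.n + (SELECT count(*) FROM events e WHERE e.id > s.max_id) AS total
FROM events_count_snap s
ORDER BY s.max_id DESC
LIMIT 1;
```

Because nothing behind the watermark ever changes, the snapshot needs no locking and generates no bloat, at the cost of the extra subquery in every counting query.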
So COUNT(*) (without WHERE clause, of course) is now fast on InnoDB…? Faster than Postgres?
I would definitely like a way to get a quick total count, but that hasn’t been instant since old-school MySQL (MyISAM). So a trigger it is, I guess, in the cases where it’s desired…
If the table has a primary key it’s always instant with InnoDB, and always has been. The difference is that MyISAM stores the row count in the table metadata and can thus tally the number of rows instantly, whether or not the table has indices. InnoDB is still faster than Postgres in this respect, regardless of indices.
What’s the purpose of this post? To prove that Postgres is superior to MySQL? This doesn’t sound convincing when coming from a Postgres developer or vendor! I started using Postgres 3 years ago after many years of experience with MySQL, and indeed it’s a better and more robust database overall, but if you want to be objective you can also find cases where MySQL is better than Postgres. If you are picking on auto-increment, then for so many years Postgres’s AI implementation was convoluted, via sequence objects – it’s only now, in Postgres 10, that we finally have a simple and sane AI implementation, as it should be. I know it’s not as serious as this bug in MySQL, but still a nuisance.
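The PostgreSQL 10 syntax the commenter is referring to, next to the older serial shorthand (table names are illustrative):

```sql
-- Old style: serial is shorthand for an implicitly created sequence.
CREATE TABLE t_old (id serial PRIMARY KEY);

-- PostgreSQL 10: SQL-standard identity columns.
CREATE TABLE t_new (id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY);

INSERT INTO t_new DEFAULT VALUES;  -- id is assigned automatically
```

Identity columns behave like MySQL’s AUTO_INCREMENT at the syntax level, while still being backed by a sequence underneath.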
MySQL has superior support for all kinds of character collations – something that is slowly beginning to come to Postgres in version 10 and we still need to wait for future releases to catch up.
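The collation support arriving in version 10 is ICU-based. A sketch of what it looks like (requires a server built with ICU; the collation and table names are illustrative):

```sql
-- Define a collation backed by ICU rather than the OS C library:
CREATE COLLATION german_icu (provider = icu, locale = 'de-DE');

-- Use it per-expression, independent of the database's default collation:
SELECT name FROM towns ORDER BY name COLLATE german_icu;
```

Because the collation comes from ICU, sort order no longer silently changes when the operating system’s locale data is updated.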
MySQL is easier to use for simple projects and that’s why it’s the most popular one because most web sites need a simple database. At certain times this is an advantage and whenever I need to code a simple web site then I choose MySQL over Postgres.
But to me the weakest element of Postgres is its provided administration tools. For people who love the command line, psql is definitely great, but for others PgAdmin is a very poor piece of software. Just compare what you get from MySQL: MySQL Workbench, which apart from standard administrative stuff can be used to design databases graphically using diagrams! In comparison, Postgres’ offering is really discouraging, especially now that PgAdmin 4 is the new but very awkward piece of software, written by a team which is apparently not very competent in writing desktop applications. I know this is a separate tool from the main database, but it is still bundled with Postgres and is an (anti-)advertisement to newcomers of what Postgres is capable of.
What I would prefer to read here, instead of articles nitpicking single MySQL bugs to prove how Postgres is better, is how Postgres devs look for areas where MySQL is better and try to improve Postgres in those areas to make it as good or even better. And in recent years this kind of development has really been happening, with so many great features, and I really appreciate that. But this kind of criticism of other databases is totally useless, especially as it’s clearly biased, one-sided and lacking any objectivity.
How is MySQL easier to use? I have never understood that claim.
@Dianne Skoll > you know, easier to use when you have no idea what you’re doing. That kind of easier to use: implicit conversions, tons of must-have yet missing features (partial indexes, full outer joins, deferrable integrity constraints (does the word “constraint” even have a meaning for MySQL?), JSON document indexing, materialized views, …). MySQL is the PHP of RDBMSs. You can type anything and it will execute; therefore, it’s easier to use. “We don’t fix bugs because there are no bugs: every input is valid, you just typed the wrong input.” Still, everybody uses it for no good reason.
Many thanks to PostgreSQL guys for maintaining such a stable and fully-featured DBMS and keeping it FLOSS. Let’s be honest though, even though Postgres is a great software, it also has some flaws:
– The upgrade issue (pg_upgrade is not good enough and pglogical is great, but there should be something like that built-in)
– Slow aggregation (I know that’s due to the MVCC model, but that’s still slower than most competitors; there should be some way around this)
– Replica limitations (the hot stand-by stuff is just horrible, but also a consequence of the MVCC model)
– Inability to add NOT NULL constraints and default values on big tables without incurring major downtime (the “CHECK” workaround isn’t sufficient)
– Generally, Postgres is not good at “hot” stuff. I think Uber summed everything up in this post last year: https://eng.uber.com/mysql-migration/ . Not every point should be taken at face value, but there are definitely some lessons that should have been learned from it.
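For context, the “CHECK workaround” mentioned in the list is roughly this two-step pattern (table and constraint names are illustrative): add the constraint as NOT VALID, which takes only a brief lock, then validate it separately under a weaker lock:

```sql
-- Step 1: brief lock; existing rows are not checked yet.
ALTER TABLE big_table
    ADD CONSTRAINT email_not_null CHECK (email IS NOT NULL) NOT VALID;

-- Step 2: full table scan, but concurrent reads and writes continue.
ALTER TABLE big_table VALIDATE CONSTRAINT email_not_null;
```

The commenter’s complaint is that even after validation this is still a CHECK constraint, not a true NOT NULL column constraint, and a similar long-lock problem applied to adding columns with defaults on big tables at the time.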
I suppose you can’t understand it until you’ve actually made serious use of both MySQL/MariaDB and Postgres. For one, MySQL/MariaDB benefits from notably simpler and more easy-going syntax for basic SQL, which makes up 95% of all SQL. It also has several handy syntactical features that Postgres does not.
Please give examples of these handy syntactical features? How many of those are missing from MySQL Compatibility library? https://github.com/2ndQuadrant/mysqlcompat
MySQL used to be easier than PostgreSQL. I remember when I was overjoyed at the fact that I could stop using MySQL for prototyping because PostgreSQL got ALTER TABLE DROP COLUMN.
However, the fact is, at a certain point whatever database is easier is the one you are most used to. These days when I try to do anything in MySQL I find myself going up against the limits of the features they support. Doing anything advanced with PostgreSQL is something I find far easier than MySQL but I cannot rule out the fact that it may be just that I am more used to PostgreSQL at this point.
On the administration tools, here I kind of agree and kind of disagree. It is certainly true that commercial companies do a better job of providing polished GUIs than open source projects do. But I have generally found that I would rather have solid backend software than pretty tools. The command line tools are *extremely* powerful (they can even output tables in LaTeX format if you like), and if you need a graphical design tool, there’s always Microsoft Access as a design and front-end tool (use it as a linked server), as well as a bunch of more DBA-oriented commercial tools out there.
You are right about the features. I find MySQL easier for simple databases. Part of its simplicity comes from its smaller feature set. Simple web sites and the most popular PHP CMS systems use MySQL as a dumb data store – they don’t even use FKs! For this kind of usage MySQL is easier, especially for novices. But certainly, once you get past a certain point of requirements and need for features, Postgres becomes the easier one, because MySQL will require ugly workarounds (if possible at all) for things native to Postgres.
As for the graphical admin tools, I somewhat agree with you that a solid backend is more important. However, I feel that Postgres could win over some of the less experienced MySQL users if a convenient admin tool were provided. Also, an architecture aimed at easy deployment in shared hosting (like being able to CREATE EXTENSION without admin rights, etc.). Not that Postgres must have those things, but they would attract new users and in effect strengthen the community.
I am curious to find out what application places so much importance on counting the number of rows in a table. Its usefulness is highly overrated.
Paging?
Ravi, I do periodic updates of a database that comes from a 3rd party. This involves several million updates, inserts and deletes every few weeks. The 3rd party provides a list of expected values after the update has completed. As such, the importance of an exact row count is paramount. Because of the way we replicate (using ZFS snapshots), we have also found it beneficial to set tables to unlogged, turn off autovacuuming and even (shudder) the WAL. Speed is of the essence, and once we know, via the counts, that everything is accounted for, we replicate the data to DBs where we don’t care about counts, but we do care about WAL logging, MVCC, etc.
“Displaying 100 results out of 103556”
Now the main case here is not just a count(*) from table, but a count(*) from table where certain conditions are met. I am not sure MySQL has PostgreSQL beat in that case.
I love postgres as much as the next guy but this post was self serving as shit. What was the point of this really?
Have some respect man.
The point of this is to explain how the roots and mindset of PostgreSQL project differ substantially to MySQL, and how that leads to PostgreSQL being a better product. If by “self-serving” you mean “pro-PostgreSQL” then yes, it is. As far as self-serving, no I don’t see anything specific to me personally or to my company, the article is purely about PostgreSQL and why it is better.
Fixing bug #199 was indeed long overdue, but attempts to use it as a pretext for yet another “Why PostgreSQL is better than MySQL” post look a little self-serving, as others have already mentioned.
Let me give you an example. PostgreSQL 10 is out with great new features. Let’s go over those that have been available in MySQL for many years:
– logical replication. In MySQL it was introduced in version 3.23, so in this regard PostgreSQL is about 17 years late to the party.
– native partitioning. First appeared in MySQL 5.1, released in 2008. Again, 9 years before PostgreSQL.
– Quorum Commit. The same functionality is known as “semi-synchronous replication” in MySQL and has been available since MySQL 5.5, released in 2010.
– OS-independent collations. MySQL has always used its own portable library; PostgreSQL 10 can finally not depend on OS libraries, but hey, it’s 2017 already.
Also, how many years did it take the PostgreSQL community to make hash indexes crash safe?
Most importantly, why don’t we see Oracle, Percona or MariaDB trying to make a marketing story out of the above? Maybe because the MySQL community is free of the fanboyism and “little brother” complex plaguing the PostgreSQL community?
Thanks for coming forward with some detailed arguments.
I think your point about hash indexes is probably the most likely to strike home. However, I was discussing long term bugs, not long term feature differences. Hash indexes were clearly labelled as not crash safe and the restriction was well known.
Logical replication using triggers has been available for many years as an add-on package, so PostgreSQL was not at all late to the party. The new logical replication is a) in-core, b) substantially faster than other methods, c) not statement-based. In MySQL, replication is statement-based, so it suffers from a significant number of production issues that PostgreSQL has tried hard to avoid. I won’t link to the MySQL docs here, but they illustrate the issues I mention only briefly here.
Quorum commit is new, but the equivalent of semi-synchronous replication was released 6 years ago.
Partitioning functionality, not full syntax, was available in PostgreSQL in 2007, 1 year before MySQL.
PostgreSQL does things correctly; I can see how that would really annoy people that are fans of other products.
OK, we start drowning in details, as usually happens (trigger-based replication is not really an equivalent of built-in replication from the performance and maintenance perspective; synchronous_standby_names was not an equivalent of semi-synchronous replication until PostgreSQL 10; partitioning even in PostgreSQL 10 is still inferior to MySQL’s, let alone older functionality; row-based replication has been available since MySQL 5.1), but that’s not my main point.
One could find many reasons to write “Why MySQL is better than PostgreSQL” kinds of posts, with even more detailed arguments than are mentioned in this one. With the sole exception of Uber, nobody even cares to try. So who’s really annoyed?
The main point of the blog was to highlight the attitude to quality that PostgreSQL has, which nobody has argued against. I’m sure it would be easy to bury that point in detail, especially if we disagree on it, as we clearly do.
For those interested in Uber, you may want to review this also https://blog.2ndquadrant.com/thoughts-on-ubers-list-of-postgres-limitations/ since it debunks many misconceptions.
There is one thing that I think has to be mentioned here about why PostgreSQL is awesome in terms of quality: the codebase. The codebase is extremely readable and has very well-thought-out solutions for almost every major issue one runs into with C, from garbage collection to data structures. The server effectively provides a great server-programming framework, and I am surprised that many other event-loop server programs don’t copy in utility parts of the source code.
Work with the code for a while and it is clear the level of professionalism is very very high.
or to sum it up… “Cause I say so”