Future of Postgres-XL
You probably know that Postgres-XL is a distributed database based on PostgreSQL. A few days ago we pushed the XL 9.6 code into the public git repository. Additional details about the new stuff available in Postgres-XL 9.6 are available here.
The topic of this blog post is quite different, though. I’d like to discuss some changes to the project management and development practices, and why (and how) we plan to tweak them.
At first sight, the XL community may not seem particularly active, especially if you only look at the number of code contributors or the traffic on the mailing lists. We know this is not entirely accurate, as we get a lot of off-list interest from customers and developers building exciting stuff on Postgres-XL. But it also shows that perhaps we could improve this side of the project, to make it easier to contribute code or provide feedback.
We also know there are quite a few Postgres-XL forks. We don’t expect people to stop working on them and move back to XL; some forks address use cases that are not the primary aim of XL. But perhaps those forks might benefit from upstreaming some of the generic improvements (e.g. bugfixes or some of the boring infrastructure bits), lowering the maintenance burden and reducing merge conflicts.
Obviously, this is a long term goal and there is not one particular thing that would make it happen. So feel free to propose other changes, or point out additional annoyances that keep you from contributing to XL.
Growing the community
One of the goals of these changes is to grow the XL community and make it more active. That means not only getting more messages on the mailing lists, more downloads, or more bug reports (or whatever metric you pick). It also means sharing control of the project with a wider community, for example by granting commit rights to experienced contributors.
It’s not a question of “if” but “when.” We don’t have an exact schedule or deadlines for adding committers, but my estimate is that it’ll happen sooner rather than later.
Keep XL close to PostgreSQL
One of the reasons why we don’t want to adopt a more complete (and complex) development platform is that we want to keep Postgres-XL as close to PostgreSQL as possible, both in terms of code and development practices. And PostgreSQL uses a very simple process, based on sending patches to a mailing list. That is easy to follow and also serves as a simple “audit trail.”
So we do not plan to move the development to GitHub or GitLab, but there’s nothing preventing you from embracing those technologies while working on XL, as long as the final patches get sent to the mailing list. We’re using GitHub internally, for example.
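For readers who have not used a patch-based workflow before, here is a minimal sketch of turning local commits into patch files that can be mailed to a list. It runs in a throwaway repository, and the branch name, file contents, and commit messages are made up for illustration; they are not taken from the XL repository, and this is not an official contribution guide.

```shell
set -eu

# Throwaway repository standing in for a local clone (hypothetical setup).
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

# A base commit, playing the role of the upstream branch.
echo "base" > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# Do the actual work on a topic branch.
git checkout -q -b fix-typo
echo "fix" >> README
git commit -q -am "Fix typo in README"

# Write one patch file per commit made since the base commit.
git format-patch -o ../patches "$base"
```

The resulting files in the patches directory can be attached to an email, or sent with git send-email if you have it configured. Where the branch itself lives while you work on it (GitHub, GitLab, or only your laptop) does not matter for this step.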
Move off SourceForge
A long time ago, SourceForge was a great place to host open source projects. But nowadays the site seems to be pretty much in maintenance-only mode, and it has faced various controversies over bundling adware with downloads, etc. It’s time to move on.
Luckily, we don’t need that much: a project website, a git repository, and a few mailing lists. The first two items, the website and the git repository, are already hosted off SourceForge.
So we only need to do something about the mailing lists, which we can easily host on http://www.postgres-xl.org (and we can even import the current archives, so that we don’t lose the history).
The plan is to make this change sometime next week. If you’re subscribed to any of the mailing lists, you’ll be automatically subscribed to the new ones, and you’ll receive a message with all the details. The main change will be the domain, which moves from @lists.sourceforge.net to @lists.postgres-xl.org.
It’s awesome to see the work you all put into XL. We appreciate that you are working on it, and keeping up to date with mainline for the most part. It’s easy to fork, and forget ever updating. It’s not so easy to fork and keep up to date.
I am happy someone keeps working on XL.
This all sounds like very good directions! A small tip, though. I’m not sure if I am misreading this, as I’m not following XL closely, so if I am then ignore the comment 🙂 But this:
A few days ago we pushed the XL 9.6 code into the public git repository
Is a bad indicator. That sounds like the development did not happen in public, but was instead dropped as a batch of commits once already done. If so, I suggest that’s the first thing you should change. However, if it just means that “until then it was just a patch living on the mailing lists” then it’s obviously fine; the part that’s potentially bad is if there is an internal, non-public git repo where development was done. That is a great way of making sure that outside contributors don’t feel part of the effort.
The efforts are of course much appreciated and very nice to have regardless of that, but a structure like that would make the even more important effort of growing the community so much harder.
Anyway, I hope I’m reading that wrong, but given that it is what showed up in my RSS reader as the intro to the post, it had me worried.
That said, now off to try to find some time to test the new functionality 🙂
You are reading it both right and wrong. Let me explain.
You’re right that we pushed most of the 9.6 merge at once, and that it lived in a private repository until that point. I’m not particularly happy about that either, and I agree it’s something we need to change. And we’re taking steps in that direction.
However, let me explain why it was done that way. It’s not because we want to keep the code private and gain some dubious advantage by doing that, but more about the challenges of maintaining a large fork. Immediately after the “git merge” from upstream the repository is utterly broken, because of merge conflicts. Initially it does not even compile, then after a lot of whacking it compiles but initdb promptly crashes with ten different segfaults, then initdb passes but everything else segfaults, etc.
I’d argue that’s not something you want to push to the master branch, because it’ll immediately break everyone else’s patches, and fixing those issues is a mostly serial effort. Admittedly, we could have pushed the code sooner, once it got into a mostly-working state.
We’re looking for ways to improve this for the next large merge, but I’m not sure we can eliminate that entirely – it’s simply a part of XL being a fork. If you have a better idea how to do this (say, some smart git workflow), we’d love to hear it.