A Czech version of this interview is also available: Magnus Hagander, President of PostgreSQL Europe.
You’re part of the PostgreSQL Core Team and, at the same time, the president of PostgreSQL Europe. What are your responsibilities in these positions?
So, the president of PostgreSQL Europe is probably the easiest to explain. PostgreSQL Europe is the PostgreSQL user group for Europe. It doesn’t have anything directly to do with the development of PostgreSQL; it’s more about the promotion of PostgreSQL. So we’re there to support local user groups, like the one we have here in the Czech Republic, or in other places like France, Germany, Sweden and so on. We also help to run the pan-European things: PostgreSQL Europe runs PostgreSQL Conference Europe, which takes place every year in a different location. That’s the main thing. We’re a board of five people, and I’m the president of the board that makes those decisions and makes things happen. We try to get other people to do the actual work.
PostgreSQL Conference Europe will be held in Prague on October 23-26.
The PostgreSQL Core Team is the administrative team that runs the project. A lot of people say that whenever things run perfectly, the Core Team does almost nothing, and that’s by design. Again, the Core Team as such has nothing to do with the actual writing of code or anything like that; many of us on the Core Team do write code, but not all of us. The Core Team is there to make decisions in case we can’t reach them through community consensus, and that happens fairly rarely. Also, if a company wants to communicate with the PostgreSQL project without doing so in public, it can contact the Core Team. Everything else in the community, all our development decisions and all that kind of planning, happens on public mailing lists, and a lot of companies don’t like that. They might not want people to know that they’re using PostgreSQL, or that they’re considering using it. So the Core Team has the administrative function and the ability to communicate officially on behalf of the project. Those are the main things.
We have separate teams, and the Core Team tries not to get involved in technical decisions. Of course, the people on the Core Team who are also part of the Development Team will get involved in technical decisions, but not in their capacity as Core Team members.
Besides being on the Core Team, you’re also on the Development Team?
Yes, I am. I do actual coding on PostgreSQL. And, of course, in my day job I’m a PostgreSQL consultant: I do setups, training, support services and that kind of thing. That’s what most of the people on these different teams do; in one way or another, they work with PostgreSQL. But that is, of course, separate from your position in the community.
This is the third time you’re here in Prague for the P2D2 conference, right? Do you have any special relationship with this event?
Yes, I believe this is the third time. And, I don’t know, I like Prague. I do a lot of speaking, and whenever I have the opportunity, I love to go to places like this. It’s also a great conference with a lot of attendees: more than 100 people is very impressive for a one-day local event. And the Czech Republic has a number of prominent, very good developers. I don’t understand the presentations that are in Czech, but I know several of the speakers and what they’re talking about, so I assume the presentations are good and interesting. They like to bring in one or two outside speakers every year, and apparently I did something right, because they invited me back.
You talked about PostgreSQL 8.4 in 2009. How far has PostgreSQL come since then?
It’s come a long way since 8.4; a lot has happened. Every new PostgreSQL release comes with better performance than ever before. I don’t think we’ve ever gone backwards on that front. One of the really big things that has happened is the replication work done by Simon, who just walked by and who gave the keynote this morning. We have streaming replication, we now have synchronous replication, and our synchronous replication has features that not many other databases have, if any at all. You get very fine-grained control over each individual transaction: whether you find it important enough to be synchronous, or whether regular asynchronous replication is OK. And we’re adding even more on top of that in the next version, 9.2.
We’ve also seen, and this is what I personally find exciting, a number of things going into PostgreSQL 9.0 and particularly 9.1 where PostgreSQL is really leading the industry, adding newly researched functionality that nobody else has. We have things like serializable snapshot isolation, which is brand new. I couldn’t even begin to explain the math behind it, but it works, and it’s really good: it gives you full, true serializability while keeping the very scalable MVCC system that we have.
As far as we know, we are the first database of any kind to ship with that. There is also KNN-GiST, which is location-based or distance-based indexing. You can do really rapid lookups for queries of the kind ‘I have something, give me the fifty objects that are closest to it’. The typical and most obvious use case is, of course, geographical. Everybody has a GPS in their phone today, every query that generates is ‘I’m right here, give me the closest things’, and we’ve got an index method optimised for delivering exactly that. But it can be used for a lot of other things as well, such as text processing: it can give you the words that are closest to what the user tried to search for, and that kind of thing. I believe Microsoft has been working on this and that it will be in the next release of SQL Server, but again, we’re back in the position where we’re leading the pack when it comes to new database theory and features that don’t exist elsewhere. Basic streaming replication, on the other hand, was the ‘catch up’ phase: others had log shipping, and streaming-based log shipping, before us. There are still features that other databases have, that Oracle has, that we’re still working on, but we’re also… When PostgreSQL was first created, it was a research database; that was its focus. Now we’re seeing work that comes out of modern database research go into actual production. I personally find that very exciting.
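The kind of question KNN-GiST answers can be illustrated with a brute-force version. The Python sketch below models only the semantics of the query, not the index: it computes the distance from a reference point to every object and keeps the k smallest, which is exactly the full scan a KNN-GiST index lets PostgreSQL avoid. The k_nearest helper, the sample places and the use of flat Euclidean distance are all hypothetical choices made for illustration.

```python
import heapq
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def k_nearest(objects: Iterable[Tuple[str, Point]], origin: Point, k: int) -> List[str]:
    """Return the names of the k objects closest to origin, nearest first.

    Brute force: the distance to every object is computed. A KNN-GiST index
    answers the same question by walking the index in distance order instead.
    """
    ox, oy = origin

    def distance(item: Tuple[str, Point]) -> float:
        _, (x, y) = item
        return math.hypot(x - ox, y - oy)  # flat Euclidean distance (an assumption)

    return [name for name, _ in heapq.nsmallest(k, objects, key=distance)]


if __name__ == "__main__":
    places = [("cafe", (1.0, 1.0)), ("park", (5.0, 5.0)),
              ("shop", (0.0, 2.0)), ("museum", (10.0, 0.0))]
    # "I'm right here at (0, 0), give me the two closest things."
    print(k_nearest(places, (0.0, 0.0), 2))  # ['cafe', 'shop']
```

In PostgreSQL 9.1 and later, the same question is typically written as an ORDER BY on a distance operator such as <-> (for example ORDER BY location <-> point '(0,0)' LIMIT 2) over a GiST-indexed point column; the column name location is hypothetical.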
Are you coming next year?
It’s a little too early to say. As a matter of principle I’d like to, but those guys have to want me to come back, and it has to fit the schedule.
Many projects use the MySQL database and the terrible MyISAM engine. What would be your advice for migrating these to PostgreSQL?
The fact is that in a lot of cases it’s easier than you’d think. The thing is, if you want to run on both, it’s a bit more difficult, because they do things a bit differently. You’ve really got two separate issues: one is MySQL itself, and the other is MyISAM. If you go from MyISAM to InnoDB, there are things that go really badly, and those are exactly the same things that can go badly if you move to PostgreSQL. But you can fix those things. For example, don’t do SELECT COUNT(*) on a huge table: that’s a problem on PostgreSQL, and it’s a problem on InnoDB. It’s not a problem on MyISAM; that’s probably the one thing it’s actually good at. Once you’ve solved those problems, you’ve got the query-layer issues.
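Why an unfiltered COUNT(*) is cheap on MyISAM but not on an MVCC engine can be sketched roughly as follows. MyISAM keeps an exact row count for each table, so it can answer that query from the stored counter. Under MVCC, two transactions running at the same moment may legitimately see different numbers of rows, so there is no single counter to read: every row version has to be checked against the caller’s snapshot. The Python model below is a deliberate simplification, not PostgreSQL’s actual visibility rules: a snapshot is reduced to the set of transaction IDs it treats as committed, and the RowVersion, visible and mvcc_count names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class RowVersion:
    xmin: int                   # ID of the transaction that inserted this version
    xmax: Optional[int] = None  # ID of the transaction that deleted it, if any


def visible(row: RowVersion, committed: Set[int]) -> bool:
    """Simplified visibility test against a snapshot.

    A version is visible if its inserting transaction is committed in the
    snapshot and its deleting transaction (if any) is not.
    """
    if row.xmin not in committed:
        return False
    return row.xmax is None or row.xmax not in committed


def mvcc_count(rows: Iterable[RowVersion], committed: Set[int]) -> int:
    """COUNT(*) under MVCC: every stored version must be examined."""
    return sum(1 for row in rows if visible(row, committed))


if __name__ == "__main__":
    table = [
        RowVersion(xmin=1),          # inserted by transaction 1
        RowVersion(xmin=1, xmax=2),  # inserted by 1, deleted by 2
        RowVersion(xmin=3),          # inserted by 3
        RowVersion(xmin=3),          # inserted by 3
    ]
    # An older snapshot that only sees transaction 1 as committed...
    print(mvcc_count(table, {1}))        # 2
    # ...and a newer one that sees 1, 2 and 3 count the same table differently.
    print(mvcc_count(table, {1, 2, 3}))  # 3
```

Real PostgreSQL has further machinery (in-progress transaction lists, hint bits, the visibility map) that this sketch ignores; the point is only that the answer depends on who is asking.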
They’re not difficult to migrate, but it’s more difficult if you want to run on both at once, because MySQL can’t run the standard syntax and PostgreSQL can’t run the MySQL syntax. So if you can pick one of them, it’s really not that hard. There are a couple of things that everybody runs into. For example, MySQL uses backticks to quote identifiers, while PostgreSQL uses the standard double quotes, but that’s just a search and replace; it’s not really hard. And then you’ve got the whole GROUP BY thing. Typically in MySQL you write a GROUP BY where you don’t actually specify all the fields in your query. In theory MySQL will just pick some value at random, but predictably you get the same result, because MySQL only has one way to execute a GROUP BY, which is a SORT followed by a UNIQUE. If you run that same code on PostgreSQL, we have different ways of executing it, so the result wouldn’t be predictable, and so we reject the query.
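The GROUP BY point can be made concrete with a small model. Take SELECT dept, name FROM staff GROUP BY dept, where name is neither grouped nor aggregated. The Python sketch below, in which the staff rows and all function names are hypothetical, implements two grouping strategies that both keep the first row they happen to encounter in each group: a sort-based one (standing in for the single SORT-then-UNIQUE strategy attributed to MySQL in the interview) and a one-pass hash-based one. Because the strategies visit rows in different orders, they pick different names, which is why a database that can choose between strategies cannot give the ungrouped column a predictable value. The check_group_by function mimics the resulting rejection; it ignores the exception PostgreSQL 9.1 added for columns that are functionally dependent on a grouped primary key.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, str]  # (dept, name)


def group_by_sort(rows: List[Row]) -> Dict[str, str]:
    """Sort, then keep the first row of each run of equal keys.

    Sorting whole rows stands in for a sort whose order *within* a group
    has nothing to do with the order the rows were stored in.
    """
    picked: Dict[str, str] = {}
    for dept, name in sorted(rows):
        picked.setdefault(dept, name)
    return picked


def group_by_hash(rows: List[Row]) -> Dict[str, str]:
    """One pass in storage order, keeping the first row seen per key."""
    picked: Dict[str, str] = {}
    for dept, name in rows:
        picked.setdefault(dept, name)
    return picked


def check_group_by(select_cols: Sequence[str], group_cols: Sequence[str],
                   aggregated: Sequence[str] = ()) -> None:
    """Reject any selected column that is neither grouped nor aggregated.

    The wording is modeled on PostgreSQL's error message.
    """
    for col in select_cols:
        if col not in group_cols and col not in aggregated:
            raise ValueError(
                f'column "{col}" must appear in the GROUP BY clause '
                "or be used in an aggregate function")


if __name__ == "__main__":
    staff = [("sales", "Eve"), ("dev", "Bob"), ("sales", "Ann"), ("dev", "Dan")]
    print(group_by_sort(staff))  # {'dev': 'Bob', 'sales': 'Ann'}
    print(group_by_hash(staff))  # {'sales': 'Eve', 'dev': 'Bob'}
    try:
        check_group_by(["dept", "name"], ["dept"])
    except ValueError as err:
        print(err)
```

Both answers are equally valid readings of the ungrouped query, so rejecting it up front is the only way to keep results stable when the planner is free to switch strategies.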
How would you go about forcing open-source projects, such as WordPress, to offer support for PostgreSQL?
I think it’s going to be hard, because people have been talking to WordPress, and they have exactly zero interest in doing it. But you can see other projects moving. For example, I know that the latest release of Drupal has much better PostgreSQL support than previous releases. And, though I haven’t tested it myself, I know the latest version of Joomla was released with at least better support for PostgreSQL. I’m told it isn’t actually complete yet, but it’s moving in that direction.
You can’t really force them; that’s never going to work. You need to get people on the inside interested in doing it. At that point it’s probably helpful to say ‘I’m willing to do the work if you guys are willing to work with me’. It’s not good to just dump it on them. I know from the PostgreSQL community how we react when someone comes in from the outside and goes, blam, do this! It’s like, yeah, try again. I think you need to engage with that community and work with it. From there you can get to something that runs. And you need to take a good look at it.
The truth is, if it’s a fairly small product, you’re still adding a lot of overhead by adding a database-independent layer. In particular, if you started off writing 100 percent for MySQL, it’s not just the SQL queries. The whole system is designed for MySQL, and it turns out that it’s much easier to move a system that was designed from the ground up for Oracle, because PostgreSQL and Oracle are semantically closer. MySQL, on the other hand, does a number of things very differently from everybody else, and if you’ve designed your system around that, you need to step back and rethink it, and that’s where you end up with a lot of work. But usually, if the system has any level of complexity, that’s what you need to do if you’re going to do it well; a half-assed port to PostgreSQL isn’t really going to help anyone. People will just say ‘it doesn’t really work’, and either go back to MySQL or say ‘those guys don’t know how to code’ and skip the project. It really doesn’t help anyone.
You attended last week’s FOSDEM. Was it worth it from the PostgreSQL point of view? What did you discuss?
FOSDEM is always good. It has two sides for us. Much of FOSDEM, for the PostgreSQL community, is about connecting with our users and with all the other projects that use PostgreSQL, and connecting with the people who package PostgreSQL for CentOS, Debian and all the other projects out there. For us it’s really more about connecting with those guys than about discussing our internal development. It’s still a place where we get all the major developers in Europe together, and we do discuss a lot of things and always make some progress, but the main focus for us is connecting with the other communities. We have a table where, for one thing, we sell t-shirts and mugs like everybody does, but it also gives you a chance to have a dialogue with users and with the other projects. Then we had a developer room, as they call it, for a day, with presentations mainly aimed at users from other projects.
There were a couple that were fairly internal, and we got some good feedback on those. For instance, Heikki Linnakangas gave a presentation about locking in PostgreSQL, and some guys from the kernel team then came over to discuss locking with us. That’s the real beauty of FOSDEM: you’ve got all these different projects in the same place at the same time. So it’s mostly about connecting with the other guys.
Are you involved in any other open-source projects besides PostgreSQL?
I’m involved in some. For example, I do some work on Munin, the monitoring tool, but that’s mostly to make it work with PostgreSQL. I’ve also been involved with Varnish, which isn’t directly related to PostgreSQL but is one of my company’s focus products; I’ve only done some patches there, not real major development. When it comes to major development, one project is really enough; it can easily fill all your hours. On PostgreSQL I’m involved in the server itself, but also in maintaining the project’s infrastructure: the website, the wiki, all the servers, git source control and all that stuff. That takes a bit of my time as well.
In your presentation, you talked about an imaginary blog with a million views a day. Does your blog have that kind of traffic?
I do have a blog, but I don’t get a million visitors a day, let alone a minute.