Yet another *full blown* database [1]

(Warning: long post coming up. Read on if you are interested in performance of the various Akonadi database back ends).

Over the last few weeks I’ve been looking (again) into whether or not it is possible to create a working sqlite back end for Akonadi. The last time I tried was around August last year, and back then sqlite simply wasn’t able to meet the multi-threading requirements that Akonadi has for its database back ends. A couple of sqlite releases later, things seem to have changed. I managed to clean up some problematic code paths in the Akonadi server and voilà: we have an sqlite back end that is on par (unit-test wise) with the mysql back end.

There are some catches, of course. The first is that the default sqlite driver that comes with Qt does not work: it uses the sqlite API in a non-blocking way. So I had to adjust it to make the driver consistent with the sqlite documentation, which states that when sqlite calls return SQLITE_BUSY, you have to wait a bit and try again. This custom driver can be found in kdesupport/akonadi/qsqlite.
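The actual driver lives in C++/Qt, but the wait-and-retry pattern the sqlite documentation prescribes is easy to sketch. Here is a minimal illustration using Python’s stdlib sqlite3 binding (the function name and retry parameters are my own, not from the real driver); Python surfaces SQLITE_BUSY as an OperationalError saying the database is locked:

```python
import sqlite3
import time

def execute_with_retry(conn, sql, params=(), retries=10, delay=0.05):
    """Retry when sqlite reports the database as busy/locked, as the
    sqlite docs prescribe for SQLITE_BUSY: wait a bit, then try again."""
    for attempt in range(retries):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as e:
            # Re-raise real errors and give up after the last attempt.
            if "locked" not in str(e) or attempt == retries - 1:
                raise
            time.sleep(delay)

conn = sqlite3.connect(":memory:")
execute_with_retry(conn, "CREATE TABLE items (id INTEGER PRIMARY KEY, payload BLOB)")
execute_with_retry(conn, "INSERT INTO items (payload) VALUES (?)", (b"hello",))
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])
```

A driver that simply propagates the busy error instead of looping like this will randomly fail under concurrent access, which is essentially what the stock Qt driver did.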

The next catch is related to performance. Though we did not have the numbers until now, it was expected that sqlite would perform worse than mysql. Given that we have another back end, postgresql, and are working on yet another one, a file system back end, it seemed a good time to do some benchmarking. So I brushed up the item benchmark a bit and ran it against all back ends. The benchmark tests the performance of Akonadi over two dimensions: the first is the number of items, the second the size of the payloads. We used the Qt data-driven test framework and added rows to it of the form ${number-of-items}-${payload-size}. Then we benchmark the important actions of Akonadi: creation, modification, fetching and deletion of items. This lets us see how performance scales along the two dimensions.
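The real benchmark uses Qt’s data-driven test framework against a running Akonadi server; the grid structure, though, can be sketched in a few lines. This standalone Python sketch (names and the sqlite-only timing target are my own simplification) shows the two-dimensional row layout and the ${number-of-items}-${payload-size} labels:

```python
import sqlite3
import time

def run_benchmark(item_counts, payload_sizes):
    """Time item creation and fetching for every combination of the two
    benchmark dimensions, keyed by a row label like '500-4096'."""
    results = {}
    for n in item_counts:
        for size in payload_sizes:
            row = f"{n}-{size}"          # ${number-of-items}-${payload-size}
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload BLOB)")
            payload = b"x" * size

            t0 = time.perf_counter()     # creation
            conn.executemany("INSERT INTO items (payload) VALUES (?)",
                             [(payload,) for _ in range(n)])
            create = time.perf_counter() - t0

            t0 = time.perf_counter()     # fetching
            conn.execute("SELECT payload FROM items").fetchall()
            fetch = time.perf_counter() - t0

            results[row] = {"create": create, "fetch": fetch}
            conn.close()
    return results

results = run_benchmark([100, 500], [256, 4096])
for row, times in sorted(results.items()):
    print(row, times)
```

Each row in the output corresponds to one data point in the graphs below, which is what makes it possible to see how each back end scales along both axes.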

Before getting to the results I first have to make a note about the file system back end. This one is designed to be a fall-back for large payloads. The idea is that database actions become too slow for large payloads, so above a certain offset Akonadi doesn’t store payloads in the database but in plain files. The benchmark for the file system back end is set up to always write the payload to the file system. This enables us to find the offset that gives the best of both worlds, i.e. fast performance for small payloads by using the database and fast performance for large files by using the file back end. (Note: currently the file system back end is disabled by default; there are still some issues with it that need to be sorted out.)
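The fall-back logic boils down to a single size check at store time. Here is a minimal sketch of the idea (the threshold constant, table layout and function are hypothetical, not Akonadi’s actual schema): payloads at or below the cutoff go into the database row, larger ones are written to a plain file whose path is recorded instead:

```python
import os
import sqlite3
import tempfile

PAYLOAD_THRESHOLD = 4096  # hypothetical cutoff in bytes

def store_payload(conn, item_id, payload, storage_dir):
    """Keep small payloads in the database; spill large ones to plain files."""
    if len(payload) <= PAYLOAD_THRESHOLD:
        conn.execute("UPDATE items SET payload = ?, external = 0 WHERE id = ?",
                     (payload, item_id))
    else:
        path = os.path.join(storage_dir, "%d.payload" % item_id)
        with open(path, "wb") as f:
            f.write(payload)
        # Only the file path is stored in the database row.
        conn.execute("UPDATE items SET payload = ?, external = 1 WHERE id = ?",
                     (path.encode(), item_id))

storage_dir = tempfile.mkdtemp()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload BLOB, external INTEGER)")
conn.execute("INSERT INTO items (id, external) VALUES (1, 0)")
conn.execute("INSERT INTO items (id, external) VALUES (2, 0)")
store_payload(conn, 1, b"small", storage_dir)          # stays in the db
store_payload(conn, 2, b"x" * 100_000, storage_dir)    # spilled to a file
print(conn.execute("SELECT external FROM items ORDER BY id").fetchall())
```

The whole point of the benchmarks below is to pick that threshold well: too low and small items pay file-system overhead, too high and the database chokes on large blobs.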

Sooooo, show me the numbers, I hear you thinking. Well, here you go; let’s start with item creation:

Item creation benchmark

The image shows the results with a logarithmic scale on the x-axis (time is on the y-axis, but relative due to the logarithmic scaling on the x-axis). As you can see, the file system back end (akonadi-fs) is hardly influenced by the file size, only by the number of items. For the other back ends we see that the file size influences performance too, but it scales roughly linearly. We also see here that sqlite does not perform as well as the others. Let’s have a look at the absolute numbers:

Item creation benchmark scaled linearly

The y-axis now shows time in msecs. The graph now clearly shows that when items get larger and the number of items grows too, sqlite is outperformed by all other back ends. We also see that databases in general don’t cope well with large payloads, which is exactly the reason to provide a file system back end too. First conclusion: don’t use sqlite unless you have very strong restrictions on your environment. (Which is the case when running Akonadi on Windows CE, for example, where the number of processes is extremely limited and which we are working on here at KDAB.) Still not convinced about sqlite performance? Okay, let’s have a look at one more. Item modification:

Item modification benchmark, scaled linearly

Again, we see that sqlite is outperformed by all other back ends as soon as the payload size becomes large. As the payload size grows we also see that only the file system back end doesn’t start to grow exponentially like the database back ends do. So, sqlite works, and might even work fast enough for you, but it is definitely not fast enough for the general use case Akonadi was designed for in the first place: handling many large items. Again, unless you have very strict requirements on the environment where Akonadi is used.

The last thing I want to show is a benchmark with different payload sizes for 2500 items. This makes it easier to find the cutoff value for the file system back end, i.e. from what payload size onward should an item be stored using the file system back end instead of the database? First the images (I only compared mysql and fs to keep the graphs slightly clearer; you can find the full results at the links posted at the end of the blog):

Benchmark for creation of 2500 items

For creation and modification the cutoff value seems to be 8 KB. However, fetching, which is also a frequently performed operation, has a cutoff value of 512 B. A good trade-off between the two is probably around 4 KB.

So that’s all for now. A short recap: the sqlite back end for Akonadi seems to work, though it is about five times slower. Also, some problems have already been reported, so it should still be considered a work in progress. Work on the file system back end is ongoing but looks promising, and with the right cutoff and file system/database combination (i.e. mysql) we get the best of both worlds. Thanks for reading!

Links to the full (interactive) results:

  1. Multiple item counts, multiple sizes
  2. 2500 items, multiple size

Update: For a better comparison between database and fs back ends I added another benchmark which also uses 2500 items but only goes up to 8 KB payloads. Check the results here:

2500 items, multiple size, only up to 8K
[1] The title is meant to be a pun. Every now and then people pop up on the ML who think that sqlite is not a *full blown* (whatever they mean by that) database. Let me assure you, it is. It supports SQL at a similar level to mysql, and it does transactions and multithreading; it just tends to be smaller (and here I mean the library itself, not the database) and it does not run in a separate process, which is also where its limitations come from.
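The transaction claim is easy to verify for yourself. A quick illustration using Python’s stdlib sqlite3 binding (the table is a made-up example, and `with conn:` is Python’s idiom for commit-on-success/rollback-on-exception, not part of sqlite’s C API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload BLOB)")
conn.commit()

# A transaction that fails is rolled back and leaves no trace...
try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO items (payload) VALUES (?)", (b"a",))
        raise RuntimeError("abort on purpose")
except RuntimeError:
    pass
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 0

# ...while a committed one persists.
with conn:
    conn.execute("INSERT INTO items (payload) VALUES (?)", (b"b",))
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 1
```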


25 Responses to Yet another *full blown* database [1]

  1. Malte says:

    I don’t know much about databases, but I always thought that Firebird had the right size for something like this. AFAIK it was used in cases where SQLite could also have been used. So it looks like it can scale down pretty well, and probably scale up better than SQLite can. Though I perfectly understand MySQL as a choice, because of its ubiquity on Linux.

  2. Adam says:

    It seems clear to me that Akonadi is still in a state of experimentation and flux, and therefore it is not ready for production use by average users. (cf. problems with data loss with the KDEPIM switchover)  It is frustrating to see KDEPIM/Akonadi making the same mistakes that KDE 4 made by releasing too early–or should I say, pushing immature code on unsuspecting, trusting users who fall into a pit full of bugs. 

    If the KDE community wants to gain serious traction among people other than developers and hardcore KDE fans, they must begin focusing on stability and reliability, and then performance, while relegating experimental features and refactoring and rewrites into branches that are THOROUGHLY tested and vetted among a wide userbase before being released as part of the SC.  If this means that some code doesn’t get released for a long time because there aren’t a lot of testers, so be it–better to delay a new feature for a long time and get it right than force it upon users who have to suffer as unwitting beta testers. 

    The alternative is for KDE to remain a developers’ playground for the foreseeable future, in a constant state of flux, doing cool things over and over again, never quite reaching maturity, never at a point where people can rely on it for serious use.  

    Maybe some KDE folks are content with that. But KDE could be so much more. KDE needs stronger leadership and direction. There is so much wasted effort right now, people reinventing and rewriting code while important components are neglected, people working on long-shot experiments that will never be actually used, even writing code for devices which don’t exist on the market. In the meantime the existing desktop market gets neglected–just simple network management in KDE is a mess compared to Unity or OS X. There’s one guy working on it, and he just took it over–but there are plenty of experienced programmers in KDE who could have fixed it up years ago. 

    You know, I can’t stand GNOME 3, either the software itself or the way they’ve gone about the process and snubbing the existing community–but I will give them this: somehow they have worked together to achieve their vision, whether it’s a good vision or not. I don’t want KDE to blindly follow one person’s vision, but it needs to strike a middle ground–this free-for-all that’s been going on the past few years is not accomplishing much. Yes, I know the underlying foundations and libs are way way better than KDE 3, and the magic moment when it all comes together is just around the corner–except it’s not, because people keep reinventing the wheel instead of building the rest of the vehicle. Better to have a car with less than perfect wheels than to have perfect wheels rolling around aimlessly.  The fact is that KDE 4 as a DE is not that much better of an experience for a user than KDE 3 was. Sure it’s shinier, but as far as actually using it, it’s about the same. 

    All of KDE is made by volunteers who are free to do whatever they want. But if they would put off rolling their own little snowballs and work together rolling one big one for a while, KDE could start quite an avalanche. Please put aside your experiments for a while and fix the bugs and optimize performance and make what KDE is now really shine. 

  3. andy says:

    1. rule of graphing:
    ALWAYS LABEL YOUR AXES!

    You tell us in the text that the y-axis shows time in milliseconds (except for the first graph, where it apparently shows time in some other scale), but you never tell us what the numbers on the x-axis represent, so we are left to *guess* what they might mean from your analysis of the graphs.

  4. David says:

    Came looking since I was a bit wtf why something was trying to start a per-user mysql on my home machine.

    So benchmarks are nice, but not actually useful without reference with typical usage patterns.

    How big are the entries that you’re typically working with? How slow is slow? Below the realm of what a user will notice the difference is irrelevant in most cases.

  5. Markus says:

    Oracle released a new version of Berkeley DB that adopts SQLite’s API. Have you tried using your new Akonadi driver with that one?
    http://www.oracle.com/us/corporate/press/063695

  6. Bertjan says:

    No, I didn’t, and it isn’t very likely that I will. However, for the people who would like to get involved in Akonadi development: grab your chance, try it out and tell us the results :)

  7. Anonymous says:

    Thanks for this post. Perhaps I am misreading these results, but it seems that in all cases the file system back end performs at least as well as the databases, and better at the larger payloads. So why is a database back end needed at all? Is there a discussion about that somewhere?

    Best regards

  8. Bertjan says:

    That’s what I realized too. I updated the post with results for payloads only up to 8 KB. Have a look at that; it clearly shows that the DB is faster for smaller payloads.

    In addition we need a DB for transaction support and to store metadata.

  9. Anonymous says:

    The update no longer has akonadi-fs but adds mysql-fs and postgresql-fs???


  11. pprkut says:

    I’ve been running akonadi with mariadb for a few weeks now. I don’t make excessive use of it (yet), so I haven’t felt any performance difference, but it would be interesting to see how well it performs compared to postgresql or mysql.
    Since mariadb is a drop-in replacement for mysql, this should be really easy to test. Maybe you can take a look at it the next time you run such a benchmark.


  13. Christoph says:

    You benchmarked performance, but what about space requirements (both disk space and process memory)? I see an empty MySQL database creating more than 128 MB of files in the user’s home directory.

  14. Anonymous says:

    Are they sparse files, though?

    Also, seconding a request for memory benchmarks :)

  15. Bertjan says:

    The initial size of mysql is due to some optimisation configuration options and can be changed.

    A memory benchmark would be interesting too, but that is not really db related, I think.



  18. xav_19 says:

    I don’t know how much you rely on SQL (I guess that if you can use fs as a back end, it isn’t too much), but noSQL databases seem to be trendy these days (basically a kind of hashtable with no constraints, though AFAIK transactions are supported). Did you take this into consideration? Or is it completely irrelevant?

  19. Bertjan says:

    The fs back end only stores the payload in files. The metadata, and payloads that are too small to store in files with reasonable performance, are still stored in a db. I don’t know noSQL. The name seems to suggest that it doesn’t understand SQL, which would make it useless in our case.

  20. Anonymous says:

    I’m sorry for my ignorance, but what kind of data does Akonadi store? Are there any applications that would be better off with Akonadi than with some custom storage back end?


  22. MrB says:

    What about Virtuoso – could it be used as an Akonadi back end? Right now, akonadi requires running nepomuk and the latter requires Virtuoso. It’s a bit silly that to launch, for instance, an email application (kmail) I have to run two database engines.

  23. Bertjan says:

    I think someone is looking into that currently, but I’m not sure. It does indeed make sense to take such an approach when possible.

  24. sandsmark says:

    You might know it already, but postgres actually stores large entries in separate files, iirc, so when using postgres, akonadi-fs is useless.

  25. renoX says:

    @sandsmark: before saying that something is useless, you should at least look at the benchmarks!
    I’m a fan of postgres, but the benchmarks made by bertjan show that akonadi-fs has better performance than postgres for large files.
    So either postgres is not storing large entries in separate files (configuration issue?), or there is an issue with the benchmark, or it is normal that akonadi-fs beats postgres because postgres adds some overhead. I don’t know, but it’s better to measure as bertjan did than to assert blindly as you did.
