(Warning: long post coming up. Read on if you are interested in the performance of the various Akonadi database back ends.)
The last few weeks I've been looking (again) into whether or not it is possible to create a working sqlite back end for Akonadi. The last time I tried was around August last year, and back then sqlite just wasn't able to meet the multi-threading requirements that Akonadi has for its database back ends. A couple of sqlite releases later, things seem to have changed. I managed to clean up some problematic code paths in the Akonadi server and voilà: we have a sqlite back end that is on par (unit-test-wise) with the mysql back end.
There are some catches, of course. The first is that the default sqlite driver that comes with Qt does not work: it uses the sqlite API in a non-blocking way. So I had to adjust it to make the driver consistent with the sqlite documentation, which states that when sqlite calls return BUSY, you have to wait a bit and try again. This custom driver can be found in kdesupport/akonadi/qsqlite.
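To give an idea of what that means in practice, here is a minimal sketch of the retry-on-BUSY pattern, written directly against the sqlite C API. This only illustrates the principle; the actual driver in kdesupport/akonadi/qsqlite hooks into Qt's SQL driver layer, and the helper below (stepWithRetry) is made up for this post:

```cpp
#include <sqlite3.h>

// Step a prepared statement, retrying while sqlite reports BUSY, i.e. while
// another connection holds a conflicting lock. The sqlite docs allow a plain
// retry of sqlite3_step() for statements outside an explicit transaction.
static int stepWithRetry(sqlite3_stmt *stmt, int maxRetries = 50)
{
    int rc = sqlite3_step(stmt);
    for (int i = 0; rc == SQLITE_BUSY && i < maxRetries; ++i) {
        sqlite3_sleep(10); // wait a bit before trying again
        rc = sqlite3_step(stmt);
    }
    return rc;
}
```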
The next catch is related to performance. Though we did not have the numbers until now, it was expected that sqlite would perform worse than mysql. Given that we have another back end, postgresql, and are working on yet another one, a file system back end, it seemed a good time to do some benchmarking. So I brushed up the item benchmark a bit and ran it against all back ends. The benchmark tests the performance of Akonadi over two dimensions: the number of items and the size of the payloads. We used the Qt data-driven test framework and added rows to it like ${number-of-items}-${payload-size} (see the sketch below). Then we benchmark the important actions of Akonadi, which are creation, modification, fetching and deletion of items. This lets us see how the performance scales along both dimensions.
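Here is a rough sketch of how such a data-driven benchmark looks. The row naming follows the scheme above; the helper createTestItems() is hypothetical and stands in for the Akonadi jobs the real item benchmark issues:

```cpp
#include <QtTest/QtTest>

class ItemBenchmark : public QObject
{
    Q_OBJECT

private:
    // Hypothetical helper; the real benchmark issues Akonadi item jobs here.
    void createTestItems(int count, int size);

private slots:
    void createItem_data()
    {
        QTest::addColumn<int>("count");
        QTest::addColumn<int>("size");

        const int counts[] = { 1, 10, 100, 1000, 2500 };
        const int sizes[]  = { 512, 8192, 65536, 262144 };
        for (unsigned int i = 0; i < sizeof(counts) / sizeof(int); ++i) {
            for (unsigned int j = 0; j < sizeof(sizes) / sizeof(int); ++j) {
                // Row names follow the ${number-of-items}-${payload-size} scheme.
                const QByteArray name = QByteArray::number(counts[i]) + '-'
                                      + QByteArray::number(sizes[j]);
                QTest::newRow(name.constData()) << counts[i] << sizes[j];
            }
        }
    }

    void createItem()
    {
        QFETCH(int, count);
        QFETCH(int, size);
        QBENCHMARK {
            createTestItems(count, size);
        }
    }
};

QTEST_MAIN(ItemBenchmark)
#include "itembenchmark.moc"
```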
Before getting to the results I first have to make a note about the file system back end. It is designed as a fall-back for large payloads, the idea being that database actions become too slow for those. So above a certain size threshold (the offset), Akonadi stores payloads not in the database but in plain files. The benchmark for the file system back end is set up to always write the payload to the file system. This lets us find the offset that gives the best of both worlds, i.e. fast performance for small payloads by using the database and fast performance for large files by using the file back end. (Note: the file system back end is currently disabled by default; there are still some issues with it that need to be sorted out.)
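In code, the idea boils down to something like the following. This is a simplified sketch, not the actual back end; storeInDatabase() and payloadFilePath() are made-up helpers, and the threshold constant is exactly the value the benchmarks below are meant to determine:

```cpp
#include <QByteArray>
#include <QFile>
#include <QString>

// Made-up helpers standing in for the real storage layer.
void storeInDatabase(qint64 itemId, const QByteArray &payload);
QString payloadFilePath(qint64 itemId);

// Hypothetical cutoff; the benchmark results below suggest a value around 4 KB.
static const int payloadSizeThreshold = 4096;

// Small payloads go into the database, large ones into plain files.
void storePayload(qint64 itemId, const QByteArray &payload)
{
    if (payload.size() <= payloadSizeThreshold) {
        storeInDatabase(itemId, payload);
    } else {
        QFile file(payloadFilePath(itemId));
        if (file.open(QIODevice::WriteOnly))
            file.write(payload);
    }
}
```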
Sooooo, show me the numbers, I hear you thinking. Well, here you go, let's start with item creation:
The image shows the results with logarithmic scaling on the x-axis (time is on the y-axis, but relative due to the logarithmic scaling on the x-axis). As you can see, the file system back end (akonadi-fs) is hardly influenced by the file size, only by the number of items. For the other back ends we see that file size influences performance too, but roughly linearly. We also see here that sqlite does not perform as well as the others. Let's have a look at the absolute numbers:
The y-axis now shows time in msecs. The graph clearly shows that when items get larger and the number of items grows too, sqlite is outperformed by all other back ends. We also see that databases in general don't cope well with large payloads, which is exactly the reason to provide a file system back end too. First conclusion: don't use sqlite unless you have very strong restrictions on your environment. (Which is the case when running Akonadi on Windows CE, for example, where the number of processes is extremely limited; that is what we are working on here at KDAB.) Still not convinced about sqlite performance? Okay, let's have a look at one more. Item modification:
Again, we see that sqlite is outperformed by all other back ends as soon as the payload size becomes large. When the payload size grows, we also see that only the file system back end doesn't start to grow exponentially like the database back ends do. So, sqlite works, and might even work fast enough for you, but it is definitely not fast enough for the general use case Akonadi was designed for in the first place: handling many large items. Again, unless you have very strict requirements on the environment where Akonadi is used.
The last thing I want to show is a benchmark with different payload sizes for 2500 items. This makes it easier to find the cutoff value for the file system back end, i.e. at what payload size an item should be stored using the file system back end instead of the database. First the images (I only compared mysql and fs to keep the graphs slightly clearer; you can find the full results at the links posted at the end of the blog):
For creation and modification the cutoff value seems to be 8 KB. However, fetching, which is also a frequently performed operation, has a cutoff value of 512 B. A good trade-off between the two is probably around 4 KB.
So that's all for now. A short recap: the sqlite back end for Akonadi seems to work, though it is roughly five times slower. Also, some problems have already been reported, so it should still be considered a work in progress. Work on the file system back end is ongoing but looks promising, and with the right offset and file system/database combination (i.e. mysql) we get the best of both worlds. Thanks for reading!
Links to the full (interactive) results:
- Multiple item counts, multiple sizes
- 2500 items, multiple sizes
Update: For a better comparison between the database and fs back ends I added another benchmark which also uses 2500 items but only goes up to 8 KB payloads. Check the results here:
- 2500 items, multiple sizes, only up to 8 KB
[1] The title is meant to be a pun. Every now and then people pop up on the ML who think that sqlite is not a *full blown* (whatever they mean by that) database. Let me assure you, it is. It supports SQL at a level similar to mysql, it does transactions and multithreading; it just tends to be smaller (and here I mean the library itself, not the database) and it does not run in a separate process, which is also where its limitations come from.