Program understanding tools (part 1)

First a small remark for planetkde readers, this post is not in particular about KDE, but I believe it is an interesting read for the developers among the readers. This post is about language tools for C++ and in particular about  integrating the clang framework in these kind of tools. The post became longer than expected so I’ll split it in two. In the first part I’ll give a bit of background  why I’m interested in these tools. While in the second part I’ll go into more technical detail with respect to using clang.

I have been fascinated for a long time by language tools, in particular those meant for C++. This started back at the university where I had a course on software maintenance and evolution. This fascination was enforced by the internship I did at KDAB, where I worked on a KDevelop based code querying and rewriting tool for C++. A bit later I also wrote a small program to extract dependency information, which currently has the not so cool name Cpp Dependency Analyzer.

Starting a PhD basically killed all side projects for a good while, but lately I found some time and energy to pick this up again. The dependency analyzer is still far from being very useful for two reasons, first there is still some infrastructure missing, second the ui need some love. Two (recent) developments triggered my interest though, the first being my ever nagging desire to try out clang and second the post from Sandro Andrade on an implementation of hierarchical edge bundles. As an InfoVis researcher this immediately got my attention and I hope to see further improvements in this project. However, as my research is in InfoVis, I decided to have a look at clang first (i.e. one should separate work and spare time, no?!).

Lets set the context first. Why a dependency analyzer? It has been estimated that 80% of the software life-cycle is spent on software maintenance, while 40% of this costs is represented by software understanding. This observation makes tools that help with to simplify this take quite important. Although, not always directly relevant for individual developers (depending on the kind of information the tools provide), these tools can be of importance for integrators that want to incorporate software components in their product and therefore need indications about the quality and complexity of this component. Another use-case of particular interest for open-source software components, is those where third parties are interested in adding additional functionality to a component. Tools that help them understand the structure of the program could contribute to lowering the barriers.

The dependency analyzer currently extracts the following information from a compile log: compiled source files, generated object files, generated libraries and executables and linked libraries. Additionally, it can extract secondary linked libraries (i.e. libraries that are linked by those directly used in the project) in a separate run. You see what is missing here? Right, headers. What I wanted for a long times was to have some means to extract included headers from the compiled source files of a directory. Not only the ones included by the compiled sources, but the full tree. Ones you have this information you are able to calculate useful metrics such as build impact of project headers and build costs of source files (see build cost analysis). A feature I like to add at some point.

You might wonder why I do not use KDevelop which provides this information as well. Let me briefly go over this issue. KDevelop was built with a certain philosophy, providing an IDE with good support for a wide variety of languages. Being good engineers the KDevelop developers abstracted quite a lot of things to reach their goal, also in the area of language support. I really think they did quite a good job here, but it comes at a price. One of these prices is that the AST has been generalized as well. At the point where you want to do more complex things with languages (such as querying for code constructs or code rewriting) this becomes incredible painful. Here you come at a point where you want the most detailed representation of the source you can get. This is not a very strong justification for the needs I currently have, but I do have some wild dreams which would require such a representation. And,  anyhow I just wanted to play with clang for a bit. To be continued…

Tagged , , , | 1 Comment

Happy new year

Happy new year all. This is just a short message to wish you a new year and to apologize that I flooded planet with old posts of mine. I just changed my drupal installation into a wordpress one. Hope this make managing my blog (and the spam on it) a bit easier. Not that it matters much as I didn’t hardly write anything last year. Though it seems to have broken some stuff as well though =:(.

I did see it triggered some comments too older posts though. Some of them require some action which I’ll put on my todo list (hi Inge!). However, as you might have guessed from my blog posting activity last year, I’ve had very little time to do KDE work. Things are settling down a bit now though. I start to get used too living in Paris and try to pick up some small projects in the evenings. I’m working on something which is interesting and on which I’ll post more details soon (hopefully).


Tagged | Comments Off

On moving and d-pointer performance

Edit: fixed links… sigh, I should switch to wordpress I did!

The usual thing, long time no blogging. Well, real life got a good deal in the way. Not for bad though. While I did my MSc. thesis at KDAB I got the possibility to start an Ph. D. position on information visualization at ILOG (now part of IBM) in Paris. This meant packing our stuff from our Berlin apartment (not too much luckily as we rented furnished) and moving it back to the Netherlands at first. Then using the next couple of months to sort out and pack our remaining stuff which we didn’t take to Berlin and find an apartment in Paris. As you probably can imagine, this was slightly stress-full, but I’m pleased to say that we are finally settled now. We have a nice apartment south of Paris, only a 20 min walk from the office I work and we have Internet access again The only thing we don’t have yet is the ability to understand french…. Well, lets see if that changes over time.

What triggered me to write this blog was something different though. First a note of warning, I didn’t do any other research than presented in the remaining of this writing. My former colleague Marc wrote two great articles (here and here) on the topic of d-pointers. Its in German, though it shouldn’t be too hard to read if you’re somewhat familiar with the topic and German. One topic he doesn’t address though (or I skimmed too fast now, it was some time ago I really read them) is the performance of d-pointer classes versus value classes. Recently I was writing a class which has the potency of getting created a lot and often which triggered the question where and how much the cost difference would be. We all know that creating objects on the heap is more expensive than creating them on the stack, so I quickly deduced from this that d-pointer classes must be more expensive. Okay, that shouldn’t have come as too much of a surprise. Two questions remain though, what is the price one pays and where (e.g. creation, copy, access).

Intrigued by these questions I set up a real quick and dirty benchmark just to get some rough ideas. By no means I pretend to be complete and given that I’m quite tired I might have made some horrible mistakes as well. So feel free to comment on the results. Here’s what I did: I created three different classes: DPointerClass, ValueClass and InlineValueClass (the latter just for additional comparison). The classes are very simple, they just hold an int which you can get (const access) and set (non-const access). Complete source can be found here. Next I created a test class using QTest lib with the following methods:

  • void test${Class}CreateDestruct();
  • void test${Class}HeapCreateDestruct();
  • void test${Class}Copy();
  • void test${Class}Assignment();
  • void test${Class}ConstAccess();
  • void test${Class}NonConstAccess();

In these methods I use the QBENCHMARK macro to benchmark the actions described by the test names (using walltime). This leaded to the following results:

In this image you see the number iterations for each action. To be honest, when seeing that graph I’m not completely sure how to interpret it. As I understand from the QTest doc, for walltime a higher number of iterations might be needed to get more precise results. However, what more precise results mean is not clear to me and neither why there is such a big variation in the number of iterations per tests for the individual classes as well as between all tests. (E.g. for value it is high for the CreateDestruct test, low for HeapCreateDestruct and again high for Copy). Well I guess the second issue is closely related to the first one so if anyone could enlighten me on that It’d be great.

Secondly, and slightly more interesting is the following graph:

Here you see the time (in msec) it took per iteration. We can directly conclude that for creation/destruction and copying the difference is significant. For assignment and the access operations the difference is negligible. So the extra step of indirection doesn’t seem to add, more strangely it seems to be slightly faster even. The factors between DPointer and Value for CreateDestruct, HeapCreateDestruct and Copy are respectively 42, 20 and 12 20, 42 and 12.

Concluding we can say that only for CreateDestruct, HeapCreateDestruct and Copy the differences are significant. For creation this is logical, for copying I assume that the byte-by-byte copying from the default constructors are performing slightly better than custom ones. For assignment and access there is no performance loss. One last question remains, although the difference is significant even in the worst case we’re talking about 0.0012 msec. How does one determine whether this is too much for his application and thus fall back to creating classes with private members? The class I’m working on has the potention of getting created in large amounts in short time (e.g. 100.000 several times in a number of consecutive calls)  I’m somewhat tempted at the moment to move away from value classes to d-pointer classes for the obvious design reasons as the class will become part of a library. Any comments and thoughts on this are welcome.

Tagged , , | 13 Comments

More lies^wstats on the krazy results

Found a bit of time to continue the number crunching saga. Two new datasets are added to the KDE SC Stats topic center. The first data set is about the number of krazy2 issues per module (e.g. kdelibs, kdeedu, etc.):


We see two modules that have clearly more issues than the others, kdelibs and kdepim. But, that was to be expected because when we look at the lines of code for these modules:


we notice that kdelibs and kdepim clearly have more lines of code than the other modules as well. Actually, the two graphs have quite some similarities which is not completely surprising. One of the things that draws the attention is that we can see that although kdebase-workspace, kdebindings and kdeedu have an almost equal evolving lines of code count, kdeabse-workspace has a significant higher number of issues. Well, as we said before, these raw numbers do not tell that much, so lets have a look at the number of issues versus lines of code:


In the above image we see how the each module contributes (or not) to the downwards trend for issues/loc I showed last time. In general, most of the modules have a somewhat downwards trend though not very convincing. The two modules contributing most to the current number of issues per KLOC are kdeartwork and kdewebdev (look at the most right bars of each module). Clear winners in getting te number of issues down are kdeaccessibility and kdeplasma-addons. Furthermore, props for kdeedu and kdeutils who got the number of issues down under 2 per KLOC for KDE SC 4.4.0. Soon to follow, stats per krazy check.

Tagged | 4 Comments

Marble: I iz ur training tracker

Though I leave already tomorrow I didn’t want leave Akademy without having tried out something new. As I’m paid to work on KDEPIM for work-work I would try out something completely different. A week or ten ago I started running again, definitely not at marathon level but I’m already quite happy that I’m able to drag myself out twice a week for a ~12 km run. As some additional motivation I bought myself a cheap hart beat monitor as well to be able to train more efficiently. As I said, a cheap one, so no fancy features such as GPS support. Recently I found this dutch website: measuring distance. It is a Google maps based web application which lets you enter routes and register training times for saved routes. This is quite a handy tool as it gives you at least some indication on how far and how fast you ran. But well, its web based, Google maps based….. and than we’re at Akademy where people keep bashing us that we should have full control over our data and that web-applications have these annoying side effects (Hi Frank and Sebas! Loved both of your talks!).

So what do we do?! Torsten "Marble" Rahn to the rescue. When I saw his talk I was thinking: neat, Marble is coming along very nicely. I definitely have to try that. So I started hacking on this a bit and with some help of the Marble developers I got a first quick and dirty proof of concept up and running. So here’s the idea: I want to build a custom application using the Marble widget. That’s what it is meant for right?! Who’s doing his training sessions on the moon anyway? Also, the behavior of operating it must be different, point-and-click track specification is the major use case.

After some fiddling and trying different approaches I now ended up with a custom Render plugin that is also installed as an event filter on the widget. It listens to mouse events and has some logic to allow drag and drop without resulting in unwanted lines. For you viewing pleasure:

Marble at Tampere from Bertjan Broeksema on Vimeo.

As you can can see, we found a good bar before arriving at the university. If this looks somewhat familiar, yes there are two similar kind of functionalities but according to the experts one is completely outdated and should be removed, and the other is a buggy, disabled by default result from a GSoC project. I didn’t put the code online yet but that will eventually happen when I’ve implemented the basic ideas. Which are as follows:

  • User can save and load tracks
  • User can store training info for a specific track
  • Simple training overview (i.e. distance, avg speed)
  • Support for a simple profile (at least the application should remember where I live

Can’t wait until I release the code? Feel free to contact me if you want to work on this as well. All in all I’m quite satisfied with the result already, given that this was only like five hours of semi concentrated hacking. A bit longer term feature which I’d like to get in, is really good support for creating tracks in a convenient way. But that will need some additional reading and poking around I guess. Marble guys you rock! This thing is becoming really usable and I’m quite sure there are a multitude of applications that should have Marble in there (KAddressbook anybody?!). Over and out now, don’t want to miss my plain tomorrow.

Tagged | 4 Comments

Akademy 2010

Time flies when you have fun… indeed it does. The conference part of Akademy is over already. Lots of cool things happened though. It started good at the Tegel airport, where I to my surprise (and their surprise as well) met my fellow kdabians Andreas Hartmetz and Patrick Spendring. Yay for not having to travel alone. This also resulted in me not spending 5 hours at Riga airport but actually visit the city. We where warmly welcomed at one of the squares by a bunch of cheering people wearing orange t-shirts….. Nice, just on time to be witness of 1 – 1 in the NL – BRA match. The second goal we saw as well while enjoying one of the local beers.

When heading back to the airport we just missed the bus. Ooh well we’re not in the middle of nowhere here, right? No kidding, but the next bus to the airport on left 20 minutes later. When it finally arrived it was full of locals. Hey, you don’t look like traveling to the airport and you neither and you and you and you. Indeed, the bus stopped at every freaking bus stop. Boarding started at 19:45, we arrive at the airport at 19:57, rushed through security and where just on time to catch the plane. That was a good shot of adrenaline! In the plain we teamed up with Jos van den Oever who had the same flight as well.

The conference itself was quite nice as well. Many very nice talks and some outstanding talks. Its really great to so see how technologies mature and evolve. Of course it is terrific to meet lots of familiar people for the first time or to see them again. It is also great to see so many new faces at Akademy. I also had some observations which I found quite fascinating:

  • Compared to a couple of years back there seems to be a relative high number of people which in one or the other way is on a professional base at Akademy.
  • It is the first Akademy I know of at which four companies in a row are telling us that they’re hiring.
  • and to add to the fact list: Aaron has an endless amount of hugs which he gives out for free if you don’t watch out….. if he had drinks hugs transform into lap dances…. It has to be said though, that he does the lap dances in a very elegant way!

Though I did not present, I did chair part of the Saturday program in room two. That was quite a fun thing to do, though I could have put my motivation device to better use I guess.

One thing left for me, the e.v. assembly. First time for me so we’ll see what it brings. After that Akademy comes to an end for me already. Tuesday I’ll be flying back to Berlin. Is there anyone by change taking the flight to Riga at 08:00 am as well? Let me know so we could share a taxi in that case.

Tagged | Comments Off

Giving insight in code quality

Lies, damned lies…and statistics. In KDE we have several means to guarantee the quality of our code base. These means include unit testing, continuous building on several platforms, policies on various levels and krazy2. What we fail to do in my opinion is to give insight in how these measures actually affect the KDE SC. That is, we do not have a high level overview on how these measures develop over time. Of course, not all measures taken are quantifiable but some of them are. Take for example unit tests, it would be nice to see at various levels the number of tests, the number of performed test and the success ratio for each release. Giving insight in these kind of numbers, again in my opinion, can be a good argument in conversations on high level (e.g. convince someone of deploying KDE SC or convincing someone of basing his software on the KDE SC stack). Of course this only works when the numbers are convincing.

For this reason (and also because I have a fetish for numbers) I started to extend the architecture of krazy2 in order to be able to extract quality related information of our code base. The first step to achieve this was to add XML output support to krazy2. This made the tool set a bit more flexible. The next step was to add support for sloccount to krazy2. The raw number of issues is relatively useless for comparison over various releases as this is only meaningful in relation to the lines of code of both releases. Krazy2ebn was replaced by krazy2xml which generates a set of XML files on component/module/submodule level. A set of XSLT style sheets transforms these files to HTML for the ebn website. Additionally I wrote a small tool which parse these XML files and put the result in a database (currently there is support for sqlite and postgres). Up to this level the tools are currently in a reasonable shape. So the numbers are there, but how do we give insight in these numbers? The EBN site had some statistics but these where really marginal. As I’m lacking both SQL and PHP skills I had no good idea on to do this, until a prof. of mine pointed me to IBM’s manyeyes. This site lets you upload a dataset in a simple plain text format and than provides you with various ways to create visualizations for the uploaded datasets. So another tool added to the chain, db2maneyeyes. Look at the source if you’re not convinced about my lack of SQL skills, ooh well we don’t really care about performance anyway in this context. Its not finished yet, that is I’d like to add some more export functions but the first results are there.

So, the lies^Wstatistics. I created a topic center on manyeyes for KDE SC related statistics. You’ll need a Java enabled browser for interactive browsing of the data. It does not contain many datasets yet but that will change the next weeks. Most datasets currently there, show how the SLOC evolved over time on various levels for the KDE4 life cycle. Some random facts related to SLOC:

  • The code base became ~1.5 times larger in the time span KDE-4.0.0 – KDE-4.4.0
  • The code base contains according to sloccount 23 different languages,
  • of which the largest part consist of (how surprisingly) C++.
  • followed by XML, ANSIC and csharp.
  • The three largest modules are from large to smaller: kdelibs, kdepim, kdeedu.

Well, all of this you can see yourself in the various visualizations I created. This of course does not say much about the actual quality. Here I still have some work to do but the first data is there. Number of issues vs. lines of code on component level. If you select in this graph both the number of issues and the ratio we notice two things:

  1. The number of issues fluctuates over the various releases, not showing a clear downwards trend.
  2. The ratio (#issues/loc) does show a clear downward trend.

Conclusions: The number of issues (as said before) in itself is not a useful measure. We (as a community) do seem to care about the issues reported by krazy2 and actually fix them, or at least make sure that new code has fewer or no issues.

Final remark, how useful the checks performed by krazy2 are is a different discussion. I definitely would not suggest that these numbers show that the overall quality of KDE SC is improving. For that we would need similar statistics from different quality measures we have. Anyway, it’s a start in giving insight in these statistics. If you’re interested in this topic, want krazy2 stats at a specific level or have ideas for improvements please find me in #ebn on freenode.

And finally:
I'm going to Akademy 2010

I is a chair at Akademy 2010

Thanks, Adriaan for providing this wonderful graphic!

Tagged , | 12 Comments

Spring: cleaning up

Its the time of the year again, spring. Meaning, time for the big clean up. Not the house of course, we leave that to the people feeling themselves called to clean houses. I’m talking about code. Over time we, as in developers, come to the conclusion that this neat API we wrote is not as neat as it looked in the first place. So we redesign our API and BIC lovers as we are, mark methods that should not be used anymore as deprecated.

This is of course very nice, but how do we actually get an overview of the methods that are marked as deprecated, more important how do we get an overview of how often deprecated methods are still in use? Well, kdevcpptools to the rescue. As I blogged before I worked on a C++ query and transformation engine for KDevelop as part of my Msc thesis. After an awesome lot of stabilization work of Milian "code monkey" Wolf, the plugin really is in quite a nice state at the moment and I would say quite usable. In the rest of this post I’ll show how to use it by taking kdepimlibs as an example.

The first thing to do is to create a query file containing the queries for the deprecated functions. The query (and transformation) file is a relative simple XML file describing the queries and optionally transformations. There are some example files in the git repository, but don’t jump to that yet as I want to show a cool feature Milian implemented.  Lets find the deprecated functions first. Okay, no fancy tools for that (yet?), so just a plain ack-grep on "KDE_DEPRECATED" in kdepimlibs. This gives me a lot of files containing this macro. We startup KDevelop and create a new empty query file with the following content:
?xml version="1.0" encoding="UTF-8"?

(Yes, the formatting sucks, don’t know how to make drupal output XML in a sane way). Next, open one of the files that showed up in the results of your grep and go to one of the deprecated functions. Make sure that you have the kdevcpptools tool view enabled and right click the function. The context menu shows you this wonderful option: Copy XML query:

Copy the resulting XML from the clipboard into your file and there you go, its that easy. I did this for whole kdepimlibs, you can find the file here. First observations: we’ve 47 deprecated functions in whole kdepimlibs and we really have  a shitload of date string formatting functions in kdepimlibs, so…. if you need one…..

When you’re done putting in your queries in there, you can open the XML file in the kdevcpptools plugin. This enables you to query a project. I did this for kdepim, meaning, the result I get are all uses of deprecated functions from kdepimlibs in kdepim. This result was so to say pleasantly surprising. Have a look at the next picture:

In the lower part of the screen you see an overview of the results for kdepim. Vertical axis are files horizontal axis are queries. First observation here: there are surprisingly few uses of deprecated functions. (Note: we’re only talking here about functions, explicitly marked as deprecated). Second observation: though not particularly interesting for such few results, the curve you see in the total column looks promising. I.e. most uses in only a small part of the files. Meaning, only a couple of files need "major" changes. This must of course be seen in perspective, this reasoning is not particular interesting when you only have seven hits for a ~700KLOC code base. Would you had have a similar curve for ~8000 uses this would of course already make a lot more sense. The curve in the total column will actually tell you how the migration work is distributed over the code base you’re querying.

So, my initial idea was to do a bit of clean up in kdepim but it turned out that this is hardly needed with respect to this specific area. The good thing is that we now know that it isn’t needed. This means that as soon as KDE5 development kicks off we can get rid of these deprecated methods without a lot of work (in kdepim at least). I’d really encourage anyone interested in these kind of analysis on his/her code base, either from a lib perspective like I did, or from an application perspective (anyone interested in building up a query file for qt3support and query the various kde modules for it?) to checkout the plugin[1] and play with it. You can find me and Milian in KDevelop if you need help on getting started with the plugin.

Next step is using the transformation engine to do (at least part of) the migration work automatically. I hope to blog about that soon. But first a thesis to finish…..


Tagged , , | Comments Off

Yet another *full blown* database [1]

(Warning: long post coming up. Read on if you are interested in performance of the various Akonadi database back ends).

Last weeks I’ve been looking (again) whether or not it was possible to create a working sqlite back end for Akonadi. The last time I tried was around august last year and by then sqlite just wasn’t able to meet the multi-threading requirements that Akonadi has for its database back ends. A couple of sqlite releases later things seem to have changed. I managed to clean up some problematic code paths in Akonadi server and voila, we’ve a sqlite back end that is on par (unit test wise) with the mysql back end.

There are some catches of course. The first being that the default sqlite driver that comes with Qt does not work. It uses the sqlite API in a non-blocking way. So I had to adjust that to make the driver consistent with the sqlite documentation which states that when sqlite calls return BUSY, you have to wait a bit and try again. This custom driver can be found in kdesupport/akonadi/qsqlite.

The next catch is related to performance. Though we did not had the numbers until now, it was expected that sqlite would perform worse compared to mysql. Given that we have another back end, postgresql and are working on yet another one: a file system back end, it seemed a good time to do some benchmarking. So I brushed up the item benchmark a bit and performed the benchmark for all back ends. The benchmark tests the performance of Akonadi over two dimensions. The first dimension is the number of items and the second dimension the size of the payloads. We used the Qt data driven test framework and added rows to it like: ${number-of-items}-${payload-size}. Then we benchmark the important actions of Akonadi, which are creation, modification, fetching and deletion of items. This enables us to see how the performance scales with the two dimensions.

Before getting to the results I first have to make a note about the file system back end. This one is designed to be a fall-back for large payloads. The idea is that database actions become too slow for large payloads. So at a certain offset Akonadi doesn’t store payloads in the database but in plain files. The benchmark for the file system back end is set up to always write the payload to the file system. This enables us to find out the offset that gives best of two worlds, i.e. fast performance for small payloads by using the database and fast performance for large files by using the file back end. (Note: currently the file system back end is disabled by default, there still are some issues with it that need to be sorted out).

Sooooo, show me the numbers, I hear you thinking. Well, here you go, lets start with item creation:

Item creation benchmark

The image show the results scaled logarithmic on the x-axis (time on y-axis, but relative due to logarithmic scaling on x-axis). As you can see the, the files system back end (akonadi-fs) is hardly influenced by the file size, only by the number of items. For the other back ends we see that file size has influence on the performance too, but roughly scale  linear. We also see here that sqlite does not perform as well as the others. Lets have a look at the absolute numbers:

Item creation benchmark scaled linearly

The y-axis now shows time in msecs. The graph now shows us clearly that when items get larger and the number of items grows too, sqlite is clearly outperformed by all other back ends. We also see that databases in general don’t cope well with large payloads, which is exactly the reason to provide a file system back end too. First conclusion, don’t use sqlite unless you have very strong restrictions on your environment. (Which is the case when running Akonadi on Windows CE for example, where the number of processes is extremely limited and which we are working on here at KDAB). Still not convinced about sqlite performance? Okay, lets have a look at one more. Item modification:

Item modification benchmark, scaled linearly

Again, we see that sqlite is outperformed by all other back ends as soon as the payload size  becomes large. When the payload size grows we also see that only the file system back end doesn’t start to grow exponentially like the database back ends do. So, sqlite works, might even work fast enough for you, but is definitely not fast enough for the general use case Akonadi was designed for in the first place: handle many large items. Again, unless you have very strict requirments on the environment where Akonadi is used.

The last thing I want to show is a benchmark with different payload sizes for 2500 items. This makes it easier to find the cutoff value for the file system back end. I.e. what payload size should be used to store an item using the file system back end in stead of the database? First the images (I only compared mysql and fs to have slightly clearer graphs, you can find the full results at the links posted at the end of the blog):

Benchmark for creation of 2500 items

For creation and modification the cutoff value seems to 8 KB. However, fetching, which is also an often performed operation, has a cutoff value of 512B. A good trade-off between those two is probably around 4 KB.

So that’s all for now. Short recap. The sqlite back end for Akonadi seems to work, though its about ~5 times slower. Also, there are already some problems reported, so it still should be considered as a work in progress. Work on the file system back end is ongoing but seems promissing and with the right offset and file system/database combination (i.e. mysql) we get best of both worlds. Thanks for reading!

Links to the full (interactive) results:

  1. Multiple item counts, multiple sizes
  2. 2500 items, multiple size

Update: For a better comparison between database and fs back ends I added another benchmark which also uses 2500 items but only goes up to 8 KB payloads. Check the results here:

2500 items, multiple size, only up to 8K
[1] The title is meant to be a pun. Every now and than people pop up on the ML who think that sqlite is not a *full blown* (whatever they mean with that) database. Let me ensure you, it is. It supports SQL at a similar level as mysql, it does transactions and multithreading, it just tends to be smaller (and here I mean the library itself, not the database) and it does not run in a seperate process but it therefore has its limitations too.

Tagged , , | 25 Comments

What do you depend on?

There seems to be lot of interests in slimming down all kind of libraries lately. Anyone trying to bring his great application or framework to a mobile platform maybe? For other reasons I was interested in the information passed to compilers and figured that having this information available can of course also be used to write tool to analyze dependencies. Given that I’m writing my thesis currently and need a bit of distraction every now and than I started working such a tool. By the way, it was a nice opportunity to play a bit with the Boost Graph Libraries. So, without further ado I present: C++ Dependency Analyzer.

The tool reads a file with compile commands and creates a dependency graph of it. Yes, I am aware that CMake can create a dependency graph but that one is limited. Also, for other reasons I’m interested in other information you can get from a build too, but well that’s long term. Anyways, a graph of a complete project (e.g. kdepim) in itself is not very useful because its huge and cluttered. Therefore I started working filtering options. In the image you see the subgraph of kdesupport containing only the dependencies of akonadiserver. All this is in a very early state. However, colleagues of mine put heavy pressure on me to stop working on my thesis and make it available. Thanks Volker, Keven!

So, if you’re sure you want to play with it:

git clone git://

Short howto

  1. The source contains a script called: scripts/ Modify the contents of it (esp. the path) to meet your system.
  2. Set the CXX environment variable to let it point to the script.
  3. Build the project for which you want to create the graph. Make sure you use -j1 otherwise the resulting file will get borked.
  4. Start the depency analyzer and create a new project. Point the build session file to the file generated by the make.


As said, the whole thing is in very early stages. The UI you see for filtering is *only* functional for the targets part of it. That is, you can double click a target in the tree and update the graph. This will give you the dependencies for the selected targets only. The rest of the filtering is only UI. The current level of filtering is hard coded to leave out any source files and any object files, though both are part of the underlying graph.

The actual information comes from a file which contains lines existing of: the current working directory and the arguments passed to the compiler. I only wrote a parser for g++ and tested it only with g++ 4. It has been used already to create the graphs for kdesupport and kdepim though.

It is really slow for large graphs.


Some ideas to make you work on this application…. eh to document what I’d like to add to the application somewhere in the near future.

I hope to find some time to make the filtering a bit more complete. I already started working on the ui for it, but it still needs to get integrated in the graph logic. Especially useful would be to add duplicate edge filtering, because duplicate edges really clutter the graph at the moment.

A wrapper which stores the same information in an unborked way when using multiple compile jobs would be nice too, to reduce the time needed to generate graphs. However, this is a low priority thing for me.

Exporting graphs to svg would be nice too. This should be trivial, thus boring, thus don’t expect it too soon.


I’m quite convinced that this application has a great potential for maintenance and packaging purposes. But again (I can’t repeat it enough I guess) its in a very premature state at the moment. Expect failures, crashes, your cat eaten and who knows what more which you will have to solve yourself. Add to that that I’m very limited in time currently so feature requests will only be accepted in the form of patches.

Last but not least I’d like to thank our stl and boost guru Marc Mutz for helping me to get started with BGL!

Tagged , | 12 Comments