Lies, damned lies…and statistics. In KDE we have several means to guarantee the quality of our code base. These include unit testing, continuous building on several platforms, policies at various levels, and krazy2. What we fail to do, in my opinion, is to give insight into how these measures actually affect the KDE SC. That is, we do not have a high-level overview of how these measures develop over time. Of course, not all measures taken are quantifiable, but some of them are. Take unit tests, for example: it would be nice to see, at various levels, the number of tests, the number of performed tests, and the success ratio for each release. Giving insight into these kinds of numbers can, again in my opinion, be a good argument in high-level conversations (e.g. convincing someone to deploy KDE SC, or to base their software on the KDE SC stack). Of course, this only works when the numbers are convincing.
For this reason (and also because I have a fetish for numbers) I started to extend the architecture of krazy2 in order to be able to extract quality-related information from our code base. The first step was to add XML output support to krazy2, which made the tool set a bit more flexible. The next step was to add sloccount support to krazy2. The raw number of issues is relatively useless for comparison across releases, as it is only meaningful in relation to the lines of code of each release. Krazy2ebn was replaced by krazy2xml, which generates a set of XML files at the component/module/submodule level. A set of XSLT style sheets transforms these files to HTML for the EBN website. Additionally, I wrote a small tool which parses these XML files and puts the result in a database (currently there is support for sqlite and postgres). Up to this level the tools are currently in reasonable shape. So the numbers are there, but how do we give insight into them? The EBN site had some statistics, but these were really marginal. As I'm lacking both SQL and PHP skills, I had no good idea of how to do this, until a prof. of mine pointed me to IBM's manyeyes. This site lets you upload a dataset in a simple plain-text format and then provides you with various ways to create visualizations for the uploaded datasets. So another tool was added to the chain: db2manyeyes. Look at the source if you're not convinced about my lack of SQL skills; oh well, we don't really care about performance in this context anyway. It's not finished yet, that is, I'd like to add some more export functions, but the first results are there.
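To illustrate the XML-to-database step, here is a minimal sketch in Python. The XML layout and table schema shown are hypothetical (the real krazy2xml output and my tool's schema differ); it only demonstrates the idea of parsing per-check issue counts and storing them for later querying:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical krazy2-style report for one module; the real
# krazy2xml schema is different -- this is only an illustration.
SAMPLE = """
<module name="kdelibs" release="4.4.0">
  <check name="license">12</check>
  <check name="spelling">7</check>
</module>
"""

def load_results(xml_text, db):
    """Parse a krazy2-style XML report and store per-check
    issue counts in a simple database table."""
    root = ET.fromstring(xml_text)
    db.execute(
        "CREATE TABLE IF NOT EXISTS issues "
        "(module TEXT, release TEXT, checker TEXT, count INTEGER)"
    )
    for check in root.findall("check"):
        db.execute(
            "INSERT INTO issues VALUES (?, ?, ?, ?)",
            (root.get("name"), root.get("release"),
             check.get("name"), int(check.text)),
        )
    db.commit()

db = sqlite3.connect(":memory:")
load_results(SAMPLE, db)
total = db.execute(
    "SELECT SUM(count) FROM issues WHERE module = 'kdelibs'"
).fetchone()[0]
print(total)  # 19
```

Once the counts are in a table like this, exporting them as the plain-text datasets manyeyes expects is a matter of a few SELECT queries.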
So, the lies^Wstatistics. I created a topic center on manyeyes for KDE SC related statistics. You'll need a Java-enabled browser for interactive browsing of the data. It does not contain many datasets yet, but that will change over the next weeks. Most datasets currently there show how the SLOC evolved over time, at various levels, during the KDE4 life cycle. Some random facts related to SLOC:
- The code base grew by a factor of ~1.5 between KDE-4.0.0 and KDE-4.4.0.
- The code base contains, according to sloccount, 23 different languages, of which the largest part consists of (how surprising) C++, followed by XML, ANSI C, and C#.
- The three largest modules are, from largest to smallest: kdelibs, kdepim, kdeedu.
Well, all of this you can see for yourself in the various visualizations I created. This of course does not say much about the actual quality. Here I still have some work to do, but the first data is there: number of issues vs. lines of code at the component level. If you select both the number of issues and the ratio in this graph, you will notice two things:
- The number of issues fluctuates over the various releases, not showing a clear downward trend.
- The ratio (#issues/loc) does show a clear downward trend.
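The two observations above come straight from normalizing the counts. A small sketch of that normalization step, using made-up numbers (the real values are in the manyeyes datasets):

```python
# Made-up (issues, sloc) pairs per release -- the real numbers live
# in the manyeyes datasets; this only shows the normalization step.
releases = {
    "4.0.0": (10500, 4_000_000),
    "4.2.0": (11200, 5_100_000),
    "4.4.0": (10800, 6_000_000),
}

# Issues per thousand lines of code for each release.
ratios = {rel: issues / sloc * 1000
          for rel, (issues, sloc) in releases.items()}

for rel in sorted(ratios):
    print(f"{rel}: {ratios[rel]:.2f} issues/kloc")

# In this toy data the raw counts fluctuate (10500, 11200, 10800),
# while the ratio decreases monotonically.
ordered = [ratios[r] for r in sorted(ratios)]
assert all(a > b for a, b in zip(ordered, ordered[1:]))
```

The point of dividing by SLOC is exactly this: a flat or fluctuating issue count over a growing code base still means the density of issues is dropping.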
Conclusions: the number of issues (as said before) is not a useful measure in itself. We (as a community) do seem to care about the issues reported by krazy2 and actually fix them, or at least make sure that new code has fewer or no issues.
Final remark: how useful the checks performed by krazy2 are is a different discussion. I definitely would not suggest that these numbers show that the overall quality of KDE SC is improving. For that we would need similar statistics from the other quality measures we have. Anyway, it's a start in giving insight into these statistics. If you're interested in this topic, want krazy2 stats at a specific level, or have ideas for improvements, please find me in #ebn on freenode.
Thanks, Adriaan for providing this wonderful graphic!