Every once in a while, software needs to be ported; this seems to be the rule rather than the exception. New versions of libraries are released, libraries stop being maintained, or newer libraries appear that outperform the one currently used. So there you are, sitting as a developer and staring at the (potentially huge) task lying in front of you.
Or imagine you’re the author of a library yourself. How do you decide how to split your library into logical modules? Or, given a couple of projects that use your library, where do you start when you want to optimize it?
These two scenarios are closely related to the research I’m doing for my master’s thesis at KDAB. When doing this kind of work, you would like to have tools that help you estimate the effort and, where possible, automate the work. For Java lots of tools are available; for C++, however, it is a totally different ball game. I guess most of you know why: parsing and analyzing C++ is, well, kind of difficult. There are good proprietary tools available, but part of my assignment was to evaluate how well a FOSS tool would do.
For several reasons, which I won’t outline here, I chose to work with the KDevelop parser. It doesn’t do a full semantic analysis, and unfortunately it doesn’t store all the information I need during analysis, but it gave me a decent starting point. The idea was to build a C++ query engine (e.g. find uses of type X, find uses of method X::y(), etc.) and a transform engine (e.g. given a use of X::y(), transform it into foo<X>()). As said before, KDevelop doesn’t store all the information needed to do a fully correct transformation, but some useful transforms can already be done (more on that in another blog post).
So, given a C++ query engine, a partly working transform engine and a (huge) porting job, what would we like to know, and what can the plugin currently show? We start by putting in some queries for constructs we can’t port (or even find) with our current scripts. We load the queries and run them on the project we want to port. First results:
In the tree you see an item for each file, with the results of the different queries run on the project’s code base as its children. Double-clicking a result brings you to its location in the editor. When a transform is defined for a query, selecting that query enables the transform button, and once transforms are applied you can view a diff. But, as said, the transform engine is still in a rather poor state. However, at first we’re not directly interested in applying the transforms. Remember, the customer hasn’t signed any contract yet =;). So how do we know what to put in the contract in the first place? Let’s enter the project view:
Here you see a more aggregated view of the results of the queries that were run. On the left side you see the files on which the queries were run; on top you see which queries were run. Each cell contains the number of hits for a query in a file, except for the first column, which shows the total of the row. Of course, the results can be sorted by each column separately. So what can we learn from this picture? The first thing that stands out when sorting by total is that the curve is broad at the top and narrows quite fast. Given that the queries represent the amount of work to do, this means that most of the work has to be done in only a couple of files. Now, what kind of work do we mostly need to do? Let’s sort by one of the queries:
Hey, that curve there on the right seems to correlate with the total curve. Hmm, so most of the work is due to query X. Aaaah, we know that this particular piece of code is [very easy|hard|very painful] to port, so we need to bill our customer [a bit|a lot|an insane amount]. (But of course relative to the number of files that are affected =;) ).
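The project view is essentially a matrix of hit counts with a totals column and per-column sorting. Below is a minimal C++ sketch of such a data model, assuming a hypothetical `Row` type (one per file, mapping query names to hit counts) and a hypothetical `sortByColumn` helper; it illustrates the idea and is not taken from the plugin itself.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical row of the project view: one file, with the number of
// hits per query (query name -> hit count).
struct Row {
    std::string file;
    std::map<std::string, int> hits;

    // The first column of the view: total hits over all queries.
    int total() const {
        int sum = 0;
        for (const auto& [query, n] : hits) sum += n;
        return sum;
    }

    // Hits for a single query; a query without hits in this file counts as 0.
    int count(const std::string& query) const {
        auto it = hits.find(query);
        return it == hits.end() ? 0 : it->second;
    }
};

// Sort rows in descending order of one column; an empty column name
// means "sort by total". stable_sort keeps rows with equal counts in
// their previous relative order.
void sortByColumn(std::vector<Row>& rows, const std::string& column) {
    auto key = [&](const Row& r) {
        return column.empty() ? r.total() : r.count(column);
    };
    std::stable_sort(rows.begin(), rows.end(),
                     [&](const Row& a, const Row& b) { return key(a) > key(b); });
}
```

Sorting by the empty column name gives the "sort by total" view, and passing a query name gives the per-query view in which a curve like the one for query X can be compared against the total.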
Up to now we’ve had the files as rows, but it is also possible to switch the headers, which of course also changes the meaning of the image. Let’s see:
Now we have the queries as rows, and sorting by total gives you an overview of which type, function or other construct is used most in the project you’re analyzing. Doing this for several projects that use the library you’re maintaining or developing can help you find out which parts of your library you should optimize first, what the impact of changing the signature of a particular function would be, or how to split your library into modules when it gets too large.
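Switching the headers amounts to transposing the hit table. A minimal sketch, again with hypothetical names (`Table`, `transpose`, `rowTotal`) and assuming the counts are kept in nested `std::map`s keyed by file and query; this is an illustration, not the plugin’s implementation.

```cpp
#include <map>
#include <string>

// Hypothetical hit table: outer key = row header, inner key = column header.
using Table = std::map<std::string, std::map<std::string, int>>;

// Switch the headers: a file -> (query -> hits) table becomes a
// query -> (file -> hits) table, and vice versa.
Table transpose(const Table& table) {
    Table swapped;
    for (const auto& [rowKey, cells] : table)
        for (const auto& [columnKey, n] : cells)
            swapped[columnKey][rowKey] = n;
    return swapped;
}

// Total of one row. In the transposed table this is how often one
// query (i.e. one type, function or other construct) is hit across
// the whole project.
int rowTotal(const std::map<std::string, int>& cells) {
    int sum = 0;
    for (const auto& [key, n] : cells) sum += n;
    return sum;
}
```

Summing `rowTotal` per query over the tables of several client projects would give a rough ranking of which library constructs are used most.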
The next thing I’ll be working on is visualizing the impact of transforms on the code base, especially showing which parts of the documents are modified by a defined transform and whether transforms overlap. More on that later.
I haven’t released the code yet, as it’s still far from mature. However, if you feel brave and want to play with it, drop me a mail and I’ll send you the source. It will eventually become available under some free license (had I read [ade]’s blog posts more carefully, I would have known which).
P.S. For those who want to know: these kinds of views are called Table Lens views, and they are very powerful for finding trends.