First, a small remark for planetkde readers: this post is not specifically about KDE, but I believe it is an interesting read for the developers among you. It is about language tools for C++, and in particular about integrating the clang framework into this kind of tool. The post became longer than expected, so I’ll split it in two. In the first part I’ll give a bit of background on why I’m interested in these tools, while in the second part I’ll go into more technical detail on using clang.
I have been fascinated by language tools for a long time, in particular those meant for C++. This started back at university, where I took a course on software maintenance and evolution. The fascination was reinforced by the internship I did at KDAB, where I worked on a KDevelop-based code querying and rewriting tool for C++. A bit later I also wrote a small program to extract dependency information, which currently has the not-so-cool name Cpp Dependency Analyzer.
Starting a PhD basically killed all side projects for a good while, but lately I found some time and energy to pick this up again. The dependency analyzer is still far from being very useful, for two reasons: first, there is still some infrastructure missing; second, the UI needs some love. Two recent developments triggered my interest though: the first being my ever-nagging desire to try out clang, and the second the post from Sandro Andrade on an implementation of hierarchical edge bundles. As an InfoVis researcher, the latter immediately got my attention, and I hope to see further improvements in this project. However, precisely because my research is in InfoVis, I decided to have a look at clang first (i.e. one should separate work and spare time, no?!).
Let’s set the context first. Why a dependency analyzer? It has been estimated that 80% of the software life-cycle is spent on software maintenance, and that 40% of this cost goes to software understanding. This observation makes tools that help to simplify this task quite important. Although not always directly relevant for individual developers (depending on the kind of information the tools provide), these tools can be important for integrators who want to incorporate software components into their product and therefore need indications about the quality and complexity of those components. Another use case, of particular interest for open-source software components, is the one where third parties want to add functionality to a component. Tools that help them understand the structure of the program could contribute to lowering the barriers.
The dependency analyzer currently extracts the following information from a compile log: compiled source files, generated object files, generated libraries and executables, and linked libraries. Additionally, it can extract secondary linked libraries (i.e. libraries that are linked by those directly used in the project) in a separate run. You see what is missing here? Right, headers. What I have wanted for a long time is some means to extract the included headers from the compiled source files of a directory. Not only the ones included directly by the compiled sources, but the full tree. Once you have this information, you are able to calculate useful metrics such as the build impact of project headers and the build costs of source files (see build cost analysis). This is a feature I would like to add at some point.
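To make the idea of the "full tree" a bit more concrete, here is a minimal sketch in C++. It is not code from the dependency analyzer: the `IncludeGraph` type and the functions `transitiveIncludes` and `buildImpact` are hypothetical names made up for this illustration. It also assumes that the direct includes of every file are already known, and it uses a simplified notion of build impact, namely the number of compiled source files that include a given header directly or indirectly.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: each file maps to the files it includes directly.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Collect every file reachable from `file` by following direct includes.
// The `seen` set also protects against include cycles.
std::set<std::string> transitiveIncludes(const IncludeGraph& graph,
                                         const std::string& file) {
    std::set<std::string> seen;
    std::vector<std::string> pending{file};
    while (!pending.empty()) {
        const std::string current = pending.back();
        pending.pop_back();
        const auto it = graph.find(current);
        if (it == graph.end()) {
            continue;  // no include information recorded for this file
        }
        for (const std::string& included : it->second) {
            if (seen.insert(included).second) {
                pending.push_back(included);  // first visit, explore it later
            }
        }
    }
    return seen;
}

// Simplified build impact (an assumption of this sketch): the number of
// compiled sources that would be recompiled if `header` changed.
std::size_t buildImpact(const IncludeGraph& graph,
                        const std::vector<std::string>& sources,
                        const std::string& header) {
    std::size_t affected = 0;
    for (const std::string& source : sources) {
        if (transitiveIncludes(graph, source).count(header) > 0) {
            ++affected;
        }
    }
    return affected;
}
```

For example, if main.cpp includes widget.h and widget.h includes base.h, then `transitiveIncludes` for main.cpp yields both headers, although only widget.h is included directly. Recomputing the closure for every source is wasteful for large projects; caching the result per file would be an obvious refinement.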
You might wonder why I do not use KDevelop, which provides this information as well. Let me briefly go over this issue. KDevelop was built with a certain philosophy: providing an IDE with good support for a wide variety of languages. Being good engineers, the KDevelop developers abstracted quite a lot of things to reach that goal, also in the area of language support. I really think they did quite a good job here, but it comes at a price. Part of that price is that the AST has been generalized as well. Once you want to do more complex things with a language (such as querying for code constructs or rewriting code), this becomes incredibly painful. At that point you want the most detailed representation of the source you can get. This is not a very strong justification for the needs I currently have, but I do have some wild dreams which would require such a representation. And anyhow, I just wanted to play with clang for a bit. To be continued…