Exponential Growth of LoC is Not a Sign of Increased Productivity

By Gerald Mücke | September 14, 2015

One of the essential cornerstones of agile software development is its transparency. Teams continuously report on their progress, completed tasks, obstacles, team events, and the quality of their work. Capturing metrics is as much a part of the craft as the targeted improvements that these metrics make visible: what gets measured gets managed.

Capturing code metrics is largely automated today with tools such as PMD, FindBugs, SonarQube, and code coverage tools, and thanks to version control systems like CVS, SVN, or Git, the values can be tracked seamlessly over time. However, as with all statistics, the values must be interpreted and put into context to draw meaningful conclusions. Individual values taken out of context can not only convey a completely different message but also lead to the wrong conclusions.

Lines of Code (LoC) is a metric for the size of a code base. It can be measured at various scopes, e.g., product code only, product and test code, configuration files, test data, etc. This distinction matters for interpretation: if no tests are written, the LoC of the product can grow faster. If the tests are written later, the LoC of the product will hardly grow during that time. If the LoC of test data are included and the data contain many duplicate entries, which is not uncommon for test data, they inflate the LoC disproportionately without making a significant contribution to the product.
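To make the distinction concrete, here is a minimal sketch of counting LoC per category. The path conventions (a `test` directory, the listed file suffixes) and all function names are assumptions for illustration, not the behavior of any of the tools mentioned above; a real project would adapt them to its own layout.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical conventions; adjust to the project's actual layout.
CONFIG_SUFFIXES = {".xml", ".yml", ".yaml", ".properties", ".json"}
TEST_DATA_SUFFIXES = {".csv", ".sql"}


def classify(path: str) -> str:
    """Assign a file to a LoC category based on its path (assumed layout)."""
    p = PurePosixPath(path)
    in_test_tree = "test" in p.parts
    if in_test_tree and p.suffix in TEST_DATA_SUFFIXES:
        return "test data"
    if p.suffix in CONFIG_SUFFIXES:
        return "configuration"
    if in_test_tree:
        return "test code"
    return "product code"


def count_loc(text: str) -> int:
    """Count non-blank lines; comment lines are deliberately not filtered."""
    return sum(1 for line in text.splitlines() if line.strip())


def loc_by_category(files: dict) -> Counter:
    """Sum LoC per category for a mapping of path -> file content."""
    totals = Counter()
    for path, content in files.items():
        totals[classify(path)] += count_loc(content)
    return totals


# Made-up example input: three identical rows of test data
# outweigh the product code without adding anything to the product.
example_files = {
    "src/main/java/App.java": "class App {\n\n}\n",
    "src/test/java/AppTest.java": "class AppTest {}\n",
    "src/test/resources/users.csv": "a,b\na,b\na,b\n",
    "src/main/resources/app.properties": "k=v\n",
}
example_totals = loc_by_category(example_files)
```

Reporting the categories separately, rather than as one total, keeps duplicated test data or a late batch of tests from distorting the picture of the product code.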

A common misconception is to equate the growth rate of LoC with productivity. Certainly, high productivity makes the LoC grow faster, but that is only one aspect of many. Poor craftsmanship lets the code grow just as quickly. Clean programming work always requires questioning the decisions made and looking for better, simpler, or more elegant solutions. This often results in opportunities to reuse code, simplify it, or even remove parts. These tasks rarely add new functions to the product but are beneficial in the long term, as they increase maintainability and thus secure the long-term return on investment. However, these measures have a “negative” effect on the LoC: the count grows more slowly or even decreases, and the curve flattens.

It is also not uncommon for projects that have passed the start-up phase to have a flatter curve. The reason is often that the implications of new requirements in the increasingly complex code must first be analyzed. This analysis takes time, the changes must be well-thought-out so as not to break existing functionality, and tests must be adapted. The larger the code base, the “slower” the growth.
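One way to check whether a curve is flattening in this sense is to look at the relative growth between measurements rather than at the absolute numbers. The sketch below is a rough heuristic under stated assumptions: the function names and the sample figures are invented for illustration, and comparing the first and second half of the series is only one of many possible ways to judge a trend.

```python
def relative_growth(loc_per_period):
    """Return the relative LoC growth between consecutive measurements."""
    return [
        (current - previous) / previous
        for previous, current in zip(loc_per_period, loc_per_period[1:])
    ]


def growth_trend(loc_per_period):
    """Compare the average relative growth of the early and late periods.

    Exponential growth keeps the relative growth roughly constant,
    while a maturing code base shows a declining relative growth.
    """
    rates = relative_growth(loc_per_period)
    if len(rates) < 2:
        raise ValueError("need at least three measurements")
    half = len(rates) // 2
    early = sum(rates[:half]) / half
    late = sum(rates[half:]) / (len(rates) - half)
    return "flattening" if late < early else "steady or accelerating"


# Made-up measurements, e.g. one value per release.
maturing_project = [1000, 2000, 2800, 3300, 3600]
suspicious_project = [1000, 1500, 2250, 3375, 5063]
```

With these sample figures, `growth_trend(maturing_project)` reports a flattening curve, while `growth_trend(suspicious_project)` does not, because the code base keeps growing by about half its size in every period.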

If there is exponential growth of LoC in the later phases of a project without a significant increase in the size of the development team, there is a problem. The newly added code is often written “quickly”. Since the new code represents only a small part of the existing code base, percentage metrics such as Code Coverage, Documentation Coverage, or Duplicated Lines hardly fluctuate, or the changes are perceived as insignificant. As a consequence, tests are neglected (the coverage is already high enough), refactorings are not done, and a few duplicated lines are considered acceptable.
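A small calculation shows why percentage metrics react so weakly. The figures below are invented for illustration, and `overall_coverage` is a hypothetical helper, not part of any coverage tool.

```python
def overall_coverage(existing_loc, existing_coverage, new_loc, new_coverage):
    """Line coverage of the whole code base after new code is added."""
    covered = existing_loc * existing_coverage + new_loc * new_coverage
    return covered / (existing_loc + new_loc)


# Assumed numbers: 100,000 existing lines at 80% coverage,
# plus 5,000 new lines without a single test.
after = overall_coverage(100_000, 0.80, 5_000, 0.0)
print(f"{after:.1%}")  # prints 76.2%
```

The new code has 0% coverage, yet the overall value drops by less than four percentage points. Looking at the coverage of the newly added code on its own makes the neglect visible much earlier than the overall percentage does.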

The causes are often high time pressure, tighter budgets (not much more effort should be put into the project), or the frequent withdrawal of experienced developers from the project. The latter leaves less experienced developers on their own without peer review, possibly eager to make a good impression by producing a lot of code.

The result endangers the maintainability of the product and thus represents a significant cost risk in the long term. A product often outlives the team that looks after it, and the code is read by people other than those who wrote it. Poor maintainability makes onboarding more difficult, missing tests pose a risk for future changes, and missing documentation complicates understanding. All of these are risks and cost drivers in the late phase of a software product's life that could have been prevented by simple means.

The means of choice are the practices of good software craftsmanship: pair programming, peer reviews, test-driven development, red/green/refactor, and continuous documentation, to name just a few. The crucial thing is that the problem is recognized and addressed. Statistics like LoC are a tool for this, not a marketing or performance management instrument.