Scares of Technical Debt (Part I)

Let’s first put ourselves in the shoes of someone running an internet business. Typically, internet businesses are easy to start: you come up with a really good business model and develop a reasonably good website that provides a service users want. But business is not just about putting a product on the market. It is also about surviving, that is, sustaining enough activity to generate at least adequate returns, although the real target should be growth. So an evolving business always grows in complexity, and for an internet-based one, the technical complexity of the application inevitably grows with it.

Cycle of developing technological business solutions

Let’s take a really simple scenario. As the owner, you see a new market segment, let’s say your local authorities, that could analyze the transaction data you have accumulated over time as an operational internet business. Apart from the legal requirement of building a feature to obtain fresh consent from users to share their data, exposing that data would take major development effort. Given that a first-mover advantage is at stake in a competitive marketplace where the barriers to entry are certainly low, the first thing the business won’t have is time. It’s that precious. Yet when it comes to developing such features, what a team most often asks for is… well, time. This is always the case, and it is why development teams always try to reduce “lead time”: the later you are, the more opportunities you miss to earn a buck from what you built. Put more precisely, the longer it takes, the higher the opportunity cost, the more revenue is lost, and the more it costs to pay developers for every minute or hour they spend realizing the new service. But being too fast and furious has its implications too, such as serious damage to your reputation caused by a major bug. So this is like living on a knife’s edge, and teams have enthusiastically adopted Agile, Kanban and paradigm shifts such as DevOps. Still, the effectiveness always depends on how well you execute them, and the controls that are supposed to be in place might only get a glance over the shoulder, leaving the product prone to issues in either the short or the long run. So the whole situation is tricky!

Now let’s look at the long run, at “sustaining” again. As above, the application should be easily extensible so that it can meet new demands quickly. If you, as a team, can’t do that… then you might well be repaying a huge loan taken out against your technology: a large “Technical Debt”. Formally, this can be defined as the

“Implied cost of additional rework caused by choosing an easy solution now instead of using a better approach that would take longer”

Usually tied to how a solution is designed, structured and implemented, technical debt is like building a house with a bank loan when you have nothing in your pocket, and still building it cheaply and to a low quality. You then have to take out more, smaller loans to patch up the house over and over again, until you find yourself repaying loans with other loans: a repayment deadlock called “bankruptcy”. Now replace “loan” with your team’s “resources”: time, money, physical resources and so on. Badly designed and implemented software products can go bankrupt “technically” in the same way: after a certain point they are no longer extensible, because the complexity of the system has become unmaintainable. This is the danger of unmanaged “Technical Debt”.

An easy way to extend the multi-plug, but where’s the space for a new plug? Feeling a deadlock?
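To put rough numbers on the loan analogy, here is a toy simulation in Python. It is only a sketch: the sprint capacity, the size of each shortcut and the “interest rate” are hypothetical parameters I made up, not measurements, and `simulate_debt` is an illustrative name. Each sprint, the team first spends a fraction of its outstanding debt working around old shortcuts (the interest), and only what is left goes to new features. If the interest exceeds the capacity, the shortfall is added to the debt, which is the “loan to repay a loan” deadlock.

```python
def simulate_debt(capacity: float, shortcut: float, rate: float,
                  sprints: int) -> list[tuple[int, float, float]]:
    """Toy model of the loan analogy; every parameter is a hypothetical assumption.

    capacity: story points the team can deliver per sprint.
    shortcut: new "principal" added each sprint by another quick patch.
    rate:     fraction of the outstanding debt spent each sprint working around it.
    Returns (sprint, debt at start of sprint, points left for new features).
    """
    debt = 0.0
    history = []
    for sprint in range(1, sprints + 1):
        interest = debt * rate            # effort lost to working around old shortcuts
        paid = min(interest, capacity)    # the team cannot spend more than it has
        unpaid = interest - paid          # the shortfall is "borrowed" again
        features = capacity - paid        # what is left for the business
        history.append((sprint, round(debt, 1), round(features, 1)))
        debt += shortcut + unpaid         # another small loan to patch the house
    return history


# Print every tenth sprint of a 50-sprint run with made-up parameters.
for sprint, debt, features in simulate_debt(capacity=20, shortcut=5, rate=0.1, sprints=50)[::10]:
    print(f"sprint {sprint:2d}: debt {debt:6.1f} pts, left for features {features:4.1f} pts")
```

With these made-up numbers, the capacity left for features shrinks every sprint and hits zero at sprint 41. After that, the debt grows faster each sprint even though the team keeps taking the same size of shortcut. The exact sprint means nothing; the shape of the curve is the point.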

Lehman’s Laws of Software Evolution

There are eight laws, formulated one by one from the 1970s over the following decades. They apply to E-programs (E-type programs), the kind of software most relevant to an internet business, which by definition are

“Developed to perform some real-world activity. Behavior is strongly linked to the environment in which it runs”

The first two laws, “Continuing Change” and “Increasing Complexity”, are the conundrum discussed in the introduction. The application needs to evolve, but evolving increases technical complexity, and complexity brings problems. For instance, changes can introduce unforeseen logic- or feature-level bugs, building layer upon layer can lead to incompatible boundaries such as adapter violations, and some old layers can even go stale, either technologically or in how the application logic uses them. Therefore, Lehman is right… unless complexity is managed, you are technically not far from bankruptcy.

How do you see it coming?

As with a loan, technical debt doesn’t necessarily halt your spending power, or your ability to make changes to the code. But try asking yourself questions like these:

1. “Am I more comfortable and confident estimating the tasks at hand now… or was I before?” If you are not, the team is probably exhausting itself with more and more unforeseen work: work to redo or improve something done now or earlier that affects what is being built now.

2. “Is my team faster now, or was it back then?” Apart from the initial stages of work, which are probably spent consolidating a technological platform on which the application can grow easily, one should ideally see the team delivering faster now than before, thanks to the experience and maturity it has accumulated with the technology stack over time. If not… well, something is working against those conditions.
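One way to answer the second question with data rather than gut feeling is to compare recent velocity with earlier velocity. The sketch below assumes you record completed story points per sprint; the sample numbers, the two-sprint warm-up (standing in for the platform-consolidation stage mentioned above) and the three-sprint averaging window are all hypothetical choices, and `velocity_trend` is an illustrative name.

```python
from statistics import mean


def velocity_trend(points_per_sprint: list[float], warmup: int = 2, window: int = 3) -> float:
    """Ratio of recent velocity to earlier velocity (below 1.0 means slower now).

    warmup: early sprints skipped while the platform was being consolidated.
    window: number of sprints averaged at each end.
    Both defaults are arbitrary choices for illustration.
    """
    usable = points_per_sprint[warmup:]
    if len(usable) < 2 * window:
        raise ValueError("need at least 2 * window sprints after the warm-up")
    earlier = mean(usable[:window])
    recent = mean(usable[-window:])
    return recent / earlier


# Hypothetical completed story points for eight sprints.
history = [12, 18, 30, 32, 31, 27, 24, 22]
ratio = velocity_trend(history)
print(f"recent velocity is {ratio:.0%} of earlier velocity")
```

A ratio well below 1.0 is the “slower than back then” symptom. Story points are only comparable across sprints if the team’s estimation scale has stayed roughly stable, which is exactly what the first question probes.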

So technical debt is rather a hidden illness: you can only see it retrospectively, through its symptoms, by asking questions like those above. Also, if you are handed the technical leadership of a team that has already accumulated some technical debt (which happened to be one of my own experiences, taking over a team)… you know immediately that the team has underlying problems when the delivery manager briefs you at the start with, “Mallie (‘Brother’ in Sinhalese), I’m feeling bad about the delivery date now because for the last few sprints we have consistently failed to complete the tasks we needed to. Can you look into this and fix what’s wrong?” (Seriously, a true story, and maybe let’s not discuss my first impressions of joining this team.)

These symptoms can be clearer to developers, as they often materialize as “Improvement” tasks or “Bugs” on your task board, with bugs being the scariest. For instance, suppose there is a trend of tasks moved to manual testing ending up in the “QA Halted” column and spawning a bug task in the “To-do” backlog… and maybe these improvements or bugs sometimes cycle from “Development” to “QA” and back to “Development” and so forth. That points to a probably incomplete specification. But mind you, this is not always the fault of the BA or whoever wrote the specification (in fact, they can be totally innocent); it can also result from infrastructure, code logic, technology or even architecture that cannot support the new feature being written on top of the existing code base. These can easily be surprises that a team unearths during development if the initial system design is really bad. And bad surprises aren’t good news, regardless of the context!
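The cycle from Development to QA and back is easy to spot if your board can export each task’s column history. This is a minimal sketch assuming such an export exists as a list of column names per task; the task IDs, the “Done” column, the history format and the threshold of two bounce-backs are hypothetical. It only points at where to look: as said above, it tells you nothing about whose fault the bounce is.

```python
# Hypothetical board columns, named after the ones mentioned above.
QA_COLUMNS = {"QA", "QA Halted"}
DEV_COLUMN = "Development"


def bounce_backs(history: list[str]) -> int:
    """Count how often a task moved from a QA column straight back to Development."""
    return sum(
        1
        for prev, curr in zip(history, history[1:])
        if prev in QA_COLUMNS and curr == DEV_COLUMN
    )


def flag_ping_pong(tasks: dict[str, list[str]], threshold: int = 2) -> list[str]:
    """IDs of tasks that bounced back at least `threshold` times, sorted."""
    return sorted(t for t, history in tasks.items() if bounce_backs(history) >= threshold)


# Hypothetical column histories exported from a task board.
tasks = {
    "TASK-1": ["To-do", "Development", "QA", "Done"],
    "TASK-2": ["To-do", "Development", "QA", "QA Halted", "Development",
               "QA", "Development", "QA", "Done"],
    "TASK-3": ["To-do", "Development", "QA", "Development", "QA", "Done"],
}
print(flag_ping_pong(tasks))
```

Here only TASK-2 is flagged. Watching how the number of flagged tasks changes from sprint to sprint is more telling than any single task.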

At the business level, if you see your business impeded, that is, if the spending isn’t reflected in what actually gets delivered, there is a good chance the development team is struggling to deliver fast enough because of some internal technical impediment.

What you see from each role’s perspective when technical debt exists

How to tackle it?

Typical risk or issue management follows a very common formula: Identify, Measure (Quantify), Evaluate, Plan, Execute, Monitor and Control. Applied here, you might:

1. Review the code, infrastructure, etc., manually or automatically.

2. Quantify the issues identified above by count or by other severity metrics.

3. Evaluate the criticality of each issue, assign a priority and, maybe, think critically about whether it is worth fixing at all.

4. Plan how the issues will be fixed in the short and long run, i.e., which issues to fix now in the current sprint, which in upcoming ones, which to take on separately as an initiative and which to ignore entirely, and how much effort and resources to put into getting that done.

5. Execute the plan.

6. Continuously check whether it is making any progress against the symptoms.

7. Finally, if it is making progress, make sure that no new known issues, or only a tolerable number of them, are introduced (do a bit of policing, especially if the team lacks experience).

Typical Approach
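Steps 2 to 4 are the ones teams most often try to mechanize. Here is a minimal sketch of what that could look like, assuming each issue found in step 1 has been given a severity, an impact and an effort estimate; the 1 to 5 scales, the thresholds, the issue keys and the names `Issue` and `plan` are all hypothetical. Note that it ranks by criticality rather than by how quickly an issue can be closed, and it routes anything bigger than a sprint into a separate initiative instead of letting it linger in the backlog.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    key: str
    severity: int       # 1 (cosmetic) to 5 (blocking); hypothetical scale
    impact: int         # 1 (one module) to 5 (whole product); hypothetical scale
    effort_days: float  # rough estimate from the team


def plan(issues: list[Issue], sprint_days: float = 5.0) -> dict[str, list[str]]:
    """Steps 2 to 4: quantify, evaluate and sort issues into plan buckets.

    The thresholds (6 and 15) are arbitrary illustrations, not recommendations.
    """
    buckets: dict[str, list[str]] = {
        "current sprint": [], "upcoming sprints": [], "initiative": [], "ignore": [],
    }
    # Most critical first, so each bucket is already in priority order.
    for issue in sorted(issues, key=lambda i: i.severity * i.impact, reverse=True):
        criticality = issue.severity * issue.impact
        if criticality < 6:
            buckets["ignore"].append(issue.key)        # not worth the while
        elif issue.effort_days > sprint_days:
            buckets["initiative"].append(issue.key)    # too big for one sprint
        elif criticality >= 15:
            buckets["current sprint"].append(issue.key)
        else:
            buckets["upcoming sprints"].append(issue.key)
    return buckets


backlog = [
    Issue("DB-INDEX", severity=4, impact=5, effort_days=3),
    Issue("MONOLITH-SPLIT", severity=5, impact=5, effort_days=40),
    Issue("DUP-CODE", severity=2, impact=3, effort_days=1),
    Issue("NAMING", severity=1, impact=2, effort_days=0.5),
]
print(plan(backlog))
```

Even so, treat the output as a starting point for discussion rather than a verdict, since every number in it describes a symptom.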

This is easy… isn’t it? I might as well end this blog here by saying you can use a static code analysis tool such as SonarQube to automate steps 1 and 2, and the rest is up to how the team plans and executes. But that is a hugggeeee noooooooo ….!

This approach can be a rabbit hole. Remember, it’s only the symptoms that are visible? So this approach may well treat the symptoms, not the illness itself. Also, as long as you can live with a symptom (minor delays, marginally tolerable variations against estimates), you won’t care much until it becomes unbearable or compelling. So here are the risks:

1. First of all, you wait until it’s compelling. That means the team has already deferred some work, thereby “accepting rework or additional work”, and that work is now causing trouble, meaning you have already gone past the break-even point of cost versus revenue. In terms of cost, the moment the team agrees to an item that adds to the technical debt, it is accepting a future opportunity cost for the business. The problem is that, by “time-worthiness” (the time value of money), spending on something later is a costlier affair than spending on it now. So anything deemed not worth fixing now could cost at least a few more pennies later: an unestimated, larger cost that may or may not be bearable for the business once conditions allow the rework to actually happen.

2. A consequence of technical debt is that it stretches lead time, that is, it is negatively correlated with how fast the team delivers. So technical debt quantified against your current velocity may well be deceiving and prompt the team to ignore it. But because of this correlation, the more sources of technical debt are added, the slower the team becomes. So the technical debt measured now for a given set of unattended issues (symptoms) will be quantified at a higher value in the future: technical debt grows exponentially (now think about this together with the point above). So the values can be deceiving, and this is what I’d like to call being “sucked into the whirlpool”.

3. Tracking by number can be deceiving. Numerical goals can easily push developers to get the minor fixes done quickly. But does that really contribute to solving the problem? It might well not, and the most important issues, the most impactful ones and those that would take the most time to complete (which often satisfy both of the previous criteria) could still remain. It’s like chipping away only at the tip of the iceberg.

4. Symptoms require root cause analysis to diagnose the true causes of technical debt. Apart from all the resources spent doing that and the associated costs, addressing a root cause can trigger chain reactions, depending on how bad the architecture currently is. Impediments to development caused by architectural issues might require architectural revamps, which means that, unless the migration is done properly and with care, the product runs the risk of failures either at migration time (e.g., downtime, or loss of data due to an unplanned failure in a DB migration) or during the hardening period, due to incompatibilities and the like that start as a small incident but propagate into major issues, even at the client level. One instance of such devastation could be a huge penalty to the business for violating privacy laws, because a newly introduced cache wrongly returns the data of another cached user. There, the real performance issue might be due to DB design (not indexing properly, or not indexing at all), an ill-designed thread pool for DB access and so on, all amounting to technical debt; but if the metric being watched is “average response time”, the team may try to solve it with the cache instead. So panicked approaches, or blanketing actual issues with an indirect or easier solution for a quick win, might very well undermine “sustainability”.
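To make the cache example in point 4 concrete, here is a minimal Python sketch. `fetch_profile_from_db` is a hypothetical stand-in for the slow query, and `ResponseCache` is a toy read-through cache, not any particular library. The only difference between the leaky version and the safe one is what goes into the cache key: keying on the request path alone means the first user to hit an endpoint decides what everyone else sees.

```python
from typing import Callable, Hashable


def fetch_profile_from_db(user_id: int) -> dict:
    """Hypothetical stand-in for the slow, poorly indexed query."""
    return {"user_id": user_id, "email": f"user{user_id}@example.com"}


class ResponseCache:
    """Toy read-through cache; `key_fn` decides what identifies a cached response."""

    def __init__(self, loader: Callable[[int], dict],
                 key_fn: Callable[[str, int], Hashable]) -> None:
        self._loader = loader
        self._key_fn = key_fn
        self._store: dict[Hashable, dict] = {}

    def get(self, path: str, user_id: int) -> dict:
        key = self._key_fn(path, user_id)
        if key not in self._store:            # cache miss: go to the database
            self._store[key] = self._loader(user_id)
        return self._store[key]


# The quick win: key on the request path only; the user is not part of the key.
leaky = ResponseCache(fetch_profile_from_db, key_fn=lambda path, user_id: path)
# The fix: the user is part of the key, so one user's entry never serves another.
scoped = ResponseCache(fetch_profile_from_db, key_fn=lambda path, user_id: (path, user_id))

leaky.get("/me/profile", user_id=1)
print(leaky.get("/me/profile", user_id=2))   # user 2 receives user 1's profile
```

Including the user in the key fixes the leak, but notice that neither version touches the slow query itself: the missing index or the thread-pool problem is still there, just hidden behind a better average response time.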

So, all in all, numbers are deceiving, and what is said and done to make the numbers look better can be an act of fooling yourselves. Hence, teams can easily become “myopic” and miss out on investing in the real deal. If the team and the business are lucky enough, they can get away with as little as “a waste of some time”, but in most instances that is not the case. So, if you are in a team that is having, or has had, a conversation like the one in the graph below… know that there are more reasons to think about it.

Symptoms being treated

In a practical team setup with a limited time frame, technical debt always creates a dilemma between “improving existing code” and “developing new features”. So approaching it the right way can be anything from a lifesaver to the gateway to a successful software product.

In the next blog, let’s discuss what the relevant tool set and analysis (especially on the qualitative side) could be, based on my recent experience reviewing some source code designed and developed by undergraduates. That review shed light on some relationships between SonarQube and a preliminary version control analysis tool (built on Git logs at its core) that I developed recently. It also made me look back at some old experiences, connect a lot of dots, appreciate some of the approaches I introduced back then, and see their relevance to any application context. Taking those as case studies, I will relate how the overall team’s delivery performance and team dynamics were affected by poor design policies that amounted to technical debt.