Antifragile Software Culture

Nassim Nicholas Taleb’s book “Antifragile” makes a powerful observation that applies to software development:

Sensitivity to harm from volatility is tractable, more so than forecasting the event that would cause the harm.

Taleb, Nassim Nicholas (2012-11-27). Antifragile: Things That Gain from Disorder (Incerto) (Kindle Locations 339-340). Random House Publishing Group. Kindle Edition.

There are many software development measures that indicate the quality/wellness/adaptability/correctness of the code: cohesion, coupling, bug density, number of unit tests, code coverage, etc. Many software developers, managers & executives regard these measures as academically interesting, but of little or no business value.

Taleb’s statement directly supports the value of software measures.

For argument’s sake, take code coverage, which measures how much of the code the tests exercise. If code coverage is not measured or is close to zero, clearly any change is very high risk. As code coverage approaches a meaningful number (assume 80%), it’s easy to see that the volatility of the system is much more under control than in the zero case.
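As a rough sketch of what tracking this measure could look like, the snippet below computes line coverage from two counts and compares it against the 80% figure assumed above. The function names and the sample counts are hypothetical; real numbers would come from whatever coverage tool the team already uses.

```python
def coverage_percent(covered_lines: int, total_lines: int) -> float:
    """Return line coverage as a percentage (0.0 when there is no code)."""
    if total_lines <= 0:
        return 0.0
    return 100.0 * covered_lines / total_lines


def meets_threshold(covered_lines: int, total_lines: int,
                    threshold: float = 80.0) -> bool:
    """True when coverage reaches the threshold (80% is the assumed target)."""
    return coverage_percent(covered_lines, total_lines) >= threshold


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(meets_threshold(850, 1000))  # well-covered code base
    print(meets_threshold(120, 1000))  # close to the "zero case"
```

The point is not the arithmetic but the habit: once the number exists, a change that drags it down is visible before it ships rather than after.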

Let’s take the opposite approach: no or little code coverage with reliance on hunches or guesses as to the ability to predict an event that breaks the system. In other words, the culture is to be completely reactive to bugs, customer complaints and so on.

In today’s world, software attacks are a totally new burden on the development team. This non-trivial burden is the last thing a team needs, and it fits the case where naive hunches regarding the vulnerability of software are 100% wrong. We have no idea where the next attack is coming from and do not have the time or resources to fix the vulnerabilities. (But they must be fixed.)

The need for software development metrics is higher than ever.


Building Software in Model-T Factories

Ford Motor Company frequently updates its factories for each new model. In fact, a factory will completely cease production whilst updates take place – sometimes for weeks at a time. These shutdowns are planned, scheduled, detailed, negotiated, etc., as each one is a very costly event.

Not performing these shutdowns is even more costly, as Ford would quickly go out of business.

Software people tend to have a different approach: we think updates can be done in place while delivering business value. Shutdowns are not what we do.

So then, how well do we do it? Do you really think you can refactor your object-oriented database interface into a NoSQL database via a series of 2-week iterations? What percentage of the team will work on the refactoring? How will the integration of the final refactored branch be handled? How are other new features & defects being integrated into the NoSQL branch? How are you handling turnover during this refactoring effort (better to pretend it won’t happen, huh)? What percentage of your team has the intellectual bandwidth to manage this much concurrent change? Do you have it? Does your team really think they understand the depth & breadth of the change? How do they have the expertise to affirm that?

Let’s consider a formal “shutdown” to do some really heavy lifting.

If the team stopped delivering new features for 4 or 6 weeks, would the customers really care? Even notice? Are your customers so demanding that frequent updates are a must-have? Have you ever discussed this with them? Would the team perform better with a simple definition of success? (The refactoring.) Would bugs be easier to triage & resolve? Wouldn’t it be great for developers, testers, documentation, etc. to be able to have a single, focused discussion? Would progress be easier to track & evaluate? Would your confidence be higher at each step along the way? Would everyone be glad to have a definitive start/stop?

Perhaps shutdowns deserve some consideration after all.


When to Rewrite Software

Eventually, old tools need to be replaced. Really.

[Image: sweetdrill]

Perhaps most of us won’t insist on using this drill.

Many have argued that it’s better to update/refactor than to completely replace. However, even old tools have lifetime limits. What criteria would help with this decision?

When the project was developed largely via on-the-job training (OJT). (See previous blog.) The burden embodied within such a project has long crushed productivity; it is time to move on.

When there are few or no unit/automated tests (you have nothing to lose). It’s true that the lack of automated tests is a hardship, but it’s likely that tests developed for the old technology will simply be of limited use.

If your product suite hasn’t harnessed common productivity gains, each component has its own set of unique bugs. The rewrite or replacement affords the opportunity to get this done.

If your baseline technology was all the rage 10 years ago and has since languished, your risks are many and exposure grows daily. Is the technology being updated for security vulnerabilities? How is the talent pool going to get filled? Can the old architecture support new technologies? How active and broad is the community support online? There are likely very competitive open source alternatives.

When the baseline technology environment is so entangled with old ways of doing work that it struggles to keep up with faster-paced changes (for example, if it has its own editor and cannot support a plug-in editor). Clearly this toolset is behind the productivity curve, with a low probability of catching up.

Basically, as with any tool whose productivity advantage is falling, not much analysis is needed to support the change. Does a new battery-powered, keyless chuck drill appeal to anyone?


On the Job Training Is NOT Scaleable

Training takes time & has opportunity costs. Assuming that “on the job training” (OJT) is an effective alternative has, in my experience, many times the direct and indirect costs.

Scaleable solutions can survive employee turnover, technology updates, process changes, etc., year over year. OJT is diametrically opposed to scale.

Formal training can be scheduled, priced, and delivered to a skill target, and options (vendors) can be evaluated. OJT is concurrent and intermingled with daily work, so it has no start or end; it cannot possibly be priced or costed; there is no line between a basic and an advanced skill; and there is no way to compare one alternative to another. (Is Bob or Carl the better self-taught student? Is project A or B the better place to use OJT?) This is the happy part of the comparison…

Without guidance, the student may not be sure when the basic skills are exposed, understood (how many relationships exist in software?) or mastered. Will these be known in 2 days or 2 weeks? The only hope is that they know they don’t know and are vigilant towards gaining that expertise. As for the notion of “scheduling” the completion of OJT, it would be better to assume it will never be completed.

OJT lacks a target or vision of success. The student may make small discoveries along the way that may lead to ultimate success, but the path will be unpredictable. Oftentimes self-taught skills are accompanied by trial & error. In software, we call these bugs. Bugs are time-consuming and very expensive. As noted in many research papers, finding bugs late is more expensive than finding them early. Formal training should provide that definition of success that greatly reduces the opportunity for bugs from the outset.

Self-taught education can be fascinating and can lead to very good retention, etc. – for that, I’m very accepting of the positive aspects of trial & error education. However, in a business world, these trials & errors become what is known as “technical debt”. Almost by definition, the self-taught person is going to introduce technical debt. Unless, of course, they are enabled to refactor their prior art into sufficient quality. Any takers on this?

As these trial & error bugs ship, the team runs the risk of being painted into a corner by poor architectural decisions. It’s not the student’s fault: they don’t have the expertise to be aware of their blind spots or to know of the upcoming pitfalls; they are happy to have shipped something. These bad decisions, over time (scale), can become very costly to mitigate or may even preclude other great options for the customer.

Worst of all, OJT destroys the incentive to learn. After sweating & agonizing to get to some hard-earned threshold of knowledge that delivers a feature, it takes a very special person to then take on the personal risk of new challenges. It’s much safer to simply accept the status quo: “I’ve learned one way to get it to work and that’s good enough!” The company has neutered an incredible resource.


Always Something for You

The article “Microsoft’s 16 Keys To Being Agile At Scale” is obviously about a large company. What was surprising is how many very small changes are part of the big company’s approach. Every team can harness some version of item #7 (bug count < 40 – always) to ensure quality is being maintained.
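A bug cap like item #7 is simple enough to check mechanically. The sketch below is only an illustration: the limit of 40 comes from the item as quoted above, while the function name and the policy of switching to bug fixing once the cap is reached are assumptions, not something the article spells out here.

```python
BUG_CAP = 40  # from item #7 as quoted: bug count < 40, always


def next_focus(open_bugs: int, cap: int = BUG_CAP) -> str:
    """Suggest where the team should spend its time given the open bug count."""
    if open_bugs < 0:
        raise ValueError("open_bugs cannot be negative")
    # Assumed policy: at or above the cap, bug fixing takes priority.
    return "features" if open_bugs < cap else "fix bugs"


if __name__ == "__main__":
    # Hypothetical bug counts, for illustration only.
    for count in (12, 39, 40, 57):
        print(count, next_focus(count))
```

A smaller team might pick a smaller cap; what matters is that the rule is a hard number rather than a judgment call.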

There are many other items useful to teams & companies of all sizes.

Why bother commenting on this? Well, there isn’t enough time to make all the mistakes in the world so it would be better to learn from others. Especially when considering scale; if an idea doesn’t scale, are you precluding outrageous success?

#14 is a wonderful test of your commitment to transition from waterfall to agile. The “hardening” sprints are a sign of noncommittal management, which is immediately sniffed out by developers. Do not expect developers to embrace agile if the management team is weak. I love how Microsoft bit the bullet on this.


Status Quo == Obsolete Context?

Seth Godin’s definition of art is along the lines of

Art is unique, new, and challenging to the status quo.

Godin, Seth (2010-01-19). Linchpin: Are You Indispensable? (p. 86). Penguin Publishing Group. Kindle Edition.

If you get the opportunity to work with a stagnant team, the defense of the status quo can be a challenging obstacle to overcome. So then, what tool(s) are available? How can the team move forward?

The book “Pragmatic Thinking & Learning: Refactor Your Wetware” by Andy Hunt discusses the value of context. This body of work speaks to the value of context in many, many activities.

What context defined the rule(s) of the status quo?

For example, the context of automated testing has changed with the advent of virtualization, such that many old rules simply no longer apply. As an artist, one needs to challenge the status quo. Perhaps it can be argued that the context of the status quo has changed and the current context invalidates the status quo.

It can be easy to defend the status quo, by simply stating: “We’ve always done it this way.” It’s harder to defend practices that were relevant in a clearly obsolete context.


Vacation Stories for Agile

Has anyone created vacation stories for Scrum or Kanban?

Vacations are items that must be delivered. Who cares if they are actual “work”? However, consider what happens if these stories are in the queue; the work-item variation will simply take care of itself! All of those “missing” man-days? Present & accounted for!
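To make the accounting concrete, here is a minimal sketch in which vacations sit in the queue as ordinary stories. The `Story` class, its fields, and the numbers in the example are all hypothetical; any real board or tracker would have its own way to tag such items.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Story:
    title: str
    person_days: int
    is_vacation: bool = False


def sprint_summary(stories: list[Story], team_size: int,
                   sprint_days: int) -> dict[str, int]:
    """Account for every person-day in the sprint, vacations included."""
    capacity = team_size * sprint_days
    vacation = sum(s.person_days for s in stories if s.is_vacation)
    work = sum(s.person_days for s in stories if not s.is_vacation)
    return {
        "capacity": capacity,
        "vacation": vacation,
        "work": work,
        "unplanned": capacity - vacation - work,
    }


if __name__ == "__main__":
    # Hypothetical sprint: 5 people, 10 working days.
    queue = [
        Story("Export report", 30),
        Story("Fix login defect", 8),
        Story("Bob on vacation", 5, is_vacation=True),
        Story("Carl on vacation", 3, is_vacation=True),
    ]
    print(sprint_summary(queue, team_size=5, sprint_days=10))
```

With the vacation stories in the queue, the gap between capacity and delivered work stops looking like lost days and shows up as planned items on the board.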

Agile is supposed to be visual – vacation stories help make it so.
