What is it that makes big software projects hard to plan and keep on schedule?
In 1975 Frederick Brooks published a collection of essays on this topic, based on his experience building OS/360. By Brooks’s own account, the project did not go well. The system was finished late, used more memory than it should have, blew past the budget many times over and initially didn’t even work that well.
While grounded in specific anecdotes, the book aims to record general observations about the difficulty of managing big software projects. Here are the key ideas from the book, with my added commentary on how it reads in the 2020s.
The mythical man-month
The most famous concept of the book is the idea of the “man-month”. It’s a mythical unit of productivity, defined as the amount of work that one person can perform in a month.
It’s “mythical”, because, well, it doesn’t exist! As the author points out, the effect of adding a person to the project depends on many variables: the nature of the work, the amount of coordination needed to collaborate with others, etc.
The non-existence of the “man-month” as a unit is further illustrated by thought experiments comparing different kinds of tasks.
Parallelizable work
Picking strawberries is perfectly parallelizable. By moving from one worker to two, we can expect to cut the time needed to pick the fruit in half. As we keep adding people, we eventually get diminishing returns. (Adding the 100th worker impacts the completion time less than adding the second one.)
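To make the diminishing returns concrete, here is a minimal Python sketch that assumes picking is perfectly divisible with no coordination cost, so n workers finish in T/n. The `TOTAL_HOURS` value and the function names are hypothetical, chosen only for illustration.

```python
# Toy model of perfectly parallelizable work, such as strawberry picking.
# Assumption (illustrative only): one picker needs TOTAL_HOURS, and n pickers
# split the field evenly with no coordination cost at all.

TOTAL_HOURS = 100.0  # hypothetical amount of work for a single picker


def completion_time(workers: int) -> float:
    """Hours to finish when the work splits evenly across `workers`."""
    return TOTAL_HOURS / workers


def time_saved_by_adding(nth_worker: int) -> float:
    """Hours saved by going from nth_worker - 1 pickers to nth_worker pickers."""
    return completion_time(nth_worker - 1) - completion_time(nth_worker)


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(f"{n:>3} workers: {completion_time(n):7.2f} h")
    print(f"2nd worker saves   {time_saved_by_adding(2):.2f} h")
    print(f"100th worker saves {time_saved_by_adding(100):.4f} h")
```

In this model the second picker saves 50 of the 100 hours, while the 100th saves only about 0.01 hours: the total time keeps shrinking, but each extra worker matters less.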
Non-parallelizable work
Delivering a single parcel from Lisbon to Warsaw is not parallelizable. If one truck can carry the parcel, assigning more trucks will not make it arrive faster. (But it may improve reliability, if the second truck serves as a backup ready to take over.)
Software projects
Realistic software projects are somewhat parallelizable. When a project is understaffed, adding people does improve velocity. But as we keep adding people, we eventually not only hit diminishing returns but actually make matters worse, because of the increased coordination overhead.
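Brooks quantifies the coordination problem: if every part of the task must be coordinated with every other part, the communication effort grows as n(n-1)/2 for n people. The Python sketch below is a toy model built on that observation, not a formula from the book. It assumes the work divides evenly and that each communication channel costs a fixed amount of effort; `WORK` and `COST_PER_PAIR` are made-up constants.

```python
# Toy model of a software project: evenly divisible work plus pairwise
# coordination overhead. Both constants are hypothetical, picked only to
# make the shape of the curve visible.

WORK = 60.0          # person-months of work if no coordination were needed
COST_PER_PAIR = 1.0  # person-months of coordination per communication channel


def channels(people: int) -> int:
    """Pairwise communication channels among `people` people: n(n-1)/2."""
    return people * (people - 1) // 2


def schedule(people: int) -> float:
    """Calendar months when total effort (work + coordination) is split evenly."""
    total_effort = WORK + COST_PER_PAIR * channels(people)
    return total_effort / people


def best_team_size(max_people: int) -> int:
    """Team size from 1 to max_people with the shortest modeled schedule."""
    return min(range(1, max_people + 1), key=schedule)


if __name__ == "__main__":
    for n in (1, 2, 5, 11, 30, 50):
        print(f"{n:>2} people, {channels(n):>4} channels: {schedule(n):5.1f} months")
    print("fastest team in this model:", best_team_size(100))
```

With these made-up numbers the schedule bottoms out at 11 people (about 10.5 months), and a team of 50 takes longer than a team of 5. The exact optimum depends entirely on the assumed constants; the point is the shape of the curve.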
The second system effect
The other famous idea from the book is the “second system effect”. It’s the curse of a system built as the successor to something good and simple that came before it.
OS/360 was the “second system” for most of its architects. The team had previously worked on the IBM 1410/7010, the IBM 7030 Stretch and the Project Mercury mission control system.
According to Brooks, engineers working on the “successor system” tend to add unneeded features, or to refine features of the previous system that are no longer needed. As an example, the author points to OS/360’s state-of-the-art support for static overlays. (The system shipped at a time when static overlays themselves were becoming obsolete thanks to dynamic memory allocation.)
A complete product takes 9x the time
Software is treacherous to forecast because it is so easy to prototype something that just seems to work.
Building an isolated program that works when run by its author is significantly easier than developing a program that can a) be run by anyone (a product, with documentation and tests) and b) integrate with other elements of a software system.
Brooks suggests that each dimension (from isolated program to part of a system, from prototype to production code) increases the amount of work by a factor of 3. For a complete product, take the amount of work needed to build the prototype and multiply it by 9.
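The arithmetic is simple, but writing it out shows that the two factors compound rather than add. Here is a minimal Python sketch, assuming Brooks’s rough factor of 3 applies independently to each dimension; `estimate_effort` and its parameters are hypothetical names, and the two-week prototype is a made-up input.

```python
# Brooks's rule of thumb, as a sketch: each of two independent dimensions
# roughly triples the effort of a program that works for its author.
#   - "programming product": generalized, tested, documented, usable by anyone
#   - "programming system" component: integrated with other parts of a system
# Both together ("programming systems product"): about 3 * 3 = 9x.

FACTOR_PER_DIMENSION = 3  # Brooks's rough multiplier for each dimension


def estimate_effort(prototype_effort: float, *, as_product: bool,
                    as_system_component: bool) -> float:
    """Scale the effort of a working prototype by 3 for each dimension added."""
    multiplier = 1
    if as_product:
        multiplier *= FACTOR_PER_DIMENSION
    if as_system_component:
        multiplier *= FACTOR_PER_DIMENSION
    return prototype_effort * multiplier


if __name__ == "__main__":
    prototype_weeks = 2.0  # hypothetical: the prototype "just works" after two weeks
    print(estimate_effort(prototype_weeks, as_product=False, as_system_component=False))  # 2.0
    print(estimate_effort(prototype_weeks, as_product=True, as_system_component=False))   # 6.0
    print(estimate_effort(prototype_weeks, as_product=True, as_system_component=True))    # 18.0
```

Under this rule, the two-week prototype becomes an 18-week effort once it has to be both a product and a well-integrated part of a larger system.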
Conclusion
So much has changed, and yet so little has. While the anecdotes about 26 bytes wastefully allocated somewhere in a system routine read as dated, many of Brooks’s general observations on the nature of software systems ring as true today as they did in the 1970s.
That’s because they capture something fundamental about the nature of complex projects. We all tend to think that adding more people == faster progress, extrapolating our experience from picking strawberries to implementing browser rendering.
Let’s learn history so that we don’t repeat it :).