Lessons from Software Engineering at Google: Part 9 - Dependency Management

This is the ninth article in a series where we cover the book Software Engineering at Google by Titus Winters, Tom Manshreck, and Hyrum Wright. 📕 We will go over various aspects of software engineering as a process, including the importance of communication, iteration and continuous learning, well-thought-out documentation, robust testing, and many more.

Today we cover dependency management. The management of networks of libraries, packages, and dependencies that we don’t control is one of the most challenging problems in software engineering. We will discuss how we update between versions of external dependencies and how to decide whether to depend on someone else's code. Let's dive in!

Hidden costs of dependencies

One of the best features of the software engineering industry is the availability of open-source solutions. For virtually any problem that might crop up in a software application, there's most likely an open-source solution. You need to format or manipulate dates? There's a library for that. You need to keep track of the state of a form in a web application? There are open-source solutions for that too. This allows you to focus for the most part on the unique business problems that your software is solving. 🎯

However, as the book mentions, adding a dependency isn't free for a software engineering project, and the complexity of establishing an "ongoing" trust relationship is challenging. Importing dependencies into your organization needs to be done carefully, with an understanding of the ongoing support costs.

A dependency is a contract: there is a give and take, and both providers and consumers have some rights and responsibilities in that contract. Providers should be clear about what they are trying to promise over time - but that might not always be enough. The book brings up an interesting observation that goes by the name of Hyrum's Law. 👇

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

By using external dependencies, your application relies on code you don't control, code that is released on an independent schedule, and can be updated in ways that you don't expect. It doesn't mean you shouldn't depend on it. It does mean, however, that you have to be careful how you use the dependency (and what you promise on the other side) and how you manage the relationship.
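To make Hyrum's Law a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical and made up for illustration (the `list_tags_v1` / `list_tags_v2` provider functions, the `primary_tag` consumer, and the sample post); it is not taken from the book. The provider only promises "returns the post's tags, in no particular order", yet the first consumer quietly depends on the order the old implementation happened to produce.

```python
# Hypothetical provider, version 1. Documented contract: "returns the
# post's tags, in no particular order". The sorting is an implementation detail.
def list_tags_v1(post: dict) -> list[str]:
    return sorted(post["tags"])


# Hypothetical provider, version 2. Same documented contract, but the new
# implementation keeps insertion order (and drops duplicates) instead of sorting.
def list_tags_v2(post: dict) -> list[str]:
    return list(dict.fromkeys(post["tags"]))


# Fragile consumer: relies on the observable, but never promised, ordering.
def primary_tag(list_tags, post: dict) -> str:
    return list_tags(post)[0]


# More careful consumer: relies only on the documented contract and
# imposes its own ordering rule.
def primary_tag_robust(list_tags, post: dict) -> str:
    return min(list_tags(post))


post = {"tags": ["python", "build", "deps"]}

print(primary_tag(list_tags_v1, post))         # build
print(primary_tag(list_tags_v2, post))         # python (changed after the "safe" upgrade)
print(primary_tag_robust(list_tags_v1, post))  # build
print(primary_tag_robust(list_tags_v2, post))  # build
```

The provider never broke its contract, but the fragile consumer still changed behavior after the upgrade. That is the kind of implicit dependency you have to watch out for on both sides of the relationship.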

Reducing the problem complexity

In a large organization, dependency management doesn't only refer to external dependencies. You might have different teams working on distinct parts of your system in separate repositories, which are then used in other parts of your organization. Synchronizing package versions of these solutions across separate repositories is also a dependency management problem.
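As a rough illustration of that synchronization problem, the sketch below scans the dependency manifests of several repositories and reports every package that is pinned at more than one version. The repository names, package names, and the `find_version_skew` helper are all assumptions made up for this example; a real setup would read the versions from each repository's lock file or manifest.

```python
from collections import defaultdict


def find_version_skew(manifests: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Return packages pinned at different versions across repositories.

    `manifests` maps repository name -> {package name: pinned version}.
    The result maps package name -> {version: [repositories using it]}.
    """
    seen: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for repo, deps in manifests.items():
        for package, version in deps.items():
            seen[package][version].append(repo)
    # Keep only packages that appear with more than one distinct version.
    return {
        package: {version: sorted(repos) for version, repos in versions.items()}
        for package, versions in seen.items()
        if len(versions) > 1
    }


# Hypothetical organization with three repositories.
manifests = {
    "billing": {"shared-ui": "2.3.0", "date-utils": "1.0.4"},
    "checkout": {"shared-ui": "2.1.0", "date-utils": "1.0.4"},
    "search": {"shared-ui": "2.3.0"},
}

print(find_version_skew(manifests))
# {'shared-ui': {'2.3.0': ['billing', 'search'], '2.1.0': ['checkout']}}
```

Even in this tiny example, one team is already behind on an internal package. With dozens of repositories, the report grows quickly.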

According to the book, one way to reduce the complexity is to have all the teams work in a single monorepo, effectively replacing dependency management problems with source control problems. The more of your organization's code you can bring under shared transparency and coordination, the simpler the problem becomes. 📉

The Holy Grail

The overarching goal of dependency management is to be able to use the newest versions of packages and perform upgrades easily. Getting to the point at which you can reliably keep your project dependencies current is the essence of long-term sustainability for your software. 🏆

The book mentions that the only way to achieve this at scale is, as with many other things, through automation. You need to build the processes around your software in a way that infrastructure upgrades over time can be performed by the same number of engineers, even as the codebase grows. That's key. Otherwise, the cost of maintaining your dependencies grows not only with the number of dependencies themselves but also with the overall size of the codebase. 😬

A few elements are essential to automating the dependency management process:

  • Keep track of dependency versions. The first step of external dependency management is keeping track of the versions in use. Most programming languages and their respective ecosystems have a common way of defining these versions, often using Semantic Versioning.

  • Use notifications when new versions are released. Make sure engineers know about new releases of packages their systems depend on. Ideally, when a new version of a library gets released, it should trigger the opening of a merge request in all relevant repositories. That gives engineers the easiest way to act.

  • Auto-generate changes. The best and biggest open-source libraries often feature scripts that allow for automatic code modifications (commonly referred to as Codemods) with major version releases. They can speed up the upgrade process drastically and decrease the number of omissions.

  • Test and release. In large projects with dozens of dependencies, it's not possible to manually test every upgrade. Instead, your infrastructure should decide, for the most part, if a change is ready to release. Write your tests in a way that allows them to be leveraged in such situations.
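The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of Google's tooling: `parse_version` only understands plain `MAJOR.MINOR.PATCH` strings (no pre-release or build tags), and the policy in `plan_upgrade` (auto-merge minor and patch upgrades when tests pass, send major upgrades to a human) is a made-up example of letting infrastructure decide.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (simplified Semantic Versioning)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def classify_bump(current: str, candidate: str) -> str:
    """Classify the upgrade from `current` to `candidate`."""
    old, new = parse_version(current), parse_version(candidate)
    if new <= old:
        return "none"  # not an upgrade at all
    if new[0] != old[0]:
        return "major"  # may contain breaking changes
    if new[1] != old[1]:
        return "minor"  # new features, meant to be backward compatible
    return "patch"  # bug fixes only


def plan_upgrade(current: str, candidate: str, tests_passed: bool) -> str:
    """Decide what to do with an automatically opened upgrade request.

    Hypothetical policy, chosen only for this example.
    """
    bump = classify_bump(current, candidate)
    if bump == "none":
        return "skip"
    if not tests_passed:
        return "needs-attention"
    if bump == "major":
        return "review"  # passing tests, but a human takes a look
    return "auto-merge"


print(plan_upgrade("1.4.2", "1.4.3", tests_passed=True))   # auto-merge
print(plan_upgrade("1.4.2", "2.0.0", tests_passed=True))   # review
print(plan_upgrade("1.4.2", "1.5.0", tests_passed=False))  # needs-attention
```

Note that the whole decision hinges on `tests_passed` being trustworthy, which is why the test suite has to be written with automated upgrades in mind.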

Conclusion

That's it for today! Dependency management is often an underappreciated aspect of software engineering, but being able to upgrade to the newest versions is a cornerstone of long-term sustainability for your software. Here's a short summary of the things we went through:

  • adding a dependency isn't free for a software engineering project

  • a dependency is a contract: there is a give and take, and both providers and consumers have some rights and responsibilities in that contract

  • Hyrum's Law: with a sufficient number of users, all observable behaviors of your system will be depended on by somebody

  • replacing dependency management problems with source control problems often reduces the complexity

  • for long-term sustainability, project dependencies need to reliably stay current

  • infrastructure upgrades over time should be performed by the same number of engineers, even as the codebase grows

In the last part, we will cover continuous integration. We will touch on the inevitable nature of such solutions in growing systems and highlight what's most important for having a successful integration and release process. See you! 👋

If you liked the article or you have a question, feel free to reach out to me on Twitter, or add a comment below!

Further reading and references