Deciphering the Test Pyramid

The test pyramid is a very frequently quoted model. I believe it originates from Mike Cohn’s book Succeeding with Agile: Software Development Using Scrum. Originally, the test pyramid was drawn with three tiers: UI, Service, and Unit, but google it, and you’ll find many adaptations and refinements. I really like this model, because it illustrates so much about how testing is done on an agile team. In this post I aim to present some ways of reading and interpreting it along some dimensions.

[Figure: The Test Pyramid]

Who does the “testing”

Since unit tests are at the bottom of the pyramid, it should come as no surprise that developers will actually be the ones who create the greatest amount of test code. Unit tests are the best place to employ standard testing techniques like equivalence partitioning, boundary value analysis and various sorts of table-based techniques, which means that there’ll be quite a few of them (there are other reasons as well, of course, like TDD). This doesn’t say anything about the testing process as a whole, but the fact remains: developers will create the most test artifacts and do the most checking.
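To make the table-based idea concrete, here is a minimal sketch in Python. The `shipping_fee` function, its 50.00 threshold, and the fee amounts are all invented for illustration; the point is only that each row in the table stands for one equivalence partition or one side of a boundary.

```python
def shipping_fee(order_total_cents: int) -> int:
    """Hypothetical rule: orders of 50.00 or more ship free, smaller ones cost 4.99."""
    if order_total_cents < 0:
        raise ValueError("order total cannot be negative")
    return 0 if order_total_cents >= 5000 else 499


# One row per equivalence partition, plus rows on each side of the boundary.
CASES = [
    # (order total in cents, expected fee in cents, why the row exists)
    (0,     499, "lower boundary of the paying partition"),
    (2500,  499, "representative value of the paying partition"),
    (4999,  499, "just below the free-shipping boundary"),
    (5000,  0,   "exactly on the free-shipping boundary"),
    (5001,  0,   "just above the free-shipping boundary"),
    (20000, 0,   "representative value of the free partition"),
]


def test_shipping_fee_table() -> None:
    for total, expected, reason in CASES:
        assert shipping_fee(total) == expected, reason


def test_negative_total_is_rejected() -> None:
    # The invalid partition: negative totals must raise instead of returning a fee.
    try:
        shipping_fee(-1)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative order total")
```

Six rows cover one small rule, which hints at why the number of unit tests grows so quickly compared to the other tiers.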

Ratio

Visually, the model implies that there’s a ratio between the layers, i.e., there’s a relation like 1:x between service-level tests and unit tests, and there’s a relation 1:y between UI tests and service tests. Personally, I don’t think it’s meaningful to strive for a certain ratio as such. Different systems with different architectures and history will have different ratios. As long as there are more lower-level tests we should be fine. However, for reasons listed in the following sections, we really want the majority of the tests at the bottom of the pyramid.

Level of abstraction and Language

The higher up in the pyramid, the more domain-related the language, or at least it should be. Good unit tests most likely use domain concepts in their code and read as specifications, but they can get away with compact names related to the solution domain at times. This doesn’t work for higher-level tests, since they often work by orchestrating bits and pieces of quite complex test infrastructure. A typical example is a UI-based test of a specific scenario. The underlying test code will interact with a layer typically called the “flow layer” or “scenario layer”, which in turn will orchestrate Page Objects or the like. So, basically, the test will talk to the test infrastructure using language like “login,” “open this customer,” “buy three drill presses.”
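The layering described above could be sketched roughly like this in Python. Everything here is hypothetical: `FakeBrowser` stands in for a real browser automation library so that the example runs on its own, and the page, field, and button names are made up. What matters is that the test at the bottom only speaks domain language, while the Page Objects hide the UI details.

```python
class FakeBrowser:
    """Stand-in for a real browser driver; it only records what was done."""

    def __init__(self) -> None:
        self.actions = []

    def fill(self, field: str, value: str) -> None:
        self.actions.append(f"fill {field}={value}")

    def click(self, element: str) -> None:
        self.actions.append(f"click {element}")


class LoginPage:
    """Page Object: knows the fields and buttons of the login page."""

    def __init__(self, browser: FakeBrowser) -> None:
        self.browser = browser

    def sign_in(self, user: str, password: str) -> None:
        self.browser.fill("username", user)
        self.browser.fill("password", password)
        self.browser.click("sign-in")


class ProductPage:
    """Page Object: knows how to find a product and put it in the cart."""

    def __init__(self, browser: FakeBrowser) -> None:
        self.browser = browser

    def add_to_cart(self, product: str, quantity: int) -> None:
        self.browser.fill("search", product)
        self.browser.click("first-result")
        self.browser.fill("quantity", str(quantity))
        self.browser.click("add-to-cart")


class ShopFlow:
    """Flow (scenario) layer: speaks domain language, delegates to Page Objects."""

    def __init__(self, browser: FakeBrowser) -> None:
        self.login_page = LoginPage(browser)
        self.product_page = ProductPage(browser)

    def login(self, user: str, password: str = "secret") -> None:
        self.login_page.sign_in(user, password)

    def buy(self, quantity: int, product: str) -> None:
        self.product_page.add_to_cart(product, quantity)


def test_customer_buys_three_drill_presses() -> None:
    browser = FakeBrowser()
    shop = ShopFlow(browser)

    shop.login("alice")
    shop.buy(3, "drill press")

    assert "fill quantity=3" in browser.actions
    assert browser.actions[-1] == "click add-to-cart"
```

If the login form changes, only `LoginPage` needs to be touched; the scenario itself keeps reading like the sentence a stakeholder would say.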

Cost

People often mention that the test cost increases as we move up along the pyramid’s tiers. For models that put manual testing at the top, this is certainly true. And yes, the top of the pyramid is inhabited by more complex tests. However, it’s not a given that the cost of such tests has to spiral.

Tests at the top of the pyramid need more code and more moving parts, so they’ll be more expensive, but good teams will have ways of working and a test infrastructure that make the price of creating yet another higher-level test reasonable.

Tooling

Different layers/tiers require different tools. For example, unit tests will most likely rely on a unit testing framework and some mocking framework. In some specific cases, some kind of special-purpose testing library (used in unit tests) will be required.
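As a small illustration of the unit testing framework plus mocking framework combination, here is a sketch that uses Python’s standard `unittest` and `unittest.mock` modules. The `OrderService` class and the gateway’s `charge` method are hypothetical, invented only to have something to test.

```python
import unittest
from unittest.mock import Mock


class OrderService:
    """Hypothetical class under test; it depends on an external payment gateway."""

    def __init__(self, payment_gateway) -> None:
        self.payment_gateway = payment_gateway

    def place_order(self, customer_id: str, amount_cents: int) -> bool:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        receipt = self.payment_gateway.charge(customer_id, amount_cents)
        return receipt["status"] == "ok"


class OrderServiceTest(unittest.TestCase):
    def setUp(self) -> None:
        # The mock replaces the real gateway, so no network or account is needed.
        self.gateway = Mock()
        self.service = OrderService(self.gateway)

    def test_successful_charge_places_order(self) -> None:
        self.gateway.charge.return_value = {"status": "ok"}

        self.assertTrue(self.service.place_order("c-42", 1999))
        self.gateway.charge.assert_called_once_with("c-42", 1999)

    def test_invalid_amount_never_reaches_gateway(self) -> None:
        with self.assertRaises(ValueError):
            self.service.place_order("c-42", 0)
        self.gateway.charge.assert_not_called()
```

Both tests run entirely in memory, which is what keeps tests in this tier fast and free of environment setup.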

Tests that run through the user interface will obviously use libraries for automating interaction with web pages, fat clients, or mobile apps. Tests in the middle tier will probably use the most diverse flora of tooling. They may include lightweight servers, things like Spring Boot, in-memory databases, libraries for managing transactions and test data setup; you get the picture.
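To give a feel for the in-memory database part, here is a minimal sketch that uses Python’s built-in `sqlite3` module with the special `:memory:` database name. The `CustomerRepository` class and the `customer` table are assumptions made for the example, not taken from any particular system.

```python
import sqlite3


class CustomerRepository:
    """Hypothetical data access class that runs real SQL against a connection."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection

    def add(self, name: str) -> int:
        cursor = self.connection.execute(
            "INSERT INTO customer (name) VALUES (?)", (name,)
        )
        return cursor.lastrowid

    def find_name(self, customer_id: int):
        row = self.connection.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


def make_test_database() -> sqlite3.Connection:
    # ":memory:" creates a throwaway database that disappears when closed.
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    return connection


def test_repository_round_trip() -> None:
    connection = make_test_database()
    try:
        repository = CustomerRepository(connection)
        customer_id = repository.add("Ada")

        assert repository.find_name(customer_id) == "Ada"
        assert repository.find_name(customer_id + 1) is None
    finally:
        connection.close()
```

Unlike a unit test with a mock, this exercises the actual SQL, yet every test still starts from a clean database without any shared server.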

Also, BDD frameworks, if used, will most likely be used in the middle or topmost tier (or both), as well as tools for model-based testing.

Execution Time

As we move up towards the top of the pyramid, the tests have a larger footprint: they may require entire servers to be up and running, databases to be repopulated, a series of API calls over a slow network, etc. This naturally will affect their execution time. A corollary of this is that we should strive to push tests as far down the pyramid as we can.

Feedback

This is related to the previous point. Tests closer to the top will execute slower and consequently provide delayed feedback. Not only that, but the quality of the information they provide will most likely be lower. A UI-based end-to-end test is usually not the best tool for error localization, since there’s virtually no practical way for it to truly understand the system(s) it tests, at least not at the level of granularity needed to provide detailed information about what went wrong.

Communication and Stakeholder Involvement

Tests at the top of the pyramid can have a very distinct advantage over unit tests: they may be authored so that they’re interesting to non-technical stakeholders. A good implementation of ATDD, BDD, or specification by example will produce a manageable number of tests at a high enough level of abstraction for such stakeholders to read and discuss, provided that the documentation part is relevant and well written.

Environment Dependence

The test pyramid also tells us something about environment dependence. Unit tests are, by definition, environment independent. Service-level tests will often make some assumptions about the environment: a port must be open, a process can be launched, there’s a disk to write to, etc. Finally, UI tests probably depend on pretty much the entire system being up and running. Depending on the architecture and method of deployment, the environment dependence may become absolute. Try a COBOL backend + licensed database with enterprise features…
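The kind of assumption a service-level test makes can be shown with a small sketch that uses only Python’s standard library. The `PingHandler` and its "pong" response are invented for the example. The test quietly assumes that the loopback interface exists, that a port can be bound, and that a thread can be started, none of which a pure unit test would need.

```python
import http.client
import http.server
import threading


class PingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical service endpoint that answers every GET with 'pong'."""

    def do_GET(self) -> None:
        body = b"pong"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the test output quiet


def test_ping_endpoint() -> None:
    # Environment assumption: a port on 127.0.0.1 can be bound.
    # Port 0 asks the operating system for any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), PingHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/ping")
        response = connection.getresponse()

        assert response.status == 200
        assert response.read() == b"pong"
        connection.close()
    finally:
        server.shutdown()
        server.server_close()
        thread.join(timeout=5)
```

On a locked-down build agent where binding a port is not allowed, this test fails for reasons that have nothing to do with the code under test, which is exactly the kind of dependence the pyramid hints at.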

These are some dimensions I find useful to discuss when putting the test pyramid to work in deciding on a testing strategy. You may use others, so please share.