Let’s Not Turn the Test Automation Pyramid Upside Down Just Yet

A few days ago, I listened to Gojko Adzic’s talk “Humans vs Computers: five key challenges for software quality tomorrow” on Jfokus.  It was a great talk and it really gave some food for thought. This summary will not do it justice, but basically the plot was that our software is now being used by other software, and there’s AI, and voice recognition and a mix of all this will (and already does) cause new kinds of trouble. Not only must we be prepared for the fact the “user” is no longer human; we must also take into account new edge cases, such as twins for facial and voice recognition, and the fact that our software may stop working because someone else’s does. All in all, the risk is rightfully shifted towards integration and to handle it, we need to turn to monitoring of unexpected behavior. This made Mr. Adzic propose that we do something about the test automation pyramid. Turn it upside down maybe?

Personally, I vote for the test automation monolith :), or rectangle. I’ll tell you why. First, I have to admit that this talk made some pieces fall into place for me. My ambition in regards to developer testing is to raise the bar in the industry. I don’t want us to wonder about how many unit tests we need to write or how we should name them. Mocks and stubs should be used appropriately, and testability should be in every developer’s autonomic nervous system. But why? And here’s the eye opener: Because we’ll need to be solving harder problems in a few years (if not today already). Instead of, or more likely in addition to, finding simple boundary values to avoid off-by-one errors, we’ll also need to handle the twins using voice authorization to login to our software. Needless to say, we shouldn’t spend too much of our mental juice writing simple unit tests and alike.

That being said, we can’t abandon the bottom layer of the pyramid. Imagine handling strange AI-induced edge cases in a codebase that isn’t properly crafted for testability and tested to some degree. It would probably be the equivalent of adding unit tests to poorly designed code or even worse.  Yes, monitoring will probably play a greater part in the software of tomorrow, but isn’t it just another facet of observability?

So, what will probably happen next is that the top of the testing pyramid will grow thicker, maybe like this (couldn’t resist the “AI”):

Test Automation Monolith

The Developer Testing Maturity Curve

I’ve been talking about this topic for years, and have been asked questions about it while giving presentations on developer testing. Still, I’ve been unable to articulate fully what I mean by the “Developer Testing Maturity Curve”. Many wanted to know, because it’s tempting to rank your organization once there’s something to rank against.

After organizing my thoughts for a while I’ve finally come up with a model that matches my experience of how developer testing tends to get implemented across different organizations. Just a caveat: I may revise this model if I learn more or have an epiphany, but this is what it looks like today.

Graph Axis

The horizontal axis is technical maturity. It’s the knowledge and understanding of various tools and techniques. I consider employing a unit testing framework “immature”, which means that you need relatively little technical skill to author some simple unit tests. Conversely, implementing an infrastructure that enables repeatable, automated end-to-end tests would be on the other side of the scale.

The vertical axis is more elusive. In the image, I call it “organizational maturity”, but in reality, it means several things:

  • Understanding that developer testing (any automated testing, in fact) must be allowed to take time
  • Acknowledging testability as a primary quality attribute and designing the systems accordingly
  • Willingness to refactor legacy code to make it testable
  • Time and motivation to clean up test data to make it work for you, not against you
  • Dedicating time and resources to cleaning up any other old sins that prevent from harnessing the power of developer testing and automated checking, be they related to infrastructure, architecture, or the development process in general

You should get the idea… If you’re in an organization that has internalized the above, you won’t be throwing quality and (developer) testing out the window as soon as there’s a slightest risk of not meeting a deadline.

The Maturity Zones

The hard part of this model was to place the individual practices in the different zones. Fortunately, in my experience, technical and organizational maturities seem to go hand in hand. By that, I mean that I haven’t seen an organization with superior technical maturity that would totally neglect the organizational climate needed to sustain the technical practices, and vice versa. After having made this discovery, placing the individual practices became easier. Next, I’ll describe what they are.


Unit tests

Having only unit tests is immature in my opinion. True, certain systems can run solely on unit tests, but they are the exception. If you only do unit tests, you probably have no way of dealing with integration and realistic test data. From an organizational point of view, having only unit tests means that developers write them because they must, not because they see any value in them. Had this been the case, they’d engage in other developer testing activities as well.

Mocking frameworks

This is the only artifact on the border. Let me explain. A mature way of employing a mocking framework assumes that you understand how indirect input and output affects your design and its testability, and then use the framework to produce the correct type of test double. The immature approach is to call all test doubles “mocks” and use the framework because everyone else seems to.


Specialized frameworks

These are frameworks that help you solve a specific testing problem. They may be employable at unit test level, but they’re most frequent (and arguably useful) for integration tests. Examples? QuickCheck, Selenium WebDriver, RestAssured, Code Contracts. Some of these frameworks may be entire topics and areas of competence in themselves. Therefore, I consider it mature to make use of them.

Integration tests without orchestration

What does “without orchestration” mean? It means that the framework you’re using does everything for you and that you don’t need to write any code to start components or set them to a certain state.  I’m thinking about frameworks like WireMock, Dumbster, or Spring’s test facilities.

Using a BDD framework

You can use a BDD/ATDD framework to launch tests, hopefully the complex ones. This requires its infrastructure and training, so I consider it mature (barely). In this sector, you’re not reaping the full benefits of BDD, just using the tools.


Leveraging BDD

In contrast to the immature case, where the BDD framework is just a tool, in this sector the organization understands that BDD is about shared understanding and a common vocabulary. Furthermore, various stakeholders are involved in creating specifications together, and they use concrete examples to do so.

Testability as primary a quality attribute

Testability­—controllability, observability, smallness—applies at all levels of system and code design. Organizations that have internalized this practice ensure that all new code and refactored old code take it into account. When it comes to designing code, it’s mostly about ensuring that it’s testable at the unit level and that replacing dependencies with test doubles is easy and natural. At the architectural level it’s about designing the systems so that they can be observed and controlled (and kept small), and that any COTS that enters the organization isn’t just a black box. The same goes for services operated by partners and SaaS solutions.

Integration tests with orchestration

There’s a fine line between such tests and end-to-end tests. The demarcation line, albeit a bit subjective, is the scope of the test. An integration tests with orchestration targets two components/services/systems, but it can’t rely solely on a specialized framework to set them up. It needs to do some heavy lifting.

End-to-end tests

Anything goes here! These tests will start up services and servers, populate databases with data and simulate a user’s interaction with this system. Doing this consistently and in a repeatable fashion is a clear indication of internalized developer testing practices.

All test data is controlled

This is true for a majority of systems: The more complex the tests, the more complex the test data. At some point your tests will most likely not be able to rely on specific entries in the database (“the standard customer”) and you’ll need to implement a layer that creates test data with specific properties on the fly before each test. If you can do this for all test data, I’d say you’ve internalized this practice.

The entire system can be redeployed in a repeatable manner

If you have your end-to-end tests in place, you most likely have ticked off this practice. If not, there’s some work to do. A repeatable deployment usually requires a bit of provisioning, a pinch of database versioning, and a grain of container/server tinkering. Irrespective of the exact composition of your stack, you want to be able to deploy at will. Why is this important from a developer testing perspective? Because it implies controllability, as all moving parts of the system are understood.

And the point is?

Congratulations on making it this far. What actions can you take now? If you really want to gauge your maturity level, please do so. My advice is that you map the areas in the maturity curve to your organization’s/team’s architecture and practices, and start thinking about where to start digging and in what order.







Why we automate tests

Recently, I was asked by a friend to name a few reasons to automate tests. In this context unit or integration tests—the kind of tests that should be written as part of the actual implementation—don’t count. Rather, we’re talking about test automation done either after the code’s been written, or in parallel with the implementation work. After having pondered this for a while, I came up with the following reasons.

Please keep in mind that most test automation isn’t really about automating tests, but checks. Testing usually refers to a process more creative than the work of a computer running the same verifications over and over again. This said, automation can be used to test something to a degree. I’ll get to that later.

We automate tests…

…to prevent regressions and instill courage

The more tests we can run frequently and repeatedly, the safer we feel about making changes to the code. An extensive suite of unit tests can make us feel quite safe, but a battery of automated end-to-end tests, performance, and system integration tests lets us make major code changes in complex systems without the fear and anxiety that normally accompanies refactoring of non-trivial code.

…to make any kind of iterative development work

This is an extension of the previous argument. Suppose that our team builds the software in iterations (or sprints, if you will). In iteration one, it’s able to deliver three tested stories: A, B, and C. In iteration two, it manages to deliver two more stories, D, E, while ensuring that A, B, and C still work. Iteration three adds yet another three stories, F, G, and H. Now, had there been no automated tests up until this point, regression testing would take more and more of our team’s time and its velocity would drop iteration by iteration. In fact, a team’s velocity can become more or less negative in cases where a small change results in days or weeks of regression testing, or avoiding the change altogether.

In conclusion, it’s crucial that tests are automated in teams that aim to maintain steady or increasing velocity iteration after iteration. Or we can just write “agile teams”.

…to implement continuous delivery

Taking the argument one step further would be stating that continuous delivery, or continuous deployment, just cannot work without all tests being automated. Would you release your system to production several times a day if only its unit tests were automated?

…to mobilize more brainpower and to engage

Test automation is mostly developer work. It requires non-trivial infrastructure, provisioning, and system architecture. On the other hand, if test automation is left solely to developers, without input from testers, some interesting test cases may be left out because of creator’s bias. Therefore, it’s only natural that testers and developers collaborate around test automation, which results in more brains on the problem and more buy-in from everybody.

…to make the system testable

Just as writing unit tests will make basic program elements testable, so will creating higher-level tests to larger components. Developers who know that they’ll end up writing automated tests (checks) for their system will obviously design it to accommodate that. A trivial example is putting in proper ids in web pages, so what WebDriver testing becomes a breeze, but other examples are easy to find.

…to verify the acceptance criteria of a story

Yes, acceptance test-driven development (ATDD), behavior-driven development (BDD), and specification by example (pick your flavor) are mostly about collaboration and shared understanding. That being said, tests that emerge from the specifications based on concrete examples provide a phenomenal opportunity to formalize the act of meeting a story’s acceptance criteria. If the team decides that a certain story has three major acceptance criteria, one developer may start implementing the tests for them, while another starts implementing the actual functionality. For a while, the tests will be red, but once they turn green, there should be indisputable proof that the story has indeed been implemented according to plan.

…to find defects in permutation-intensive scenarios

Many tests—both unit tests and higher level tests—are just examples that confirm something we know about the code. However, there are types of automated tests that can actually find new defects. True, this only applies during their first execution (after which they become normal regression tests), but still.

What tests am I talking about? Generative tests and tests based on models (MBT), of course. Such tests operate given a set of constraints, rather than exact scenarios, and are able to produce states that our brilliant tester mindset hasn’t been able to foresee.

…to find defects 🙂

Finally, if you’re a tech-savvy tester, or a developer in test, why even bother with manual tests? To be usable in the long run, the need to go into the regression test suite anyway. Therefore, why not automate them from the start?

Well, that’s that: my quick thoughts on automating tests. Comments, rebuke, and feedback welcome.

Deciphering the Test Pyramid

The test pyramid is a very frequently quoted model. I believe it originates from Mike Cohn’s book Succeeding with Agile Software Development using Scrum. Originally, the test pyramid is drawn with three tiers: UI, Service, and Unit, but google it, and you’ll find many adaptations and refinements. I really like this model, because it illustrates so much about how testing is done on an agile team. In this post I aim to present some ways of reading and interpreting it along some dimensions.The Test Pyramid

Who does the “testing”

Since unit tests are at the bottom of the pyramid, it should come as no surprise that developers will actually be the ones who create the greatest amount of test code. Unit tests are the best place to employ standard testing techniques like equivalence partitioning, boundary value analysis and various sorts of table-based techniques, which means that there’ll be quite a few of them (there are other reasons as well, of course, like TDD). This doesn’t say anything about the testing process as a whole, but the fact remains: developers will create the most test artifacts and do the most checking.


Visually, the model implies that there’s a ratio between the layers, i.e, there’s a relation like 1:x between service level tests and unit tests, and there’s a relation 1:y between UI tests and service tests. Personally, I don’t think it’s meaningful to strive for a certain ratio as such. Different systems with different architectures and history will have different ratios. As long as there are more lower-level tests we should be fine. However, for reasons listed in the following sections, we really want the majority of the tests at the bottom of the pyramid.

Level of abstraction and Language

The higher up in the pyramid, the more domain-related the language, or at least it should be. Good unit tests most likely use domain concepts in their code and read as specifications, but they can get away with compact names related to the solution domain at times. This doesn’t work for higher-level tests, since they often work by orchestrating bits and pieces of quite complex test infrastructures in many cases. A typical example is a UI-based test of a specific scenario. The underlying test code will interact with a layer typically called “flow layer” or “scenario layer”, which in turn will orchestrate Page Objects or the like. So, basically, the test will talk to the test infrastructure using language like “login,” “open this customer,” “buy three drill presses.”


People often mention that the test cost increases as we move up along the pyramid’s tiers.  For models that put manual testing at the top, this is certainly true. And yes, the top of the pyramid is inhabited by more complex tests. However, it’s not a truism that that the cost of such tests should be spiralling.

Tests at the top of the pyramid need more code and more moving parts, so they’ll be more expensive, but good teams will have ways of working and a test infrastructure that make the price of creating yet another higher-level test reasonable.


Different layers/tiers require different tools. For example, unit tests will most likely rely on a unit testing framework and some mocking framework. In some specific cases, some kind of special-purpose testing library (used in unit tests) will be required.

Tests than run through the user interface will obviously use libraries for automating interaction with web pages, fat clients, or mobile apps. Tests in the middle tier will probably use the most diverse flora of tooling. They may include lightweight servers, things like Spring Boot, in-memory databases, libraries for managing transactions and test data setup; you getthe picture.

Also, BDD frameworks, if used, will most likely be used in the middle or topmost tier (or both), as well as tools for model-based testing.

Execution Time

As we move up towards the top of the pyramid, the tests have a larger footprint: they may require entire servers to be up and running, databases to be repopulated, a series of API calls over a slow network, etc. This naturally will affect their execution time. A corollary of this is that we should strive to push tests as far down the pyramid as we can.


Related to the previous point. Tests closer to the top will execute slower and consequently provide delayed feedback.Not only that, but the quality of the information they provide will most likely be lower. A UI-based end-to-end test is usually not the best tool for error localization, since there’s virtually no practical way for it to truly understand the system(s) it tests—not at the level of granularity needed to provide detailed information about what went wrong anyway.

Communication and Stakeholder Involvement

Tests at the top of the pyramid can have a very distinct advantage over unit tests: they may be authored so that they’re interesting to non-technical stakeholders. A good implementation of ATDD, BDD, or specification by example will produce a manageable quantity of tests at a high enough level of abstraction to be interesting to non-technical stakeholders, given that the documentation part is relevant and well written.

The test pyramid also tells us something about environment dependence. Unit tests are, by definition, environment independent. Service-level tests will often make some assumptions about the environment: a port must be open, a process can be launched, there’s a disk to write to, etc. Finally, UI tests probably depend on pretty much the entire system to be running. Depending on the architecture and method of deployment, the environment dependence may become absolute. Try a Cobol backend + licensed database with enterprise features…

These are some dimensions I find useful to discuss in putting the test pyramid to work when deciding on a testing strategy. You may use others, so please share.

Why I Put My Money on Developer Testing

I’ve always cared about the quality of my code. However, I didn’t always have the necessary tools to achieve it. For the first 6-7 years (of hobby programming in high school and during my university studies) I had to resort to what we call “manual checking” nowadays. Then my professional career began and I got exposed to different types of organizations and their ways of working.

Fast forward 15 years. By now I had worked in, or closely to, roughly 25 teams and I could see some patterns emerge. Basically, I’ve encountered three ways of working with quality assurance, and I’ve seen traces of a fourth. This is what I’ve found.



Cowboy/chaos teams: I’ve encountered these teams in small organizations or as guerilla teams operating under the corporate radar. Such teams have neither testers nor anything that resembles quality assurance. Their testing amounts to manual checking performed by the developers in conjunction with a release or if one of their rather frequent bugs is fixed. Such teams start out as very fast, but the code they produce crumbles under its own weight after a few months. Fortunately, the teams I’ve observed have worked in areas where bugs weren’t that much of a problem: nobody would get injured and they wouldn’t make the newspapers either. Bugs would “only” make the customers unhappy.

Teams with testers on the team: My experience of such teams ranges from newly formed teams struggling to understand how to work in a cross-functional iterative and incremental manner, to rather well-oiled ones. Unfortunately the first category has been dominating. In organizations that have practiced a strict division of labor, i.e. separate development and QA, forming cross-functional teams isn’t an easy task.  The mini waterfall iteration is a common anti-pattern, and it’s a result of old habits: Developers still throw untested code over the wall (although, the wall isn’t there anymore), and testers keep compensating for an inferior development process by just checking that the developers haven’t made any major boo boos. There isn’t any time to do more than that, since all testing is crammed into the last two days of the iteration.

Code quality may also be an issue in teams that are just starting out as agile/cross-functional. Back in the old days, they released once every few months and took the occasional pain of integration hell and manual regression testing. This way of working has left them unprepared for frequent deliveries, which in turn requires skills in unit testing, refactoring, and continuous integration and deployment.

The teams that I’ve seen that have mastered the basics of the above development techniques have been addressing the challenge of making agile testing truly work: make the testers proactive instead of reactive, plan all testing activities properly during iteration planning meetings, pair test/program, automate manual checks, and perfect exploratory testing.

Teams that only rely on developer testing: I’ve also been able to work in teams that relied solely on developer testing. In such teams pretty much all checks were automated and they’ve had thousands of unit tests, and hundreds of integration and end-to-end tests. They were also proficient in refactoring and continuous delivery. My experience is that these teams only lacked good exploratory testing and sometimes specialized types of testing, like usability or security testing. Given that they had a good product owner (and they were able to foster them to some extent too), the worst mistake they would do was to produce something less aesthetically appealing or not 100% user-friendly.

I don’t have enough evidence, but I believe that the difference between chaotic cowboy teams and teams that to developer testing lies in the mission criticality of their product. The teams that I have experience of that did developer testing all worked on software that had to be correct. In the absence of a set of QA activities and people that would perform them, they self-organized into embracing developer testing.

Organizations with separate QA and development: Despite more than 15+ years in the industry, I haven’t seen the true waterfall setup up close. However, I’ve seen traces and residues of it when working in banking and the travel industry. To be fair, we have to acknowledge that banks and flight booking usually work, so obviously it works to separate development and testing. Then again, the code in these systems is really hard to change and few people dare to touch it, much less refactor or delete something. This is a result of hand-written test protocols, rather than test code, I believe.

Given this experience, I’m convinced that you should put your money in the developer testing basket. This is why:

  • Teams that do only developer testing can improve with good testers, but manage without
  • Teams that aim to become cross-functional and good at agile testing will be helped by developer testing practices, since they’ll make their code better and free up their testers’ time
  • Chaos teams that do cowboy coding have a fighting chance to improve their code and quality if they engage in developer testing

As always, your mileage may vary, but my unscientific study of 25 teams in banking, the travel industry, gaming, the public sector, and directory services has pointed me in the direction of developer testing. I’d love to hear your stories, examples, and counter examples.