By Jennifer Moore
If you build or test software, then you’ve almost certainly encountered some variant of the Test Pyramid. Its basic concept is sound: focus your testing resources on the kinds of tests that provide the most value. That is to say, unit tests, with only as many higher-level tests as you can do easily. Or reliably. Or quickly? Inexpensively?
The Fabled Test Pyramid
The Test Pyramid is good advice, but it is an insufficient guide to implementation (which is why I assumed that you’ve seen it before in some form or another; it may have been one of these).
People have attempted to improve the pyramid, but no alteration has made a significant difference because it’s so unhelpful at a practical level. The fact that there are so many versions of this simple idea is a problem.
Rather, it is a symptom of a problem: the root issue is that we don’t have the right words to describe these concepts. We talk about a “hierarchy of tests” that progress from low-level unit tests to high-level functional tests. Or are they called acceptance tests? Maybe end-to-end? Apparently, the only kind of testing we have a common, clear understanding of is unit testing. Everything beyond becomes blurry and ambiguous.
We seem to have so many versions of this test pyramid because people expect it to be a practical guide. Truthfully, it is a visual aid that describes the relationship between distinct tests. It’s not a goal, a plan, or an instruction manual — it’s merely trying to communicate that you should employ a broad foundation of unit tests before you spend resources on less well-isolated types of tests.
The Army We Have
Before we try to change the world, let’s first attempt to understand it as it is. We do have quite a few words to describe various levels of a testing hierarchy.
Unit tests are tests of the most basic conceptual units of your software system. How you define that unit will largely depend on whether you are following some TDD variant, like ATDD or BDD, or something else.
For the sake of this article, we’ll assume that it’s somewhere between single functions and groups of classes (or an analogous structure in your trendy functional language of choice). Regardless, unit tests can and should isolate the units from their dependencies and be narrowly targeted with as few side effects as the design allows.
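As a minimal sketch of what isolating a unit from its dependencies can look like, the example below tests a small class whose collaborator is replaced by a stand-in. The names `PriceCalculator`, `rate_for`, and the tax-service dependency are hypothetical and exist only for illustration.

```python
from unittest.mock import Mock


class PriceCalculator:
    """Hypothetical unit under test: computes an order total with tax."""

    def __init__(self, tax_service):
        # The dependency is injected so a test can replace it.
        self._tax_service = tax_service

    def total(self, subtotal, region):
        rate = self._tax_service.rate_for(region)
        return round(subtotal * (1 + rate), 2)


def test_total_applies_tax_rate():
    # Replace the real tax service with a stand-in so the test
    # touches no network, database, or file system.
    fake_tax_service = Mock()
    fake_tax_service.rate_for.return_value = 0.25

    calculator = PriceCalculator(fake_tax_service)

    assert calculator.total(100.0, "SE") == 125.0
    # Confirm the unit asked its dependency for the right region.
    fake_tax_service.rate_for.assert_called_once_with("SE")
```

Because the dependency is passed in rather than constructed inside the class, the test stays narrowly targeted: if it fails, the cause is in `PriceCalculator` and not in whatever the tax service would normally call.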
Component tests are larger-scale unit tests. They might test larger units or collections of them, and should otherwise follow the same pattern as unit tests.
Service tests are end-to-end/functional tests of a web service.
End-to-end tests, user acceptance tests, or functional tests are traditional QA tests executed through the regular user interface against a complete live system in a production-like environment (hopefully, not in production itself). These tests may or may not be automated and are rife with unmitigated, and often unknown, external dependencies.
Integration tests are tests of two live “things” interacting with each other. What those things are, depends entirely on who is using the term. These tests should (but often don’t) eliminate external dependencies.
There are numerous other terms beyond this list, many of which are either redundant or refer to subsets of another category (such as smoke tests).
“The Enemy We Have”
This quote from Donald Rumsfeld has a corollary, though I couldn’t find attribution for it: “You have to go to war with the enemy you have, not the enemy you want.” In other words, you have to acknowledge and understand your problem before you can solve it.
The vocabulary we use to describe tests is unsatisfactory, and I believe it’s because the way we attempt to categorize tests isn’t useful. We discuss “levels” of tests, but that only distinguishes unit tests from other kinds. Instead, we should consider the tests’ respective objectives and requirements. What is being tested, and who will conduct it?
Here is how I would define various categories of software tests:
Unit Tests
Unit tests are whitebox (or graybox) tests of some basic unit of source code, which will require developer-level access and understanding to create. These tests are entirely isolated from external dependencies. The defects they detect are technical gaps or inconsistencies that could potentially cause crashes, generic error dialogs, the loss and corruption of data, or something else.
In addition to what you already think of as unit tests, this category would include any kind of static analysis and code review. My definition wouldn’t require that these be automated, but in most cases, there is no practical way to execute them manually.
System Integration Tests
System integration tests are graybox or blackbox tests that assess the interactions between two live systems. Developers don’t need to create these tests, but their usefulness may be limited unless the system and its components were built or modified by developers to be testable in this way. They can and should be isolated from all external dependencies, and doing this effectively may require developer support.
These tests discover defects related to access, communication, performance, configuration errors, and unknown dependencies (e.g., accidentally relying on behavior in a dependency that is actually caused by a bug). These tests should be automated.
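To make the idea concrete, here is a rough sketch of a test that exercises a real interaction over HTTP while keeping every dependency under the test's control. Everything in it is an assumption for illustration: `StubInventoryHandler` stands in for some external service, and `fetch_stock` stands in for client code in the system being tested. In a real system integration test, both sides would be the actual systems, with anything beyond them stubbed out in a similar way.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubInventoryHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an external inventory service."""

    def do_GET(self):
        payload = {"sku": self.path.strip("/"), "in_stock": 3}
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep test output quiet.


def fetch_stock(base_url, sku):
    """Hypothetical client code under test: a real HTTP call, no mocks."""
    # Ignore any proxy configured in the environment so the call stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/{sku}", timeout=5) as response:
        return json.load(response)["in_stock"]


def test_client_reads_stock_over_http():
    # Port 0 asks the OS for any free port, so the test owns its
    # dependency instead of relying on a shared environment.
    server = HTTPServer(("127.0.0.1", 0), StubInventoryHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        base_url = f"http://127.0.0.1:{server.server_port}"
        assert fetch_stock(base_url, "widget-1") == 3
    finally:
        server.shutdown()
        server.server_close()
```

Unlike a unit test, nothing here is mocked at the code level: the request crosses a real socket, so mistakes in URLs, headers, serialization, or timeouts can surface.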
Functional Tests
Functional tests are blackbox tests of a complete live system in an environment that is reasonably production-like. These tests will have a high dependence on external systems, but these dependencies should be limited and managed.
The defects that these tests detect will be related to the system’s requirements and how those requirements were understood, as well as assumptions that were not well-known or well-communicated. These tests will also be sensitive to any defect that another test missed, though they will not be adept at identifying root causes.
Developers who built the software should not be the ones to create functional tests for it. These tests can be automated, but the numerous (and possibly hidden) dependencies can make them unreliable. Thus, some discretion is warranted regarding which ones should be automated.
I omitted many kinds of tests, but I want to emphasize my exclusion of component and service tests. Component tests are almost synonymous with unit tests; the difference is a matter of scale, not type. It may be useful to distinguish between a small and large test if you are a developer talking to another developer, but additional terms only confuse a general audience.
Similarly, service tests are functional tests. They target a service instead of a user interface, but the nature of what the test requires and reveals is identical. The term may be valuable if you are a test engineer speaking with a peer, but again, a general audience doesn’t benefit from the differentiation.
With similar logic, I specify “system integration” instead of integration. If different audiences are free to assume different contexts for the integration, then the term loses its practicality. System integration is a term that ISTQB uses to distinguish the integration of complete apps or systems versus subsystems within a self-contained application.
Terms I used but didn’t define above:
- Whitebox tests have full knowledge of the system’s design and implementation. They enable you to examine all of its parts and the way they fit together. A strict definition would require that this exclude any test that requires the system to actually be operating (whether traditional tests are “operating” the system is debatable).
- Graybox tests have some knowledge of the system’s design and implementation, along with the ability to inspect the system’s state while it’s running. In practical terms, you can think of your test as a graybox if it entails attaching a debugger to something.
- Blackbox tests interact with only one system. The system may be configured in a way that facilitates testing, but it does not expose any special access to its internal state.
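The difference between blackbox and graybox can be sketched with two tests of the same object. This is only an illustration under assumed names: `RateLimiter` and its private `_count` field are hypothetical. Whitebox checks such as static analysis and code review are not shown, since they examine the code without running it.

```python
class RateLimiter:
    """Hypothetical system under test: allows at most `limit` calls."""

    def __init__(self, limit):
        self._limit = limit
        self._count = 0  # Internal state, not part of the public interface.

    def allow(self):
        if self._count >= self._limit:
            return False
        self._count += 1
        return True


def test_blackbox_blocks_after_limit():
    # Blackbox: only the public interface is used and observed.
    limiter = RateLimiter(limit=2)
    assert [limiter.allow() for _ in range(3)] == [True, True, False]


def test_graybox_counter_stops_at_limit():
    # Graybox: the test also inspects internal state while the
    # system is running, much as a debugger would.
    limiter = RateLimiter(limit=2)
    for _ in range(3):
        limiter.allow()
    assert limiter._count == 2
```

The blackbox test would survive a rewrite of the internals as long as the behavior stayed the same; the graybox test would need to change along with them, which is part of why it requires developer-level knowledge to write.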
This article was originally published by a WWC community member on their blog, Jennifer++.