The domain-to-range ratio, DRR, is one of the more theoretical concepts encountered in developer testing. Formally, it’s defined as |D|/|R|: the cardinality of the input domain divided by the cardinality of the range, i.e. the set of possible outputs. To have confidence in our tests, we want to keep this ratio low.
The classic example is any function that maps to a Boolean. Consider the function isEven(), which returns true if the input is indeed an even number. For a 16-bit integer the DRR is 2^16/2, i.e. 65,536 (64k) possible inputs mapping to two output values. A high DRR, like this one, indicates potential information loss and a low probability of finding defects by just selecting a few samples. For example, it wouldn’t be unheard of to test the isEven() function using only two input values, one from each equivalence partition (odd/even numbers), let’s say -10 and 10. Or, we could add two more values, at the boundaries, -32768 and 32767. Trivial function and four tests, what could go wrong?
Consider an implementation of isEven() in which the author of the code refuses to accept zeros, like so:
boolean isEven(short x) { if (x == 0) { throw new ZeroNotAllowedException("Give me a proper number"); } return x % 2 == 0; }
While a little contrived, this example shows how one path through the code can easily be omitted in situations where the DRR is high.
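To make the missed path concrete, here is a minimal sketch that first runs the four hand-picked samples and then simply loops over all 65,536 short values. The names IsEvenDemo, behavesCorrectly and failingInputs are made up for this illustration, ZeroNotAllowedException is assumed to be an unchecked exception, and the lowest-bit oracle is just one possible way to state the expected result.

```java
import java.util.ArrayList;
import java.util.List;

// Assumption: the text does not show this class, so it is defined here as unchecked.
class ZeroNotAllowedException extends RuntimeException {
    ZeroNotAllowedException(String message) {
        super(message);
    }
}

class IsEvenDemo {
    // The implementation from the text.
    static boolean isEven(short x) {
        if (x == 0) {
            throw new ZeroNotAllowedException("Give me a proper number");
        }
        return x % 2 == 0;
    }

    // True when isEven() returns without throwing and agrees with an
    // independent oracle (an integer is even when its lowest bit is 0).
    static boolean behavesCorrectly(short x) {
        try {
            return isEven(x) == ((x & 1) == 0);
        } catch (RuntimeException e) {
            return false;
        }
    }

    // Walks the entire input domain and collects every misbehaving input.
    static List<Short> failingInputs() {
        List<Short> failures = new ArrayList<>();
        for (int i = Short.MIN_VALUE; i <= Short.MAX_VALUE; i++) {
            if (!behavesCorrectly((short) i)) {
                failures.add((short) i);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        short[] samples = {-10, 10, -32768, 32767};
        for (short sample : samples) {
            System.out.println(sample + " ok: " + behavesCorrectly(sample));
        }
        System.out.println("Failing inputs in the whole domain: " + failingInputs());
    }
}
```

With a domain this small the exhaustive loop is cheap, which is exactly what stops being true as |D| grows.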
A more realistic example would be a non-naïve prime number detection function. It too maps a large input domain to two values, but how many numbers would you use to verify it if the algorithm were, let’s say, probabilistic?
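One hedged way to approach that question is to cross-check the probabilistic function against a slow but deterministic oracle over a bounded part of the domain. In the sketch below, the JDK method BigInteger.isProbablePrime stands in for the prime detection function under test; the class name PrimeCrossCheck, the helper names, the certainty value of 50 and the upper limit are all arbitrary choices made for illustration.

```java
import java.math.BigInteger;

class PrimeCrossCheck {
    // Deterministic reference: trial division, only practical for small inputs.
    static boolean isPrimeByTrialDivision(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for the function under test; the certainty of 50 is an arbitrary choice.
    static boolean isProbablyPrime(int n) {
        return BigInteger.valueOf(n).isProbablePrime(50);
    }

    // Counts the inputs in [0, limit] on which the two functions disagree.
    static int countMismatches(int limit) {
        int mismatches = 0;
        for (int n = 0; n <= limit; n++) {
            if (isPrimeByTrialDivision(n) != isProbablyPrime(n)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("Mismatches in [0, 100000]: " + countMismatches(100_000));
    }
}
```

Even 100,001 checked inputs are a tiny fraction of the 2^32 possible int values, so the high DRR still leaves most of the domain unexamined.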
DRR leads to stronger typing
One observation we can make about the use of the domain-to-range ratio is that it’s often quite hard to affect the cardinality of the output domain, the denominator. For example, let’s say that we wanted to determine whether a certain input reflects a working age (18 to 65 years). Since this really is a predicate with two possible values, it would feel awkward to “inflate” the output domain. Not that it’s a good idea, but we could add more return values to bump up the cardinality of the output domain to 4 or 5, depending on how we feel about invalid values:
Input age (years) | Return value |
---|---|
< 0 | Invalid, too low |
[0, 17] | Under-age |
[18, 65] | Working age |
(65, 120] | Retired |
> 120 | Invalid, too high |
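As a rough sketch, the inflated output domain from the table could be expressed as an enum. The names AgeCategory, AgeClassifier and classify are invented for this illustration, and the intervals are read as whole years (0 to 17, 18 to 65, 66 to 120).

```java
// Hypothetical five-valued output domain mirroring the rows of the table.
enum AgeCategory {
    INVALID_TOO_LOW, UNDER_AGE, WORKING_AGE, RETIRED, INVALID_TOO_HIGH
}

class AgeClassifier {
    // Maps an age in whole years to one of the five categories.
    static AgeCategory classify(int age) {
        if (age < 0) {
            return AgeCategory.INVALID_TOO_LOW;
        }
        if (age <= 17) {
            return AgeCategory.UNDER_AGE;
        }
        if (age <= 65) {
            return AgeCategory.WORKING_AGE;
        }
        if (age <= 120) {
            return AgeCategory.RETIRED;
        }
        return AgeCategory.INVALID_TOO_HIGH;
    }

    public static void main(String[] args) {
        // One value on each side of every partition boundary.
        int[] boundaries = {-1, 0, 17, 18, 65, 66, 120, 121};
        for (int age : boundaries) {
            System.out.println(age + " -> " + classify(age));
        }
    }
}
```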
This division gives more equivalence partitions, more boundary values, and in effect more tests, but it blurs the actual intent of the function we want to test. Instead, let’s go after the size of the input domain, the numerator. This time, let’s start with an exaggerated version that also needs to parse the age:
boolean workingAge(String unparsedAge) { int age = Integer.parseInt(unparsedAge.trim()); return age >= 18 && age <= 65; }
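To get a feeling for what the string type lets in, the following sketch feeds a handful of inputs to the string-based version, written here with the intended 18 to 65 range check. The class name StringAgeDemo, the outcome helper and the chosen inputs are assumptions made for illustration only.

```java
class StringAgeDemo {
    // String-based version from the text, with the intended 18 to 65 range check.
    static boolean workingAge(String unparsedAge) {
        int age = Integer.parseInt(unparsedAge.trim());
        return age >= 18 && age <= 65;
    }

    // Describes what one input leads to: the returned value as text,
    // or the simple name of the exception that was thrown.
    static String outcome(String input) {
        try {
            return String.valueOf(workingAge(input));
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "42",            // plain digits
            " 42 ",          // surrounding spaces, removed by trim()
            "+42",           // explicit sign
            "\u0664\u0662",  // 42 written with Arabic-Indic digits
            "\u00A042",      // leading no-break space, which trim() leaves in place
            "forty-two",     // not a number at all
            "",              // empty string
            null             // no string at all
        };
        for (String input : inputs) {
            System.out.println("[" + input + "] -> " + outcome(input));
        }
    }
}
```

None of these cases has anything to do with ages, yet each one is a legitimate member of the input domain that a test suite would have to consider.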
In this case, the DRR is hard even to compute. Sure, |R| is 2, but what is |D|? This example is written in Java, so a string is a sequence of UTF-16 code units, which together can encode any of the 1,112,064 valid Unicode code points (surrogates excluded), and the string can be almost arbitrarily long. One can argue about the significance of this, but we clearly see that an inappropriate data type, which in turn brings its own complications (the call to trim()), expands the input domain too much. Going back to a more realistic example, a standard implementation would look like this:
boolean workingAge(int age) { return age >= 18 && age <= 65; }
Having an int represent a number in Java isn’t controversial. This gives a DRR of 2^32/2. Had the language been C, the DRR could have been shrunk to 2^8/2 by using the unsigned char data type.
Still, we can do better, even without 8-bit datatypes. By introducing an Age abstraction that limits the age to a reasonable number, the DRR can be pushed down to 121/2 (the ages 0 through 120).
public class Age { public final int years; public Age(int years) { if (years < 0 || years > 120) { throw new IllegalArgumentException("Invalid age"); } this.years = years; } }
boolean workingAge(Age age) { return age.years >= 18 && age.years <= 65; }
Not only would tests based on three equivalence partitions and six boundary values cover roughly 5%* of the input domain, but a domain this small could also be tested exhaustively using generative testing or just a loop. More important, parts of the system would no longer speak of a nameless integer, but of an age, which is kind of important in object-oriented programs.
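The "just a loop" option can be sketched as follows. Age and workingAge() follow the code above, except that Age is declared without the public modifier so that everything fits in one file; WorkingAgeCheck, workingAges and rejects are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Age as in the text, minus the public modifier so the sketch fits in one file.
final class Age {
    public final int years;

    public Age(int years) {
        if (years < 0 || years > 120) {
            throw new IllegalArgumentException("Invalid age");
        }
        this.years = years;
    }
}

class WorkingAgeCheck {
    static boolean workingAge(Age age) {
        return age.years >= 18 && age.years <= 65;
    }

    // Walks the entire input domain (0 through 120) and collects
    // every age that is reported as a working age.
    static List<Integer> workingAges() {
        List<Integer> result = new ArrayList<>();
        for (int years = 0; years <= 120; years++) {
            if (workingAge(new Age(years))) {
                result.add(years);
            }
        }
        return result;
    }

    // True when the Age constructor refuses the given number of years.
    static boolean rejects(int years) {
        try {
            new Age(years);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> ages = workingAges();
        System.out.println("Working ages: " + ages.get(0) + " to "
                + ages.get(ages.size() - 1) + ", " + ages.size() + " values");
        System.out.println("Rejects -1: " + rejects(-1) + ", rejects 121: " + rejects(121));
    }
}
```

Because the collected values are visited in ascending order, checking the first value, the last value and the count is enough to confirm that the working ages form one unbroken range.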
In conclusion, the domain-to-range ratio warns the developer of potential hazards and may be used to push the codebase towards stronger typing.
*) Using input values from the middle of the partitions in addition to the boundary values would give higher coverage.