Expected Behavior
When writing bugs and adding planned test cases to a database I need to define the expected behavior. This task seems very simple on the surface. The unwritten part of the test case bothers me. What do we NOT expect? Good validation in an automated test case means knowing the answer.
When I add 2 and 2 I expect 4 for the sum. But do I? 2 plus 2 is 5 if I'm adding high values of 2. When I add 2 and negative 2 I expect zero. Also, I expect 4 what? The number 4 to appear. For how long? Where? How quickly should it appear? How much detail will it have? How big is it? What font will it be in? Do I have control over the default? Will it always be the same color?
What do I not expect? I do not expect the calculator to crash. I do not expect the number 4 to flash. I do not expect the number to be partially off of the calculator. I do not expect the battery life to be so low that I can't add twice. I do not expect it to clear out so quickly that I can't see the answer. I do not expect an error message that the format must be 2.000 or 02.00. I do not expect the number to be upside down.
I'll be learning what is expected and what is not expected as I get experience with the system under test. Testing earlier in the development process means I can help the team define what is expected. Last week I worked with the team twice to define how a bug I found would be fixed and the expected behavior changed for the better. Compared to the waterfall projects I've been on in the past, this was really cool! This involvement in defining how the software SHOULD work boosted my morale. I feel good about my team despite my awareness of the expiration date of my work.
Static expected results in planned tests are often a waste of time. Expected behavior is not the "best behavior". It isn't even what the majority of our customers expect. It is our best prediction at the time we planned the scripted test. I dream of test automation with genius validation that can learn to update its own expected results. What if "expected behavior" improved itself and updated automatically without tester intervention? Imagine regression test suites understanding instantly, when run on Mac OS, that they should now expect the About information under the Apple menu, without you coding it. What if your tests knew that the system was Japanese, so that the string should now be the equivalent glyphs instead? It would be powerful for the regression test to understand each environment and teach itself what is expected, rather than a tester predicting every behavior that is and is not expected. Screenshot recapture would not be needed after UI changes if the automation already expected the new button. The future of UI test automation could be more exciting and maintainable if the expected behavior were dynamic. That would also free us up to make test inputs dynamic.
Expected behavior is dynamic. Unexpected behavior is even more unpredictable. Trying to define it in advance doesn't make much sense. Test planning that defines the point of the test is more useful in practice.


Many "best practices" I hear & read tell me test automation must be tolerant of application changes. Test automation can work well as a change detector but making it detect fewer changes seems to defeat its purpose. This is the paradox: Improving maintainability and reducing false failures tends to lessen the value of the automation. Yet, balance is possible.
I find that rather than just turn bars green and red, I like to also create automation that provides people with info that supports testing. The road to automation that learns & supports seems to require giving up the dream of automatically flagging everything red or green.
Your example reminds me of automation I did for checking software that computes numbers – numbers used by millions. I read technical spec docs. I clarified ambiguity. I had questions related to precision. I had to deal with concepts of business days – which varied by location & year. I carefully designed code to implement the calculations, compare the results to those generated by the system under test, and report differences. The first time I ran my test, everything turned red – the tests failed.
I thought that either I or a developer must have made a mistake. I reported my findings. I got some clarification. I tweaked my automated check code & ran it again. This time about 25% of the checks failed. I analyzed the results. I learned the differences were due to rounding. Perhaps I still misunderstood the requirements. Perhaps I had found a major bug in the system under test. I carefully explained what I was seeing and got new clarification on the requirements. I ran my scripts again and this time a different 25% of the results failed.
This is grade school math. How can a team of smart people fail to nail down the one right answer?
Computers are bad at floating point math -- something I’d learned years before this. I remembered testing drug dosing calculators that generated different results on different platforms. I remembered having to discover what was good enough & adjust the automation for good enough instead of correct. I took the same approach this time.
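To illustrate the floating-point point in general terms, here is a small Python sketch. It is a generic example, not code from either project mentioned here, and the numbers are chosen only for illustration. It shows an exact comparison failing on a trivially simple sum, a tolerance-based comparison passing, and decimal arithmetic avoiding the issue for this case.

```python
import math
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum is very slightly off from 0.3.
total = 0.1 + 0.2

exact_match = (total == 0.3)                            # False
close_enough = math.isclose(total, 0.3, rel_tol=1e-9)   # True

# Decimal arithmetic on string inputs keeps the decimal digits exact here.
decimal_total = Decimal("0.1") + Decimal("0.2")         # Decimal('0.3')

print(exact_match, close_enough, decimal_total)
```

An exact-equality check would report this as a failure even though nothing is wrong in any sense a user would care about, which is why the tolerance has to be a deliberate decision.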
We were able to define good enough. I coded for good enough. I modified the automated checks to report green, orange, and red – depending on how close the calculations matched. Instead of pass/fail results, we now have three result states. Additionally, I added reporting to help guide people analyzing the results.
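A minimal sketch of what a three-state check like this might look like in Python. The names (`Status`, `classify`, `summarize`) and the tolerance values are hypothetical placeholders for illustration, not the ones used on the project; the real thresholds would come out of the kind of clarification with the team described above.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    GREEN = "green"    # within the agreed "good enough" tolerance
    ORANGE = "orange"  # small difference, worth a human look
    RED = "red"        # large difference, likely a real problem


def classify(expected: float, actual: float,
             green_tol: float = 0.005, orange_tol: float = 0.05) -> Status:
    """Classify a result by its distance from the independently computed value.

    The default tolerances are assumed values for illustration only.
    """
    diff = abs(expected - actual)
    if diff <= green_tol:
        return Status.GREEN
    if diff <= orange_tol:
        return Status.ORANGE
    return Status.RED


def summarize(pairs):
    """Count how many (expected, actual) pairs land in each status."""
    return Counter(classify(expected, actual) for expected, actual in pairs)


# Hypothetical data: (value computed by the check, value from the system under test)
checks = [(100.00, 100.001), (100.00, 100.02), (100.00, 101.00)]
summary = summarize(checks)
for status in Status:
    print(f"{status.value}: {summary[status]}")
```

The summary is deliberately just counts per status; a fuller report would also list the individual differences so a tester can decide which orange results deserve a closer look.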
The green/orange/red classification and supporting data reported by the automation helps testers analyze the results and direct further testing. Instead of testing raw data put in a database by the system under test, testers have the option of using reports generated by the automation.
Rather than purely pass/fail checks, we have checks that provide testers with information to test.
The iterative and interactive clarification of results also proved valuable in helping everyone understand the requirements and what constitutes good enough.
Ben
Good enough--This is really helpful! I will think about this today as I work on my validation. How much detail am I looking for? Hmm. Very good things to consider first.
Pass/Fail checks are really limited and as you point out, there are other alternatives! I'm going to try to put in an orange status and see what I can learn.
I have faith that automation is useful because some of my basic scripts are returning huge value. Before I made and ran sanity and smoke tests, our builds were less reliable. I believe in validating builds every time to catch test-blocking errors soon enough to keep testing rolling. Not because it is a best practice or anything like that, but because I get mad if I can't test all day and have to work the weekend instead, when it could have been prevented.
Thanks for the comment and the insight. I'll be looking to answer what "good enough" means for this script I'm working on now.
Lanette:
I'd be careful. I'm pretty sure the calculator example you started out your post with is legally copyrighted.
Just kidding! Nice post.
Brilliant, as usual. This is a key software behavioral issue that is extremely difficult to solve given current fundamental architecture limitations. It's one of the key types of behaviors that my OS is explicitly designed to help enable. Now, if I can just get it built and out there... (: