Last year I saw a picture of a distinguished looking man with gray hair and a beard. There was a caption that read: “I don't always test my code, but when I do I do it in production”. At the time I had no idea that this was a well-known image of an advertising persona called the “most interesting man in the world.” I guess I don't watch enough television—or drink enough beer.
I do remember that I was slightly amused and slightly disgusted. In my experience there have been far too many times when an application was not thoroughly tested before being promoted to production. Those applications often failed and had to be rolled back for remediation and proper testing. No real application team would test in production, right?
Ummm…Wrong
In fact we test in production all the time. And we do it intentionally. Even well-disciplined, well-managed application development teams must rely on a production release to fully vet and test their code. I am not talking about unit testing, functionality testing, integration testing, system testing, or even performance testing.
We still need to do all of those things, and if we do them correctly we won't have any bugs or code defects when we go to production. But it is impossible to validate the actual performance of a complex system in a lower life cycle. The key word here is test. You can test and retest, but all you are really doing is validating the performance and functionality of the application in a testing environment.
Performance Testing
I was once involved in rolling out a large intranet. The production infrastructure was replicated server-for-server in an environment for performance testing. We knew exactly what our anticipated load would be. We knew when the load would ramp up and ramp down. We knew the transactions that would be performed, so we created test scripts and scenarios to replicate that load.
We created 25,000 test users to run those scripts. After six weeks of performance testing we gave up. We were never able to produce repeatable performance tests to validate our required SLAs. Some tests were fine; other, identical tests were not. We tweaked the test scripts. We ran tests at different times of the day. We rewrote the test scripts. We used different machines to generate the loads. We validated that there were no code defects or bugs.
The system design was more than adequate. It was sized to manage 100 percent of peak annual load at 50 percent utilization. The decision was finally made to move the portal to production. We called it a pilot. Traffic was gradually moved to the new production environment. The production farm ran as designed at loads we were never able to sustain in the performance environment. Two identical farms produced different results under load.
Huh?
So what happened? How can two farms that are identical perform differently? The answer is obvious: the performance testing infrastructure and the production infrastructure were not really identical; they just looked that way. On paper everything matched, but in reality we had a lot of differences. Most of the servers were virtualized, so while they had identical specifications we were never 100 percent certain that the physical hosts and storage matched.
The database servers were indistinguishable physical machines, but the storage was never validated to be identical. The load balancers were not exactly the same. The NICs were not exactly the same. The storage virtualization fabric was not exactly the same. In fact, except for the server specifications, nothing really matched production.
In retrospect we now know that we can use that environment for performance testing, but only to establish a reference benchmark. We know the metrics on the production farm at 1000 transactions per second. We are able to load the test farm until those same metrics are achieved.
Testing new code in the test environment now provides something we can extrapolate to predict expected results in production. So something like 350 transactions per second in the test farm equals a load of 1000 transactions per second in production. It is not the best way to test, but it provides some comfort level.
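To make that extrapolation concrete, the arithmetic is just a ratio between the two benchmark points. Here is a minimal sketch in Python; the numbers and names are illustrative, not our actual tooling:

```python
# Hypothetical sketch of the benchmark-ratio extrapolation described above.
# It assumes the test farm saturates at the same internal metrics (CPU, queue
# depth, response time) that production shows at its known load.

PROD_BENCHMARK_TPS = 1000   # load at which the production metrics were recorded
TEST_BENCHMARK_TPS = 350    # load at which the test farm hits the same metrics

def extrapolate_to_production(test_tps: float) -> float:
    """Estimate the production load equivalent to a measured test-farm load."""
    return test_tps * (PROD_BENCHMARK_TPS / TEST_BENCHMARK_TPS)

if __name__ == "__main__":
    measured = 280  # example: new code sustained 280 tps in the test farm
    print(f"{measured} tps in test ~= {extrapolate_to_production(measured):.0f} tps in production")
```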
Test Scripts
Even if the environments had been truly identical, we would still be stuck with the limitations of test scripts. Test scripts are able to create load, but they are never able to duplicate the kind of load that occurs when the end user is a human being. I can structure a test script in such a way that I can almost guarantee good results. I can also create scripts that will almost certainly bring the farm down in a minute.
What I can't do is replicate human-generated site traffic. Maybe you have had better luck than I have, but I have yet to find an algorithm that can manage load test scripts so that they simulate actual use. That is why I generally design websites to handle a specified number of transactions per second. I can then create synthetic events to generate however many transactions I need to validate the design.
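For what it's worth, those synthetic events are usually nothing more exotic than a script firing requests at a fixed rate. Here is a minimal sketch of the idea in Python; the endpoint, rate, and duration are made up, and it does far less than a real load-testing tool:

```python
# Minimal synthetic-load sketch: fire roughly TARGET_TPS GET requests per
# second at a URL for DURATION_SECONDS. Endpoint and numbers are placeholders.
import time
import threading
import requests

TARGET_URL = "https://example.com/portal/home"  # hypothetical endpoint
TARGET_TPS = 50
DURATION_SECONDS = 60

def fire_request() -> None:
    try:
        resp = requests.get(TARGET_URL, timeout=10)
        print(resp.status_code, resp.elapsed.total_seconds())
    except requests.RequestException as exc:
        print("error:", exc)

def generate_load() -> None:
    interval = 1.0 / TARGET_TPS
    end = time.time() + DURATION_SECONDS
    while time.time() < end:
        threading.Thread(target=fire_request, daemon=True).start()
        time.sleep(interval)   # crude pacing; dedicated tools do this much better

if __name__ == "__main__":
    generate_load()
```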
Data
Then there is data. You simply don't have the same data in your lower life cycles that you do in production. Test data is not real data. It may look like real data, but it isn't. Production data is sometimes replicated for pre-production or staging, but never for testing. Scary IT campfire stories abound about the fate that befell organizations (or CIOs) that mixed production data into test systems. Too many things can go wrong to ever consider that option.
Even if you completely isolate your test life cycles so that they can never find their way past their firewalls, using production data in test is not a good idea. Software testing is often vended out or handled by contractors or offshore teams. You cannot move PCI or other regulated data from a secure production environment to a test environment that is surrounded by less rigor, compliance, and security.
Integration
Even bad software developers and teams know that they must test integration with other applications and data sources before deeming their application production ready. But successful integration with other test environments does not guarantee final success. There is so much more to rolling out good applications than writing code and integrating.
How many times have you seen a team stumble when all the unit testing and integration is done and it's time for production? Little things like developer accounts embedded in connection strings creep out of the woodwork. All those little tricks that the development team hacked together to get the application running come back to haunt them.
We all know that properly designed applications use parameters that can be defined at run time so that they can run in any supported environment. We also know that these things are (almost) always put off in the initial rush to get something working. Ideally this would all have been detected in the progression through the development, quality assurance, and performance life cycles, but none of those environments are in the same domain as production, so all bets are off. That is why you never let your development team install and configure code releases through the testing environments. Dedicated build teams discover these things early on and prevent developers from hacking an application or environment to make a delivery date.
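The cure for the embedded-account problem is boring but effective: externalize everything environment-specific so the same build runs anywhere. A minimal sketch, assuming a simple environment-variable convention (the variable names are invented for illustration):

```python
# Minimal sketch of environment-driven configuration instead of hard-coded
# developer credentials. Variable names are hypothetical.
import os

def build_connection_string() -> str:
    """Assemble a database connection string from the runtime environment."""
    host = os.environ["APP_DB_HOST"]        # fails fast if not configured
    name = os.environ["APP_DB_NAME"]
    user = os.environ["APP_DB_USER"]
    password = os.environ["APP_DB_PASSWORD"]
    return f"postgresql://{user}:{password}@{host}/{name}"
```

The same artifact can then move through development, QA, performance, and production unchanged; only the environment it lands in differs.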
Pilot
And that is why we pilot, even if all of our prior testing was flawless, even if we have zero defects, even if we have passed all test cases with flying colors. The production world, with real users and real data, is a new world. A well-thought-out pilot process does a number of things. First, it allows the user community to adapt to the new application. Too often the folks who sign off on UAT aren't the same ones who use the application day in and day out. It also provides a rigorous workout of the application.
It is not possible to posit enough test cases to cover all the edge cases that actual users will create. Those edge cases will eventually prove the value of the application. The pilot is also the first real opportunity to fully test all those integrations we thought were working in pre-production. The customer master database in pre-production had 2 million dummy records. The real customer master has 150 million actual records (some of which you strongly suspect are bogus). A complex transaction that took a fraction of a second in pre-production now runs for a second and a half. The pilot allows us to identify these bottlenecks.
Infrastructure
Pilot is also the first real opportunity you have to test your environment. When you first built out your production infrastructure you specified everything you needed including the IOPS for your database storage as well as the number and capacity of LUNs. Your servers were delivered with the requested number of cores and RAM.
Are you sure you were really provided what was specified? Data centers use virtualization managers for everything from servers to storage. You may have had 2,000 IOPS for LUN 22 on day 1. What do you have on day 90? Or day 900? Virtualization allows optimal use of all available assets, but it also allows for overutilization of those assets.
On day 1 your application may have been the only one using the physical storage. On day 90 you may be sharing that storage with a digital asset management application. The application team needs to check what their actual throughput and response times are. Run low-level tests from your servers to the storage and back.
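A crude way to spot-check throughput without waiting on the storage team is to time raw reads and writes from the application server itself. The sketch below is illustrative only; the path and sizes are placeholders, and a dedicated benchmark tool such as fio will give far more trustworthy numbers:

```python
# Rough storage spot-check: write and re-read a test file on the volume in
# question and report throughput. This is a sanity check, not a real IOPS
# benchmark; read numbers in particular can be inflated by the OS page cache.
import os
import time

TEST_PATH = "/data/io_probe.tmp"     # point at the LUN you care about
BLOCK = b"\0" * (1024 * 1024)        # 1 MiB blocks
BLOCKS = 512                         # 512 MiB total

def probe() -> None:
    start = time.time()
    with open(TEST_PATH, "wb") as f:
        for _ in range(BLOCKS):
            f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())         # make sure the data actually hit storage
    write_secs = time.time() - start

    start = time.time()
    with open(TEST_PATH, "rb") as f:
        while f.read(len(BLOCK)):
            pass
    read_secs = time.time() - start

    os.remove(TEST_PATH)
    print(f"write: {BLOCKS / write_secs:.0f} MiB/s, read: {BLOCKS / read_secs:.0f} MiB/s")

if __name__ == "__main__":
    probe()
```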
The same rules apply to virtualized servers. Not all physical hosts are created equal, and not all hypervisors are managed properly. Do you really have access to all eight cores 24×7? And what do those cores look like traced back to the physical device? Were you actually allocated a single processor with four hyper-threaded cores? Are you consuming half of a hyper-thread from eight different processors? These are not equal.
Application owners must benchmark machine metrics when they are confident they are being provided maximum resources from the various virtualized fabrics. And they need to repeat those benchmark tests throughout the life of the application.
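One low-tech way to keep those benchmarks honest is to rerun the same small CPU-bound job on a schedule and compare it to the number recorded on day 1. A minimal sketch, with an arbitrary workload and tolerance:

```python
# Repeatable CPU micro-benchmark: time a fixed amount of work and compare it
# to a baseline captured when the VM was known to have its full resources.
# The workload size and tolerance are arbitrary illustrations.
import time

BASELINE_SECONDS = None   # set this to the day-1 result after the first run
TOLERANCE = 1.25          # flag if we are 25 percent slower than baseline

def cpu_workload() -> float:
    """Time a fixed, deterministic chunk of CPU work."""
    start = time.perf_counter()
    total = 0
    for i in range(10_000_000):
        total += i * i
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = cpu_workload()
    print(f"workload took {elapsed:.2f}s")
    if BASELINE_SECONDS and elapsed > BASELINE_SECONDS * TOLERANCE:
        print("WARNING: slower than baseline; the hypervisor may be oversubscribed")
```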
So do we test in production? Of course we do. But we are looking for different defects in our production testing. You still need to do all the necessary work in lower life cycles and only promote code that has been rigorously tested and is defect free. Code that is promoted to production must be fully functional and have been proven through testing to satisfy all business requirements. We aren't expecting to find any surprises or discover hidden defects in production, but we do need this final validation that it runs just as well on busy interstates and crowded city streets as it did on the test track.
Please address comments, complaints, and suggestions to the author at [email protected].