By Anil Kollipara, vice president of Product Management, Spirent Communications
How many large enterprises have suffered a significant outage due to a bad software update or security issue? You might be better off asking how many haven't. In the past year alone, outages in major financial networks have shut down trading platforms, knocked out payment services, and even locked customers out of their bank accounts. In the most severe cases, like the 2024 CrowdStrike outage, thousands of businesses were affected and direct financial losses ran into the billions of dollars.
The truth is, as the networks that the global economy depends on grow more dynamic and complex, software conflicts and security vulnerabilities become more likely. Indeed, modern networks have so many complicated software interdependencies that even seemingly minor problems can cause outages that impact millions of users. These issues have become so serious that they've spurred new regulations like the E.U.'s Digital Operational Resilience Act (DORA), which mandates rigorous resilience standards and proactive testing for key industries like banking and insurance.
Fortunately, modern testing methodologies can help organizations fix most potential issues before they impact customers. By automating security and resilience testing, you can get ahead of the risk, meet evolving compliance requirements, and minimize disruptions for your customers and your business.
Evolving Threats Demand New Strategies
Growing regulatory focus on the resilience of digital infrastructure shouldn't come as a surprise; it was likely inevitable given long-running trends affecting modern businesses. Start with the fact that as organizations rely on digital tools for more of their day-to-day operations, they become far more vulnerable to outages. Meanwhile, as networks evolve from physical to virtual to cloud-native, the infrastructure enabling those tools keeps growing more complex.
Modern networks encompass multiple vendors, APIs, and third-party components, with constant updates released for every part of the stack. Different stacks are also increasingly nested, creating scenarios where problems in one vendor's software can quickly spread. (The faulty CrowdStrike update, for example, didn't just disrupt CrowdStrike software; it brought down the Windows OS on 8.5 million devices.) From a cybersecurity perspective, this complexity creates potential security gaps that adversaries could exploit, necessitating even more ongoing updates and patches. Each update represents a change in the network, and a new opportunity for something to break.
Now, governments and regulators are taking action to create a more stable and resilient foundation for critical digital services. DORA, which went into effect this January, requires financial institutions, cloud providers, and others to perform ongoing operational resilience testing—including proactively testing security mechanisms—or face fines up to 2 percent of total revenues.
In this environment, organizations can't assume that any change is safe to promote into the network, even when it comes from a trusted vendor. Businesses already carry ultimate responsibility for security breaches that affect their customers; they must now treat operational resilience the same way. It doesn't matter which vendor in the stack issued a faulty update. If you're the party providing digital services to customers, it's now your responsibility to keep that infrastructure online.
Rethinking Testing
For enterprises moving to address these requirements, the only option is to thoroughly test and validate everything. Yet making this shift is not always easy, especially for organizations that still use manual testing processes designed for yesterday's vertically integrated infrastructures. Those legacy approaches can't keep up with modern software-driven, cloud-native environments, where millions of test cases may be needed to fully cover the network. Too often, current testing approaches are also:
- Arbitrary, with human beings deciding when and what to test, potentially missing important issues
- Siloed, with different teams (Security, Engineering, Operations) focusing only on their specific part of the stack
- Incomplete, with validation often limited to basic functional testing (“Is this new node live?”), without investigating impact under peak loads, non-optimal conditions, and “rainy day” scenarios
In some cases, the more automated an organization’s DevOps processes are—the more advanced their continuous integration/continuous delivery (CI/CD) implementation—the more challenging these issues become. If organizations aren’t careful, they can over-optimize for speed of deployment at the expense of thorough testing and service resilience.
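One way to rebalance speed and safety is to make resilience testing a blocking stage in the pipeline itself. What follows is a minimal sketch of such a gate, assuming a pytest-based suite; the directory names are hypothetical placeholders for your own test layout, and a real pipeline would wire this into its own stage definitions:

```python
# Minimal sketch of a CI/CD "resilience gate": a build is promoted only if
# every resilience suite passes, not just the functional unit tests.
# The suite paths below are hypothetical stand-ins for your own layout.
import subprocess
import sys

RESILIENCE_SUITES = [
    "tests/functional",   # basic "is this node live?" checks
    "tests/load",         # behavior under peak traffic
    "tests/failure",      # rainy-day and failover scenarios
    "tests/security",     # validation of security mechanisms
]

def run_suite(path: str) -> bool:
    """Run one suite via pytest; True only on a clean pass."""
    result = subprocess.run([sys.executable, "-m", "pytest", path])
    return result.returncode == 0

def main() -> int:
    failed = [s for s in RESILIENCE_SUITES if not run_suite(s)]
    if failed:
        print(f"Promotion blocked; failing suites: {failed}")
        return 1   # a nonzero exit fails the pipeline stage
    print("All resilience suites passed; build may be promoted.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The point is structural: deployment speed is preserved because the gate itself is automated, but a change can no longer skip resilience validation on its way to production.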
Proactive Assurance Drives Efficiency and Compliance
The only way to comply with new operational resilience mandates—and avoid the risks and costs of failures—is to perform more exhaustive, proactive testing. This testing should be:
- Comprehensive, encompassing all network elements, software upgrades, and the attack surface
- In-depth, extending beyond basic functional testing to investigate how changes and security threats could impact availability and quality
- Proactive, using synthetic traffic and emulation of real-world scenarios to test under peak load and other stress conditions, and continually validating security and resilience mechanisms
- Flexible, using virtualized, federated lab and testing equipment that can be accessed from anywhere, improving capital efficiency as testing expands
- Automated, as legacy manual approaches simply can't execute the millions of test cases needed to protect modern business infrastructure and customers; the sketch after this list shows how quickly the case count explodes
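To see why automation is non-negotiable, consider a back-of-the-envelope view of the test matrix. The element counts, change types, and scenarios below are illustrative assumptions, not measurements from any real network:

```python
# The test matrix is the cross-product of network elements, change types,
# load profiles, and failure scenarios. All counts here are illustrative
# assumptions, not measurements.
network_elements  = 2000   # routers, switches, servers, security appliances
change_types      = 40     # patches, upgrades, config changes, ...
load_profiles     = ["idle", "typical", "peak", "overload"]
failure_scenarios = ["none", "link-down", "node-crash",
                     "latency-spike", "cert-expiry", "ddos"]

total = (network_elements * change_types
         * len(load_profiles) * len(failure_scenarios))
print(f"Distinct test cases to cover: {total:,}")        # 1,920,000

# At even 5 minutes of manual effort per case, a single release cycle
# would demand roughly 160,000 person-hours of testing.
print(f"Manual effort at 5 min/case: {total * 5 / 60:,.0f} hours")
```

Even with generous pruning, numbers like these put manual execution out of reach; the matrix has to be generated, scheduled, and executed by machines.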
The following figure illustrates what a more modern and effective approach to continuous testing, automation, and operational resilience looks like.
It starts with an Infrastructure Access abstraction layer, which allows continuous testing tools to reach every server, router, switch, and security appliance in the environment. Next, Operational Resilience Testing Methodologies provide diverse test case libraries to measure resilience, including under peak loads and failure conditions. With Lab Automation, virtualized lab equipment functions as a flexible, automated resource pool: if a new security patch comes in, for example, you can spin up a testbed for it, with the right topologies and configurations for all test cases, in minutes. Finally, with Test Automation, testbeds can execute testing across all categories of resilience and automatically route artifacts to the proper team if any test fails.
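To make the division of labor between these layers concrete, here is a hedged sketch in code. Every class and function name below (TestbedPool, Testbed, route_artifacts, and so on) is an illustrative stand-in, not a real product API:

```python
# Sketch of the four layers described above, under assumed interfaces.
from dataclasses import dataclass, field

@dataclass
class TestResult:
    case: str
    passed: bool
    artifacts: dict = field(default_factory=dict)  # logs, captures, configs

class Testbed:
    """Infrastructure Access layer: one handle to every device under test."""
    def __init__(self, topology: str):
        self.topology = topology

    def run(self, case: str) -> TestResult:
        # Real execution would drive synthetic traffic and inject faults;
        # here every case simply passes so the sketch runs end to end.
        return TestResult(case=case, passed=True)

class TestbedPool:
    """Lab Automation layer: virtual lab gear as an on-demand resource pool."""
    def spin_up(self, topology: str) -> Testbed:
        print(f"Provisioning testbed with topology '{topology}'...")
        return Testbed(topology)

def route_artifacts(result: TestResult, owners: dict) -> None:
    """Test Automation layer: send failure artifacts to the owning team."""
    team = owners.get(result.case.split(":")[0], "operations")
    print(f"FAIL {result.case}: artifacts routed to {team}")

# Resilience Testing Methodologies layer: a case library keyed to the change.
SECURITY_PATCH_CASES = ["security:patch-regression", "load:peak-traffic",
                        "failure:node-crash-during-update"]
OWNERS = {"security": "security-team", "load": "perf-team",
          "failure": "sre-team"}

pool = TestbedPool()
testbed = pool.spin_up(topology="core-edge-with-firewalls")
for case in SECURITY_PATCH_CASES:
    result = testbed.run(case)
    if not result.passed:
        route_artifacts(result, OWNERS)
```

In a real deployment, the pool would provision virtual lab equipment on demand, the cases would come from the methodology libraries, and a failed result's artifacts would be attached automatically to a ticket for the owning team.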
The result is a more comprehensive and automated testing environment. Now, you can avoid most customer-impacting issues because you’re exhaustively testing every change and proactively validating security defenses. You can innovate more quickly, with greater peace of mind, because you’ve automated end-to-end testing and verification within your CI/CD/CT framework. You’ve shortened the time spent testing new patches and software releases from months to days, achieving significant operational efficiencies. You’ve simplified compliance, with documentation at every stage of testing. Best of all, you can consistently deliver more reliable, higher-quality services to customers.