As the dust settles on the cyber-incident caused by CrowdStrike releasing a corrupted update, many businesses will, or should, conduct a thorough post-mortem on how the incident affected their business and what could be done differently going forward.
For most critical infrastructure operators and large organizations, a tried-and-tested cyber-resilience plan will undoubtedly have kicked into action. However, the incident, dubbed “the largest IT outage in history”, was likely something that no organization, however large and cyber-framework compliant, could have prepared for. It felt like an “Armageddon moment”, as evidenced by the disruptions at major airports on Friday.
A company may prepare for its own systems, or for some key partner systems, to be unavailable. However, when an incident is so widespread that it affects, for example, air traffic control, government transport departments, transport providers, and even the restaurants in the airport, through to the TV companies that could warn passengers of the issue, preparedness is likely to be limited to your own systems. Fortunately, incidents on this scale are rare.
What the incident on Friday does demonstrate is that only a small percentage of devices need to be taken offline to cause a major global incident. Microsoft confirmed that 8.5 million devices were affected; a conservative estimate would put this at between 0.5% and 0.75% of all PCs.
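To put that percentage in perspective, the arithmetic can be run in reverse. The short Python sketch below takes the 8.5 million figure and the 0.5% to 0.75% estimate quoted above and works out the total number of PCs each share would imply. The function name `implied_installed_base` is made up for illustration, and the result is only as good as the estimate itself; it is not an official device count.

```python
# Back-of-the-envelope check of the percentages quoted in the text.
# The 8.5 million figure comes from the article; everything else is derived.
AFFECTED_DEVICES = 8_500_000


def implied_installed_base(affected: int, share_percent: float) -> float:
    """Return the total device count implied if `affected` devices
    make up `share_percent` percent of all devices."""
    if share_percent <= 0:
        raise ValueError("share_percent must be positive")
    return affected / (share_percent / 100)


if __name__ == "__main__":
    for share in (0.5, 0.75):
        total = implied_installed_base(AFFECTED_DEVICES, share)
        print(f"{share}% share implies about {total / 1e9:.2f} billion devices")
```

In other words, the estimate assumes somewhere between roughly 1.1 and 1.7 billion PCs in total, which shows just how small a slice of the global fleet was enough to cause the disruption.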
This small percentage, though, is made up of the devices that need to be kept secure and always operational. They run critical services, which is why the companies that operate them deploy security updates and patches as they become available. Failure to do so could result in severe consequences and prompt cyber-incident experts to question the organization’s reasoning and competence in managing cybersecurity risks.
Importance of cyber-resilience plans
A detailed and encompassing cyber-resilience plan can help get your business back up and running quickly. Still, in exceptional circumstances like this, it may not be enough to make your business operational again, because others that your business relies on may not be as well prepared or as quick to deploy the necessary resources. No company can anticipate all scenarios and completely eliminate the risk of operational disruption.
That said, it’s important that ALL businesses adopt a cyber-resilience plan and test it from time to time to ensure it performs as expected. The plan can even be tested alongside direct business partners, but testing on the scale of the ‘CrowdStrike Friday’ incident is likely to be impractical. In past blogs I have detailed the core elements of cyber-resilience; here are two links that may be of some assistance: #ShieldsUp and these guidelines to help small businesses enhance their preparedness.
The most important message after the incident last Friday is not to skip the post-mortem or put the incident down to exceptional circumstances. Reviewing an incident, and learning from it, will improve your ability to deal with future incidents. This review should also consider the issue of reliance on just a few vendors, the pitfalls of a monoculture technology environment, and the benefits of implementing diversity in technology to reduce risk.
All eggs in one basket
There are several reasons why companies select a single vendor. One is, of course, cost-effectiveness. The others are likely to be a single-pane-of-glass approach and efforts to avoid multiple management platforms and incompatibility between similar, side-by-side solutions. It may be time for companies to examine how tested co-existence with their competitors and diversified product selection could lower risk and benefit customers. This could even take the form of an industry requirement or a standard.
A post-mortem should also be conducted by those not affected by ‘CrowdStrike Friday’. You have seen the devastation that an exceptional cyber-incident can cause, and while it did not affect you this time, you may not be as lucky next time. So, take the lessons others have learned from this incident and use them to improve your own cyber-resilience posture.
Lastly, running tech so old that it can’t be affected is not the way to avoid such an incident. Over the weekend, someone highlighted to me an article about Southwest Airlines not being affected, reportedly because it uses Windows 3.1 and Windows 95; Windows 3.1, for one, has not been updated for more than 20 years. I am not sure there are any anti-malware products that still support and protect this archaic technology. This old-tech strategy might not give me the confidence needed to fly Southwest anytime soon. Old tech is not the answer, and it’s not a viable cyber-resilience plan. It’s a disaster waiting to happen.