Tech Focus: Mitigating the Risk of Systemic Failure
About the Author: Bla Sweeney is product marketing manager at Keysight Technologies.
The CrowdStrike outage was a vivid reminder of how interconnected the world’s systems are – and how dramatically every organisation can be affected. In the aftermath of the outage, Bla Sweeney of Keysight Technologies argues that every C-suite needs to be asking: how do we mitigate the risk of systemic software failure in our organisation?
What follows are four suggestions that provide food for thought as we work to build resilience into our systems and minimise risk at both the systemic and organisational levels.
Before getting to them, let’s look at how and why we became so reliant on third-party software.
The original software development lifecycle
Not too many years ago, the software development lifecycle (SDLC) took months, if not years.
Software was installed on-premises, and we would only deploy it after extensive, exhaustive testing. Each time we wanted to upgrade to a newer stable release, we went through the same process again. Some organisations did this every year. Many more waited several years, because the investment in time and money was too great to justify annually.
The process was incredibly costly and inefficient. On the other hand, when something went wrong, we were in control of our own destinies.
What changed?
Every company is a tech company
Well, everything changed.
In the past, an organisation might have had a single ERP system.
Today, multiple software tools underpin every business. We have the core IT we depend on to “keep the lights on”. We also have the IT that each individual department or business unit uses – manufacturing apps, product design tools, customer support portals and so on, each one likely talking to the others.
As the saying goes, every company is a tech company now.
The CIO is still expected to have oversight of all the tools in use, but the size of that task has changed beyond recognition. At the same time, the CIO is expected every year to make things better, faster, and cheaper, all three at once.
In today’s world, time- and resource-intensive ways of working are no longer viable. It simply isn’t feasible to run the same monolithic update processes we used when we relied on only a handful of systems.
SaaS helps achieve better, faster, and cheaper
The software-as-a-service (SaaS) model provided the answer that was needed, allowing us to outsource software maintenance and updates to third parties.
It has helped CIOs achieve the seemingly impossible and deliver the holy grail of better, faster, and cheaper. By 2022, a typical organisation used 130 SaaS applications.
And because updates are rolled out monthly, weekly, or even daily, organisations are always running the latest improvements their vendors have shipped.
On the other hand, when something goes wrong, we can no longer fix it by going down to the basement and “turning it off and on again”.
When there’s a bug, the IT team is dependent on the third party to find it, fix it, and roll out a revised update. When an upgrade causes conflicts with other systems, IT teams are forced to be reactive, developing a solution that will resolve the clash.
When a tool used by a large share of the world’s organisations goes wrong, chaos ensues, as the CrowdStrike outage showed.
The SaaS model undoubtedly brings huge benefits. The modern world of business couldn’t and wouldn’t exist without it. But organisations are no longer as proactive or in control as they would like to be.
So, what’s the answer?
Resilience is the answer
We need to focus on resilience in IT and put in place the processes that allow us to take back control. Here are four ways to think about it.
1. Fail forward
Firstly, it’s a fact of life that in a world where we all depend on software, there will always be bugs. The only variable is how serious they are.
The task, therefore, is to understand what our options are if something fails.
If you’re a company that releases software, what’s your testing strategy? What’s your rollout strategy? How do you revert to a previous stable release if something goes wrong?
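A staged rollout gives the rollout and revert questions above a concrete form. The Python below is a minimal sketch and does not describe any vendor’s actual process. The `Fleet` class, the `staged_rollout` function, the stage fractions and the `error_rate` health signal are all hypothetical. The new release first reaches a small slice of hosts and widens only while the health check passes. If the check ever fails, every host touched so far is reverted to the last stable release.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Fleet:
    """Hypothetical inventory: host name -> installed version."""
    versions: dict[str, str]

    def deploy(self, hosts: list[str], version: str) -> None:
        for host in hosts:
            self.versions[host] = version


def staged_rollout(
    fleet: Fleet,
    new_version: str,
    stable_version: str,
    stages: list[float],
    error_rate: Callable[[list[str]], float],
    max_error_rate: float = 0.01,
) -> bool:
    """Roll out in widening stages; return False if the release was reverted."""
    hosts = sorted(fleet.versions)
    updated: list[str] = []
    for fraction in stages:
        # Cumulative share of the fleet that should now be on the new version.
        target = hosts[: max(1, round(len(hosts) * fraction))]
        batch = [h for h in target if h not in updated]
        fleet.deploy(batch, new_version)
        updated.extend(batch)
        # error_rate is a hypothetical, caller-supplied health signal.
        if error_rate(updated) > max_error_rate:
            fleet.deploy(updated, stable_version)  # revert to last known-good release
            return False
    return True


fleet = Fleet({f"host{i:02d}": "1.0" for i in range(10)})
ok = staged_rollout(fleet, "1.1", "1.0", [0.1, 0.5, 1.0], error_rate=lambda hosts: 0.25)
print(ok, set(fleet.versions.values()))  # False {'1.0'}
```

Releasing in widening stages limits how many machines a bad update can reach before the health check catches it. A real pipeline would also pause between stages long enough for problems to surface.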
If you’re a company that relies on software, what are your options when something fails? What’s your fallback position?
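Organisations that consume software often give their fallback position a concrete shape with the circuit-breaker pattern. The sketch below rests on stated assumptions. `CircuitBreaker`, its thresholds and the use of cached data as the fallback are all hypothetical. The pattern also only helps where your own code calls a third-party dependency. After a set number of consecutive failures, the breaker stops calling the dependency for a cool-down period and serves the fallback instead. Once the cool-down ends, it allows a single trial call.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical helper: stop calling a failing dependency and serve a fallback."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures  # consecutive failures before the circuit opens
        self.reset_after = reset_after    # cool-down in seconds while the circuit is open
        self.clock = clock                # injectable so the behaviour can be tested
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: don't touch the dependency at all
            # Cool-down over ("half-open"): allow one trial call; a single
            # further failure re-opens the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # success closes the circuit fully
        return result


now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])

def vendor_api() -> str:
    raise ConnectionError("vendor API unavailable")  # simulated outage

print([breaker.call(vendor_api, lambda: "cached") for _ in range(3)])
# ['cached', 'cached', 'cached']; the third call never reaches vendor_api
```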
2. Have someone play devil’s advocate
The dangers of groupthink are well documented. In any decision-making process, including software investment decisions, make sure there’s someone playing devil’s advocate. Why do we need this software? What are the alternatives? What due diligence have we done on the provider?
3. Prioritise transparency
The CrowdStrike outage showed very clearly that organisations rely on systems that rely on other systems that rely on other systems. Entire infrastructures depend on modems in data centres that people rarely, if ever, visit.
We need to demand a new level of openness and transparency that allows us to look “under the hood” rather than trusting our providers to look after it.
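Looking under the hood starts with knowing what you depend on indirectly. The following is a minimal sketch that assumes a hypothetical inventory mapping each system to its direct dependencies; the system names are invented. It walks the dependency graph and counts how many top-level services rely on each component, directly or transitively. Components shared by many services are candidates for the kind of hidden shared dependency described above.

```python
from __future__ import annotations

from collections import Counter


def transitive_deps(graph: dict[str, list[str]], root: str) -> set[str]:
    """Everything `root` depends on, directly or indirectly (iterative DFS)."""
    seen: set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


def shared_dependencies(graph: dict[str, list[str]], services: list[str]) -> list[tuple[str, int]]:
    """How many of `services` rely on each component, most-shared first."""
    counts: Counter[str] = Counter()
    for service in services:
        counts.update(transitive_deps(graph, service))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical inventory: each system maps to its direct dependencies.
inventory = {
    "payroll": ["hr-saas", "sso"],
    "crm": ["sso", "cloud-db"],
    "support-portal": ["crm", "cdn"],
    "hr-saas": ["cloud-db"],
    "sso": ["cloud-db"],
}
print(shared_dependencies(inventory, ["payroll", "crm", "support-portal"])[:2])
# [('cloud-db', 3), ('sso', 3)]
```

In this made-up inventory, no service lists `cloud-db` as its only dependency. Even so, all three services fail if it does, and that exposure only shows up once the indirect links are followed.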
As part of this, we should remember that cheaper rarely means better. We must be confident that a lower price doesn’t mean lower standards.
4. Introduce quality resilience engineering
Finally, there’s scope for an entirely new role, one that’s tasked with engineering quality into our systems and developing the backup plan for when things go wrong.
On a day-to-day basis, engineers focusing on quality resilience use digital twins and tools such as Eggplant Monitoring and Eggplant Test to stay on top of their testing. At a strategic level, they look beyond the software development lifecycle and IT operations management to the bigger picture of an organisation and its systems. Their role is to put organisations back in control.
Not old days or new days, just different days
Our modern tech-enabled world depends on huge numbers of software systems. We can’t go back to the “old days” where we had complete control – nor would we want to. But we do have to think about how we can build resilience into our systems and minimise the risks at both a systemic level and an organisational level.
Tags: byline, commentary, interviews, Keysight, opinion, software, Tech Focus, technology
This entry was posted on Saturday, August 31st, 2024 at 12:00 pm and is filed under Brief, Business IT, Keysight, Software, Tech Focus, Trends.