An Introduction to ThreatTank – Episode 5: The Global IT Outage
Tom Gorup: Welcome to Threat Tank, a podcast covering the latest threat intelligence, threat response, and insights about the security landscape around the globe. I’m your host, Tom Gorup, vice president of security services at Edgio.
And today, we’re going to be diving into the recent high-profile IT outage, exploring what happened, what the impact was, and really what we can learn from it.
Joining me today are Richard Yew and Matt Fryer.
Welcome, Richard, Matt.
Matt Fryer: Happy to be here. Thanks for having me.
Richard Yew: Thanks for having me here again.
Tom Gorup: Yeah, again. So Matt, since you are new to Threat Tank, who are you? Introduce yourself.
Matt Fryer: Happy to! Matt Fryer, I’m the CISO architect at Fortinet.
So my background is a long road in cybersecurity, from individual contributor roles as a project and program manager, working directly in a security operations center and on application security, through mid-management, all the way up to CISO and executive for security operations at a couple of different organizations.
So, I have a background that spans working directly with enterprises in the corporate world as well as federal government, at both the civilian and DoD level.
Tom Gorup: So awesome. That sounds like a lot of fun kind of all over the place.
Matt Fryer: It’s like a shotgun approach to security. Just do everything.
Tom Gorup: I feel like that’s a lot of people in security, especially in the last decade, right? It’s a lot of figuring things out along the way.
And Richard, how about you? I know you’ve been on before, but yeah.
Richard Yew: I was wondering if you can say Richard needs no introduction, but here I am.
Yeah, Richard Yew, I’m the VP of Product Management at Edgio.
I am in charge of the development and go-to-market strategy for our security portfolio, as well as our applications, edge compute, and web performance lines of business.
Tom Gorup: Awesome. Well, again, welcome to Threat Tank. It’s gonna be, I think it’s gonna be a fun episode.
I think we have a whole lot to talk about, especially with this event having happened nearly a month ago now. So we have a little bit more information, a little bit more detail on the back end.
But before we get started, I’d love to start all of our podcasts with a question. And for everybody listening, they have no idea what this question is. So, are you both ready? Are you ready for this?
Matt Fryer: I’m as ready as one can be.
Tom Gorup: Yeah. It didn’t really matter. I was gonna be ready to go again.
So, all right, here’s a question.
You are now a superhero with the most mundane power imaginable.
What is it and how do you use it to save the day?
Matt Fryer: Mundane superpower. So, we get to define our mundane superpower. Yeah, let’s see.
What’s a good, mundane superpower? I don’t know.
Tom Gorup: Maybe you can. When you breathe, you take in, you maximize O2 intake.
Richard Yew: I don’t know, man. I mean, just being able to grow a beard like you guys. I mean, that’s superpower, man.
Tom Gorup: It is pretty mundane, right?
Matt Fryer: Like accelerate.
Richard Yew: What do you do with this thing? I can change the world, you know?
Matt Fryer: Yes, accelerated follicle growth allows me to look good and persuade people to do the good, do the right and not the bad things, in an era of information and disinformation. You can use the good looks to create good things throughout the world.
Richard Yew: Awesome.
Tom Gorup: Yeah, that’s part of the Neanderthal in me. You know, just the hair just grows.
Matt Fryer: It’s the Irish in me.
Richard Yew: That’s the Neanderthal. The red comes from Neanderthal, so I heard.
Matt Fryer: So there’s not much red and more gray in it. When you’re in cybersecurity long enough, the color goes, you get the grays.
Tom Gorup: So Richard, yours would be a beard. Could you grow a beard? You know, I’m imagining you now with a beard like mine, like planting it on you.
Richard Yew: See, the reason I mentioned a beard is because I was thinking about hair. My most mundane superpower would be to have my hair grow the same way and stop growing at a certain length, so I can just wake up, not have to worry about my hair, turn on my camera and do whatever I gotta do. I don’t know, save 30 minutes of doing my hair. Yeah, I’ll be on time.
Matt Fryer: Yeah. It’s powerful. I have like four cowlicks. You know how hard it is to deal with that.
Tom Gorup: You know, that’s pretty mundane, but still useful.
Richard Yew: Yeah. I would like to wake up every day with the same hair.
Tom Gorup: Yeah. It’s like not trying to figure out what you’re going to wear. Like, if I didn’t have to manage my hair, it’d be great. I’ve tried to shave it. I shaved it during COVID, but my wife really isn’t a fan. So the struggle is real.
Richard Yew: Being able to take a shower with your clothes on and not get them wet. Right.
Matt Fryer: That’s great. Getting consolation. I told my wife I was going to shave my beard. Your beard will poke me, but I was going to shave my beard. And her response was, no one will ever trust you, so you have to keep your beard. You can’t trust me without a beard.
Tom Gorup: That’s fair.
Richard Yew: You’re hurting my feelings now.
Tom Gorup: All right, all right, so let’s jump in. So what happened?
CrowdStrike pushed out an update and it caused some problems. Does anyone want to give some more detail and background on what happened here?
Matt Fryer: A few problems. So, they rolled out an update to their software and it did not play well with specific Windows versions, right. So, not every version; there were definitely people running versions that weren’t affected. But it affected Windows and caused what is notoriously known in the IT industry as the blue screen of death, and it required a manual intervention to remove the specific files that were causing the blue screen of death so that you could get the operating system to run normally. So, they rolled out an update and broke Windows in the process. That’s kind of the 10,000-foot view of what happened.
Richard Yew: Yeah, I was doing some last-minute homework and re-read the whole RCA yesterday. It’s funny. It’s such a simple mistake. Essentially you have a sensor, but there’s also Rapid Response Content, RRC, which is essentially just a configuration. But when they’re expecting 21 inputs, the configuration only has 20 inputs. So, guess what happened? You try to read that, and there’s a memory issue that causes a crash in the kernel driver. I mean, we can get into why it is in the kernel in the first place.
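The 20-versus-21 mismatch Richard describes can be illustrated with a minimal Python sketch. This is not CrowdStrike's code: the names `EXPECTED_FIELDS`, `read_field_unchecked`, and `safe_load` are hypothetical, and Python raises an `IndexError` where a kernel driver written in a lower-level language would read past the end of the array. The point is only that checking the count before indexing turns a crash into a rejected update.

```python
EXPECTED_FIELDS = 21  # hypothetical: number of inputs the content expects


def read_field_unchecked(values, index):
    """Index straight into the inputs, trusting that enough were supplied."""
    return values[index]


def validate_inputs(values, expected=EXPECTED_FIELDS):
    """Reject the content up front if the input count does not match."""
    if len(values) != expected:
        raise ValueError(
            f"expected {expected} input values, got {len(values)}"
        )
    return values


def safe_load(values, expected=EXPECTED_FIELDS):
    """Return (ok, message) instead of letting a bad update crash the caller."""
    try:
        validate_inputs(values, expected)
    except ValueError as exc:
        return False, str(exc)
    return True, "ok"
```

With 20 values supplied, `read_field_unchecked(values, 20)` fails at the moment of access, while `safe_load(values)` reports the mismatch and lets the caller keep running on the previous configuration.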
Tom Gorup: So, when I was reading the RCA, I couldn’t help but imagine some sort of GIF of that one parameter, out there in the wild, that just never made it in, you know? It’s this lost parameter, 20 out of 21. Just one parameter causes global IT outages, as everybody is describing it. That, to me, is spectacular. So, to your point, the CrowdStrike agent is running at the kernel level. Let’s explore that a little bit. Does it need that level of access? Why does it have access, I guess, to the kernel?
Richard Yew: Do you want to bring Microsoft into this?
Tom Gorup: Yeah. I mean, you have to, right? When you’re exploring this problem, like abstractly, this is what I’ve been thinking about the last few weeks, is who’s at fault, right? We start breaking down this problem. We talk about CrowdStrike, push out an update that had 20 parameters instead of the required 21, byproduct is crash of an agent. OK, well, software crashes. It happens. Should a signature update crash software? Probably not. Should there have been more rigorous testing? Probably. But now we’re crashing operating systems on top of that, right? So now Microsoft is in the mix.
Matt Fryer: So, when that question gets asked from a strategy level or from a risk level, it’s: who’s at fault? Who’s really at fault, when you narrow in on it, for what happened? And the easy answer is to blame CrowdStrike. That’s the easiest answer that everyone’s going to give. It’s like, oh, well, you pushed the update, you broke everything. It’s the easiest answer to give, and often not the right answer. I would say, from a professional opinion, from a strategist’s opinion, it’s everyone’s fault, right? So, if I’m CrowdStrike, do I push out updates late at the end of the week, globally, with no phased approach, just one big push? That’s probably not the best process in the world. As an individual that’s running an environment, do I give no consideration to the availability of my environment? Do I become wholly reliant on my supply chain, in a way that allows my supply chain to create disruptions? Do I not think that through as part of my supply chain management program? Because part of that lands on me as a CISO or a director of security to say, hey, you have to take into consideration that you’re going to have SolarWinds or Microsoft or CrowdStrike or someone that’s going to cause a disruption in your environment. You have to have a BCP or a method in place to overcome that. So it can’t just be, oops, sorry, CrowdStrike broke it, it’s all your fault. That can’t be the answer. And from a regulatory standpoint, we have regulatory bodies that come in and dictate to organizations, to enterprises, how they need to operate in order to create a level playing field, right? And we can prep for a little bit of this, but when you have regulatory bodies that are dictating level playing fields for enterprise, you’re creating holes. It’s inadvertent. It’s not malicious in any way, but you’re creating holes by nature.
So, when you ask that question, why do you have access to the kernel? I think if you do a little digging, you’ll find out why, right? The easy answer is they didn’t have a choice. You had to give it to them because that was a dictate that came back from regulatory bodies.
Richard Yew: Yeah, I think it all comes down to anti-competitive laws. So, for example, Windows has Windows Defender, and if Defender is the only one that has access to the kernel and is able to install kernel drivers, then it gives Windows Defender an unfair advantage, right? For free market competition, other security providers should have access to it. However, I’m going to go from a technical perspective and put on my technical hat here.
Tom Gorup: And it’s black too. That’s good. Yeah, the white one in his left hand…
Richard Yew: But I got a white one!
So, what happened was that, according to expert opinions, a lot of these security features require kernel-level access to really perform their functions. And you might argue, hey, why is it just more on the Microsoft side? Actually, to take a step back, just for everybody’s background: in computing, there are generally two spaces. There’s a kernel space, where the hardware interacts with the software, and then there’s a user space. This is where Windows loads up your programs; you can install your games, your software, your Microsoft Word and whatever, and they run in the user space. So there’s always a debate, right? Hey, you should install the security services in your kernel space because it has the highest-level access, or more like the lowest-level access, right?
Tom Gorup: In this case, ring zero, right?
Richard Yew: Yeah, you can see everything. If there’s malware spinning up new processes and threads, or if you have malware writing a malicious file to the disk, you’re able to intercept and stop that at that level. That’s the pro. However, there’s also the con that it is dangerous, right? So, as shown in this case, the crash in the driver resulted in failure to boot, which means that the end user is never going to be able to reach the user space in the first place.
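The trade-off between the two spaces can be loosely illustrated with ordinary processes. The sketch below is an analogy only, not how any endpoint agent is built: a hypothetical parser (`PARSER_SOURCE`) runs in a separate child process, so when it crashes on short input the parent survives and can quarantine the content. A fault inside a kernel driver has no such boundary, which is why it takes the whole machine down.

```python
import subprocess
import sys

# Hypothetical content parser that crashes if fewer than 21 fields arrive.
PARSER_SOURCE = """
import sys
fields = sys.argv[1].split(",")
print(fields[20])  # IndexError when the 21st field is missing
"""


def parse_isolated(content):
    """Run the parser in a child process so a crash stays contained."""
    result = subprocess.run(
        [sys.executable, "-c", PARSER_SOURCE, content],
        capture_output=True,
        text=True,
        timeout=30,
    )
    if result.returncode != 0:
        # The child died; the parent keeps running and sets the content aside.
        return False, "parser crashed; content quarantined"
    return True, result.stdout.strip()
```

Calling `parse_isolated` with 20 comma-separated fields returns a failure tuple instead of raising, which is the availability property being debated here.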
Matt Fryer: You’ve sacrificed availability for the sake of a singular control mitigation strategy. So it was, hey, I need kernel access because I’ll get this deep level of inspection against malicious attacks, right? That’s kind of the thought process from (Microsoft) Defender, not everyone else, right? I need to monitor the kernel because if something bad happens in the kernel, I need to see that so I can mitigate it. And I feel like we should step back and say, well, is that the right approach to it? Or is isolation and quarantine a better approach to create a level playing field with availability as well as security? So, it’s like, do you need kernel access? ’Cause you’re going to break stuff if you do something bad, right?
I’d rather you just monitor it and then isolate it and quarantine it if there’s a problem, ’cause then I’m not breaking things.
Does that make sense?
Tom Gorup: It does. I think there are a few different directions we could take this conversation. One is that this particular driver, which was signed by Microsoft, was written in a way that effectively extended itself into leveraging other files, like DLLs in a sense. I don’t think they were actually system files, but they were equivalent, where the driver is now executing files outside of its parameters, in a sense, right? Which is what caused the crash. Am I understanding that right?
Richard Yew: Yeah, something like that. Honestly, there’s always a debate, and if you look at the CrowdStrike RCA, they nicely alluded to the fact that later versions of Windows provide more functionality to run security services in the user space, and that they will continue to work diligently with Microsoft to ensure that they’re able to port more of this security functionality. So, they do recognize that running these things in the kernel space is probably no bueno, but they have to because of the environment that was provided to them, right? In order for them to operate effectively, they have to do that. However, again, this is where we go back to whose fault it is. Is this CrowdStrike’s fault? Are they demanding to only run this in the kernel space? We don’t know, right? But it does seem like there’s definitely a desire to port all of that to the user space, to be able to provide the same level of inspection and visibility, whenever malware is being written to disk, for example, or processes are being spun up, from a higher level. So, like in the airline industry, every safety code is written in blood. I feel like in this case, every process improvement, every improvement in porting a security feature from the kernel level to the user space, is written with outages, right?
Tom Gorup: Yeah, it’s a least-privilege model. How much does it actually need? What can be done? What needs to be at kernel level? But when I think about this too, I wonder, you know, the way you were describing it earlier, Matt, it almost felt like a security culture thing, right? We need more visibility, more access. We need to be able to see all the things, all the time. And you know, did we do this to ourselves from a security culture standpoint? We have to get agents on everything, add more software.
Matt Fryer: When you get into security operations, for the most part, if you talked to the strategists 10 years ago, they wanted the logs of everything. I mean the logs of everything. There was this model of send me everything possible, until they realized pretty quickly that there’s no way to do all of that. You know what I mean? You’re taking all this stuff in. So, there’s this culture within security. It’s like, just give me access to everything so that I can do my job. And you never took a step back. It’s the Jurassic Park reference, right? You want access, but should you? Should you have that access? Should you be doing this? It’s not whether you can, it’s whether you should, right? As a culture in security, we’re starting to figure that out, and we’re starting to take that step back and go, OK, do we need that access? Is there another strategy in place where we don’t need that? Can we do it a different way, right? Because the other way will be a little bit safer, create a better availability program, or have a better response time if we do have a problem, right? There’s a lot of different analysis that goes into that. But we’ve done it to ourselves, to the point that we need access to everything and we’re stepping on our own toes in the process, right? Give me access to all these things, right? The simplest analogy you can give is, send me all the logs, right? But you missed 30 different attacks because you had 55 gigabytes or petabytes of logs flying into your SIEM and you only had six people reviewing them.
Tom Gorup: Yeah. So target breach, right?
Matt Fryer: Yeah. You wanted everything and we gave it to you, but you couldn’t do everything with it because you didn’t have the ability to do it, right? So maybe take a step back and start understanding what you can do, then analyze that, and then take the access that you need to have. So, we’re starting to see that shift a little bit, right? Whereas, as an industry, it’s always just been send us everything. The security industry has been give me everything, ’cause I need it all. And it never took a step back to say you can’t do everything you need to do because you don’t have either the technology or the people to pull it off.
Tom Gorup: Yeah, one thing along those lines I’ve been thinking about, and it’s a shot at my own job, I guess, is that security’s just a patch. At the end of the day, security is just a patch for something a human did: a bad configuration, clicking on the wrong link, writing some bad code. Security’s just a patch. Something I’ve been thinking about is the zero-sum budget approach to security: leveraging what’s there first, before we start adding all these layers and additional software, which we’re seeing time and time again is becoming a problem.
Matt Fryer: I mean, you crash the economy and then don’t have any budget and you figure out the best way to do the best you can with what you’ve got. That’s not the answer. It’s a good question. I think even five years ago we were seeing security programs start gobbling up more of the IT stack, right? And in multiple past lives, I’ve seen security have their segment. It’s like, oh, well, I run the SIEM and I run IDS, IPS, and application security, and IT runs the firewalls and other stuff. And what they’re finding out is you’re seeing this convergence of network and security, and I’m saying that knowing full well the oil and water that actually is. But you’re seeing this convergence of network and security, or IT and security, coming together, where you’re seeing security just kind of gobble up more of what IT is doing because they feel like they have to secure it as they’re doing it, right? And you’re seeing that take more and more and more. And eventually you’re just going to be one group, right? You’re just going to be one platform, one organization of secure IT, right? That’s kind of the way it’s going to be if you give it long enough. But I think once you stop saying I need an individual technology or an individual person to solve one problem, that’s when you’ll start moving away from the need to always have these layers of different technologies that do all these things, right? Because you’re trying to solve one problem with one technology. And we’ve been doing that for so long, and the industry’s been doing it to us, right? We have vendors out there that sell one technology. That’s all they do. They’re the best of the best. Let’s go buy that, right? But what you’re seeing as an output of that, and will probably see in the next five years, is organizations and security programs saying, look, I’m going to stop doing that and start building platforms, right? I need something that solves 100 problems, not one problem.
And in doing that, what you’re going to see is a reduction in layers of that security program. And at the same time, I’m not a proponent of this. I’ll set the table. There is, I get that one throat to choke kind of mentality, right? Single source of truth. You’re starting narrowing that stack down a lot to where it gets a lot easier to manage those things right? The idea of buying one piece of technology or hiring one person to solve one problem is quickly going away. You’re starting to see a lot of platforming happening or a lot of team building around multiple problems happening. So, instead of having an application team, you’re going to have a networks security team that has application security, network security, certain network services security teams, release teams, all that stuff into that one singular team. Because that cohesion of all those individuals working together reduces, you know, not just risk, but financial overhead when it comes into all that. So, there’s a lot of stuff that’s kind of happening to reduce the layers for a multitude of reasons. At 10,000 feet, solve a financial problem. It gets cheaper when you do it that way. And then the other one, the other big one is an efficiency problem, right? When you have a big platform doing one thing, it becomes more efficient to do it. When you have a bunch of people working on a bunch of different things all in one group that get more efficient.
Richard Yew: You know, speaking of platformization, it’s really near and dear to my heart. I think it strongly aligns with what you said. We have this term, back in the day we called it the DDoS paradox. Essentially, you have an organization that is so worried about availability issues, right, about getting DDoSed. You’re so afraid of going down that you start buying best-of-breed products here and there, daisy-chain them into a train, and have all of your traffic go through that. And then what ends up happening is that you add latency, you add single points of failure, you add specializations, and you create multiple bus factors of one across the chain. And then you end up having self-induced outages, which goes back to the initial problem that you were trying to solve in the first place, right? You know, there can be layers in security that do things well, but they have to make sense. They have to be architected in a way. They have to share data. They have to be in the same control plane. They have to be managed by a team that has redundancy. That’s what makes it a platform, right? And I really think that, as we see the industry moving more and more in that direction, it’s all about creating a platform, so that instead of acquiring a best of breed and trying to figure out how to daisy-chain it and create all of these nonfunctional requirements and processes just to make sure that this thing doesn’t go down, it’s all about having the right platform and the right people and then seeing what is the right solution to plug in. Maybe you don’t even need that in the first place. You know, I quote Elon Musk, who says never optimize the part that should not exist in the first place; remove it first.
Tom Gorup: Yeah, you should be removing first. You bring up a good point too, though. Going back to this global IT outage, we talked about CrowdStrike: are they at fault? Microsoft: are they at fault? And what we’ve started talking about here is the architects of these solutions, or, said another way, the customers of Microsoft and CrowdStrike who deployed CrowdStrike in a live environment to do real-time updates, right? So, you had a binary with root-level, kernel-level access and access to the Internet, able to run whatever it wants on a production system. So, when we’re thinking about a platform, and we’re thinking about availability and redundancies and backups, how does that all play? Where do we take that? How do we contemplate that problem?
Matt Fryer: So, I think a lot of times in security, we get really stuck on technology. Really, we do. We’re practitioners; that’s just where we live, right? The tool is the thing. To use my career as an example, I spent a lot of time on tools, right? I spent a lot of time either deploying them, designing them, securing them, or architecting them. And it took a while in my career to go, hey, I’ve got to take a step back from tools and understand there’s a people and process side to this as well, right? So, when it comes down to the tools themselves, I think first we’ve got to take a step back and say, OK, from a people and process standpoint, this problem, this global IT outage: was this a tool problem, or was this a people and process problem? Or was it both? Because I think we’ve spent a lot of time focused on the tools, especially when you look at the news and the media and all the people talking about what happened. They’re talking about whether this is CrowdStrike’s fault or Microsoft’s fault. Why does CrowdStrike’s technology do X, Y, and Z? You’re seeing some people talk about it, but there’s a process problem here too, right? As part of your security program, you have an infosec team that works on processes and policies and on understanding how you roll things out. Then you have an architecture and a design that make sense, and your architecture and design evolve and adapt as those processes and policies mature. So, one can’t be exclusive from the other. So, when we ask what we’re taking away about the technology itself in this global IT outage, I think we have to start first with the processes, policies, and people involved in this entire thing, right? So did CrowdStrike have a process problem when it released it? Absolutely, without a doubt.
There was a testing problem, there was a QA problem, there was a release problem as far as the process went on how they did it. There was an approval problem in how that thing even got approved to be released the way it was released. And then on the customer side, you had no process for how to deal with outages like that manually. Your design and your architecture were flawed if you had everything wholly dependent on it, unless you were Southwest Airlines, apparently.
Tom Gorup: Well, Windows 3.1 wasn’t impacted. So, Southwest was good.
Matt Fryer: Yeah, Southwest was perfectly fine. Like, here we are talking about how we need to be more mature, and Southwest Airlines was setting the new bar on being immature, using the oldest software you can find, and they solved more of a problem. Don’t update at all, you’ll be fine.
Tom Gorup: Well, I think also Delta had been accused of some of that as well as some of the articles I’ve been reading and the back and forth between CrowdStrike, Delta and Microsoft. Because I mean, ultimately this is sometimes I think that’s something that’s lost on us, right? In general, there are people, humans, there’s a human element to all of this, everything that we do. And at the end of the day, I was reading articles of there’s one lady, her and her husband were on vacation and their vacation got extended another seven days because of Delta cancellations, right?
Richard Yew: Sounds like a good problem,
Tom Gorup: Maybe if you could afford it. Because I also, you know, the, the response in that was something like their policy’s like 30 bucks a day. It’s like, OK, what are you gonna get for 30 bucks a day? McDonald’s cost you 30 bucks between the two of you these days, you know, but there’s a human element that I think we lose, like we lose ourselves into this like tool, tool, tool, solve these problems and then we forget like, who are we solving this for? Like what really like think about that individual that’s impacted as a byproduct of our decisions. Do we contemplate that, you know, when we when we make these sorts of moves?
Matt Fryer: We had a family friend who was in Atlanta, if you can imagine, when all this went down. And Atlanta, for those listening who aren’t aware, is a major hub for the airlines, right? It’s one of the biggest ones in America. So, she basically got her plane delayed, sat there for two days, still got delayed, and then said, OK, well, we can’t wait. Let’s go get a rental car. Well, there were no more rental cars because everybody had gotten a rental car. So now they’re trying to figure out all this stuff. So, take it to a personal level, right? Because it can get hard, when you make these decisions, to say, how can this impact the people? There are so many of them that you can get lost in how universally or globally impactful this can be. Sometimes you just have to take it to an individual level, marry it to a person, and say, OK, if I make this decision, what impact will it have? Well, Susie can be stuck at the airport for days and may not be able to get a rental car. She’s sleeping on the floor at the airport waiting for her flight to get rescheduled. And why did all this happen? It was because we made a decision to push an update, or we made a decision to become flat in our architecture, or regulatory bodies decided it was only fair that everyone has access to the kernel. All these decisions inadvertently have adverse effects on the person. And do we ever stop and think as to why? The easiest example is, hey, everyone having access to the kernel, who does that help, right? It’s like, well, we’ve got to make it fair because otherwise Microsoft will just roll Defender out to everyone. Well, that’s not how the market works, right? Yeah, I would absolutely have Defender, but I can count on two hands and nine fingers how many people don’t like Microsoft, right? So, it’s like, you’re right, you’re leveling the playing field, but half the market would have bought CrowdStrike anyway.
So, you’re saying that CrowdStrike has to have kernel access, level playing field? Well, the market’s going to buy what they want to buy, whether it has kernel access or not. I think you’re creating an artificial level playing field there to do things that you think are right. And then by all means, they probably are in most cases. But not realizing the adverse effect of doing that right, you impact by doing that. You know what I mean?
Richard Yew: You know, of all of the conversations in the blogs and the Reddit posts that talk about whose fault it is, what happened, who had it worse, whatever, so far I haven’t come across an article that talks about the cost. I mean, how much it cost, how many person-hours it took to manually put in the boot disk and manually fix and update things to bring back those hosts, right? But how do we even quantify the impact of people getting stranded? I mean, this impacted hospitals, this impacted 911 emergency services, right? What is the cost of that? I suspect it’s going to take us a long time to figure out exactly what the impact was. And I don’t think you can quantify that with money, right? Of course, with the amount of lawsuits or class actions and how much gets paid out, there’s a monetary value assigned to everything. But I think the impact of the lost time, the life events that would have been happening, is potentially unquantifiable. And at the end of the day, those are the people that we’re impacting if we don’t do things right.
Tom Gorup: Yeah, 100%. So, I saw numbers ranging from 3.5 million to 8 million computers that were impacted by this update. And bear in mind this was in 79 minutes. So, they rolled out the update at like 4:00 AM UTC and in 79 minutes took down upwards of eight million computers, right? Spectacular event. I mean, for what?
Matt Fryer: It’s efficient.
Tom Gorup: Their rollout of new updates. Amazing they were able to touch 8 million computers in such a short period. I think the rollout was like within a minute they were able to hit all these, and then as these machines started coming online as the morning rolled on, you know, that’s where the impact hit. But yeah, there’s a significant cost.
So, two more questions before we close this out.
One is, what do we think the impact of this was on the reputation of the security industry as well? What more hesitancy do we see, or have we seen, in executive-level agreement to bring in new security tooling and technology?
Matt Fryer: I would say not a huge one. I think you’re going to get these big red flags, you know, going off. People are always like, ah, it’s so horrible, right? You know, I read a lot, and I think you’re seeing a lot of the professionals, the guys that have been in this business long enough, say, oh, can you imagine how much this cost CrowdStrike and how bad this was for CrowdStrike, blah, blah, blah, and talk about how bad this is for CrowdStrike. And then you have the small voices that are coming out saying, yeah, but how much did CrowdStrike save you, right? They’ve been around a long time. They’ve had one incident in how many years, right? I mean, they’ve saved a lot of companies from a lot of malware, ransomware, a lot of breaches. A lot of stuff has been saved because of CrowdStrike. So oftentimes we get lost by forgetting about that, right? So, with a lot of these technologies, again, you’re only as good as your last breach, but you get a lot of these strategists who are very calm in their nature. A lot of these executives don’t like severe knee-jerk reactions in their environments. I don’t know a lot of CISOs that are just like, hey, CrowdStrike got breached, let’s immediately rip it out, right? I think if there is going to be any adverse effect...
Tom Gorup: I think Elon did. I’m pretty sure I saw a tweet saying that maybe Elon did.
Matt Fryer: So, I like Elon. He has billions. He can do that. It’s very expensive to make that change. It really is. It’s very expensive to say I’m going to rip out a technology because they had an incident. Now, the supply chain management side of your security program just got a new thing to check for. I think you’re going to see changes in, you know, risk and transference of risk, and highlighting of some of the things that your supply chain can cause you. So you’re probably going to see some executives take a step back in their new contracts, as things renew and roll out from their supply chain, to say, hey, we need to have some assurances from each one of these organizations that we’re buying from. When these global events keep happening, we need to have some assurances within these contracts about the risk. You know, I think you’ll see a lot of that. I don’t think it does damage us. Security people are only as good as their last breach. So, you’re going to see this kind of make the cycle, and we’ll spend a year or two recovering from it, but I think there’s just going to be both a qualitative and a quantitative effect to it overall.
Richard Yew: There’s a bit of that, you know. For many years, right, we in security used to be the organization of no. So, now we are trying to change that portrayal and work well with the business. Security, when done well, we always say, is like a good strong brake on a supercar, right? It allows you to accelerate fast and move fast because you can brake hard, right? But now obviously there’s going to be some trust issues, right? Because it doesn’t help. We always say, hey, we’re not here to slow down the business. We’re not here to add complexities and issues to the business. But obviously there’s going to be a bit of trust issues, like, hey, you said that, but then look at what your tool did to me. Well, I don’t think anything’s going to change the need for security. But it does not help with the perceptions that people still have that, hey, these guys are just trying to make my life hard. So of course, there’s still work to do to try to convince our non-security peers that, hey, we’re not here to slow your process down. We’re not here to say no to what you’re doing. We’re here to try to help you run faster. We’re trying to implement DevSecOps and have a secure CI/CD pipeline because we want you to code faster from day one. You don’t have to slap on new security stuff after the fact, right? But of course, this kind of incident sets us back a little bit. But I do think that industries do recognize that security’s intention is not to slow down the business, even though crap happens from time to time.
Tom Gorup: That’s great. Yeah.
Matt Fryer: The best analogy, the best saying you can give to that, and it kind of sparked as you were saying it, and Tom is going to smile, I know he’s going to smile, is a saying that I’ve always used, especially on the operations side, to kind of help create some ease in what we do. Slow is smooth and smooth is fast.
Tom Gorup: That’s right. Smooth is fast.
Matt Fryer: That’s the easiest way. I’m going to slow you down, but you’re going to move a lot faster, right?
Tom Gorup: Yeah, it’s a great segue into, I’d say, the last question here. And I think this is where the rubber meets the road.
Richard, what can we learn from this? What can we do better? What should we think about?
Richard Yew: Well, what we have learned is that security is everybody’s business. Availability, uptime, it’s everybody’s problem. Just because a particular vendor makes you go down does not mean that other people don’t bear the responsibility. It goes back to who the finger should be pointed at, right? I think, like I said, it should be holistic. Think about your people, process, technologies. It all goes hand in hand. Like, what does Microsoft need to do? Should we move these things to user space eventually? Does CrowdStrike need to have a better process? Do they need to implement canary rollouts, which should be standard when you touch millions of agents, right? Do corporations need to implement the right processes to ensure that they account for this kind of apocalyptic event? So, to me, it’s that you can’t look at things in a single dimension. Things have to be viewed holistically. If there’s any call to action, it needs to be a joint call to action between your suppliers and yourself. It shouldn’t be Microsoft just working in a vacuum. Microsoft should work with CrowdStrike and their customers together and make sure that they create resiliency going forward.
Tom Gorup: Love it! I think that’s a big takeaway, you know, working together more. We often see vendors working in these silos and not being introduced to each other, and that’s a really good takeaway.
How about you, Matt?
Matt Fryer: Yeah, I think it’s kind of along those lines, right. I think the big call to action is not to step away from security. It’s to lean harder into it. I think the takeaway when I see these things happen isn’t like, oh man, we really got to, you know, go least privilege and really start cutting back on what these people are doing, blah, blah, blah. Like, that’s not my reaction. My reaction is, OK, well, we have an availability issue, and let’s lean into that, right? It’s to lean a little bit further into maybe a different segment of what security is and solve the problems. I think the call to action is to take a step back and understand what you’re doing with your program, right? Is this something that caused you a huge problem? Then your step back should be: what are my processes and policies around the other pieces of this triangle to make sure that when things like this happen, I have a better response to it? I can’t tell you how many 3:00 AM phone calls I have gotten over my career because someone’s BCP was trash, right? And then they blame everyone else. It’s like, oh, it was so-and-so’s fault. Ah, the help desk clicked on the link. God, man, I can’t believe that. Well, you’re down for two weeks. Well, yeah, it’s their fault for clicking the link, but how is it their fault it took you two weeks to recover from it, right? So, I think the call to action for a lot of these organizations, for us as leaders, for the vendors themselves, is to just take a step back and analyze how you’re doing things in security and adjust for those things that happen. There’s always going to be things that happen. Everyone’s worried about black hats. Black hat is the thing. It’s the boogeyman behind the curtain. Right now, it’s supply chain. There’s black hat, and now supply chain is the other big boogeyman. All my vendors are just trying to sell my data or they’re just trying to crash.
Take a step back, man. Your call to action is this: we should all be doing annual reviews of our program. This is the call to action. Do a review of your program. Find out where your gaps are, because you have them. Whether you believe it or not, you have gaps. Take a step back and review it. Take a look at it, understand it, look at your contracts. Become best friends with legal, because there’s a lot of things that are going to happen here. I don’t think running away from it is going to be anyone’s solution. I’m not a panic button guy. I think Tom knows that I don’t ever slam on panic buttons. So, the people that are hitting the panic button going, man, fire CrowdStrike, I think they’re probably just responding from emotion rather than analysis, right? Jerich, who’s a CISO I think for Waste Management, I read his stuff on LinkedIn, and I was like, he’s 100% right. CrowdStrike solved a lot of problems and no one’s acknowledging that right now. They’re just saying CrowdStrike, you know, bad guy, bad-bad guy. It’s like, no, analyze it.
Tom Gorup: Yep. I think, as Richard mentioned earlier, security is everybody’s problem, and every part of the organization, whether it’s IT, engineering, the security team themselves, HR, or finance, everyone needs to be part of taking that in and really taking a hard, hard look.
Well, we’re at the mark. It’s been awesome having you all. I think it was a great conversation. I could probably carry this on for at least another 45 minutes, but I know you both have places to be.
So, I appreciate you being on the show, Matt. Once again, thanks, Richard.
Thanks for listening. Stay Frosty.