ThreatTank – Episode 6 – Q2 2024 Quarterly Attack Trends

An Introduction to ThreatTank – Episode 6: Q2 2024 Quarterly Attack Trends

Tom Gorup: Welcome to Threat Tank, a podcast covering the latest threat intelligence, threat response, and insights about the security landscape around the globe.
I’m your host, Tom Gorup, Vice President, Security Services at Edgio, and today, we’ll be doing a deep dive into Edgio’s quarterly attack trends report.
There’s a ton to cover. Before we get started, let’s introduce today’s guests.
So, joining me today are Andrew Johnson and Kenneth Thomas.
Welcome, Andrew, Kenneth.

Andrew Johnson: Hey, thanks, Tom. It’s great to be here.

Tom Gorup: So, before we get started, and we've got a lot to cover here, let's do a little bit of an intro. Andrew, tell me about yourself. Tell the folks: who is Andrew Johnson?

Andrew Johnson: Oh, dang, of course. Well, yeah, Andrew Johnson. I lead product marketing here at Edgio for our security solutions covering WAF, Bot, DDoS, API security, Attack Surface Management, and all that good stuff.
I have about 8 years in the cybersecurity industry in product marketing and product management roles across web application security, and I originally came into the industry in endpoint security.

Tom Gorup: Awesome. Well, welcome to the podcast. I'm excited to dig into this.
So, Kenneth, how about you? Who’s Kenneth Thomas?

Kenneth Thomas: Hey Tom, Andrew, thanks for having me here. I'm brand new to the organization. I'm working in threat intelligence, primarily doing the day-to-day work of finding trends and surfacing them so that customers can act on them.
I recently made a pivot into both artificial intelligence and cyber within the last decade or so. Coming from a systems background, it's been a bit of drinking from a fire hose, but that's kind of how I like it.

Tom Gorup: Yeah, indeed, and the amount of data we had to pore through for this report was a fire hose in and of itself. So, I'm happy you're both here. This is going to be a lot of fun.
But before we start, there's an obligatory icebreaker question. And for everybody listening, I've not told them what it is, because it's a lot more fun to get the answer off the cuff.
So, here’s the question. You guys ready?
If animals could talk, what species would be the rudest of them all? So, if animals could talk, which one would be the rudest of them all?

Kenneth Thomas: The mockingbird.

Tom Gorup: The mockingbird? Is it because they’re mocking people? Like that’s their natural state?

Kenneth Thomas: No, they're very territorial. I only know this because I see them routinely in my backyard, but I've absolutely seen a mockingbird attacking a squirrel and going into, you know, epic battle. I've also had a mockingbird fly right at my head, too.

Tom Gorup: I had a bird attack me one time too. It was a baby bird that fell out of its nest and got stuck in a tree, and I was trying to get the baby bird out. I think these were sparrows. They were dive-bombing me. I'm like, man, I'm just trying to help, but I couldn't, because these birds were literally attacking me while I was trying to save their baby. It was a sad day.

Andrew Johnson: Aw, man, you're giving me a bad memory too. I was attacked by an Egyptian goose out here. We have these geese out in California on the golf course, and they're extremely nasty and territorial. Literally, I had my bag on my back and it came at me from behind and I had to dive. I must have come too close to a nest or something. So, I have a little PTSD about that. But anyway, which animal do I think is the rudest? I'm not a big fan of poodles; I feel like they're the snootiest of dogs. Not sure why people get them. Or maybe Chihuahuas, we have a lot of those around here. I'm not a super big fan, even though I'm a dog person. So yeah, I don't want to know what they're thinking.

Tom Gorup: Yeah, cats. Super rude. They don't even need to speak and they're rude.

Andrew Johnson: Yeah. That was my number two on the list.

Tom Gorup: Good stuff. All right, we can jump into the topic at hand. That wasn't very cyber-related, but hopefully a good little icebreaker.
So, we’re looking at this attack trends report, and one theme that really stood out I think was human-led versus machine-led interactions with your websites and your web applications.
So, Andrew, tell me a little bit about that. What did we see there? What highlights did you glean from it?

Andrew Johnson: Yeah, I mean, I think one of the most surprising things in this report was the number of blocks of OpenAI by our customers, up nearly 3,000% quarter over quarter.
And I think that really signals a shift in the way companies are thinking about artificial intelligence and their applications, and the data and information that lives on their applications.
This number doesn't mean that OpenAI was doing that much more scraping of, or making that many more requests to, our customers' sites, but rather that web administrators are most likely thinking a lot about AI and the value prop of the information on their sites. So, that was one of the more interesting things, I think.

Tom Gorup: Yeah, that is interesting, because the trend we're seeing there is people taking their data more seriously. AI bots, and bots in general, are constantly scraping the Internet looking to gain information. But typically it's, I guess, more your good bots, like Googlebot and Bingbot, that are helping with your SEO, making your site more highly available. With AI, we're looking at potentially training models which are then being monetized. So, in a sense, your data is being monetized, and maybe some don't like that. So, I guess, Kenneth, AI scraping your site: is that a bad thing? Is that something we should be blocking, or what do you think?

Kenneth Thomas: It really depends on what you're doing. And I know that's kind of a painful answer for folks outside of technology, "it depends", but it really does. The reason being, for example, if I'm running a website and I want to be discovered, then I want AI to know about my services. In short, AI can scrape through my site, and I can even have my site tooled so that it's more easily digested or understood by AI. But on the other side, if I'm a private individual, or a private group of entities, and I operate as such, I probably would not want my private data to be part of a public, monetized data set. To Andrew's point, these companies most definitely are reconsidering what the value prop of their publicly facing data now says about them, and even how it can be leveraged from a training and model-usage standpoint. So, you know, it really depends on the site, the operator, and even, to an extent, their appetite for risk.

Tom Gorup: Yeah, it's interesting. So, to that end, are we seeing a shift in the way people surf the web? Is that changing?

Kenneth Thomas: I think so. Just as a caveat on the AI side: within, let's say, OpenAI's ChatGPT platform, you have the ability to use it almost as a concierge for the Internet. So, while you're sitting there having your conversation, something may pop up that's not within the data set of the AI, but it can make a request to the web for you and then retrieve that data. And as such, it turns the web into a much richer or fuller experience for these folks. But on the other hand, the sites that they are getting this data from will need to operate within that same parameter. They'll need to know that their sites are potentially being scraped and used in this way. And so again, it goes to the whole ethos of understanding your application, your customer, and then ultimately the value prop of the data that you have available for general consumption.

Andrew Johnson: Totally. And to add to that, I don't know about you guys, but I'd say the way I search is very different today. In product marketing we do a lot of research, and I use Google, of course, just like anyone else. Even today, usually the first thing above the results is what's now called Search Labs AI Overview. It's kind of an LLM-powered snippet that gives you an overview and then links to the underlying websites and things. I also use Copilot through Microsoft Bing as well. So yeah, that's basically the way I search today.

Tom Gorup: I feel like that's the future we're probably heading toward with AI. Say I need a new refrigerator; the refrigerator is going bad. So, first of all, what's wrong with it? Maybe AI can help you solve that problem. If not, I just want a new fridge, and here are my requirements, right? And then AI is going to respond, "Hey, here are the top five refrigerators you might want to purchase, and one's available at the nearest Best Buy. You want me to order it for you?", right? I could see the entire flow being taken over by your AI bot of choice. So, how does that change the way businesses should be looking at their site, or assessing their application for those use cases?

Kenneth Thomas: One thing that comes to mind immediately, and this may not be, let's say, common knowledge, but for those of us who are familiar with Google Search, we absolutely know that search has evolved over time. One thing that may not be apparent to everyone is that, roughly around 2018 or so, Google Search began to incorporate what are known as Bidirectional Encoder Representations from Transformers.
I know that’s kind of a jumble of words, but.

Tom Gorup: There has to be an acronym in there somewhere.

Kenneth Thomas: The acronym is BERT, exactly. And it's funny, because there also used to be ELMo; I forget what ELMo stood for. But BERT is effectively an encoder representation from transformers, meaning its role is to encode data as well as decode it. By doing so, you can compare sentences, saying which ones have a similar structure, or say, "show me words that match this sentence type", or even do a sentiment analysis to say whether a sentence is good or bad. The whole point is that this BERT technology is wrapped into every search that you do on Google, whether they're powering it with an LLM or not. So even well before the current artificial intelligence revolution (it's happened a few times, but before this current one), many of the search companies, Bing as well, tapped into this idea of, "maybe we would best serve our search customers by encoding their data differently, in a better, more representational way". And so it's very interesting to see all of this coming full circle, where people now have to consider: how do I make sure my sites are discoverable the way that I want them to be?

Tom Gorup: It's almost like a machine-readable version of your site versus the human-readable one. There's a lot of content that's wasted on a machine compared to a human interacting with a website, right? The machine doesn't care about your graphics or how good your logo looks, but the human does. So, knowing that the future really resides in how users are interacting with AI to purchase or become aware of your product is a pretty big deal, right?

Andrew Johnson: I think it is, and will be, especially in marketing. I'm still trying to wrap my head around it. I see these articles all the time, like "SEO is dead". And I need to wrap my head around this quick, because I think it's here.

Tom Gorup: It’s a great question. So, do you think, Andrew, there’s going to be like an AI SEO something?

Andrew Johnson: I’m sure there is. I couldn’t even tell you what it looks like, but I believe so, yeah.

Tom Gorup: Yeah, because discovering a new brand could be simplified or amplified a little bit with AI, like asking for the top five brands in a particular product category, maybe with more niche delivery. But how do we interact with AI in the future? Is it going to become a pay-to-play type of scenario, where you're going to have to pay Claude to rise to the top of their proverbial AI search engine? How's that going to play out? Any thoughts on that? It's an interesting, maybe fun topic to explore.

Andrew Johnson: It seems logical that in marketing, we bid on Google Ads. Google’s gonna incorporate AI more and more into their search too. So, presumably, that’s the way it’s gonna go.

Tom Gorup: I mean, it changes the whole dynamic too when we think about click-through rates. There are probably a lot of things in marketing you measure around the value of a website or the effectiveness of a website. What happens when you're no longer able to measure the human's interaction with your site, but instead there's an AI component, right? There's a machine aspect. So how do you validate that your site is as effective as it was before, if humans aren't surfing to it?

Kenneth Thomas: You're still essentially making sales, because if I'm getting paid, whether by a person's agent or by the person themselves, it's kind of immaterial to me, so long as the payment went through, they got their product, and they're satisfied. And to that end, I think we could very easily see the rise of, let's say, the digital influencer. What I mean by that is we already have people who do influencing from a marketing standpoint. However, we haven't necessarily seen the marriage of these semi-autonomous systems and artificial intelligence combined in a way that advocates for you, where you can go in and say "I'm looking for product X" or "I'm looking for service Y". But it's you who is saying that, and so there are so many different ways this could play out. In reality, we're still very much pegged to a legacy model of advertising and effectively monetizing attention, and I don't see that sort of ethos going away anytime soon. But in terms of how attention is leveraged, and even how you can get to it, I do see a lot of room for improvement and opportunity within the market to build new ways to get to customers and tell them about products. Ultimately, that's one of our biggest hurdles within technology: making it meaningful for you.

Tom Gorup: Yeah, it’s interesting. Go ahead, Andrew.

Andrew Johnson: Yeah, I think one of the first things Kenneth mentioned: are you getting sales? That's one measurement. If you create content out there and you see lift, you're going to be able to measure it. But I think the key is to be able to figure out which part is from humans coming directly to your site and which is coming through an AI-powered search engine. So, to Kenneth's point about monetizing, you can look at sales lift; obviously, you have to, for any campaign or web page you create. I think people will be looking more closely. There are solutions out there today that distinguish bot versus human traffic, and those are actually powered by AI too, using different ways of fingerprinting clients to figure out if they are AI scrapers or humans. So you can categorize and drill into who's looking at your web page, maybe make some inferences, and there may be ways to tie purchases to those clients.

Tom Gorup: That's a good point. I mean, where you're making the purchases from: if Anthropic starts leveraging a certain marketplace or method by which they can make purchases, it's these hooks. How do I let my AI talk to your AI, in a sense, right? How do we connect the dots? But at the end of the day, to your point, Andrew, where I was heading is that there's a cost to businesses for these AI bots, or any bot, scraping your site. And we have to temper that a little bit, because it could get out of control if there are thousands of bots out there pouring through your site over and over again. So, how can we control some of that? What are some more tactical things a business can employ?

Andrew Johnson: Yeah, definitely. First and foremost, and this is an old technology, but the robots.txt file is one thing that web administrators should think about. At least the larger AI start-ups, OpenAI, Anthropic, and others, actually publish their IP ranges, or their user-agent information at least. And you can choose to block those if you don't want them to see your site, scrape your site, or let users discover you through their search. You can set rate limits so that they can't crawl often and exhaust your resources or cost you resources and bandwidth. Not all companies are going to follow that, but I think the larger ones are saying they will, and those larger ones are the ones that have the resources to make many requests, too. So, there's a cost on their side.

Tom Gorup: I recall there was like a crawl. There was a field that you guys had discovered while we were digging into this report. It’s like a crawl value.

Andrew Johnson: Crawl-delay.

Tom Gorup: In the robots.txt file. I've not heard of that before.

Andrew Johnson: Yeah, that was new to us when we were researching this. But yeah, I think even Anthropic mentions they would respect that.

Tom Gorup: So, you have the technical controls of a rate-limiting solution to be more intentional with it. And then there's the polite request by way of the robots.txt file, like, "Hey, can you limit your crawl rate?" That's cool.
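To make the two mechanisms discussed here concrete, the sketch below shows what such a robots.txt might look like. The user-agent tokens (GPTBot for OpenAI, ClaudeBot for Anthropic, CCBot for Common Crawl) are the ones those crawlers publish; Crawl-delay is the non-standard extension mentioned above, and not every crawler honors it:

```text
# Block OpenAI's crawler entirely
User-agent: GPTBot
Disallow: /

# Allow Anthropic's crawler, but politely ask it to slow down
# (Crawl-delay is a non-standard directive; support varies by crawler)
User-agent: ClaudeBot
Crawl-delay: 10

# Block Common Crawl (a frequent source of AI training data)
User-agent: CCBot
Disallow: /

# Everyone else (including traditional search crawlers): normal access
User-agent: *
Disallow:
```

Note that robots.txt is purely advisory; the technical controls (rate limiting, bot management) are the enforcement layer for crawlers that ignore it.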

Andrew Johnson: And also, bot management solutions are designed to detect automated traffic and automated clients. So yeah, you can set those up. You can set up mitigations like CAPTCHAs. I remember Anthropic also said they would respect those by not trying to solve or bypass CAPTCHAs. That's a consideration as well.
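As a minimal sketch of the first step a bot management layer takes, the snippet below flags requests whose User-Agent header matches a published AI crawler token. The token list is illustrative, and because user agents are self-reported and easily spoofed, real solutions layer this with IP-range verification and client fingerprinting, as discussed above:

```python
# Known AI crawler tokens as published by their operators (illustrative list).
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended")

def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header names a known AI crawler.

    This is only a first-pass check: user agents can be spoofed, so a real
    bot manager would also verify the request's source IP against the
    crawler operator's published ranges.
    """
    return any(token in user_agent for token in AI_CRAWLER_TOKENS)

# Example: route a flagged client to a rate-limited or text-only tier.
ua = "Mozilla/5.0; compatible; GPTBot/1.0; +https://openai.com/gptbot"
if is_ai_crawler(ua):
    pass  # apply a rate limit, serve lightweight content, or block
```

A CAPTCHA or JavaScript challenge would then catch automated clients that masquerade as ordinary browsers.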

Tom Gorup: Any other tactical things you think businesses could do to control this a little bit? Kenneth?

Kenneth Thomas: One idea, and I don't know if we spoke to it earlier, would be to have specific portions of the site intended for the AI to land on, let's say. A best practice here, not necessarily a legacy one, is the sitemap: that XML file which is intended to indicate the mapping of your site and the resources therein. But on the same hand, this shows where there's, once again, room for improvement and ways to implement it better. So, in effect, imagine that you have a safe-harbor portion of your website for AI bots or the different LLMs that are out there searching. That way, they're not necessarily having to download all of the graphics, all of the styling, etcetera. They just get the context of the site in terms of the text, and it almost turns the website into a text-based service for them. But nonetheless, all these things, of course, have to be invented and then agreed upon before being successfully implemented.
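The sitemap mechanism Kenneth refers to is the standard sitemaps.org XML format. There is no agreed-upon AI-specific extension yet, so the "safe harbor" section below is purely hypothetical, a sketch of how a site might list lightweight, text-first pages alongside its normal ones (the example.com URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Standard entry: a human-facing page with full graphics and styling -->
  <url>
    <loc>https://example.com/products</loc>
    <lastmod>2024-06-01</lastmod>
  </url>
  <!-- Hypothetical "safe harbor" entry: a text-only rendering of the same
       content that a crawler could consume cheaply. No standard exists yet
       for marking a URL as machine-preferred; this is only an illustration. -->
  <url>
    <loc>https://example.com/text/products</loc>
    <lastmod>2024-06-01</lastmod>
  </url>
</urlset>
```

Until such a convention is agreed upon, crawlers have no way to know the text-only variant exists, which is exactly the standardization gap being described.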

Tom Gorup: That's a good point. So, what I'm gathering here is: reassess your site and think about the different types of interactions it might experience. There are human-led interactions, where your site needs to provide a good user experience. And then there's the machine-driven aspect, where it effectively needs to be easily parsed, right? That data needs to be highly available and easily parsed, and ideally it doesn't cost you a whole lot to serve. So, there are standards out there that can be picked up on, but I think your point, Kenneth, is that they're not widely agreed upon right now. Right?

Kenneth Thomas: Something that immediately comes to mind is that this is almost forcing website providers to have a programmatic mindset. Let's say I run a site and the site has an API attached to it; I'm already thinking of these things. I'm already going through the necessary steps to figure out: OK, who's connecting here? How do they get to our services? What are we telling them, and what is available when they connect? That's kind of the cornerstone of making sure your API is not only available, but secure, so people can't just come in, make a request, and get all the data, let's say. And the same will have to be done on these sites. Say I simply run, I don't know, a Shopify-powered fashion-boutique-type site. Now I have to get more into a programmer's mindset, where it's almost like my website has to turn into an API of sorts, perhaps not for the purpose of conducting sales directly, but for listing information and making sure it's available as we want it to be, or as it's desired to be, for the LLMs and all the different things that might be consuming that data.

Tom Gorup: Yeah, it's interesting. As you were talking, I was thinking about this battle around respecting the user agent, in a sense. And that's respect on both ends of the spectrum: bot creators identifying themselves as such by way of their user agent, but also web administrators or developers creating a method to use that user agent to direct them to a more machine-readable outcome, right? That feels like the kind of world we're heading into. "Respect the user agent" might be a future hashtag or something you can jump on. So, does anything else come to mind in this machine-led, human-led AI world? I know we definitely dove into this topic.

Andrew Johnson: I saw an article the other day. It's related to the information on your website and where it goes, and I just want to get your guys' thoughts. I know we haven't spoken about this, but these large AI companies need training data, right? And the perfect place to get a lot of free training data would be your website, or just the Internet. And there's a concern that these AI companies put this training data in the cloud, and oftentimes it's not even going to be in your geography or region. So, your information could actually leave your region. How big of a problem is that? To me, it's like, OK, what you put on your website, you're not going to put too much PII. I'm trying to think of use cases where this could actually cause harm.

Tom Gorup: That's a great thought. Like the recent background check provider who leaked, what was it, hundreds of millions or millions, we'll say millions because I don't have the numbers right in front of me, of Social Security Numbers and identities, in essence. As a matter of fact, I got an e-mail, or, excuse me, a letter from Home Depot that I was declined a Home Depot credit card because they couldn't identify me. I was like, well, yeah, because it wasn't me. So obviously, somebody was attempting to use my Social Security Number at Home Depot. But to that end, I think it was discovered, and Kenneth, correct me if I'm wrong, that the credentials to access the database were available on the website in plain text. These mistakes happen all the time, right? And maybe it wasn't the database itself that was exposed, but those credentials could have been scraped as well, right, by an AI bot. And that bot now has, in memory, a bunch of Social Security Numbers, maybe waiting to be plucked out. I don't know. What do you think? How could something like that be used?

Kenneth Thomas: This is a great point, Andrew, and one, quite frankly, that I don't think has really been properly fleshed out and thought through. The whole idea is that if organizations take a lax approach to security, to securing their enterprise, then even a determined attacker, regardless of skill, could find those secrets and then leverage them to their own ends. Now, I don't believe that from, let's say, an AI standpoint, that's the concern. In other words, I don't believe we have to worry, at least at this point, about AI finding a login-password combination and then trying it out, even though it absolutely could. I think the concern is more when humans find that. And often, as organizations or projects increase in size, the focus on security doesn't scale at the same rate. Because of that, you can have these gaps in security, gaps in process, that lead to a larger exploit or a larger compromise of security. So, I don't have a good answer to the question, but it's definitely a consideration, especially if you have, let's say, localized data-protection laws. For example, the CCPA gives certain protections to California citizens that wouldn't, let's say, be extended to us in Texas. And so, to that end, if you're a data provider or broker operating in California, certain protections have to be in place in order for you to do those sorts of things. So, once again, we have so much development ahead of us to build up this economy, this future of software. But at the same time, many of the standards and best practices that we draw from are really from the early days of the web, like the mid-to-late 90s. So, it's just interesting how all of these different forces are combining and we're having to think about this in real time.

Andrew Johnson: That helps a lot, guys. I was thinking about it from a perfectly-functioning-website point of view, and Tom, you were helping me think more about a misconfiguration or lax-security point of view, and how that could be a problem. But also, when you're talking about different privacy laws: requests from these AI search engines or AI scrapers can exit from a node in your local region, right? They could come from servers in your region, but they're going to take that data, and it's not necessarily going to stay in your region. It usually goes to a more centralized cloud that's probably outside your region.

Tom Gorup: It's a great question, and a great topic to explore. We're running up against time, and there's a lot more in this report. One topic in the report is DDoS spidering: we're seeing attackers traverse various endpoints within the application to find the weakest one, and so there are some controls and recommendations inside the report. We also have the top five weaknesses that'll land you in the news. We're leveraging AI to parse news to glean more threat intelligence, picking out not only the top CVEs but also the weaknesses we identified in the news. And a little bit of a spoiler from some of the models we ran against the projections for CVE growth this year: this could be the biggest year in CVEs identified that we've seen since 2017, which isn't necessarily a bad thing, but I think there's a whole lot more to pluck out of the report. So, with that being said, there's still more to sprinkle in here. Do you guys have any closing thoughts around AI, DDoS spidering, weaknesses? Anything around the report?

Andrew Johnson: I thought the report was pretty interesting in that people today hear about CVEs every day, but how do you prevent them in your code? I think the report does a good job of actually highlighting the underlying weaknesses themselves. I mean, a lot of the same culprits show up over the years, but it's really focused on building more secure software. I think there are plenty of ideas on that in the report.

Tom Gorup: Yeah, that's a great point. I keep joking, and oftentimes security people get upset with me when I say it, but it's meant to be abrasive: security is just a patch. It's a patch for a misconfiguration. It's a patch for poor coding. It's a patch for, frankly, ignorance, right? Somebody clicking on something they don't understand. Security's a patch. And the further down we can get to the root of the problem, to your point, poorly written software, the better off we'll be. It's not intentional, right? Nobody's doing it on purpose. It's typically speed to market, or just not knowing that it's a problem.

Andrew Johnson: Exactly. There are a lot of developers who don't have a secure-coding background. Probably most developers haven't had it.

Tom Gorup: So yeah, even as a security person, I'm confident I've written insecure code at some point in time. It happens, right? And a lot of it has to do with just ignorance, you know, awareness. Did I know that that could happen?
Kenneth, any closing thoughts around the report as a whole, or AI, whichever?

Kenneth Thomas: Sure. I had a real fun time going through and doing this report, and in particular finding out about some of the things like the crawl delay. As I was mentioning earlier, a lot of our standards come from back in the day, from the 90s and whatnot, and robots.txt is definitely one of those. But the crawl delay does indicate that even though it's kind of a legacy standard, people are still innovating within it. And there are also players on the other side of the market that are respecting that innovation. So, all that to say, the report has a bunch of actionable intelligence in it, where, as a site operator, you can immediately leverage the intel to make a decision about what you need to do relative to this new traffic and these patterns that are being seen. That said, AI is most definitely here to stay. I'd absolutely encourage everybody to follow whatever interests them within it, but in particular, be on the lookout for new developments and new protocols relative to AI and the web, because I do think we still have quite a ways to go in the development of the technology. And in particular, even something you had posted the other day, Tom, the Semantic Web, which is one of my sort of, you know, I want to see the Semantic Web really take a foothold, but that's just me.

Tom Gorup: Yeah, that's great. You're bringing up a great point: we don't necessarily always need to reinvent the wheel, either. We can leverage the technology that's already available today. Even the Semantic Web has been around for a while; maybe we can leverage it to springboard ourselves into a crazy future, especially when you contemplate AI and what it looks like in the next 5, 10, or even 50 years. The world is going to change.
But I appreciate you both. This has been an outstanding conversation. Again, I'd point everybody to go check out the Q2 attack trends report. There's a whole lot of value in it, and these are some of the rock stars who contributed to it.

So, until next time, stay frosty.