Podcast: Beyond the Edge Episode 10 – Bot Management
In this episode of “Beyond the Edge,” Kenneth Thomas hosts an insightful discussion on Bot Management, exploring the advanced strategies and technologies businesses need to protect their online presence from malicious automated threats. Kenneth, who leads threat intelligence at Edgio, is joined by Danny Benefield, a seasoned Solutions Engineer with over a decade of experience in security and content delivery, and Faraz Waseem, Senior Data Science Manager and architect behind Edgio’s premier bot management solutions. Together, they delve into the complexities of distinguishing between good and bad bots, the innovations in machine learning models, and the best practices for implementing a multi-layered security approach. From understanding bot behavior to leveraging data-driven insights, the team shares practical advice and cutting-edge developments to help organizations defend against bot attacks, protect sensitive data, and optimize user experience in the ever-evolving digital landscape.
Kenneth Thomas: Welcome to Beyond the Edge Episode 10, where today’s topic is bot management and Premier Bot Manager.
I’m Kenneth Thomas, and I handle threat intelligence here at Edgio.
I’m joined by Danny and Faraz.
Welcome, gentlemen.
Before we get into the call, give us a brief rundown of your backgrounds within Edgio: what is your role within the company, and what do you do?
Danny Benefield: I am part of the solutions engineering team. I work with the EMEA team over the pond, so essentially my function is to engage with customers at the presales stage.
We scope out what their technical workflow is and how we can help them with their security, their content delivery, their performance, and all those good things. I’ve been with the company for nearly eleven years come January; I joined back when we were Edgecast. So I’ve seen the original WAF and all those products come to life, and we’ve come a long way since then.
Faraz Waseem: Hi, I’m Faraz Waseem. I am a senior data science manager leading the data science team. We built the bot management solution, we work on the advanced algorithms behind bot detection, and we are working on the next-generation bot platform. We built this product from the ground up: when we joined, we had a heuristic-based solution, and over the last two years we have built this product out. It is now being used by a lot of our customers and is providing protection against different kinds of attacks.
Kenneth Thomas: Very good, very good. Thank you, gentlemen.
So, Danny, you’ve been with us for a while. Tell us, from your perspective: what are bots? What’s going on there?
Danny Benefield: Bots are typically automated scripts or programs designed to perform repetitive tasks much faster than a human can. The idea is that if a bad actor is trying to penetrate an application layer or a customer’s website, invoking a bot to perform tasks is a far more efficient method than using humans to do so.
Traditionally, bots have been split into two categories, so we have this concept of good bots and bad bots, although the situation is a bit more grey sometimes, and I think we’ll talk about that a little later. On the good bot side, there are things like search engine crawlers, making sure your website shows up at the top of Google and Bing search results, social network bots, chatbots, and most of the types of bots that you would typically want accessing your application.
On the flip side of that, you have bad bots, right? These are things like scrapers and credential stuffers. E-commerce clients see a lot of problems with scalper bots, especially if they’ve got, say, a hot new product. People may remember when the PlayStation 5 first came out: it was being bought up by bots and then resold on eBay. So there are all sorts of different malicious types of bots out there, some performing things like DDoS attacks; there’s a whole breadth of different categorizations. But typically you don’t want those ones accessing your application.
Kenneth Thomas: Understood. This is why having a product that performs bot management for you is so crucial: if you’re exposed to the raw, bare Internet, you’re in constant contact with a fair number of bots. It’s even been estimated that up to half the traffic on certain sites, say Twitter, or even on search engines, is generated by bots, bots executing these actions on behalf of a person to whatever end.
And having a bot management solution in place is essential on the modern Internet.
Danny Benefield: Yeah, absolutely! I’ve worked with customers for many years who don’t quite understand the breadth of the bot problem. Sometimes you’ll even find that when a solution is put in place, the data that gets collected as part of having that solution is quite astounding, and the customer doesn’t even realize they have that many bots on their applications.
I actually came across a funny situation once where we were asked to run a proof of concept in a live environment, and we put a bot management solution in place for this particular customer and their application. The solution must have been running for about a week or so when we got a call from someone in their marketing team saying, ‘Hey, my KPIs are hits to the site, and all of our hits are down to half of what they were before.’ What was going on was that we were blocking half the traffic, because it ended up just being automated traffic. And they ended up saying, ‘Well, can you put it back? Because I’m not going to get my bonus.’ So it was this really weird scenario: a lot of it was unwanted traffic, right? But for those guys it had a huge impact, and they had no idea. Having a bot management solution, as you say, is crucial. You want to be able to manage both good bots and bad bots, and I mentioned before that you’ve got this grey area. We help customers host lots of different environments for their applications, whether it’s dev, staging, or production.
We saw in one of our quarterly attack trends reports recently that some of our customers are now using bot management to protect things like their dev environments, because while controls like robots.txt traditionally should be preventing access by certain bots, even some good bots are bypassing those types of controls. [A minimal robots.txt example follows this exchange.] So having the ability to see and manage what kind of bot activity is on your site, good or bad, is huge. Then, when we’re looking at the bad bot side of things, the impact of malicious bots is huge, right? I’ve already mentioned the scalping problem: bots could take up all of your stock of new products. Even though you’re still getting the sales, the public image of your brand can be diminished, because you’re seen as a site that gives all its stock away to scalpers and bots, and that can prevent genuine sales in the future. And then having protection against things like credential stuffing, along with DDoS mitigation, is huge, because you have a duty to your customers to protect their credentials.
If you have bots going in and trying lots of different combinations of usernames and passwords, you do not want that. And as we say, DDoS attacks are still a common attack vector these days. At the application layer they can be devastating, because a lot of that interaction can look human. So being able to decipher whether it’s a genuine human request or a bot is huge for your overall security posture as well.
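As an aside on the robots.txt point: the file is purely advisory, which is exactly why even “good” bots can ignore it. A minimal illustrative example (the dev and staging paths are hypothetical):

```text
# robots.txt: compliance is voluntary. Well-behaved crawlers honor these
# rules, but nothing is enforced, which is why bot management still matters.
User-agent: *
Disallow: /dev/
Disallow: /staging/
```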
Kenneth Thomas: Absolutely. Thank you for that.
Faraz, we haven’t heard from you on the bot management side yet. Talk to us a little bit: how does bot management work?
Faraz Waseem: Yes, I think it is important to understand that bot management is a multi-layered approach. When we are thinking about bots, we have different layers of security. In the first layer, we look at the user-agent and check whether the clients claiming to be good bots really are good bots. For example, someone can claim to be Googlebot, so we do a reverse DNS lookup to verify that the IP is actually coming from Google. That is our first layer: we identify who the good bots are and who the spoofers are pretending to be good bots. Then we have another layer of security, which is custom rules: many times a customer will say, ‘Based on this IP or this request header, I don’t want them allowed onto my website.’ Then we have a machine learning layer, which identifies the bad bots. The important thing is that we look at bot signatures: we are analyzing billions of events every hour across customers, and based on that we use IP addresses, JA4, and user-agents to create signatures.
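As a rough illustration of the good-bot verification layer Faraz describes, here is a minimal sketch of a forward-confirmed reverse DNS check in Python. The suffix list and function name are illustrative, not Edgio’s implementation:

```python
import socket

# Suffixes Google documents for its crawler hostnames; illustrative here.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> same IP."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]              # reverse lookup
        if not hostname.endswith(GOOGLEBOT_SUFFIXES):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward lookup
        return ip in forward_ips                            # must round-trip
    except (socket.herror, socket.gaierror):
        return False                                        # lookup failed

# A client whose User-Agent claims "Googlebot" but fails this check
# would be treated as a spoofer.
```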
Based on those signatures, we analyze different behaviors: how many requests they are making, what the frequency of requests is, and what the proportion of text versus images is. For example, bots are likely to download fewer images than a human being. If I am a human browsing a site, I will be hitting a lot of images; but if a bot is hitting a website, it is interested in, let’s say, the price or something, so we look for that behavior. We also look at how many times someone is hitting a specific API again and again. If we see someone repeatedly hitting an account management, account creation, or login API, that can be a signal that something fishy is going on. On the machine learning side, we have a layer of models rather than a single model: a model that just looks at statistical behavior, models that use more classic machine learning like XGBoost and logistic regression, and models that use deep learning. In the future, we are also looking into large language models and how we can use them to enhance our bot solution.
Based on this multi-layered approach to security and this array of different models, we secure our customers, and these models are retrained every 30 minutes based on global trends. So if there is a new pattern, we can detect it, update our fingerprints, and safeguard our customers.
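To make the behavioral signals concrete, here is a minimal sketch of per-client feature extraction along the lines Faraz describes. The field names, endpoints, and feature set are hypothetical:

```python
IMAGE_TYPES = {"image/png", "image/jpeg", "image/webp", "image/gif"}
SENSITIVE_PATHS = ("/login", "/account/create")  # hypothetical endpoints

def behavioral_features(requests: list[dict]) -> dict[str, float]:
    """Summarize one client's traffic window into model-ready features.

    Each request dict is assumed to carry 'path', 'content_type',
    and 'timestamp' (seconds) keys.
    """
    if not requests:
        return {}
    total = len(requests)
    images = sum(r["content_type"] in IMAGE_TYPES for r in requests)
    sensitive = sum(r["path"].startswith(SENSITIVE_PATHS) for r in requests)
    times = sorted(r["timestamp"] for r in requests)
    span = max(times[-1] - times[0], 1e-9)  # avoid divide-by-zero
    return {
        "request_count": float(total),
        "requests_per_second": total / span,
        # Humans browsing pages fetch many images; bots often skip them.
        "image_ratio": images / total,
        # Repeated hits on login/account-creation APIs are a fraud signal.
        "sensitive_endpoint_ratio": sensitive / total,
    }
```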
Kenneth Thomas: Very good. Thank you for that. Danny, what are some of the common tools and platforms out there currently for bot management?
Danny Benefield: There’s a bit of a mixture, really. We work with customers who use on-premise solutions and cloud solutions, and a lot of our customers are using a multi-pronged approach to bots, because, as we’ve mentioned, bots are a multi-layered problem. Bots can use a multitude of different attack vectors, whether it’s launching an overwhelming number of requests or following signature-based patterns.
Even a traditional WAF can still help protect against bots with a signature-based model. As for the solutions out there: Cloudflare is one of our competitors, there is obviously our own homegrown solution that we’ve built, and there are other players like F5 who offer both on-premise and cloud deployments. We’re also seeing solutions that look at user behavior on the application itself: anything down to mouse movement across the screen (is that indicative of normal human behavior?), rotation of devices, and other sorts of biometrics, right down to the device level.
There are a lot of different solutions out there, and one thing I would say from talking to customers is that it can be quite overwhelming when they’re piecing together all of these solutions into a single offering or trying to manage a whole security stack. So having centralized controls is usually quite important to our customers: they know that when they make a change, they can deploy it very quickly across different applications. If making a change on their WAF happens in one portal, but they’ve got to go to another portal for rate-limiting protection and yet another for API security, that’s a problem. What we’re finding is customers want it all in one place, in an easy-to-manage single pane of glass.
So yeah, there are lots of solutions out there, but it’s not a simple problem, and it’s not a simple fix. So I understand why there are lots of different players with skin in the game.
Kenneth Thomas: I think the change management piece that you mentioned is crucial. And to that end, what are some of the best practices for managing these bots, whether it’s from a single pane of glass, a single product, or in organizations that have to implement changes through change control across different teams? What are some of the best practices you’ve seen for managing bots?
Danny Benefield: Well, going back to one of my previous points, the visibility factor is absolutely crucial to any of it, right? The biggest thing is that a lot of the customers we speak to just don’t have the data available to them to make any sort of decision about how to manage their bot workflow. As I’ve mentioned before, the single pane of glass really helps with that, because if you’ve got all of your analytics being pumped into a single source and you can see the interaction that’s going on with your application in real time, that puts you in the best position to make changes and manage that workflow much better.
One of the things we also suggest to our customers is that the way they manage their bots needs to be a layered approach as well. Our solution allows customers to set different types of actions based on the level of certainty that what’s trying to access the application is a bot. There’s a scoring mechanism from zero to 100: 100 means it’s absolutely a bot, 0 means it’s absolutely human, and you can set the controls between those levels. One of the great things about doing this, and having different mechanisms in place, is that you’re not impacting the end-user experience as much. One of the things I find really frustrating is when I’m on a website and I end up having to complete lots of different CAPTCHAs and challenges when I’m quite obviously human, right? I’m not going in and accessing lots of different links really quickly, and my behavior is indicative of a normal human. Don’t overburden your end users: only serve a CAPTCHA, for example, when you’re fairly sure it’s a bot.
So, for instance, if you scored them at, say, 80%, then serve a CAPTCHA as that extra layer of validation rather than serving it to everyone, right? I don’t know about you, but my mum is pushing 70, and when she wants to buy something online and she’s told to click all the buses and things, she just leaves the site, and they’ve lost the customer at that point. So detecting bots, having that data, and setting the appropriate controls and action types is really, really important.
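A minimal sketch of the kind of score-banded policy Danny describes, assuming a 0-100 bot score. The thresholds and action names are illustrative, not Edgio’s actual defaults:

```python
def action_for_score(bot_score: int) -> str:
    """Map a 0-100 bot score (100 = certainly a bot) to an action.

    These bands are illustrative; in practice each customer tunes
    them per application and per route.
    """
    if bot_score >= 95:
        return "block"       # near-certain automation: stop it outright
    if bot_score >= 80:
        return "captcha"     # extra validation, per Danny's 80% example
    if bot_score >= 50:
        return "rate_limit"  # throttle suspicious but unproven traffic
    return "allow"           # likely human: add no friction
```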
Kenneth Thomas: Very good. Thank you. That leads us to our next segment, talking about the innovation space within bot management. Faraz, why don’t you speak to that a bit for us? What are some of the innovations we can expect to see coming down the pipeline in the realm of managing these bots, both good and bad?
Faraz Waseem: You’ve asked a very interesting question. As we know, bot management is an adversarial problem, and the bot makers are advancing every day. In this space, innovation, how you keep up to date, is really important. On our end, when we think about innovation, one thing we know is that the basis of any machine learning solution is good features. We spend a lot of time researching features, understanding what is going on in the industry, and working out how we can create new ones.
For example, we have features created from statistics about the data: how much data someone is consuming, what their click rate is, what the frequency of their requests is, what the size of their URLs is. In addition to that, we also look at natural-language-processing-style patterns: we look for certain keywords and attack vectors present in the request and use those as signals too. One of the innovations we are working on in this space is explainability, what we call transparent bot detection. It is not just important to detect a bot; it is important to explain why we think this is a bot. That is the area we are working on: giving different explanations of why we detected this bot and what its intention is. In the near future we will be able to say that this is actually a login bot, that someone is attempting logins or trying to create fake accounts.
Also, if there is a sneaker bot, we should be able to detect it and give an explanation of why we think it is a sneaker bot. And when we are thinking of explanations, there are two aspects. When we can give an explanation from our algorithm, we can also bump up the score and bump up the client’s confidence to block those requests, because more explanation gives more confidence, and more confidence means customers will feel free to block all those requests for which there is a good explanation from the advanced algorithms. For explanations we are using deep learning, but we are also looking into making use of large language models. The idea is that if we can generate a good explanation, we can raise the score further with more confidence.
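As a simplified illustration of surfacing the “why” behind a score, here is a sketch that ranks per-feature contributions for a linear model. The feature names and weights are hypothetical, and a linear model is used only because its contributions are trivially explainable; the production models Faraz describes are more sophisticated:

```python
def explain_score(features: dict[str, float],
                  weights: dict[str, float]) -> list[str]:
    """List the features that pushed the bot score up, largest first.

    For a linear model, each feature's contribution is simply
    weight * value, which makes the 'why' easy to surface.
    """
    contributions = {
        name: weights.get(name, 0.0) * value
        for name, value in features.items()
    }
    ranked = sorted(contributions.items(), key=lambda kv: -kv[1])
    return [f"{name} contributed {score:+.2f}"
            for name, score in ranked if score > 0]

# Illustrative usage: request rate and login-API focus dominate the score.
print(explain_score(
    {"requests_per_second": 40.0, "image_ratio": 0.02,
     "sensitive_endpoint_ratio": 0.9},
    {"requests_per_second": 0.05, "image_ratio": -2.0,
     "sensitive_endpoint_ratio": 3.0},
))
```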
In addition, one of the innovations we have made is that we are now using JA4 as part of our Premier Bot Manager. Previously we were using IPs and user-agents, but they are easy to spoof. JA4 is harder to spoof because of the way the signatures are created: they have three parts, a, b, and c, and it is difficult to spoof all the different parts of JA4 consistently. That is new, and we have already deployed it as part of our Premier Bot Manager.
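For reference, a JA4 fingerprint is three underscore-separated parts. A minimal sketch of splitting one apart (the sample fingerprint is illustrative, not a real capture):

```python
def split_ja4(fingerprint: str) -> dict[str, str]:
    """Split a JA4 TLS fingerprint into its a/b/c parts.

    Part a is a readable summary of the ClientHello (protocol, TLS
    version, ALPN, counts); parts b and c are truncated hashes of the
    cipher suites and extensions respectively, so spoofing all three
    consistently is much harder than spoofing an IP or user-agent.
    """
    a, b, c = fingerprint.split("_")
    return {"ja4_a": a, "ja4_b": b, "ja4_c": c}

# Illustrative example:
print(split_ja4("t13d1516h2_8daaf6152771_e5627efa2ab1"))
# {'ja4_a': 't13d1516h2', 'ja4_b': '8daaf6152771', 'ja4_c': 'e5627efa2ab1'}
```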
We are also looking into custom models. The concept is that, alongside the global model, we are offering our premier customers a custom model in which we only analyze that customer’s data. We still have the collective intelligence we gather across our whole customer base, but we also look specifically at the customer’s own data. We look at their patterns: we know their key URLs, and we know what the typical usage of their API endpoints and their website looks like for a typical user. Based on that, we can understand who is not a typical user. If someone is trying to break into their account system, brute-force a login, or do something like scalping, we should be able to detect it much better with custom models. That is the new direction we are working on as a company. The explainability work is ongoing, JA4 is already deployed on the edge, and the custom model is also available right now. We have different features today, and we will add more and more as we go.
Danny Benefield: I love the explainability piece, actually; it fits in so nicely from the customer perspective. One of the things I see when I’m talking to customers, whether they’re using our solution or a third-party solution, is that sometimes there’s a little bit of ‘trust me, this is a bot’, right? They come to me and say, ‘Well, why have you scored it at 85% certainty that it’s a bot?’ So having more detail in the analytics as to why something was flagged as a bot puts more confidence in our product as well. And the same goes for any product: you’re not just trusting me that this was a bot, you’re trusting the data behind it, which I really, really like. I think that’s going to resonate really well with our customers.
Kenneth Thomas: Thank you very much for that Faraz and Danny.
Just to recap a bit of what’s been said here.
It’s very important that our customers have a layered, defense-in-depth strategy when it comes to mitigating bots, be they good or bad. As an aside, a good bot will typically self-identify: the publisher will publish the user-agent for that bot, along with the IP space the bot is expected to come from. With that said, this is why we have our product in place: so that we can catch the scenarios where somebody is claiming to be Googlebot and is not Google, for example, or claiming to be OpenAI and is not OpenAI.
Part of having a bot management system in place is updating it regularly and ensuring the traffic being received is what’s expected. One additional call-out is to verify any traffic that may be received by the site, even if you do have, let’s say, a robots.txt in place.
As Danny indicated earlier, many site operators are finding out that even with the standard controls we have had on the web for decades now, some good bots are potentially abusing them. And so, once again, this is where having a bot management solution in place is a requirement for the modern-day website operator.
With that, I’m going to close it off and I thank you for joining us on today’s podcast.