
AI and Software Testing with the Tech League | Show 031

A senior-level technical discussion about AI, testing, DevOps, and software development.

Subscribe to the show:

Watch on YouTube



Show Notes

This podcast episode is a podcast team-up: a chat with The Tech League podcast hosts Toby Sears and Krisztián Fischer. This time Alan was the interview subject, and the topic was AI in software testing and development.

AI and Software Testing

Krisztián Fischer & Toby Sears are the hosts of The Tech League podcast.


Transcript

Alan Richardson:

And welcome to another episode of the Evil Tester Show. This episode is slightly different because it’s a recording of another podcast. This is the Tech League podcast that I was a guest on, and Toby Sears and Krisztián Fischer both host that podcast. They’re both really experienced DevOps and senior tech professionals, so it was a good conversation to have. We had fun, and hopefully you will too. The main topic of conversation is AI and how we test it. So with that, I will pass over to Toby from the Tech League Podcast, who’s going to interview me.

Toby Sears:

Hello and welcome to the Tech League Podcast. I’m Toby Sears, your host as usual, and I’m joined by my fellow co-host, Krisztián Fischer.

Krisztián Fischer:

Hello.

Toby Sears:

This is a bit more of a special episode. We have our first proper guest here: a fellow podcaster, YouTuber, author of six books and 10 training courses, and a senior software development consultant with 30 years of experience. You can find him at EvilTester.com. We have Alan Richardson.

Krisztián Fischer:

Hello.

Toby Sears:

Okay, so Alan, great to have you here. I really want to pick your brain, because you’re deep into the testing side of our industry.

Alan Richardson:

That is generally what I’m known for.

Toby Sears:

Yeah. And I’ve seen a lot around testing AI-generated code and using AI to generate tests. And, maybe it’s a bit more of a personal opinion, but I see AI testing being more and more important as we carry on with this code-generating era.

Alan Richardson:

So I think that’s true. We could just start with: yes, we’re all on the same level there. Testing will be, I think, more important. I think testers are hoping that it’s going to be more important, because they’ve felt undervalued for so long in the development process, so they’re all hoping that suddenly they will be the superstars in the AI world. I’m not sure that’s exactly going to happen. But the concept is that you don’t just guide the AI with requirements; you also guide it with tests, and some of those tests are in code.

Alan Richardson:

It also depends how you operate with AI, how you interact with it. Because a lot of people are sitting there with agents that run for hours and hours, and multiple agents chaining each other, whereas I tend to work in tiny chunks. So the testing that I do is for that tiny chunk: what am I doing? And then we’re architecting in small changes. So it doesn’t just change your test approach; it changes your entire development approach, which then naturally changes your test approach, because your test approach should adapt to whatever you’re doing.

Krisztián Fischer:

Yeah, I think there are multiple areas of concern here with testing. First, testing the code that AI has generated; that’s what you were referring to. I think this is a huge topic, so let’s dig into that. There’s also: how do you test that AI is doing what it’s supposed to do?

Alan Richardson:

Right.

Krisztián Fischer:

That’s a whole area in itself: testing the agents themselves. And the third area, I think, is generating tests with AI, and that whole problem space. We could start with how to test AI-generated code. What I’m seeing in the industry now is that it’s so easy to generate code that testing has become absolutely crucial, just because of the sheer number of lines that get generated. How do you know that what it does aligns with what you prompted, with your intention? But also, who writes the tests? Does AI write the tests? Because can a single person write both the test and the code? It’s an age-old question that we figured was better to separate. What’s your approach on that? Because I really don’t have a strong opinion on how to do it.

Alan Richardson:

So I don’t have a strong opinion because I think it depends on the project, what you’re building, how many people you’ve got, what the priority of what you’re doing is. And it doesn’t just impact the tests.

Toby Sears:

Right.

Alan Richardson:

Because you’re getting AI to generate code. You can let the AI generate really bad code, and then the tests will probably be bad too, and they’ll be at the wrong level. So what we have to try and do is spend a little bit more time having the AI generate code that is good and well architected. Then you can get tests that are easy to review, at the right level, and easy to maintain. One of the issues I always have with this is that I will be reviewing most of the stuff that goes in, because of the way that I work in tiny chunks. I don’t always review the code in depth, but I review the tests to make sure they’re covering enough, and then I’ll prompt it: do this. But when I see that it’s struggling to create tests, it’s very often because it’s struggled to architect the code effectively.

Alan Richardson:

And the temptation, because you’re doing AI and it’s generating code really, really quickly, is to not worry about the structure of the code, not worry about the classes, not worry about what’s public and what’s private, not worry about the interfaces. But because it is doing it so fast, if you make that your first review step and get it done well, the test code that it generates from that is usually much better and a lot easier to expand. So I think one of the issues we have when we’re talking about testing and testing AI is that people separate testing from programming, and they are so intertwined. It’s just that we’ve specialized over the years, so we’re used to this role concept, when really what we’ve got is code that does stuff, and code that implements our requirements to check that the code that does stuff is doing what we want. We call that testing rather than coding. But that way of validating the requirements in code doesn’t necessarily need to be called testing; it’s just that we annotate everything with “test”, so we assume it’s the testing part.

Alan Richardson:

Historically it’s been testing because we split the development role into programming and testing a long time ago. People just wrote code, and then other people came in and tested it, so we assume that’s testing. But there’s no reason why encoding your requirements in code form, to check your code is actually working properly, should be separated from programming. It’s part of that whole programming concept. Then testing can become something different in the AI world.

Krisztián Fischer:

Very interesting. So in your view, testing and writing the code sort of merge back together again.

Alan Richardson:

So when it’s unit tests, it can, yeah. But a human is always going to be there to review it, hopefully. I think some people want a different AI agent to review it, because they want to try and work really quickly. But all that does is push out the point at which the human gets involved to double-check things.

Krisztián Fischer:

So technically you would start with the proper software engineering architecture first, to facilitate the proper testing and the proper writing of the code. Which means TDD becomes architecture first, and then… or how do you think about TDD? It’s still TDD in a sense, right? But architecture first, then tests, and then coding?

Alan Richardson:

TDD in the sense that TDD itself was never a particularly good name, right? Because when we do TDD, we’re not writing tests, we’re writing requirements for the next thing and we’re evolving the design. But we called it TDD, so that’s fine. But also, I’m not convinced that TDD works particularly well with AI. I do see people writing blog posts saying TDD is good for AI, but when the AI is writing code, I have no idea whether it’s writing tests and then application code and then tests. And that would feel like such a horrible cycle to force it into.

Alan Richardson:

I don’t find AI is really good at context switching. I think AI is really good at a stream of stuff, then another stream of stuff to check it.

Krisztián Fischer:

Mm.

Alan Richardson:

But if you want to write the tests first to give it a head start, and say: here’s my basic thing, now go and write the code to do this, and add more code and add more requirements… it’s about exploring, having a mix. But I think you use the AI for what it’s good at, which is generating things fast, then say: nah, I don’t quite like that, tweak this, and remember to do some additional coverage on the side. And then: did you run the tests with coverage? Can you check the coverage output and make sure you’ve covered everything? Because there’s no reason why all the things that we don’t do as humans, because they take a long time, can’t be done with AI, because it doesn’t take a long time anymore. There’s no reason for our code to ever be badly formatted.

Alan Richardson:

We should have good designs coming in straight away. We should split it into interfaces. We should have small classes. And these are the kinds of concepts that you put into the AGENTS.md file. You don’t necessarily tell it what a small class looks like, because it knows. And once you’ve got a lot of small classes in the code, that acts as guidance: your AI is looking at the code as context and copying what it’s already done. That’s what allows you to switch between different agents and different models quite easily, because the good patterns are in your code already, not encoded in the AGENTS.md file.

Krisztián Fischer:

Yeah, my experience: I tried TDD, pure TDD, as in I gave it the specification and said, write the tests first and then write the code. But it didn’t really work, for two reasons. If you don’t have the structure of the application yet, what are you testing? You can’t write unit tests if you don’t have the skeleton of the application code. And the other one is, it went into these weird cycles of just satisfying the tests. And we know that from testing techniques: you write the test and then you write the minimum code that satisfies it. Like, return a static string, right?

Krisztián Fischer:

It satisfies the test, and then it doesn’t do anything. So I got into this split-brain situation where it didn’t do what I wanted; it just barely satisfied the tests, and that was it. The outcome wasn’t what it should have been. Maybe it’s my poor testing skills.
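The "minimum code that satisfies the test" failure mode Krisztián describes is easy to reproduce by hand. A minimal sketch in Python (the names here are illustrative, not from the episode): the test goes green while the intent is unmet.

```python
# The test, written first, encodes the requirement:
def test_greet():
    assert greet("Alan") == "Hello, Alan"

# The "minimum code that satisfies it": a static string.
# The test passes, but the behaviour is not general.
def greet(name: str) -> str:
    return "Hello, Alan"

test_greet()          # passes
print(greet("Toby"))  # still "Hello, Alan": green test, wrong outcome
```

This is exactly the split-brain situation: the test suite reports success, yet the code ignores its input.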

Alan Richardson:

So I think that’s a natural way for the AI to work. I mean, you would expect that from it, because we as humans, when we do TDD, do it in really tiny chunks. We write a test that references code that doesn’t exist yet, and then we get the benefit of using the IDE to create that code. The AI is not doing any of that, so why would it need those steps?

Krisztián Fischer:

Yeah, true. And I ended up just generating the code from the prompt. And it was quite good code, actually, I have to say. And then, what usually is a blocker for me, well, not a blocker, but a hard mental barrier, is setting up the test harnesses for all the environments, the end-to-end tests, the UI tests. It’s just a pain in the back. This time it went like a breeze; it was super easy to set up. And then I was surprised how easy it was to review the tests.

Krisztián Fischer:

I use Playwright and other similar tools, and I realized that now I can express what I want much more easily in the tests after the code has been written, and say: make sure that the button always shows up, or that the button is disabled if this condition is not met. So I found it very helpful with the tedious work, actually.

Alan Richardson:

So that’s interesting. How did you find the quality of those Playwright tests? Because I’ve found that’s where I have to guide the AI much, much more than with the unit tests.

Krisztián Fischer:

Well, this was a quite simple UI and I just wanted to have the basics, like, really, does the application work? Can I log in? Can I sign up? So these very, very basic functionalities with only a few components and it was quite generically written. I don’t know how it does it under the hood. It didn’t contain any IDs of the components or anything like that. It was just. Is there a button with whatever title?

Alan Richardson:

Yeah, that’s just Playwright.

Krisztián Fischer:

Yeah.

Alan Richardson:

But in your tests, were you seeing actual Playwright actions, or were you seeing component abstractions or page abstractions?

Krisztián Fischer:

Not in the tests, but the code was properly written, so the abstraction layer was there in the generated code. Everything was a component. Pages were nicely done. The routing was nicely done. The tests were a bit more generic.

Alan Richardson:

Yeah. So that’s what I’m seeing: the actual application code is well structured and uses all the good patterns; the test code often doesn’t. It’s often: here’s a thing, click here, use this locator, find this object. And one of the lessons we’ve learned over a long time, when automating applications at any level other than unit level, is that we create abstractions to represent the application, because then the maintenance effort is low. If you change an ID somewhere, you only change it in your component abstraction or your page object. You don’t have to change it in every test.

Alan Richardson:

But now that we’ve got AI writing a lot of these tests, people are coming back to concepts like: well, we need self-healing tests, right? Clearly the tests will fail because we’ve changed the application, so then we have to go in and change every single test, because we didn’t write those tests properly, because the AI did it and we don’t care. But if you don’t really care, you can’t really review that test coverage properly, because it’s been written at such a low level that it’s very hard to review, and you’re much more likely to miss things. The abstractions are really important, not just in the application code, but also in the code that we’re using to execute it, at whatever level. That might be one of the skills we have to teach everyone, because I think testers and people who’ve been doing a lot of automation know it. But people also do automated execution of applications really badly. Historically you’ve got case studies of: well, we created a thousand tests, now we can’t maintain them, they take too long to run. And it’s like…

Alan Richardson:

But we already have skills and experiences to know how to avoid that. But now the AI is bringing that back, but it can fix everything faster, so everyone’s happy. And that’s just like having a really bad application that you find a bug and the AI goes in and fixes it really quickly. But it’s still a really poorly architected application. So I think we’re going to see in the short term a lot of really poorly architected test execution code bases, but quite well architected applications.
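The abstraction Alan is describing is the classic page-object pattern. A minimal sketch in Python, with a hypothetical `FakeDriver` standing in for a real browser driver (such as Playwright's page): locators live in one place, so when an ID changes, only the page object is edited, never the tests.

```python
class FakeDriver:
    """Stand-in for a real browser driver; records actions for illustration."""
    def __init__(self):
        self.actions = []

    def fill(self, locator, value):
        self.actions.append(("fill", locator, value))

    def click(self, locator):
        self.actions.append(("click", locator))


class LoginPage:
    """Locators are centralised here; tests never touch them directly."""
    USERNAME = "#username"
    PASSWORD = "#password"
    SUBMIT = "button[type=submit]"

    def __init__(self, driver):
        self.driver = driver

    def login(self, user, password):
        self.driver.fill(self.USERNAME, user)
        self.driver.fill(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)


# The test reads as intent, not as raw locator calls:
driver = FakeDriver()
LoginPage(driver).login("alan", "secret")
```

If `#username` later becomes `#user-name`, one constant changes and every test that logs in keeps working, which is the maintenance property Alan contrasts with "self-healing" tests.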

Krisztián Fischer:

I probably contribute to that.

But do you think it’s because the models weren’t trained on good tests?

Alan Richardson:

I think it’s because if you get your AI to write just application code, it will write the most basic application code it can. It will try to create methods that are 600 lines long if it can get away with it, and we have to tell it not to, because we know as developers what good code and good architecture look like. Many developers have not had to automate applications from the outside in very much, because the test team has done that, or they just haven’t done it. So they’re not used to seeing the architectural patterns that go into that area, and we don’t prompt for it. But as soon as you prompt for it and give it guidance and tell it what you need, and then start having a code base that has those patterns, it will start generating that. I do not create skills or agents for creating UI tests. What I do is either get the agent to write the test and then say: no, no, I want a page object.

Alan Richardson:

I want you to structure it this way. I want you to put the page object over here. I want an interface. I don’t want you to test the page objects; I want you to use the page objects in the tests. And then you get code that the next session can follow and say: oh, I see, we’re using page objects and this is the format we’re using, so I’ll create them that way.

Krisztián Fischer:

So you’re architecting the tests in a better way than it would naturally write them, micromanaging it a little bit so it has a better outcome.

Alan Richardson:

Yeah. For me, the concept of self-healing tests is horrible. There’s no such thing.

Krisztián Fischer:

Yeah, yeah, for you. Because it just heals itself around the bug.

Alan Richardson:

Yeah, exactly. And you don’t know whether it’s done it properly unless you review it. And it shouldn’t have to. Why do you get back into the bad pattern of: we’ve changed the application, now our tests are broken? It’s like, oh no, you change the application and amend the tests at the same time so that they’re in keeping, because the tests represent your requirements. So why is one breaking before the other? It’s because in our heads we have that role split of programmer and tester. So now we have application code and test code, and they’ll be updated differently. So the only reason we know we’ve changed the application code is because our tests have failed.

Alan Richardson:

It’s like, well, that’s insane. Why are we getting back into that when we should be doing better?

Krisztián Fischer:

Very interesting to hear that from you. You’re a specialist in testing, and usually the opposite is the attitude I see from testers: this is how you should do it, break the test first and then implement the code. And you’re thinking in a completely different way.

Alan Richardson:

But that’s because this is automated execution coverage, not necessarily tests. When we test, we’re trying to find the gaps and the missing parts of our model. The automated execution coverage that is passing is supposed to continue to pass, because it represents what we want to happen. If it starts failing, it usually means we’ve had unintended consequences of what we did. Because if we know we’re changing the IDs over here, we know we have to change the IDs in our page objects. Now, you might choose to change the application, run the tests, see that it has changed, then amend the tests. But that’s an expected outcome of the process.

Alan Richardson:

So it’s just a maintenance flow rather than a surprise where the tests now have to heal themselves. We need to be in control of the whole process, particularly when it’s automated execution. It’s different when you start using AI to test the application and give you new information. Because the way I always look at it is in terms of information. When I looked at information theory, trying to understand it, and it’s complicated, I simplified it as: information is the stuff that surprises us, the stuff that’s new. It’s not data, right?

Alan Richardson:

If you ask someone whether they’ve tested everything, and you’ve got a test team and a development team, and the testers are going: no, we haven’t found any problems, you go: well, that’s kind of what we expected, because we’re doing this process well; we don’t expect much information there. But that also means you’re probably not testing; you’re probably doing a lot of regression coverage, and it’s stuff that should have been automated. So you’ve got the test team doing kind of the wrong things, which is why companies have historically got rid of test teams when they introduce more automation. The test teams have only ever had time to keep going through the same requirements and asking: does this still work? Does this still work? And the information you’re getting at that point is very obvious information that you can cover through automated execution coverage. What you want is continually new information, because we’re expanding out into areas that we haven’t looked at, or we’re looking at old areas in more depth.

Krisztián Fischer:

Do you think we could use AI for testing AI in this manner: have an adversarial testing agent that always tries to find something new that we haven’t discovered yet, edge cases? Is there a reality to that? My philosophical take is that if the model that generates the code has a fault, a failure mode, it will have the same failure mode when it tries to test the code. So it can’t be the same model; it has to be something different. But even then, how do you know that it will find different holes, or that they won’t make the same mistakes? How do you think about that? Do you think we can reach a point where testing can be primarily done by AI, with only a little bit of human in the loop? Because that’s what I see as the bottleneck right now: the humans are the bottleneck in software engineering.

Alan Richardson:

So that is what people in the test community are trying to figure out in various ways. Tool vendors are creating tools and telling you that that is exactly what they’re doing. They’re creating tools that scan applications, find issues, and do a whole bunch of coverage for stuff that you would not normally create execution coverage for, right? They’re saying: we will do all your accessibility stuff, we’ll look for security, we’ll look for performance issues, stuff that you would never write tests for, and we will find problems. But with any kind of scanning tool like that, you get a lot of noise. So how do you evaluate, out of the noise, what is actually a problem and what is not? What do you care about? So then the test community goes: what we will do is have tools that scan the systems, and when we find something that looks like an issue, we’ll create the code that gives you the automated execution coverage for that, and then that becomes a test for you, and then you fix the test.

Alan Richardson:

So it’s trying to get you less noise. But then your issue is, well, how much coverage did it do during that process? Do I know that it did the right things? And that is a problem that I have not seen any tool solve properly yet.

Krisztián Fischer:

Is that a completeness question about the tests? Is that what you mean?

Alan Richardson:

So, not even completeness. It’s more: when you were doing this process of investigation, AI, into my system, all you’ve given me is an output, which is information because it’s a bug. What did you do? What was the space that you explored? Do I know that you explored the correct space, or did you just do everything that we’ve already covered in unit tests, plus one extra thing? That coverage part is what testing and testers have always, hopefully, been good at: trying to communicate to people not just the problem that we found, but the coverage space that we explored, how we explored it, and then what we have yet to explore, what we still have to try and cover, so that we know what the gaps are. The AI models are very focused, and they don’t really have the memory. So every time you ask them to come in and test my app, they go: I know how to test an app, and then they do exactly the same things. They might have slight variation in the data, but even then, with the variation in data… I’ve got some test applications, and I’ll get the AI to test them and look at the inputs that it’s created, because some of the time, when it’s automating through a browser, I get it to write the test code and write the data in as a file that I can review to see the coverage scope.

Alan Richardson:

But the variation in that input is very small. Everything starts with a capital A. It doesn’t vary as much as a human would, or even as much as just a random algorithm would. I don’t think it’s very good at that. We associate AI with being random; I don’t think it’s very random at all. It’s highly probabilistic and path-based, which is very different from random.
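Alan's point about input variation can be made concrete: even a trivial seeded random generator (a sketch, not anything from the show; the function name is hypothetical) produces more varied test strings than the "everything starts with a capital A" pattern he describes, while staying reproducible.

```python
import random
import string

def random_test_strings(n: int, seed: int = 42) -> list[str]:
    """Generate n varied test inputs: mixed lengths, cases, and characters."""
    rng = random.Random(seed)  # seeded, so any failure is reproducible
    alphabet = string.ascii_letters + string.digits + " -_'"
    return ["".join(rng.choice(alphabet) for _ in range(rng.randint(1, 20)))
            for _ in range(n)]

inputs = random_test_strings(5)
# print(inputs) to inspect: lengths and starting characters vary per string,
# unlike the narrow distributions Alan observes in model-generated data.
```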

Krisztián Fischer:

Yeah, it feels sometimes like it’s very biased in one direction. It’s hard to knock it out of that little groove it went into; you really have to force it. And if you don’t have manual oversight, it will not get out of the groove.

Alan Richardson:

Or memory or history. So a lot of the test tools are trying to build in a memory, so that next time the AI comes in, it can see what it’s done before and can try to do something different. And that’s really important, because that’s what we do in order to find information. We know what the existing model is, and we’re trying to ask: what are the gaps in our model? What have we not compared from our model against the application? And then explore that. So it is an interesting problem. The hard part is that commercial tools are designed to be sold and marketed rather than necessarily to do a job. Given that they’re still experimenting and exploring, there’s a lot of scope for doing this on your own, building your own agents and skills and tools to do it. And there are a lot of open source tools that are trying to expand on this.

Toby Sears:

I was going to ask if you had any good suggestions for these kinds of things, because I’m not a software engineer, I’m more of an infrastructure guy. But now, using AI tools to create software, I can do enough to build websites and phone applications. What I would really like is some sort of curated… because you were saying the structure of the tests, the layout of the test code, is super important, and once it’s there the model will read it and continue in that fashion, right? Are there any skills or open source tools that you would recommend to get me started on a good framework of tests?

Alan Richardson:

So I don’t think you need special tools for that. I think what you need is the same thing as when you work with application code: you have some patterns and buzzwords that you use. So when I’m starting my agents file I’ll say: right, we’re going to build an application that’s going to do these things; this is the ultimate goal; I want to use domain-driven design; I want test-driven code; I want lots of unit tests; I want linting; I want this. We’ve got all these buzzwords. So you need to know the buzzwords for effective automation too: we are going to use page objects; we are not going to test the page objects; we are going to use the page objects to test the application.

Alan Richardson:

We’re going to have interfaces so that all page objects can wait for the page to load. But it also depends which framework you’re going to use. So you have to know the concepts that you want. They’re all encoded in these models, and it’s just finding the right words to pull them out. And once you’ve pulled them out, they’re there in your code base. So you have to know a little bit about what that effective architecture looks like. But if you’re in the position where you don’t, you do the same thing that you do with the application code. You start and you go: I’m going to build an application that does X, Y and Z. What would you suggest is the right architecture? What would you suggest is the right hosting place to put this, to make it free?

Alan Richardson:

What storage should I use? So you ask the same things of the test code, and it will tell you. What we shouldn’t do is prompt agents with things like: you are an experienced test automator, do X, Y, Z. Because then it will go: I am an experienced test automator, let me create some rubbish, but I am experienced, so this is good rubbish. And that’s not what we want. We want to pull out the best practices it knows of, then tell it to use them.
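As a rough illustration (not Alan's actual file), the kind of guidance paragraph he describes putting in an AGENTS.md might look like this; every path and rule here is a made-up example:

```markdown
## Test automation guidance

- Use page objects. Do not write tests for the page objects themselves;
  use the page objects inside the tests.
- Page objects live in `tests/pages/`, one class per page, behind an
  interface that exposes a wait-for-load method.
- No raw locators in test files; locators live only in page objects.
- Keep tests at the level of user intent ("user can log in"),
  not UI mechanics ("click #btn-3").
```

The point, as Alan says, is to name the concepts ("page objects", "interfaces") so the model pulls out patterns it already knows, after which the code base itself carries the examples forward.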

Krisztián Fischer:

But isn’t this a skill that you’re describing? A good testing-architecture skill that you could formulate and include in your projects, and base the outcome on?

Alan Richardson:

So you could do it as a skill. And the interesting thing is, Dragon is working on something called the Agentic QE Fleet, the agentic quality engineering fleet; that’s open source on GitHub, and he has a lot of skills encoded in there to help try and do these things. And you need those skills if you’re going to have a suite of agents running autonomously against your code. So if you want a tool that acts as another part of your team and will give you information, then that’s what you need. If you want this done as part of your normal coding flow, you don’t necessarily need skills. What you want is the information in the agents file telling it what to do, and examples in your code base of what good looks like. I think that can compensate a lot for the skills, because it evolves over time and you get what you want in your code base.

Alan Richardson:

If you have a skill, you have to keep amending it.

Krisztián Fischer:

You’d rather use examples, to show the agents what you want, instead of abstract skills and formalized knowledge.

Alan Richardson:

So that’s what I do, and it’s working well for me on my projects. Whether that scales, I don’t know. You could certainly never offer it as a service, right? Because you’re doing it internally. And you can certainly never sell it as a tool, because again, you’re giving people a paragraph to put in their AGENTS.md file. Pay me $1,000, please, and I’ll give you my paragraph. It’s not scalable as a business model,

Krisztián Fischer:

but it actually aligns with how we like to do software engineering nowadays: the coder actually should consider security, they should consider testing. So maybe this is a much more natural way of writing code, if you can build it in from the first minute into your architecture, your coding or software architecture.

Alan Richardson:

I think it’s really important to have these skills if you are not a developer, right? What AI is doing is helping people who do not have development experience build systems. We’ve said a whole bunch of words throughout this that a lot of people will never have heard of if they don’t come from a development background. We’ll just go: yeah, do domain-driven design when you’re coding, it’s fine. A person that’s not in software development won’t even know what that means or why it’s important. So if they have a skill that is representative of, say, skills from agile developers, or skills for security-conscious teams, then that will help them.

Alan Richardson:

But skills have a tendency to clutter up your context quite a lot, and your agent has to know which skill to use. You either have an agent that knows what skill to use, or you have to prompt: using skill X and skill Y, I want you to create code that does this. So you have to, again, know how to prompt, and that’s where the tooling and the agents and the skills come in. If you’re working as a developer, you can bypass a lot of that by knowing what to ask for, knowing what high-level things to put in. Because I think it’s much better to prompt the AI with: here’s the outcome I want, here are the kinds of things I want you to do, and here are the principles I want you to use, rather than saying: all right, you are now an experienced test consultant. It’s like, no, we want this as a whole cycle.

Alan Richardson:

Yeah.

Krisztián Fischer:

I constantly get the feeling when I talk to AI that, to achieve the best outcome, I always have to treat it as a junior developer. It’s not an expert, it’s the opposite of an expert. It can write a lot, of course, it’s very enthusiastic, it can look up information very quickly, like a very sharp, talented junior. But it makes architectural mistakes, let’s say, or creates maintainability problems, and I constantly have to correct that. I really feel like I’m working with 10 juniors who are really fast and produce a lot of output, but I have to herd them a little bit. Right? So maybe my role has become just to be the senior engineer in the room. And I feel the same with testing.

Krisztián Fischer:

You can’t just test everything. It doesn’t work. It starts testing meaningless things like, oh, does this function work? And the function is just a string concatenation. Yeah, why wouldn’t it work? It’s standard library stuff. I don’t want to test that, I want to test against my specification. And that’s my constant feeling. I’m not sure if this will evolve over time, but right now I feel we need more senior people in the loop. And the human still has to be in the loop.

Alan Richardson:

So again, it depends what the outcome is. If you’re doing professional software development for a company whose livelihood depends on the software, and it has to be maintainable over time, you want experienced software developers in there. If you are a single person startup and you’re trying to crank something out in order to get some funding or to try and monetize it quickly, then just do that and take the hit later on, because it might go nowhere. It is important to learn the principles that are encoded. Your example there: it’s writing its own string concatenation or string parsing, and you look at the code and go, don’t do that, find a library that will do this, then get rid of all that code, because I don’t want to have to maintain or test it. But you have to know what good looks like in your code base.

Alan Richardson:

One of the interesting things is if you, what’s the word, anthropomorphize it into junior coders. I’m often thinking that the AI instead has been optimized to reduce its context and to reduce its token use. So it’s trying to get something out as quickly as possible, as fast as possible in a stream, and we have to prompt it to say, no, I’m quite happy for you to use more tokens, revisit this. And we have the advantage now that the system’s context window is getting much, much larger, so it can put more in there. So we don’t really have to worry about putting everything in the prompt; it can find it and pull out the code.

Alan Richardson:

So again, having multiple abstractions for how we think about our tooling is useful, because the tendency is always going to be to anthropomorphize it and communicate with it like it’s a human and tell it things like “you are an expert”. When really it’s a tool. I want this as the outcome, I want it in this form, I want you to pull these concepts out of your massive brain and implement them. Then I think we get better results.

Krisztián Fischer:

But that assumes that you are an expert in the field.

Alan Richardson:

It assumes you’re an expert only if you haven’t gone through the first step, which is to have the AI help you become the more senior member of the team. Say you’re brought into a project as a consultant, right? And you don’t know that technology. Your job as a consultant is to pull out the knowledge from the people on the team that do know that technology to get the right results. So you’re asking them high level questions about general concepts that you know. And even if you don’t know those concepts, you go: well, how will I know if this works? How will I know if this is efficient? So you have certain words which don’t necessarily need to be domain specific, words like domain driven design, words like efficiency, low cost, fast. And we ask the AI: how do we make this system fast, operate at low cost, be maintainable, make sure it doesn’t break in the future?

Alan Richardson:

We can use English words, and it will know how to convert them into IT terms. That’s the skill: getting the AI to give you those answers, and those answers become either your prompt or your AGENTS.md file longer term.

Toby Sears:

So it sounds like there’s a, there’s a method I was told about recently, it’s like asking Claude Code or Opus to deep interview you about stuff. So you could ask it to deep interview about testing there.

Alan Richardson:

So if you’re an expert, you can get it to deep interview you. But if you’re not an expert, your job is to understand how to ask questions of the AI, because it will be your deep expert. You’re trying to help it expose the deep information that it has, to pull it out into some sort of expertise. And once you’ve got something there, you can ask it to critique it for the development plan. It’s a set of consultancy and mentoring skills for getting the best out of this agent for the particular process or project that you’re working on. And those skills are documented. Go and get books on management or consultancy or mentoring, or even psychotherapy, if you’re looking at brief therapy approaches. Those skills are all documented in there, and that’s what we start using.

Alan Richardson:

So it’s not even that you need to be a domain expert, you need to be an expert in interaction to get the best out of people.

Krisztián Fischer:

So it almost feels like our role has become a bit more abstract, like we’re going up a level, either more towards management of these agents or the AIs, or becoming the most senior person in the room. And I think that’s why you use it as a tool.

Alan Richardson:

Yeah, that’s why a lot of single company founders are doing really well: they don’t have the development skills, but they know how to get the best out of people, so they know how to ask that question. They know that they don’t know anything. So they start with the AI, saying: I don’t know anything, how can I build an application that does this? Where can I host it so it doesn’t cost me any money to start with? We know those things, so we have biases: we’ll go away and do web searches and check. They won’t. They will just ask the AI, and then they’ll pull more knowledge out of it that we don’t even know is in that model.

Krisztián Fischer:

Interesting. Toby is a practitioner of agentic coding nowadays, like he’s harnessing a swarm of agents. Do you have any tips and tricks, maybe, around testing in this environment? To me it seems like it’s very chaotic.

Toby Sears:

You’re asking me? Because I don’t have any idea. Something that really resonated with me was the idea of not knowing the coverage or what it’s done. What I do a lot is I ask it to audit itself: audit yourself for security, audit yourself for, you know, cross role boundaries and stuff like this. And then it spits out an audit. But I don’t know what it’s checked. I don’t know if it’s gone and checked everything. So that’s something I really struggle with.

Krisztián Fischer:

Do you have any ideas how to operate in an agentic environment when it comes to testing, or ensuring that what we wanted got done?

Alan Richardson:

So there’s a couple of things there. One of the things that I would encourage you to do, Toby, is experiment with the Agentic QE Fleet repo, which is a set of agents and skills that you can install into Claude. Then you can ask Claude to do that type of audit, and it will use all the skills that are in there, give you a report, and it can maintain a history over time. Dragon’s continually working on getting the history working better. So for that use case you described, there are tools that are trying to help do that. One of the interesting things also is that I haven’t done the agentic development part. That’s not something I’ve experimented with, because it seems too risky for me. It seems too uncontrolled.

Alan Richardson:

But if you wanted, what we can do is go through that consultancy style process. If I asked you a few questions about the agentic approach you’re using and you gave me the answers, that would help me answer the question better. So when you have multiple agents, what agents do you have and what are they doing?

Toby Sears:

Well, I mean, I don’t do the spread, you know, the agent spread of Claude. I usually do one application equals one main Claude session, and then I get it to spin up multiple. My general workflow is: I plan a feature, I get it to plan and then I use some skills to critique the plan, and then I get it to implement the plan with bypass restrictions on and everything, and just go. But it’s per app. So, for example, I have an Expo web and phone app; I have a Claude session for that. I have the same API usage but a front end in Vite, and a Claude session for that.

Toby Sears:

And then I’m running multiple at the same time in different like bounded areas.

Alan Richardson:

All right, so that’s a slightly different approach because that basically just means you have two team members working in parallel.

Toby Sears:

Yeah, Yep.

Alan Richardson:

Which is slightly different from the whole agentic process. But what I find interesting about what you said is that you have a set of skills that critique the plan. I haven’t even got to that point. I will critique the plan myself, then I will iterate on the plan, and then I’ll say that’s good enough to go. Have you got skills to critique? Because, I mean, how long does it go away and do this on its own? How big is that iterative chunk you’re asking it to do?

Toby Sears:

Okay, an example is the other day I was like, I want to add a mapping feature to this phone app, and it needs to have markers, it needs to have areas defined, I want to be able to import GPX, it needs to have a filter. I’ll go through, and then I’ll say, right, let’s start discussing it. So in Claude Code, was it Shift+Tab to plan mode? Always plan mode first.

Toby Sears:

And then once it’s asked all its questions and I’ve answered them, it will spit out its plan, and then I will say, so there’s an Expo skill for reactive front end design, I think it is, and there’s a React skill. I have a React skill. I say: use these three plugins to critique the plan you’ve just written, and it will find all the problems it thinks are there and update the plan. So that’s how I’m doing it at the moment. It works quite well, I think.

Krisztián Fischer:

How long does it take?

Toby Sears:

It’s maybe 10 minutes to do the plan, and then maybe 20 to 30 minutes to do the implementation.

Alan Richardson:

So the implementation is also huge; it’s a large, complex implementation. Depending on the feature that you’re working on, you have a different test approach. I mean, I would be comfortable working the way that you just described, but I know that I would take the hit once it’s been delivered: I’m going to have to go in and test this thing. And then I’m going to be thinking, okay, because we’re using a fairly complicated set of JavaScript and React and whatever components, and it’s on mobile, I’m going to try this on different mobile devices, and I’m going to have to try this on different browsers, because the risk of cross platform issues has now entered the equation. I’m going to be thinking, is it accessible, does it work for SEO, all these other things. But what I’m also going to be looking at is what execution coverage did we achieve on this? Because it’s quite complicated, and I don’t want to review that execution coverage all the time, because I’m trying to evolve this front end. And we’re in the position with AI a lot that we’re trying to build something, we’re trying to build it fast, and we’re trying to evolve it.

Alan Richardson:

So how much effort do we put into checking whether we’ve broken anything? Because we know that the next prompt we give it is probably going to break everything because we’re going to get it to redesign it.

Toby Sears:

Yeah, yeah, yeah, that’s exactly what happens.

Alan Richardson:

So your test approach changes as a result of that. And it may be that you don’t try and add a lot of external execution coverage. What you do is look at some of the features you’ve put in place. You mentioned: yeah, we want to hook into the Geolocation API and do XYZ. That’s probably not going to change very much, so you want to get execution coverage in there quite early: make sure it’s mocking out that service, make sure you’ve got some integration tests in there to test against the real service, checking the contract. So you spend your time on the things that are not going to change. And that’s where interface based design and API first type interactions let you operate fairly quickly, because you can put the coverage at the interface level, at the API level, and you take the hit that the other stuff might not work well. So in your example there, where you’ve built a front end and you’re iterating on it, the kind of tools that people are trying to build might help you, because the kind of tool people try to build is: throw my tool at your front end.

Alan Richardson:

We just do a whole bunch of stuff, we tell you the problems, and for you that’s great, because you’re testing it in parallel with that thing. It’s giving you information. You go: all right, I found some stuff, it’s found some stuff, fix these problems. Okay, it seems good enough, let’s put it live, because we can fix the issues in live when it goes there. It depends on your release process too, and how mission critical it is. But with AI, we have to also optimize for changing quickly, not just looking for regression coverage, because what we’ve got now may not even look like that in a week’s time.
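The interface-level coverage described a moment ago, mocking out the external service for fast tests while still checking the contract you depend on, might look something like this sketch. The GeoClient class, the transport interface, and the response fields are invented for illustration; they are not from any real geolocation API.

```python
from unittest import mock

# Hypothetical wrapper around an external geolocation service; the class name,
# transport interface, and response fields are illustrative assumptions.
class GeoClient:
    def __init__(self, transport):
        # the transport is injected so tests can swap in a mock
        self.transport = transport

    def locate(self, query):
        raw = self.transport.get("/geocode?q=" + query)
        # the contract we depend on: responses carry 'lat' and 'lon'
        if not {"lat", "lon"} <= raw.keys():
            raise ValueError("contract broken: missing lat/lon")
        return (float(raw["lat"]), float(raw["lon"]))

def test_locate_with_mocked_service():
    transport = mock.Mock()
    transport.get.return_value = {"lat": "51.5", "lon": "-0.12"}
    client = GeoClient(transport)
    assert client.locate("London") == (51.5, -0.12)
    transport.get.assert_called_once_with("/geocode?q=London")

test_locate_with_mocked_service()
```

A slower integration test could point the same GeoClient at the real service and assert the same contract check, so the stable part of the system keeps its coverage while the UI above it flexes.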

Alan Richardson:

So if we spend a long time optimizing to detect unexpected changes, that effort is misplaced, because we probably want a lot of those changes, and we don’t want the test code impacting the AI’s ability to make them, because we’re iterating fast. So it’s a different way of working.

Krisztián Fischer:

That’s so interesting. I would have expected you to double down on testing and be strict with it, and you’re saying you have to be more flexible, in a sense, because otherwise you’re going to hinder the AI’s ability to iterate. So you have to be very smart about testing. It’s a qualitative change in testing, not a quantitative one. It’s not “generate a hundred more tests”, it’s “do the right tests”.

Alan Richardson:

So I think one of the reasons it might sound different is that my concept of testing is about information. We test to get information about stuff which is new. A lot of the time when we talk about testing, we’re talking about existing coverage for things that are stable that we don’t want to break. So I’m suggesting that’s where we put the energy in. You cover your APIs, you cover your interfaces, the stuff you don’t want to break because it’s fundamental. And if you’ve got the coverage there, you can flex more easily in other places. And then that’s where the thoughtful innovation comes in when you’re using it.

Alan Richardson:

So you explore and experiment to do the testing, or a human does. And in parallel, you use the fairly stupid AI to do what it thinks and tell you stuff. And that’s just a bonus set of information. You’re not relying on it, and you’re also not using it for the standard execution coverage that you rely on for stability. But again, that also comes down to how we architect the application. Because if we’re architecting the application well, we will have a base that is fairly stable, and we will try and keep it fairly stable. And then we will have some other components on top that we’re expecting to flex for a while. Once they’re stable, that’s when we put the effort into making sure we have automated coverage, so that they don’t break when we start looking in a different place.

Krisztián Fischer:

That’s very interesting, because that’s not what I’m seeing typically, especially from less senior engineers. What they use AI for is: where they would have written one or two tests for the code, they now generate 10. And they’re very low quality tests, actually; they just repeatedly test nonsense. But your approach is very interesting, and I think it could be applied: you think about certain areas of the software as being stable, and that’s where you cover properly, and the rest you let be a bit more loosey goosey so you can experiment and add new features faster. We have touched on a topic several times already in this discussion: the maintainability of the code. And I wanted to ask your opinion about this, because we discussed it with Toby in a previous episode. Now that it’s so easy to generate code, we used the example of plastic bags: you can just throw it away if it doesn’t work.

Krisztián Fischer:

And historically we had this rule of thumb that one unit of cost goes into generating the software and then nine more into maintaining it over the lifecycle of the software: upgrades, fixing the tests or having tests, regression tests, all of that goes into software development.

Krisztián Fischer:

Do you see this changing with AI and the low cost of code generation?

Alan Richardson:

So I do, but also I think we already have the patterns and processes to deal with that. I mean, I’ve done that: I’ve written an application and it’s horrible, and then you’ve got a choice: do I refactor this, or do I just get the AI to create a completely new version, knowing what we’ve learned? So it creates a completely new version. Then my choice is, once it’s created this new version, how will I know it’s going to work? Do I want to put in the effort to completely test this again, or do I want the AI to build an agreed set of interfaces? Then I’ll take the lessons from the old application and say: put an interface in the old application and cover it with those tests; in the new application, surface the same interface, do whatever you want under the covers, and pass these same tests. So it’s a migration where the requirements are already encoded into test code and we apply them, carrying the execution coverage across. That is something we do as humans all the time when we’re migrating applications. We don’t just migrate applications, we go: maybe we should check if it’s going to do the same thing.

Alan Richardson:

So then we create tests that we run against both to achieve parity, if you care about that aspect in your migration, because that’s what you’re doing. Otherwise you take the hit that you have to test it from scratch against the set of requirements that you’ve decided on. We already have a lot of the processes and concepts that we need in order to do this, but we can do them faster. Historically, if you were migrating systems and you didn’t have those tests already, it was a complete pain to create them. Now it’s not. Now you can just say: here’s our API, I want you to create an interface.

Alan Richardson:

We don’t have an interface. The key point for a lot of this work is interfaces, because that’s where your stability is. And if you can start thinking in interfaces, or injection points to put the interfaces, it’s like Michael Feathers’ book, Working Effectively with Legacy Code. When you’re working with legacy code and it doesn’t have tests, you’re looking for the seams, and those seams give you the interfaces. Then you put the tests on those interfaces, and it doesn’t really matter what happens underneath, because you’ve got that point at which you’ve agreed the way of working.
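The migration pattern just described, where old and new implementations surface the same interface and one parity suite runs against both, can be sketched like this. The slug example, both classes, and the expected values are all made up to illustrate the idea.

```python
import re

# Two implementations behind the same informal interface: a slug() method.
# LegacySlugger stands in for the old code, RewrittenSlugger for the AI's rewrite.
class LegacySlugger:
    def slug(self, title):
        return title.strip().lower().replace(" ", "-")

class RewrittenSlugger:
    def slug(self, title):
        return re.sub(r"\s+", "-", title.strip().lower())

# The parity suite encodes the agreed requirements once and runs against both,
# so the rewrite only ships when it matches the behavior we chose to keep.
def parity_suite(impl):
    assert impl.slug("Hello World") == "hello-world"
    assert impl.slug("  Trim Me ") == "trim-me"
    assert impl.slug("already-sluggy") == "already-sluggy"

for impl in (LegacySlugger(), RewrittenSlugger()):
    parity_suite(impl)
```

The point is that the suite is attached to the interface, not to either implementation, so what happens under the covers can change freely.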

Krisztián Fischer:

But that can also solidify unintended behavior, because you don’t know if it was intentional behavior or unintentional, and you just copy it over into the new system. And I see that with AI as well: it finds something in the code that is actually wrong, and it just keeps repeating the pattern, because that’s what it trained itself on, pretty much. But do you think this changes the cost of software, or the cost of testing software? Do you see a tendency where the relevance or importance of testing changes with this new tooling?

Alan Richardson:

So again, testing is often the wrong word, right? What we were talking about there wasn’t really testing. It was creating a set of requirements that are encoded in code. And we have to review those to make sure they’re doing the right things, otherwise we can encode the wrong behavior. What it does change is this: throughout software development in the last 20 years, we’ve been trying to do more agile, more lean, and we’ve been trying to cut down on the cost of decisions. So we’ve been deferring decisions until later, not really committing to them until we get things solid. That mental attitude is changing, I think, because we can say: let’s get out there.

Alan Richardson:

I’m going to release it all on Supabase. We’ll get it working, we’ll get it out there. If it’s not working properly, right now I’ll convert this onto Mongo or some other system and we’ll migrate it across. So our commitment to the technology is massively reduced, because we can get the AI creating an entirely new set of code, or it can do it in parallel; we can experiment with both systems at the same time. But the interface is important, the requirements are important. I think what’s becoming important is not the word testing, it’s the word requirements. Yeah, requirements of what we want.

Alan Richardson:

And they are very often encoded in code and we call them tests, but really they’re requirements. And it’s interesting to me that we’ve spent so long moving away from that concept of requirements, towards stories and acceptance criteria, because we’re trying to emphasize conversations, which is important for speed. But we encode those requirements as tests; we just don’t associate them that way. Those of us on this call are older, so we have in our heads the concept of requirements, and it doesn’t always feel like a bad concept. And I think we’re getting back to the point where that’s what we’re working with. We’re essentially creating the requirements for the application. We’re not even necessarily creating the spec.

Alan Richardson:

Right, because the AI is creating a spec as it implements, and it’s documenting as it goes along and describing the decisions that it makes. But we’re deciding the requirements, ultimately. Then we have to figure out how those requirements are met, which is the execution coverage, and then what have we missed, what mistakes have we made? And that’s where testing comes in. The AI can find some of those mistakes and some of those problems. Humans can do that probably better, in more depth. But again, humans that are augmented with algorithmic tools and automation can do more random testing than the AI. If I’m pushing a lot of data through a certain function and I want a lot of randomized input, I don’t let the AI come up with the data. I’ll have the AI generate code that will create random data for me, because that randomization process is better than the AI’s randomization.

Alan Richardson:

Human randomization is good and varied, but it’s often hard for us to generate a lot of data; still, we’ll come up with much more wildly varying data than the AI or the randomized process. Because the randomized process is based on a schema, the AI’s randomization is based on probability, and ours is based on seeing something, getting an idea, and then putting something completely new in there. And all three in combination give you a lot of scope coverage.
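The "have the AI write the data generator, not the data" idea above might look something like this sketch: a schema-driven generator written once, producing unlimited randomized inputs. The field names and the list of nasty values are assumptions for the sketch, not from the episode.

```python
import random
import string

# Hand-picked awkward values; a human would keep extending this list with
# ideas the schema alone would never suggest.
NASTY_STRINGS = ["", " ", "a" * 1000, "δοκιμή", "<script>alert(1)</script>", "';--"]

def random_user(rng):
    # schema-driven randomization: shape is fixed, values vary wildly
    return {
        "name": "".join(rng.choice(string.printable) for _ in range(rng.randint(0, 64))),
        "age": rng.choice([-1, 0, 17, 18, 120, 2**31 - 1]),
        "tags": [rng.choice(NASTY_STRINGS) for _ in range(rng.randint(0, 5))],
    }

rng = random.Random(42)  # seeded, so any failure it provokes is reproducible
samples = [random_user(rng) for _ in range(100)]
```

Each sample would then be fed through the function under test; the seed makes any failure reproducible, which AI-generated ad hoc data would not be.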

Krisztián Fischer:

So AI is not really good for fuzzing?

Alan Richardson:

I don’t think so. But then, so many fuzzing tools are not very good for fuzzing either, because they’re not actually random. They’re sets of pooled data in text files.

Toby Sears:

I don’t know if this is a complete side tangent or if it’s relevant, but something that popped into my head is that quite recently GitHub started introducing this concept of natural language actions. And I’m wondering if that might be something in future for defining requirements, defining tests, like natural language tests. Do you think that would translate well into testing? Or, like, can I give an example? Well, for GitHub Actions it’s like: okay, pull the repo, create an image, push it to this one. It’s natural language actions.

Krisztián Fischer:

And you keep repeatedly running that natural language script, or whatever you’d call it.

Toby Sears:

Well, it’s like a GitHub action, right? And then it will run it, and it’s just natural language. You can tell it in natural language how to deploy your application.

Krisztián Fischer:

This touches on a subject that I wanted to talk about a little bit. Maybe it’s too big for one episode: how do you test that the agent does what you want? Because this is exactly a scenario where, how do you guarantee that it’s repeatedly doing the right thing, building the right image, pushing it to the right place, and not introducing new random changes? Because then the build is not repeatable.

Alan Richardson:

Yep. So people have been experimenting with this for a while, because when AI came along and it was all prompt based, the natural thing was: I’ll create a prompt, I’ll feed it into the AI, and it will do the stuff for me. It will just magically do it. So there are test tools that will do that: you give them the natural language prompt, and they will go away and explore your application to try and achieve the outcome, and they will be non deterministic. They will do it slightly differently each time. And if it’s using something like Playwright, which it probably has to, it can bypass UI issues. So you know how Playwright works.

Alan Richardson:

With Playwright, you basically say: click the submit button, where submit is the text on the button. It will go along, find a submit button, click it, and move on. But if it comes to the form, adds the details into the form, and the submit button isn’t present yet, you say click the submit button and it’ll go: yeah, I can’t find the submit button, let me just wait a little bit. All right, here’s the submit button, now I’ll click it.

Alan Richardson:

So you get a really bad user experience, but the actual flow completes. You don’t get the information about some of the issues in the flow, but your flow completes and your test passes. Which is why you want to augment that with much more deterministic checks: when we hit this page, I’m expecting the submit button to appear as soon as we’ve done these things. And if it doesn’t, then that’s a requirement that hasn’t been met, and you get that information. So you need multiple levels of coverage in order to find as many problems as possible. And then what people discovered is: wait a minute, we’re getting different results each time. Like with your GitHub action: sometimes it’s doing this call, other times it’s doing this call, followed by this call, followed by this call, to achieve the same result.

Alan Richardson:

So what people started doing was taking the natural language and putting it into a domain specific format like Cucumber, and then implementing the steps behind Cucumber. And then people went: wait, why have we even got this intermediate domain specific language? This is stupid. Let’s just go from text straight into code. Then we’ve got determinism, and we can review the code, and if we’re not happy, we can regenerate it. The code only gets regenerated if we change the prompt that comes in at the start. So there are tools that will do that, if that’s how you want to work. But if you’re a developer, you don’t need those tools, because you’re going to say, as part of the requirements: I want coverage of these things. It’s going to generate the code, you’re going to review the code, and it’s going to follow good coding patterns.

Alan Richardson:

So again, those tools exist because we have that role concept: we want to put the testing out over here, and we want our development work to be separate.
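The "text straight into code" idea can be pictured as: the natural language requirement stays as the source of truth, and the generated test below it is deterministic and reviewable, only regenerated when the wording changes. The requirement wording and the ShoppingCart stand-in are invented for this sketch; they are not any real tool's output.

```python
# Requirement (the natural language "source"):
#   "When a user adds an item and submits, the order contains that item."
# Below is the kind of deterministic, reviewable test code that requirement
# could be compiled into once; ShoppingCart is a stand-in for the real app.
class ShoppingCart:
    def __init__(self):
        self.items = []
        self.submitted = False

    def add(self, item):
        self.items.append(item)

    def submit(self):
        if not self.items:
            raise RuntimeError("cannot submit an empty cart")
        self.submitted = True
        return list(self.items)

def test_add_then_submit():
    # deterministic: same steps, same assertions, every run
    cart = ShoppingCart()
    cart.add("gpx-track")
    assert cart.submit() == ["gpx-track"]
    assert cart.submitted

test_add_then_submit()
```

Unlike an agent re-interpreting the prompt on each run, this test fails the same way every time, which is what makes it useful as regression coverage.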

Krisztián Fischer:

My natural language is Bash anyhow, so I can just express it that way. That’s very interesting. I found some testing areas that I know I would never have been able to do without AI, namely pen testing. We did an experiment with a few colleagues where we took a code base, a single service, and we gave that as the context to the AI, along with the CVEs of that code base, the known vulnerabilities. The experiment was: let’s try to actually create exploits for all the critical CVEs that the security scanning found.

Krisztián Fischer:

Because then, to the teams, we translate a hypothetical risk into an actual one. Because if I can exploit your code, I can give you the exploit: look, this is the code, this is how I’m going to hack your system, now go do something about it. So I think it also opens up whole new opportunities in the testing area. This is pen testing, right? Automated pen testing, if you wish.
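A toy version of the first step of that experiment, matching your own pinned dependencies against known-vulnerable versions, might look like this. Real scanners such as Snyk or Aikido use full CVE feeds and reachability analysis; every package name, version, and advisory below is made up for the sketch.

```python
# Map of package -> first fixed version (versions below it assumed vulnerable).
# All names and version numbers here are invented for illustration.
ADVISORIES = {
    "examplelib": (1, 4, 2),
    "otherlib": (2, 0, 0),
}

def parse(version):
    # naive "1.3.9" -> (1, 3, 9); real versioning schemes need a proper parser
    return tuple(int(part) for part in version.split("."))

def vulnerable(pins):
    # return every pinned dependency still below its fixed-in version
    return [
        (name, ver)
        for name, ver in pins.items()
        if name in ADVISORIES and parse(ver) < ADVISORIES[name]
    ]

pins = {"examplelib": "1.3.9", "otherlib": "2.1.0", "unlisted": "0.1.0"}
findings = vulnerable(pins)  # only examplelib is below its fixed-in version
```

Each finding could then be handed to the AI as context, with a request to demonstrate whether the flagged version is actually reachable and exploitable in this code base, which is the step that turns the hypothetical risk into an actual one.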

Krisztián Fischer:

This is also something that I found super interesting. I’m not an expert in this, of course, but we could actually exploit a few bugs, and sometimes a CVE turned out to be unexploitable, which is also information, right? Do you have any tips or tricks on, I don’t know what you would call this, maybe exploratory testing?

Alan Richardson:

So now I would class that as security testing. And it is interesting to me, because when AI was coming out and I was looking around at the scope of what could change, I was thinking: all right, where do I focus my learning energies? And I thought: right, I need to get back into security testing, because that’s the one thing the AI is never going to be able to do. It’s going to be able to create code, it’s going to be able to write test coverage, it’s going to be able to click on things, but it’s never going to be able to do security testing. And then the first thing they had real success with after coding was security testing. I did not expect that. And it’s not just the CVE stuff, right? The CVE stuff, like Snyk and Aikido: they’ll both scan, they’ll create tests for the vulnerabilities, they’ll give you code, they’ll give you fixes for your existing code. But that’s different from exploitation and penetration testing, which is not based on the CVEs. That red teaming, which is looking at your real domain, your APIs, and actually testing them, I thought that was safe until, what was it, six months ago? The top hacker on HackerOne for bug bounties was a bot.

Alan Richardson:

And it’s like, that’s incredible. I did not expect that to happen. So there’s massive scope here. What’s really interesting to me about what you described, Krisztián, is that you created a tool that saves you having to buy a license for Snyk or Aikido, by using the public CVE database. You don’t get the pre-disclosures, and you don’t get the stuff that is silently fixed but might be lurking in other places, but you do get the CVE coverage. Again, what this comes down to is you have to know that the concept exists. You have to know that security testing is important. You have to know that security testing is important for your particular technology and domain.

Alan Richardson:

And then just from that point the AI can guide you through that process, and can create the tools and scripts you need in order to do it. So everything has changed. You need to know the top-level meta concepts now, not the actual details.

Krisztián Fischer:

Yeah, I think, I hope, that this can help teams build more secure software. And we've been criticized in the comments before, because I think I said that if attackers can use AI to attack, I can use AI to defend. And this is exactly one of the examples that I wanted to try, because I have the advantage of having the code. I can look at the code; the attacker can't. They have to attack us on the surface, or on the supply chain, or somehow from outside. But we have the advantage of knowing what we already screwed up. We just have to find that screw-up, right? So that's what I'm trying to use AI for.

Krisztián Fischer:

So I think there's a lot of potential in security testing for this. I already see it in experimental mode in the security tooling. We're evaluating Wiz at the moment at the company that I work for, and they had it in a preview where they do exactly this: try to propose changes to fix the code based on the CVEs, evaluate whether it's actually exploitable and reachable, and a lot of very nice features, like what the impact would be if it gets breached, what resources the attackers would have access to, what databases or AWS accounts or S3 buckets and all those. That's super interesting.
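As an aside, the kind of DIY scanner described above, checking your own dependencies against public vulnerability data instead of buying a commercial license, can be sketched against the free OSV.dev query API, which aggregates CVE and GHSA advisory records. This is only a minimal illustration, not the tool from the episode; the package name in the usage comment is just an example, and a real scanner would walk your actual lockfile and handle rate limits and errors.

```python
import json
import urllib.request

# Public OSV.dev endpoint, which aggregates CVE/GHSA advisory records.
OSV_QUERY_URL = "https://api.osv.dev/v1/query"

def build_query(name: str, version: str, ecosystem: str = "PyPI") -> bytes:
    """Build the JSON payload OSV expects for one pinned dependency."""
    return json.dumps({
        "package": {"name": name, "ecosystem": ecosystem},
        "version": version,
    }).encode()

def extract_ids(response: dict) -> list[str]:
    """Pull the advisory IDs (CVE-..., GHSA-..., etc.) out of an OSV response."""
    return [vuln["id"] for vuln in response.get("vulns", [])]

def check_dependency(name: str, version: str, ecosystem: str = "PyPI") -> list[str]:
    """Query OSV for known vulnerabilities affecting one dependency version."""
    req = urllib.request.Request(
        OSV_QUERY_URL,
        data=build_query(name, version, ecosystem),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_ids(json.load(resp))

# Usage (makes a network call), e.g.:
#   check_dependency("requests", "2.19.0")
```

A real tool would iterate over every pinned dependency in a lockfile and fail the build whenever `extract_ids` returns anything, which is roughly the "public CVE coverage without the pre-disclosures" trade-off Alan describes.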

Alan Richardson:

One of the things we have to remember is that we're not domain experts in everything. Domain experts will always have the advantage using AI over someone who's not a domain expert. So it's still important to use the security scanning tools, because they're created by domain experts. They know exactly what they're doing. They have access to more technology than just your LLM, because they'll be RAG-enabled and they'll pull in information from different places at different times. But those tools may struggle to get into some teams, because some teams in lower-risk environments are prepared to just use AI themselves. The benefit is that it's a lot easier for us to train ourselves in those domains and increase our understanding of them.

Alan Richardson:

Which then helps you when you're using those bigger tools, because you understand the output a lot more. You can focus in on what's important. You can possibly work with a cheaper plan than the big plan. It evolves our role and understanding in ways we didn't necessarily expect we'd have time to do.

Krisztián Fischer:

It looks like everybody needs to be a generalist rather than a specialist. At least the specialization is not to the same depth that it used to be, because we replace that with AI. I haven't known the Java library argument lists for a long time now. I just know there's a ConcurrentHashMap I can use, right? That's a good enough level. What the methods are, I don't remember anymore. It's the same.

Krisztián Fischer:

But at the same time we have to know about more areas: testing, security, code architecture, all these things. Do you see, in the testing world... well, in software engineering in general we see a tendency of hiring fewer juniors and more seniors because of the impact of AI, or the anticipation of the impact of AI, I should say. Do you see a similar trend in the testing community?

Alan Richardson:

So the testing community is interesting in that it's very wide. There are a lot of people who are professional, specialized testers who can automate things but don't understand how the technology works. They're very good at process, very good at working with people, very good at managing things, very good at managing outsourced teams. They're very good at identifying risk, but not necessarily exploiting it. And so they're very good at working in teams. Testers are often by nature generalists, put into management positions, and able to communicate. Testing has had a tendency not to emphasize the more technical aspects and to focus more on the high-level management approaches. I don't know whether that's good long term or not. In the short term, it means that the people who have that kind of skill set should be able to take advantage of AI in the way that I've been describing it: to pull the information out of AI, to coordinate it, to manage it.

Alan Richardson:

But at some point you have to be prepared to get your hands dirty with the technicalities and go in depth to review the output from the AI. I think if people are not prepared to evolve and adapt and expand their skill sets, then they'll get stuck. But it almost doesn't matter where you start from, because you can learn the other skills you need to take advantage of it. I do think that concept of generalization is important, because I consider myself a generalist. But you still need to go in depth at certain points, or you need someone able to do that technical work, or an agent that you trust. And at this point in time, I don't trust the AI, so I would never delegate that deep knowledge work off to the agent.

Alan Richardson:

If it tells me something, I want it exploited with an example vulnerability, to show me what people could take advantage of; then I'll accept that finding. I may not trust that it's complete, that it's the only way of exploiting it. I remember I found a security bug in one system through a bug bounty, and I found three or four different ways of exploiting it. I only reported one, because the company took a long time to respond and then said, oh no, we fixed that in a previous release. And it's like, oh, did you? But you didn't fix this other way, so give me my bug bounty, please.

Alan Richardson:

That's just that attitude, like there's only one way to do something. That's never the case.

Krisztián Fischer:

Yeah, I think it’s a good takeaway. Keep learning, folks. It’s not over yet.

Alan Richardson:

Not by a long shot.

Krisztián Fischer:

All right. I think we touched on most of the topics that we wanted to touch on. Toby, we haven't given you much space. Do you have anything?

Toby Sears:

I mean, it's like when me and Xavi talk about Kubernetes for an hour. It's fine, you guys could talk about this stuff. Super interesting, though. But I do feel like I'm happy that I'm a generalist and that I like learning, because otherwise I'd be fucked right now. If you're stuck in your little bubble, that's not going to work anymore.

Krisztián Fischer:

Well, I think for us as seniors it puts a responsibility on us, because how do we teach the next generation? There has to be a next generation, right?

Alan Richardson:

There has to be. The danger is that there isn't a next generation, right? Because companies are hiring seniors, they're not hiring juniors. And when juniors are using AI, they don't know how to go into depth. So they try to single-prompt everything, get results, and go, hey, I've made something. The example is: make me a version of Tetris. It's like, great.

Alan Richardson:

Now we have like a million versions of Tetris, but no new games. So we need people to continue to explore with this, and they need to develop technical skills. Otherwise we will just have a world built on AI-generated garbage that will fail.

Toby Sears:

It’s going to be super homogenized. Everything’s the same.

Alan Richardson:

Yeah. And designers are going to have a field day, because AI generates the same design all the time. So if you've got that human creativity for UI design, you're going to be in demand.

Toby Sears:

I said it in a previous episode. I worked with a designer recently and they made a really fresh new design, and it felt exciting because it was so different to the stuff that you just see everywhere at the moment. So I do agree, I think they're going to be super in demand. But my conspiracy theory is that the AI companies are going to push subsidized code generation for long enough that we don't train enough juniors, so then we rely on the AI code generation instead of juniors, and then...

Toby Sears:

Yeah, then we're locked in, in a way.

Alan Richardson:

So it’s good for us as seniors because at some point they’ll realize, wait a minute, we don’t have enough people that can do this. But it’s bad in general. Yeah. I don’t want to be the last living COBOL expert.

Toby Sears:

Okay. Thank you, Alan, for joining us this week. It's been super interesting; I learned a lot. Please listen to the podcast multiple times, because there's a huge information density, so let it sink in over a couple of listens. If you want to check out Alan's stuff, go to eviltester.com. We really appreciate you coming on the podcast and being our first proper guest.

Alan Richardson:

No problem. Thanks for letting me talk too much. Thank you.

Toby Sears:

Thank you.

Alan Richardson:

Bye Bye.

Toby Sears:

I'll say Puss puss, which is Swedish. It's the way Swedish people say kiss kiss. It's Puss puss.

Alan Richardson:

You don’t have to do that.

Toby Sears:

Yeah, cool. Thank you, Alan.

Krisztián Fischer:

No problem.

Toby Sears:

Thank you for listening to the Tech League podcast. For more information on us, including links to socials and contact information, please check out our website at techleaguepodcast.com. We'll see you next week.