Assisting development with AI tools can be quite a divisive topic. Some people feel they're going to replace developers entirely, some feel they can't produce good enough code to be useful at all, and a lot of people fall somewhere in the middle. Given the interest in these types of tools over the last few years, we spoke with Phillip Carter, principal product manager at Honeycomb, on the latest episode of our podcast to get his thoughts on them.
He believes that overall these tools can be beneficial, but only if you can narrow down your use case, have the right level of expertise to verify the output, and set realistic expectations for what they can do for you.
The following is an abridged version of the conversation.
SD Times: Do you believe that these AI tools are good or bad for development teams?
Phillip Carter: I would say I lean towards good and trending better over time. It depends on a couple of different factors. I think the first factor is seniority. The tools that we have today are sort of like the worst versions of these tools that we're going to be using in the next decade or so. It's kind of like when cloud services came out around 2010, 2011, and there were clear advantages to using them. But for a lot of use cases, these services were just not actually solving a lot of problems that people had. And so over a number of years, there was a lot of "hey, this might be really helpful," and they eventually sort of lived up to those aspirations. But it wasn't there at that point in time.
I think for aiding developers, these AI models are kind of at that point right now, where there are some more targeted use cases where they do quite well, and then many other use cases where they don't do very well at all, and they can be actively misleading. And so what you do about that depends very heavily on what kind of developer you are, right? If you're fresh out of college, or you're still learning how to program and you're not really an expert in software development, the misleading nature of these tools can be quite harmful, because you don't really have a whole lot of experience and sort of a gut feel for what's right or wrong to compare that against. Whereas if you are a more senior engineer, you can say, okay, well, I've kind of seen this shape of problem before, and this code that it spat out looks like it's mostly right.
And there are all sorts of ways to use it, such as creating a few tests and making sure those tests are good, and it is a time saver in that regard. But if you don't have that sense of, okay, this is how I'm going to verify that it's actually correct, this is how I'm going to compare what I see with what I have seen in the past, then that can be really difficult. And we have seen cases where some junior engineers in particular have struggled with actually solving problems, because they sort of try it and it doesn't quite do it, they try it again, it doesn't quite do it, and they spend more time doing that than just sitting down and thinking through the problem.
One of the more junior engineers at our company leaned on these tools at first, realized that they were a little bit misleading, and stepped away to build up some of their own expertise. And then they actually came back to using some of those tools, because they found that they were still useful, and now that they had more of an instinct for what was good and bad, they could actually use them a little bit more.
It's great when you know how to use it, and you know how to compare its output against things that you know are good or bad. But if you don't, then you've basically added more chaos into the system than there should have been.
SDT: At what point in their career would a developer be experienced enough to use these tools effectively?
PC: The most obvious example that comes to mind for me is writing test cases. There's this understanding that that's a domain you can apply this to even when you're a little bit more junior in your career. Stuff is going to either pass or fail, and you can take a look at that and ask: should this have passed, or should this have failed? It's a very clear signal.
Whereas if you're using it to edit more sophisticated code inside of your code base, it's like, well, I'm not really sure if this is doing the right thing, especially if I don't have a good test harness that validates that it should be doing the right thing. And that's where that seniority and just more life experience building software really comes into play, because you can have that sense as you're building it, and you don't need to fall back on having a robust test suite that checks whether you're doing the right thing.
The other thing that I'll say is that I have observed several junior engineers thrive with these tools quite a bit. Because it's not really about being junior; it's just that some engineers are better at reading and understanding code than they are at writing it. Or maybe they're good at both, but their superpower is looking at code and analyzing it, and seeing if it's going to do the job that it should do. And this really pushes the bottleneck in that direction. Because if you imagine for a moment, let's say these tools were perfect at generating code. Well, now the bottleneck is entirely on understanding that code; it really has nothing to do with writing the code itself. And a lot of people more junior in their career can thrive in that environment, if the writing of the code is more of a bottleneck for them. If they're really good at understanding stuff and reading it, then they can say, this thing actually does do things faster. And they can almost use it to generate different variations of things, read through the output, and see if it actually does what it should be doing.
And so I don't know if this is necessarily something that is universal across all engineers and junior engineers, but if you have that mindset where you're really good at reading and understanding code, you can actually use these tools to a significant advantage today, and I suspect that will get better over time.
SDT: So even for more senior developers (or junior devs who have a special skill at reading and understanding code), are there ways in which these tools could be overused in a negative way? What best practices should teams put in place to make sure they're not relying too heavily on these AI tools?
PC: So there are a couple of things that can happen. I've done this before, and I've had other people on the team do this as well, where they've used it and cycled through the suggestions and so on, and then they've been like, wait a minute, this would have been faster if I had just written this myself. That does happen from time to time. It actually doesn't happen that often, but it can.
And there are some cases where the code that you need to write is, for whatever reason, too complicated for the model. It may not necessarily be conceptually complicated code; it might just be something that the model right now is not particularly good at. And so if you recognize that it's outputting something where you're scratching your head and going, I don't really agree with that suggestion, that's usually a pretty good signal that you should not be relying on it too heavily at this moment in time.
There's the ChatGPT model, where you say you want something, it outputs a whole block of code, and you copy + paste it or do something with it. That's one model. The other model that I think is more effective, that people lean on more, and that, frankly, is more helpful, is the completions model, where you're actually still writing the code, but on more of a line-by-line basis it makes a suggestion. Sometimes that suggestion is bonkers, but usually it's actually pretty good. And you're still a little bit more in control, and you're not just blindly copy + pasting large blocks of code without ever reading them.
And so I think in terms of tool selection, the tools that are deeply ingrained in you actually writing the code are going to lead to a lot more actual understanding of what's going on, compared to the tools that just output whole big blocks of code that you copy + paste and sort of hope will work. I think organizations should focus on the former, rather than the AI coding tools that barely even work. Maybe those will get better over time, but that's definitely not something organizations should really depend on.
There's another model of working with these tools that's being developed right now by GitHub that I think could show promise. It's through their product called GitHub Copilot Workspace. Basically, you start with a natural language task, and then it produces an interpretation of that task in natural language. And it asks you to validate: "hey, is this the right interpretation of what I should be doing?" And then you can add more steps and more sub-interpretations and edit it. Then it takes the next step and generates a specification of work. And then you say, okay, do I agree with this specification of work or not? And you can't really continue unless you either modify it or say, "yes, this looks good." And then it says, "Okay, I've analyzed your codebase, and these are the files that I want to touch. Are these the right places to look? Am I missing something?" At every step of the way, you intervene, and you have this opportunity to disagree with it and ask it to generate something new. And eventually it outputs a block of code as a diff. So it'll say, "hey, this is what we think the changes should be."
What I love about that model, which I have used in practice and which does work, is that it really says software development is not just about code. It's about understanding tasks. It's about interpreting things. It's about revising plans. It's about creating a formal spec of things. Sometimes it's about understanding where you need to work.
Because if I'm being honest, I don't think these automated agents are going to go anywhere anytime soon, because the space that they're trying to operate in is so complicated. They might have a place for tiny tasks that people today shunt off to places like Upwork, but for replacing teams of engineers actually solving real business problems that are complicated and nuanced, I just don't see it. And so I feel like it's almost a distraction to focus on that. The AI-powered stuff can really be helpful, but it has to be centered on keeping your development team engaged the entire time, and letting them use their brains to really drive this stuff effectively.
SDT: Any final thoughts or takeaways from this episode?
PC: I would say that the tools are not magic; do not believe the hype. The marketing is way overblown for what these things can do. But when you get past all that, and especially if you narrow your tasks to very concrete, small things, these tools can actually be wonderful for helping you save time and sometimes even consider approaches that you may not have considered in the past. And so focus on that, cut through the hype, and just see it as a good tool. And if it's not a good tool for you, discard it, because it's not going to be helpful. That's probably how I would advise anyone, in any capacity, to frame up these things.