AI-Driven Software, 9 Months In
published 2025-12-30 14:03:00 -0500
Beyond the Prompt
When Cursor reported to me that I have used over 1 billion tokens this year, I figured it was time for an update on my adventures in AI-driven software development over the last 9 months. I wrote last time that while the tools are powerful, there are limits to what they can do, some of which must be overcome by changes to the Engineer's approach. This basic theme has continued, even as the tools have improved a lot. The models have improved a little - I don't think this year's major advances came from the models themselves so much as from everything around them. My hobby codebase is now relatively complex: hundreds of source files containing over 50k lines of code, committed in over 100 PRs, serving over 50 endpoints, with enough moving parts for an online multiplayer game. The game is relatively simple as games go, but it's involved enough that neither a human nor an AI can reason about it all at once. I have also advanced the game enough that difficult engineering trade-offs now mix with the prioritization of end-user UX concerns.
The main point in stressing the level of complexity is that I've had to keep evolving how I work. I think I'm now much closer to a real production codebase, one that requires me to try new things in order to get good results. I consider it the holy grail of AI-driven development to enable engineers to make massive strides in productivity on large, complex codebases, not just greenfield work. Vibe code a prototype? Absolutely - you should, and so should anyone giving you product requirements. Use AI to code? Absolutely, but complex, brownfield codebases will continue to present challenges. Even the methods I talked about last time have had to grow, and my role in managing the AI has become clearer, even as I report below on a few areas where there is much more to learn.
The New Programmer
I am one engineer working on a team of one. Much of what follows is about the individual programming task, and I have validated these observations with full-time programmers who are also exploring this space. These are recurrent observations for me:
- Your job is now primarily made up of constructing good instructions and reviewing plans and code. These have always been skills of the programmer - but now they are the main job. For me at least, this was hard to get used to. I like writing code. There has always been a sense of artistry to it for me, and more practically, it's how I know exactly what's going on. Of course, in a few months' time I have little idea what's going on with code I wrote by hand, so knowing all the details isn't really how I interact with 'legacy' code anyway. Ultimately the code is the truth, and the model can reason about it with you. A couple of additional points on these now being your primary tasks. First, they must be taken very seriously; take the time to do them thoroughly. One of the most important lessons of these past several months is that I wasn't reviewing what was written nearly well enough. Second, these are primarily the tasks of the Senior Engineer in the 'old' world. I am certain for this reason that Senior Engineers will be the most productive with AI coding; I am far less certain how newer entrants to programming are going to build their skills. The new programmer who is also new to programming is an interesting question we'll have to revisit in a different post. What I can suggest for now is that if you don't have someone senior reviewing at least occasionally, the code will end up a mess, because the AI just doesn't care about some things even when you have good rules in place. Tools do help here. I have found that asking one agent to review the work of another is a good strategy. Cursor has this built in with the 'Find Issues' routine. I have also been relying heavily on automated PR reviews; I've settled on bito.ai. I tried Codex, and it was OK but not as thorough.
- A lot of productivity comes from keeping the agents busy. I have found a comfortable level at about 3 agents going at once. I have friends telling me they're using 5 or 6 at a time. I start to mix up the context myself when I get too many going - the branches and PRs also overlap too much. Maybe that's a symptom of my single-person team and relatively small codebase. In any case, I am actively coding with one agent, working out the details of a plan with another in 'plan' mode, and switching over to do PR reviews or small updates with a third. In this way I minimize my wait time on agents and increase my overall throughput. I sometimes have a fourth going in 'ask' mode, where I interrogate the codebase in order to add details to a requirements document/prompt. The ask and plan modes are excellent tools to use often. You should only let an agent run ahead with code changes once you're convinced the requirements are clearly captured in a solid plan representing an approach you would take yourself.
- You will have to get into the code and do some things yourself. This year has seen a lot of improvement in the quality of the code produced, but certain problems continue to require my direct intervention. Complex state management across multiple components was one; a UI refactor to swap out my UI framework was another. Could I have gotten to just the right requirements and context for a prompt? Maybe. I treat this as a laboratory, and I try hard to get the AI to do all the work first. In these cases it just couldn't get there, and it was ultimately better for me to do the work myself. In a production-code/day-job situation I think I would do this even more often.
- A relatively new practice: now that the models/agents output their chain-of-thought reasoning while working out their approach to a prompt, I actively watch where they're headed and stop and redirect them if needed before the changes even start. Claude's interface lends itself naturally to working this way, but simply stopping Cursor and telling it why you stopped it and how to change course works too. This has been very helpful, and it feels somewhat more like pairing.
- AI makes non-production code extraordinarily cheap, and this has meant letting the AI create tools to help me where I never would have invested my own time. I have had it build codebase analyzers, visual query builders, admin interfaces, memory and UI debuggers, database analyzers, and visual editors. I don't spend a lot of time trying to get these right; if I think something might help me, I ask an agent for it and see what it comes up with. I then throw it away (or orphan it on a branch I don't merge). A sketch of the kind of throwaway tool I mean follows this list.
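To make the 'codebase analyzer' idea concrete, here is a minimal sketch of the sort of disposable tool I mean: a script that totals lines of code per file extension. This is a hypothetical example rather than the actual tool an agent built for me; the skipped directories and output format are assumptions.

```python
#!/usr/bin/env python3
"""Throwaway codebase analyzer: total lines of code per file extension."""
from collections import Counter
from pathlib import Path
import sys

# Directories that would only add noise to the counts (an assumption).
SKIP_DIRS = {".git", "node_modules", "dist", "build"}

def count_lines(root: Path) -> Counter:
    totals: Counter = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        if any(part in SKIP_DIRS for part in path.parts):
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file; skip it
        totals[path.suffix or "(no ext)"] += text.count("\n")
    return totals

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for ext, lines in count_lines(root).most_common():
        print(f"{ext:12} {lines:8}")
```

The script itself is beside the point; the point is that asking an agent for something like this costs minutes, so the bar for 'worth building' drops to nearly zero.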
New Process
As I said above, I have the growing conviction that the full Engineering process must be revisited. What I mean is that for the last 25ish years we have approached the software development process as an evolution of ideas from Beck, Cockburn and friends in their attempts to improve communication, tighten feedback loops, and ultimately put a framework around all things anti-waterfall. Many of the core concepts will hold across time, but what is the best method for requirements elaboration when you can sit and build a prototype as you work through them? What is the place of pair coding when I can pair with the machine? How should stories be written when the primary consumer is an AI agent? What’s the right way to break down the units of work when a single engineer can produce 3-5x the finished work of the fastest engineers you’ve worked with over the last 25 years?
To be honest, while I have some ideas about what is necessary, I don't yet have detailed, confirmed guidance to offer. I am sure lots of consultants are very busy figuring out how to build a new business on this (I know a few), and all of our practices will tighten up as we go along - it is still very early. There are a ton of good discussions about this in many other places, so I want to highlight just a few things I have leaned on most heavily.
- Working together with an agent (I like to use ChatGPT for this rather than the dev environment) on a detailed requirements document. Especially as you build up a library of documents to feed into a GPT project, you can get a pretty good early requirements session going. I use persona documents, and have trained a GPT with value proposition design information as well. It's very easy to interrogate it for details you've left out, or to simply bullet-list desired end goals and work outward from the way it fills in the gaps.
- Taking the requirements into a development plan has become easier as the tools have matured: they now primarily work off of todo lists, breaking down the work before moving forward. I have gotten better outcomes by keeping the plan short, and by not letting the current coding agent session go too far without a review.
- Given that I want a record of the different plans in the work breakdown, I like to feed the sub-sections of the implementation plan into tickets filled with the plan information, including the specific areas of the code to be impacted. This lets me pull the plan parts back up later, bring that additional data into context, and ask for a review of the plan based on how the codebase has changed while the other requirements were being built.
Measuring Success
How do you know your team-level investment in AI-driven dev is working out? For me, I would at least like to see productivity increases. There is so much software to write, and if we could get even 20% faster it would be huge. Given that the economics of using LLMs are not yet realistic, and may explode once the models are no longer underwritten as investments, that percentage may need to be much higher. Time and the growth of the technology will tell. Whatever measure you trust for the slippery notion of productivity, get a baseline and see if you can push it up; a sketch of one way to pull such a baseline from git history follows below. I personally like the level of the story or epic - the thing that tells you you've delivered value to customers. I don't have a measure for myself on this hobby project. I can only say that there is no way I would have pushed 100 PRs this year on minimal time expenditure without the AI doing most of the planning and coding. I chose this particular project because I had written it before by hand, and I didn't make it nearly as far. This was a greenfield project, making the start much faster than later additions; but knocking out features and bugs every time I sit down certainly feels like much greater productivity.
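For instance, if the unit you trust is the merged PR, here is a minimal sketch of pulling a monthly baseline out of git history. It assumes a merge-commit workflow (one merge commit per PR on the main branch); a squash-merge repo would need a different query. This is illustrative, not something from my own tooling.

```python
#!/usr/bin/env python3
"""Rough throughput baseline: merge commits per month from git history."""
from collections import Counter
import subprocess

def merges_per_month(repo: str = ".") -> Counter:
    # One output line per merge commit, formatted as YYYY-MM.
    out = subprocess.run(
        ["git", "-C", repo, "log", "--merges",
         "--pretty=%ad", "--date=format:%Y-%m"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(out.split())

if __name__ == "__main__":
    for month, count in sorted(merges_per_month().items()):
        print(f"{month}  {count}")
```

It won't settle any arguments about what productivity means, but run before and after a change in practice, it at least gives you a trend line to argue about.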
Future gains should go beyond productivity, given the broad capabilities of the tools being built. I look forward to increased tooling around CVE remediation, real-time incident resolution, legacy codebase refactoring, security reviews and remediation, and, as I said above, requirements elaboration with working prototypes. These are just a few of the ways in which the function of our Engineering teams should be making strides alongside AI.