How I use LLMs as a senior engineer on a complex project. Part 1.
The thing about my current project – it’s complex. Really complex. And critical. So critical that when it fails, you might see a blip on the NASDAQ-100. Jokes aside, AI struggles to make any meaningful contribution on its own. Believe me, I’ve tried. So simply asking it to “do the task” doesn’t work. I’ve had to come up with some creative ways to get the best out of it. Here are a few.
Use the Best Coding Model
Let’s start with the obvious: the underlying model matters – a lot. To find the best one, check out https://lmarena.ai. Go to Leaderboards → Language → Category: Coding. As of May 2025, Gemini 2.5 Pro is leading, but OpenAI o3 works great too.
If your company limits which models you can use (which is often the case due to code ownership or NDA concerns), your options may be fewer—but sometimes that actually makes things simpler.
Make sure the model supports thinking tokens—it significantly improves results. For best outcomes, use an agentic workflow, but this depends on the tool you’re using.
Context is Everything
The model needs to know everything relevant. Think of it like a human engineer lacking key details—they’d ask questions. Models don’t. They just make assumptions. And we all know how great those can be.
In general, context comes from three sources:
- The code itself
- The task definition (i.e., your prompt)
- RAG sources (retrieval-augmented generation—out of scope here)
If the model doesn’t have the information it needs, it simply won’t do a good job.
For backend systems, this might include:
- System topology and load patterns – Crucial for reasoning about optimization and data flow
- Runtime configuration parameters – Especially if they’re dynamic
- Tribal knowledge – What’s worked before, what hasn’t, how the team usually does things
- Other systems in the stack – And their constraints
So: provide all the relevant context in the prompt. Spell it out.
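To make “spell it out” concrete, here’s a minimal sketch of how such a prompt can be assembled. Every specific in it (the service, the flag, the load numbers) is an invented placeholder; the point is the shape: the task, each slice of context, then the explicit ask.

```python
# A minimal sketch of a context-rich prompt. All specifics below (service names,
# the config flag, the load numbers) are hypothetical placeholders.

TASK = "Reduce p99 latency of the order-lookup endpoint without raising memory usage."

CONTEXT = {
    "System topology and load patterns": (
        "order-service (Java, 40 pods) -> Redis cache -> Postgres primary + 2 replicas. "
        "~12k RPS at peak, 95% reads, p99 currently ~180 ms."
    ),
    "Runtime configuration": (
        "Cache TTL is a dynamic flag (order_cache_ttl_seconds, currently 30); "
        "connection pool is fixed at 64 per pod."
    ),
    "Tribal knowledge": (
        "Raising the TTL in 2023 caused stale-order bugs in refunds. "
        "The team prefers read-through caching over write-through."
    ),
    "Other systems and constraints": (
        "refund-service reads the same tables and is sensitive to replication lag."
    ),
}

def build_prompt(task: str, context: dict[str, str]) -> str:
    """Spell everything out: the task, each block of context, then the explicit ask."""
    sections = [f"## Task\n{task}"]
    sections += [f"## {title}\n{body}" for title, body in context.items()]
    sections.append(
        "## Instructions\n"
        "Propose a concrete change, list the risks, and state any assumptions you made."
    )
    return "\n\n".join(sections)

if __name__ == "__main__":
    print(build_prompt(TASK, CONTEXT))
```

The structure matters more than the wording: each heading maps to one of the context sources above, so nothing important is left for the model to assume.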
Drafting in Parallel (a.k.a. Shadow Coder)
One of my favorite ways to tackle a task is to have an LLM work in parallel with me. I’ll spend 1–5 minutes writing a quick, lightweight prompt (sometimes called “lazy prompting”), toss in some relevant context, and let the model do its thing quietly in the background.
Meanwhile, I focus on another task.
I call this approach Shadow Coder—like a silent partner coding alongside me, often unnoticed until it hands me something useful.
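For contrast with the fully spelled-out prompt above, a lazy prompt is deliberately thin. Mine look roughly like this; the task and file path are made up for illustration:

```python
# A "lazy prompt": two or three sentences plus whatever context is cheap to attach.
# The task and the file path are invented examples.
LAZY_PROMPT = """
The nightly reconciliation job times out on our biggest tenants.
Make it stream rows instead of loading the whole result set into memory.
Keep the job's public interface unchanged. The module is attached below.
"""

# I don't curate context carefully here; one or two files that seem relevant is enough.
ATTACHED_FILES = ["jobs/reconciliation.py"]
```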
Usually, within 30 minutes to 2 hours, it delivers some code. At that point, one of three things happens:
- 80% of the time: The solution is wrong, but it includes reusable code snippets or surfaces code pointers I might have overlooked.
- 15% of the time: It’s a solid first draft—missing some tests, logging, or edge-case handling, but a good foundation to build on.
- 5% of the time: It’s spot-on. I triple-check it, ask for a strict code review, and ship it.
Whatever the outcome, it rarely costs me more than 5 minutes to prompt and another 15 to review. That’s a pretty great deal.
Fixing Tech Debt While I’m Too Focused (Read: Lazy)
Sometimes I spot tech debt—like a stale config flag from five years ago—but I’m deep in another task. Normally I’d throw it on the to-do list (and forget it forever).
Now, I just take the exact same note I would’ve written in my task list and paste it into the LLM prompt instead. And surprisingly, that works.
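Here’s the kind of note I mean, pasted in verbatim; the flag and service names are invented:

```python
# The exact text I would have dumped into my to-do list, used as the prompt as-is.
# The flag and service names are hypothetical.
TECH_DEBT_NOTE = """
enable_checkout_v2 has been at 100% rollout since 2020.
Remove the flag from checkout-service, delete the dead else-branches,
and clean up the config references. Behavior must not change.
"""
```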
The beauty is that these kinds of refactors often need little more context than the code itself. Success rate for me? Around 50%.
In the Next Part…
Let me know what you’d like me to cover next. Otherwise, I’ll share:
- How I use LLMs for code reviews (without nitpicking everything)
- How I use them to ramp up quickly on unfamiliar, complex codebases and feel at home