How I Reduced My AI Coding Costs by Changing One Habit
I started noticing something uncomfortable while using Cursor.
I wasn’t just using AI tools; I was burning through tokens far faster than I expected. At first, I justified it as part of the productivity boost, but once I looked at my usage stats, it was clear something was off.
I wasn’t being efficient. I was just being careless with how I used the tools.
What was happening
After breaking down my usage, a few patterns stood out:
- I defaulted to expensive, Sonnet-level reasoning models for almost everything
- I asked for full solutions instead of structuring the problem first
- I retried prompts multiple times because the context I gave was unclear
In practice, I was using a Ferrari to do tasks that didn’t require that level of power.
What actually worked
Instead of reducing usage, I changed how I used AI in my workflow.
1. Separate thinking from execution
This was the biggest improvement.
I split my workflow into two phases:
Planning (strong model):
- Break down the problem
- Define architecture
- Evaluate trade-offs
Execution (lighter model):
- Write code
- Refactor
- Iterate on small changes
Most tasks don’t need deep reasoning; they need consistency and speed.
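To make the split concrete, here is a rough back-of-the-envelope sketch. The prices and token counts are made-up placeholders, not real provider pricing or measured usage; substitute numbers from your own usage stats.

```python
# Hypothetical prices in USD per million tokens; real prices vary by provider.
PRICE_PER_MTOK = {"strong": 15.0, "light": 1.0}


def cost(tokens_by_tier: dict[str, int]) -> float:
    """Return the total cost in USD for a mapping of tier -> tokens used."""
    return sum(
        PRICE_PER_MTOK[tier] * tokens / 1_000_000
        for tier, tokens in tokens_by_tier.items()
    )


# Illustrative task: 20k tokens of planning, 180k tokens of execution.
all_strong = cost({"strong": 200_000})
split = cost({"strong": 20_000, "light": 180_000})

print(f"everything on the strong model: ${all_strong:.2f}")
print(f"planning strong, execution light: ${split:.2f}")
```

With these placeholder numbers, most of the bill comes from execution tokens, which is the part that rarely needs the strong model.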
2. Stop re-explaining context
A big source of token waste was repeating context.
I used to:
- Paste large chunks of code
- Re-explain the same feature multiple times
- Restart prompts from scratch
Now I:
- Keep context minimal
- Only include what’s relevant
- Build incrementally on previous prompts
This alone reduced both token usage and iteration count.
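One way to picture "only include what’s relevant" is a small filter that keeps snippets mentioning the thing you are working on and stops at a token budget. This is a minimal sketch, not how Cursor selects context: the function names are hypothetical, and the four-characters-per-token estimate is only a rough heuristic.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def select_context(snippets: dict[str, str], keywords: list[str], budget: int) -> list[str]:
    """Return names of snippets that mention a keyword, best match first,
    skipping any snippet that would push the total past the token budget."""
    scored = []
    for name, text in snippets.items():
        lowered = text.lower()
        hits = sum(lowered.count(keyword.lower()) for keyword in keywords)
        if hits:
            scored.append((hits, name))
    # Most keyword hits first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    chosen, used = [], 0
    for _, name in scored:
        size = estimate_tokens(snippets[name])
        if used + size <= budget:
            chosen.append(name)
            used += size
    return chosen


snippets = {
    "api_client.py": "def fetch_user(session): return session.get('/user')",
    "ui_theme.py": "PRIMARY_COLOR = '#333'",
    "retry.py": "def with_retry(fetch): ...",
}
print(select_context(snippets, ["fetch"], budget=50))
```

The unrelated theme file never reaches the prompt, and the budget caps how much can be pasted in even when many snippets match.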
3. Ask for a plan before asking for code
Jumping straight into implementation usually led to more retries.
Instead, I started asking:
“What’s the best way to approach this?”
This led to:
- A better initial direction
- Higher-quality responses
- Fewer iterations
Spending more tokens upfront on planning actually reduced total usage.
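The arithmetic behind that is simple: planning pays for itself when it costs fewer tokens than the retries it saves. A small sketch with illustrative numbers (the attempt counts and token sizes are assumptions, not measurements):

```python
def total_tokens(attempts: int, tokens_per_attempt: int, planning_tokens: int = 0) -> int:
    """Tokens spent on a task: optional upfront planning plus every attempt."""
    return planning_tokens + attempts * tokens_per_attempt


# Illustrative numbers only: jumping straight to code takes four attempts,
# while a 3k-token planning exchange cuts that to two.
direct = total_tokens(attempts=4, tokens_per_attempt=6_000)
planned = total_tokens(attempts=2, tokens_per_attempt=6_000, planning_tokens=3_000)

print(direct, planned)
```

If planning does not reduce the number of attempts, it is pure overhead, so it matters most on tasks where the first direction is likely to be wrong.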
4. Use strong models intentionally
Not every task needs a high-end model.
I now reserve strong models for:
- Architecture decisions
- Debugging complex issues
- Understanding unfamiliar areas
Everything else goes to a cheaper model.
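The rule is simple enough to write down as a lookup. A minimal sketch, assuming you can label a task before sending it; the task categories mirror the lists in this post, and the model names are placeholders rather than real model identifiers.

```python
from enum import Enum


class Task(Enum):
    ARCHITECTURE = "architecture"
    COMPLEX_DEBUGGING = "complex_debugging"
    UNFAMILIAR_AREA = "unfamiliar_area"
    WRITE_CODE = "write_code"
    REFACTOR = "refactor"
    SMALL_CHANGE = "small_change"


# Tasks that justify the expensive model; everything else goes to the cheap one.
STRONG_TASKS = {Task.ARCHITECTURE, Task.COMPLEX_DEBUGGING, Task.UNFAMILIAR_AREA}


def pick_model(task: Task, strong: str = "strong-model", light: str = "light-model") -> str:
    """Return the model name to use for a task (names are placeholders)."""
    return strong if task in STRONG_TASKS else light


print(pick_model(Task.ARCHITECTURE))
print(pick_model(Task.REFACTOR))
```

Writing the rule down, even informally, makes the expensive model an explicit choice instead of a default.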
5. Fewer, better prompts
Poor prompts were causing unnecessary iterations.
Now I focus on:
- Clear goal
- Explicit constraints
- Minimal relevant context
Better prompts reduce retries, which directly reduces token usage.
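Those three parts fit a small template. The helper below is a hypothetical sketch of one way to keep prompts in that shape; the section labels are my own choice, not a format any tool requires.

```python
def build_prompt(goal: str, constraints: list[str], context: str = "") -> str:
    """Assemble a prompt from a clear goal, explicit constraints, and
    only the context that is relevant. Empty sections are left out."""
    parts = [f"Goal: {goal.strip()}"]
    if constraints:
        parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    if context.strip():
        parts.append("Relevant context:\n" + context.strip())
    return "\n\n".join(parts)


prompt = build_prompt(
    goal="Add retry with backoff to fetch_user",
    constraints=["No new dependencies", "Keep the public signature unchanged"],
    context="def fetch_user(session): return session.get('/user')",
)
print(prompt)
```

Filling in the constraints first is a quick check on whether the goal is clear enough to send at all.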
Where this showed up in practice
This became obvious while working on real production tasks.
Things like refactoring networking layers or setting up CI pipelines don’t require the most powerful models, but deciding how to approach them does.
Separating planning from execution made a noticeable difference in both cost and efficiency.
Takeaway
The main shift wasn’t technical; it was behavioral.
Treating AI as a tool with a cost model changed how I approached problems:
- Think before prompting
- Use the right model for the task
- Avoid unnecessary iterations
Final thoughts
Tools like Cursor are powerful, but without intentional usage, they can become expensive and inefficient.
Optimizing token usage wasn’t just about reducing cost. It made me more deliberate about how I think and build.
If you’re working on similar problems or exploring AI-assisted development, I’d be curious to hear how you’re approaching it.