How I Reduced My AI Coding Costs by Changing One Habit

I started noticing something uncomfortable while using Cursor.

I wasn’t just using AI tools; I was burning through tokens far faster than I expected. At first I justified it as the price of the productivity boost, but once I looked at my usage stats, it was clear something was off.

I wasn’t being efficient. I was just being careless with how I used the tools.

What was happening

After breaking down my usage, a few patterns stood out:

I defaulted to expensive, Sonnet-level reasoning models for almost everything
I asked for full solutions instead of structuring the problem first
I retried prompts multiple times due to unclear context

In practice, I was driving a Ferrari for errands that didn’t need that much power.

What actually worked

Instead of reducing usage, I changed how I used AI in my workflow.

1. Separate thinking from execution

This was the biggest improvement.

I split my workflow into two phases:

Planning (strong model):

Break down the problem
Define architecture
Evaluate trade-offs

Execution (lighter model):

Write code
Refactor
Iterate on small changes

Most tasks don’t need deep reasoning; they need consistency and speed.
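To make the cost effect of the split visible, here is a minimal Python sketch. The tier names and per-token prices are made-up placeholders, not real rates from Cursor or any model provider; plug in your own numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price, not a real rate


# Placeholder tiers, chosen only to make the arithmetic easy to follow.
STRONG = ModelTier("strong", 15.0)
LIGHT = ModelTier("light", 1.0)


def cost_usd(tokens: int, tier: ModelTier) -> float:
    """Cost of sending `tokens` through one tier."""
    return tokens / 1_000_000 * tier.usd_per_million_tokens


def single_tier_cost(planning_tokens: int, execution_tokens: int) -> float:
    """Everything goes to the strong model (the old habit)."""
    return cost_usd(planning_tokens + execution_tokens, STRONG)


def split_cost(planning_tokens: int, execution_tokens: int) -> float:
    """Planning on the strong model, execution on the lighter one."""
    return cost_usd(planning_tokens, STRONG) + cost_usd(execution_tokens, LIGHT)


if __name__ == "__main__":
    plan, execute = 20_000, 180_000  # illustrative token counts
    print(f"single tier: ${single_tier_cost(plan, execute):.2f}")
    print(f"split:       ${split_cost(plan, execute):.2f}")
```

The saving depends entirely on how much of your traffic is execution rather than planning, so measure your own split before trusting any ratio.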

2. Stop re-explaining context

A big source of token waste was repeating context.

I used to:

Paste large chunks of code
Re-explain the same feature multiple times
Restart prompts from scratch

Now I:

Keep context minimal
Only include what’s relevant
Build incrementally on previous prompts

This alone reduced both token usage and iteration count.
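One way to picture “only include what’s relevant” is a small filter that runs before anything is pasted into a prompt. This is a rough sketch, not how Cursor selects context: the function names are hypothetical, relevance is plain word overlap, and the token estimate is a crude characters-divided-by-four heuristic.

```python
import re

WORD = re.compile(r"[A-Za-z_]\w{3,}")  # identifiers and words of 4+ characters


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); real tokenizers differ.
    return max(1, len(text) // 4)


def select_context(task: str, snippets: dict[str, str], budget_tokens: int) -> list[str]:
    """Return the names of snippets that share words with the task, within a token budget."""
    task_words = set(WORD.findall(task.lower()))
    scored = []
    for name, body in snippets.items():
        overlap = len(task_words & set(WORD.findall(body.lower())))
        if overlap:  # snippets with no shared words are dropped outright
            scored.append((overlap, name))
    # Most overlapping snippets first; ties broken by name for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    chosen, used = [], 0
    for _, name in scored:
        cost = estimate_tokens(snippets[name])
        if used + cost <= budget_tokens:
            chosen.append(name)
            used += cost
    return chosen


if __name__ == "__main__":
    snippets = {
        "auth.py": "def refresh_token(session): return session.renew()",
        "ui.py": "def render_button(label): return label.upper()",
    }
    task = "Fix refresh_token so it retries when the session expired"
    print(select_context(task, snippets, budget_tokens=100))
```

Word overlap is a blunt signal, but even a blunt filter beats pasting whole files by default.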

3. Ask for a plan before asking for code

Jumping straight into implementation usually led to more retries.

Instead, I started asking:

“What’s the best way to approach this?”

This improved:

Initial direction
Quality of responses
Number of iterations needed, which went down

Spending more tokens upfront on planning actually reduced total usage.
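The trade-off can be written as a simple break-even check. This is a sketch under a simplifying assumption that every attempt costs roughly the same number of tokens; the function names and the numbers in the usage example are illustrative, not measurements from my usage.

```python
def total_tokens(tokens_per_attempt: int, attempts: int, planning_tokens: int = 0) -> int:
    """Tokens spent on an optional planning prompt plus all implementation attempts."""
    return planning_tokens + tokens_per_attempt * attempts


def planning_pays_off(
    tokens_per_attempt: int,
    attempts_without_plan: int,
    attempts_with_plan: int,
    planning_tokens: int,
) -> bool:
    """True when planning first costs fewer total tokens than jumping straight to code."""
    with_plan = total_tokens(tokens_per_attempt, attempts_with_plan, planning_tokens)
    without_plan = total_tokens(tokens_per_attempt, attempts_without_plan)
    return with_plan < without_plan


if __name__ == "__main__":
    # Illustrative numbers: the plan costs 1,500 tokens and saves two retries.
    print(planning_pays_off(4_000, attempts_without_plan=4, attempts_with_plan=2, planning_tokens=1_500))
```

In other words, a plan only has to save one retry to pay for itself whenever it costs fewer tokens than a single implementation attempt.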

4. Use strong models intentionally

Not every task needs a high-end model.

I now reserve them for:

Architecture decisions
Debugging complex issues
Understanding unfamiliar areas

Everything else goes to a cheaper model.
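Written down as a rule, the routing looks something like the sketch below. The task categories come from the lists in this post; the model identifiers are placeholders, not real model names, and nothing here reflects a Cursor API.

```python
from enum import Enum


class Task(Enum):
    ARCHITECTURE = "architecture"
    COMPLEX_DEBUGGING = "complex_debugging"
    UNFAMILIAR_AREA = "unfamiliar_area"
    WRITE_CODE = "write_code"
    REFACTOR = "refactor"
    SMALL_CHANGE = "small_change"


# Placeholder identifiers; substitute whatever your tool actually offers.
STRONG_MODEL = "strong-model"
CHEAP_MODEL = "cheap-model"

# Only these task types justify the expensive tier.
STRONG_TASKS = frozenset({Task.ARCHITECTURE, Task.COMPLEX_DEBUGGING, Task.UNFAMILIAR_AREA})


def pick_model(task: Task) -> str:
    """Default to the cheap model; opt in to the strong one for listed task types."""
    return STRONG_MODEL if task in STRONG_TASKS else CHEAP_MODEL


if __name__ == "__main__":
    for task in Task:
        print(f"{task.value}: {pick_model(task)}")
```

The design choice that matters is the default: the cheap model is the fallback, and the strong model has to be justified by the task type.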

5. Fewer, better prompts

Poor prompts were causing unnecessary iterations.

Now I focus on:

Clear goal
Explicit constraints
Minimal relevant context

Better prompts reduce retries, which directly reduces token usage.
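A tiny template can enforce those three parts. The class below is a hypothetical helper for illustration, not part of any tool; it simply refuses to render a prompt without a goal and lays out constraints and context in a fixed order.

```python
from dataclasses import dataclass, field


@dataclass
class Prompt:
    goal: str
    constraints: list[str] = field(default_factory=list)
    context: list[str] = field(default_factory=list)  # only the relevant snippets

    def render(self) -> str:
        """Build the prompt text: goal first, then constraints, then context."""
        if not self.goal.strip():
            raise ValueError("a prompt needs a clear goal")
        parts = [f"Goal: {self.goal.strip()}"]
        if self.constraints:
            parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in self.constraints))
        if self.context:
            parts.append("Context:\n" + "\n\n".join(self.context))
        return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = Prompt(
        goal="Add retry with backoff to fetch_user",
        constraints=["No new dependencies", "Keep the public signature unchanged"],
        context=["def fetch_user(user_id): ..."],
    )
    print(prompt.render())
```

Filling in the fields forces the thinking that a vague one-line prompt lets you skip.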

Where this showed up in practice

This became obvious while working on real production tasks.

Things like refactoring networking layers or setting up CI pipelines don’t require the most powerful models, but deciding how to approach them does.

Separating planning from execution made a noticeable difference in both cost and efficiency.

Takeaway

The main shift wasn’t technical; it was behavioral.

Treating AI as a tool with a cost model changed how I approached problems:

Think before prompting
Use the right model for the task
Avoid unnecessary iterations

Final thoughts

Tools like Cursor are powerful, but without intentional usage, they can become expensive and inefficient.

Optimizing token usage wasn’t just about reducing cost. It made me more deliberate about how I think and build.

If you’re working on similar problems or exploring AI-assisted development, I’d be curious to hear how you’re approaching it.
