How thinking models work

Arash Tabrizian · 3 min read

I took a short "Reasoning with O1" course on deeplearning.ai. Here's what I learned: thinking models, like humans, pause and consider different approaches before answering.

What Makes Thinking Models Different?

Here's the thing about traditional language models like GPT-4 or Claude: they predict the most probable next token and commit to it immediately. Talking to them is like having a conversation with someone who never pauses to collect their thoughts - they just start talking and figure it out as they go.

Thinking models, like OpenAI's o1, work differently. They spend time reasoning internally before they show you anything. While you're waiting for a response, the model is exploring different solution paths, checking its own work, and sometimes even backtracking when it realizes an approach won't work.

Think of it this way: traditional models are like that brilliant colleague who answers immediately in meetings. Thinking models are like the equally brilliant colleague who says "let me think about that for a moment" and then gives you a more thoughtful response.

How Do They Actually Work?

Under the hood, thinking models use something called chain-of-thought reasoning, but they do it invisibly. Here's what's happening:

Extended reasoning: The model has learned through reinforcement learning that spending more time on complex problems leads to better outcomes. It's not just generating text - it's genuinely working through the problem.

Hidden thought process: Unlike a regular model that you have to prompt to "think step by step," thinking models do this automatically. You don't see the thinking tokens being generated, just the final result.

Self-correction: The model can catch its own mistakes mid-thought and try different approaches. If it realizes it's heading down the wrong path, it can backtrack.
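This isn't how o1's internals are literally implemented (its reasoning traces are hidden), but the propose-verify-backtrack pattern behind self-correction can be sketched as a toy loop. The function and verifier names here are my own illustration, not anything from the course:

```python
def solve_with_backtracking(candidates, verify):
    """Try candidate approaches in order; abandon any that fail verification.

    Toy stand-in for a thinking model's loop: propose an approach,
    check it, and backtrack to the next one if the check fails.
    """
    trace = []
    for candidate in candidates:
        if verify(candidate):
            trace.append(f"verified {candidate}")
            return candidate, trace
        trace.append(f"backtracked from {candidate}")
    return None, trace

# Example: find x with x^2 == 49, discarding wrong guesses along the way.
answer, trace = solve_with_backtracking([5, 6, 7], lambda x: x * x == 49)
```

The `trace` list plays the role of the hidden reasoning tokens: the dead ends are recorded, but only `answer` is what a user would ever see.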

The trade-off: You're exchanging speed for quality. A simple question might take a few extra seconds. A complex problem might take 30 seconds or more. But for hard problems, that investment pays off dramatically.

Prompting: Keep It Simple

One of the best parts about thinking models is that you can drop a lot of the prompt engineering tricks you've learned. You can - and should - drop the step-by-step walkthrough guidance in favor of simpler prompts, and let the model figure out its own approach.

The key: Be comprehensive in your initial prompt and include all the requirements, constraints, and edge cases you're worried about. The model will use that information during its reasoning process.
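As a sketch of that advice, here's a small helper that folds the task plus every requirement and edge case into one flat prompt - no step-by-step scaffolding. The helper and its formatting are my own illustration, not an official recommendation:

```python
def build_prompt(task, requirements):
    """Build one comprehensive prompt: state the goal and every
    constraint up front, then let the model plan its own steps."""
    lines = [task, "", "Requirements:"]
    lines.extend(f"- {req}" for req in requirements)
    return "\n".join(lines)

prompt = build_prompt(
    "Write a Python function dedupe(items) that removes duplicates.",
    [
        "preserve the original order",
        "treat strings case-insensitively",
        "handle an empty list",
        "do not modify the input list",
    ],
)
```

Notice what's absent: no "first do X, then do Y" walkthrough. Everything you'd have spelled out as steps is stated once as a constraint and left for the model to reason through.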

Real Benefits for Your Code

Better first drafts: Thinking models are much more likely to produce code that actually works on the first try. They've already reasoned through the edge cases and potential issues.

Handling complexity naturally: They're better at tasks that need to be broken down into multiple steps, planning the decomposition themselves instead of needing you to spell it out.

Genuine debugging: When you're stuck on a bug, a thinking model doesn't just throw solutions at the wall. It thinks through your code.

When NOT to Use Thinking Models

Skip them for simple tasks, speed-critical situations, and when you need to save cost. The extra reasoning buys you nothing on questions a standard model already answers correctly in a fraction of the time.
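A crude way to act on this in code is a router that only escalates to a thinking model when a request looks hard enough to be worth the latency and cost. The model names and keyword heuristic below are placeholders for illustration, not a tested policy:

```python
def pick_model(prompt, latency_sensitive=False):
    """Crude router: reserve the thinking model for genuinely hard prompts.

    "fast-model" / "thinking-model" are placeholder names; substitute
    whatever models your provider actually offers.
    """
    hard_signals = ("prove", "debug", "optimize", "multi-step", "plan")
    looks_hard = len(prompt) > 500 or any(s in prompt.lower() for s in hard_signals)
    if latency_sensitive or not looks_hard:
        return "fast-model"      # simple or speed-critical: answer instantly, cheaply
    return "thinking-model"      # complex reasoning: accept the slower, pricier call
```

In practice you'd tune the heuristic (or let a cheap classifier decide), but the principle stands: default to the fast model and escalate only when the problem justifies it.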

Conclusion

With thinking models you pay extra for the thinking tokens, but it's worth it because they can drastically improve the quality of the output tokens.
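To see what "paying for thinking tokens" means concretely: on OpenAI's pricing, reasoning tokens are billed as output tokens even though you never see them. A back-of-the-envelope sketch, with made-up prices - check your provider's current rate card before trusting any numbers:

```python
def completion_cost(visible_output_tokens, reasoning_tokens, price_per_1k_output):
    """Reasoning tokens are billed as output tokens, so the effective
    cost per *visible* token is higher than the list price suggests."""
    billed_tokens = visible_output_tokens + reasoning_tokens
    return billed_tokens / 1000 * price_per_1k_output

# Illustrative only: 500 visible tokens, 4,000 hidden reasoning tokens,
# at a hypothetical $0.06 per 1K output tokens.
cost = completion_cost(500, 4000, 0.06)
```

Here the hidden reasoning accounts for nearly 90% of the bill for that response - which is exactly the premium you're paying for the better answer.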
