NashTech Blog

Azure OpenAI Service Models Part 2 – Reasoning & Problem-Solving Models

Table of Contents

I have introduced Azure OpenAI models and gone into detail about Language Models in a previous Blog. You can find it here:

Now, let’s look at another type of model: Reasoning & Pronlem-Solving Models!

Focused on tasks like complex logic, math, code reasoning, and structured decision-making. Some example models:

  • o4-mini (latest at present, part of the GPT-4o family)
  • o3, o3-mini
  • o1, o1-mini, o1-preview
  • Trained for structured logic, problem-solving, and math-heavy tasks
  • Ideal for use in apps where speed and reasoning matter more than long, creative outputs
  • Excellent for tools like retrieval-augmented generation (RAG) and code analysis

Now, let’s go through some examples to see more clearly the strengths of these models as well as the differences between them!

If a train leaves at 8:00 AM and travels at 60 km/h, and another leaves the same station at 9:00 AM at 90 km/h, when will the second train catch up?

o4-mini

At 9:00 AM, the first train has a 60 km head start.
Relative speed = 90 – 60 = 30 km/h.
Time to catch up = 60 / 30 = 2 hours.
Answer: 11:00 AM

o3-mini

60 km head start. Catch-up rate = 30 km/h → Time = 2 hrs → 11:00 AM

o1

2 hours to catch up. So, at 11.

o1-mini

Maybe 10 or 11? (Less reliable)

o4-minio3-minio1o1-mini
EfficiencyPerfect reasoning, clean stepsCorrect and well-structuredCorrect, but terseOften less confident/clear

Write a Python function that takes a list of integers and returns a new list containing only the prime numbers, in ascending order.

o4-mini (GPT-4o)

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n**0.5)+1):
        if n % i == 0:
            return False
    return True

def filter_primes(numbers):
    return sorted([n for n in numbers if is_prime(n)])

03-mini

def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5)+1):
        if n % i == 0:
            return False
    return True

def get_primes(lst):
    primes = []
    for num in lst:
        if is_prime(num):
            primes.append(num)
    return sorted(primes)

o1

def prime_filter(nums):
    def is_prime(n):
        if n < 2:
            return False
        for i in range(2, n):
            if n % i == 0:
                return False
        return True
    return sorted([n for n in nums if is_prime(n)])

o1-mini

def get_primes(lst):
    def prime(n):
        if n < 2:
            return False
        for i in range(2, n):
            if n % i == 0:
                return False
        return True
    primes = []
    for x in lst:
        if prime(x):
            primes.append(x)
    return sorted(primes)
o4-minio3-minio1o1-mini
EfficiencyCorrect, optimal, readable, clean.Correct, slightly more verbose.Correct, but uses less efficient range(2, n).Works, but less optimal and lacks explanation.

The following function is supposed to return the longest word in a sentence. Fix the bug:

def longest_word(sentence):
    words = sentence.split()
    longest = ""
    for word in words:
        if len(word) < len(longest):
            longest = word
    return longest

Problem

The bug is in the comparison: it uses < instead of >.

Test Case

print(longest_word("The elephant is big"))  

Expected Output

Output: “elephant” (when fixed)

o4-mini

The logic is reversed. It should update longest when a word is longer, not shorter.

✅ Fixed Code

def longest_word(sentence):
    words = sentence.split()
    longest = ""
    for word in words:
        if len(word) > len(longest):
            longest = word
    return longest

=> Now it correctly finds the longest word in the sentence

o3-mini

The comparison is flipped. Use > instead of <.

✅ Fixes and explains

def longest_word(sentence):
    words = sentence.split()
    longest = ""
    for word in words:
        if len(word) > len(longest):
            longest = word
    return longest

=> Reasoning is solid and efficient

o1

Use > not <. Fixed:

def longest_word(sentence):
    words = sentence.split()
    longest = ""
    for word in words:
        if len(word) > len(longest):
            longest = word
    return longest

=> Correct, but minimal explanation.

o1-mini

Maybe use >?
(No explanation, often uncertain)

def longest_word(sentence):
    words = sentence.split()
    longest = ""
    for word in words:
        if len(word) > len(longest):
            longest = word
    return longest

=> Works, but doesn’t justify change or confirm confidence

ModelBug DetectionExplanationCode Fix QualityConfidence
o4-mini✅ Accurate✅ Clear✅ Clean + readable⭐⭐⭐⭐⭐
o3-mini✅ Accurate✅ Brief✅ Effective⭐⭐⭐⭐
o1✅ Accurate⚠️ Minimal✅ Works⭐⭐⭐
o1-mini⚠️ Tentative❌ Poor✅ Lucky fix⭐⭐
  • o4-mini is much better at mixed tasks (e.g., logic + code + explanations), and performs close to GPT-4, but is not as cheap as o1/o3.
  • o3-mini is the most reasoning-optimized of the o-Series and does well on math puzzles, logic chains, and agent planning.
  • o1/o1-mini are ideal for low-cost or lightweight applications, but they’re not as precise for complex logic.
o4-minio3-minio1o1-mini
Best ForGeneral LLM + Reasoning + Vision/AudioPure logic/math/RAG/code agentsLightweight reasoning agentsBudget logic with fast response
Reasoning Depth⭐⭐⭐⭐⭐
Very strong (GPT-4-level)
⭐⭐⭐⭐
Strongest of o-Series
⭐⭐⭐
Moderate
⭐⭐
Basic
Speed⭐⭐⭐⭐
Fast (on par with GPT-3.5)
⭐⭐⭐⭐⭐
Very fast
⭐⭐⭐⭐⭐
Very fast
⭐⭐⭐⭐⭐⭐
Extremely fast
Cost (in tokens)💲💲💲
Higher than o-Series
💲💲
Low
💲
Very low
💲
Extremely low
Model TypeMultimodal / General Purpose LLMSpecialized Reasoning ModelLightweight Reasoning ModelMini Reasoning Model
Modality SupportText, Vision, (Audio soon)Text onlyText onlyText only

Picture of mydinhletra1

mydinhletra1

Leave a Comment

Your email address will not be published. Required fields are marked *

Suggested Article

Scroll to Top