Anyone who crammed for exams in college knows that an impressive ability to regurgitate information is not synonymous with critical thinking.
The large language models (LLMs) first publicly released in 2022 were impressive but limited—like talented students who excel at multiple-choice exams but stumble when asked to defend their logic. Today’s advanced reasoning models are more akin to seasoned graduate students who can navigate ambiguity and backtrack when necessary, carefully working through problems with a methodical approach.
As AI systems built on neural networks—architectures loosely inspired by the human brain—continue to advance, we’re witnessing an evolution in models from rote regurgitation to genuine reasoning. This capability marks a new chapter in the evolution of AI—and in what enterprises can gain from it. But to tap into this enormous potential, organizations will need to ensure they have the right infrastructure and computational resources to support the advancing technology.
The reasoning revolution
“Reasoning models are qualitatively different than earlier LLMs,” says Prabhat Ram, partner AI/HPC architect at Microsoft, noting that these models can explore different hypotheses, assess if answers are consistently correct, and adjust their approach accordingly. “They essentially create an internal representation of a decision tree based on the training data they’ve been exposed to, and explore which solution might be the best.”
This adaptive approach to problem-solving isn’t without trade-offs. Earlier LLMs delivered outputs in milliseconds based on statistical pattern-matching and probabilistic analysis. This was—and still is—efficient for many applications, but it doesn’t allow the AI sufficient time to thoroughly evaluate multiple solution paths.
In newer models, extended computation time during inference—seconds, minutes, or even longer—allows the AI to employ more sophisticated internal reinforcement learning. This opens the door for multi-step problem-solving and more nuanced decision-making.
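The idea of trading inference time for answer quality can be illustrated with a toy sketch. The function names, the random "proposer," and the scoring rule below are all illustrative assumptions, not any model's actual mechanism: the point is simply that sampling more candidate solution paths and keeping the one a verifier scores best converts extra compute into better answers.

```python
import random

def propose_step(state, rng):
    """Toy stand-in for a model proposing a next step (a random increment)."""
    return state + rng.choice([1, 2, 3])

def score(path, target):
    """Toy verifier: paths ending closer to the target score higher."""
    return -abs(path[-1] - target)

def best_of_n_search(target, n_paths=50, depth=6, seed=0):
    """Spend more inference-time compute by exploring many candidate
    solution paths and keeping the one the verifier scores best."""
    rng = random.Random(seed)
    best_path, best_score = None, float("-inf")
    for _ in range(n_paths):
        path = [0]
        for _ in range(depth):
            path.append(propose_step(path[-1], rng))
        s = score(path, target)
        if s > best_score:  # discard weaker paths, keep the best so far
            best_path, best_score = path, s
    return best_path, best_score

path, s = best_of_n_search(target=12)
print(path, s)
```

With the same random seed, raising `n_paths` can only improve (or match) the best score found, which is the basic logic behind letting a model "think longer" at inference time.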
To illustrate future use cases for reasoning-capable AI, Ram offers the example of a NASA rover sent to explore the surface of Mars. “Decisions need to be made at every moment around which path to take, what to explore, and there has to be a risk-reward trade-off. The AI has to be able to assess, ‘Am I about to jump off a cliff? Or, if I study this rock and I have a limited amount of time and budget, is this really the one that’s scientifically more worthwhile?’” Making these assessments successfully could result in groundbreaking scientific discoveries at previously unthinkable speed and scale.
Reasoning capabilities are also a milestone in the proliferation of agentic AI systems: autonomous applications that perform tasks on behalf of users, such as scheduling appointments or booking travel itineraries. “Whether you’re asking AI to make a reservation, provide a literature summary, fold a towel, or pick up a piece of rock, it needs to first be able to understand the environment—what we call perception—comprehend the instructions and then move into a planning and decision-making phase,” Ram explains.
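The perceive-then-plan-then-act pattern Ram describes can be sketched as a minimal loop. Everything here is a made-up illustration—the environment dictionary, the battery threshold, and the action names are hypothetical—meant only to show how perception, planning, and acting separate into distinct phases.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    battery: float        # remaining energy budget (hypothetical)
    obstacle_ahead: bool

def perceive(env):
    """Perception: turn raw environment state into a structured observation."""
    return Observation(battery=env["battery"], obstacle_ahead=env["obstacle"])

def plan(obs, goal):
    """Planning: choose the next action given the observation and the goal."""
    if obs.battery < 0.1:
        return "return_to_base"
    if obs.obstacle_ahead:
        return "turn"
    return "advance" if goal == "explore" else "idle"

def act(env, action):
    """Acting: apply the chosen action, spending some energy."""
    env["battery"] -= 0.05
    if action == "turn":
        env["obstacle"] = False
    return env

env = {"battery": 0.2, "obstacle": True}
for _ in range(3):  # perceive -> plan -> act loop
    action = plan(perceive(env), goal="explore")
    env = act(env, action)
    print(action)
```

The loop first turns to clear the obstacle, then advances while the energy budget holds—a crude stand-in for the risk-reward trade-offs an agentic system weighs at each step.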
Enterprise applications of reasoning-capable AI systems
The enterprise applications for reasoning-capable AI are far-reaching. In health care, reasoning AI systems could analyze patient data, medical literature, and treatment protocols to support diagnostic or treatment decisions. In scientific research, reasoning models could formulate hypotheses, design experimental protocols, and interpret complex results—potentially accelerating discoveries across fields from materials science to pharmaceuticals. In financial analysis, reasoning AI could help evaluate investment opportunities or market expansion strategies, as well as develop risk profiles or economic forecasts.
Armed with these insights, their own experience, and emotional intelligence, human doctors, researchers, and financial analysts could make more informed decisions, faster. But before setting these systems loose in the wild, safeguards and governance frameworks will need to be ironclad, particularly in high-stakes contexts like health care or autonomous vehicles.
“For a self-driving car, there are real-time decisions that need to be made vis-à-vis whether it turns the steering wheel to the left or the right, whether it hits the gas pedal or the brake—you absolutely do not want to hit a pedestrian or get into an accident,” says Ram. “Being able to reason through situations and make an ‘optimal’ decision is something that reasoning models will have to do going forward.”
The infrastructure underpinning AI reasoning
To operate optimally, reasoning models require significantly more computational resources for inference, which creates distinct scaling challenges. In particular, because the inference durations of reasoning models vary widely—from a few seconds to many minutes—balancing load across such heterogeneous requests is difficult.
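A small simulation makes the load-balancing problem concrete. This sketch compares two generic strategies—naive round-robin versus assigning each request to the worker that frees up soonest—on a mixed workload of short chats and long reasoning traces. The durations, worker count, and scheduler names are illustrative assumptions, not a description of any real serving system.

```python
import heapq

def schedule_round_robin(durations, n_workers):
    """Naive round-robin assignment, oblivious to request length.
    Returns the makespan: when the busiest worker finishes."""
    loads = [0.0] * n_workers
    for i, d in enumerate(durations):
        loads[i % n_workers] += d
    return max(loads)

def schedule_least_loaded(durations, n_workers):
    """Greedy assignment: send each request to the worker that will
    free up soonest. Returns the makespan."""
    finish = [0.0] * n_workers   # projected finish time per worker
    heapq.heapify(finish)
    for d in durations:
        t = heapq.heappop(finish)   # worker that frees up first
        heapq.heappush(finish, t + d)
    return max(finish)

# Hypothetical mix: mostly short requests plus a few long reasoning traces.
durations = [2, 3, 2, 120, 3, 2, 90, 2, 3, 2, 150, 2]
print(schedule_round_robin(durations, 4))   # long jobs pile onto one worker
print(schedule_least_loaded(durations, 4))  # long jobs spread out
```

On this workload, round-robin happens to stack all three long requests on the same worker, while the least-loaded strategy spreads them out—a toy version of why duration-aware scheduling matters when inference times span orders of magnitude.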
Overcoming these hurdles requires tight collaboration between infrastructure providers and hardware manufacturers, says Ram, speaking of Microsoft’s collaboration with NVIDIA, which brings its accelerated computing platform to Microsoft products, including Azure AI.
“When we think about Azure, and when we think about deploying systems for AI training and inference, we really have to think about the entire system as a whole,” Ram explains. “What are you going to do differently in the data center? What are you going to do about multiple data centers? How are you going to connect them?” These considerations extend into reliability challenges at all scales: from memory errors at the silicon level, to transmission errors within and across servers, thermal anomalies, and even data center-level issues like power fluctuations—all of which require sophisticated monitoring and rapid response systems.
By creating a holistic system architecture designed to handle fluctuating AI demands, Microsoft and NVIDIA’s collaboration allows companies to harness the power of reasoning models without needing to manage the underlying complexity. In addition to performance benefits, these types of collaborations allow companies to keep pace with a tech landscape evolving at breakneck speed. “Velocity is a unique challenge in this space,” says Ram. “Every three months, there is a new foundation model. The hardware is also evolving very fast—in the last four years, we’ve deployed each generation of NVIDIA GPUs and now NVIDIA GB200 NVL72. Leading the field really does require a very close collaboration between Microsoft and NVIDIA to share roadmaps, timelines, and designs on the hardware engineering side, qualifications and validation suites, issues that arise in production, and so on.”
Advancements in AI infrastructure designed specifically for reasoning and agentic models are critical for bringing reasoning-capable AI to a broader range of organizations. Without robust, accessible infrastructure, the benefits of reasoning models will remain confined to companies with massive computing resources.
Looking ahead, the evolution of reasoning-capable AI systems and the infrastructure that supports them promises even greater gains. For Ram, the frontier extends beyond enterprise applications to scientific discovery and breakthroughs that propel humanity forward: “The day when these agentic systems can power scientific research and propose new hypotheses that can lead to a Nobel Prize, I think that’s the day when we can say that this evolution is complete.”
To learn more, please read Microsoft and NVIDIA accelerate AI development and performance, watch the NVIDIA GTC AI Conference sessions on demand, and explore the topic areas of Azure AI solutions and Azure AI infrastructure.
This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.
This content was researched, designed, and written entirely by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.