Latest from MIT Tech Review – A new AI translation system for headphones clones multiple voices simultaneously

Imagine going for dinner with a group of friends who switch in and out of different languages you don’t speak, but still being able to understand what they’re saying. This scenario is the inspiration for a new AI headphone system that translates the speech of multiple speakers simultaneously, in real time. The system, called Spatial…

Latest from MIT Tech Review – How to build a better AI benchmark

It’s not easy being one of Silicon Valley’s favorite benchmarks. SWE-Bench (pronounced “swee bench”) launched in November 2024 to evaluate an AI model’s coding skill, using more than 2,000 real-world programming problems pulled from the public GitHub repositories of 12 different Python-based projects. In the months since then, it’s quickly become one of the most…

O’Reilly Media – Think Different

There’s something that bothers me about the chatter that AI is making “intelligence” ubiquitous. For example, in a recent Bloomberg article, “AI Will Upend a Basic Assumption About How Companies Are Organized,” Azeem Azhar wrote: As intelligence becomes cheaper and faster, the basic assumption underpinning our institutions—that human insight is scarce and expensive—no longer holds….

O’Reilly Media – Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

Let’s be real: Building LLM applications today feels like purgatory. Someone hacks together a quick demo with ChatGPT and LlamaIndex. Leadership gets excited. “We can answer any question about our docs!” But then…reality hits. The system is inconsistent, slow, hallucinating—and that amazing demo starts collecting digital dust. We call this “POC purgatory”—that frustrating limbo where…

O’Reilly Media – AI and the Structure of Scientific Revolutions

Thomas Wolf’s blog post “The Einstein AI Model” is a must-read. He contrasts his thinking about what we need from AI with another must-read, Dario Amodei’s “Machines of Loving Grace.” Wolf’s argument is that our most advanced language models aren’t creating anything new; they’re just combining old ideas, old phrases, old words according to probabilistic…

O’Reilly Media – A Field Guide to Rapidly Improving AI Products

Most AI teams focus on the wrong things. Here’s a common scene from my consulting work:

AI TEAM: Here’s our agent architecture—we’ve got RAG here, a router there, and we’re using this new framework for…

ME: [Holding up my hand to pause the enthusiastic tech lead] Can you show me how you’re measuring if any of this actually…