Latest from MIT: Study shows vision-language models can’t handle queries with negation words

Imagine a radiologist examining a chest X-ray from a new patient. She notices the patient has swelling in the tissue but does not have an enlarged heart. Looking to speed up diagnosis, she might use a vision-language machine-learning model to search for reports from similar patients. But if the model mistakenly identifies reports with both…

Latest from MIT: MIT Department of Economics to launch James M. and Cathleen D. Stone Center on Inequality and Shaping the Future of Work

Starting in July, MIT’s Shaping the Future of Work Initiative in the Department of Economics will usher in a significant new era of research, policy, and education of the next generation of scholars, made possible by a gift from the James M. and Cathleen D. Stone Foundation. In recognition of the gift and the expansion…

Latest from MIT Tech Review – Police tech can sidestep facial recognition bans now

Six months ago I attended the largest gathering of chiefs of police in the US to see how they’re using AI. I found some big developments, like officers getting AI to write their police reports. Today, I published a new story that shows just how far AI for police has developed since then. It’s about…

Latest from MIT Tech Review – How a new type of AI is helping police skirt facial recognition bans

Police and federal agencies have found a controversial new way to skirt the growing patchwork of laws that curb how they use facial recognition: an AI model that can track people using attributes like body size, gender, hair color and style, clothing, and accessories. The tool, called Track and built by the video analytics company…

Latest from MIT Tech Review – A new AI translation system for headphones clones multiple voices simultaneously

Imagine going for dinner with a group of friends who switch in and out of different languages you don’t speak, but still being able to understand what they’re saying. This scenario is the inspiration for a new AI headphone system that translates the speech of multiple speakers simultaneously, in real time. The system, called Spatial…

Latest from MIT Tech Review – How to build a better AI benchmark

It’s not easy being one of Silicon Valley’s favorite benchmarks. SWE-Bench (pronounced “swee bench”) launched in November 2024 to evaluate an AI model’s coding skill, using more than 2,000 real-world programming problems pulled from the public GitHub repositories of 12 different Python-based projects. In the months since then, it’s quickly become one of the most…

O’Reilly Media – Think Different

There’s something that bothers me about the chatter that AI is making “intelligence” ubiquitous. For example, in a recent Bloomberg article, “AI Will Upend a Basic Assumption About How Companies Are Organized,” Azeem Azhar wrote: As intelligence becomes cheaper and faster, the basic assumption underpinning our institutions—that human insight is scarce and expensive—no longer holds….

O’Reilly Media – Escaping POC Purgatory: Evaluation-Driven Development for AI Systems

Let’s be real: Building LLM applications today feels like purgatory. Someone hacks together a quick demo with ChatGPT and LlamaIndex. Leadership gets excited. “We can answer any question about our docs!” But then…reality hits. The system is inconsistent, slow, hallucinating—and that amazing demo starts collecting digital dust. We call this “POC purgatory”—that frustrating limbo where…

O’Reilly Media – AI and the Structure of Scientific Revolutions

Thomas Wolf’s blog post “The Einstein AI Model” is a must-read. He contrasts his thinking about what we need from AI with another must-read, Dario Amodei’s “Machines of Loving Grace.” Wolf’s argument is that our most advanced language models aren’t creating anything new; they’re just combining old ideas, old phrases, old words according to probabilistic…