It’s been well publicized that Google’s Bard made some factual errors when it was demoed, and Google paid for those mistakes with a significant drop in its stock price. What received less news coverage (though in the last few days it’s been widely discussed online) are the many mistakes made by the chatbot behind Microsoft’s new Bing search, whose internal codename is Sydney. The fact that we know the name Sydney at all is one of those mistakes, since the chatbot is never supposed to reveal it. Sydney-enhanced Bing has threatened and insulted its users, in addition to being just plain wrong (insisting that it was 2022, and insisting that the first Avatar movie hadn’t been released yet). There are excellent summaries of these failures in Ben Thompson’s newsletter Stratechery and on Simon Willison’s blog. It might be easy to dismiss these stories as anecdotal at best, fraudulent at worst, but I’ve seen many reports from beta testers who managed to duplicate them.
Of course, Bard and Sydney are beta releases that aren’t open to the wider public yet, so it’s not surprising that they get things wrong. That’s what beta tests are for. The important question is where we go from here. What are the next steps?
Large language models like ChatGPT and Google’s LaMDA aren’t designed to give correct results. They’re designed to simulate human language, and they’re incredibly good at that. Because they’re so good at simulating human language, we’re predisposed to find them convincing, particularly when they word an answer so that it sounds authoritative. But an authoritative tone doesn’t make 2+2 equal 5. Remember that these tools aren’t doing math; they’re doing statistics over a huge body of text. So if people have written 2+2=5 (and they have, in many places, probably never intending it to be taken as correct arithmetic), there’s a non-zero probability that the model will tell you that 2+2=5.
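To make the “statistics, not arithmetic” point concrete, here’s a toy sketch in Python. It is purely illustrative—no real model looks up literal strings; they learn statistical patterns with neural networks trained on vastly more text—but it shows why a wrong answer that appears in the training data can come back out of the model with some small probability.

```python
import random
from collections import Counter

# A toy "training corpus." Real models learn from hundreds of billions of
# tokens, but the principle is the same: continuations come from the text.
corpus = [
    "2+2=4", "2+2=4", "2+2=4", "2+2=4", "2+2=4",
    "2+2=4", "2+2=4", "2+2=4", "2+2=4",
    "2+2=5",  # jokes, typos, Orwell references...
]

def continuation_counts(prompt: str) -> Counter:
    """Count what followed the prompt in the corpus."""
    return Counter(line[len(prompt):] for line in corpus if line.startswith(prompt))

def sample_continuation(prompt: str) -> str:
    """Sample a continuation in proportion to how often it appeared."""
    counts = continuation_counts(prompt)
    completions, weights = zip(*counts.items())
    return prompt + random.choices(completions, weights=weights, k=1)[0]

print(continuation_counts("2+2="))   # Counter({'4': 9, '5': 1})
print(sample_continuation("2+2="))   # usually "2+2=4", occasionally "2+2=5"
```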
The ability of these models to “make up” stuff is interesting, and as I’ve suggested elsewhere, might give us a glimpse of artificial imagination. (Ben Thompson ends his article by saying that Sydney doesn’t feel like a search engine; it feels like something completely different, something we might not be ready for, perhaps what David Bowie meant in 1999 when he called the Internet an “alien lifeform.”) But if we want a search engine, we will need something better behaved. Again, it’s important to realize that ChatGPT and LaMDA aren’t trained to be correct. You can train models that are optimized to be correct, but that’s a different kind of model. Models like that are being built now; they tend to be smaller and trained on specialized data sets (O’Reilly Media has a search engine that has been trained on the 70,000+ items in our learning platform). And you could integrate those models with GPT-style language models, so that one group of models supplies the facts and the other supplies the language.
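Here’s a minimal sketch of what that integration could look like, in the pattern often called retrieval-augmented generation. Everything in it is hypothetical: the `fact_store`, the keyword-overlap `retrieve`, and the stubbed `call_language_model` stand in for a curated index, a proper retriever, and a real model API. The point is only the division of labor the paragraph describes: one component supplies the facts, the language model supplies the wording.

```python
# Hypothetical fact store: a curated set of statements known to be correct.
fact_store = {
    "avatar-release": "Avatar (2009) was released on December 18, 2009.",
    "current-year": "The current year is 2023.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank stored facts by crude keyword overlap with the query.
    A real system would use an embedding index instead."""
    query_words = set(query.lower().split())
    scored = sorted(
        fact_store.values(),
        key=lambda fact: len(query_words & set(fact.lower().split())),
        reverse=True,
    )
    return scored[:k]

def call_language_model(prompt: str) -> str:
    """Stub standing in for a GPT-style model that supplies the phrasing."""
    return f"[A language model would phrase an answer grounded in:\n{prompt}]"

def answer(question: str) -> str:
    """Retrieve facts first, then ask the language model only to word the answer."""
    facts = retrieve(question)
    prompt = (
        "Answer the question using only the facts below. "
        "If the facts don't cover it, say you don't know.\n\n"
        "Facts:\n" + "\n".join(f"- {fact}" for fact in facts) +
        f"\n\nQuestion: {question}\n"
    )
    return call_language_model(prompt)

print(answer("When was the first Avatar movie released?"))
```

The key design choice is that the language model is never asked to recall facts on its own; it’s only asked to put facts it was handed into fluent language.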
That’s the most likely way forward. Given the number of startups that are building specialized fact-based models, it’s inconceivable that Google and Microsoft aren’t doing similar research. If they aren’t, they’ve seriously misunderstood the problem. It’s okay for a search engine to give you irrelevant or incorrect results. We see that with Amazon recommendations all the time, and it’s probably a good thing, at least for our bank accounts. But it’s not okay for a search engine to try to convince you that incorrect results are correct, or to abuse you for challenging it. Will it take weeks, months, or years to iron out the problems with Microsoft’s and Google’s beta tests? The answer is: we don’t know. As Simon Willison suggests, the field is moving very fast and can make surprising leaps forward. But the path ahead isn’t short.