How Google taught AI to doubt itself
Can you stop chatbots from making stuff up using search?
Today let’s talk about an advance in Bard, Google’s answer to ChatGPT, and how it addresses one of the most pressing problems with today’s chatbots: their tendency to make things up.
From the day that the chatbots arrived last year, their makers warned us not to trust them. The text generated by tools like ChatGPT does not draw on a database of established facts. Instead, chatbots are predictive — making probabilistic guesses about which words seem right based on the massive corpus of text that their underlying large language models were trained on.
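To make "probabilistic guesses" concrete, here is a toy sketch in Python. It is not how ChatGPT or Bard is built: real models use neural networks trained on enormous corpora, while this one just counts which word follows which in a four-sentence corpus invented for illustration. The function names are hypothetical too. What it shows is the basic move of picking the next word by probability, with nothing checking whether the result is true.

```python
import random
from collections import Counter, defaultdict

# Tiny invented corpus; a real model trains on vastly more text.
CORPUS = (
    "radiohead won a grammy . "
    "radiohead won a grammy . "
    "adele won a brit award . "
    "radiohead released an album ."
)

def build_bigram_counts(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn raw follow-counts into probabilities for the next word."""
    followers = counts[word]
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()}

def generate(counts, start, length, rng):
    """Sample one word at a time; nothing here checks facts."""
    words = [start]
    for _ in range(length):
        probs = next_word_probabilities(counts, words[-1])
        if not probs:
            break
        choices, weights = zip(*probs.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

counts = build_bigram_counts(CORPUS)
print(next_word_probabilities(counts, "a"))
print(generate(counts, "radiohead", 4, random.Random(0)))
```

In this corpus, "a" is followed by "grammy" two times out of three and by "brit" one time out of three. So the sampler can produce "radiohead won a brit award", a sentence that appears nowhere in its training text and that no step in the code ever verifies.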
As a result, chatbots are often “confidently wrong,” to use the industry’s term. And this can fool even highly educated people, as we saw this year with the case of the lawyer who submitted citations generated by ChatGPT — not realizing that every single case had been fabricated out of whole cloth.
This state of affairs explains why I find chatbots mostly useless as research assistants. They’ll tell you anything you want, often within seconds, but in most cases without citing their work. As a result, you wind up spending a lot of time researching their answers to see whether they’re true — often defeating the purpose of using them at all.
When it launched earlier this year, Google’s Bard came with a “Google It” button that submitted your query to the company’s search engine. This made it slightly faster to get a second opinion about the chatbot’s output, but still placed the burden of determining what is true and false squarely on you.
Starting today, though, Bard will do a bit more work on your behalf. After the chatbot answers one of your queries, hitting the Google button will “double check” its response. Here’s how the company explained it in a blog post:
When you click on the “G” icon, Bard will read the response and evaluate whether there is content across the web to substantiate it. When a statement can be evaluated, you can click the highlighted phrases and learn more about supporting or contradicting information found by Search.
Double-checking a query will turn many of the sentences within the response green or brown. Green-highlighted responses are linked to cited web pages; hover over one and Bard will show you the source of the information. Brown-highlighted responses indicate that Bard doesn’t know where the information came from, highlighting a likely mistake.
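Google has not published how the double-check feature works internally, so the following is only a rough sketch of the general idea: split a response into sentences, look for support in search results, and color each sentence accordingly. The function names, the word-overlap scoring, the 0.6 threshold, and the example data are all assumptions made for illustration.

```python
import re

def words(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def double_check(response, search_results, threshold=0.6):
    """Label each sentence 'green' (a source backs it) or 'brown' (none found).

    search_results is a list of (url, snippet) pairs standing in for
    whatever a real search backend would return.
    """
    labels = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        claim = words(sentence)
        best_url, best_score = None, 0.0
        for url, snippet in search_results:
            # Fraction of the sentence's words that also appear in the snippet.
            score = len(claim & words(snippet)) / len(claim) if claim else 0.0
            if score > best_score:
                best_url, best_score = url, score
        if best_score >= threshold:
            labels.append((sentence, "green", best_url))
        else:
            labels.append((sentence, "brown", None))
    return labels

# Illustrative data only; the URL is a placeholder.
results = [
    ("https://example.com/radiohead",
     "Radiohead is an English rock band formed in Abingdon in 1985."),
]
answer = "Radiohead is an English rock band. They have won nine Brit Awards."
for sentence, color, source in double_check(answer, results):
    print(color, "|", sentence, "|", source)
```

Here the first sentence is labeled green and tied to the placeholder source, and the Brit Awards sentence is labeled brown because nothing in the results backs it. Word overlap is a crude stand-in: it can flag a missing source, but it cannot tell when a source actively contradicts a claim, which a production system would need to do.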
When I double-checked Bard’s answer to my question about the history of the band Radiohead, for example, it gave me lots of green-highlighted sentences that squared with my own knowledge. But it also turned this sentence brown: “They have won numerous awards, including six Grammy Awards and nine Brit Awards.” Hovering over the words showed that Google’s search had turned up contradictory information; indeed, Radiohead has (criminally) never won a single Brit Award, much less nine of them.
“I’m going to tell you about a tragedy that happened in my life,” Jack Krawczyk, a senior director of product at Google, told me in an interview last week.
Krawczyk had cooked swordfish at home, and the resulting smell seemed to permeate the entire house. He used Bard to look up ways to get rid of it, and then double-checked the results to separate fact from fiction. It turns out that cleaning the kitchen thoroughly would not fix the problem, as the chatbot had originally stated. But placing bowls of baking soda around the house might help.
If you’re wondering why Google doesn’t double-check answers like this before showing them to you, so did I. Krawczyk told me that, given the wide variety of ways people use Bard, double-checking is frequently unnecessary. (You wouldn’t typically ask it to double-check a poem you wrote, or an email it drafted, and so on.)
And while double-checking represents a clear step forward, it does still often require you to pull up all those citations and make sure Bard is interpreting those search results correctly. At least when it comes to research, human beings are still holding the AI’s hand as much as it is holding ours.
Still, it’s a welcome development.
“We may have created the first language model that admits it has made a mistake,” Krawczyk told me. And given the stakes as these models improve, ensuring that AI models accurately confess to their mistakes ought to be a high priority for the industry.
Bard got another big update Tuesday: it can now connect to your Gmail, Docs, Drive, and a handful of other Google products, including YouTube and Maps. Extensions, as they’re called, let you search, summarize, and ask questions about documents you have stored in your Google account in real time.
For now, it’s limited to personal accounts, which dramatically limits its utility, at least for me. It is sometimes interesting as an alternative way to browse the web — it did a good job, for example, when I asked it to show me some good videos about getting started in interior design. (The fact that you can play those videos inline in the Bard answer window is a nice touch.)
But extensions get a lot of stuff wrong, too, and there’s no button to press here to improve the results. When I asked Bard to find my oldest email with a friend I’ve been exchanging messages with in Gmail for 20 years now, Bard showed me a message from 2021. When I asked it which messages in my inbox might need a prompt response, Bard suggested a piece of spam with the subject line “Hassle-free printing is possible with HP Instant Ink.”
It does better in scenarios where Google can make money. Ask it to plan an itinerary for a trip to Japan including flight and hotel information, and it will pull up a good selection of choices from which Google can take a cut of the purchase.
Eventually, I imagine that third-party extensions will come to Bard, just as they previously have to ChatGPT. (They’re called plug-ins over there.) The promise of being able to get things done on the web through a conversational interface is huge, even if the experience today is only so-so.
The question over the long term is how well AI will ultimately be able to check its own work. Today, the task of steering chatbots toward the right answer still weighs heavily on the person typing the prompt. In this moment, tools that push AIs to cite their work are greatly needed. Eventually, though, here’s hoping that more of that work falls on the tools themselves — and without us always having to ask for it.
Governing
- Microsoft’s plans for Xbox leaked onto a district court website in relation to the FTC’s case against the company. The FTC blamed Microsoft for the mistake. (Kevin Collier / CNBC)
- Possible acquisition: Microsoft Gaming CEO Phil Spencer sent an email in 2020 that showed that the company was thinking about also buying Nintendo, which would’ve been a “career moment,” he said. (Sammy Barker / Push Square)
- Revenue: Xbox Game Pass subscriptions potentially generated about $231 million in revenue just in April last year, with most users paying full price. (Derek Strickland / Tweaktown)
- Plans for 2024: New designs for the Xbox Series X console, including a more cylindrical shape, and a new Xbox controller with cloud support are on the horizon, according to the leaked documents. (Tom Warren / The Verge)
- Plans for 2028: Microsoft’s vision is to deliver “cloud hybrid games” that’ll combine hardware with the cloud. (Sean Hollister / The Verge)
- A federal judge blocked California’s Age-Appropriate Design Code Act, which outlined data protections for minors, saying the law likely violated the First Amendment. Like I said earlier this month, the First Amendment is making a comeback in the courts. (Adi Robertson / The Verge)
- 18 state attorneys general, led by Virginia, say they back Montana’s efforts to ban TikTok and asked a US judge to reject lawsuits from the company and users. (David Shepardson / Reuters)
- The UK’s Competition and Markets Authority is revisiting the Microsoft-Activision deal it once blocked, highlighting the unpredictability of a regulator who has become more prominent since Brexit. (Kate Beioley and Javier Espinoza / The Financial Times)
Industry
- TikTok is launching a new tool for creators to label their AI-generated videos, citing concerns that this content could otherwise mislead viewers. (Sarah Perez / TechCrunch)
- Elon Musk says X is “moving to having a small monthly payment for use of the X system,” purportedly to combat bots. (Lora Kolodny / CNBC)
- Amazon is working to extend its Just Walk Out technology to clothing retailers, using radio frequency identification tags to enable cashierless checkouts. (Matt Day / Bloomberg)
- Morale is low at Lab126, the division of Amazon responsible for the Kindle and Echo devices, due to mass layoffs and a reportedly weak product pipeline. (Greg Bensinger / Reuters)
- The Chan Zuckerberg Initiative is funding an enormous computing system made up of more than 1,000 H100 GPUs in the hopes of using generative AI to advance medical research. (Justine Calma / The Verge)
- Some Instagram users who pay for Meta Verified are not happy with their monthly $12 investment, saying they failed to obtain the human customer support that they paid for. (Kaya Yurieff / The Information)
Those good posts
For more good posts every day, follow Casey’s Instagram stories.
Talk to us
Send us tips, comments, questions, and AI extensions: casey@platformer.news and zoe@platformer.news.