ChatGPT's deep research might be the first good agent
OpenAI's new research tool still makes mistakes — but in its speed and average quality of analysis, it represents a remarkable step forward
This is a column about AI. My boyfriend works at Anthropic. See my full ethics disclosure here.
It feels odd to be writing a newsletter about new technology on a day like this. Elon Musk and his team of henchmen are working to dismantle the federal government with stunning speed. As many observers have noted, Musk’s initiative closely resembles his ruinous takeover of Twitter. And he’s now having his team remove X posts that name the people in charge of this new effort, a move that in his capacity as a federal government employee may actually violate the First Amendment. Former White House lawyers are calling the whole thing “wrong and illegal.” Just describing what is happening in plain language can make you wonder if you sound like a crank. But you are not a crank. The United States’ system of checks and balances — and the rule of law itself — are under direct assault; the Republicans who control Congress appear content to let it happen; and the risks that we fall into authoritarian collapse look as grave as they ever have.
But I’m going to assume that you knew all that already, and are perhaps looking today for a little distraction. In which case: let’s talk about OpenAI’s deep research.
I.
Last month, Benjamin Breen, a history professor at the University of California at Santa Cruz, wrote a blog post about the use of artificial intelligence in his chosen field. Breen describes himself as “very interested in the use of AI for experiential learning,” but has been dismayed by the rise of AI inside colleges so far.
“Ask anyone you know in education: ChatGPT has been a disaster when it comes to facilitating student cheating and — perhaps even more troubling — contributing to a general malaise among undergraduates,” he writes. “It’s not just that students are submitting entirely AI-written assignments. They are also (I suspect) relying on AI-generated answers far more comprehensively, not just in their homework but in their daily lives. This has a kind of flattening effect.”
At the same time, Breen has been exploring the use of AI in historical research: transcribing a block of text written in 16th-century Italian cursive handwriting, for example, and asking the model to provide the surrounding historical context. Or analyzing a page from a manuscript of medical recipes written in Mexico in the 1770s and offering a historical analysis.
In perhaps his most impressive example of the state of the art, Breen supplied two large language models with quotes from two historical figures who are main characters in a book he is writing. He tested both OpenAI’s o1 model, which is meant to excel at reasoning, and Anthropic’s more general-purpose Claude 3.5 Sonnet. Breen was surprised to find that o1 came up with, in effect, eight different ideas for historical arguments he could pursue at book length.
Speaking of one such idea, Breen writes:
The above excerpt is… good. I almost want to say depressingly good, because at first glance, it’s fairly close to the level of analysis that I’m currently at with my book project, and it was generated in exactly 5 seconds.
II.
I thought of Breen today while using deep research, the latest offering from OpenAI. Announced on Sunday, deep research is effectively the second AI agent that the company has introduced in the past two weeks, after its more general-purpose Operator. But while Operator struggled to complete even basic tasks, in my early tests deep research appears to be impressively competent — and underscores how even the tools we have today will accelerate research, analysis, argument, and other forms of knowledge work.
To start, deep research is available only to subscribers of ChatGPT’s $200-a-month Pro tier. (Users are limited to 100 deep research queries a month, reflecting the high cost of the computation involved; for now, it’s accessible only on the web.) To use it, you type out your query as usual in the ChatGPT chat box and then click the “deep research” button.
ChatGPT then analyzes your query and asks you follow-up questions. When I asked for a report on a current subject of interest — how publishers can benefit from the Fediverse — the bot asked me four clarifying questions, such as whether I was looking from the perspective of a legacy publisher or a digital-only outlet, and how technical it should get in its analysis of the tradeoffs between using two different federated protocols. I answered those questions, and deep research got to work.
Like DeepSeek, OpenAI’s deep research exposes some of its chain of thought as it answers your query. This let me see some of the websites that the agent was visiting, what conclusions it was drawing from them, and how it was beginning to organize its reasoning. Five minutes later, my 4,955-word report was available. (Read the whole thing here.) It outlined how the Fediverse can help people find new news sources; offered real-world examples of how sites like The Verge and 404 Media are leaning in to federation; explored different monetization strategies and described the trade-offs involved with each; and analyzed the pros and cons of building on the two main federated protocols.
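If you want a rough mental model of what an agent like this is doing, the loop is something like: refine the query, search, read, take notes, repeat, then write up the findings. Here is a minimal Python sketch of that pattern. To be clear, this is my own illustration, not OpenAI’s actual implementation; every function in it (clarify, search_web, read_page, synthesize) is a hypothetical stand-in for calls to a reasoning model and a browsing tool.

```python
# A hypothetical sketch of a deep-research-style agent loop.
# Not OpenAI's implementation: clarify, search_web, read_page, and
# synthesize are stand-ins for a reasoning model plus a browsing tool.

from dataclasses import dataclass, field


@dataclass
class ResearchState:
    query: str
    notes: list[str] = field(default_factory=list)


def clarify(query: str) -> str:
    """Stands in for the follow-up questions ChatGPT asks before it starts."""
    return f"{query} (scope: digital-only publishers; moderate technical depth)"


def search_web(query: str) -> list[str]:
    """Stands in for the browsing tool returning candidate sources."""
    return [f"https://example.com/source-{i}" for i in range(3)]


def read_page(url: str) -> str:
    """Stands in for fetching a page and extracting the relevant passages."""
    return f"Key finding extracted from {url}"


def synthesize(state: ResearchState) -> str:
    """Stands in for the model composing the final report from its notes."""
    return "\n".join([f"# Report on: {state.query}", *state.notes])


def deep_research(query: str, max_rounds: int = 2) -> str:
    state = ResearchState(query=clarify(query))
    for _ in range(max_rounds):  # the iterative browse-and-take-notes loop
        for url in search_web(state.query):
            state.notes.append(read_page(url))
    return synthesize(state)


if __name__ == "__main__":
    print(deep_research("How can publishers benefit from the Fediverse?"))
```

The real product presumably decides dynamically when to search again and what to read next; the fixed loop above is just the skeleton.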
As it so happened, I had run the same search a few days back using Google’s identically named deep research product, which it released in December. (This may be the first time Google has successfully named something since it named itself Google in 1998.) Google’s deep research uses a standard LLM, Gemini 1.5 Pro, rather than a reasoning model like OpenAI’s version.
You can see it in the results. (Read Google Gemini’s take on the identical query here.) Google’s report comes in at only around 2,000 words, and is shallower in every way. It hits many of the same basic ideas, but at a higher level of abstraction, and doesn’t explore any of the nitty-gritty tradeoffs that ChatGPT discussed at great length. It isn’t bad — my first time reading it, before ChatGPT’s deep research agent had been released, I found it moderately useful — but ChatGPT blows it out of the water.
As you would hope, for a product that is 10 times as expensive.
That said: consultants charge their clients vastly more to create reports like this, and they take significantly longer to deliver them. And while great consultants can undoubtedly still produce much better reports than this agent, I suspect some of them might be surprised at how good deep research is even at this early stage.
Meanwhile, their clients — like Professor Breen — may find themselves impressed by the quality of the research assistant now available to them.
III.
Like every AI product, OpenAI’s deep research makes mistakes. Some of them can be quite embarrassing.
For my second deep research project, I asked the agent the following:
Put together an explanation of how deep research works that is sophisticated but still comprehensible to people who do not work in AI. Compare and contrast how it works with what we know about Google's own 'deep research' product that is part of Gemini. Suggest how OpenAI's deep research could fit into workflows for academics, journalists, product managers, and people who work in tech policy. Identify the likely next leaps forward in the technology that may make it even more useful in this regard.
This was a test of how useful something like deep research would be in helping me prepare to write a column. In effect, it’s the sort of thing I tell myself before I go research whatever subject I’m writing about. I wanted to see how OpenAI’s agent would perform given that it was researching a story that was less than a day old, and for which much of the coverage was behind paywalls that the agent would not be able to access.
And indeed, the bot struggled more than I expected. See if you can spot the mistake here (emphasis ChatGPT’s):
It’s no coincidence that soon after OpenAI announced deep research, Google rolled out its own Gemini “deep research” feature. Despite sharing a name and a general goal, the two have some key design differences.
Gemini’s deep research feature, of course, predated OpenAI’s. And the fact that the agent got something so basic backwards should be enough to make you wonder what else it might have gotten wrong. (Nothing in this column, for what it’s worth, comes from the agent’s analysis.)
IV.
So what do we do with this?
OpenAI’s deep research suffers from the same design problem that almost all AI products have: its superpowers are completely invisible, and must be harnessed through a frustrating process of trial and error.
Generally speaking, the more you already know about something, the more useful I think deep research is. This may be somewhat counterintuitive; perhaps you expected that an AI agent would be well suited to getting you up to speed on an important topic that just landed in your lap at work, for example.
In my early tests, the reverse felt true. Deep research excels at drilling deep into subjects you already have some expertise in, letting you probe for specific pieces of information, types of analysis, or ideas that are new to you.
It’s possible that you can make this work better than I did. (I think all of us will get better at prompting these models, and presumably the product itself will improve over time as well.)
On the whole, though, I came away from my first experiments with OpenAI’s deep research agent impressed. Already, I think any number of consultants, academics, journalists, marketers, product managers, or tech policy professionals could find valuable uses for the agent. Where the Operator agent felt like a tech demo — barely more than a proof of concept — deep research is something that you can just start using.
And in case you want to read even more deep research, I also asked the agent to research some practical advice for people who may be living through authoritarian consolidation and wondering what they should do. Hopefully you’ll never need any of these tips. But just in case!
Elsewhere in OpenAI:
- SoftBank created a joint venture with OpenAI in Japan and will commit to spending at least $3 billion a year on its technology. (Hayden Field / CNBC)
- OpenAI released o3-mini, the first version of its next-generation reasoning model, and Simon Willison put it through its paces.
- Sam Altman said that OpenAI's decision to stop open-sourcing its technology might have put the company on "the wrong side of history." (Kyle Wiggers / TechCrunch)
- Paid subscribers to ChatGPT nearly tripled last year to 15.5 million. (Stephanie Palazzolo and Amir Efrati / The Information)
- Ethan Mollick tries deep research and finds that, among other things, it's better at citations than its predecessors. (One Useful Thing)
- Handsome podcaster Kevin Roose tries Operator. (New York Times)
Governing
- President Trump reportedly views Musk as a person doing “the dirty work” for him by doing his assigned tasks and taking heat for controversial measures. (Isaac Arnsdorf and Jacqueline Alemany / Washington Post)
- Elon Musk aides have reportedly locked senior employees at the Office of Personnel Management out of systems that contain the personal data of millions of federal employees. (Tim Reid / Reuters)
- A look at the Musk-linked young engineers with little to no government experience taking over critical government infrastructure. (Vittoria Elliott / Wired)
- Musk spent at least $288 million to help elect Donald Trump and other Republican candidates, which makes him the biggest political donor of the 2024 elections. (Trisha Thadani, Clara Ence Morse and Maeve Reston / Washington Post)
- A look at how many of the US’s largest companies have announced deals with Musk’s businesses in recent days as they seek to curry favor, including Visa and United Airlines. (Joe Miller, Claire Bushey, Rafe Uddin and Hannah Murphy / Financial Times)
- The National Transportation Safety Board said reporters will have to follow the agency’s X account for updates on accidents and disasters, including the two recent plane crashes, and will no longer issue updates through email. (Matthew Keys / The Desk)
- Musk is effectively carrying out government censorship on social media by removing accounts that name DOGE staffers, this author argues. (Mike Masnick / Techdirt)
- Trump signed an executive action that would direct officials to create a sovereign wealth fund for the US, he said, in a potential move to facilitate a TikTok sale. (Jenny Leonard and Katherine Burton / Bloomberg)
- Trump’s new executive orders on tariffs target a loophole often exploited by companies like Temu and Shein, by removing the “de minimis” exemption for small packages. (Jennifer A Dlouhy, Josh Wingrove and Spencer Soper / Bloomberg)
- US officials are reportedly looking into whether DeepSeek avoided US restrictions on AI chip sales to China by buying Nvidia chips through companies in Singapore. (Jordan Robertson, Mackenzie Hawkins and Jenny Leonard / Bloomberg)
- The Trump administration has fueled a surge in crypto prices that has “no substance” and could cause “havoc” when prices collapse, hedge fund Elliott Management reportedly warned investors. (Costas Mourselas / Financial Times)
- Trump’s memecoin has racked up nearly $100 million in trading fees and has lost tens of thousands for small traders, analysis firms say. (Tom Wilson and Michelle Conlin / Reuters)
- A look at Trump’s new science adviser nomination, Michael Kratsios, who has no degrees in science or engineering but is a policy specialist on AI. (William J. Broad / New York Times)
- Adam Candeub, a prominent Big Tech critic, will be general counsel of the FCC. (Ben Smith and Shelby Talcott / Semafor)
- Apple asked a district court to pause the remedies phase of Google’s search monopoly trial while it appeals a denial of its request to have a more direct role in the phase. (Lauren Feiner / The Verge)
- Its emergency pause request was denied. (Umar Shakir / The Verge)
- Apple agreed to a $20 million settlement in a class action suit alleging that Apple Watch batteries in the first generation, Series 1, Series 2 and Series 3 models swelled over time. (Samantha Kelly / CNET)
- A San Francisco appeals court seemed skeptical of Google’s argument that competition with Apple in smartphones was enough to overturn its illegal Android app monopoly verdict. (Josh Sisco and Leah Nylen / Bloomberg)
- Amazon is suing Washington state, seeking to prevent records related to investigations into the company’s Project Kuiper satellite facility in Seattle from being released to the Bezos-owned Washington Post. (Todd Bishop / GeekWire)
- Texas governor Greg Abbott banned the use of Chinese-backed AI and social media apps – DeepSeek, Lemon8, Moomoo, RedNote, Tiger Brokers and Webull – on government-issued devices. (Karoline Leonard / Austin American-Statesman)
- A Massachusetts man pled guilty to using AI chatbots to impersonate a university professor and invite strangers to her home after stalking her for years. (Katie McQue / The Guardian)
- NetChoice filed a lawsuit against a Maryland law intended to protect kids from inappropriate material online and argued that it violates the First Amendment. (Lauren Feiner / The Verge)
- Hackers are experimenting with Gemini to improve productivity for cyber attacks, Google’s Threat Intelligence Group said. (Bill Toulas / BleepingComputer)
- The UK is introducing four new laws that would make it illegal to possess, create or distribute AI tools designed to create child sexual abuse material. (Sima Kotecha / BBC)
- Temu, Shein and Amazon Marketplace could soon be liable for dangerous or illegal products sold on their platforms in the EU, according to a draft proposal. (Andy Bounds / Financial Times)
- After Meta blocked links to news outlets in Canada, scammers began posting fake ads posing as publishers. (Thomas Seal / Bloomberg)
- Almost 100 journalists and nonprofit members were targeted on WhatsApp by spyware owned by Israeli hacking software company Paragon Solutions, WhatsApp said. (Stephanie Kirchgaessner / The Guardian)
- Many Russians are finding ways to circumvent the government’s throttling of YouTube in the country. (Paul Sonne / New York Times)
- Australia’s decision to exempt YouTube from its laws banning social media for minors under 16, citing educational purposes, could expose children to addictive and harmful content, experts say. (Byron Kaye / Reuters)
Industry
- DeepSeek’s hyped R1 reasoning model failed to detect or block 50 malicious prompts designed to draw out toxic content, researchers said. (Matt Burgess and Lily Hay Newman / Wired)
- DeepSeek had the most downloaded mobile app in 140 markets, with its largest group of new users in India. (Vlad Savov / Bloomberg)
- The introduction of the R1 model could lead to major LLMs being commoditized this year, executives at AI labs say. (Ryan Browne / CNBC)
- A look at how DeepSeek has invigorated the debate over how much AI information companies should share. (Christopher Mims / Wall Street Journal)
- DeepSeek is helping to reveal a complicated landscape of performance and pricing in AI when compared to other major models, these researchers say. (Dylan Patel, AJ Kourabi, Doug O’Laughlin and Reyk Knuhtsen / SemiAnalysis)
- Anthropic said it has developed a new system, “constitutional classifiers,” that works as a protective layer on LLMs to prevent users from eliciting harmful content from its models. (Cristina Criddle / Financial Times)
- While TikTok has maintained almost all of its user traffic in the US despite its brief shutdown, creators are still looking to diversify their online presences. (Zach Vallese / CNBC)
- Meta reportedly warned employees in an internal memo that it will fire employees who leak information. (Alex Heath / The Verge)
- Meta is reportedly considering moving its incorporation to Texas from Delaware. A spokesperson said there aren’t plans to move corporate headquarters from California. (Emily Glazer, Berber Jin and Meghan Bobrowsky / Wall Street Journal)
- Meta is on track to invest more than $100 billion in virtual and augmented reality this year. (Tim Bradshaw and Hannah Murphy / Financial Times)
- Meta will not release AI systems if it considers them too risky, the company said in its Frontier AI Framework policy document. (Kyle Wiggers / TechCrunch)
- Google’s “moonshot” X division announced a new spinout, Heritable Agriculture, a startup using data and machine-learning to improve the way crops are grown. (Brian Heater / TechCrunch)
- Apple has reportedly canceled a Mac-connected AR glasses project. (Mark Gurman / Bloomberg)
- Microsoft is creating the Advanced Planning Unit, a new unit within its AI business division focused on understanding the implications of AI on society. (Kyle Wiggers / TechCrunch)
- LinkedIn’s push into video seems to be working, as short-form video is now the fastest-growing content category on the platform. (Alexander Lee / Digiday)
- Amazon is expected to spend 60% more than previously announced on a data center project in Mississippi, now spending $16 billion, according to state documents. (Brody Ford and Matt Day / Bloomberg)
- Nonprofit AI safety group MLCommons partnered with Hugging Face to release one of the largest collections of public domain voice recordings for AI research. (Kyle Wiggers / TechCrunch)
- A look at Swedish buy-now-pay-later company Klarna and why its CEO Sebastian Siemiatkowski has been hyping up AI in the workplace. (Noam Scheiber / New York Times)
- Cloudflare launched a new feature using the Content Credentials system to help users verify whether an image or video has been created or manipulated by AI. (Jess Weatherbed / The Verge)
- The Beatles’ track “Now and Then,” created with AI assistance, won Best Rock Performance at the Grammys. (Jess Weatherbed / The Verge)
- A native porn app for iOS will soon be available for EU users through the alternative app store AltStore PAL. (Sarah Perez / TechCrunch)
- The app, Hot Tub, is proof of how EU laws could harm Apple customers, Apple said. (Mark Gurman / Bloomberg)
- This author experimented with talking to a chatbot meant to emulate an 80-year-old version of her. (Heidi Mitchell / Wall Street Journal)
- A look at the three possible scenarios for how AI could impact the economy. (David Wilcox and Tom Orlik / Bloomberg)
Those good posts
For more good posts every day, follow Casey’s Instagram stories.
Talk to us
Send us tips, comments, questions, and deep research assignments: casey@platformer.news. Read our ethics policy here.