OpenAI connects ChatGPT to the internet

OpenAI’s viral AI-powered chatbot, ChatGPT, can now browse the internet — in certain cases.

OpenAI today launched plugins for ChatGPT, which extend the bot’s functionality by granting it access to third-party knowledge sources and databases, including the web. The plugins are available in alpha to ChatGPT users and developers on the waitlist; OpenAI says that it’ll initially prioritize a small number of developers and subscribers to its premium ChatGPT Plus plan before rolling out larger-scale and API access.

Easily the most intriguing plugin is OpenAI’s first-party web-browsing plugin, which allows ChatGPT to draw data from around the web to answer the various questions posed to it. (Previously, ChatGPT’s knowledge was limited to dates, events and people prior to around September 2021.) The plugin retrieves content from the web using the Bing search API and shows any websites it visited in crafting an answer, citing its sources in ChatGPT’s responses.

A chatbot with web access is a risky prospect, as OpenAI’s own research has found. WebGPT, an experimental system the AI startup built in 2021, sometimes quoted from unreliable sources and was incentivized to cherry-pick data from sites it expected users would find convincing, even if those sources weren’t objectively the strongest. Meta’s since-discontinued BlenderBot 3.0 had access to the web, too, and quickly went off the rails, delving into conspiracy theories and offensive content when prompted with certain text.

The live web is less curated than a static training dataset and, by implication, less filtered. Search engines like Google and Bing use their own safety mechanisms to reduce the chances that unreliable content rises to the top of results, but these results can be gamed. They also aren’t necessarily representative of the totality of the web. As a piece in The New Yorker notes, Google’s algorithm prioritizes websites that use modern web technologies like encryption, mobile support and schema markup. Many websites with otherwise quality content get lost in the shuffle as a result.

This gives search engines a lot of power over the data that might inform web-connected language models’ answers. Google has been found to prioritize its own services in Search by, for example, answering a travel query with data from Google Places instead of a richer, more social source like TripAdvisor. At the same time, the algorithmic approach to search opens the door to bad actors. In 2020, Pinterest leveraged a quirk of Google’s image search algorithm to surface more of its content in Google Image searches, according to The New Yorker.