Dozens of large companies, including Amazon and The New York Times, have rushed to block GPTBot, a tool that OpenAI recently announced it was using to crawl the web for data to feed its popular chatbot, ChatGPT.
As of this week, 70 of the world's top 1,000 websites have moved to block GPTBot, the web crawler OpenAI revealed two weeks ago that it uses to collect massive amounts of information from the internet to train ChatGPT. Originality.ai, a company that checks content to see whether it's AI-generated or plagiarized, conducted an analysis that found more than 15% of the 100 most popular websites have decided to block GPTBot in the past two weeks.
The six largest websites now blocking the bot are amazon.com (along with several of its international counterparts), nytimes.com, cnn.com, wikihow.com, shutterstock.com, and quora.com.
The top 100 sites blocking GPTBot include bloomberg.com, scribd.com, and reuters.com, as well as insider.com and businessinsider.com. Among the top 1,000 sites blocking the bot are ikea.com, airbnb.com, nextdoor.com, nymag.com, theatlantic.com, axios.com, usmagazine.com, lonelyplanet.com, and coursera.org.
“GPTBot launched 14 days ago and the percentage of Top 1,000 sites blocking it has been steadily increasing,” the analysis said.
How these websites block GPTBot is relatively simple, even crude, depending on your perspective: each site hosts a file called robots.txt, and GPTBot has been added to that file's “disallow” list.
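An opt-out entry typically takes the form below, naming the GPTBot user agent and using “Disallow: /” to cover the entire site; a site could instead disallow only specific paths.

```
User-agent: GPTBot
Disallow: /
```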
Robots.txt is a tool created in the 1990s meant to stop web crawlers, such as Google's or Bing's search bots, from extracting data and information from a website. When it revealed the crawler, OpenAI said it would abide by robots.txt and that GPTBot would not crawl websites that disallow it.
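Anyone can check whether a particular site has opted out by reading its robots.txt file. The sketch below shows one way to run that check with Python's standard urllib.robotparser module; the site URL here is purely illustrative.

```python
from urllib.robotparser import RobotFileParser

# Illustrative only: check whether a site's robots.txt disallows GPTBot.
site = "https://www.example.com"  # hypothetical domain; swap in any site

parser = RobotFileParser()
parser.set_url(f"{site}/robots.txt")
parser.read()  # fetches and parses the site's robots.txt

# can_fetch() returns False when the named user agent is barred
# from crawling the given URL.
if parser.can_fetch("GPTBot", f"{site}/"):
    print(f"{site} does not block GPTBot")
else:
    print(f"{site} blocks GPTBot")
```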
Much of what is available on the internet, particularly text and images, is technically under copyright. Crawlers like GPTBot do not ask permission, obtain licenses, or pay to use the data and information they extract. The only way to avoid them at this point is through robots.txt, although companies that deploy crawlers are not legally bound to honor robots.txt restrictions.
Awareness of copyright rules, and of who owns the data these crawlers take to train AI projects built on large language models, or LLMs, has grown as tools like ChatGPT have exploded onto the tech scene. Several lawsuits are already in the works. The author Stephen King, after learning his books had been used in AI training sets, said he's looking to the future with a “certain dreadful fascination.”
For its part, OpenAI has tried to hide that ChatGPT was trained on any copyrighted material.
A representative of OpenAI could not be immediately reached for comment.
See below for the full list of the biggest websites that blocked GPTBot between August 8 and August 22:
- amazon.com
- quora.com
- nytimes.com
- shutterstock.com
- wikihow.com
- cnn.com
- foursquare.com
- healthline.com
- scribd.com
- businessinsider.com
- reuters.com
- medicalnewstoday.com
- amazon.co.uk
- insider.com
- yourdictionary.com
- slideshare.net
- amazon.de
- bloomberg.com
- amazon.in
- studocu.com
- ikea.com
- uol.com.br
- amazon.fr
- geeksforgeeks.org
- pcmag.com
- theverge.com
- nextdoor.com
- amazon.ca
- amazon.co.jp
- airbnb.com
- vulture.com
- polygon.com
- prnewswire.com
- mashable.com
- nymag.com
- detik.com
- theatlantic.com
- trulia.com
- amazon.es
- eater.com
- picclick.com
- bustle.com
- etymonline.com
- teacherspayteachers.com
- archiveofourown.org
- vox.com
- kumparan.com
- theathletic.com
- amazon.it
- alltrails.com
- thrillist.com
- amazon.com.br
- usmagazine.com
- pikiran-rakyat.com
- city-data.com
- hellomagazine.com
- stern.de
- chicagotribune.com
- spanishdict.com
- lonelyplanet.com
- inverse.com
- actu.fr
- fool.com
- coursera.org
- france24.com
- myfitnesspal.com
- dotesports.com
- theglobeandmail.com
- axios.com
Originally published on BusinessInsider.com