Cloudflare Introduces A New No-code Feature to Prevent Web Scraping

Published by: Insights Desk | Released: Jul 05, 2024 | Source: DemandTalk

Highlights:

Cloudflare claims its software can detect bots attempting to scrape content for LLM training projects, even those trying to evade detection.
Cloudflare plans to update the feature regularly to keep pace with changes in AI scraping bots' technical patterns and with newly developed crawlers.

Recently, Cloudflare Inc. introduced a new no-code feature designed to deter AI developers from scraping content from websites.

The feature is included in the company's content delivery network (CDN), which a significant portion of the world's websites use to speed up page loads. Cloudflare has made the new scraping prevention feature available in both the free and paid versions of its CDN.

Many AI firms train their large language models (LLMs) on public web content. While companies such as OpenAI and Google LLC let website operators opt out of scraping, not all LLM developers offer this choice. That gap is what Cloudflare's scraping prevention tool aims to close.
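For crawlers that do honor an opt-out, the usual mechanism is the site's robots.txt file: OpenAI documents the `GPTBot` user-agent token and Google documents the `Google-Extended` token for this purpose. The sketch below uses Python's standard `urllib.robotparser` to check what a given robots.txt permits. The file contents, the `example.com` URL, and the `SomeOtherBot` name are illustrative assumptions. robots.txt is advisory, so it only stops crawlers that choose to respect it, which is exactly the gap described above.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not from the article): it blocks
# two AI-training crawler tokens and allows every other agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    page = "https://example.com/articles/some-post"
    # SomeOtherBot is a hypothetical crawler with no dedicated rule.
    for agent in ("GPTBot", "Google-Extended", "SomeOtherBot"):
        print(agent, is_allowed(rules, agent, page))
```

Here `GPTBot` and `Google-Extended` are refused while the unnamed crawler falls through to the catch-all `*` rule. A crawler that ignores robots.txt, or never identifies itself, is unaffected, and that is the case network-level detection is meant to cover.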

The feature employs artificial intelligence to identify automated attempts to extract content. Cloudflare claims that its software can detect bots attempting to scrape content for LLM training projects, even those trying to evade detection.

"Sadly, we've observed bot operators attempt to appear as though they are a real browser by using a spoofed user agent. We've monitored this activity over time, and we're proud to say that our global machine learning model has always recognized this activity as a bot," Cloudflare engineers wrote in a recent blog post.

Cloudflare identified a crawler that Perplexity AI Inc., a well-funded search engine startup, uses to collect content. A media outlet recently reported that the bot scrapes websites in a way that mimics regular user traffic, making it difficult for website operators to block Perplexity AI from accessing their content.

Cloudflare assigns a score between 1 and 99 to every website visit processed through its platform. A lower score indicates a higher likelihood of the request being generated by a bot. According to Cloudflare, requests made by the bot collecting content for Perplexity AI consistently receive a score below 30.
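To make the scoring rule concrete, here is a minimal sketch of how a site owner might act on such a score. The 1 to 99 range and the "lower means more likely a bot" direction come from the article; the `Request` type, its `bot_score` field, and the cutoff of 30 are hypothetical choices for illustration, not Cloudflare's API or recommended setting.

```python
from dataclasses import dataclass

# From the article: scores run from 1 to 99, and lower means more likely a bot.
MIN_SCORE, MAX_SCORE = 1, 99
# Hypothetical cutoff. The article only says the Perplexity crawler's
# requests consistently score below 30.
BOT_THRESHOLD = 30


@dataclass
class Request:
    path: str
    bot_score: int  # hypothetical field carrying the 1-99 score


def likely_bot(req: Request, threshold: int = BOT_THRESHOLD) -> bool:
    """Return True when the score falls below the cutoff (lower = more bot-like)."""
    if not MIN_SCORE <= req.bot_score <= MAX_SCORE:
        raise ValueError(f"bot score out of range: {req.bot_score}")
    return req.bot_score < threshold


def decide(req: Request) -> str:
    """Map a scored request to a simple block/allow action."""
    return "block" if likely_bot(req) else "allow"


if __name__ == "__main__":
    for score in (12, 29, 30, 85):
        print(score, decide(Request("/articles/some-post", score)))
```

Because the comparison is a strict `<`, a score of exactly 30 is allowed. Where to draw the line is a policy trade-off: a higher cutoff catches more bots but risks blocking more real visitors.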

Cloudflare’s engineers detailed, “When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint. For every fingerprint we see, we use Cloudflare’s network, which sees over 57 million requests per second on average, to understand how much we should trust this fingerprint.”
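The quote describes judging each client fingerprint by how its traffic behaves across the whole network. One simple way to picture that idea is a per-fingerprint tally turned into a smoothed trust ratio. This sketch assumes each request has already been reduced to a fingerprint string and labeled human or automated (both hypothetical inputs). It is a toy frequency estimate with Laplace smoothing, not Cloudflare's model, which the article describes as machine learning. Because the estimate is keyed on the fingerprint rather than the claimed user agent, a scraper spoofing a browser user agent would still inherit its fingerprint's low trust.

```python
class FingerprintTrust:
    """Toy per-fingerprint trust estimate (hypothetical; not Cloudflare's model).

    Counts how often traffic carrying each fingerprint was labeled human
    versus automated, and reports a Laplace-smoothed share of human traffic.
    """

    def __init__(self) -> None:
        self.human: dict[str, int] = {}
        self.bot: dict[str, int] = {}

    def observe(self, fingerprint: str, is_human: bool) -> None:
        """Record one labeled request for the given fingerprint."""
        counts = self.human if is_human else self.bot
        counts[fingerprint] = counts.get(fingerprint, 0) + 1

    def trust(self, fingerprint: str) -> float:
        """Smoothed fraction of human traffic; an unseen fingerprint scores 0.5."""
        h = self.human.get(fingerprint, 0)
        b = self.bot.get(fingerprint, 0)
        return (h + 1) / (h + b + 2)


if __name__ == "__main__":
    model = FingerprintTrust()
    # Hypothetical observations: a common browser fingerprint and a scraper's.
    for _ in range(98):
        model.observe("fp-browser", True)
    for _ in range(2):
        model.observe("fp-browser", False)
    for _ in range(50):
        model.observe("fp-scraper", False)
    for fp in ("fp-browser", "fp-scraper", "never-seen"):
        print(fp, round(model.trust(fp), 3))
```

Smoothing keeps a fingerprint seen only once or twice from jumping straight to full trust or full distrust. The more traffic a fingerprint accumulates, the more its estimate reflects observed behavior, which is why the scale of the network matters in the quote.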

Cloudflare plans to continually update the feature to adapt to evolving technical signatures of AI scraping bots and the emergence of new crawlers. As part of this effort, the company is introducing a tool that allows website operators to report encounters with new bots.
