
How to Avoid Bot Detection Using Scrapy and Playwright


When pure Scrapy isn't enough—when the website checks for a real browser, executes complex JavaScript, or has advanced anti-bot protection—it's time to bring in the heavy artillery: Scrapy + Playwright.

This guide shows you how to configure them together for maximum stealth, making your scraper look exactly like a real user browsing Chrome.

1. Why Playwright?

Pure Scrapy is just an HTTP client. It has no screen, no mouse, and no JavaScript engine. Playwright drives a real browser (Chromium, Firefox, WebKit), so it renders pages and executes JavaScript exactly like a human visitor's browser. Out of the box, however, it still carries automation flags that sophisticated sites can detect, which is what the stealth configuration in section 4 removes.


2. Installation

First, you need to install the integration plugin and the browsers.

Run these commands in your terminal:

pip install scrapy-playwright
playwright install chromium

3. Basic Configuration

You need to tell Scrapy to use Playwright for downloading pages instead of its default downloader.

Open settings.py and add/update these lines:

# settings.py

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# This is required for Playwright to work with Scrapy
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

4. The "Stealth" Configuration (The Secret Sauce)

Just using Playwright isn't always enough. Sophisticated sites check for "automation flags" (variables that say "Hey, I'm being controlled by a script"). We need to disable them.

Add this to your settings.py:

# settings.py

PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True,  # Set to False to see the browser pop up (good for debugging)
    "args": [
        "--disable-blink-features=AutomationControlled", # <--- THE KEY to stealth
        "--no-sandbox",
    ],
}

PLAYWRIGHT_CONTEXT_ARGS = {
    "javaScriptEnabled": True,
    "ignoreHTTPSErrors": True,
    # Set a real browser viewport size
    "viewport": {"width": 1920, "height": 1080},
    # Set a real User-Agent (very important!)
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
}
  • --disable-blink-features=AutomationControlled: This prevents Chromium from setting navigator.webdriver to true, the most common "I am automated" signal that sites check for.

  • user_agent: We manually set a modern Chrome user agent.
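Note that recent scrapy-playwright releases deprecate PLAYWRIGHT_CONTEXT_ARGS in favor of named contexts. If you are on a current version, the equivalent configuration looks like this (a sketch; check the docs for your installed version):

```python
# settings.py — modern equivalent using named contexts
PLAYWRIGHT_CONTEXTS = {
    "default": {
        "viewport": {"width": 1920, "height": 1080},
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    },
}
```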


5. How to Use It in Your Spider

Now that settings are configured, you need to tell your spider to use Playwright for specific requests.

In your spider file (e.g., spiders/myspider.py):

import scrapy

class StealthSpider(scrapy.Spider):
    name = "stealth"

    def start_requests(self):
        yield scrapy.Request(
            url="https://nowsecure.nl",  # A page that tests for headless/automated browsers
            meta={
                "playwright": True,
                "playwright_include_page": True, # Optional: if you need to interact with the page
            },
            callback=self.parse
        )

    async def parse(self, response):
        # Extract data as usual
        title = response.css("title::text").get()
        self.logger.info(f"Title: {title}")

        # If you need to interact (click/scroll), you get the 'page' object
        page = response.meta["playwright_page"]
        await page.close()

6. Advanced Stealth: Randomizing User-Agents

Using the same User-Agent for every request is suspicious. Let's pick one at random instead.

Update your Spider to pass context arguments dynamically:

import scrapy
import random

# List of real User-Agents
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
]

class RandomStealthSpider(scrapy.Spider):
    name = "random_stealth"

    def start_requests(self):
        ua = random.choice(USER_AGENTS)
        yield scrapy.Request(
            url="https://bot.sannysoft.com",  # A bot-detection test page
            meta={
                "playwright": True,
                # Context kwargs only apply when the named context is first
                # created, so give each distinct configuration its own name.
                "playwright_context": "random-ua",
                "playwright_context_kwargs": {
                    "user_agent": ua,
                    "viewport": {"width": 1920, "height": 1080},
                },
            },
            callback=self.parse,
        )

    def parse(self, response):
        # ... extraction logic
        pass
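To keep this logic out of the request body, you can factor the randomization into a small helper. This is plain Python, nothing scrapy-specific:

```python
import random

# List of real User-Agents (same ones used in the spider above)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
]

def random_context_kwargs() -> dict:
    """Fresh Playwright context kwargs with a randomly chosen User-Agent."""
    return {
        "user_agent": random.choice(USER_AGENTS),
        "viewport": {"width": 1920, "height": 1080},
    }
```

Call it once per request and pass the result as the context kwargs, so each new context gets its own identity.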

7. Complete settings.py for Copy-Paste

Here is the full configuration block for settings.py.

# settings.py

# 1. Enable Playwright
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# 2. Launch Options (The Browser App)
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True, # Set False to watch it work
    "args": [
        "--disable-blink-features=AutomationControlled", # Hides the 'robot' flag
        "--no-sandbox",
    ],
}

# 3. Context Options (The Browser Tab)
PLAYWRIGHT_CONTEXT_ARGS = {
    "javaScriptEnabled": True,
    "ignoreHTTPSErrors": True,
    "viewport": {"width": 1920, "height": 1080},
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
}

# 4. Standard Scrapy Politeness (Still applies!)
DOWNLOAD_DELAY = 2
CONCURRENT_REQUESTS = 4
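If a fixed delay feels too blunt, Scrapy's built-in AutoThrottle extension adjusts the delay based on server response times. These are standard Scrapy settings; the numbers below are a starting point to tune for your target:

```python
# settings.py — optional: adaptive delays instead of a fixed DOWNLOAD_DELAY
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2           # initial delay (seconds)
AUTOTHROTTLE_MAX_DELAY = 30            # back off up to this on slow responses
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0  # average parallel requests per server
```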

Summary

  1. Install scrapy-playwright.

  2. Configure DOWNLOAD_HANDLERS and TWISTED_REACTOR.

  3. Add Stealth Args: --disable-blink-features=AutomationControlled is the most important line.

  4. Use Meta: Pass meta={"playwright": True} in your requests.

With this setup, you are running a real Chrome browser with its most obvious automation tells disabled. That defeats the majority of basic and intermediate bot detection; heavyweight systems that use fingerprinting or behavioral analysis may still require rotating proxies and further hardening.

Tech Priya

Tech Priya is a knowledge blog where electronics, Python, and core tech concepts are explained using real-world analogies in Kannada-English, making learning clear, relatable, and enjoyable.