Bypass Bot Detection: Scrapy & Playwright Tips

When pure Scrapy isn't enough—when the website checks for a real browser, executes complex JavaScript, or has advanced anti-bot protection—it's time to bring in the heavy artillery: Scrapy + Playwright.

This guide shows you how to configure them together for maximum stealth, making your scraper look much more like a real user browsing Chrome.

1. Why Playwright?

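Pure Scrapy is just an HTTP client. It doesn't have a screen, a mouse, or a JavaScript engine. Playwright drives a real browser engine (Chromium, Firefox, or WebKit), so pages run their JavaScript and see the same browser APIs a visitor's browser would. That gets you past many "Are you a robot?" checks that plain HTTP requests fail. An automated browser is not invisible by default, though: it still exposes tell-tale signals, which is what the stealth settings below address.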

2. Installation

First, you need to install the integration plugin and the browsers.

Run these commands in your terminal:

pip install scrapy-playwright
playwright install chromium
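To confirm that the Python side of the install worked, a quick check like the one below can help. It is a minimal sketch: missing_packages is a hypothetical helper, and it only checks that the packages are importable. It does not confirm that playwright install chromium actually downloaded the browser.

```python
import importlib.util


def missing_packages(names=("scrapy", "scrapy_playwright", "playwright")):
    """Return the import names in `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("scrapy, scrapy_playwright and playwright are all importable")
```

Note that the import name uses an underscore (scrapy_playwright) even though the pip package is called scrapy-playwright.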

3. Basic Configuration

You need to tell Scrapy to use Playwright for downloading pages instead of its default downloader.

Open settings.py and add/update these lines:

# settings.py

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# This is required for Playwright to work with Scrapy
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
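If the "https" entry is missing, for example, https pages are simply fetched by Scrapy's normal downloader and never touch the browser. The sketch below checks a settings dict for both values. playwright_setting_problems is a hypothetical helper, not part of Scrapy or scrapy-playwright, and it treats a missing TWISTED_REACTOR as a problem, which is stricter than necessary on recent Scrapy versions where the asyncio reactor is already the default.

```python
PLAYWRIGHT_HANDLER = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"
ASYNCIO_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def playwright_setting_problems(settings: dict) -> list:
    """Return a list of problems with the Playwright settings (empty means OK)."""
    problems = []
    handlers = settings.get("DOWNLOAD_HANDLERS") or {}
    for scheme in ("http", "https"):
        if handlers.get(scheme) != PLAYWRIGHT_HANDLER:
            # Missing or different: this scheme would not go through Playwright
            problems.append(f"DOWNLOAD_HANDLERS[{scheme!r}] is not the Playwright handler")
    if settings.get("TWISTED_REACTOR") != ASYNCIO_REACTOR:
        # Strict check: recent Scrapy versions already default to this reactor
        problems.append("TWISTED_REACTOR is not the asyncio reactor")
    return problems
```

For example, playwright_setting_problems(vars(myproject.settings)) checks your project's settings module, where myproject is a placeholder for your own package name.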

4. The "Stealth" Configuration (The Secret Sauce)

Just using Playwright isn't always enough. Sophisticated sites check for "automation flags" (variables that say "Hey, I'm being controlled by a script"). We need to disable them.

Add this to your settings.py:

# settings.py

PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True,  # Set to False to see the browser pop up (good for debugging)
    "args": [
        "--disable-blink-features=AutomationControlled", # <--- THE KEY to stealth
        "--no-sandbox",  # Often needed in Docker or when running as root; not a stealth flag
    ],
}

# Browser context options, passed to Playwright's Python new_context(),
# so the keys use snake_case names
PLAYWRIGHT_CONTEXTS = {
    "default": {
        "java_script_enabled": True,
        "ignore_https_errors": True,
        # Set a real browser viewport size
        "viewport": {"width": 1920, "height": 1080},
        # Set a real User-Agent (very important!)
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    },
}

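--disable-blink-features=AutomationControlled: This stops Chromium from setting navigator.webdriver to true, a "controlled by code" flag that detection scripts commonly check.
user_agent: We manually set a modern Chrome User-Agent. In headless mode, Chromium's default User-Agent typically contains "HeadlessChrome", which gives it away immediately.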
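Because the User-Agent is typed by hand, it can drift out of step with the Chromium build Playwright actually installed, and a UA that claims one Chrome version while the browser behaves like another can itself be a signal. The sketch below is a hypothetical helper (user_agent_problems is not a library function) that compares the configured string with the version string Playwright reports through browser.version.

```python
import re

# Captures the major version from "Chrome/120.0.0.0" or "HeadlessChrome/120.0.0.0"
_CHROME_MAJOR = re.compile(r"Chrome/(\d+)\.")


def user_agent_problems(user_agent: str, browser_version: str) -> list:
    """Compare a hand-written UA with the real browser version, e.g. "120.0.6099.28"."""
    problems = []
    if "HeadlessChrome" in user_agent:
        problems.append("UA advertises HeadlessChrome")
    match = _CHROME_MAJOR.search(user_agent)
    if match is None:
        problems.append("UA does not contain a Chrome version")
    elif match.group(1) != browser_version.split(".")[0]:
        problems.append(
            f"UA claims Chrome {match.group(1)}, browser is {browser_version}"
        )
    return problems
```

Inside a callback that kept the page (playwright_include_page), the running version is available as page.context.browser.version.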

5. How to Use It in Your Spider

Now that settings are configured, you need to tell your spider to use Playwright for specific requests.

In your spider file (e.g., spiders/myspider.py):

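import scrapy

class StealthSpider(scrapy.Spider):
    name = "stealth"

    def start_requests(self):
        yield scrapy.Request(
            url="https://nowsecure.nl",  # A site to test security
            meta={
                "playwright": True,
                # Keep the page open so parse() can interact with it (click/scroll).
                # If you remove this, also remove the page handling in parse().
                "playwright_include_page": True,
            },
            callback=self.parse,
            errback=self.errback_close_page,
        )

    async def parse(self, response):
        page = response.meta["playwright_page"]

        # Extract data normally: response holds the HTML rendered by the browser
        title = response.css("title::text").get()
        self.logger.info("Title: %s", title)

        # ... interact with `page` here if needed (click, scroll, etc.) ...
        await page.close()  # Always close pages you asked to keep
        yield {"title": title}

    async def errback_close_page(self, failure):
        # Close the page when the request fails too, otherwise it stays open
        page = failure.request.meta["playwright_page"]
        await page.close()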
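When you do keep the page, the important habit is closing it on every path, including when your interaction code raises. Below is a minimal sketch of that pattern. scroll_then_close is a hypothetical helper built on Playwright's page.mouse.wheel, page.wait_for_timeout, page.content and page.close; the scroll distance and pause are arbitrary example values.

```python
async def scroll_then_close(page, scrolls: int = 3, pause_ms: int = 500) -> str:
    """Scroll a Playwright page, return its final HTML, and always close it."""
    try:
        for _ in range(scrolls):
            # Scroll down with the mouse wheel, then pause so lazy content can load
            await page.mouse.wheel(0, 1000)
            await page.wait_for_timeout(pause_ms)
        return await page.content()
    finally:
        # Runs whether the block above succeeded or raised
        await page.close()
```

In parse you would call html = await scroll_then_close(response.meta["playwright_page"]) and read the result with scrapy.Selector(text=html), because response itself still holds the HTML from before the scrolling.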

6. Advanced Stealth: Randomizing User-Agents

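Sending all of your traffic with one User-Agent makes it easy to group together. You can vary it, but vary it per browser session, not per request: a User-Agent that changes between requests in the same session is itself a bot pattern.

In scrapy-playwright the User-Agent belongs to the browser context, so you choose it when a named context is created. The playwright_context meta key names the context, and playwright_context_kwargs holds the options used the first time a context with that name is created (later requests with the same name reuse it, UA and cookies included):

import scrapy
import random

# List of real User-Agents
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
]

class RandomStealthSpider(scrapy.Spider):
    name = "random_stealth"

    def start_requests(self):
        # Pick one UA per session (browser context), not per request
        index = random.randrange(len(USER_AGENTS))
        yield scrapy.Request(
            url="https://bot.sannysoft.com",  # A bot detection test site
            meta={
                "playwright": True,
                # Name the context after its UA so the pairing never changes
                "playwright_context": f"session-{index}",
                "playwright_context_kwargs": {
                    "user_agent": USER_AGENTS[index],
                    "viewport": {"width": 1920, "height": 1080},
                },
            },
            callback=self.parse,
        )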

    def parse(self, response):
        # ... extraction logic
        pass
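For a crawl that sends many requests, the same idea can be applied across the whole run: keep one User-Agent per browser context and start a fresh context after a fixed number of requests, instead of switching the UA on every request. The class below is a minimal sketch under that assumption. SessionRotator and requests_per_session are hypothetical names, while playwright, playwright_context and playwright_context_kwargs are scrapy-playwright's own meta keys.

```python
import itertools
import random


class SessionRotator:
    """Build scrapy-playwright request meta with one fixed UA per browser context.

    Every `requests_per_session` requests a new context name (a new session)
    starts with a freshly chosen User-Agent; inside a session it never changes.
    """

    def __init__(self, user_agents, requests_per_session=20, rng=None):
        self.user_agents = list(user_agents)
        self.requests_per_session = requests_per_session
        self.rng = rng or random.Random()
        self._ids = itertools.count()
        self._start_session()

    def _start_session(self):
        self.context_name = f"session-{next(self._ids)}"
        self.user_agent = self.rng.choice(self.user_agents)
        self.used = 0

    def meta(self):
        """Return a fresh meta dict for the next request."""
        if self.used >= self.requests_per_session:
            self._start_session()
        self.used += 1
        return {
            "playwright": True,
            "playwright_context": self.context_name,
            # Only applied when this context name is created for the first time
            "playwright_context_kwargs": {
                "user_agent": self.user_agent,
                "viewport": {"width": 1920, "height": 1080},
            },
        }
```

In a spider you would create one rotator, for example SessionRotator(USER_AGENTS) in start_requests, and pass meta=rotator.meta() to each Request. The sketch never closes finished contexts, so in a long crawl you would close them yourself, for example with await page.context.close() in a callback that kept the page.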

7. Complete `settings.py` for Copy-Paste

Here is the full configuration block for settings.py.

# settings.py

# 1. Enable Playwright
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# 2. Launch Options (The Browser App)
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True, # Set False to watch it work
    "args": [
        "--disable-blink-features=AutomationControlled", # Hides the 'robot' flag
        "--no-sandbox",  # Often needed in Docker or when running as root; not a stealth flag
    ],
}

# 3. Context Options (The Browser Tab), passed to Playwright's new_context()
PLAYWRIGHT_CONTEXTS = {
    "default": {
        "java_script_enabled": True,
        "ignore_https_errors": True,
        "viewport": {"width": 1280, "height": 720},
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    },
}

# 4. Standard Scrapy Politeness (Still applies!)
DOWNLOAD_DELAY = 2
CONCURRENT_REQUESTS = 4
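DOWNLOAD_DELAY is not applied as a fixed pause. With RANDOMIZE_DOWNLOAD_DELAY enabled (Scrapy's default), Scrapy waits a random time between 0.5 and 1.5 times DOWNLOAD_DELAY between requests to the same site, so DOWNLOAD_DELAY = 2 means waits of 1 to 3 seconds. The sketch below re-implements that rule for illustration only (randomized_delay and max_pages_per_minute are hypothetical helpers; Scrapy does this internally) and estimates the rough per-site ceiling it implies.

```python
import random


def randomized_delay(download_delay, rng=None):
    """One wait, drawn the way RANDOMIZE_DOWNLOAD_DELAY describes: 0.5x to 1.5x."""
    rng = rng or random.Random()
    return rng.uniform(0.5 * download_delay, 1.5 * download_delay)


def max_pages_per_minute(download_delay):
    """Rough per-site ceiling: on average one request per DOWNLOAD_DELAY seconds."""
    return 60 / download_delay


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(randomized_delay(2, rng), 2) for _ in range(5)])  # values in [1, 3]
    print(max_pages_per_minute(2))  # 30.0
```

So with the settings above you can expect at most about 30 pages per minute from any single site; CONCURRENT_REQUESTS = 4 mostly matters when the crawl spans several sites.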

Summary

Install scrapy-playwright.
Configure DOWNLOAD_HANDLERS and TWISTED_REACTOR.
Add Stealth Args: --disable-blink-features=AutomationControlled is the most important line.
Use Meta: Pass meta={"playwright": True} in your requests.

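With this setup, you are running a real Chromium browser that explicitly lies about being automated. That gets you past many basic bot checks, but not all of them: more sophisticated systems also look at fingerprints and at how your traffic behaves over time.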

A few more points matter once you move past the basics:

Don't neglect viewport and screen dimensions. Headless browsers can report 0x0 window dimensions, which tells any halfway decent detection system that it is talking to a bot.
Keep one consistent User-Agent per session instead of rotating it constantly. It sounds counterintuitive, but constant UA switching is itself a bot pattern that modern detection systems catch immediately.
playwright-stealth patches some of the JavaScript signals, but not all of them, especially the deeper CDP-layer leaks and behavioral red flags. For harder targets, consider CDP-minimal tools such as rebrowser-patches or camoufox.
TLS fingerprinting gets more attention than it deserves with Playwright. Because Playwright runs real Chromium with BoringSSL, your JA4 fingerprint already looks legitimate. The real gotchas are HTTP/2 SETTINGS frame ordering, how your requests behave over time, and JavaScript-level detection tricks.
Add residential proxy rotation and sensible pacing on top. With that combination, managed services can reach 98% success against Cloudflare and 89% against DataDome, and even DIY Playwright plus a residential proxy reaches 31% sustained success over 24 hours against enterprise WAFs, far better than the outdated 1-5% figures that still circulate.
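In scrapy-playwright, a proxy can be attached per browser context through Playwright's proxy option, which fits the one-identity-per-session advice: UA, viewport and exit IP all stay fixed for a session. The sketch below assumes proxy URLs in the common scheme://user:pass@host:port form. playwright_proxy and session_context_kwargs are hypothetical helpers, and their output is meant for the playwright_context_kwargs meta key together with a unique playwright_context name. Older Playwright documentation noted that Chromium on Windows needed a global launch proxy before per-context proxies worked, so check the docs for your version.

```python
from urllib.parse import unquote, urlsplit


def playwright_proxy(proxy_url: str) -> dict:
    """Turn "scheme://user:pass@host:port" into Playwright's proxy dict."""
    parts = urlsplit(proxy_url)
    # Keep host:port exactly as written, dropping any credentials before the "@"
    host = parts.netloc.rpartition("@")[2]
    proxy = {"server": f"{parts.scheme}://{host}"}
    if parts.username:
        proxy["username"] = unquote(parts.username)
    if parts.password:
        proxy["password"] = unquote(parts.password)
    return proxy


def session_context_kwargs(user_agent: str, proxy_url: str) -> dict:
    """Context options that pin one UA, one viewport and one exit IP to a session."""
    return {
        "user_agent": user_agent,
        "viewport": {"width": 1920, "height": 1080},
        "proxy": playwright_proxy(proxy_url),
    }
```

A request would then carry something like meta={"playwright": True, "playwright_context": "session-7", "playwright_context_kwargs": session_context_kwargs(ua, proxy_url)}, where the context name, ua and proxy_url are placeholders for your own values.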
