How to Avoid Bot Detection Using Scrapy and Playwright
When pure Scrapy isn't enough (the site checks for a real browser, executes complex JavaScript, or sits behind advanced anti-bot protection), it's time to bring in the heavy artillery: Scrapy + Playwright.
This guide shows you how to configure them together for maximum stealth, making your scraper look as close as possible to a real user browsing in Chrome.
1. Why Playwright?
Pure Scrapy is just an HTTP client: it has no screen, no mouse, and no JavaScript engine. Playwright drives a real browser (Chromium, Firefox, or WebKit), so it clears many basic "Are you a robot?" checks simply because it renders pages the way a human's browser does. It is not invisible out of the box, though, which is why the stealth tweaks below matter.
2. Installation
First, you need to install the integration plugin and the browsers.
Run these commands in your terminal:
pip install scrapy-playwright
playwright install chromium
3. Basic Configuration
You need to tell Scrapy to use Playwright for downloading pages instead of its default downloader.
Open settings.py and add/update these lines:
# settings.py
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# Required: scrapy-playwright only works with the asyncio-based Twisted reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
4. The "Stealth" Configuration (The Secret Sauce)
Just using Playwright isn't always enough. Sophisticated sites check for "automation flags" (variables that say "Hey, I'm being controlled by a script"). We need to disable them.
Add this to your settings.py:
# settings.py
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True,  # Set to False to see the browser pop up (good for debugging)
    "args": [
        "--disable-blink-features=AutomationControlled",  # <--- THE KEY to stealth
        "--no-sandbox",
    ],
}

# Context options go in PLAYWRIGHT_CONTEXTS; keys use Playwright's
# Python API, so they are snake_case (not camelCase as in the Node API)
PLAYWRIGHT_CONTEXTS = {
    "default": {
        "java_script_enabled": True,
        "ignore_https_errors": True,
        # Set a real browser viewport size
        "viewport": {"width": 1920, "height": 1080},
        # Set a real User-Agent (very important!)
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    },
}
--disable-blink-features=AutomationControlled: removes the "I am a robot" flag (navigator.webdriver) that Chrome normally exposes when controlled by code.
user_agent: we manually set a modern Chrome user agent, so requests don't advertise "HeadlessChrome".
5. How to Use It in Your Spider
Now that settings are configured, you need to tell your spider to use Playwright for specific requests.
In your spider file (e.g., spiders/myspider.py):
import scrapy

class StealthSpider(scrapy.Spider):
    name = "stealth"

    def start_requests(self):
        yield scrapy.Request(
            url="https://nowsecure.nl",  # A bot-detection test page
            meta={
                "playwright": True,
                "playwright_include_page": True,  # Optional: if you need to interact with the page
            },
            callback=self.parse,
            errback=self.errback_close_page,  # Close the page even if the request fails
        )

    async def parse(self, response):
        # Extract data normally
        title = response.css("title::text").get()
        self.logger.info("Title: %s", title)
        # If you need to interact (click/scroll), you get the 'page' object
        page = response.meta["playwright_page"]
        await page.close()

    async def errback_close_page(self, failure):
        # Without this, failed requests would leak open browser pages
        page = failure.request.meta.get("playwright_page")
        if page:
            await page.close()
6. Advanced Stealth: Randomizing User-Agents
Using the same User-Agent for every request is suspicious. Let's randomize it for every request.
Update your spider to create a fresh browser context per request, each with its own User-Agent:
import random

import scrapy

# List of real User-Agents
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
]

class RandomStealthSpider(scrapy.Spider):
    name = "random_stealth"

    def start_requests(self):
        ua = random.choice(USER_AGENTS)
        yield scrapy.Request(
            url="https://bot.sannysoft.com",  # A bot-detection test site
            meta={
                "playwright": True,
                # Context kwargs only apply when a context is created, so give
                # each request its own named context
                "playwright_context": f"ua-{random.randrange(10**6)}",
                "playwright_context_kwargs": {
                    "user_agent": ua,
                    "viewport": {"width": 1920, "height": 1080},
                },
            },
            callback=self.parse,
        )

    def parse(self, response):
        # ... extraction logic
        pass
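One refinement: a real browser's User-Agent and its other properties tend to agree with each other, and mixing a macOS User-Agent with a Windows-typical viewport is a small tell. A stdlib-only helper (the pairings below are illustrative, not exhaustive) that keeps the two consistent:

```python
import random

# Illustrative User-Agent / viewport pairings; extend with your own data
PROFILES = [
    {
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "viewport": {"width": 1920, "height": 1080},  # Common Windows resolution
    },
    {
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "viewport": {"width": 1440, "height": 900},  # Common MacBook resolution
    },
]

def random_context_kwargs() -> dict:
    """Pick a coherent UA + viewport pair to pass as playwright_context_kwargs."""
    profile = random.choice(PROFILES)
    return {
        "user_agent": profile["user_agent"],
        "viewport": dict(profile["viewport"]),
    }
```

In the spider above you would then build the meta dict from `random_context_kwargs()` instead of choosing the User-Agent alone.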
7. Complete settings.py for Copy-Paste
Here is the full configuration block for settings.py.
# settings.py

# 1. Enable Playwright
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# 2. Launch Options (The Browser App)
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True,  # Set False to watch it work
    "args": [
        "--disable-blink-features=AutomationControlled",  # Hides the 'robot' flag
        "--no-sandbox",
    ],
}

# 3. Context Options (The Browser Tab) -- kwargs are snake_case in Python
PLAYWRIGHT_CONTEXTS = {
    "default": {
        "java_script_enabled": True,
        "ignore_https_errors": True,
        "viewport": {"width": 1920, "height": 1080},
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    },
}
# 4. Standard Scrapy Politeness (Still applies!)
DOWNLOAD_DELAY = 2
CONCURRENT_REQUESTS = 4
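Beyond a fixed delay, Scrapy's AutoThrottle extension adapts request pacing to the server's response times, which reads as more human than a constant interval. These are optional additions to the same settings.py:

```python
# settings.py (optional politeness additions)
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2          # Initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 30           # Back off this far when the server slows down
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0 # Average parallel requests per remote server
RETRY_TIMES = 2                       # Don't hammer a page that keeps failing
```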
Summary
1. Install scrapy-playwright.
2. Configure DOWNLOAD_HANDLERS and TWISTED_REACTOR.
3. Add stealth args: --disable-blink-features=AutomationControlled is the most important line.
4. Use meta: pass meta={"playwright": True} in your requests.
With this setup, you are running a real Chromium browser that hides the most common automation signals. That defeats many basic bot-detection checks, though hardened services layer on fingerprinting and behavioral analysis that may still require proxies, residential IPs, and careful pacing.