Step-by-Step Guide to Using Scrapy with Playwright


Playwright is a newer browser automation tool than Selenium, and it is generally faster and less prone to flaky waits. For scraping pages that need JavaScript rendering, integrating it with Scrapy is often the preferred choice in modern projects.

Why Playwright?

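  • Faster: Execution is generally faster than with Selenium.

  • Better Waiting: Playwright auto-waits for elements to be ready before acting on them, so fewer explicit waits are needed.

  • Modern Web Support: Better handling of modern web features, such as pages whose content is rendered by JavaScript.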

Setup

We will use the scrapy-playwright plugin, which plugs Playwright into Scrapy as a download handler, so no custom middleware is needed.

  1. Install the package, then download the browser binaries that Playwright drives:

     pip install scrapy-playwright
     playwright install
    

Configuration

Update your settings.py to enable the scrapy-playwright download handler and to switch Twisted to the asyncio-based reactor, which the plugin requires:

# settings.py

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
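
Beyond the two required settings, the plugin reads a few optional ones. The sketch below shows three of them; the setting names come from scrapy-playwright, but the values are illustrative assumptions, not recommendations from this guide.

```python
# settings.py (optional additions; values are illustrative assumptions)

# Which Playwright browser to launch: "chromium", "firefox", or "webkit".
PLAYWRIGHT_BROWSER_TYPE = "chromium"

# Keyword arguments passed to the browser launch call.
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}

# Navigation timeout in milliseconds (30 seconds here).
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 30_000
```

Setting "headless" to False while developing can help, because you can watch what the browser is doing.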

Using Playwright in Your Spider

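To render a request with Playwright, pass meta={"playwright": True}. Requests without this key still go through Scrapy's regular download handler, so you only pay the browser cost where you need it.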

# spiders/playwright_spider.py
import scrapy


class PlaywrightSpider(scrapy.Spider):
    name = "playwright_spider"

    def start_requests(self):
        yield scrapy.Request(
            url="https://example.com/dynamic",
            meta={"playwright": True},
            callback=self.parse
        )

    def parse(self, response):
        # The response is now the rendered HTML from Playwright
        yield {
            "text": response.css("div.content::text").get()
        }
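
Because Playwright is enabled per request, a spider can mix browser-rendered and plain requests. Here is a minimal sketch of that idea, assuming a hypothetical helper named build_meta and a made-up list of target URLs. Neither is part of Scrapy or scrapy-playwright.

```python
def build_meta(needs_js: bool) -> dict:
    """Hypothetical helper: return the Request.meta dict for one URL."""
    # Only pages that need JavaScript rendering get the "playwright" key.
    return {"playwright": True} if needs_js else {}


# Made-up targets: (url, whether the page needs JavaScript rendering).
targets = [
    ("https://example.com/dynamic", True),
    ("https://example.com/static", False),
]

# Map each URL to the meta dict it would be requested with.
metas = {url: build_meta(needs_js) for url, needs_js in targets}
```

Inside start_requests you would then yield scrapy.Request(url, meta=build_meta(needs_js)) for each target.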

Advanced Usage: Page Interactions

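You can also interact with the page before the response is returned. Set the playwright_page_methods key in meta to a list of PageMethod objects. Each one names a method of Playwright's Page object and the arguments to call it with, and they run in the order listed.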

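# spiders/login_spider.py
import scrapy
from scrapy_playwright.page import PageMethod


class LoginSpider(scrapy.Spider):
    # Spider name and class name are illustrative
    name = "login_spider"

    def start_requests(self):
        yield scrapy.Request(
            url="https://example.com/login",
            meta={
                "playwright": True,
                # Runs in order: fill both fields, submit, wait for the dashboard
                "playwright_page_methods": [
                    PageMethod("fill", "input[name='user']", "myuser"),
                    PageMethod("fill", "input[name='pass']", "mypass"),
                    PageMethod("click", "button[type='submit']"),
                    PageMethod("wait_for_selector", "div.dashboard"),
                ],
            },
            callback=self.parse_dashboard,
        )

    def parse_dashboard(self, response):
        # The response holds the HTML as it is after the page methods have run
        yield {"dashboard": response.css("div.dashboard::text").get()}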
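
To make the ordering concrete, here is a conceptual sketch of how a list of named steps can be applied to a page object. It is not the plugin's actual code: Step stands in for PageMethod, FakePage is a hypothetical object that records calls instead of driving a browser, and run_steps is an assumed name. It reuses the selectors from the login example above.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Stand-in for PageMethod: a method name plus positional arguments."""
    method: str
    args: tuple = ()


@dataclass
class FakePage:
    """Hypothetical page that records calls instead of driving a browser."""
    calls: list = field(default_factory=list)

    def fill(self, selector, value):
        self.calls.append(("fill", selector, value))

    def click(self, selector):
        self.calls.append(("click", selector))

    def wait_for_selector(self, selector):
        self.calls.append(("wait_for_selector", selector))


def run_steps(page, steps):
    """Look up each method by name on the page and call it, in list order."""
    for step in steps:
        getattr(page, step.method)(*step.args)
    return page


steps = [
    Step("fill", ("input[name='user']", "myuser")),
    Step("fill", ("input[name='pass']", "mypass")),
    Step("click", ("button[type='submit']",)),
    Step("wait_for_selector", ("div.dashboard",)),
]
page = run_steps(FakePage(), steps)
```

The order of the list matters: the final wait_for_selector step is what keeps the response from being captured before the dashboard has loaded.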

Comparison with Selenium Integration

Feature      | Scrapy + Selenium | Scrapy + Playwright
-------------|-------------------|---------------------------
Setup        | Manual Middleware | Plugin (scrapy-playwright)
Speed        | Slower            | Faster
Ease of Use  | Moderate          | Easy (with plugin)
Reliability  | Good              | Excellent

Conclusion

For new projects that require JavaScript rendering, Scrapy + Playwright is a good default choice: the scrapy-playwright plugin needs only a few settings, and Playwright generally runs faster than Selenium.

Next Steps

In the next article, we will discuss how to debug Scrapy spiders effectively.
