
Essential AI Prompts to Boost Your Scrapy Development


Using AI tools like GitHub Copilot, ChatGPT, and Gemini Code Assist can significantly speed up your Scrapy workflow. However, the quality of the output depends heavily on the quality of your prompt. Below are detailed prompts for common Scrapy use cases.

1. Creating a New Spider

Use Case: You want to create a basic spider to scrape a list of products.

Prompt:

"Create a Scrapy spider named ProductSpider for the domain example.com.

  • Title: h2.product-title::text

* Price: .price::text (clean it to be a float) * Link: a.product-link::attr(href)

  • Pagination: Follow the link in a.next-page::attr(href) recursively.

  • Output: Yield a dictionary for each product. Please include the necessary imports and the full spider class."

2. Generating Configuration (Settings)

Use Case: You need a robust settings.py file that avoids bans and rotates user agents.

Prompt:

"Generate a settings.py configuration for a Scrapy project with the following requirements:

  1. Politeness: Set a download delay of 2 seconds and enable RANDOMIZE_DOWNLOAD_DELAY.

  2. User Agents: Configure a middleware to rotate user agents (assume scrapy-user-agents is installed).

  3. Robots.txt: Respect robots.txt rules.

  4. Concurrency: Limit concurrent requests to 16.

  5. Logging: Set log level to INFO and save logs to scrapy.log. Provide the code snippet to add to settings.py."
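The response to that prompt should look roughly like the snippet below (a sketch assuming the scrapy-user-agents package is installed; its middleware path is taken from that package's documentation):

```python
# settings.py additions

# 1. Politeness
DOWNLOAD_DELAY = 2
RANDOMIZE_DOWNLOAD_DELAY = True

# 2. User agent rotation: disable the built-in middleware,
#    enable the one from scrapy-user-agents
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "scrapy_user_agents.middlewares.RandomUserAgentMiddleware": 400,
}

# 3. Respect robots.txt
ROBOTSTXT_OBEY = True

# 4. Concurrency
CONCURRENT_REQUESTS = 16

# 5. Logging
LOG_LEVEL = "INFO"
LOG_FILE = "scrapy.log"
```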

3. Integrating Selenium

Use Case: You need to scrape a site that loads data via JavaScript, and you want to use Selenium.

Prompt:

"I need to integrate Selenium with Scrapy to scrape a dynamic website.

  1. Middleware: Write a custom SeleniumMiddleware that intercepts requests.

  2. Condition: It should only trigger if request.meta['selenium'] is True.

  3. Driver: Use a headless Chrome driver.

  4. Logic: The middleware should load the URL with Selenium, wait for the element div.content to appear, and then return an HtmlResponse object to Scrapy.

  5. Spider Usage: Show me how to call this in a spider's start_requests method."

4. Integrating Playwright

Use Case: You want to use the modern scrapy-playwright plugin for better performance.

Prompt:

"I want to use scrapy-playwright for my Scrapy project.

  1. Settings: Show me the DOWNLOAD_HANDLERS and TWISTED_REACTOR configuration needed in settings.py.

  2. Spider: Write a spider that uses Playwright to visit https://example.com/infinite-scroll.

  3. Interaction: The spider should scroll to the bottom of the page to trigger lazy loading before extracting data.

  4. Context: Explain how to pass playwright=True in the request meta."

5. Writing Complex XPath Selectors

Use Case: You are stuck trying to select a specific element.

Prompt:

"I have the following HTML snippet:

<div class="product">
  <div class="header">
    <span class="category">Electronics</span>
  </div>
  <div class="details">
    <label>Price:</label> <span>$500</span>
    <label>Stock:</label> <span>In Stock</span>
  </div>
</div>

Write an XPath selector to extract the price ('$500') specifically by looking for the 'Price:' label and getting its following sibling. Also, write a selector to get the category text."

6. Debugging a Spider

Use Case: Your spider is running but not finding any items.

Prompt:

"My Scrapy spider visits https://example.com but yields 0 items.

  • Logs: The logs show 200 OK responses.

  • Code: Here is my parse method: [INSERT CODE].

  • Issue: response.css('.item') returns an empty list.

  • Question: What are the common reasons for this? Could it be JavaScript rendering? How can I verify if the content is loaded dynamically using Scrapy shell or open_in_browser?"
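The answer usually points you at Scrapy shell, which lets you test the same questions yourself. A typical session looks like this (the URL is a placeholder):

```shell
# Inspect the live response interactively
scrapy shell "https://example.com"

# Then, inside the shell:
# >>> response.css('.item')           # empty? wrong selector, or JS-rendered content
# >>> 'item' in response.text         # is the class present in the raw HTML at all?
# >>> from scrapy.utils.response import open_in_browser
# >>> open_in_browser(response)       # view exactly what Scrapy received, JS disabled
```

If the page looks empty in `open_in_browser` but full in your normal browser, the content is loaded by JavaScript and you need Selenium or Playwright as shown above.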

7. Data Cleaning Pipeline

Use Case: You want to clean the scraped data before saving it.

Prompt:

"Write a Scrapy Item Pipeline named PriceCleaningPipeline.

  • Input: An item with a price field (e.g., '$1,200.50').

  • Logic: Remove the '$' and ',' characters and convert the string to a float.

  • Error Handling: If the price is missing or invalid, drop the item using DropItem.

  • Configuration: Show how to enable this pipeline in settings.py."

Conclusion

Using these detailed prompts will help you get accurate, working code snippets from AI tools, saving you time and effort in your Scrapy projects.

Tech Priya

Tech Priya is a knowledge blog where electronics, Python, and core tech concepts are explained using real-world analogies in Kannada-English, making learning clear, relatable, and enjoyable.