Essential AI Prompts to Boost Your Scrapy Development
Using AI tools such as GitHub Copilot, ChatGPT, or Gemini Code Assist can significantly speed up your Scrapy workflow. However, the quality of the output depends heavily on the quality of your prompt. Here are detailed prompts for various Scrapy use cases.
1. Creating a New Spider
Use Case: You want to create a basic spider to scrape a list of products.
Prompt:
"Create a Scrapy spider named `ProductSpider` for the domain `example.com`.
Start URL: `https://example.com/products`
Items to Extract:
- Title: `h2.product-title::text`
- Price: `.price::text` (clean it to be a float)
- Link: `a.product-link::attr(href)`
Pagination: Follow the link in `a.next-page::attr(href)` recursively.
Output: Yield a dictionary for each product. Please include the necessary imports and the full spider class."
2. Generating Configuration (Settings)
Use Case: You need a robust `settings.py` file that avoids bans and rotates user agents.
Prompt:
"Generate a `settings.py` configuration for a Scrapy project with the following requirements:
- Politeness: Set a download delay of 2 seconds and enable `RANDOMIZE_DOWNLOAD_DELAY`.
- User Agents: Configure a middleware to rotate user agents (assume `scrapy-user-agents` is installed).
- Robots.txt: Respect `robots.txt` rules.
- Concurrency: Limit concurrent requests to 16.
- Logging: Set log level to INFO and save logs to `scrapy.log`.
Provide the code snippet to add to `settings.py`."
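A reasonable answer is a snippet along these lines; the middleware path for `scrapy-user-agents` is the one that package documents, but treat the priorities (400) as a convention, not a requirement.

```python
# settings.py -- politeness and anti-ban configuration (sketch)

# Politeness: 2s base delay, randomized to 0.5x-1.5x of that value
DOWNLOAD_DELAY = 2
RANDOMIZE_DOWNLOAD_DELAY = True

# Respect robots.txt rules
ROBOTSTXT_OBEY = True

# Limit concurrency
CONCURRENT_REQUESTS = 16

# Rotate user agents (requires the scrapy-user-agents package)
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "scrapy_user_agents.middlewares.RandomUserAgentMiddleware": 400,
}

# Logging
LOG_LEVEL = "INFO"
LOG_FILE = "scrapy.log"
```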
3. Integrating Selenium
Use Case: You need to scrape a site that loads data via JavaScript, and you want to use Selenium.
Prompt:
"I need to integrate Selenium with Scrapy to scrape a dynamic website.
- Middleware: Write a custom `SeleniumMiddleware` that intercepts requests.
- Condition: It should only trigger if `request.meta['selenium']` is True.
- Driver: Use a headless Chrome driver.
- Logic: The middleware should load the URL with Selenium, wait for the element `div.content` to appear, and then return an `HtmlResponse` object to Scrapy.
- Spider Usage: Show me how to call this in a spider's `start_requests` method."
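The middleware this prompt asks for might be sketched as follows. The Selenium and Scrapy imports are deliberately done inside the methods here (a design choice, so the module can be imported without Selenium installed); the 10-second wait timeout is an arbitrary assumption.

```python
class SeleniumMiddleware:
    """Downloader middleware that renders opted-in requests with headless Chrome."""

    def __init__(self):
        # Lazy imports: a sketch choice so the module loads without Selenium present
        from selenium import webdriver
        from selenium.webdriver.chrome.options import Options

        options = Options()
        options.add_argument("--headless=new")
        self.driver = webdriver.Chrome(options=options)

    def process_request(self, request, spider):
        # Only render requests that explicitly opt in via meta
        if not request.meta.get("selenium"):
            return None  # fall through to Scrapy's default downloader

        from scrapy.http import HtmlResponse
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support import expected_conditions as EC
        from selenium.webdriver.support.ui import WebDriverWait

        self.driver.get(request.url)
        # Block until the JavaScript-rendered content appears
        WebDriverWait(self.driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "div.content"))
        )
        return HtmlResponse(
            url=self.driver.current_url,
            body=self.driver.page_source,
            encoding="utf-8",
            request=request,
        )
```

A spider then opts in per request, e.g. `yield scrapy.Request(url, meta={"selenium": True})` inside `start_requests`, and the middleware is registered under `DOWNLOADER_MIDDLEWARES` in `settings.py`.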
4. Integrating Playwright
Use Case: You want to use the modern `scrapy-playwright` plugin for better performance.
Prompt:
"I want to use `scrapy-playwright` for my Scrapy project.
- Settings: Show me the `DOWNLOAD_HANDLERS` and `TWISTED_REACTOR` configuration needed in `settings.py`.
- Spider: Write a spider that uses Playwright to visit `https://example.com/infinite-scroll`.
- Interaction: The spider should scroll to the bottom of the page to trigger lazy loading before extracting data.
- Context: Explain how to pass `playwright=True` in the request meta."
5. Writing Complex XPath Selectors
Use Case: You are stuck trying to select a specific element.
Prompt:
"I have the following HTML snippet:

```html
<div class="product">
  <div class="header">
    <span class="category">Electronics</span>
  </div>
  <div class="details">
    <label>Price:</label>
    <span>$500</span>
    <label>Stock:</label>
    <span>In Stock</span>
  </div>
</div>
```

Write an XPath selector to extract the price ('$500') specifically by looking for the 'Price:' label and getting its following sibling. Also, write a selector to get the category text."
6. Debugging a Spider
Use Case: Your spider is running but not finding any items.
Prompt:
"My Scrapy spider visits `https://example.com` but yields 0 items.
- Logs: The logs show `200 OK` responses.
- Code: Here is my parse method: [INSERT CODE].
- Issue: `response.css('.item')` returns an empty list.
- Question: What are the common reasons for this? Could it be JavaScript rendering? How can I verify if the content is loaded dynamically using Scrapy shell or `open_in_browser`?"
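Whichever causes the AI suggests, the verification steps it should point you to look like this illustrative Scrapy shell session (not real output):

```
$ scrapy shell "https://example.com"
>>> response.css('.item')          # empty? markup differs, or content is JS-rendered
>>> 'item' in response.text        # False suggests the class never arrives in raw HTML
>>> from scrapy.utils.response import open_in_browser
>>> open_in_browser(response)      # shows exactly what Scrapy downloaded, without JS
```

If the element is missing from the raw HTML but visible in your browser, the page is rendered by JavaScript and needs Selenium or Playwright (see sections 3 and 4).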
7. Data Cleaning Pipeline
Use Case: You want to clean the scraped data before saving it.
Prompt:
"Write a Scrapy Item Pipeline named `PriceCleaningPipeline`.
- Input: An item with a `price` field (e.g., '$1,200.50').
- Logic: Remove the '$' and ',' characters and convert the string to a float.
- Error Handling: If the price is missing or invalid, drop the item using `DropItem`.
- Configuration: Show how to enable this pipeline in `settings.py`."
Conclusion
Using these detailed prompts will help you get accurate, working code snippets from AI tools, saving you time and effort in your Scrapy projects.