How to Set Up a Scrapy Project: A Beginner's Guide


I’m Ravikirana B – an engineer driven by curiosity and clarity. My work sits at the intersection of hardware and software. I specialize in Python programming and electronics, building real-world solutions that don’t just work—they make sense. I started 'Tech Priya' with a simple mission: to share the joy of technology. "Priya" means dear or beloved, and this platform is dedicated to everyone who loves to understand the "why" and "how" behind the machines we use every day. What you’ll find here: 🔌 Electronics Simplified: Complex circuits explained with relatable analogies (think water tanks, gates, and traffic flows). 🐍 Python in Practice: Automation ideas, coding insights, and tool development. 💡 Real Reflections: Honest takes on tech, bridging the gap between textbook theory and hands-on reality. 🌿 Native Connection: Tech concepts explained with a Kannada-English touch to make learning feel like home. I believe technology shouldn't be a barrier. Whether you are a student from a small town or a self-learner with big dreams, Tech Priya is here to make the complex simple. Let’s keep exploring—clearly, curiously, and together. 🙌

Creating a New Scrapy Project

Once Scrapy is installed, the first step is to set up a new project. Navigate to the directory where you want to store your code and run:

scrapy startproject myproject

This will create a myproject directory with the following structure:

myproject/
    scrapy.cfg            # deploy configuration file
    myproject/            # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items definition file
        middlewares.py    # project middlewares file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # a directory where you'll later put your spiders
            __init__.py

Understanding the Project Structure

  • scrapy.cfg: The project configuration file. It points Scrapy to the project's settings module and holds deployment configuration.

  • items.py: Defines the data structures (containers) for the scraped data, similar to Django models.

  • middlewares.py: Defines spider and downloader middlewares, which are hooks that process requests and responses as they pass through Scrapy.

  • pipelines.py: Processes the scraped items (e.g., cleaning data, saving to a database).

  • settings.py: Contains project settings like user agent, download delay, and enabled pipelines.

  • spiders/: This is where your "spiders" (the classes that define how to scrape a site) will live.
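To make the roles of items.py and pipelines.py more concrete, here is a minimal sketch. It assumes a Scrapy version that accepts dataclass objects as items and uses the long-standing process_item(self, item, spider) pipeline signature. The names QuoteItem and CleanQuotePipeline are hypothetical, and the cleaning rule assumes the quote text may arrive wrapped in straight or curly quotation marks.

```python
from dataclasses import dataclass


@dataclass
class QuoteItem:
    """Hypothetical item for items.py: one scraped quote."""

    text: str
    author: str


class CleanQuotePipeline:
    """Hypothetical pipeline for pipelines.py: tidies up each scraped quote."""

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item and expects the item back.
        # Trim whitespace first, then any surrounding straight or curly quotes.
        item.text = item.text.strip().strip('"\u201c\u201d')
        item.author = item.author.strip()
        return item
```

A pipeline only runs once it is enabled in settings.py, for example with ITEM_PIPELINES = {"myproject.pipelines.CleanQuotePipeline": 300}, where the number sets the order in which enabled pipelines run (lower values run first).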

Basic Scrapy Commands

Scrapy provides a command-line tool to control your project. Here are some common commands:

  • scrapy shell [url]: Opens an interactive shell to try out selectors and debug.

  • scrapy crawl [spider_name]: Runs the spider whose name attribute matches spider_name. Run it from inside the project directory.

  • scrapy genspider [name] [domain]: Generates a new spider file.

Your First Spider

Let's create a simple spider to scrape quotes from quotes.toscrape.com.

  1. Navigate into your project: cd myproject

  2. Generate a spider: scrapy genspider quotes quotes.toscrape.com

This creates myproject/spiders/quotes.py with a skeleton spider. Let's edit it so that it looks like this:

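import scrapy


class QuotesSpider(scrapy.Spider):
    # The name used on the command line: scrapy crawl quotes
    name = "quotes"
    # Requests to domains outside this list are filtered out
    allowed_domains = ["quotes.toscrape.com"]
    # The crawl starts from these URLs
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Each quote on the page sits inside a <div class="quote"> element
        for quote in response.css("div.quote"):
            # Yield one item (a plain dict) per quote
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }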

Running the Spider

To run the spider and save the output to a JSON file:

scrapy crawl quotes -O quotes.json

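This command runs the quotes spider and writes the scraped items to quotes.json. The uppercase -O flag overwrites the file if it already exists; the lowercase -o flag appends to it instead.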
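Once the file exists, it can be read back with Python's standard json module. The sketch below assumes the JSON feed is a single array of objects with the text and author keys that the spider yields. The helper names load_quotes and quotes_per_author are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_quotes(path):
    """Read the JSON array written by the crawl command into a list of dicts."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def quotes_per_author(quotes):
    """Count how many scraped quotes belong to each author."""
    return Counter(quote["author"] for quote in quotes)
```

For example, quotes_per_author(load_quotes("quotes.json")).most_common(3) would list the three authors with the most scraped quotes.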

Next Steps

In the next article, we will compare Scrapy with other tools like Selenium and Playwright to understand when to use which.
