Beginner's Guide to Mastering CSS and XPath Selectors
Web scraping is all about selecting the right data. If you can't select it, you can't scrape it. In this guide, we will break down CSS and XPath selectors from the very basics to advanced filtering, so even if you've never used them before, you'll be a pro by the end.
1. What are Selectors?
Imagine a webpage is like a library.
HTML is the building.
Elements (like <div>, <a>, <p>) are the books.
Selectors are the instructions to find a specific book (e.g., "Go to the 3rd shelf, 2nd book from the left").
Scrapy uses two types of selectors:
CSS Selectors: Easy to read, similar to how you style websites.
XPath Selectors: More powerful, allows complex logic.
2. CSS Selectors: The Basics
CSS selectors are great for simple tasks.
Selecting by Tag
To select all paragraphs <p>:
response.css('p')
Selecting by Class (.)
To select elements with class="price":
response.css('.price')
Example HTML: <div class="price">100</div>
Selecting by ID (#)
To select an element with id="main-title":
response.css('#main-title')
Example HTML: <h1 id="main-title">Welcome</h1>
Combining Them
To select a div that has the class quote:
response.css('div.quote')
Nested Selection (Descendants)
To select a span inside a div with class quote:
response.css('div.quote span')
3. XPath Selectors: The Powerhouse
XPath looks a bit like a file path on your computer.
Selecting by Tag
To select all div elements:
response.xpath('//div')
// means "search anywhere in the document".
/ means "direct child" (must be immediately inside).
Selecting by Attribute
To select a div with class="quote":
response.xpath('//div[@class="quote"]')
@ is used for attributes (class, id, href, src, etc.).
Selecting by Text
This is where XPath shines. To select a button that says "Next Page":
response.xpath('//button[text()="Next Page"]')
Contains (Partial Match)
If the class is product-item active and you just want to match product-item:
response.xpath('//div[contains(@class, "product-item")]')
Or matching text that contains "Price":
response.xpath('//span[contains(text(), "Price")]')
4. Extracting Data: Getting the Good Stuff
Once you've selected the element, you need to extract the data (text, link, etc.).
Getting Text
CSS:
response.css('span.text::text').get()
XPath:
response.xpath('//span[@class="text"]/text()').get()
Getting Attributes (Links, Images)
To get the URL from <a href="https://example.com">:
CSS:
response.css('a::attr(href)').get()
XPath:
response.xpath('//a/@href').get()
get() vs getall()
get(): Returns the first match as a string.
getall(): Returns all matches as a list of strings.
# Get all quotes on the page
quotes = response.css('div.quote span.text::text').getall()
5. Advanced Filtering and Logic
Sometimes simple selection isn't enough.
"OR" Logic
Select h1 OR h2 tags:
response.xpath('//h1 | //h2')
"AND" Logic
Select a div that has BOTH class="item" AND data-id="123":
response.xpath('//div[@class="item" and @data-id="123"]')
Selecting Based on Position
Select the first item in a list:
response.xpath('//ul/li[1]')
Select the last item:
response.xpath('//ul/li[last()]')
Selecting Siblings (Neighbors)
Imagine this HTML:
<div class="label">Price:</div>
<div class="value">$50</div>
You want the price, but it has no unique class. You can find the "Price:" label and get the next element.
response.xpath('//div[text()="Price:"]/following-sibling::div[1]/text()').get()
Selecting Parent
You found a "Buy Now" button and want to get the product title, which is in a parent container.
response.xpath('//button[@class="buy-now"]/../h2/text()').get()
.. moves up to the parent.
6. Real-World Cheat Sheet
| Goal | CSS Example | XPath Example |
| --- | --- | --- |
| Get ID | #header | //*[@id="header"] |
| Get Class | .item | //*[@class="item"] |
| Get Attribute | a::attr(href) | //a/@href |
| Get Text | p::text | //p/text() |
| Contains Text | Not supported | //div[contains(text(), "Hello")] |
| Parent | Not supported | //div/.. |
| Next Sibling | div + span | //div/following-sibling::span[1] |
7. How to Practice
Open any website (e.g., quotes.toscrape.com).
Open your terminal and run:
scrapy shell "https://quotes.toscrape.com"
Try typing these commands:
>>> response.css('title::text').get()
'Quotes to Scrape'
>>> response.xpath('//span[@class="text"]/text()').get()
'“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'
Conclusion
CSS is great for speed and simplicity. XPath is essential for complex navigation (parents, siblings, text matching). Mastering both gives you the superpower to scrape almost any website!