{"id":64373,"date":"2025-01-21T13:12:28","date_gmt":"2025-01-21T13:12:28","guid":{"rendered":"https:\/\/proxidize.com\/?post_type=blog&#038;p=64373"},"modified":"2025-10-02T12:07:03","modified_gmt":"2025-10-02T11:07:03","slug":"scrapy-playwright","status":"publish","type":"blog","link":"https:\/\/proxidize.com\/blog\/scrapy-playwright\/","title":{"rendered":"Guide to Using Scrapy Playwright"},"content":{"rendered":"\n<p>Scrapy Playwright is a library that adds JavaScript rendering to Scrapy. It allows users to instruct a headless browser to scrape dynamic web pages and to simulate human behavior, which reduces the chance of spiders getting blocked. Scrapy is a web scraping framework whose architecture supports the most common web scraping processes, but despite its power, it lacks JavaScript rendering. This is where Playwright comes in. This tutorial will explain how to install and set up Scrapy Playwright, and how and why to use proxies with your setup.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized centered\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1010\" height=\"569\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/what-is-scrapy-playwright-2.png\" alt=\"Image of a large web page with a person sitting on top holding a computer, a second person is standing on a ladder and interacting with the web page. 
Text above the image reads &quot;What is Scrapy Playwright&quot;\" class=\"wp-image-64390\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/what-is-scrapy-playwright-2.png 1010w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/what-is-scrapy-playwright-2-300x169.png 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/what-is-scrapy-playwright-2-768x433.png 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/what-is-scrapy-playwright-2-600x338.png 600w\" sizes=\"(max-width: 1010px) 100vw, 1010px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">What is Scrapy Playwright?<\/h2>\n\n\n\n<p>Scrapy Playwright is an integration between Scrapy and Playwright. It allows users to scrape dynamic web pages with Scrapy by processing the <a href=\"https:\/\/proxidize.com\/use-cases\/web-scraping\/\">web scraping<\/a> requests with a Playwright instance. It also exposes most of Playwright\u2019s features, including simulating mouse and keyboard actions; waiting for events, load states, and HTML elements; taking screenshots; and executing custom <a href=\"https:\/\/proxidize.com\/blog\/what-is-javascript\/\" target=\"_blank\" rel=\"noreferrer noopener\">JavaScript<\/a> code.<\/p>\n\n\n\n<p><a href=\"https:\/\/playwright.dev\/\" target=\"_blank\" rel=\"noopener\">Playwright<\/a> is a relatively recent addition to the programming world: Microsoft released it in 2020, and it has quickly become a popular <a href=\"https:\/\/proxidize.com\/blog\/headless-browser\/\">headless browser<\/a> library for browser automation and web scraping. 
This is partly due to its cross-browser support and its developer experience improvements over <a href=\"https:\/\/proxidize.com\/blog\/puppeteer-alternatives\/\">Puppeteer<\/a>.<\/p>\n\n\n\n<p>Scrapy is a fast and powerful Python web scraping framework that can be used to crawl websites and scrape their data efficiently. We have previously written a <a href=\"https:\/\/proxidize.com\/blog\/scrapy-web-scraping\/\">guide showing how to use Scrapy in Python<\/a>. However, Scrapy is often more complex to use than other scraping libraries such as BeautifulSoup, and using it with a headless browser requires installing additional dependencies and configuring its settings.<\/p>\n\n\n\n<p>Scraping websites that rely on JavaScript to render their content requires a tool that can handle dynamic content, which is where Playwright comes in. It is an open-source automation library designed for end-to-end testing that also works well for web scraping. By combining both tools, Scrapy Playwright can handle complex web scraping tasks. Some users choose <a href=\"https:\/\/github.com\/scrapy-plugins\/scrapy-splash\" target=\"_blank\" rel=\"noopener\">Scrapy Splash<\/a> to handle JavaScript-heavy websites, but Playwright remains widely adopted among Scrapy users thanks to its powerful features and extensive documentation.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized centered\"><img decoding=\"async\" width=\"1010\" height=\"569\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/installing-scrapy-playwright-2.png\" alt=\"Image of a computer with a tractor on top of it carrying a web page. 
Text above the image reads &quot;Installing Scrapy Playwright&quot;\" class=\"wp-image-64387\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/installing-scrapy-playwright-2.png 1010w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/installing-scrapy-playwright-2-300x169.png 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/installing-scrapy-playwright-2-768x433.png 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/installing-scrapy-playwright-2-600x338.png 600w\" sizes=\"(max-width: 1010px) 100vw, 1010px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Installing Scrapy Playwright<\/h2>\n\n\n\n<p>Before you write a Scrapy Playwright script, you need to install the necessary libraries. We will be installing three Python libraries:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Scrapy, for creating a Scrapy project and running the scraping spiders.<\/li>\n\n\n\n<li>scrapy-playwright, for processing the requests with Playwright.<\/li>\n\n\n\n<li>Playwright, the API for automating headless browsers.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Scrapy Playwright is written in Python, so the first step is to make sure you have a recent version of Python installed. 
In your terminal, enter the following command:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>python3 --version<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #EA9A97\">python3<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">--version<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>Check the <a href=\"https:\/\/www.python.org\/\" target=\"_blank\" rel=\"noopener\">official Python website<\/a> to see whether you have the latest version. As of this writing, the latest version is 3.13.1.<\/p>\n\n\n\n<p>Once that is done, create a folder called scrapy-playwright-project and set up a Python virtual environment inside it. 
This can be done with the following commands in your terminal:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>mkdir scrapy-playwright-project\ncd scrapy-playwright-project\npython3 -m venv env\nsource env\/bin\/activate<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #EA9A97\">mkdir<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">scrapy-playwright-project<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EB6F92; font-style: italic\">cd<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">scrapy-playwright-project<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EA9A97\">python3<\/span><span style=\"color: 
#E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">-m<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">venv<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">env<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EB6F92; font-style: italic\">source<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">env\/bin\/activate<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>These commands create the project folder, move into it, and create and activate a virtual environment named env, where all the dependencies will be installed. On Windows, activate the environment with env\\Scripts\\activate instead. Next, install Scrapy. This might take a minute or so.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>pip3 install scrapy<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #EA9A97\">pip3<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">install<\/span><span 
style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">scrapy<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>Once Scrapy has been installed in your environment, create a new Scrapy project with the following command:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>scrapy startproject playwright_scraper<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #EA9A97\">scrapy<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">startproject<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: 
#F6C177\">playwright_scraper<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>This creates a playwright_scraper folder inside scrapy-playwright-project. It contains a scrapy.cfg file and a playwright_scraper package with the __init__.py, items.py, middlewares.py, pipelines.py, and settings.py files, plus a spiders folder. Move into the new folder with cd playwright_scraper, since the remaining Scrapy commands in this guide should be run from inside the project.<\/p>\n\n\n\n<p>Next, install the Scrapy Playwright library into your virtual environment by running the following command. Playwright is installed along with it, since it is a dependency of scrapy-playwright.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>pip3 install scrapy-playwright<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: 
#EA9A97\">pip3<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">install<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">scrapy-playwright<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>Finally, complete the Playwright setup by downloading the Chromium browser it will control. Run this command in your terminal:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>playwright install chromium<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #EA9A97\">playwright<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">install<\/span><span style=\"color: #E0DEF4\"> <\/span><span 
style=\"color: #F6C177\">chromium<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>If you wish to use a non-Chromium browser, replace chromium with firefox or webkit, and set PLAYWRIGHT_BROWSER_TYPE to the same value in settings.py. On Linux, you may also need the browser\u2019s system dependencies, which playwright install --with-deps chromium installs along with the browser. For the purposes of this guide, we will be using Chromium.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized centered\"><img decoding=\"async\" width=\"1010\" height=\"569\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/setting-up-scrapy-playwright-2.png\" alt=\"Image of two construction workers staring at a crane that is carrying web pages out of a phone. Text above the image reads &quot;Setting Up Scrapy Playwright&quot;\" class=\"wp-image-64388\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/setting-up-scrapy-playwright-2.png 1010w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/setting-up-scrapy-playwright-2-300x169.png 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/setting-up-scrapy-playwright-2-768x433.png 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/setting-up-scrapy-playwright-2-600x338.png 600w\" sizes=\"(max-width: 1010px) 100vw, 1010px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Setting Up Scrapy Playwright<\/h2>\n\n\n\n<p>Before we start writing the script, there is one more step: configuring Scrapy Playwright within your project. The previous steps installed the libraries the script needs; this step is the start of your web scraping project itself.<\/p>\n\n\n\n<p>Open the settings.py file that was created by the \u201cscrapy startproject playwright_scraper\u201d command. It is located in the inner playwright_scraper folder of your project. 
Add the following lines to register ScrapyPlaywrightDownloadHandler as the download handler for both http and https. This allows Scrapy to send HTTP and HTTPS requests through Playwright. Only requests whose meta sets the playwright key to True are rendered in the browser; all other requests are downloaded by Scrapy as usual.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>DOWNLOAD_HANDLERS = {\n        &quot;http&quot;: &quot;scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler&quot;,\n        &quot;https&quot;: &quot;scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler&quot;,\n\t}<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">DOWNLOAD_HANDLERS<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> 
<\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #F6C177\">&quot;http&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #F6C177\">&quot;https&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">\t<\/span><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>After adding these lines to the bottom of settings.py, you will need to enable the <a href=\"https:\/\/stackoverflow.com\/questions\/69247524\/how-to-run-async-twisted-reactor-and-asyncio-loops-infinitely\" target=\"_blank\" rel=\"noopener\">asyncio-based Twisted reactor<\/a>, which scrapy-playwright requires. Projects generated by recent versions of Scrapy already include this setting in settings.py, so check whether it is there before adding it.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" 
readonly>TWISTED_REACTOR = &quot;twisted.internet.asyncioreactor.AsyncioSelectorReactor&quot;<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">TWISTED_REACTOR<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;twisted.internet.asyncioreactor.AsyncioSelectorReactor&quot;<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>By default, Playwright runs the browser in headless mode. 
If you want to watch the browser perform its actions, add this setting to settings.py:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>PLAYWRIGHT_LAUNCH_OPTIONS = {\n        &quot;headless&quot;: False,\n\t}<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">PLAYWRIGHT_LAUNCH_OPTIONS<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #F6C177\">&quot;headless&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: 
#E0DEF4\"> <\/span><span style=\"color: #EA9A97\">False<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">\t<\/span><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>Chromium will now launch in headed mode with its UI visible. Keep in mind that the browser window may not appear if you are running the spider on <a href=\"https:\/\/ubuntu.com\/desktop\/wsl\" target=\"_blank\" rel=\"noopener\">WSL<\/a> without GUI app support, since WSL is primarily a command-line environment rather than a desktop.<\/p>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Writing the Scrapy Python Script<\/h3>\n\n\n\n<p>For this Scrapy Playwright example, we will be using <a href=\"https:\/\/scrapingclub.com\/exercise\/list_infinite_scroll\/\" target=\"_blank\" rel=\"noopener\">this URL<\/a> from the ScrapingClub exercises. It is an infinite-scroll page that loads more products dynamically as you scroll. First, enter the following command in your terminal:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>scrapy genspider scraping_club https:\/\/scrapingclub.com\/exercise\/list_infinite_scroll\/<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" 
stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #EA9A97\">scrapy<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">genspider<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">scraping_club<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">https:\/\/scrapingclub.com\/exercise\/list_infinite_scroll\/<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>This creates a new spider file called \u201cscraping_club.py\u201d, which Scrapy places in the spiders folder when the command is run from inside the project. This is where we will be writing the script. 
Once you open the file, you should see this initial script:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>import scrapy\n\nclass ScrapingClubSpider(scrapy.Spider):\n    name = &quot;scraping_club&quot;\n    allowed_domains = [&quot;scrapingclub.com&quot;]\n    start_urls = [&quot;https:\/\/scrapingclub.com\/exercise\/list_infinite_scroll\/&quot;]\n\n    def parse(self, response):\n        pass<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> scrapy<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span 
style=\"color: #3E8FB0\">class<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">ScrapingClubSpider<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">scrapy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #C4A7E7; font-style: italic\">Spider<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    name <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;scraping_club&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    allowed_domains <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #F6C177\">&quot;scrapingclub.com&quot;<\/span><span style=\"color: #908CAA\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    start_urls <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #F6C177\">&quot;https:\/\/scrapingclub.com\/exercise\/list_infinite_scroll\/&quot;<\/span><span style=\"color: #908CAA\">]<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">parse<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">response<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span 
style=\"color: #3E8FB0\">pass<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>To open the first page in a browser through Playwright instead of fetching it with a plain HTTP GET request, implement the start_requests() method rather than listing the starting URL in start_urls. It should look something like this:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>def start_requests(self):\n        url = &quot;https:\/\/scrapingclub.com\/exercise\/list_infinite_scroll\/&quot;\n        yield scrapy.Request(url, meta={&quot;playwright&quot;: True})<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: 
#3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">start_requests<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        url <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;https:\/\/scrapingclub.com\/exercise\/list_infinite_scroll\/&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">yield<\/span><span style=\"color: #E0DEF4\"> scrapy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">Request<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">url<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">meta<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #908CAA\">{<\/span><span style=\"color: #F6C177\">&quot;playwright&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">})<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>The meta={&quot;playwright&quot;: True} argument tells Scrapy to download the page through Scrapy Playwright instead of its default HTTP handler. Since start_requests() replaces start_urls, you can remove the start_urls attribute from the class. 
This is what the complete code of your new spider should look like:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>import scrapy\n\nclass ScrapingClubSpider(scrapy.Spider):\n    name = &quot;scraping_club&quot;\n    allowed_domains = [&quot;scrapingclub.com&quot;]\n\n    def start_requests(self):\n        url = &quot;https:\/\/scrapingclub.com\/exercise\/list_infinite_scroll\/&quot;\n        yield scrapy.Request(url, meta={&quot;playwright&quot;: True})\n\n    def parse(self, response):\n        # scraping logic...\n        pass<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: 
#3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> scrapy<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">class<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">ScrapingClubSpider<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">scrapy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #C4A7E7; font-style: italic\">Spider<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    name <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;scraping_club&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    allowed_domains <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #F6C177\">&quot;scrapingclub.com&quot;<\/span><span style=\"color: #908CAA\">]<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">start_requests<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        url <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;https:\/\/scrapingclub.com\/exercise\/list_infinite_scroll\/&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">yield<\/span><span 
style=\"color: #E0DEF4\"> scrapy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">Request<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">url<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">meta<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #908CAA\">{<\/span><span style=\"color: #F6C177\">&quot;playwright&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">})<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">parse<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">response<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> scraping logic...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">pass<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>Next, implement the scraping logic in the parse() method. Open the website in your browser and inspect a product HTML node with DevTools to define a data extraction strategy. The snippet below uses the css() method with CSS selectors to select all product HTML elements. 
It then iterates over them, extracts each product\u2019s data, and uses yield to produce one scraped item per product.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>def parse(self, response):\n        # iterate over the product elements\n        for product in response.css(&quot;.post&quot;):\n            # scrape product data\n            url = product.css(&quot;a&quot;).attrib[&quot;href&quot;]\n            image = product.css(&quot;.card-img-top&quot;).attrib[&quot;src&quot;]\n            name = product.css(&quot;h4 a::text&quot;).get()\n            price = product.css(&quot;h5::text&quot;).get()\n    \n            # add the data to the list of scraped items\n            yield {\n                &quot;url&quot;: url,\n                &quot;image&quot;: image,\n                &quot;name&quot;: name,\n                &quot;price&quot;: price\n            }<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" 
stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 
0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">parse<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">response<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> iterate over the product elements<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> product <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;.post&quot;<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> scrape product data<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            url <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> product<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;a&quot;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: 
#E0DEF4\">attrib<\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #F6C177\">&quot;href&quot;<\/span><span style=\"color: #908CAA\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            image <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> product<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;.card-img-top&quot;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">attrib<\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #F6C177\">&quot;src&quot;<\/span><span style=\"color: #908CAA\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            name <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> product<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;h4 a::text&quot;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            price <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> product<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;h5::text&quot;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> add the data to the list of 
scraped items<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">yield<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&quot;url&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> url<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&quot;image&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> image<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&quot;name&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> name<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&quot;price&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> price<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>The full script should look like this:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" 
class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>import scrapy\n\nclass ScrapingClubSpider(scrapy.Spider):\n    name = &quot;scraping_club&quot;\n    allowed_domains = [&quot;scrapingclub.com&quot;]\n\n    def start_requests(self):\n        url = &quot;https:\/\/scrapingclub.com\/exercise\/list_infinite_scroll\/&quot;\n        yield scrapy.Request(url, meta={&quot;playwright&quot;: True})\n\n    def parse(self, response):\n        # iterate over the product elements\n        for product in response.css(&quot;.post&quot;):\n            # scrape product data\n            url = product.css(&quot;a&quot;).attrib[&quot;href&quot;]\n            image = product.css(&quot;.card-img-top&quot;).attrib[&quot;src&quot;]\n            name = product.css(&quot;h4 a::text&quot;).get()\n            price = product.css(&quot;h5::text&quot;).get()\n\n            # add the data to the list of scraped items\n            yield {\n                &quot;url&quot;: url,\n                &quot;image&quot;: image,\n                &quot;name&quot;: name,\n                &quot;price&quot;: price\n            }<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: 
#3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> scrapy<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">class<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">ScrapingClubSpider<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">scrapy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #C4A7E7; font-style: italic\">Spider<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    name <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;scraping_club&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    allowed_domains <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #F6C177\">&quot;scrapingclub.com&quot;<\/span><span style=\"color: #908CAA\">]<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">start_requests<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        url <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;https:\/\/scrapingclub.com\/exercise\/list_infinite_scroll\/&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">yield<\/span><span style=\"color: #E0DEF4\"> scrapy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">Request<\/span><span style=\"color: 
#908CAA\">(<\/span><span style=\"color: #E0DEF4\">url<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">meta<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #908CAA\">{<\/span><span style=\"color: #F6C177\">&quot;playwright&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">})<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">parse<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">response<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> iterate over the product elements<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> product <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;.post&quot;<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> scrape product data<\/span><\/span>\n<span class=\"line\"><span 
style=\"color: #E0DEF4\">            url <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> product<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;a&quot;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">attrib<\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #F6C177\">&quot;href&quot;<\/span><span style=\"color: #908CAA\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            image <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> product<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;.card-img-top&quot;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">attrib<\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #F6C177\">&quot;src&quot;<\/span><span style=\"color: #908CAA\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            name <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> product<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;h4 a::text&quot;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            price <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> product<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;h5::text&quot;<\/span><span style=\"color: #908CAA\">).<\/span><span 
style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> add the data to the list of scraped items<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">yield<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&quot;url&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> url<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&quot;image&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> image<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&quot;name&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> name<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&quot;price&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> price<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized centered\"><img loading=\"lazy\" decoding=\"async\" width=\"1010\" height=\"569\" 
src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/using-proxies-with-scrapy-playwright-2.png\" alt=\"Image of a computer with a server behind it. There are three text boxes surrounding it which read &quot;Spider class customer_settings&quot; &quot;Meta dictionaries in start_requests&quot; and &quot;PLAYWRIGHT_CONTEXTS in settings.py&quot;. Text above the image reads &quot;Using Proxies with Scrapy Playwright&quot;\" class=\"wp-image-64389\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/using-proxies-with-scrapy-playwright-2.png 1010w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/using-proxies-with-scrapy-playwright-2-300x169.png 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/using-proxies-with-scrapy-playwright-2-768x433.png 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/01\/using-proxies-with-scrapy-playwright-2-600x338.png 600w\" sizes=\"(max-width: 1010px) 100vw, 1010px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Using Proxies with Scrapy Playwright<\/h2>\n\n\n\n<p>One of the biggest challenges in web scraping is getting blocked by anti-scraping measures such as rate limiting and IP bans. One of the most effective ways to avoid these IP-based blocks is to route your requests through a <a href=\"https:\/\/proxidize.com\/proxy-server\/\">proxy server<\/a>. Once you have chosen a residential proxy, a datacenter proxy, or a <a href=\"https:\/\/proxidize.com\/proxy-server\/mobile-proxy\/\">mobile proxy<\/a>, you can add it to your Scrapy Playwright spider. 
There are three ways you can set up proxies for your Scrapy Playwright script.<\/p>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Spider Class <code>custom_settings<\/code><\/h3>\n\n\n\n<p>You can pass the proxy as a browser launch option through the <code>custom_settings<\/code> attribute of your Scrapy Spider class:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>custom_settings = {\n   \t&quot;PLAYWRIGHT_LAUNCH_OPTIONS&quot;: {\n       \t&quot;proxy&quot;: {\n           \t&quot;server&quot;: &quot;X&quot;,\n           \t&quot;username&quot;: &quot;username&quot;,\n           \t&quot;password&quot;: &quot;password&quot;,\n       \t},\n   \t}\n   }<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 
2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">custom_settings <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   \t<\/span><span style=\"color: #F6C177\">&quot;PLAYWRIGHT_LAUNCH_OPTIONS&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \t<\/span><span style=\"color: #F6C177\">&quot;proxy&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">           \t<\/span><span style=\"color: #F6C177\">&quot;server&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;X&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">           \t<\/span><span style=\"color: #F6C177\">&quot;username&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;username&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">           \t<\/span><span style=\"color: #F6C177\">&quot;password&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;password&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \t<\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   \t<\/span><span 
style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   <\/span><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Meta dictionary in <code>start_requests<\/code><\/h3>\n\n\n\n<p>You can define the proxy within the <code>start_requests<\/code> function by passing it within the meta dictionary as such:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>def start_requests(self) -> Generator[scrapy.Request, None, None]:\n   \tyield scrapy.Request(\n       \turl,\n       \tmeta=dict(\n               playwright=True,\n               playwright_include_page=True,\n           \tplaywright_context_kwargs={\n               \t&#8220;proxy&#8221;: {\n                       &#8220;server&#8221;: &#8220;X&#8221;,\n                       &#8220;username&#8221;: &#8220;username&#8221;,\n                       &#8220;password&#8221;: &#8220;password&#8221;,\n               \t},\n           \t},\n       \t    errback=self.errback,\n       \t),\n   \t)<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 
2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">start_requests<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">-&gt;<\/span><span style=\"color: #E0DEF4\"> Generator<\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #E0DEF4\">scrapy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">Request<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">None<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">None<\/span><span style=\"color: #908CAA\">]:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   \t<\/span><span style=\"color: #3E8FB0\">yield<\/span><span style=\"color: #E0DEF4\"> scrapy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">Request<\/span><span style=\"color: #908CAA\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \turl<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \t<\/span><span style=\"color: #C4A7E7; font-style: italic\">meta<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #9CCFD8\">dict<\/span><span style=\"color: #908CAA\">(<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #E0DEF4\">               <\/span><span style=\"color: #C4A7E7; font-style: italic\">playwright<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">               <\/span><span style=\"color: #C4A7E7; font-style: italic\">playwright_include_page<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">           \t<\/span><span style=\"color: #C4A7E7; font-style: italic\">playwright_context_kwargs<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">               \t<\/span><span style=\"color: #F6C177\">&quot;proxy&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                       <\/span><span style=\"color: #F6C177\">&quot;server&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;X&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                       <\/span><span style=\"color: #F6C177\">&quot;username&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;username&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                       <\/span><span style=\"color: #F6C177\">&quot;password&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: 
#F6C177\">&quot;password&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">               \t<\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">           \t<\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \t    <\/span><span style=\"color: #C4A7E7; font-style: italic\">errback<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">errback<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \t<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   \t<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><code>PLAYWRIGHT_CONTEXTS<\/code> in settings.py<\/h3>\n\n\n\n<p>Finally, you can define the proxy you want to use within the Scrapy settings file:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><textarea class=\"code-block-pro-copy-button-textarea\" aria-hidden=\"true\" readonly>PLAYWRIGHT_CONTEXTS = {\n   &#8220;default&#8221;: {\n   \t&#8220;proxy&#8221;: {\n       \t&#8220;server&#8221;: &#8220;X&#8221;,\n       \t&#8220;username&#8221;: 
&#8220;username&#8221;,\n       \t&#8220;password&#8221;: &#8220;password&#8221;,\n   \t},\n   },\n   &#8220;alternative&#8221;: {\n   \t&#8220;proxy&#8221;: {\n       \t&#8220;server&#8221;: &#8220;X&#8221;,\n       \t&#8220;username&#8221;: &#8220;username&#8221;,\n       \t&#8220;password&#8221;: &#8220;password&#8221;,\n   \t},\n   },<\/textarea><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">PLAYWRIGHT_CONTEXTS<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   <\/span><span style=\"color: #F6C177\">&quot;default&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   \t<\/span><span style=\"color: #F6C177\">&quot;proxy&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \t<\/span><span style=\"color: #F6C177\">&quot;server&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span 
style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;X&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \t<\/span><span style=\"color: #F6C177\">&quot;username&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;username&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \t<\/span><span style=\"color: #F6C177\">&quot;password&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;password&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   \t<\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   <\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   <\/span><span style=\"color: #F6C177\">&quot;alternative&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   \t<\/span><span style=\"color: #F6C177\">&quot;proxy&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \t<\/span><span style=\"color: #F6C177\">&quot;server&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;X&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \t<\/span><span style=\"color: #F6C177\">&quot;username&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span 
style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;username&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">       \t<\/span><span style=\"color: #F6C177\">&quot;password&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;password&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   \t<\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">   <\/span><span style=\"color: #908CAA\">},<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Scrapy Playwright is a useful integration that enhances Scrapy\u2019s capabilities by enabling JavaScript rendering. This allows web scrapers to extract content from dynamic websites that rely on JavaScript to display information. 
While setting up Scrapy Playwright requires additional dependencies and configuration, it offers significant advantages, such as simulating human interactions and handling complex web pages.<\/p>\n\n\n\n<p>Key Takeaways:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Scrapy Playwright enables JavaScript rendering for Scrapy spiders, which makes it useful for scraping dynamic pages.<\/li>\n\n\n\n<li>Installation involves more than installing Scrapy and Playwright: you also need to install the browser engines and configure Scrapy to use the Playwright download handler.<\/li>\n\n\n\n<li>Proxies can help you avoid blocks when scraping JavaScript-heavy websites, and there are three ways to integrate them into Scrapy Playwright.<\/li>\n\n\n\n<li>Playwright operates asynchronously, so proper request handling is vital, and additional wait conditions may be needed to extract complete data.<\/li>\n\n\n\n<li>Scrapy Playwright is powerful but complex, and it needs more configuration than simpler libraries like BeautifulSoup or Selenium.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Using proxies with Scrapy Playwright is key to avoiding IP bans and other anti-scraping measures. By combining the efficiency of Scrapy with the flexibility of Playwright, developers can create more resilient and scalable web scrapers. 
With a careful implementation that handles asynchronous behavior and defines wait conditions, you can get the most out of your scraping project.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Frequently Asked Questions<\/h2>\n\n\n<div id=\"rank-math-faq\" class=\"rank-math-block\">\n<div class=\"rank-math-list \">\n<div id=\"faq-question-1737463587182\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">How can I manage multiple browser contexts in a Scrapy project using Playwright?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>In Scrapy Playwright, you can define multiple browser contexts to simulate different browser sessions in a single Scrapy spider. This is useful for situations such as logging in with different credentials or maintaining separate sessions. You can define these contexts in the <code>PLAYWRIGHT_CONTEXTS<\/code> setting and choose one for each request with the <code>playwright_context<\/code> meta key.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1737463641964\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">What is the role of the async def keyword in Scrapy Playwright spiders?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>The <code>async def<\/code> keyword defines an asynchronous function (a coroutine) in Python. In Scrapy Playwright, asynchronous callbacks allow non-blocking execution of tasks, such as awaiting Playwright page operations, so the spider can handle multiple I\/O-bound operations at the same time. 
This is useful when dealing with dynamic websites that require waiting for JavaScript-rendered content to load.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1737463664693\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">How does the Scrapy Download Handler integrate with Playwright to handle dynamic content?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Scrapy uses download handlers to fetch web pages. Scrapy Playwright provides its own download handler: once it is registered in the <code>DOWNLOAD_HANDLERS<\/code> setting, requests with <code>playwright=True<\/code> in their meta are loaded in a browser controlled by Playwright, which renders JavaScript-heavy pages. This allows Scrapy to retrieve fully rendered HTML, including dynamic elements that are not present in the initial page load.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1737463678449\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">What are some best practices for using mobile proxies to avoid anti-scraping measures in Scrapy Playwright?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>To use mobile proxies effectively in Scrapy Playwright and mitigate anti-scraping measures, rotate proxies regularly to spread requests across different IP addresses, add random delays between requests to mimic human browsing, send realistic user-agent and other HTTP headers, and monitor proxy performance and health to keep your scraper reliable.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1737463686010\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">How can I handle browser interactions, such as clicking and scrolling, in a Scrapy spider using Playwright?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>In Scrapy Playwright, you can perform browser interactions with Playwright\u2019s API from within your spider. You can instruct the browser to click buttons, fill out forms, or scroll through pages to load additional content. 
You can define these interactions in the <code>playwright_page_methods<\/code> key of your request's <code>meta<\/code> dictionary, as a list of <code>PageMethod<\/code> objects that are run on the page before the response is returned.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"author":2627,"featured_media":75409,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","format":"standard","categories":[110],"tags":[],"class_list":["post-64373","blog","type-blog","status-publish","format-standard","has-post-thumbnail","hentry","category-web-scraping-and-automation"],"acf":[],"_links":{"self":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/64373","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/users\/2627"}],"replies":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/comments?post=64373"}],"version-history":[{"count":5,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/64373\/revisions"}],"predecessor-version":[{"id":84818,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/64373\/revisions\/84818"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/media\/75409"}],"wp:attachment":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/media?parent=64373"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/categories?post=64373"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/tags?post=64373"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}