{"id":74349,"date":"2025-06-12T16:40:08","date_gmt":"2025-06-12T15:40:08","guid":{"rendered":"https:\/\/proxidize.com\/?post_type=blog&#038;p=74349"},"modified":"2025-11-27T13:44:29","modified_gmt":"2025-11-27T13:44:29","slug":"pagination-in-web-scraping","status":"publish","type":"blog","link":"https:\/\/proxidize.com\/blog\/pagination-in-web-scraping\/","title":{"rendered":"Handling Pagination in Web Scraping"},"content":{"rendered":"\n<p>Making sure your web scraper can navigate through all the possible results can pose an interesting challenge. While scraping just one page at a time is an acceptable method, it starts to become difficult once you have dozens to hundreds of pages to scrape or have to handle a click-to-load or endless scrolling page.<\/p>\n\n\n\n<p>Approximately <strong>65% of e-commerce websites use pagination<\/strong>, which highlights the value of learning this skill. As modern websites become more complex and dynamic, scrapers have to adapt to the changes.<\/p>\n\n\n\n<p>This <strong>article will explore and explain pagination in web scraping<\/strong> and present code samples detailing how to handle it, as well as going over some challenges that come up. 
We had to implement pagination for our <a href=\"https:\/\/proxidize.com\/blog\/twitter-scraper\/\" target=\"_blank\" data-type=\"blog\" data-id=\"87991\" rel=\"noreferrer noopener\">Twitter Scraper<\/a>, which can give you a practical example.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized centered\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1010\" height=\"569\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/what-is-pagination.png\" alt=\"Drawings of two arrows pointing away from each other, a circle with three dots, and a page with text that continues off the edge under the title &quot;What is Pagination?&quot;\" class=\"wp-image-74354\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/what-is-pagination.png 1010w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/what-is-pagination-300x169.png 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/what-is-pagination-768x433.png 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/what-is-pagination-600x338.png 600w\" sizes=\"(max-width: 1010px) 100vw, 1010px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">What is Pagination?<\/h2>\n\n\n\n<p>E-commerce platforms, job boards, and social media websites use pagination to handle large datasets. Showing everything on one page would result in slow download times and increased memory usage.<\/p>\n\n\n\n<p><strong>Pagination splits the content across multiple pages<\/strong>, making it easier to manage. 
Handling pagination is essential in <a href=\"https:\/\/proxidize.com\/use-cases\/web-scraping\/\" target=\"_blank\" rel=\"noreferrer noopener\">web scraping<\/a> whenever you want every available result rather than only those on the first page: it lets a scraper move through the pages systematically and collect the complete dataset.<\/p>\n\n\n\n<div style=\"height:6px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Types of Pagination<\/h3>\n\n\n\n<p>Understanding pagination in web scraping requires recognizing the different forms it can take across various websites.<\/p>\n\n\n\n<p>Pagination comes in many forms because websites experiment with different ways to keep visitors engaged. However, <strong>pagination can be broken down into three main categories<\/strong>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Numbered Pagination:<\/strong> Users move between separate pages through numbered links, and the page number typically appears in the URL as a path segment (for example \/page\/2) or a query parameter (for example ?page=2).<\/li>\n\n\n\n<li><strong>Click-to-Load Pagination:<\/strong> Users click a button, typically labeled \u201cLoad More\u201d or \u201cSee More\u201d, to reveal additional content. This gives the site more control over how much data is loaded at once.<\/li>\n\n\n\n<li><strong>Infinite Scrolling Pagination:<\/strong> Content loads automatically as the user scrolls further down the page. 
This creates a seamless browsing experience without needing to constantly click through pages or scroll down and click a button.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized centered\"><img decoding=\"async\" width=\"1010\" height=\"569\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/tackling-pagination-in-web-scraping.png\" alt=\"Image of a woman pressing the &quot;Next&quot; button on a large webpage. Text above the image reads &quot;Tackling Pagination in Web Scraping&quot;\" class=\"wp-image-74352\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/tackling-pagination-in-web-scraping.png 1010w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/tackling-pagination-in-web-scraping-300x169.png 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/tackling-pagination-in-web-scraping-768x433.png 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/tackling-pagination-in-web-scraping-600x338.png 600w\" sizes=\"(max-width: 1010px) 100vw, 1010px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Tackling Pagination in Web Scraping<\/h2>\n\n\n\n<p>Mastering pagination in web scraping requires adapting to how different sites structure and load content. 
What sets pagination apart from other scraping tasks is that almost every website structures it differently.<\/p>\n\n\n\n<p>From static and changing URLs to load-more and infinite scroll pages, knowing how each website operates and how to tackle the many forms of pagination will prepare you for the challenges that come your way.<\/p>\n\n\n\n<p>Additionally, <strong>a scraper that requests a single URL only sees the first page of a paginated listing<\/strong>, which is why you may have faced difficulties when using a typical web scraper. The sections below provide the code needed to understand and implement pagination in web scraping.<\/p>\n\n\n\n<div style=\"height:6px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Numbered Pagination<\/h3>\n\n\n\n<p>Numbered pagination, often called \u201cNext and Previous Pagination\u201d, \u201cArrow Pagination\u201d, or \u201cURL-Based Pagination\u201d, uses <strong>discrete page links that are displayed at the bottom of the page and allow users to jump between pages<\/strong>.<\/p>\n\n\n\n<p>It is one of the easiest methods to scrape because the URL changes incrementally, making it straightforward to iterate through pages. <strong>To scrape websites with numbered pagination<\/strong>, identify the base URL and the URL pattern, then <strong>increment the page parameter in a loop until the last page is reached<\/strong>, checking for a stop condition (such as an empty page) so the scraper knows when to finish.<\/p>\n\n\n\n<p>We will be using the website \u201c<a href=\"https:\/\/www.scrapethissite.com\/pages\/forms\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">ScrapeThisSite<\/a>\u201d as practice for this example. 
If you open that link and click through the pages, you will notice the URL changes slightly with the addition of \u201c?page_num=x\u201d, where x is the current page\u2019s number.<\/p>\n\n\n\n<p>If you inspect the page and check the \u201cNext\u201d button, you will notice it is an anchor tag with an href attribute that links to the next page. Its <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/Accessibility\/ARIA\/Reference\/Attributes\/aria-label\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">aria-label attribute<\/a> gives the button an accessible name, which also makes it easy to target. When analyzing a webpage for scraping, using CSS selectors to target <a href=\"https:\/\/www.w3schools.com\/jsref\/prop_element_children.asp\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">child elements<\/a> or specific attribute selectors allows for precise data extraction.<\/p>\n\n\n\n<p>With all that in mind, here is a script that prints all the data available in this collection.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>import requests\nfrom bs4 import BeautifulSoup\n\nbase_url = \"https:\/\/www.scrapethissite.com\/pages\/forms\/\"\nsession = requests.Session()\npage_num = 1\n\nwhile True:\n    response = session.get(base_url, params={'page_num': page_num})\n    if 
response.status_code != 200:\n        break\n\n    soup = BeautifulSoup(response.text, 'html.parser')\n\n    rows = soup.select('tr.team')\n    if not rows:\n        break\n\n    print(f\"Scraping page {page_num}\u2026\")\n    for row in rows:\n        cells = row.find_all('td')\n        if len(cells) >= 3:\n            team_name = cells&#91;0&#93;.get_text(strip=True)\n            wins = cells&#91;1&#93;.get_text(strip=True)\n            losses = cells&#91;2&#93;.get_text(strip=True)\n            print(f\"Team: {team_name}, Wins: {wins}, Losses: {losses}\")\n\n    next_btn = soup.select_one('li.next')\n    if next_btn and 'disabled' in next_btn.get('class', []):\n        break\n\n    page_num += 1\n\nprint(\"Done.\")<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> requests<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">from<\/span><span style=\"color: #E0DEF4\"> bs4 <\/span><span style=\"color: #3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> BeautifulSoup<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">base_url <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span 
style=\"color: #F6C177\">&quot;https:\/\/www.scrapethissite.com\/pages\/forms\/&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">session <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> requests<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">Session<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">page_num <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">while<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    response <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> session<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">base_url<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">params<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #908CAA\">{<\/span><span style=\"color: #F6C177\">&#39;page_num&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> page_num<\/span><span style=\"color: #908CAA\">})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">status_code <\/span><span style=\"color: #3E8FB0\">!=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">200<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">break<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    soup <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> BeautifulSoup<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">text<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;html.parser&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    rows <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> soup<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">select<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;tr.team&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">not<\/span><span style=\"color: #E0DEF4\"> rows<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">break<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #EB6F92; font-style: italic\">print<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Scraping page <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">page_num<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">\u2026&quot;<\/span><span style=\"color: 
#908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> row <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> rows<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        cells <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> row<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">find_all<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;td&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">cells<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">&gt;=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">3<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            team_name <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> cells<\/span><span style=\"color: #908CAA\">&#91;<\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">&#93;.<\/span><span style=\"color: #E0DEF4\">get_text<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">strip<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            wins <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> cells<\/span><span 
style=\"color: #908CAA\">&#91;<\/span><span style=\"color: #EA9A97\">1<\/span><span style=\"color: #908CAA\">&#93;.<\/span><span style=\"color: #E0DEF4\">get_text<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">strip<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            losses <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> cells<\/span><span style=\"color: #908CAA\">&#91;<\/span><span style=\"color: #EA9A97\">2<\/span><span style=\"color: #908CAA\">&#93;.<\/span><span style=\"color: #E0DEF4\">get_text<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">strip<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #EB6F92; font-style: italic\">print<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Team: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">team_name<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">, Wins: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">wins<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">, Losses: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">losses<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    next_btn <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> 
soup<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">select_one<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;li.next&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> next_btn <\/span><span style=\"color: #3E8FB0\">and<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;disabled&#39;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> next_btn<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;class&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[]):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">break<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    page_num <\/span><span style=\"color: #3E8FB0\">+=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #EB6F92; font-style: italic\">print<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;Done.&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>In this code, the <strong>scraper will continue to increment the page_num variable and scrape every page until it reaches the end<\/strong>. 
Keep a few limitations in mind, however: some websites generate page numbers dynamically or use <a href=\"https:\/\/proxidize.com\/blog\/what-is-javascript\/\" target=\"_blank\" data-type=\"blog\" data-id=\"83360\" rel=\"noreferrer noopener\">JavaScript<\/a> to load content, and this script will not work on them as written.<\/p>\n\n\n\n<p><strong>Not all numbered pagination is visible in the URL<\/strong>; when it is not, the page usually fetches its data through an <a href=\"https:\/\/www.w3schools.com\/whatis\/whatis_ajax.asp#:~:text=AJAX%20allows%20web%20pages%20to,without%20reloading%20the%20whole%20page.\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">AJAX call<\/a>. In some cases, that call goes to an API endpoint whose response includes pagination details, which lets you loop over the endpoint directly instead of parsing HTML.<\/p>\n\n\n\n<div style=\"height:6px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Click-to-Load Pagination<\/h3>\n\n\n\n<p>Click-to-load pagination, usually seen as a \u201cLoad More\u201d button at the bottom of the page, <strong>dynamically loads new content on the same page<\/strong>. 
This will require the scraper to simulate a click event repeatedly to load all available content.<\/p>\n\n\n\n<p>To handle the dynamic loading of the content as well as having to simulate a click each time, tools like Selenium or Playwright can be used to automate the process by repeatedly clicking the button until no more content is available.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>from selenium import webdriver\nfrom selenium.webdriver.common.by import By\nimport time\n\n# Start browser\ndriver = webdriver.Chrome()\ndriver.get(\"https:\/\/www.scrapingcourse.com\/button-click\")\n\n# Allow full page load\ntime.sleep(5)\n\n# Keep clicking Load More while button exists\nwhile True:\n    try:\n        # Scroll down to bottom of the page to trigger lazy loading\n        driver.execute_script(\"window.scrollTo(0, document.body.scrollHeight);\")\n        time.sleep(1)\n\n        load_more_button = driver.find_element(By.XPATH, \"\/\/button&#91;contains(text(), 'Load More')&#93;\")\n        driver.execute_script(\"arguments&#91;0&#93;.scrollIntoView(true);\", load_more_button)\n        time.sleep(0.5)\n        load_more_button.click()\n        time.sleep(2)\n    except:\n        break\n\n# Wait for final content to load\ntime.sleep(3)\n\n# Grab product data after everything loaded\nitems = 
driver.find_elements(By.CLASS_NAME, \"card-body\")\nfor item in items:\n    name = item.find_element(By.TAG_NAME, \"h5\").text\n    price = item.find_element(By.TAG_NAME, \"p\").text\n    print(f\"{name} - {price}\")\n\ndriver.quit()<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">from<\/span><span style=\"color: #E0DEF4\"> selenium <\/span><span style=\"color: #3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> webdriver<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">from<\/span><span style=\"color: #E0DEF4\"> selenium<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">webdriver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">common<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">by <\/span><span style=\"color: #3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> By<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> time<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Start browser<\/span><\/span>\n<span class=\"line\"><span 
style=\"color: #E0DEF4\">driver <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> webdriver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">Chrome<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">driver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;https:\/\/www.scrapingcourse.com\/button-click&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Allow full page load<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">time<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #EA9A97\">5<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Keep clicking Load More while button exists<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">while<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">try<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Scroll down to bottom of the page to trigger lazy loading<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        driver<\/span><span 
style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">execute_script<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;window.scrollTo(0, document.body.scrollHeight);&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        time<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #EA9A97\">1<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        load_more_button <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> driver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">find_element<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">By<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #3E8FB0\">XPATH<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;\/\/button&#91;contains(text(), &#39;Load More&#39;)&#93;&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        driver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">execute_script<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;arguments&#91;0&#93;.scrollIntoView(true);&quot;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> load_more_button<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        time<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #EA9A97\">0.5<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #E0DEF4\">        load_more_button<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">click<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        time<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #EA9A97\">2<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">except<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">break<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Wait for final content to load<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">time<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #EA9A97\">3<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Grab product data after everything loaded<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">items <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> driver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">find_elements<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">By<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #3E8FB0\">CLASS_NAME<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span 
style=\"color: #F6C177\">&quot;card-body&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> item <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> items<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    name <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> item<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">find_element<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">By<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #3E8FB0\">TAG_NAME<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;h5&quot;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">text<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    price <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> item<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">find_element<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">By<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #3E8FB0\">TAG_NAME<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;p&quot;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">text<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #EB6F92; font-style: italic\">print<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">name<\/span><span style=\"color: 
#3E8FB0\">}<\/span><span style=\"color: #F6C177\"> - <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">price<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">driver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">quit<\/span><span style=\"color: #908CAA\">()<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>Click-to-load pagination in web scraping often involves simulating repeated user actions like button clicks.<\/strong> In the example above, the find_element method locates the \u201cLoad More\u201d button and click() loads the next batch of content. Once the button no longer appears, find_element raises an exception and the loop exits.<\/p>\n\n\n\n<p>Be careful when using this script, as <strong>too many requests might result in a CAPTCHA test<\/strong>, which could slow down your operations. If necessary, <strong>lengthen the time delay between requests<\/strong> or use a rotating <a href=\"https:\/\/proxidize.com\/proxy-server\/mobile-proxy\/\">mobile proxy<\/a> to reduce the chances of getting a CAPTCHA; more on that later.<\/p>\n\n\n\n<div style=\"height:6px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Infinite Scroll Pagination<\/h3>\n\n\n\n<p>Unlike numbered pagination or click-to-load pagination, <strong>infinite scroll automatically loads more content as the user scrolls down.<\/strong> While this makes navigation easier for users, it complicates scraping because the page relies on JavaScript and dynamically loaded content.
<strong>Playwright, which supports automation with <a href=\"https:\/\/www.chromium.org\/Home\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Chromium<\/a>-based browsers, can handle infinite scrolling.<\/strong><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>import asyncio\nfrom playwright.async_api import async_playwright\nasync def scroll_to_bottom(page):\n    while True:\n        previous_height = await page.evaluate(\"document.body.scrollHeight\")\n        await page.evaluate(\"window.scrollTo(0, document.body.scrollHeight)\")\n        await asyncio.sleep(2)\n        new_height = await page.evaluate(\"document.body.scrollHeight\")\n        if new_height == previous_height:\n            break\nasync def scrape_infinite_scroll(url):\n    async with async_playwright() as p:\n        browser = await p.chromium.launch()\n        page = await browser.new_page()\n        await page.goto(url)\n        await scroll_to_bottom(page)\n        # Extract data after fully loading the page\n        content = await page.content()\n        print(content)\n        await browser.close()\nasyncio.run(scrape_infinite_scroll(\"https:\/\/example.com\/items\"))<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" 
stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> asyncio<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">from<\/span><span style=\"color: #E0DEF4\"> playwright<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">async_api <\/span><span style=\"color: #3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> async_playwright<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">async<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">scroll_to_bottom<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">page<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">while<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        previous_height <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: 
#E0DEF4\">evaluate<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;document.body.scrollHeight&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">evaluate<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;window.scrollTo(0, document.body.scrollHeight)&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> asyncio<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #EA9A97\">2<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        new_height <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">evaluate<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;document.body.scrollHeight&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> new_height <\/span><span style=\"color: #3E8FB0\">==<\/span><span style=\"color: #E0DEF4\"> previous_height<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">break<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">async<\/span><span 
style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">scrape_infinite_scroll<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">url<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">async<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">with<\/span><span style=\"color: #E0DEF4\"> async_playwright<\/span><span style=\"color: #908CAA\">()<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">as<\/span><span style=\"color: #E0DEF4\"> p<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        browser <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> p<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">chromium<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">launch<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        page <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> browser<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">new_page<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">goto<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">url<\/span><span style=\"color: 
#908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> scroll_to_bottom<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">page<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Extract data after fully loading the page<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        content <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">content<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #EB6F92; font-style: italic\">print<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">content<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> browser<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">close<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">asyncio<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">run<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">scrape_infinite_scroll<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;https:\/\/example.com\/items&quot;<\/span><span style=\"color: #908CAA\">))<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:24px\" 
aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>The code scrolls down until no more content is loaded<\/strong>: it compares document.body.scrollHeight before and after each scroll and stops once the height no longer changes. Detecting when to stop is the hard part, because a fixed two-second wait can end the loop early if new content takes longer than that to load.<\/p>\n\n\n\n<p>Additionally, some websites implement <strong>lazy-loading where content is not loaded until it is visible in the viewport<\/strong>. While pagination in web scraping enables full data coverage, it also introduces several technical challenges.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized centered\"><img decoding=\"async\" width=\"1010\" height=\"569\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/challenges-with-pagination-in-web-scraping.png\" alt=\"Image of a computer screen showing gears, a lock, and an emergency sign and two fishing hooks picking up items.
Text above the image reads &quot;Challenges with Pagination in Web Scraping&quot;\" class=\"wp-image-74353\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/challenges-with-pagination-in-web-scraping.png 1010w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/challenges-with-pagination-in-web-scraping-300x169.png 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/challenges-with-pagination-in-web-scraping-768x433.png 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/06\/challenges-with-pagination-in-web-scraping-600x338.png 600w\" sizes=\"(max-width: 1010px) 100vw, 1010px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Challenges with Pagination in Web Scraping<\/h2>\n\n\n\n<p>When working with pagination in web scraping, the main risk to watch for is getting banned and then having to <a href=\"https:\/\/proxidize.com\/blog\/bypass-ip-ban\/\" target=\"_blank\" data-type=\"blog\" data-id=\"59358\" rel=\"noreferrer noopener\">bypass an IP ban<\/a>. Some websites block access when too many requests are sent, while others present the user with a CAPTCHA challenge.<\/p>\n\n\n\n<p>Requesting many pages in quick succession can also trigger a <a href=\"https:\/\/proxidize.com\/blog\/403-error\/\" target=\"_blank\" rel=\"noreferrer noopener\">403 error<\/a>, which in a scraping context typically indicates that a bot detection system has blocked you from accessing the website.
While there are <a href=\"https:\/\/proxidize.com\/blog\/captcha-solvers\/\">CAPTCHA solvers<\/a> you can implement within your code, a better approach is to avoid triggering a CAPTCHA in the first place.<\/p>\n\n\n\n<p>To avoid encountering a CAPTCHA or suffering an IP ban, consider routing your requests through a rotating <a href=\"https:\/\/proxidize.com\/proxy-server\/\" target=\"_blank\" rel=\"noreferrer noopener\">proxy server<\/a> so that your traffic appears to come from multiple different sources. You could similarly rotate your <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/HTTP\/Reference\/Headers\/User-Agent\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">User Agent<\/a> so that your requests look like they come from a range of real browsers.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Perfecting pagination in web scraping is essential for handling modern websites that present data across multiple pages, segments, or dynamic content blocks.
From simple pagination buttons to more complex JavaScript-based pagination, each system requires an understanding of the site\u2019s URL structure and content loading behavior, along with a pagination technique suited to it, to ensure accurate and successful data retrieval.<\/p>\n\n\n\n<p>Using tools like Beautiful Soup for static pages or browser automation platforms like Selenium and Playwright for dynamic content loading, you can tailor your approach to match the site\u2019s architecture.<\/p>\n\n\n\n<p><strong>Key takeaways:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Pagination in web scraping is <strong>essential when dealing with large datasets<\/strong> spread across multiple pages.<\/li>\n\n\n\n<li>Different websites implement numbered pagination, click-to-load pagination, and infinite scrolling, <strong>each requiring a specific scraping approach<\/strong>.<\/li>\n\n\n\n<li><strong>Numbered pagination is often the easiest<\/strong> to handle as page numbers are typically visible in the URL.<\/li>\n\n\n\n<li><strong>Click-to-load and infinite scroll typically require tools like Selenium<\/strong> or Playwright to simulate user behavior during the scraping process.<\/li>\n\n\n\n<li>Handling pagination correctly <strong>helps ensure complete data extraction<\/strong> without missing segments of important content.<\/li>\n<\/ol>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>This guide covered three pagination methods: numbered pagination, click-to-load, and infinite scrolling, with code examples for each.<\/p>\n\n\n\n<p>While some cases can be handled with simple code, others demand more advanced techniques to overcome obstacles like anti-bot detection and asynchronous pagination scraping.<\/p>\n\n\n\n<p>By using the scripts provided and keeping in mind challenges such as CAPTCHAs and IP bans, you can scrape paginated websites ranging from e-commerce sites to social media platforms.
With the right approach, you can overcome the challenges of pagination in web scraping and achieve comprehensive data extraction.<\/p>\n","protected":false},"author":2627,"featured_media":74972,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","format":"standard","categories":[110],"tags":[],"class_list":["post-74349","blog","type-blog","status-publish","format-standard","has-post-thumbnail","hentry","category-web-scraping-and-automation"],"acf":[],"_links":{"self":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/74349","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/users\/2627"}],"replies":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/comments?post=74349"}],"version-history":[{"count":8,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/74349\/revisions"}],"predecessor-version":[{"id":90505,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/74349\/revisions\/90505"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/media\/74972"}],"wp:attachment":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/media?parent=74349"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/categories?post=74349"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/tags?post=74349"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}