{"id":87991,"date":"2025-10-30T14:27:16","date_gmt":"2025-10-30T14:27:16","guid":{"rendered":"https:\/\/proxidize.com\/?post_type=blog&#038;p=87991"},"modified":"2026-01-13T17:26:42","modified_gmt":"2026-01-13T17:26:42","slug":"twitter-scraper","status":"publish","type":"blog","link":"https:\/\/proxidize.com\/blog\/twitter-scraper\/","title":{"rendered":"Twitter Scraper: How to Scrape Twitter for Free"},"content":{"rendered":"\n<p>Let\u2019s say you want to know people&#8217;s opinions on a specific topic: maybe you want to run sentiment analysis for research purposes, or you\u2019re a software engineer who has been tasked with scraping (this is me). There\u2019s no better place to do it than Twitter\/X. Millions of people use it every day to tweet and talk about every possible topic under the sun. However, to do this at any meaningful scale (or with any efficiency), you must scrape, i.e. use code that collects data for you while you drink coffee or browse the internet.<\/p>\n\n\n\n<p>If you&#8217;ve got a tech background, you&#8217;ll likely be familiar with web scraping. If not, we&#8217;ll walk you through it. This article will be a step-by-step breakdown of how I built an open-source Twitter scraper that allows you to scrape all (or some) of an X account&#8217;s posts. 
Too many guides leave you with just that, so we went a step further and have included an AI integration for analysis of the tweets you&#8217;ve scraped \u2014 something that might appeal to everyone from researchers and academics to OSINT Twitter.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Despite having far fewer monthly active users (611 million compared to Facebook\u2019s 3 billion), <a href=\"https:\/\/proxidize.com\/research\/twitter-statistics\/\" target=\"_blank\" rel=\"noreferrer noopener\">Twitter statistics<\/a> show that the average Twitter\/X user engages with the site for longer and more deeply than other social media sites. The rules are completely different, with many people using Twitter at work, using it to follow real-time news and events, and more.<\/p>\n\n\n\n<p>As a platform, Twitter\/X is one of the toughest ones to scrape. 
It uses anti-bot detection, CAPTCHAs, and IP bans to keep people from scraping, but with <a href=\"https:\/\/proxidize.com\/proxy-server\/\" target=\"_blank\" rel=\"noreferrer noopener\">proxies<\/a> and IP rotation, it can still be done.<\/p>\n\n\n\n<p>Together we\u2019ll go on the step-by-step journey of how I suffered to deliver this amazing code. We\u2019ll cover the technology stack I used, why cookies are so important, and how to use them properly. I\u2019ll also explain how I figured out Twitter\/X\u2019s infinite scrolling, when to stop scrolling, and how to pick up where you left off across multiple sessions. <strong>If you\u2019re not interested in the journey and just want the repo,&nbsp;<em><a href=\"#conclusion\" data-type=\"internal\" data-id=\"#conclusion\">here you go<\/a><\/em>.<\/strong> We also have a <a href=\"https:\/\/proxidize.com\/blog\/reddit-scraper\/\" data-type=\"blog\" data-id=\"80626\" target=\"_blank\" rel=\"noreferrer noopener\">Reddit scraper<\/a> you might be interested in.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large centered\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/twitter-scraping-the-technology-stack-its-not-just-about-speed-1024x536.jpg\" alt=\"\" class=\"wp-image-88268\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/twitter-scraping-the-technology-stack-its-not-just-about-speed-1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/twitter-scraping-the-technology-stack-its-not-just-about-speed-300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/twitter-scraping-the-technology-stack-its-not-just-about-speed-768x402.jpg 768w, 
https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/twitter-scraping-the-technology-stack-its-not-just-about-speed-600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/twitter-scraping-the-technology-stack-its-not-just-about-speed.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"technology-stack\">Twitter Scraping: The Technology Stack (It\u2019s Not Just About Speed)<\/h2>\n\n\n\n<p>For this project I needed to choose an ecosystem that fit the needs of scraping a platform as strict as Twitter\/X. My first instinct was to prioritise speed over quality, which turned out to be a big mistake. I went with <a href=\"https:\/\/proxidize.com\/blog\/web-scraping-with-selenium\/\" target=\"_blank\" rel=\"noreferrer noopener\">Python and Selenium<\/a> first, but the data it returned was not accurate enough. I then switched to Python with Playwright, which gave the project a great mix of speed and accuracy.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Python Twitter Scraper: The Obvious Choice<\/h3>\n\n\n\n<p><a href=\"https:\/\/proxidize.com\/blog\/what-is-python\/\" target=\"_blank\" rel=\"noreferrer noopener\">Python<\/a> dominates web scraping. If you have seen a <a href=\"https:\/\/proxidize.com\/blog\/web-scraping\/\" target=\"_blank\" rel=\"noreferrer noopener\">web scraping<\/a> tutorial, a data pipeline, or an automation script, chances are high that it\u2019s written in Python.<\/p>\n\n\n\n<p>Python isn\u2019t <em>just<\/em> popular because it\u2019s \u201ceasy to learn\u201d (though it is). It\u2019s popular because it solves the entire workflow. 
<a href=\"https:\/\/proxidize.com\/blog\/python-libraries-for-web-scraping\/\" target=\"_blank\" rel=\"noreferrer noopener\">Python libraries for web scraping<\/a>, parsing, processing, analysing, exporting \u2014 there are mature libraries for every single step. You don\u2019t have to switch languages halfway through your project; it\u2019s Python from start to finish.<\/p>\n\n\n\n<p>Python\u2019s syntax reads like English, which means less time spent debugging errors (I hope) and more time building. What\u2019s Python\u2019s real power? Its ecosystem.<\/p>\n\n\n\n<p>Selenium, Playwright, <a href=\"https:\/\/proxidize.com\/blog\/what-is-beautifulsoup\/\" target=\"_blank\" rel=\"noreferrer noopener\">BeautifulSoup<\/a>, Scrapy: every major scraping tool has Python support. The community is everywhere, and when you hit a problem, chances are someone has already solved it on Stack Overflow, or you can just use AI to help you.<\/p>\n\n\n\n<p>Here\u2019s the part people miss: Python isn&#8217;t just good at scraping, it\u2019s good at everything that comes after scraping. Some developers say &#8220;Python is only for scraping,\u201d and they\u2019re wrong. Python excels at data processing, data analysis, AI integration, and automation.<\/p>\n\n\n\n<p>What that means for our Twitter\/X scraper is that we can:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Scrape with Playwright (browser automation + network interception)<\/li>\n\n\n\n<li>Process with JMESPath (parsing X\u2019s nested JSON)<\/li>\n\n\n\n<li>Analyse with OpenAI (sentiment, topics, trends)<\/li>\n\n\n\n<li>Export with Pandas (CSV for spreadsheet users)<\/li>\n\n\n\n<li>Store with aiofiles (async file operations)<\/li>\n\n\n\n<li>Display with Rich (beautiful terminal output)<\/li>\n<\/ul>\n\n\n\n<p>And we can do it all in the same language. No context switching. No rewriting. The same codebase from data collection to AI-powered insights. 
That\u2019s the beauty of Python: everything you need in one ecosystem.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Selenium Twitter Scraping: A Choice Between the New and the Old<\/h3>\n\n\n\n<p>I started this project with Selenium. I\u2019ve used Selenium for many projects and have never had major issues with it. Selenium WebDriver and proxies seemed like the perfect combo for a Twitter scraper. This project was different.<\/p>\n\n\n\n<p>Twitter\/X\u2019s anti-bot detection is aggressive and I kept running into walls:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Proxy connection issues:<\/strong> Selenium\u2019s proxy authentication required a hacky workaround with Chrome extensions.<\/li>\n\n\n\n<li><strong>Getting blocked constantly:<\/strong> Twitter\/X detected my scrolling patterns as bot behaviour.<\/li>\n\n\n\n<li><strong>Human behaviour simulation:<\/strong> Mimicking natural scrolling in Selenium felt clunky and unreliable.<\/li>\n<\/ul>\n\n\n\n<p>I tried tweaking delays, randomising scroll amounts, and rotating user agents, but X kept catching me. The scraper would run for 200\u2013300 tweets, then get blocked. 
I would restart, and I even came up with the idea of scraping in sessions, but I still kept getting blocked (frustrating).<\/p>\n\n\n\n<p>That\u2019s when I decided to switch to Playwright mid-project.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Why Playwright to Scrape Twitter?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Native proxy support:<\/strong> Built-in authentication without Chrome extensions or hacks (thank god).<\/li>\n\n\n\n<li><strong>Network interception:<\/strong> Capture X\u2019s GraphQL API responses instead of parsing HTML.<\/li>\n\n\n\n<li><strong>Better anti-detection:<\/strong> Playwright\u2019s flags were a big help.<\/li>\n\n\n\n<li><strong>Faster execution:<\/strong> Noticeable speed improvements over Selenium.<\/li>\n<\/ul>\n\n\n\n<p>So, halfway through development, I refactored the entire codebase to use Playwright; a complete rewrite. Was it worth it?<\/p>\n\n\n\n<p>Absolutely. The results were amazing:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Proxy connections are stable and fast<\/li>\n\n\n\n<li>No more random blocks from Twitter\/X<\/li>\n\n\n\n<li>Scraping sessions became 2\u20133x faster<\/li>\n\n\n\n<li>Cleaner code<\/li>\n<\/ul>\n\n\n\n<p>I felt like Selenium was fighting Twitter\/X\u2019s UI. By comparison, Playwright intercepts X\u2019s API. That\u2019s the difference between scraping what users see versus scraping what the application actually uses. I don\u2019t regret starting with Selenium \u2014 it helped me understand the problem. But switching to Playwright was the turning point for the project.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Discovering GraphQL: Twitter\u2019s Hidden API Goldmine<\/h3>\n\n\n\n<p>Most people building scrapers get it wrong at first: the instinct is always to scrape the HTML. You load a profile page, find the divs with CSS selectors, and extract the text \u2014 your life is good. 
Then, suddenly, X changes its UI and all of your selectors return null.<\/p>\n\n\n\n<p>I was doing the same thing with Selenium when I started: dealing with CSS selectors, parsing HTML, and cleaning up text. Then a friend asked me why I wasn\u2019t just taking advantage of the XHR requests. So I opened Chrome DevTools and kept an eye on the Network tab for requests like these:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>https:\/\/x.com\/i\/api\/graphql\/V7H0Br3k...\/UserTweets\nhttps:\/\/x.com\/i\/api\/graphql\/G3KGOASz...\/UserByScreenName\nhttps:\/\/x.com\/i\/api\/graphql\/B9Pw8l1f...\/TweetDetail<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki 
rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #e0def4\">https:\/\/x.com\/i\/api\/graphql\/V7H0Br3k...\/UserTweets<\/span><\/span>\n<span class=\"line\"><span style=\"color: #e0def4\">https:\/\/x.com\/i\/api\/graphql\/G3KGOASz...\/UserByScreenName<\/span><\/span>\n<span class=\"line\"><span style=\"color: #e0def4\">https:\/\/x.com\/i\/api\/graphql\/B9Pw8l1f...\/TweetDetail<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>They weren\u2019t hidden or secret; they were just the endpoints Twitter\/X\u2019s front-end uses to load data. It was a real \u201cEureka!\u201d moment for me. The responses were the clean, structured JSON I wanted. Twitter\/X doesn&#8217;t render tweets as HTML on the server; it fetches JSON from its <a href=\"https:\/\/graphql.org\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">GraphQL<\/a> API, then renders it in the browser with <a href=\"https:\/\/proxidize.com\/blog\/what-is-javascript\/\" target=\"_blank\" rel=\"noreferrer noopener\">JavaScript<\/a>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">What GraphQL Actually Gives Us<\/h4>\n\n\n\n<p>When I looked at the GraphQL responses, I realised X\u2019s API returned more data than was visible in the UI. That\u2019s when I knew I needed to shift my focus to it. 
It\u2019d be even more useful once I got AI analysis involved \u2014 the more information we have the better.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large centered\"><img decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-scrape-twitter-profile-grabbing-everything-about-the-account-1024x536.jpg\" alt=\"\" class=\"wp-image-88269\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-scrape-twitter-profile-grabbing-everything-about-the-account-1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-scrape-twitter-profile-grabbing-everything-about-the-account-300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-scrape-twitter-profile-grabbing-everything-about-the-account-768x402.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-scrape-twitter-profile-grabbing-everything-about-the-account-600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-scrape-twitter-profile-grabbing-everything-about-the-account.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"scrape-twitter-profile\">How to Scrape Twitter Profile: Grabbing Everything About the Account<\/h2>\n\n\n\n<p>To test out my code, I needed to pick someone\u2019s profile. 
I chose <a href=\"https:\/\/x.com\/FabrizioRomano\" target=\"_blank\" rel=\"noreferrer noopener\">Fabrizio Romano<\/a> because the man tweets up to 30 times a day and has had strong opinions throughout his entire career.<\/p>\n\n\n\n<p>Here\u2019s the data our Twitter\/X scraper grabbed from his profile:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>{\n  \"data\": {\n    \"user\": {\n      \"result\": {\n        \"rest_id\": \"330262748\",\n        \"legacy\": {\n          \"screen_name\": \"FabrizioRomano\",\n          \"name\": \"Fabrizio Romano\",\n          \"followers_count\": 26479397,\n          \"friends_count\": 2649,\n          \"statuses_count\": 64187,\n          \"verified\": true,\n          \"profile_image_url_https\": \"...\",\n          \"profile_banner_url\": \"...\",\n          \"description\": \"Here we go! 
\u00a9...\",\n          \"location\": \"\",\n          \"created_at\": \"...\"\n        }\n      }\n    }\n  }\n}<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">data<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">user<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">result<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: 
#908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">rest_id<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;330262748&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">legacy<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">screen_name<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;FabrizioRomano&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">name<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Fabrizio Romano&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">followers_count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">26479397<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: 
#9CCFD8\">friends_count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">2649<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">statuses_count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">64187<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">verified<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">true<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">profile_image_url_https<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;...&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">profile_banner_url<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;...&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">description<\/span><span 
style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Here we go! \u00a9...&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">location<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">created_at<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;...&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>This is the data you can pull from individual tweets, which includes everything from number of replies, how many views it got, and so on:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" 
style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>{\n  \"legacy\": {\n    \"full_text\": \"\ud83d\udea8\u26a0\ufe0f Breaking transfer news...\",\n    \"created_at\": \"Wed Oct 15 11:16:01 +0000 2025\",\n    \"retweet_count\": 152,\n    \"favorite_count\": 2436,\n    \"reply_count\": 250,\n    \"quote_count\": 10,\n    \"entities\": {\n      \"hashtags\": &#091;{\"text\": \"TransferNews\"}&#093;,\n      \"urls\": &#091;{\n        \"url\": \"https:\/\/t.co\/...\",\n        \"expanded_url\": \"https:\/\/...\"\n      }&#093;,\n      \"media\": &#091;{\n        \"type\": \"photo\",\n        \"media_url_https\": \"https:\/\/pbs.twimg.com\/...\"\n      }&#093;\n    }\n  },\n  \"views\": {\n    \"count\": \"118042\"\n  }\n}<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span 
class=\"line\"><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">legacy<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">full_text<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;\ud83d\udea8\u26a0\ufe0f Breaking transfer news...&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">created_at<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Wed Oct 15 11:16:01 +0000 2025&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">retweet_count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">152<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">favorite_count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">2436<\/span><span style=\"color: 
#908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">reply_count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">250<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">quote_count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">10<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">entities<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">hashtags<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;{<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">text<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;TransferNews&quot;<\/span><span style=\"color: #908CAA\">}&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">urls<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: 
#908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">url<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;https:\/\/t.co\/...&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">expanded_url<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;https:\/\/...&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">}&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">media<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">type<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;photo&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">media_url_https<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: 
#E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;https:\/\/pbs.twimg.com\/...&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">}&#093;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">views<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;118042&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>Concretely, this means that we now have access to the following information:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Full tweet text (not truncated)<\/li>\n\n\n\n<li>Exact engagement metrics<\/li>\n\n\n\n<li>View counts<\/li>\n\n\n\n<li>Media URLs<\/li>\n\n\n\n<li>Profile images and banners<\/li>\n\n\n\n<li>Exact timestamps (easy to convert to ISO format)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How GraphQL Shifted the Paradigm<\/h3>\n\n\n\n<p>Making this discovery changed how I approached the problem in an important way.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Before:<\/strong> \u201cHow do I find the right CSS 
selectors to extract this data from the HTML?\u201d<\/li>\n\n\n\n<li><strong>After:<\/strong> \u201cHow do I intercept the GraphQL responses that X is already fetching?\u201d<\/li>\n<\/ul>\n\n\n\n<p>The GraphQL API was obviously not designed for scrapers. It was designed for X\u2019s own engineers. As a result:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>It\u2019s stable:<\/strong> X\u2019s own web app depends on it, so it tends to change less often than the page markup.<\/li>\n\n\n\n<li><strong>It\u2019s full of data:<\/strong> It returns more data than the UI itself displays.<\/li>\n\n\n\n<li><strong>It\u2019s structured:<\/strong> It has a consistent JSON schema, not just random HTML div soup.<\/li>\n<\/ul>\n\n\n\n<p>This made it a great foundation for collecting the data that feeds the AI sentiment analysis.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Network Interception: Capturing Clean JSON Instead of Messy HTML<\/h3>\n\n\n\n<p>Finding X\u2019s GraphQL API was a big win, and it\u2019s where Playwright won out for me. Selenium wasn\u2019t designed to intercept network responses. It\u2019s built for DOM interactions like clicking buttons, filling forms, and finding elements.<\/p>\n\n\n\n<p>If you want to capture API responses in Selenium, you typically need browser extensions, proxy servers, or hacky workarounds that break often. 
Playwright has network interception built in.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Interceptor: One Line that Changes Everything<\/h3>\n\n\n\n<p>Here\u2019s the one line of code that made it all possible:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>self.page.on(\"response\", self._intercept_response)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">on<\/span><span 
style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;response&quot;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">_intercept_response<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>That\u2019s it! Now every HTTP response the browser receives triggers your callback function. When X loads its own tweets, you just need to capture the JSON.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large centered\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-intercept-xs-api_-1024x536.jpg\" alt=\"\" class=\"wp-image-88337\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-intercept-xs-api_-1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-intercept-xs-api_-300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-intercept-xs-api_-768x402.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-intercept-xs-api_-600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/how-to-intercept-xs-api_.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"intercept-x-api\">How to Intercept X\u2019s API<\/h2>\n\n\n\n<p>The snippet below is the actual interceptor I use in the scraper:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" 
style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>async def _intercept_response(self, response: Response):\n    try:\n        if response.request.resource_type in &#091;\"xhr\", \"fetch\"&#093;:\n            url = response.url\n            \n            if 'graphql' in url.lower() or 'api.twitter.com' in url or 'api.x.com' in url:\n                \n                if 'UserTweets' in url:\n                    self.logger.info(\"Parsing UserTweets response\")\n                    data = await response.json()\n                    self._parse_tweets_from_timeline(data)\n                    \n                elif 'UserByScreenName' in url:\n                    self.logger.info(\"Parsing UserByScreenName response\")\n                    data = await response.json()\n                    self._parse_user_data(data)\n                    \n                elif 'TweetDetail' in url or 'TweetResultByRestId' in url:\n                    self.logger.info(\"Parsing TweetDetail response\")\n                    data = await response.json()\n                    self._parse_single_tweet(data)\n              \n    except Exception as e:\n        self.logger.debug(f\"Error in response interceptor: {e}\")<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 
00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">async<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">_intercept_response<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">response<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> Response<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">try<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">request<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">resource_type <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&quot;xhr&quot;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;fetch&quot;<\/span><span style=\"color: #908CAA\">&#093;:<\/span><\/span>\n<span class=\"line\"><span 
style=\"color: #E0DEF4\">            url <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">url<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;graphql&#39;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> url<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">lower<\/span><span style=\"color: #908CAA\">()<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">or<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;api.twitter.com&#39;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> url <\/span><span style=\"color: #3E8FB0\">or<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;api.x.com&#39;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> url<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;UserTweets&#39;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> url<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: 
italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;Parsing UserTweets response&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    data <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">json<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">_parse_tweets_from_timeline<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">data<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">elif<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;UserByScreenName&#39;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> url<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: 
#F6C177\">&quot;Parsing UserByScreenName response&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    data <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">json<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">_parse_user_data<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">data<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">elif<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;TweetDetail&#39;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> url <\/span><span style=\"color: #3E8FB0\">or<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;TweetResultByRestId&#39;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> url<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span 
style=\"color: #F6C177\">&quot;Parsing TweetDetail response&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    data <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">json<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">_parse_single_tweet<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">data<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">              <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">except<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Exception<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">as<\/span><span style=\"color: #E0DEF4\"> e<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">debug<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Error in response interceptor: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">e<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: 
#908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>This runs in the background while the browser scrolls. X loads more tweets and the interceptor catches the responses.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Scraping Twitter with Selenium vs Playwright<\/h3>\n\n\n\n<p>Let\u2019s illustrate how differently scraping Twitter\/X plays out in practice with Selenium and with Playwright.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Scraping X with Selenium<\/h4>\n\n\n\n<p>Before I switched away from Selenium, my setup looked like this:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>WebDriverWait(driver, 10).until(\n    EC.presence_of_element_located((By.CSS_SELECTOR, '&#091;data-testid=\"tweet\"&#093;'))\n)\n\ntweets = driver.find_elements(By.CSS_SELECTOR, '&#091;data-testid=\"tweet\"&#093;')\n\nfor tweet in tweets:\n    text = tweet.find_element(By.CSS_SELECTOR, '&#091;data-testid=\"tweetText\"&#093;').text<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 
0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">WebDriverWait<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">driver<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">10<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">until<\/span><span style=\"color: #908CAA\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">EC<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">presence_of_element_located<\/span><span style=\"color: #908CAA\">((<\/span><span style=\"color: #E0DEF4\">By<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #3E8FB0\">CSS_SELECTOR<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#091;data-testid=&quot;tweet&quot;&#093;&#39;<\/span><span style=\"color: #908CAA\">))<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">tweets <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> driver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">find_elements<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">By<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #3E8FB0\">CSS_SELECTOR<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: 
#E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#091;data-testid=&quot;tweet&quot;&#093;&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> tweet <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> tweets<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    text <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweet<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">find_element<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">By<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #3E8FB0\">CSS_SELECTOR<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#091;data-testid=&quot;tweetText&quot;&#093;&#39;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">text<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>With Selenium, you need to wait for elements to load, then find all tweet elements, then parse each one and hope the selectors still match. 
This is how most scrapers work, targeting HTML.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Intercepting X\u2019s API with Playwright<\/h4>\n\n\n\n<p>By contrast, using Playwright to intercept X\u2019s API is simple. With the response listener already registered, all the scraper has to do is scroll:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>await self.page.evaluate('window.scrollBy(0, window.innerHeight * 0.8)')<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: 
#E0DEF4\">page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">evaluate<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;window.scrollBy(0, window.innerHeight * 0.8)&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>You\u2019re not waiting for elements to render and you don\u2019t have to find selectors. You\u2019re just capturing the data X is already fetching. The browser is essentially doing the work and we\u2019re just intercepting the results.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large centered\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/parsing-xs-timeline-data-1024x536.jpg\" alt=\"\" class=\"wp-image-88348\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/parsing-xs-timeline-data-1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/parsing-xs-timeline-data-300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/parsing-xs-timeline-data-768x402.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/parsing-xs-timeline-data-600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/parsing-xs-timeline-data.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"timeline-data\">Parsing X\u2019s Timeline Data<\/h2>\n\n\n\n<p>When X loads tweets, the scraper parses the intercepted GraphQL response like this:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" 
style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>def _parse_tweets_from_timeline(self, data: Dict):\n    try:\n        instructions = jmespath.search(\n            'data.user.result.timeline_v2.timeline.instructions', \n            data\n        )\n        \n        if not instructions:\n            self.logger.warning(\"No timeline instructions found\")\n            return\n        \n        for instruction in instructions:\n            if instruction.get('type') == 'TimelineAddEntries':\n                entries = instruction.get('entries', [])\n                self.logger.info(f\"Found {len(entries)} entries in timeline\")\n                \n                tweet_count = 0\n                for entry in entries:\n                    entry_id = entry.get('entryId', '')\n                    \n                    if not entry_id.startswith('tweet-'):\n                        continue\n                    \n                    tweet_result = jmespath.search(\n                        'content.itemContent.tweet_results.result', \n                        entry\n                    )\n                    \n                    if tweet_result:\n                        parsed_tweet = self._extract_tweet_data(tweet_result)\n                        if parsed_tweet and parsed_tweet&#091;'id'&#093; not in self.scraped_tweet_ids:\n                            self.all_tweets.append(parsed_tweet)\n                            
self.scraped_tweet_ids.add(parsed_tweet&#091;'id'&#093;)\n                            tweet_count += 1\n                \n                if tweet_count > 0:\n                    self.logger.info(f\"Extracted {tweet_count} tweets from this batch\")\n                    \n    except Exception as e:\n        self.logger.error(f\"Error parsing timeline tweets: {e}\", exc_info=True)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">_parse_tweets_from_timeline<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">data<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> Dict<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">try<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        instructions <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> 
jmespath<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">search<\/span><span style=\"color: #908CAA\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;data.user.result.timeline_v2.timeline.instructions&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            data<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">not<\/span><span style=\"color: #E0DEF4\"> instructions<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">warning<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;No timeline instructions found&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">return<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> instruction <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> instructions<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> instruction<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;type&#39;<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">==<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;TimelineAddEntries&#39;<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                entries <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> instruction<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;entries&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[])<\/span><\/span>\n<span class=\"line cbp-see-more-line cbp-see-more-transition\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Found <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">entries<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> entries in timeline&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span 
style=\"color: #E0DEF4\">                <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                tweet_count <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> entry <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> entries<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    entry_id <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> entry<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;entryId&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">not<\/span><span style=\"color: #E0DEF4\"> entry_id<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">startswith<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;tweet-&#39;<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        <\/span><span style=\"color: #3E8FB0\">continue<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    
tweet_result <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> jmespath<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">search<\/span><span style=\"color: #908CAA\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        <\/span><span style=\"color: #F6C177\">&#39;content.itemContent.tweet_results.result&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        entry<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        parsed_tweet <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">_extract_tweet_data<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">tweet_result<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> parsed_tweet <\/span><span style=\"color: #3E8FB0\">and<\/span><span style=\"color: #E0DEF4\"> parsed_tweet<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;id&#39;<\/span><span style=\"color: #908CAA\">&#093;<\/span><span style=\"color: #E0DEF4\"> <\/span><span 
style=\"color: #3E8FB0\">not<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scraped_tweet_ids<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                            <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">append<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">parsed_tweet<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                            <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scraped_tweet_ids<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">add<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">parsed_tweet<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;id&#39;<\/span><span style=\"color: #908CAA\">&#093;)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                            tweet_count <\/span><span style=\"color: #3E8FB0\">+=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> tweet_count <\/span><span style=\"color: #3E8FB0\">&gt;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: 
#EA9A97\">0<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Extracted <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">tweet_count<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> tweets from this batch&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">except<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Exception<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">as<\/span><span style=\"color: #E0DEF4\"> e<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">error<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Error parsing timeline tweets: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">e<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">exc_info<\/span><span 
style=\"color: #3E8FB0\">=<\/span><span style=\"color: #EA9A97\">True<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><div class=\"cbp-see-more-container\" data-see-more-collapse-string=\"Collapse\" data-see-more-string=\"Expand\" style=\"display:flex;flex-direction:column;align-items:flex-end;width:100%;background-color:transparent;font-size:12px;line-height:1;position:relative;margin-bottom:-16px;height:32px\"><span role=\"button\" tabindex=\"0\" class=\"cbp-see-more-simple-btn cbp-see-more-simple-btn-hover\" style=\"color:#cecbee;background-color:#232136;padding:10px 16px;cursor:default\">Expand<\/span><\/div><\/div>\n\n\n\n<p><\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Extracting Individual Tweet Data<\/h3>\n\n\n\n<p>Once you have the tweet results object, extraction is clean and straightforward:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>def _extract_tweet_data(self, tweet_result: Dict) -> Optional[Dict&#091;str, Any&#093;]:\n    try:\n        if tweet_result.get('__typename') == 'TweetWithVisibilityResults':\n            tweet_result = tweet_result.get('tweet', {})\n        \n        legacy = tweet_result.get('legacy', {})\n        tweet_id = tweet_result.get('rest_id', 
'')\n        \n        user_result = tweet_result.get('core', {}).get('user_results', {}).get('result', {})\n        user_legacy = user_result.get('legacy', {})\n        \n        media = []\n        extended_entities = legacy.get('extended_entities', {})\n        for media_item in extended_entities.get('media', []):\n            media_info = {\n                'type': media_item.get('type', ''),\n                'url': media_item.get('media_url_https', '')\n            }\n            if media_item.get('type') == 'video':\n                variants = media_item.get('video_info', {}).get('variants', [])\n                video_variants = &#091;v for v in variants if v.get('content_type') == 'video\/mp4'&#093;\n                if video_variants:\n                    media_info&#091;'video_url'&#093; = max(video_variants, key=lambda x: x.get('bitrate', 0))&#091;'url'&#093;\n            media.append(media_info)\n        \n        tweet_data = {\n            'id': tweet_id,\n            'text': legacy.get('full_text', ''),\n            'created_at': legacy.get('created_at', ''),\n            'user': {\n                'username': user_legacy.get('screen_name', ''),\n                'display_name': user_legacy.get('name', ''),\n                'followers_count': user_legacy.get('followers_count', 0),\n                'verified': user_result.get('is_blue_verified', False)\n            },\n            'metrics': {\n                'retweet_count': legacy.get('retweet_count', 0),\n                'favorite_count': legacy.get('favorite_count', 0),\n                'reply_count': legacy.get('reply_count', 0),\n                'quote_count': legacy.get('quote_count', 0),\n                'view_count': tweet_result.get('views', {}).get('count', 0)\n            },\n            'hashtags': [ht.get('text', '') for ht in legacy.get('entities', {}).get('hashtags', [])],\n            'media': media,\n            'is_retweet': legacy.get('retweeted', False),\n            'is_reply': 
legacy.get('in_reply_to_status_id_str') is not None,\n            'scraped_at': time.time()\n        }\n        \n        return tweet_data\n        \n    except Exception as e:\n        self.logger.debug(f\"Error extracting tweet data: {e}\")\n        return None<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">_extract_tweet_data<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">tweet_result<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> Dict<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">-&gt;<\/span><span style=\"color: #E0DEF4\"> Optional<\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #E0DEF4\">Dict<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #9CCFD8\">str<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> Any<\/span><span style=\"color: #908CAA\">&#093;]:<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">try<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;__typename&#39;<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">==<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;TweetWithVisibilityResults&#39;<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            tweet_result <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;tweet&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        legacy <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;legacy&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        tweet_id <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span 
style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;rest_id&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        user_result <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;core&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{}).<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;user_results&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{}).<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;result&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        user_legacy <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> user_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;legacy&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #E0DEF4\">        media <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        extended_entities <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;extended_entities&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> media_item <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> extended_entities<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;media&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[]):<\/span><\/span>\n<span class=\"line cbp-see-more-line cbp-see-more-transition\"><span style=\"color: #E0DEF4\">            media_info <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;type&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> media_item<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;type&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span 
style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;url&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> media_item<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;media_url_https&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> media_item<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;type&#39;<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">==<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;video&#39;<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                variants <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> media_item<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;video_info&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{}).<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: 
#F6C177\">&#39;variants&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[])<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                video_variants <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #E0DEF4\">v <\/span><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> v <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> variants <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> v<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;content_type&#39;<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">==<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;video\/mp4&#39;<\/span><span style=\"color: #908CAA\">&#093;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> video_variants<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    media_info<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;video_url&#39;<\/span><span style=\"color: #908CAA\">&#093;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">max<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">video_variants<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: 
italic\">key<\/span><span style=\"color: #3E8FB0\">=lambda<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">x<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> x<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;bitrate&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">))&#091;<\/span><span style=\"color: #F6C177\">&#39;url&#39;<\/span><span style=\"color: #908CAA\">&#093;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            media<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">append<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">media_info<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        tweet_data <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;id&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> tweet_id<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;text&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;full_text&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: 
#E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;created_at&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;created_at&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;user&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;username&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> user_legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;screen_name&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;display_name&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> user_legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;name&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span 
style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;followers_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> user_legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;followers_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;verified&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> user_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;is_blue_verified&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">False<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;metrics&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;retweet_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span 
style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;retweet_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;favorite_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;favorite_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;reply_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;reply_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;quote_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;quote_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: 
#908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;view_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;views&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{}).<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;hashtags&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #E0DEF4\">ht<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;text&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> ht <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: 
#F6C177\">&#39;entities&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{}).<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;hashtags&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[])],<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;media&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> media<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;is_retweet&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;retweeted&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">False<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;is_reply&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;in_reply_to_status_id_str&#39;<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">is<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">not<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">None<\/span><span style=\"color: 
#908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;scraped_at&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> time<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">time<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">return<\/span><span style=\"color: #E0DEF4\"> tweet_data<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">except<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Exception<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">as<\/span><span style=\"color: #E0DEF4\"> e<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">debug<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Error extracting tweet data: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">e<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">return<\/span><span style=\"color: 
#E0DEF4\"> <\/span><span style=\"color: #EA9A97\">None<\/span><\/span><\/code><\/pre><div class=\"cbp-see-more-container\" data-see-more-collapse-string=\"Collapse\" data-see-more-string=\"Expand\" style=\"display:flex;flex-direction:column;align-items:flex-end;width:100%;background-color:transparent;font-size:12px;line-height:1;position:relative;margin-bottom:-16px;height:32px\"><span role=\"button\" tabindex=\"0\" class=\"cbp-see-more-simple-btn cbp-see-more-simple-btn-hover\" style=\"color:#cecbee;background-color:#232136;padding:10px 16px;cursor:default\">Expand<\/span><\/div><\/div>\n\n\n\n<p><\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Built-in Duplicate Prevention<\/h3>\n\n\n\n<p>To avoid storing the same tweet twice, I added a check that skips any tweet whose ID has already been collected:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>if parsed_tweet&#091;'id'&#093; not in self.scraped_tweet_ids:\n    self.all_tweets.append(parsed_tweet)\n    self.scraped_tweet_ids.add(parsed_tweet&#091;'id'&#093;)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path 
class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> parsed_tweet<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;id&#39;<\/span><span style=\"color: #908CAA\">&#093;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">not<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scraped_tweet_ids<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">append<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">parsed_tweet<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scraped_tweet_ids<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">add<\/span><span style=\"color: 
#908CAA\">(<\/span><span style=\"color: #E0DEF4\">parsed_tweet<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;id&#39;<\/span><span style=\"color: #908CAA\">&#093;)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large centered\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/built-in-proxy-support-1024x536.jpg\" alt=\"\" class=\"wp-image-88353\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/built-in-proxy-support-1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/built-in-proxy-support-300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/built-in-proxy-support-768x402.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/built-in-proxy-support-600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/built-in-proxy-support.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"twitter-proxy-integration\">Built-in Proxy Support (Without the Headache)<\/h2>\n\n\n\n<p>Proxies aren\u2019t strictly mandatory for web scraping, but at any real scale you need them. If you scrape from just one IP, you will almost certainly get rate-limited or banned, which is why we use proxies and IP rotation to avoid that (and collect as much data as we can).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Another Reason I Didn\u2019t Use Selenium<\/h3>\n\n\n\n<p>In Selenium, proxy authentication is a disaster. 
Basic proxies (no auth) are simple enough:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>chrome_options = Options()\nchrome_options.add_argument('--proxy-server=http:\/\/proxy.com:8080')\ndriver = webdriver.Chrome(options=chrome_options)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">chrome_options <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> Options<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">chrome_options<\/span><span style=\"color: #908CAA\">.<\/span><span 
style=\"color: #E0DEF4\">add_argument<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;--proxy-server=http:\/\/proxy.com:8080&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">driver <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> webdriver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">Chrome<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">options<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\">chrome_options<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>But let\u2019s be real: you need a proxy with authentication. You <em>can<\/em> make Selenium work with an authenticated proxy, but you have to jump through one of a few hoops:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Create a Chrome Extension:<\/strong> Building one is not that hard, but it adds complexity to the code and you will need to maintain it every now and then.<\/li>\n\n\n\n<li><strong>Use a Proxy Server Wrapper:<\/strong> Run a local proxy server that handles authentication, then point Selenium at localhost. More infrastructure. 
More complexity and, of course, more things to break.<\/li>\n\n\n\n<li><strong>Environment Variables: <\/strong>This works for some tools, but in my case it wasn\u2019t reliable enough.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Playwright Proxy Authentication: One Dictionary<\/h3>\n\n\n\n<p>Playwright\u2019s proxy setup is one clean dictionary:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>browser_args = {\n    'proxy': {\n        'server': 'your-server',\n        'username': 'your-username',\n        'password': 'your-password'\n    }\n}\n\nbrowser = await self.playwright.chromium.launch(**browser_args)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre 
class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">browser_args <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;proxy&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #F6C177\">&#39;server&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;your-server&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #F6C177\">&#39;username&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;your-username&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #F6C177\">&#39;password&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;your-password&#39;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">browser <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: 
#908CAA\">.<\/span><span style=\"color: #E0DEF4\">playwright<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">chromium<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">launch<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">**<\/span><span style=\"color: #E0DEF4\">browser_args<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>That\u2019s it. Native username\/password authentication \u2014 no extensions; no local proxy servers; no environment variable hack.<\/p>\n\n\n\n<p>Here\u2019s the actual implementation from the scraper:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>async def initialize(self):\n    try:\n        self.playwright = await async_playwright().start()\n        \n        browser_args = {\n            'headless': False,\n            'args': &#091;\n                '--disable-blink-features=AutomationControlled',\n                '--disable-dev-shm-usage',\n                '--no-sandbox',\n            &#093;\n        }\n        \n        if self.proxy_config and self.proxy_config.get('enable_proxy_rotation'):\n            proxy_list = self.proxy_config.get('proxies', [])\n            if proxy_list:\n                
proxy_str = proxy_list&#091;0&#093; \n                parts = proxy_str.split(':')\n                \n                if len(parts) == 4:\n                    host, port, username, password = parts\n                    browser_args&#091;'proxy'&#093; = {\n                        'server': f'http:\/\/{host}:{port}',\n                        'username': username,\n                        'password': password\n                    }\n                    self.logger.info(f\"Using proxy: {username}@{host}:{port}\")\n                    self.logger.info(\"Note: First connection through proxy may take 30-60 seconds...\")\n        \n        self.browser = await self.playwright.chromium.launch(**browser_args)\n        \n        self.context = await self.browser.new_context(\n            viewport={'width': 1920, 'height': 1080},\n            user_agent='Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit\/537.36',\n            locale='en-US',\n            timezone_id='America\/New_York'\n        )\n        \n        self.logger.info(\"Playwright browser initialized successfully\")\n        return True\n        \n    except Exception as e:\n        self.logger.error(f\"Failed to initialize Playwright: {e}\")\n        return False<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span 
class=\"line\"><span style=\"color: #3E8FB0\">async<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">initialize<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">try<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">playwright <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> async_playwright<\/span><span style=\"color: #908CAA\">().<\/span><span style=\"color: #E0DEF4\">start<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        browser_args <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;headless&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">False<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;args&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                
<\/span><span style=\"color: #F6C177\">&#39;--disable-blink-features=AutomationControlled&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;--disable-dev-shm-usage&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;--no-sandbox&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA\">&#093;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">proxy_config <\/span><span style=\"color: #3E8FB0\">and<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">proxy_config<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;enable_proxy_rotation&#39;<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line cbp-see-more-line cbp-see-more-transition\"><span style=\"color: #E0DEF4\">            proxy_list <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: 
#E0DEF4\">proxy_config<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;proxies&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[])<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> proxy_list<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                proxy_str <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> proxy_list<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">&#093;<\/span><span style=\"color: #E0DEF4\"> <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                parts <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> proxy_str<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">split<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;:&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">parts<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">==<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">4<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    
                host<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> port<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> username<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> password <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> parts<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    browser_args<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;proxy&#39;<\/span><span style=\"color: #908CAA\">&#093;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        <\/span><span style=\"color: #F6C177\">&#39;server&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&#39;http:\/\/<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">host<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">:<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">port<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        <\/span><span style=\"color: #F6C177\">&#39;username&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> username<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        <\/span><span style=\"color: #F6C177\">&#39;password&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> 
password<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Using proxy: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">username<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">@<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">host<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">:<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">port<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;Note: First connection through proxy may take 30-60 seconds...&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">browser <\/span><span style=\"color: 
#3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">playwright<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">chromium<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">launch<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">**<\/span><span style=\"color: #E0DEF4\">browser_args<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">context <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">browser<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">new_context<\/span><span style=\"color: #908CAA\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #C4A7E7; font-style: italic\">viewport<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #908CAA\">{<\/span><span style=\"color: #F6C177\">&#39;width&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1920<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;height&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: 
#E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1080<\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #C4A7E7; font-style: italic\">user_agent<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #F6C177\">&#39;Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit\/537.36&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #C4A7E7; font-style: italic\">locale<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #F6C177\">&#39;en-US&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #C4A7E7; font-style: italic\">timezone_id<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #F6C177\">&#39;America\/New_York&#39;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;Playwright browser initialized successfully&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">return<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">True<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span 
style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">except<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Exception<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">as<\/span><span style=\"color: #E0DEF4\"> e<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">error<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Failed to initialize Playwright: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">e<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">return<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">False<\/span><\/span><\/code><\/pre><div class=\"cbp-see-more-container\" data-see-more-collapse-string=\"Collapse\" data-see-more-string=\"Expand\" style=\"display:flex;flex-direction:column;align-items:flex-end;width:100%;background-color:transparent;font-size:12px;line-height:1;position:relative;margin-bottom:-16px;height:32px\"><span role=\"button\" tabindex=\"0\" class=\"cbp-see-more-simple-btn cbp-see-more-simple-btn-hover\" style=\"color:#cecbee;background-color:#232136;padding:10px 16px;cursor:default\">Expand<\/span><\/div><\/div>\n\n\n\n<p><\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>The code above reads its proxy settings from the configuration file, config.ini, which should look something like 
this:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>&#091;PROXY&#093;\nenable_proxy_rotation = true\n# Format: host:port:username:password\nproxy_list = your-proxy\nproxy_timeout = 15<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #e0def4\">&#091;PROXY&#093;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #e0def4\">enable_proxy_rotation = true<\/span><\/span>\n<span class=\"line\"><span style=\"color: #e0def4\"># Format: host:port:username:password<\/span><\/span>\n<span class=\"line\"><span style=\"color: #e0def4\">proxy_list = 
your-proxy<\/span><\/span>\n<span class=\"line\"><span style=\"color: #e0def4\">proxy_timeout = 15<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>The code parses this value by splitting it on colons, then passes the host, port, username, and password to Playwright.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Automatic IP Rotation: The Secret Weapon<\/h3>\n\n\n\n<p>Proxidize\u2019s mobile proxies offer the ability to rotate IP addresses automatically, which helps us make the most of a single proxy when scraping a platform like X. We\u2019re using mobile proxies specifically because they are hard to detect. I set mine to rotate every minute, but you can set the rotation interval to whatever you need.<\/p>\n\n\n\n<p>Why does this matter to us?<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Mobile proxies:<\/strong> They are hard to detect because they look like a real mobile user\u2019s IP to the server, and the barrier to banning them is higher because of <a href=\"https:\/\/proxidize.com\/blog\/what-is-ccgnat\/\" target=\"_blank\" rel=\"noreferrer noopener\">CGNAT<\/a>, which puts many real users behind the same address.<\/li>\n\n\n\n<li><strong>Automatic IP rotation:<\/strong> A new IP address every 60 seconds, without any manual intervention, is a big plus here.<\/li>\n<\/ul>\n\n\n\n<p>The first connection normally takes 30\u201360 seconds, because the browser has to connect to the proxy server, establish a tunnel, and resolve DNS. After that, requests are much faster.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">HTTP vs SOCKS5 Support and the Anti-Detection Stack<\/h4>\n\n\n\n<p>Playwright supports both HTTP and SOCKS5 proxies. 
In our case we are using an HTTP proxy; we didn\u2019t use SOCKS5 because it would add a layer of complexity without any additional benefit for this scraper.<\/p>\n\n\n\n<p>Proxies alone are not enough to scrape Twitter, because X also checks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>User Agent:<\/strong> Does it look like a real browser?<\/li>\n\n\n\n<li><strong>Viewport Size:<\/strong> Is it a realistic screen resolution?<\/li>\n\n\n\n<li><strong>Locale\/Timezone:<\/strong> Do location signals match?<\/li>\n\n\n\n<li><strong>Automation Flags:<\/strong> Does the browser show signs that it\u2019s being automated?<\/li>\n<\/ul>\n\n\n\n<p>Luckily, Playwright lets us set most of these when we create the browser context:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>self.context = await self.browser.new_context(\n    viewport={'width': 1920, 'height': 1080},  # Standard desktop resolution\n    user_agent='Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/120.0.0.0 Safari\/537.36',\n    locale='en-US',\n    timezone_id='America\/New_York'\n)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" 
stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">context <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">browser<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">new_context<\/span><span style=\"color: #908CAA\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #C4A7E7; font-style: italic\">viewport<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #908CAA\">{<\/span><span style=\"color: #F6C177\">&#39;width&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1920<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;height&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1080<\/span><span style=\"color: #908CAA\">},<\/span><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: 
italic\"> Standard desktop resolution<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #C4A7E7; font-style: italic\">user_agent<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #F6C177\">&#39;Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/120.0.0.0 Safari\/537.36&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #C4A7E7; font-style: italic\">locale<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #F6C177\">&#39;en-US&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #C4A7E7; font-style: italic\">timezone_id<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #F6C177\">&#39;America\/New_York&#39;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Here we create a new browser context, set the window resolution to something realistic, and define the User-Agent string. 
It also tells the browser to make &#8216;en-US&#8217; the default language and set the browser&#8217;s timezone to New York.<\/p>\n\n\n\n<p>By adding the next bit of code we can hide the automation.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>'args': &#091;\n    '--disable-blink-features=AutomationControlled',\n&#093;<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #F6C177\">&#39;args&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><\/span>\n<span class=\"line\"><span 
style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;--disable-blink-features=AutomationControlled&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">&#093;<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>With these settings in place, Twitter\/X sees what looks like a real user in New York browsing the site from a Mac.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large centered\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/scraping-twitter_x-over-multiple-sessions-1024x536.jpg\" alt=\"\" class=\"wp-image-88355\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/scraping-twitter_x-over-multiple-sessions-1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/scraping-twitter_x-over-multiple-sessions-300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/scraping-twitter_x-over-multiple-sessions-768x402.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/scraping-twitter_x-over-multiple-sessions-600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/scraping-twitter_x-over-multiple-sessions.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"scraping-sessions\">Scraping Twitter\/X over Multiple Sessions (And Not Getting Banned)<\/h2>\n\n\n\n<p>X\u2019s login flow is very strict: it\u2019s like a guard hovering over your shoulder, asking you for your ID every time you want to do anything. 
It\u2019s infuriating and instantly prompted the question: how do we avoid the constant checks?<\/p>\n\n\n\n<p>Most scrapers treat authentication like a chore they have to repeat. Log in, scrape, close the browser, lose the session. It\u2019s the same story the next time: log in again, and again, and again. For each login attempt we:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Waste 30\u201360 seconds<\/li>\n\n\n\n<li>Give X another chance to flag us for suspicious activity<\/li>\n\n\n\n<li>Risk hitting rate limits<\/li>\n<\/ul>\n\n\n\n<p>I learned that the best way to avoid a ban is to log in only once in a while. Not only does this look more human to Twitter\/X, it also lowers the chance of our session being blocked.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Cookie Strategy<\/h3>\n\n\n\n<p>Let\u2019s be real here: cookies are your authentication insurance policy. When you successfully log into X, the browser stores authentication cookies. These cookies are proof that X already knows you and that you are verified. They contain session tokens, user IDs, authentication signatures, and more; in other words, everything X needs to recognize you. So we save those cookies to a file and load them next time to skip the entire login process. 
After you log in successfully, Playwright lets you export all cookies:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>cookies = await self.context.cookies()\nPath('playwright_cookies.json').write_text(json.dumps(cookies, indent=2))\nself.logger.info(f\"Saved {len(cookies)} cookies to playwright_cookies.json\")<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">cookies <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; 
font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">context<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">cookies<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">Path<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;playwright_cookies.json&#39;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">write_text<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">json<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">dumps<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">cookies<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">indent<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #EA9A97\">2<\/span><span style=\"color: #908CAA\">))<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Saved <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">cookies<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> cookies to playwright_cookies.json&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>In one fell swoop, X\u2019s entire authentication state gets exported to JSON. 
On the next run we check whether the cookie file exists, load its contents, and add the cookies to the browser context so we can skip the login.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>if Path('playwright_cookies.json').exists():\n    try:\n        cookies_data = json.loads(Path('playwright_cookies.json').read_text())\n        if cookies_data:\n            await self.context.add_cookies(cookies_data)\n            self.is_logged_in = True\n            self.logger.info(\"Loaded saved cookies - will skip login\")\n    except Exception as e:\n        self.logger.warning(f\"Failed to load cookies: {e}\")<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" 
tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> Path<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;playwright_cookies.json&#39;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">exists<\/span><span style=\"color: #908CAA\">():<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">try<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        cookies_data <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> json<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">loads<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">Path<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;playwright_cookies.json&#39;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">read_text<\/span><span style=\"color: #908CAA\">())<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> cookies_data<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">context<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">add_cookies<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">cookies_data<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #E0DEF4; font-style: 
italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">is_logged_in <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">True<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;Loaded saved cookies - will skip login&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">except<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Exception<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">as<\/span><span style=\"color: #E0DEF4\"> e<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">warning<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Failed to load cookies: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">e<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>This saves us a few minutes and verifies our session, which makes our odds of being banned quite low. 
Sometimes the cookies will expire or be invalidated by X, so we also run a quick test to see whether they\u2019re still valid. I did this by looking for the compose button (<code>SideNav_NewTweet_Button<\/code>), which only appears when you\u2019re authenticated.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Avoiding Detection: Looking Human (Enough) While Web Scraping<\/h3>\n\n\n\n<p>Every scraper faces the same problem: how do you look more human to the platform you\u2019re scraping? Your browser fingerprint is everywhere. When you visit X or any other platform, it usually checks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your User Agent (browser version, OS)<\/li>\n\n\n\n<li>Your screen resolution<\/li>\n\n\n\n<li>Your timezone and locale<\/li>\n\n\n\n<li>JavaScript capabilities<\/li>\n\n\n\n<li>WebGL renderer information<\/li>\n\n\n\n<li>Canvas fingerprint<\/li>\n\n\n\n<li>Automation signals (the most important one here)<\/li>\n<\/ul>\n\n\n\n<p>You can fake most of these, but the one that will kill your scraper dead is getting flagged as automation.<\/p>\n\n\n\n<p>As we know, Selenium and Playwright are both automation tools. They\u2019re designed to drive a browser programmatically, which is what lets us scrape the data we want. It\u2019s also what makes detection difficult to avoid. 
For example, when Chrome launches via Selenium, it literally advertises itself as an automation tool!<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>navigator.webdriver === true  \/\/ \"Hi, I'm automated!\"<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">navigator<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">webdriver <\/span><span style=\"color: #3E8FB0\">===<\/span><span style=\"color: #E0DEF4\"> true  <\/span><span style=\"color: #3E8FB0\">\/\/<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: 
#F6C177\">&quot;Hi, I&#39;m automated!&quot;<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>X checks this. If <code>navigator.webdriver<\/code> is true, you\u2019re done: you\u2019ll be flagged, blocked, and banned. Selenium setups can try to hide it, but unfortunately that doesn\u2019t work every time.<\/p>\n\n\n\n<p>With Playwright, a single Chromium launch flag does most of the work.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>browser_args = {\n    'headless': False,\n    'args': &#091;\n        '--disable-blink-features=AutomationControlled',\n    &#093;\n}<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" 
tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">browser_args <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;headless&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">False<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;args&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #F6C177\">&#39;--disable-blink-features=AutomationControlled&#39;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&#093;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><code>--disable-blink-features=AutomationControlled<\/code> tells Chrome to stop advertising the fact that it\u2019s being automated. It\u2019s not perfect; advanced fingerprinting can still detect Playwright, but X\u2019s detection is not that strong (yet).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Human-Like Web Scraping: When \u201cGood Enough\u201d Is Good Enough<\/h3>\n\n\n\n<p>Notice that I\u2019m not trying to be perfect here; I\u2019m only trying to be good enough. Perfection would require a lot of extra work on the code that would end up being overkill in most cases. 
As long as the code\u2019s good enough in the following areas, we can usually avoid detection:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Hide any automation flags<\/li>\n\n\n\n<li>Use a realistic User Agent<\/li>\n\n\n\n<li>Match the viewport to the User Agent<\/li>\n\n\n\n<li>Keep locale and timezone consistent<\/li>\n\n\n\n<li>Use residential or mobile proxies<\/li>\n\n\n\n<li>Use human-like scroll timing (3\u20136 second delays)<\/li>\n<\/ul>\n\n\n\n<p>That\u2019s good enough for now. X will update its detection, as it always does, and when that happens you\u2019ll need to go back through this list, work out what changed, and update your code to match. It\u2019s a battle that never really ends.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"error-handling\">Error Handling: For When of Course Things Go Wrong for Some Reason<\/h2>\n\n\n\n<p>Your code will inevitably break; that\u2019s something we all know as developers. That\u2019s why a good error handling system makes it easier to find and fix errors down the line.<\/p>\n\n\n\n<p>With scrapers and proxies, these are the most common problems:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Authentication failures:<\/strong> login broken, cookies expired, account locked<\/li>\n\n\n\n<li><strong>Network failures:<\/strong> proxy timeouts, connection drops, rate limits<\/li>\n\n\n\n<li><strong>Parsing failures:<\/strong> GraphQL response changed, data format different<\/li>\n\n\n\n<li><strong>Browser failures:<\/strong> Playwright crashes, page won\u2019t load, selectors missing<\/li>\n<\/ul>\n\n\n\n<p>Each of these categories needs its own error handling, so building a system for it isn\u2019t optional.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Network Failures: Retry and Move On<\/h3>\n\n\n\n<p>Proxies time out and connections drop. 
It happens.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>async def _intercept_response(self, response: Response):\n    try:\n        if response.request.resource_type in &#091;\"xhr\", \"fetch\"&#093;:\n            url = response.url\n            \n            if 'graphql' in url.lower():\n                if 'UserTweets' in url:\n                    try:\n                        data = await response.json()\n                        self._parse_tweets_from_timeline(data)\n                    except Exception as e:\n                        self.logger.warning(f\"Failed to parse response from {url&#091;:100&#093;}: {e}\")\n                        \n    except Exception as e:\n        self.logger.debug(f\"Error in response interceptor: {e}\")<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 
00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">async<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">_intercept_response<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">response<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> Response<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">try<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">request<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">resource_type <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&quot;xhr&quot;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;fetch&quot;<\/span><span style=\"color: #908CAA\">&#093;:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            url <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">url<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      
      <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;graphql&#39;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> url<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">lower<\/span><span style=\"color: #908CAA\">():<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;UserTweets&#39;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> url<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #3E8FB0\">try<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        data <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">json<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">_parse_tweets_from_timeline<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">data<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #3E8FB0\">except<\/span><span style=\"color: #E0DEF4\"> 
<\/span><span style=\"color: #9CCFD8\">Exception<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">as<\/span><span style=\"color: #E0DEF4\"> e<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">warning<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Failed to parse response from <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">url<\/span><span style=\"color: #908CAA\">&#091;:<\/span><span style=\"color: #EA9A97\">100<\/span><span style=\"color: #908CAA\">&#093;<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">e<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">except<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Exception<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">as<\/span><span style=\"color: #E0DEF4\"> e<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">debug<\/span><span style=\"color: 
#908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Error in response interceptor: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">e<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>If one of the responses failed, we just move on. The scraper doesn&#8217;t stall out because of one crash.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Parsing Failures: Defensive Extraction<\/h3>\n\n\n\n<p>X\u2019s GraphQL responses are nested nightmares. Sometimes fields are empty or missing, or the structure changes.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>def _extract_tweet_data(self, tweet_result: Dict) -> Optional[Dict&#091;str, Any&#093;]:\n    try:\n        if tweet_result.get('__typename') == 'TweetWithVisibilityResults':\n            tweet_result = tweet_result.get('tweet', {})\n        \n        legacy = tweet_result.get('legacy', {})\n        tweet_id = tweet_result.get('rest_id', '')\n        \n        tweet_data = {\n            'id': tweet_id,\n            'text': legacy.get('full_text', ''),\n            'created_at': legacy.get('created_at', ''),\n            
'metrics': {\n                'retweet_count': legacy.get('retweet_count', 0),\n                'favorite_count': legacy.get('favorite_count', 0),\n                'reply_count': legacy.get('reply_count', 0),\n                'view_count': tweet_result.get('views', {}).get('count', 0)\n            }\n        }\n        \n        return tweet_data\n        \n    except Exception as e:\n        self.logger.debug(f\"Error extracting tweet data: {e}\")\n        return None<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">_extract_tweet_data<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">tweet_result<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> Dict<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">-&gt;<\/span><span style=\"color: #E0DEF4\"> Optional<\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #E0DEF4\">Dict<\/span><span style=\"color: 
#908CAA\">&#091;<\/span><span style=\"color: #9CCFD8\">str<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> Any<\/span><span style=\"color: #908CAA\">&#093;]:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">try<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;__typename&#39;<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">==<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;TweetWithVisibilityResults&#39;<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            tweet_result <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;tweet&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        legacy <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;legacy&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: 
#908CAA\">{})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        tweet_id <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;rest_id&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        tweet_data <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;id&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> tweet_id<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;text&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;full_text&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;created_at&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: 
#908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;created_at&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #F6C177\">&#39;metrics&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;retweet_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;retweet_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;favorite_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;favorite_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;reply_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> legacy<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span 
style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;reply_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;view_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> tweet_result<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;views&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{}).<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">return<\/span><span style=\"color: #E0DEF4\"> tweet_data<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">except<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Exception<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">as<\/span><span style=\"color: #E0DEF4\"> e<\/span><span 
style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">debug<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Error extracting tweet data: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">e<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">return<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">None<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Every <code>.get()<\/code> has a fallback. If a field is missing, use the default; if the structure is unrecognizable, return <code>None<\/code>; and, most importantly, don&#8217;t stop or crash. The scraper keeps running even if the structure changes, although the affected fields will come back empty until the keys are updated. The devs among you might want to change the keys to match the new structure.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Screenshot Strategy<\/h3>\n\n\n\n<p>Screenshots are debugging gold. 
Whenever something breaks, you want to be able to see what the page looked like before it said its last words.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>try:\n    await self.page.screenshot(path=f\"error_{username}_{timestamp}.png\")\n    self.logger.error(f\"Screenshot saved: error_{username}_{timestamp}.png\")\nexcept:\n    pass<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">try<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">await<\/span><span 
style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">screenshot<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">path<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;error_<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">username<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">_<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">timestamp<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">.png&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">error<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Screenshot saved: error_<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">username<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">_<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">timestamp<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">.png&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">except<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: 
#3E8FB0\">pass<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Whenever a login fails or anything unexpected or broken happens, you will have visual evidence to help you debug the issue. The image will be saved as a .png in the root of the project.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Enough to Level a Forest: Logging and Logging and Logging<\/h3>\n\n\n\n<p>You will notice throughout the code that I have a lot of loggers. I believe they help a lot in tracking the progress of the scraping. It&#8217;s comforting to know that if something goes wrong I can go back to the logs and see exactly what happened.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large centered\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/pagination-hell-when-just-scroll-down-1024x536.jpg\" alt=\"\" class=\"wp-image-88357\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/pagination-hell-when-just-scroll-down-1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/pagination-hell-when-just-scroll-down-300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/pagination-hell-when-just-scroll-down-768x402.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/pagination-hell-when-just-scroll-down-600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/10\/pagination-hell-when-just-scroll-down.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"pagination-hell\">Pagination Hell: When \u201cJust Scroll Down\u201d Becomes a Nightmare<\/h2>\n\n\n\n<p>X doesn&#8217;t have pages. 
It has an infinite scroll that fights back, rate limits you, randomly stops loading, and occasionally just gives up for no apparent reason. Most people think Twitter\/X pagination is simple: scroll down, wait for more tweets, repeat. If only.<\/p>\n\n\n\n<p><strong>Here\u2019s what actually happens:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Scroll too fast? X stops loading new content (you have been flagged)<\/li>\n\n\n\n<li>Scroll too consistently? X will notice (you have been flagged)<\/li>\n\n\n\n<li>Reach the \u201cbottom\u201d? X might still have more tweets (or not!), but you&#8217;ll need to keep scrolling for a set number of attempts to know for sure<\/li>\n\n\n\n<li>Scroll for too long? X\u2019s lazy loading will just stop responding<\/li>\n<\/ul>\n\n\n\n<p>These are not bugs. This is X\u2019s intentional design to prevent scraping of its platform. Pagination on X isn\u2019t a technical problem; it\u2019s psychological warfare between your scraper and X\u2019s anti-bot measures.<\/p>\n\n\n\n<p>Let me spare you some of the suffering and share some tips that might help you win.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Infinite Scrolling Problem: No Pages, Just Chaos<\/h3>\n\n\n\n<p>Pagination on X is not traditional, because you can\u2019t predict what the next scroll will load. 
It&#8217;s utter chaos.<\/p>\n\n\n\n<p><strong>Here\u2019s the traditional pagination:<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>Page 1: Tweets 1\u201320\nPage 2: Tweets 21\u201340\nPage 3: Tweets 41\u201360<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #e0def4\">Page 1: Tweets 1\u201320<\/span><\/span>\n<span class=\"line\"><span style=\"color: #e0def4\">Page 2: Tweets 21\u201340<\/span><\/span>\n<span class=\"line\"><span style=\"color: #e0def4\">Page 3: Tweets 
41\u201360<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Here\u2019s X\u2019s infinite scroll<\/strong>:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>Scroll 1: Load 15\u201320 tweets (maybe)\nScroll 2: Load 8 tweets (why fewer?)\nScroll 3: Load 0 tweets (but there's more!)\nScroll 4: Load 22 tweets (now it works again?)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #e0def4\">Scroll 1: Load 15\u201320 tweets (maybe)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #e0def4\">Scroll 2: Load 8 tweets (why 
fewer?)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #e0def4\">Scroll 3: Load 0 tweets (but there&#39;s more!)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #e0def4\">Scroll 4: Load 22 tweets (now it works again?)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>That being said, I still needed to find a solution. When do we stop? How do we pick up new tweets along the way without missing any?<\/p>\n\n\n\n<p>The solution we arrived at is good enough: track what you&#8217;ve already collected, detect when nothing new is arriving, and know when to stop.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>async def _scroll_timeline(self, resume_from_tweet_id: Optional&#091;str&#093; = None):\n    self.logger.info(\"Starting timeline scroll...\")\n    \n    scroll_attempts = 0\n    self.scroll_attempts_without_new = 0\n    max_scroll_attempts = 5000 \n    max_attempts_without_new = 50  \n    \n    while scroll_attempts &lt; max_scroll_attempts:\n        scroll_attempts += 1\n        tweets_before = len(self.all_tweets)\n        \n        await self.page.evaluate('window.scrollBy(0, window.innerHeight * 0.8)')\n        \n        delay = random.uniform(self.scroll_delay_min, self.scroll_delay_max)\n        await 
asyncio.sleep(delay)\n        \n        tweets_after = len(self.all_tweets)\n        new_tweets = tweets_after - tweets_before\n        \n        if new_tweets > 0:\n            self.logger.info(f\"Scroll {scroll_attempts}: +{new_tweets} NEW tweets (total: {tweets_after})\")\n            self.scroll_attempts_without_new = 0\n        else:\n            self.scroll_attempts_without_new += 1\n            if self.scroll_attempts_without_new >= max_attempts_without_new:\n                self.logger.info(f\"No new tweets for {max_attempts_without_new} scrolls - stopping\")\n                break\n        \n    self.logger.info(f\"Scrolling completed after {scroll_attempts} attempts\")<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">async<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">_scroll_timeline<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">resume_from_tweet_id<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: 
#E0DEF4\"> Optional<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #9CCFD8\">str<\/span><span style=\"color: #908CAA\">&#093;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">None<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&quot;Starting timeline scroll...&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    scroll_attempts <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    max_scroll_attempts <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">5000<\/span><span style=\"color: #E0DEF4\"> <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    max_attempts_without_new <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">50<\/span><span style=\"color: #E0DEF4\">  
<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">while<\/span><span style=\"color: #E0DEF4\"> scroll_attempts <\/span><span style=\"color: #3E8FB0\">&lt;<\/span><span style=\"color: #E0DEF4\"> max_scroll_attempts<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        scroll_attempts <\/span><span style=\"color: #3E8FB0\">+=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        tweets_before <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">evaluate<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;window.scrollBy(0, window.innerHeight * 0.8)&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line cbp-see-more-line cbp-see-more-transition\"><span style=\"color: #E0DEF4\">        delay <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: 
#E0DEF4\"> random<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">uniform<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_delay_min<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_delay_max<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> asyncio<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">delay<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        tweets_after <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        new_tweets <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweets_after <\/span><span style=\"color: #3E8FB0\">-<\/span><span style=\"color: #E0DEF4\"> tweets_before<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: 
#E0DEF4\"> new_tweets <\/span><span style=\"color: #3E8FB0\">&gt;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Scroll <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">scroll_attempts<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">: +<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">new_tweets<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> NEW tweets (total: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">tweets_after<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">)&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">else<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new 
<\/span><span style=\"color: #3E8FB0\">+=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new <\/span><span style=\"color: #3E8FB0\">&gt;=<\/span><span style=\"color: #E0DEF4\"> max_attempts_without_new<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;No new tweets for <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">max_attempts_without_new<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> scrolls - stopping&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">break<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Scrolling completed after <\/span><span 
style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">scroll_attempts<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> attempts&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><div class=\"cbp-see-more-container\" data-see-more-collapse-string=\"Collapse\" data-see-more-string=\"Expand\" style=\"display:flex;flex-direction:column;align-items:flex-end;width:100%;background-color:transparent;font-size:12px;line-height:1;position:relative;margin-bottom:-16px;height:32px\"><span role=\"button\" tabindex=\"0\" class=\"cbp-see-more-simple-btn cbp-see-more-simple-btn-hover\" style=\"color:#cecbee;background-color:#232136;padding:10px 16px;cursor:default\">Expand<\/span><\/div><\/div>\n\n\n\n<p><\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>This method is good, it works. It&#8217;s fast and keeps track of what we&#8217;re scraping. After scrolling 50 times without finding a new tweet, we call it and stop the process.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Randomized Scroll Delays: Acting Human to Avoid Detection<\/h3>\n\n\n\n<p>Bots scroll at perfect intervals, but humans don\u2019t. 
Let\u2019s do a scroll comparison between a bot and a human being.<\/p>\n\n\n\n<p><strong>Bot behavior:<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>await asyncio.sleep(2)\nawait self.page.evaluate('window.scrollBy(0, 800)')\nawait asyncio.sleep(2)\nawait self.page.evaluate('window.scrollBy(0, 800)')<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> asyncio<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: 
#EA9A97\">2<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">evaluate<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;window.scrollBy(0, 800)&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> asyncio<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #EA9A97\">2<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">evaluate<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;window.scrollBy(0, 800)&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>The perfectly regular scroll timing is instantly recognizable, and X will flag you right away.<\/p>\n\n\n\n<p><strong>Human behavior:<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * 
.75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>self.scroll_delay_min = 3.0\nself.scroll_delay_max = 6.0\n\ndelay = random.uniform(self.scroll_delay_min, self.scroll_delay_max)\nawait asyncio.sleep(delay)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_delay_min <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">3.0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_delay_max <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">6.0<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">delay 
<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> random<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">uniform<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_delay_min<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_delay_max<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> asyncio<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">delay<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>By introducing a bit of variance between our scrolls, X will see a scroll, then 4.7 seconds of nothing, then another scroll. Maybe it&#8217;s 3.2 seconds this time, scroll again and so on.<\/p>\n\n\n\n<p>Why 3\u20136 seconds?<\/p>\n\n\n\n<p>I tested different ranges:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>1\u20132 seconds was too fast and X noticed; we were flagged<\/li>\n\n\n\n<li>2\u20134 seconds was better, but still too inconsistent; X\u2019s lazy loading couldn&#8217;t keep up<\/li>\n\n\n\n<li>3\u20136 seconds was the sweet spot; fast enough to be efficient and slow enough to look human<\/li>\n\n\n\n<li>5\u201310 seconds was too slow<\/li>\n<\/ul>\n\n\n\n<p>It makes sense if you think about it. 
People rarely scroll consistently, and if you time it, the timing does shake out to be about 3\u20136 seconds.<\/p>\n\n\n\n<p><strong>The implementation:<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>delay = random.uniform(self.scroll_delay_min, self.scroll_delay_max)\nawait asyncio.sleep(delay)\n\n# Real logs from scraping sessions\n# Scroll 1: +12 NEW tweets (delay: 4.7s)\n# Scroll 2: +8 NEW tweets (delay: 3.2s)\n# Scroll 3: +15 NEW tweets (delay: 5.9s)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">delay <\/span><span style=\"color: 
#3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> random<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">uniform<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_delay_min<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_delay_max<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> asyncio<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">delay<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Real logs from scraping sessions<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Scroll 1: +12 NEW tweets (delay: 4.7s)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Scroll 2: +8 NEW tweets (delay: 3.2s)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Scroll 3: +15 NEW tweets (delay: 5.9s)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>And there we have it: human-like scrolling. You could take it a step further and add randomization, but you run the risk of making it <em>less<\/em> human. 
Real users have mostly fixed patterns with small variations; scrolling too randomly risks X noticing and flagging you.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The \u201c50 Scrolls Without New Content\u201d Rule&nbsp;<\/h3>\n\n\n\n<p>X\u2019s infinite scroll has no visible end; it just keeps going unless you have been flagged. So how do you know when to stop?<\/p>\n\n\n\n<p><strong>Bad approach:<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>is_at_bottom = await self.page.evaluate('window.scrollY >= document.body.scrollHeight')\nif is_at_bottom:\n    break<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" 
tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">is_at_bottom <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">evaluate<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;window.scrollY &gt;= document.body.scrollHeight&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> is_at_bottom<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">break<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>This works if you know you will hit a bottom, but you won\u2019t find one on X.<\/p>\n\n\n\n<p><strong>My approach: The 50-scroll rule<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>self.scroll_attempts_without_new = 0\nmax_attempts_without_new = 50\n\nwhile scroll_attempts &lt; 
max_scroll_attempts:\n    # ... scroll logic ...\n    \n    if new_tweets > 0:\n        self.scroll_attempts_without_new = 0  # Reset counter\n    else:\n        self.scroll_attempts_without_new += 1  # Increment counter\n        \n        if self.scroll_attempts_without_new >= max_attempts_without_new:\n            self.logger.info(f\"No new tweets for {max_attempts_without_new} scrolls - stopping\")\n            break<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">max_attempts_without_new <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">50<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">while<\/span><span style=\"color: #E0DEF4\"> scroll_attempts <\/span><span style=\"color: #3E8FB0\">&lt;<\/span><span style=\"color: #E0DEF4\"> max_scroll_attempts<\/span><span style=\"color: 
#908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> ... scroll logic ...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> new_tweets <\/span><span style=\"color: #3E8FB0\">&gt;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Reset counter<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">else<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new <\/span><span style=\"color: #3E8FB0\">+=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1<\/span><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> Increment counter<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span 
style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new <\/span><span style=\"color: #3E8FB0\">&gt;=<\/span><span style=\"color: #E0DEF4\"> max_attempts_without_new<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;No new tweets for <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">max_attempts_without_new<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> scrolls - stopping&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">break<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>The logic:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Got new tweets? Reset counter to 0<\/li>\n\n\n\n<li>No new tweets? Increment counter<\/li>\n\n\n\n<li>Counter hits 50? Stop scraping<\/li>\n<\/ul>\n\n\n\n<p>I tested a bunch of different thresholds and 50 seemed to work the best. Fewer than 50 was too aggressive and stopped too early. More than 50 meant we were wasting time. With 3\u20136 second delays, 50 scrolls works out to roughly 2.5\u20135 minutes of waiting before stopping. 
<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"checkpoint-system\">Creating a Checkpoint System (Because Losing Your Progress Sucks)<\/h2>\n\n\n\n<p>Interruptions are by definition unforeseen. The internet dies, there\u2019s an error during scraping, and suddenly you\u2019ve lost all your data. By implementing a checkpoint system you can save your progress and pick up where you left off.<\/p>\n\n\n\n<p>Alongside checkpoints, our X scraper also has sessions. You\u2019re not necessarily going to be able to grab every single tweet from a specific account all in one go. Sessions let you resume scraping a profile, and each profile needs its own checkpoint. For example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Session 1: Scraped 800 tweets (Oct 15 -&gt; Sept 1), saved checkpoint<\/li>\n\n\n\n<li>Session 2: Resumed from Sept 1, scraped another 800 tweets (Sept 1 -&gt; July 15), updated checkpoint<\/li>\n<\/ul>\n\n\n\n<p>Each session starts where the previous one stopped, which is how we can be sure that we\u2019re tracking every tweet and never losing our progress.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What Gets Saved: The Checkpoint File Format<\/h3>\n\n\n\n<p>The checkpoint file is just a small JSON that contains the following:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>{\n  
\"total_tweets\": 795,\n  \"oldest_tweet_id\": \"1962619400537653743\",\n  \"oldest_tweet_date\": \"Mon Sep 01 20:51:43 +0000 2025\",\n  \"newest_tweet_id\": \"1978419586904072698\",\n  \"newest_tweet_date\": \"Wed Oct 15 11:16:01 +0000 2025\",\n  \"session_count\": 2,\n  \"last_session_tweets\": 86,\n  \"username\": \"username\",\n  \"last_updated\": \"2025-10-16T08:17:33.504827\"\n}<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">total_tweets<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">795<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">oldest_tweet_id<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;1962619400537653743&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">oldest_tweet_date<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Mon Sep 01 20:51:43 +0000 2025&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">newest_tweet_id<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;1978419586904072698&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">newest_tweet_date<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Wed Oct 15 11:16:01 +0000 2025&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">session_count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">2<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">last_session_tweets<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">86<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">username<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;username&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">last_updated<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;2025-10-16T08:17:33.504827&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>There\u2019s nothing complicated happening here. A few important pieces of information are saved so you can continue a scraping session. 
The most important one is <code>oldest_tweet_id<\/code>, because that\u2019s where we will start our next session.<\/p>\n\n\n\n<p>How do we create this cool JSON file?<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly># After scraping completes\ncheckpoint_data = {\n    'total_tweets': len(all_tweets),\n    'oldest_tweet_id': all_tweets&#091;-1&#093;&#091;'id'&#093;,\n    'oldest_tweet_date': all_tweets&#091;-1&#093;&#091;'created_at'&#093;,\n    'newest_tweet_id': all_tweets&#091;0&#093;&#091;'id'&#093;,\n    'newest_tweet_date': all_tweets&#091;0&#093;&#091;'created_at'&#093;,\n    'session_count': existing_checkpoint.get('session_count', 0) + 1,\n    'last_session_tweets': len(new_tweets_this_session)\n}\n\nself.checkpoint_manager.save_checkpoint(username, checkpoint_data)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 
2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #908CAA; font-style: italic\">#<\/span><span style=\"color: #6E6A86; font-style: italic\"> After scraping completes<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">checkpoint_data <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;total_tweets&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;oldest_tweet_id&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> all_tweets<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #3E8FB0\">-<\/span><span style=\"color: #EA9A97\">1<\/span><span style=\"color: #908CAA\">&#093;&#091;<\/span><span style=\"color: #F6C177\">&#39;id&#39;<\/span><span style=\"color: #908CAA\">&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;oldest_tweet_date&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> all_tweets<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #3E8FB0\">-<\/span><span style=\"color: #EA9A97\">1<\/span><span style=\"color: #908CAA\">&#093;&#091;<\/span><span style=\"color: #F6C177\">&#39;created_at&#39;<\/span><span style=\"color: 
#908CAA\">&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;newest_tweet_id&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> all_tweets<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">&#093;&#091;<\/span><span style=\"color: #F6C177\">&#39;id&#39;<\/span><span style=\"color: #908CAA\">&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;newest_tweet_date&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> all_tweets<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">&#093;&#091;<\/span><span style=\"color: #F6C177\">&#39;created_at&#39;<\/span><span style=\"color: #908CAA\">&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;session_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> existing_checkpoint<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;session_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">+<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;last_session_tweets&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: 
italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">new_tweets_this_session<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">checkpoint_manager<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">save_checkpoint<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">username<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> checkpoint_data<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>X\u2019s tweet IDs are chronological, i.e. the newer the tweet, the higher the ID number.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>all_tweets[0] = Newest tweet (highest ID)<\/li>\n\n\n\n<li>all_tweets[-1] = Oldest tweet (lowest ID)<\/li>\n<\/ul>\n\n\n\n<p>Thus, the oldest tweet becomes your resume point.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Resume Flow: Picking Up Where You Left Off<\/h3>\n\n\n\n<p>We added a <code>--resume<\/code> flag that lets you resume scraping from where you stopped. 
Here is an example of the command:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>python main.py user -u username --resume<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #EA9A97\">python<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">main.py<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">user<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">-u<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">username<\/span><span style=\"color: #E0DEF4\"> <\/span><span 
style=\"color: #3E8FB0\">--resume<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Step 1: Load the checkpoint<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>if resume:\n    checkpoint = self.checkpoint_manager.load_checkpoint(username)\n    if checkpoint:\n        existing_tweets = self.checkpoint_manager.load_existing_tweets(username)\n        resume_from_tweet_id = checkpoint.get('oldest_tweet_id')\n        \n        self.logger.info(f\"Resuming from checkpoint with {len(existing_tweets)} existing tweets\")\n        self.logger.info(f\"   Will continue from tweet: {resume_from_tweet_id}\")\n    else:\n        self.logger.info(f\"No checkpoint found for @{username}, starting fresh\")<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 
2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> resume<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    checkpoint <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">checkpoint_manager<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">load_checkpoint<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">username<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> checkpoint<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        existing_tweets <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">checkpoint_manager<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">load_existing_tweets<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">username<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        resume_from_tweet_id <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> checkpoint<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: 
#F6C177\">&#39;oldest_tweet_id&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Resuming from checkpoint with <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">existing_tweets<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> existing tweets&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;   Will continue from tweet: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">resume_from_tweet_id<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">else<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span 
style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;No checkpoint found for @<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">username<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">, starting fresh&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>The log output reports how many tweets were loaded from the previous session and the tweet ID the scraper will continue from. If no checkpoint exists for the account, it says so and starts fresh.<\/p>\n\n\n\n<p><strong>Step 2: Pass the resume point to the scraper<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>result = await self.playwright_scraper.scrape_user_tweets(\n    username=username,\n    resume_from_tweet_id=resume_from_tweet_id \n)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 
00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">result <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">playwright_scraper<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scrape_user_tweets<\/span><span style=\"color: #908CAA\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #C4A7E7; font-style: italic\">username<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\">username<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #C4A7E7; font-style: italic\">resume_from_tweet_id<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\">resume_from_tweet_id <\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Here the scraper scrolls the timeline until it reaches the tweet ID where the last session stopped.<\/p>\n\n\n\n<p><strong>Step 3: Merge old and new tweets<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" 
style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>all_tweets = self.checkpoint_manager.merge_tweets(\n    existing_tweets,  \n    result&#091;'tweets'&#093;\n)\n\nself.logger.info(f\"Merged: {len(existing_tweets)} existing + {len(result&#091;'tweets'&#093;)} new = {len(all_tweets)} total\")<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">all_tweets <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">checkpoint_manager<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">merge_tweets<\/span><span style=\"color: 
#908CAA\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    existing_tweets<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\">  <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    result<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;tweets&#39;<\/span><span style=\"color: #908CAA\">&#093;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Merged: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">existing_tweets<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> existing + <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">result<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;tweets&#39;<\/span><span style=\"color: #908CAA\">&#093;)<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> new = <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> total&quot;<\/span><span style=\"color: 
#908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Step 4: Update the checkpoint<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>new_checkpoint_data = {\n    'total_tweets': len(all_tweets),\n    'oldest_tweet_id': all_tweets&#091;-1&#093;&#091;'id'&#093;,\n    'oldest_tweet_date': all_tweets&#091;-1&#093;&#091;'created_at'&#093;,\n    'session_count': checkpoint.get('session_count', 0) + 1,\n    'last_session_tweets': len(result&#091;'tweets'&#093;)\n}\n\nself.checkpoint_manager.save_checkpoint(username, new_checkpoint_data)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" 
tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">new_checkpoint_data <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;total_tweets&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;oldest_tweet_id&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> all_tweets<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #3E8FB0\">-<\/span><span style=\"color: #EA9A97\">1<\/span><span style=\"color: #908CAA\">&#093;&#091;<\/span><span style=\"color: #F6C177\">&#39;id&#39;<\/span><span style=\"color: #908CAA\">&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;oldest_tweet_date&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> all_tweets<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #3E8FB0\">-<\/span><span style=\"color: #EA9A97\">1<\/span><span style=\"color: #908CAA\">&#093;&#091;<\/span><span style=\"color: #F6C177\">&#39;created_at&#39;<\/span><span style=\"color: #908CAA\">&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;session_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> checkpoint<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span 
style=\"color: #F6C177\">&#39;session_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">+<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #F6C177\">&#39;last_session_tweets&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">result<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;tweets&#39;<\/span><span style=\"color: #908CAA\">&#093;)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">checkpoint_manager<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">save_checkpoint<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">username<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> new_checkpoint_data<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Now the checkpoint points to the oldest tweet from this combined dataset. 
The next session will resume from there.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Finding the Resume Point: The Needle in the Haystack<\/h3>\n\n\n\n<p>X doesn&#8217;t let you jump to a specific tweet, so starting where your last session stopped might be the most difficult part of this process.<\/p>\n\n\n\n<p>You can\u2019t tell X to take you to tweet ID 321321312321, but with the checkpoint we can go back to the tweet where we stopped and continue from there.<\/p>\n\n\n\n<p>Here&#8217;s how we did it:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>async def _scroll_timeline(self, resume_from_tweet_id: Optional&#091;str&#093; = None):\n    scroll_attempts = 0\n    resume_point_found = False if resume_from_tweet_id else True\n    \n    while scroll_attempts &lt; max_scroll_attempts:\n        scroll_attempts += 1\n        tweets_before = len(self.all_tweets)\n        \n        await self.page.evaluate('window.scrollBy(0, window.innerHeight * 0.8)')\n        delay = random.uniform(self.scroll_delay_min, self.scroll_delay_max)\n        await asyncio.sleep(delay)\n        \n        tweets_after = len(self.all_tweets)\n        new_tweets = tweets_after - tweets_before\n        \n        if resume_from_tweet_id and not resume_point_found:\n            for tweet in self.all_tweets:\n                
if tweet.get('id') == resume_from_tweet_id:\n                    resume_point_found = True\n                    self.logger.info(f\"Found resume point at tweet {resume_from_tweet_id}!\")\n                    self.logger.info(f\"   Clearing {len(self.all_tweets)} duplicate tweets...\")\n                    \n                    self.all_tweets.clear()\n                    self.scraped_tweet_ids.clear()\n                    break\n        \n        if new_tweets > 0:\n            if not resume_point_found:\n                self.logger.info(f\"Scrolling to resume point... ({tweets_after} tweets checked)\")\n            else:\n                self.logger.info(f\"Scroll {scroll_attempts}: +{new_tweets} NEW tweets (total: {tweets_after})\")\n            self.scroll_attempts_without_new = 0\n        else:\n            self.scroll_attempts_without_new += 1\n            if not resume_point_found and self.scroll_attempts_without_new >= 100:\n                self.logger.warning(f\"Scrolled 100 times without finding resume point - might not exist\")\n                break\n            elif resume_point_found and self.scroll_attempts_without_new >= 50:\n                self.logger.info(f\"No new tweets for 50 scrolls - stopping\")\n                break<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" 
tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">async<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">_scroll_timeline<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">resume_from_tweet_id<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> Optional<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #9CCFD8\">str<\/span><span style=\"color: #908CAA\">&#093;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">None<\/span><span style=\"color: #908CAA\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    scroll_attempts <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    resume_point_found <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">False<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> resume_from_tweet_id <\/span><span style=\"color: #3E8FB0\">else<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">True<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">while<\/span><span style=\"color: #E0DEF4\"> scroll_attempts <\/span><span style=\"color: #3E8FB0\">&lt;<\/span><span style=\"color: #E0DEF4\"> max_scroll_attempts<\/span><span 
style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        scroll_attempts <\/span><span style=\"color: #3E8FB0\">+=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        tweets_before <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">evaluate<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;window.scrollBy(0, window.innerHeight * 0.8)&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        delay <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> random<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">uniform<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_delay_min<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: 
#908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_delay_max<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">await<\/span><span style=\"color: #E0DEF4\"> asyncio<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">sleep<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">delay<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        tweets_after <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        new_tweets <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweets_after <\/span><span style=\"color: #3E8FB0\">-<\/span><span style=\"color: #E0DEF4\"> tweets_before<\/span><\/span>\n<span class=\"line cbp-see-more-line cbp-see-more-transition\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> resume_from_tweet_id <\/span><span style=\"color: #3E8FB0\">and<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">not<\/span><span style=\"color: #E0DEF4\"> resume_point_found<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> tweet 
<\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> tweet<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;id&#39;<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">==<\/span><span style=\"color: #E0DEF4\"> resume_from_tweet_id<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    resume_point_found <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">True<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Found resume point at tweet <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">resume_from_tweet_id<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">!&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: 
#E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;   Clearing <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> duplicate tweets...&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">all_tweets<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">clear<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scraped_tweet_ids<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">clear<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                    <\/span><span style=\"color: #3E8FB0\">break<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> new_tweets <\/span><span style=\"color: #3E8FB0\">&gt;<\/span><span style=\"color: 
#E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">not<\/span><span style=\"color: #E0DEF4\"> resume_point_found<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Scrolling to resume point... (<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">tweets_after<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> tweets checked)&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">else<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Scroll <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">scroll_attempts<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">: +<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">new_tweets<\/span><span 
style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> NEW tweets (total: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">tweets_after<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">)&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #3E8FB0\">else<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new <\/span><span style=\"color: #3E8FB0\">+=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">not<\/span><span style=\"color: #E0DEF4\"> resume_point_found <\/span><span style=\"color: #3E8FB0\">and<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new <\/span><span style=\"color: #3E8FB0\">&gt;=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">100<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span 
style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">warning<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Scrolled 100 times without finding resume point - might not exist&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">break<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #3E8FB0\">elif<\/span><span style=\"color: #E0DEF4\"> resume_point_found <\/span><span style=\"color: #3E8FB0\">and<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">scroll_attempts_without_new <\/span><span style=\"color: #3E8FB0\">&gt;=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">50<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #E0DEF4; font-style: italic\">self<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">logger<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">info<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;No new tweets for 50 scrolls - stopping&quot;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #3E8FB0\">break<\/span><\/span><\/code><\/pre><div class=\"cbp-see-more-container\" data-see-more-collapse-string=\"Collapse\" data-see-more-string=\"Expand\" 
style=\"display:flex;flex-direction:column;align-items:flex-end;width:100%;background-color:transparent;font-size:12px;line-height:1;position:relative;margin-bottom:-16px;height:32px\"><span role=\"button\" tabindex=\"0\" class=\"cbp-see-more-simple-btn cbp-see-more-simple-btn-hover\" style=\"color:#cecbee;background-color:#232136;padding:10px 16px;cursor:default\">Expand<\/span><\/div><\/div>\n\n\n\n<p><\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>This code runs when you pass the <code>--resume<\/code> flag. First we check whether we have a checkpoint. If we do, we grab the saved tweet ID and scroll until we find that tweet. If there are new tweets we missed last time, we save them after checking for duplicates.<\/p>\n\n\n\n<p>If we don&#8217;t find the old tweet we&#8217;re looking for after 100 scrolls without new tweets, the code logs a warning and stops gracefully, which is better than waiting around for nothing.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"ai-integration\">AI Analysis Integration (Making Sense of Your Data)<\/h2>\n\n\n\n<p>You have collected enough tweets. For the sake of argument, let\u2019s say you\u2019ve scraped 3,000\u20134,000 tweets. Perfect. Now you have thousands of JSON lines waiting for you. Are you going to read them manually? You will go insane, and working your way through even one of them will take forever. Web scraping is hard, but making sense of the data is even harder.<\/p>\n\n\n\n<p>Most scrapers just stop at data collection. They dump JSON files and call it a day. You are left with raw data and no insight. \u201cWell, that could be useful\u2026\u201d No, it isn\u2019t. As a software developer with a product\/analytical mindset, I always want to know more about the data I\u2019m collecting.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Do the tweets suggest an average sentiment? 
Are some people especially upset or happy about something?<\/li>\n\n\n\n<li>Which topics do people talk about most?<\/li>\n\n\n\n<li>Which content gets the most engagement?<\/li>\n\n\n\n<li>Are there trending patterns over time?<\/li>\n<\/ul>\n\n\n\n<p>Answering these questions is how you make sense of the data you have. It certainly beats staring at tens of thousands of lines of JSON.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Handling Data Overload: You Scraped It, Now What?<\/h3>\n\n\n\n<p>Scraping just 800 tweets will leave you with more than 200,000 words of text. That\u2019s roughly a 400-page book, which seems like a silly amount of reading to do to get a general idea of \u201cHow do people feel about this topic on average?\u201d<\/p>\n\n\n\n<p>I might be the type to do that, to be honest, but normal people would consider that a waste of time. That\u2019s where AI comes in. It reads the tweets and analyses them to give you a better sense of the data you have.<\/p>\n\n\n\n<p><strong>Before AI:<\/strong><\/p>\n\n\n\n<p>You just have the JSON files. You open them one by one to look at the data and make sense of it, which takes a lot of time and effort.<\/p>\n\n\n\n<p><strong>After AI:<\/strong><\/p>\n\n\n\n<p>You can run a single command when you start scraping:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea 
class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>python main.py user -u username --analyze<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #EA9A97\">python<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">main.py<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">user<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">-u<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">username<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">--analyze<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>The output will be a JSON file that has everything you want:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" 
style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>{\n  \"sentiment\": {\n    \"overall_sentiment\": {\n      \"positive\": 61,\n      \"negative\": 21,\n      \"neutral\": 17\n    },\n    \"insights\": \"Predominantly positive sentiment around transfer news...\"\n  },\n  \"topics\": {\n    \"top_topics\": &#091;\n      {\"topic\": \"Transfer News\", \"frequency\": 0.42},\n      {\"topic\": \"Contract Extensions\", \"frequency\": 0.28}\n    &#093;\n  }\n}<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">sentiment<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">overall_sentiment<\/span><span style=\"color: 
#908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">positive<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">61<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">negative<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">21<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">neutral<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">17<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">insights<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Predominantly positive sentiment around transfer news...&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: 
#9CCFD8\">topics<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">top_topics<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">{<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">topic<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Transfer News&quot;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">frequency<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0.42<\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">{<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">topic<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Contract Extensions&quot;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">frequency<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> 
<\/span><span style=\"color: #EA9A97\">0.28<\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&#093;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>We\u2019ve just saved ourselves tons of time, and the output will probably be more accurate than a human read-through, given that the AI can use all the tweets as context for the prompt.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Seven Analysis Types That Actually Matter<\/h3>\n\n\n\n<p>We didn\u2019t just plug in ChatGPT and tell it to &#8220;analyze this\u201d. Instead, we built seven specific analysis types, each answering a different question:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Sentiment analysis: <\/strong>Brand monitoring, public opinion tracking<\/li>\n\n\n\n<li><strong>Topic analysis:<\/strong> Content strategy, trend identification<\/li>\n\n\n\n<li><strong>Summary generation:<\/strong> Quick briefings, stakeholder reports<\/li>\n\n\n\n<li><strong>Classification: <\/strong>Categorizing tweets by type, e.g. news, opinion, or personal<\/li>\n\n\n\n<li><strong>Entity extraction:<\/strong> Competitive intelligence, relationship mapping<\/li>\n\n\n\n<li><strong>Trend analysis:<\/strong> Predictive insights, content timing optimization<\/li>\n\n\n\n<li><strong>Engagement analysis:<\/strong> Content optimization, social media strategy<\/li>\n<\/ol>\n\n\n\n<p>Each type of analysis answers a specific question. 
You\u2019re not getting generic answers, but structured, actionable insight.<\/p>\n\n\n\n<p>Let&#8217;s go back to our friend Fabrizio Romano to test this on a real-world example.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>{\n  \"tweet_count\": 795,\n  \"analyses\": {\n    \"sentiment\": {\n      \"overall_sentiment\": {\n        \"positive\": 61,\n        \"negative\": 21,\n        \"neutral\": 17\n      },\n      \"individual_sentiments\": &#091;\n        {\n          \"tweet_index\": 2,\n          \"sentiment\": \"positive\",\n          \"confidence\": 0.9,\n          \"reasoning\": \"Breaking news with heart emoji suggests positive sentiment.\"\n        },\n        {\n          \"tweet_index\": 7,\n          \"sentiment\": \"negative\",\n          \"confidence\": 0.95,\n          \"reasoning\": \"Injury context and warning emoji convey negative sentiment.\"\n        }\n      &#093;\n    },\n    \"topics\": {\n      \"top_topics\": [\n        {\n          \"topic\": \"Transfer News\",\n          \"frequency\": 0.42,\n          \"keywords\": &#091;\"here we go\", \"confirmed\", \"deal\"&#093;,\n          \"category\": \"Sports\/Football\"\n        },\n        {\n          \"topic\": \"Contract Extensions\",\n          \"frequency\": 0.28,\n          \"keywords\": &#091;\"renewed\", 
\"extends\", \"stays\"&#093;\n        }\n      ]\n    }\n  }\n}<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">tweet_count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">795<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">analyses<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">sentiment<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: 
#908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">overall_sentiment<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">positive<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">61<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">negative<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">21<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">neutral<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">17<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">individual_sentiments<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          
<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">tweet_index<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">2<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">sentiment<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;positive&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">confidence<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0.9<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line cbp-see-more-line cbp-see-more-transition\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">reasoning<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Breaking news with heart emoji suggests positive sentiment.&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">tweet_index<\/span><span style=\"color: #908CAA\">&quot;<\/span><span 
style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">7<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">sentiment<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;negative&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">confidence<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0.95<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">reasoning<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Injury context and warning emoji convey negative sentiment.&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">&#093;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">topics<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span 
style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">top_topics<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">topic<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Transfer News&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">frequency<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0.42<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">keywords<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&quot;here we go&quot;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;confirmed&quot;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;deal&quot;<\/span><span style=\"color: 
#908CAA\">&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">category<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Sports\/Football&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">topic<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Contract Extensions&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">frequency<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0.28<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">          <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">keywords<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&quot;renewed&quot;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;extends&quot;<\/span><span style=\"color: #908CAA\">,<\/span><span 
style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;stays&quot;<\/span><span style=\"color: #908CAA\">&#093;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      <\/span><span style=\"color: #908CAA\">]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><div class=\"cbp-see-more-container\" data-see-more-collapse-string=\"Collapse\" data-see-more-string=\"Expand\" style=\"display:flex;flex-direction:column;align-items:flex-end;width:100%;background-color:transparent;font-size:12px;line-height:1;position:relative;margin-bottom:-16px;height:32px\"><span role=\"button\" tabindex=\"0\" class=\"cbp-see-more-simple-btn cbp-see-more-simple-btn-hover\" style=\"color:#cecbee;background-color:#232136;padding:10px 16px;cursor:default\">Expand<\/span><\/div><\/div>\n\n\n\n<p><\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>You immediately know:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>61% of the tweets are positive (transfer excitement)<\/li>\n\n\n\n<li>21% are negative (injuries, failed deals)<\/li>\n\n\n\n<li>The top topic is transfer news (42% of the content)<\/li>\n\n\n\n<li>\u201cHere we go\u201d is a signature phrase<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Token Optimization and Smart Batching<\/h3>\n\n\n\n<p>Adding AI is all well and good, but it can also go wrong in expensive ways, like accidentally blowing through a bunch of money. I introduced a batching system to address this. 
Data is sent in groups rather than all at once, and because OpenAI charges by tokens used, we only send the information that actually matters to us.<\/p>\n\n\n\n<p><strong>Full tweet object:<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>{\n  \"id\": \"1978419586904072698\",\n  \"text\": \"\ud83d\udea8\u26a0\ufe0f Breaking transfer news...\",\n  \"full_text\": \"\ud83d\udea8\u26a0\ufe0f Breaking transfer news...\",\n  \"created_at\": \"Wed Oct 15 11:16:01 +0000 2025\",\n  \"user\": {\n    \"id\": \"330262748\",\n    \"username\": \"FabrizioRomano\",\n    \"display_name\": \"Fabrizio Romano\",\n    \"followers_count\": 26479397,\n    \"following_count\": 2649,\n    \"verified\": true,\n    \"profile_image_url\": \"https:\/\/...\",\n    \"description\": \"...\"\n  },\n  \"metrics\": {...},\n  \"media\": &#091;...&#093;,\n  \"urls\": &#091;...&#093;,\n  \"hashtags\": &#091;...&#093;,\n  \"scraped_at\": 1729012345\n}<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 
2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">id<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;1978419586904072698&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">text<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;\ud83d\udea8\u26a0\ufe0f Breaking transfer news...&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">full_text<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;\ud83d\udea8\u26a0\ufe0f Breaking transfer news...&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">created_at<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> 
<\/span><span style=\"color: #F6C177\">&quot;Wed Oct 15 11:16:01 +0000 2025&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">user<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">id<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;330262748&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">username<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;FabrizioRomano&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">display_name<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Fabrizio Romano&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">followers_count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">26479397<\/span><span 
style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">following_count<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">2649<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">verified<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">true<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">profile_image_url<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;https:\/\/...&quot;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">description<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;...&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">metrics<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><span 
style=\"color: #EB6F92\">...<\/span><span style=\"color: #908CAA\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">media<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #EB6F92\">...<\/span><span style=\"color: #908CAA\">&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">urls<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #EB6F92\">...<\/span><span style=\"color: #908CAA\">&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">hashtags<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #EB6F92\">...<\/span><span style=\"color: #908CAA\">&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #9CCFD8\">scraped_at<\/span><span style=\"color: #908CAA\">&quot;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">1729012345<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">}<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>In the JSON above, there\u2019s a lot of information you\u2019re sending to the LLM that doesn\u2019t really matter, so it\u2019s cheaper to send only the required 
data.<\/p>\n\n\n\n<p><strong>The solution: Extract only what matters<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>def _extract_essential_tweet_data(self, tweets: List[Dict&#091;str, Any&#093;]) -> Dict&#091;str, Any&#093;:\n    essential_data = {\n        'texts': [],\n        'engagement_metrics': [],\n        'metadata': []\n    }\n    \n    for tweet in tweets:\n        text = tweet.get('text', '').strip()\n        if text:\n            essential_data&#091;'texts'&#093;.append(text)\n            \n            metrics = tweet.get('metrics', {})\n            essential_data&#091;'engagement_metrics'&#093;.append({\n                'retweet_count': metrics.get('retweet_count', 0),\n                'favorite_count': metrics.get('favorite_count', 0),\n                'reply_count': metrics.get('reply_count', 0),\n                'view_count': metrics.get('view_count', '0')\n            })\n            \n            essential_data&#091;'metadata'&#093;.append({\n                'created_at': tweet.get('created_at', ''),\n                'has_media': len(tweet.get('media', [])) > 0,\n                'hashtags': tweet.get('hashtags', []),\n                'is_reply': tweet.get('is_reply', False)\n            })\n    \n    return essential_data<\/textarea><\/pre><svg 
xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">_extract_essential_tweet_data<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7; font-style: italic\">self<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #C4A7E7; font-style: italic\">tweets<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> List<\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #E0DEF4\">Dict<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #9CCFD8\">str<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> Any<\/span><span style=\"color: #908CAA\">&#093;])<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">-&gt;<\/span><span style=\"color: #E0DEF4\"> Dict<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #9CCFD8\">str<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> Any<\/span><span style=\"color: #908CAA\">&#093;:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    essential_data <\/span><span style=\"color: #3E8FB0\">=<\/span><span 
style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #F6C177\">&#39;texts&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[],<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #F6C177\">&#39;engagement_metrics&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[],<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        <\/span><span style=\"color: #F6C177\">&#39;metadata&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> tweet <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> tweets<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        text <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweet<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;text&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">strip<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">        
<\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> text<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            essential_data<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;texts&#39;<\/span><span style=\"color: #908CAA\">&#093;.<\/span><span style=\"color: #E0DEF4\">append<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">text<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            metrics <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> tweet<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;metrics&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            essential_data<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;engagement_metrics&#39;<\/span><span style=\"color: #908CAA\">&#093;.<\/span><span style=\"color: #E0DEF4\">append<\/span><span style=\"color: #908CAA\">({<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;retweet_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> metrics<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;retweet_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: 
#908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;favorite_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> metrics<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;favorite_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;reply_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> metrics<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;reply_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;view_count&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> metrics<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;view_count&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;0&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA\">})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #E0DEF4\">            essential_data<\/span><span style=\"color: #908CAA\">&#091;<\/span><span style=\"color: #F6C177\">&#39;metadata&#39;<\/span><span style=\"color: #908CAA\">&#093;.<\/span><span style=\"color: #E0DEF4\">append<\/span><span style=\"color: #908CAA\">({<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;created_at&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> tweet<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;created_at&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#39;&#39;<\/span><span style=\"color: #908CAA\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;has_media&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">tweet<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;media&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[]))<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">&gt;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">0<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;hashtags&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> 
tweet<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;hashtags&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[]),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">                <\/span><span style=\"color: #F6C177\">&#39;is_reply&#39;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> tweet<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;is_reply&#39;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">False<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">            <\/span><span style=\"color: #908CAA\">})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">return<\/span><span style=\"color: #E0DEF4\"> essential_data<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Now we\u2019re sending:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tweet text (needed for analysis)<\/li>\n\n\n\n<li>Engagement metrics (needed for engagement analysis)<\/li>\n\n\n\n<li>Minimal metadata (dates, flags)<\/li>\n<\/ul>\n\n\n\n<p>Doing that reduced the size of the JSON files to ~75-80%. We also send the data in batches rather than all at once, which gives us control over the token count of each request.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Structured Prompts&nbsp;<\/h3>\n\n\n\n<p>Having a good prompt is essential if you want to get good results. 
Prompts are a science in themselves, but here\u2019s an example of a bad prompt:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>prompt = f\"Analyze the sentiment of these tweets: {tweets}\"<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">prompt <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;Analyze the sentiment of these tweets: <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">tweets<\/span><span style=\"color: 
#3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>A prompt like this will give you inconsistent results because you\u2019re not being specific enough about what you want.&nbsp;<\/p>\n\n\n\n<p>By contrast, this is an example of a good prompt.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>prompt = f\"\"\"\nAnalyze the sentiment of the following {len(tweets)} tweets.\n\nProvide:\n1. Overall sentiment distribution (positive, negative, neutral percentages)\n2. Individual tweet sentiments with confidence scores\n3. Key emotional themes and patterns\n\nRespond in JSON format with the following structure:\n{{\n    \"overall_sentiment\": {{\n        \"positive\": percentage,\n        \"negative\": percentage,\n        \"neutral\": percentage\n    }},\n    \"individual_sentiments\": &#091;\n        {{\"tweet_index\": 1, \"sentiment\": \"positive\", \"confidence\": 0.85, \"reasoning\": \"explanation\"}}\n    &#093;,\n    \"emotional_themes\": &#091;\"theme1\", \"theme2\"&#093;,\n    \"insights\": \"Overall sentiment analysis insights\"\n}}\n\nTweets:\n{chr(10).join(&#091;f\"{i+1}. 
{text}\" for i, text in enumerate(tweets)&#093;)}\n\"\"\"<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">prompt <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;&quot;&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">Analyze the sentiment of the following <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #EB6F92; font-style: italic\">len<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">tweets<\/span><span style=\"color: #908CAA\">)<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\"> tweets.<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">Provide:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">1. Overall sentiment distribution (positive, negative, neutral percentages)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">2. Individual tweet sentiments with confidence scores<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">3. 
Key emotional themes and patterns<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">Respond in JSON format with the following structure:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">{{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">    &quot;overall_sentiment&quot;: <\/span><span style=\"color: #3E8FB0\">{{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">        &quot;positive&quot;: percentage,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">        &quot;negative&quot;: percentage,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">        &quot;neutral&quot;: percentage<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">    <\/span><span style=\"color: #3E8FB0\">}}<\/span><span style=\"color: #F6C177\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">    &quot;individual_sentiments&quot;: &#091;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">        <\/span><span style=\"color: #3E8FB0\">{{<\/span><span style=\"color: #F6C177\">&quot;tweet_index&quot;: 1, &quot;sentiment&quot;: &quot;positive&quot;, &quot;confidence&quot;: 0.85, &quot;reasoning&quot;: &quot;explanation&quot;<\/span><span style=\"color: #3E8FB0\">}}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">    &#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">    &quot;emotional_themes&quot;: &#091;&quot;theme1&quot;, &quot;theme2&quot;&#093;,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">    &quot;insights&quot;: &quot;Overall sentiment analysis insights&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">}}<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">Tweets:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">{<\/span><span 
style=\"color: #EB6F92; font-style: italic\">chr<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #EA9A97\">10<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">join<\/span><span style=\"color: #908CAA\">(&#091;<\/span><span style=\"color: #3E8FB0\">f<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">i<\/span><span style=\"color: #3E8FB0\">+<\/span><span style=\"color: #EA9A97\">1<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">. <\/span><span style=\"color: #3E8FB0\">{<\/span><span style=\"color: #E0DEF4\">text<\/span><span style=\"color: #3E8FB0\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> i<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> text <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EB6F92; font-style: italic\">enumerate<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">tweets<\/span><span style=\"color: #908CAA\">)&#093;)<\/span><span style=\"color: #3E8FB0\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6C177\">&quot;&quot;&quot;<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>This gives you much more consistent results, because you\u2019re telling the AI exactly what you want it to do and how you want the data structured. From here, you can change the prompt to suit your needs. The code has a class called AnalysisPrompts that holds all of the prompts.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"conclusion\">Conclusion<\/h2>\n\n\n\n<p>Scraping Twitter\/X isn\u2019t straightforward. 
The platform has strict rate limits and strong bot detection, a clear attempt to prevent web scraping. It\u2019s easy enough to build a system that collects data, but it\u2019s much harder to build one that can handle a variety of different workloads.<\/p>\n\n\n\n<p><strong>Key takeaways:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use Playwright, not Selenium: Network interception beats HTML parsing because X\u2019s UI changes on a weekly basis.<\/li>\n\n\n\n<li>Intercept GraphQL responses: Stop parsing HTML. Capture the JSON that X\u2019s front end has already fetched.<\/li>\n\n\n\n<li>Save cookies, avoid re-authentication: Log in once, then reuse sessions for weeks.<\/li>\n\n\n\n<li>Randomize everything: Scroll delays (3\u20136 seconds), timing patterns, human-like behavior.<\/li>\n\n\n\n<li>Implement checkpoints: Always save your progress so you don\u2019t lose it when a session is interrupted.<\/li>\n\n\n\n<li>Use proxies from day one: Auto-rotating mobile proxies are essential.<\/li>\n\n\n\n<li>Start simple, scale smart: Don\u2019t overbuild on the first try. Start small, then scale from there.<\/li>\n<\/ul>\n\n\n\n<p>The difference between a scraper that gets 100 tweets and one that gets 10,000+ tweets is a focus on resilience over perfection. The goal is not to build the most complex and advanced scraper, but to build a scraper that\u2019s good enough to get the job done.<\/p>\n\n\n\n<p>The scraper isn\u2019t perfect. It\u2019s just good enough to bypass X\u2019s detection and collect as many tweets as it can. We built this project to be resilient and applicable to real-world use cases. 
<strong>You can find the <a href=\"https:\/\/github.com\/proxidize\/x-scraper\" target=\"_blank\" data-type=\"link\" data-id=\"https:\/\/github.com\/proxidize\/x-scraper\" rel=\"noreferrer noopener\">X scraper<\/a> repo here.<\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"faq\">Frequently Asked Questions<\/h2>\n\n\n<div id=\"rank-math-faq\" class=\"rank-math-block\">\n<div class=\"rank-math-list \">\n<div id=\"faq-question-1761831632897\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Do I need proxies to scrape Twitter\/X?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>For small projects (&lt; 500 tweets), you might not need them. For anything serious, or at scale, you absolutely need proxies. X can track your IP and block you if you scrape too many tweets at a time.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1761831792706\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Will Twitter\/X ban me for web scraping?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>If you scrape like a bot, you will be banned. If you scrape like a human, probably not. X can see the patterns in your scraper&#8217;s behavior and connect the dots. If it sees something very suspicious, it will block you right away.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1761831858386\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Do I need to provide my X credentials?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Yes. The scraper logs into your account to access the timelines. 
You need to provide a username and password to start scraping.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1761831901763\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">What happens when my cookies expire?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>The scraper detects expired cookies automatically and re-authenticates.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1761831932690\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Can I contribute to this project?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Yes. It&#8217;s an open-source project.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"author":8854,"featured_media":87994,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","format":"standard","categories":[],"tags":[],"class_list":["post-87991","blog","type-blog","status-publish","format-standard","has-post-thumbnail","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/87991","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/users\/8854"}],"replies":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/comments?post=87991"}],"version-history":[{"count":21,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/87991\/revisions"}],"predecessor-version":[{"id":94434,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/87991\/revisions\/94434"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/media\/87994"}],"wp:attachment":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/media?parent=87991"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/categories?post=87991"},{"taxonomy":"post_tag","embed
dable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/tags?post=87991"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}