{"id":61516,"date":"2024-12-04T14:25:09","date_gmt":"2024-12-04T14:25:09","guid":{"rendered":"https:\/\/proxidize.com\/?post_type=blog&#038;p=61516"},"modified":"2025-10-02T12:11:51","modified_gmt":"2025-10-02T11:11:51","slug":"web-scraping-with-ruby","status":"publish","type":"blog","link":"https:\/\/proxidize.com\/blog\/web-scraping-with-ruby\/","title":{"rendered":"How to Master Web Scraping With Ruby: A Beginner\u2019s Guide"},"content":{"rendered":"\n<p>We previously explored several popular <a href=\"https:\/\/proxidize.com\/blog\/web-scraping-with-selenium\/\">programming languages<\/a> for a <a href=\"https:\/\/proxidize.com\/use-cases\/web-scraping\/\">web scraping<\/a> project, but we have yet to explore how to scrape the web with Ruby and what advantages it offers. Whether you&#8217;re tracking competitor prices, gathering research data, or monitoring social media trends, web scraping with Ruby lets you collect and analyze web data at scale. Ready to automate your <a href=\"https:\/\/proxidize.com\/blog\/impact-of-automation-on-data-collection\/\">web data collection<\/a>? This guide will walk you through everything you need to know about web scraping with Ruby, from setting up your environment to building your first scraper and handling complex websites.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Ruby is often described as simple and productive, making it a strong choice for web development. The differences become clearer when you compare it with other languages such as Python, JavaScript, Java, and PHP. Ruby and Python are often compared for web development and scripting: Python prides itself on simplicity and versatility, while Ruby shines with its focus on elegant syntax. Ruby is also a class-based, object-oriented language, whereas JavaScript uses a prototype-based object model and leans toward functional patterns. 
Ruby is rarely used for frontend development, but it is well suited to backend work through frameworks such as Ruby on Rails and Sinatra.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Compared with Java, Ruby is more flexible and avoids Java\u2019s verbose syntax. Java remains strong in enterprise applications, while Ruby is a popular choice for startups and fast-moving web projects. Finally, compared with PHP, which is often criticized for its inconsistent design, Ruby offers a more structured and modern development approach.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized centered\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1010\" height=\"569\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/setting-up-your-web-scraping-with-ruby-environment-fixed-version.jpg\" alt=\"Image of a large computer screen with three people surrounding it all holding folders with code written on it, and text on top that reads &quot;Setting Up Your Web Scraping With Ruby Environment&quot;\" class=\"wp-image-61586\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/setting-up-your-web-scraping-with-ruby-environment-fixed-version.jpg 1010w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/setting-up-your-web-scraping-with-ruby-environment-fixed-version-300x169.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/setting-up-your-web-scraping-with-ruby-environment-fixed-version-768x433.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/setting-up-your-web-scraping-with-ruby-environment-fixed-version-600x338.jpg 600w\" sizes=\"(max-width: 1010px) 100vw, 1010px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Setting Up Your Web Scraping With Ruby Environment<\/h2>\n\n\n\n<p>Before diving into web scraping with Ruby, 
set up your development environment. A proper setup will save you countless hours of troubleshooting later.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Installing Required Ruby Gems<\/h3>\n\n\n\n<p>Like other programming languages, Ruby relies on reusable libraries for common tasks. In Ruby, these packaged libraries are called gems, and most of them are open source and published on RubyGems.org.<\/p>\n\n\n\n<p>First, make sure Ruby is installed on your system. Here&#8217;s a quick platform-specific guide:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Windows: Download and run <a href=\"https:\/\/rubyinstaller.org\/\" target=\"_blank\" rel=\"noopener\">RubyInstaller<\/a>.<\/li>\n\n\n\n<li>macOS: With Homebrew, run <code>brew install ruby<\/code>.<\/li>\n\n\n\n<li>Linux: On Ubuntu-based systems, run <code>sudo apt install ruby-full<\/code>.<\/li>\n<\/ol>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Now, install these essential libraries for web scraping:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2\"><span role=\"button\" data-code=\"gem install httparty\ngem install nokogiri\ngem install csv\" style=\"color:#232136;background-color:#e0def4\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"none\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\"><code><span class=\"line\"><span style=\"color: #EA9A97\">gem<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: 
#F6C177\">install<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">httparty<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EA9A97\">gem<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">install<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">nokogiri<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EA9A97\">gem<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">install<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">csv<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>Each of these libraries has a specific job when web scraping with Ruby: HTTParty handles HTTP requests, the <a href=\"https:\/\/nokogiri.org\/index.html\" target=\"_blank\" rel=\"noopener\">Nokogiri gem<\/a> parses HTML, and the CSV gem exports scraped data to CSV files. CSV has long shipped with Ruby, but since Ruby 3.4 it is a bundled gem rather than a default gem, so it is best to install it and list it in your Gemfile explicitly.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Configuring Development Tools<\/h3>\n\n\n\n<p>Choose a development environment that supports Ruby well. 
Visual Studio Code with the Ruby LSP extension offers an excellent free option, while RubyMine provides a more feature-rich paid alternative.<\/p>\n\n\n\n<p>Create a new project directory and add a Gemfile to it:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2\"><span role=\"button\" data-code=\"source 'https:\/\/rubygems.org'\ngem 'nokogiri'\ngem 'httparty'\ngem 'csv'\" style=\"color:#232136;background-color:#e0def4\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"none\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\"><code><span class=\"line\"><span style=\"color: #EB6F92;font-style: italic\">source<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;https:\/\/rubygems.org&#039;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EA9A97\">gem<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;nokogiri&#039;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EA9A97\">gem<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;httparty&#039;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EA9A97\">gem<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;csv&#039;<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>Run <code>bundle install<\/code> to install all dependencies and create your <a href=\"https:\/\/bundler.io\/guides\/gemfile.html\" target=\"_blank\" 
rel=\"noopener\">Gemfile.lock<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Understanding Basic Web Scraping Concepts<\/h3>\n\n\n\n<p>Web scraping involves two fundamental processes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Making HTTP Requests: Using HTTParty to fetch web pages, similar to how your browser requests content<\/li>\n\n\n\n<li><a href=\"https:\/\/proxidize.com\/blog\/parsing-html-python-pyquery\/\">Parsing HTML<\/a>: Using Nokogiri to extract specific data from the webpage&#8217;s HTML structure<\/li>\n<\/ul>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>The real power comes from combining these tools. HTTParty fetches the raw HTML content, which Nokogiri then parses into a format that makes data extraction straightforward. Think of HTTParty as your web browser and Nokogiri as your data extraction assistant.<\/p>\n\n\n\n<p>Remember that websites are built using HTML and CSS. Understanding these basic building blocks will help you identify the right elements to scrape. HTML provides the structure through tags, while CSS selectors help you target specific elements for extraction.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized centered\"><img decoding=\"async\" width=\"1010\" height=\"569\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/building-your-first-web-scraper-fixed-version.jpg\" alt=\"Image of a man with glasses and a tie typing away as folders surround him and a loading bar reading &quot;Copying All Data&quot; above it. 
Above the image is text reading &quot;Building Your First Web Scraper&quot;\" class=\"wp-image-61588\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/building-your-first-web-scraper-fixed-version.jpg 1010w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/building-your-first-web-scraper-fixed-version-300x169.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/building-your-first-web-scraper-fixed-version-768x433.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/building-your-first-web-scraper-fixed-version-600x338.jpg 600w\" sizes=\"(max-width: 1010px) 100vw, 1010px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Building Your First Web Scraper<\/h2>\n\n\n\n<p>Let&#8217;s put our Ruby web scraping environment to work by building our first web scraper. We&#8217;ll start with a simple example that demonstrates the core concepts of web scraping. For this article, we will scrape <a href=\"https:\/\/quotes.toscrape.com\/\" target=\"_blank\" rel=\"noopener\">Quotes to Scrape<\/a>, a sandbox website built for practicing web scraping, so we can focus on the fundamentals.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Making HTTP Requests With HTTParty<\/h3>\n\n\n\n<p>First, let&#8217;s fetch data from a web page using HTTParty. 
Here&#8217;s how to make your first HTTP request:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2\"><span role=\"button\" data-code=\"require 'httparty'\nresponse = HTTParty.get('https:\/\/quotes.toscrape.com')\nif response.code == 200\n  html_content = response.body\n  puts html_content  # Print the HTML content\nelse\n  puts &quot;Error: #{response.code}&quot;\nend\" style=\"color:#232136;background-color:#e0def4\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"none\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">require<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;httparty&#039;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">response <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">HTTParty<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;https:\/\/quotes.toscrape.com&#039;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span 
style=\"color: #E0DEF4\">code <\/span><span style=\"color: #3E8FB0\">==<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">200<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  html_content <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">body<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #EB6F92;font-style: italic\">puts<\/span><span style=\"color: #E0DEF4\"> html_content  <\/span><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: italic\"> Print the HTML content<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">else<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #EB6F92;font-style: italic\">puts<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Error: <\/span><span style=\"color: #908CAA\">#{<\/span><span style=\"color: #F6C177\">response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #F6C177\">code<\/span><span style=\"color: #908CAA\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">end<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>The response object exposes the status code (<code>response.code<\/code>), the headers (<code>response.headers<\/code>), and the raw HTML (<code>response.body<\/code>). 
A status code of 200 indicates a successful request.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Parsing HTML With Nokogiri<\/h3>\n\n\n\n<p>Once we have the HTML content, we&#8217;ll use Nokogiri to parse it into a document we can search with CSS selectors:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2\"><span role=\"button\" data-code=\"require 'httparty'\nrequire 'nokogiri'\n# Fetch the HTML content from the website\nresponse = HTTParty.get('https:\/\/quotes.toscrape.com')\nif response.code == 200\n  html_content = response.body\n  # Parse the HTML content\n  document = Nokogiri::HTML(html_content)\n  # Loop through each quote on the page\n  document.css('.quote').each_with_index do |quote_block, index|\n    quote = quote_block.css('.text').text.strip\n    author = quote_block.css('.author').text.strip\n    tags = quote_block.css('.tags .tag').map(&amp;:text)\n    puts &quot;Quote #{index + 1}: #{quote}&quot;\n    puts &quot;Author: #{author}&quot;\n    puts &quot;Tags: #{tags.join(', ')}&quot;\n    puts &quot;---------------------------&quot;\n  end\nelse\n  puts &quot;Error: #{response.code}&quot;\nend\" style=\"color:#232136;background-color:#e0def4\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"none\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">require<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;httparty&#039;<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #3E8FB0\">require<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;nokogiri&#039;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: italic\"> Fetch the HTML content from the website<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">response <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">HTTParty<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;https:\/\/quotes.toscrape.com&#039;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">code <\/span><span style=\"color: #3E8FB0\">==<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">200<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  html_content <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">body<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">  <\/span><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: italic\"> Parse the HTML content<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  document <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Nokogiri<\/span><span 
style=\"color: #908CAA\">::<\/span><span style=\"color: #E0DEF4;font-style: italic\">HTML<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">html_content<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">  <\/span><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: italic\"> Loop through each quote on the page<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  document<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;.quote&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">each_with_index <\/span><span style=\"color: #3E8FB0\">do<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">|<\/span><span style=\"color: #E0DEF4;font-style: italic\">quote_block<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4;font-style: italic\">index<\/span><span style=\"color: #908CAA\">|<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    quote <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> quote_block<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;.text&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">text<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">strip<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    author <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> quote_block<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: 
#E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;.author&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">text<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">strip<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    tags <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> quote_block<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;.tags .tag&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">map<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">&amp;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #3E8FB0\">text<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #EB6F92;font-style: italic\">puts<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Quote <\/span><span style=\"color: #908CAA\">#{<\/span><span style=\"color: #F6C177\">index <\/span><span style=\"color: #3E8FB0\">+<\/span><span style=\"color: #F6C177\"> <\/span><span style=\"color: #EA9A97\">1<\/span><span style=\"color: #908CAA\">}<\/span><span style=\"color: #F6C177\">: <\/span><span style=\"color: #908CAA\">#{<\/span><span style=\"color: #F6C177\">quote<\/span><span style=\"color: #908CAA\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #EB6F92;font-style: italic\">puts<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Author: <\/span><span style=\"color: #908CAA\">#{<\/span><span style=\"color: #F6C177\">author<\/span><span 
style=\"color: #908CAA\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #EB6F92;font-style: italic\">puts<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Tags: <\/span><span style=\"color: #908CAA\">#{<\/span><span style=\"color: #F6C177\">tags<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #F6C177\">join<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;, &#039;<\/span><span style=\"color: #908CAA\">)}<\/span><span style=\"color: #F6C177\">&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #EB6F92;font-style: italic\">puts<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;---------------------------&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #3E8FB0\">end<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">else<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #EB6F92;font-style: italic\">puts<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Error: <\/span><span style=\"color: #908CAA\">#{<\/span><span style=\"color: #F6C177\">response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #F6C177\">code<\/span><span style=\"color: #908CAA\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">end<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>Nokogiri creates a <a href=\"https:\/\/www.w3schools.com\/whatis\/whatis_htmldom.asp\" target=\"_blank\" rel=\"noopener\">DOM representation<\/a> of our HTML, making it easy to search and extract specific elements.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" 
class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized centered\"><img decoding=\"async\" width=\"1010\" height=\"569\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/handling-different-types-of-web-pages-fixed-version.jpg\" alt=\"\" class=\"wp-image-61587\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/handling-different-types-of-web-pages-fixed-version.jpg 1010w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/handling-different-types-of-web-pages-fixed-version-300x169.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/handling-different-types-of-web-pages-fixed-version-768x433.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/handling-different-types-of-web-pages-fixed-version-600x338.jpg 600w\" sizes=\"(max-width: 1010px) 100vw, 1010px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Handling Different Types of Web Pages<\/h2>\n\n\n\n<p>Web scraping with Ruby comes with its own set of challenges, especially when dealing with modern websites. Understanding the different types of web pages, and how to approach scraping each one, is crucial for successful data extraction.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Static vs Dynamic Content<\/h3>\n\n\n\n<p>Websites broadly fall into two categories: <a href=\"https:\/\/leadadvisors.com\/blog\/static-vs-dynamic-websites\/\" target=\"_blank\" rel=\"noreferrer noopener\">static and dynamic<\/a>. Static pages deliver their content directly in the HTML document, making them straightforward to scrape with basic tools like Nokogiri. 
Dynamic pages, however, generate content through JavaScript after the initial page load, requiring more sophisticated approaches.<\/p>\n\n\n\n<p>Here&#8217;s how to identify and handle each type:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2\"><span role=\"button\" data-code=\"# For static content\ndoc = Nokogiri::HTML(HTTParty.get(url).body)\ndata = doc.css('.target-element').text\n# For dynamic content\nrequire 'selenium-webdriver'\ndriver = Selenium::WebDriver.for :chrome\ndriver.get(url)\ndata = driver.find_element(css: '.target-element').text\" style=\"color:#232136;background-color:#e0def4\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"none\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\"><code><span class=\"line\"><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: italic\"> For static content<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">doc <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Nokogiri<\/span><span style=\"color: #908CAA\">::<\/span><span style=\"color: #E0DEF4;font-style: italic\">HTML<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #9CCFD8\">HTTParty<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">url<\/span><span style=\"color: 
#908CAA\">).<\/span><span style=\"color: #E0DEF4\">body<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">data <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> doc<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;.target-element&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">text<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: italic\"> For dynamic content<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">require<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;selenium-webdriver&#039;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">driver <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Selenium<\/span><span style=\"color: #908CAA\">::<\/span><span style=\"color: #9CCFD8\">WebDriver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">for <\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #3E8FB0\">chrome<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">driver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">url<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">data <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> driver<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">find_element<\/span><span style=\"color: #908CAA\">(<\/span><span 
style=\"color: #3E8FB0\">css<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;.target-element&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">text<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Dealing With JavaScript-Heavy Sites<\/h3>\n\n\n\n<p>JavaScript-heavy sites require special handling as they render content dynamically. Traditional scraping methods only capture the initial HTML, missing the dynamically loaded content. To overcome this, we can use headless browsers:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2\"><span role=\"button\" data-code=\"require 'watir'\nbrowser = Watir::Browser.new :chrome, headless: true\nbrowser.goto(url)\nbrowser.element(css: '#dynamic-content').wait_until(&amp;:present?)\ncontent = browser.html\" style=\"color:#232136;background-color:#e0def4\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"none\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">require<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;watir&#039;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">browser <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Watir<\/span><span style=\"color: 
#908CAA\">::<\/span><span style=\"color: #9CCFD8\">Browser<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #3E8FB0\">new<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #3E8FB0\">chrome<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">headless<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">true<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">browser<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">goto<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">url<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">browser<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">element<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">css<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;#dynamic-content&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">wait_until<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">&amp;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #3E8FB0\">present?<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">content <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> browser<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">html<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>This approach loads the page in a headless Chrome instance, so the page&#8217;s JavaScript runs before we attempt to extract data. 
The wait_until call polls until the matching element is present (and raises a timeout error if it never appears), so the dynamically loaded content is in the DOM before we read browser.html.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Managing Authentication and Sessions<\/h3>\n\n\n\n<p>Many websites require authentication to access their content. Mechanize can fill in and submit a login form, and its agent keeps the session cookies for subsequent requests:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2\"><span role=\"button\" data-code=\"require 'mechanize'\nagent = Mechanize.new\nlogin_page = agent.get(login_url)\nform = login_page.forms.first\nform.field_with(name: 'username').value = 'your_username'\nform.field_with(name: 'password').value = 'your_password'\ndashboard = agent.submit(form)\" style=\"color:#232136;background-color:#e0def4\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"none\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">require<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;mechanize&#039;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">agent <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Mechanize<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #3E8FB0\">new<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">login_page <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> agent<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">login_url<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #E0DEF4\">form <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> login_page<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">forms<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">first<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">form<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">field_with<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">name<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;username&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">value <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;your_username&#039;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">form<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">field_with<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">name<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;password&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">value <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;your_password&#039;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">dashboard <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> agent<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">submit<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">form<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>Key considerations for 
authenticated scraping:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Store credentials in environment variables rather than hard-coding them in the script.<\/li>\n\n\n\n<li>Reuse the same Mechanize agent so its cookie jar keeps the session alive across requests.<\/li>\n\n\n\n<li>Detect timeouts and expired sessions, and log in again when needed.<\/li>\n\n\n\n<li>Respect the site&#8217;s rate limits.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Different websites may require different approaches, and sometimes you&#8217;ll need to combine several techniques to extract the data you need. The key is to analyze the target website&#8217;s behavior and choose the appropriate tools for the job.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized centered\"><img loading=\"lazy\" decoding=\"async\" width=\"1010\" height=\"569\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/storing-and-processing-scraped-data-fixed-version.jpg\" alt=\"Image of a man surrounded by two folders with a piece of paper going between them. 
Text above that reads &quot;Storing and Processing Scraped Data&quot;\" class=\"wp-image-61585\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/storing-and-processing-scraped-data-fixed-version.jpg 1010w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/storing-and-processing-scraped-data-fixed-version-300x169.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/storing-and-processing-scraped-data-fixed-version-768x433.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2024\/12\/storing-and-processing-scraped-data-fixed-version-600x338.jpg 600w\" sizes=\"(max-width: 1010px) 100vw, 1010px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Storing and Processing Scraped Data<\/h2>\n\n\n\n<p>Successfully extracting data is only half the battle in web scraping with Ruby. Storing and processing that data effectively is equally crucial. Let&#8217;s explore how to handle your scraped data professionally.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Working With Different Data Formats<\/h3>\n\n\n\n<p>Ruby offers flexible options for storing scraped data. 
The most common formats are <a href=\"https:\/\/proxidize.com\/blog\/json-vs-csv\/\" data-type=\"link\" data-id=\"https:\/\/proxidize.com\/blog\/json-vs-csv\/\">CSV and JSON<\/a>, each serving different purposes: CSV suits flat, spreadsheet-friendly rows, while JSON handles nested data. The example below stores scraped quotes as CSV:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2\"><span role=\"button\" data-code=\"require 'csv'\n# Assuming `quotes` is dynamically populated from scraping\n# Example: quotes = [{ quote: &quot;...&quot;, author: &quot;...&quot;, tags: &quot;...&quot; }, ...]\n# Storing data in CSV format\nCSV.open('quotes.csv', 'w+', write_headers: true, headers: %w[Quote Author Tags]) do |csv|\n  quotes.each do |item|\n    csv &lt;&lt; [item[:quote], item[:author], item[:tags]]\n  end\nend\" style=\"color:#232136;background-color:#e0def4\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"none\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">require<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;csv&#039;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: italic\"> Assuming `quotes` is dynamically populated from scraping<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: italic\"> Example: quotes = [{ quote: &quot;...&quot;, author: 
&quot;...&quot;, tags: &quot;...&quot; }, ...]<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: italic\"> Storing data in CSV format<\/span><\/span>\n<span class=\"line\"><span style=\"color: #9CCFD8\">CSV<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #EB6F92;font-style: italic\">open<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;quotes.csv&#039;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;w+&#039;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">write_headers<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">true<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">headers<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">%<\/span><span style=\"color: #E0DEF4\">w<\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #E0DEF4;font-style: italic\">Quote<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4;font-style: italic\">Author<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4;font-style: italic\">Tags<\/span><span style=\"color: #908CAA\">])<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">do<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">|<\/span><span style=\"color: #E0DEF4;font-style: italic\">csv<\/span><span style=\"color: #908CAA\">|<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  quotes<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">each <\/span><span 
style=\"color: #3E8FB0\">do<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">|<\/span><span style=\"color: #E0DEF4;font-style: italic\">item<\/span><span style=\"color: #908CAA\">|<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    csv <\/span><span style=\"color: #3E8FB0\">&lt;&lt;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #E0DEF4\">item<\/span><span style=\"color: #908CAA\">[:<\/span><span style=\"color: #3E8FB0\">quote<\/span><span style=\"color: #908CAA\">],<\/span><span style=\"color: #E0DEF4\"> item<\/span><span style=\"color: #908CAA\">[:<\/span><span style=\"color: #3E8FB0\">author<\/span><span style=\"color: #908CAA\">],<\/span><span style=\"color: #E0DEF4\"> item<\/span><span style=\"color: #908CAA\">[:<\/span><span style=\"color: #3E8FB0\">tags<\/span><span style=\"color: #908CAA\">]]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #3E8FB0\">end<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">end<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Implementing Automated Scraping<\/h3>\n\n\n\n<p>Automation turns a scraper from a manual tool into a self-running system. 
Let&#8217;s implement a complete script that scrapes the quotes and saves them to a date-stamped CSV file, which makes it suitable for running on a schedule (for example, once a day with cron):<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2\"><span role=\"button\" data-code=\"require 'httparty'\nrequire 'nokogiri'\nrequire 'csv'\n# Method to scrape quotes from the website\ndef scrape_quotes\n  url = 'https:\/\/quotes.toscrape.com'\n  response = HTTParty.get(url)\n  if response.code == 200\n    document = Nokogiri::HTML(response.body)\n    quotes = []\n    document.css('.quote').each do |quote_block|\n      quote = quote_block.css('.text').text.strip\n      author = quote_block.css('.author').text.strip\n      tags = quote_block.css('.tags .tag').map(&amp;:text).join(', ')\n      quotes &lt;&lt; { quote: quote, author: author, tags: tags }\n    end\n    quotes\n  else\n    puts &quot;Error: Unable to fetch quotes (HTTP #{response.code})&quot;\n    []\n  end\nend\n# Save scraped quotes to a CSV file\ndef save_to_csv(quotes)\n  filename = &quot;quotes_report_#{Time.now.strftime('%Y-%m-%d')}.csv&quot;\n  CSV.open(filename, 'w+', write_headers: true, headers: %w[Quote Author Tags]) do |csv|\n    quotes.each do |quote|\n      csv &lt;&lt; [quote[:quote], quote[:author], quote[:tags]]\n    end\n  end\n  puts &quot;Quotes saved to #{filename}&quot;\nend\n# Main script execution\nquotes = scrape_quotes\nsave_to_csv(quotes)\" style=\"color:#232136;background-color:#e0def4\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"none\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: 
#232136\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">require<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;httparty&#039;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">require<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;nokogiri&#039;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">require<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;csv&#039;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: italic\"> Method to scrape quotes from the website<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">scrape_quotes<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  url <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;https:\/\/quotes.toscrape.com&#039;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  response <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">HTTParty<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">url<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #3E8FB0\">if<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">code <\/span><span style=\"color: #3E8FB0\">==<\/span><span style=\"color: #E0DEF4\"> <\/span><span 
style=\"color: #EA9A97\">200<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    document <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">Nokogiri<\/span><span style=\"color: #908CAA\">::<\/span><span style=\"color: #E0DEF4;font-style: italic\">HTML<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">body<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    quotes <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    document<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;.quote&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">each <\/span><span style=\"color: #3E8FB0\">do<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">|<\/span><span style=\"color: #E0DEF4;font-style: italic\">quote_block<\/span><span style=\"color: #908CAA\">|<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      quote <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> quote_block<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;.text&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">text<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">strip<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      author <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: 
#E0DEF4\"> quote_block<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;.author&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">text<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">strip<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      tags <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> quote_block<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">css<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;.tags .tag&#039;<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">map<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #3E8FB0\">&amp;<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #3E8FB0\">text<\/span><span style=\"color: #908CAA\">).<\/span><span style=\"color: #E0DEF4\">join<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;, &#039;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      quotes <\/span><span style=\"color: #3E8FB0\">&lt;&lt;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">quote<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> quote<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">author<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> author<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">tags<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> tags 
<\/span><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">end<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    quotes<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #3E8FB0\">else<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #EB6F92;font-style: italic\">puts<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Error: Unable to fetch quotes (HTTP <\/span><span style=\"color: #908CAA\">#{<\/span><span style=\"color: #F6C177\">response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #F6C177\">code<\/span><span style=\"color: #908CAA\">}<\/span><span style=\"color: #F6C177\">)&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #908CAA\">[]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #3E8FB0\">end<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">end<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: italic\"> Save scraped quotes to a CSV file<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">def<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">save_to_csv<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #C4A7E7;font-style: italic\">quotes<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  filename <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;quotes_report_<\/span><span style=\"color: #908CAA\">#{<\/span><span 
style=\"color: #9CCFD8\">Time<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #F6C177\">now<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #F6C177\">strftime<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;%Y-%m-%d&#039;<\/span><span style=\"color: #908CAA\">)}<\/span><span style=\"color: #F6C177\">.csv&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #9CCFD8\">CSV<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #EB6F92;font-style: italic\">open<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">filename<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;w+&#039;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">write_headers<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">true<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">headers<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">%<\/span><span style=\"color: #E0DEF4\">w<\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #E0DEF4;font-style: italic\">Quote<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4;font-style: italic\">Author<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #E0DEF4;font-style: italic\">Tags<\/span><span style=\"color: #908CAA\">])<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #3E8FB0\">do<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">|<\/span><span style=\"color: #E0DEF4;font-style: italic\">csv<\/span><span style=\"color: #908CAA\">|<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #E0DEF4\">    quotes<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">each <\/span><span style=\"color: #3E8FB0\">do<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">|<\/span><span style=\"color: #E0DEF4;font-style: italic\">quote<\/span><span style=\"color: #908CAA\">|<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">      csv <\/span><span style=\"color: #3E8FB0\">&lt;&lt;<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">[<\/span><span style=\"color: #E0DEF4\">quote<\/span><span style=\"color: #908CAA\">[:<\/span><span style=\"color: #3E8FB0\">quote<\/span><span style=\"color: #908CAA\">],<\/span><span style=\"color: #E0DEF4\"> quote<\/span><span style=\"color: #908CAA\">[:<\/span><span style=\"color: #3E8FB0\">author<\/span><span style=\"color: #908CAA\">],<\/span><span style=\"color: #E0DEF4\"> quote<\/span><span style=\"color: #908CAA\">[:<\/span><span style=\"color: #3E8FB0\">tags<\/span><span style=\"color: #908CAA\">]]<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #3E8FB0\">end<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #3E8FB0\">end<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #EB6F92;font-style: italic\">puts<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;Quotes saved to <\/span><span style=\"color: #908CAA\">#{<\/span><span style=\"color: #F6C177\">filename<\/span><span style=\"color: #908CAA\">}<\/span><span style=\"color: #F6C177\">&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">end<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #908CAA;font-style: italic\">#<\/span><span style=\"color: #6E6A86;font-style: 
italic\"> Main script execution<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">quotes <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> scrape_quotes<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">save_to_csv<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">quotes<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Implementing a Proxy<\/h3>\n\n\n\n<p>When web scraping with Ruby, some websites may block your IP address, either because you hit their rate limits or because they disallow scraping altogether. This should not deter you, because a common workaround exists: routing your requests through a <a href=\"https:\/\/proxidize.com\/proxy-server\/\">proxy server<\/a>. A proxy forwards your traffic through its own IP address, which hides your real IP and reduces the chance of being rate-limited or flagged as a scraper. 
Adding a proxy to your Ruby scraping script takes only a few lines of code.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(2 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2\"><span role=\"button\" data-code=\"require 'httparty'\noptions = {\n  http_proxyaddr: 'mobile-proxy-address.com',\n  http_proxyport: 8080,\n  http_proxyuser: 'username',\n  http_proxypass: 'password'\n}\nresponse = HTTParty.get('https:\/\/example.com', options)\nputs response.body\" style=\"color:#232136;background-color:#e0def4\" aria-label=\"Copy\" data-copied-text=\"Copied!\" data-has-text-button=\"textSimple\" data-inside-header-type=\"none\" aria-live=\"polite\" class=\"code-block-pro-copy-button\"><span class=\"cbp-btn-text\">Copy<\/span><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">require<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;httparty&#039;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">options <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #3E8FB0\">http_proxyaddr<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;mobile-proxy-address.com&#039;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #3E8FB0\">http_proxyport<\/span><span 
style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #EA9A97\">8080<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #3E8FB0\">http_proxyuser<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;username&#039;<\/span><span style=\"color: #908CAA\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #3E8FB0\">http_proxypass<\/span><span style=\"color: #908CAA\">:<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&#039;password&#039;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #908CAA\">}<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">response <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #9CCFD8\">HTTParty<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#039;https:\/\/example.com&#039;<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> options<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EB6F92;font-style: italic\">puts<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">body<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>Replace the proxy address, port, username, and password with the details supplied by your proxy provider. Whether you need a proxy at all depends entirely on the website you decide to scrape. 
For our example website (Quotes to Scrape), a proxy is not needed because the site was built specifically for scraping practice. Once you move on to real-world websites, however, routing requests through a proxy can reduce the risk of your own IP address being rate-limited or banned.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Web scraping with Ruby turns tedious manual data collection into an efficient, scalable, automated process. With a properly set up environment, libraries like Nokogiri and HTTParty, and techniques suited to different kinds of web pages, you can build robust scrapers for a range of business needs.<\/p>\n\n\n\n<p>Your scraping journey starts with mastering the basics: making HTTP requests and parsing HTML. As you progress, you&#8217;ll tackle more complex challenges such as handling dynamic content, managing authentication, and building automated monitoring systems. Follow good scraping practices by adding appropriate delays between requests and using proxies when needed. Success in web scraping comes from continuous learning and adaptation, so start with simple projects, test your code thoroughly, and gradually build more sophisticated solutions. 
Armed with the knowledge from this guide, you can now create efficient web scrapers that save time and deliver valuable data insights for your projects.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"author":2627,"featured_media":75718,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","format":"standard","categories":[110],"tags":[],"class_list":["post-61516","blog","type-blog","status-publish","format-standard","has-post-thumbnail","hentry","category-web-scraping-and-automation"],"acf":[],"_links":{"self":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/61516","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/users\/2627"}],"replies":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/comments?post=61516"}],"version-history":[{"count":9,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/61516\/revisions"}],"predecessor-version":[{"id":84846,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/61516\/revisions\/84846"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/media\/75718"}],"wp:attachment":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/media?parent=61516"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/categories?post=61516"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/tags?post=61516"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}