{"id":84203,"date":"2025-09-26T15:14:15","date_gmt":"2025-09-26T14:14:15","guid":{"rendered":"https:\/\/proxidize.com\/?post_type=blog&#038;p=84203"},"modified":"2025-10-02T11:31:18","modified_gmt":"2025-10-02T10:31:18","slug":"what-is-beautifulsoup","status":"publish","type":"blog","link":"https:\/\/proxidize.com\/blog\/what-is-beautifulsoup\/","title":{"rendered":"What is BeautifulSoup?"},"content":{"rendered":"\n<p>Have you ever found yourself sifting through the HTML of a website? It doesn\u2019t matter why you\u2019re doing it; we don\u2019t judge. Clicking the little ellipses to go one layer deeper into a page, only to realize that what you\u2019re looking for was actually three layers up and you went down the wrong branch, is a frustrating experience. Imagine having to do that for more than a single page, maybe even hundreds or thousands! Madness. There must be a better way.<\/p>\n\n\n\n<p>There is! BeautifulSoup is a <strong>Python library<\/strong> that\u2019s worth learning for anyone who dabbles in HTML. It\u2019s simple and easy to understand, and even non-programmers can dip their toes into it without much legwork ahead of time. On top of saving you a lot of time, it\u2019s also a great gateway into understanding the basics of programming. This article covers some of its uses and features and provides a simple guide to using it.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized centered\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/beautifulsoup-explained--1024x536.jpg\" alt=\"Image of a person sitting at a computer with code blocks spilling out of the screen. 
Text above reads &quot;BeautifulSoup Explained&quot;\" class=\"wp-image-84217\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/beautifulsoup-explained--1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/beautifulsoup-explained--300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/beautifulsoup-explained--768x402.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/beautifulsoup-explained--600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/beautifulsoup-explained-.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">BeautifulSoup Explained<\/h2>\n\n\n\n<p>BeautifulSoup was <strong>created to solve<\/strong> the problem of <strong>sifting through poorly structured HTML code<\/strong>. It was first released in 2004, when the web was growing rapidly and many websites did not follow a strict standard. Programmers and hobbyists who wanted to parse or scrape a website built on poor code had limited options. Beautiful Soup was the solution they were looking for: it can parse all that messy code and gather the bits of information you need without altering the website itself.<\/p>\n\n\n\n<p>BeautifulSoup is written in Python and was built specifically to make parsing these kinds of websites easier. <strong>It was designed to simplify the process of extracting data from HTML and XML documents<\/strong>, especially when dealing with inconsistent or malformed markup. It provides a flexible interface for navigating, searching, and modifying parse trees, making it ideal for web scraping tasks. 
It can handle broken tags, missing attributes, and improperly nested elements, allowing developers to focus on parsing and scraping rather than spending countless hours cleaning up the underlying structure. It bridges the gap between raw web data and structured information extraction in Python.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized centered\"><img decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-does-beautifulsoup-do-1024x536.jpg\" alt=\"Image of a computer surrounded by webpages. Text above reads &quot;What Does Beautiful Soup Do?&quot;\" class=\"wp-image-84220\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-does-beautifulsoup-do-1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-does-beautifulsoup-do-300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-does-beautifulsoup-do-768x402.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-does-beautifulsoup-do-600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-does-beautifulsoup-do.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">What Does Beautiful Soup Do?<\/h2>\n\n\n\n<p>BeautifulSoup <strong>helps isolate titles and links from webpages<\/strong>. It can extract the text inside <strong>HTML tags and modify the parsed copy of the document you are working with<\/strong>. 
When using BS4, you can also <strong>navigate through HTML and XML documents<\/strong> by moving through the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Parse_tree\" target=\"_blank\" rel=\"noopener\">parse tree<\/a> to locate exactly what you need, <strong>from headlines to price tags<\/strong>.<\/p>\n\n\n\n<p>It makes it easy to search for specific elements within a webpage<strong> by tag, attribute, or text<\/strong>, giving you the flexibility to pinpoint the data you are looking for. Sometimes, you may need to change the document\u2019s structure to get at the data you need. BeautifulSoup lets you modify the parse tree (its in-memory copy of the page, not the live website), so you can <strong>add, remove, or change elements<\/strong> as needed. BS4 is designed to <strong>handle poorly formatted markup<\/strong>, meaning it can still make sense of messy code that might trip up other tools.<\/p>\n\n\n\n<p>Beautiful Soup is simple to use because it <strong>abstracts many of the complexities involved in parsing HTML<\/strong>, allowing developers to perform complex tasks with minimal code. It can help you <strong>navigate and extract data<\/strong> whether the markup is well-structured or malformed. Together with its underlying parser, BS4 repairs common issues like unclosed tags or improperly nested elements, making it possible to parse and extract data from even broken HTML.<\/p>\n\n\n\n<p>It supports multiple parsing strategies through its integration with different parsers. It can use <strong>Python\u2019s built-in <\/strong><a href=\"https:\/\/docs.python.org\/3\/library\/html.parser.html\" target=\"_blank\" rel=\"noopener\"><strong>html.parser<\/strong><\/a>, which needs no extra installation and is suitable for simple tasks, while third-party parsers like <strong>lxml can be used for faster speeds and more complex tasks<\/strong>, and html5lib parses pages the same way a web browser does. 
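<\/p>\n\n\n\n<p>To make the parser differences concrete, here is a minimal sketch that feeds one deliberately broken snippet to each parser. The snippet is a hypothetical price fragment whose <code>&lt;p&gt;<\/code> and <code>&lt;b&gt;<\/code> tags are never closed. It assumes only <code>beautifulsoup4<\/code> is installed and skips lxml or html5lib when they are missing. <code>parse_with<\/code> is an illustrative helper, not part of the library, and the repaired trees can differ slightly from parser to parser.<\/p>

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately broken markup: the <p> and <b> tags are never closed
BROKEN_HTML = "<html><body><p>Price: <b>$19.99</body></html>"


def parse_with(parser_name):
    """Parse BROKEN_HTML with one parser; return None if that parser is not installed."""
    try:
        return BeautifulSoup(BROKEN_HTML, parser_name)
    except FeatureNotFound:
        # Raised when, for example, lxml or html5lib has not been pip-installed
        return None


for name in ("html.parser", "lxml", "html5lib"):
    soup = parse_with(name)
    if soup is None:
        print(f"{name}: not installed, skipped")
        continue
    # Every parser closes the dangling tags, so the price is still reachable
    print(f"{name}: {soup.find('b').get_text()!r} inside {soup.p}")
```

<p>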
BS4 also integrates easily with other Python libraries, such as <strong>Requests for downloading web pages, pandas for data manipulation, and re for regular expression-based searching<\/strong>.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized centered\"><img decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-is-beautifulsoup-used-for-1024x536.jpg\" alt=\"Image of a computer with a server to its side. Text above reads &quot;What is BeautifulSoup Used For?&quot;\" class=\"wp-image-84225\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-is-beautifulsoup-used-for-1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-is-beautifulsoup-used-for-300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-is-beautifulsoup-used-for-768x402.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-is-beautifulsoup-used-for-600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/what-is-beautifulsoup-used-for.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">What Is BeautifulSoup Used For?<\/h2>\n\n\n\n<p>The library has many use cases, but most people use it for <a href=\"https:\/\/proxidize.com\/blog\/web-scraping-with-beautiful-soup\/\"><strong>web scraping with BeautifulSoup<\/strong><\/a>. You can also use it to <strong>parse web pages, <\/strong><a href=\"https:\/\/proxidize.com\/blog\/how-to-scape-images-from-website\/\"><strong>scrape images<\/strong><\/a><strong>, and collect data for databases<\/strong>, or even to feed <strong>machine learning and automation processes<\/strong>. 
Thanks to the parsing features that make it so well suited to web scraping, you can <strong>automate the process of searching for content<\/strong> and use the collected data to make predictions. Paired with a library that downloads pages, it can serve as the parsing half of a <strong>web crawler<\/strong> that keeps revisiting the sites of your choice and collecting data. This collected data can then be converted into a format suitable for training various machine learning models.<\/p>\n\n\n\n<p>BeautifulSoup can help with everything from document processing to data extraction to reports and automation. Let us explore some more specific use cases.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Document Processing<\/h3>\n\n\n\n<p>BS4 can help with <strong>parsing and extracting information from local HTML files<\/strong>, which comes in handy when you need to <strong>extract tables from a saved annual financial report<\/strong>. It can help clean and reformat messy or broken HTML markup before you reuse it in a CMS or database. During a website audit, it can inspect a set of HTML pages for missing &lt;title&gt; or &lt;meta&gt; tags. Lastly, it can <strong>automate extracting data from tables or lists<\/strong>, saving you countless hours of manual labor.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Data Extraction<\/h3>\n\n\n\n<p>BS4 can also <strong>process XML-based data formats like RSS or Atom feeds<\/strong> (XML parsing requires the lxml parser), helping you <strong>build a news aggregator by parsing headlines and links from multiple RSS feeds<\/strong>. You can analyze structured information, such as error codes, from HTML or XML log files and datasets to monitor them more easily. You can also convert the data you extract from semi-structured documents into <a href=\"https:\/\/proxidize.com\/blog\/json-vs-csv\/\">CSV or JSON<\/a>, or even Excel formats. 
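<\/p>\n\n\n\n<p>As a rough illustration of the CSV conversion mentioned above, the sketch below pulls the rows out of an HTML table and writes them with Python\u2019s standard <code>csv<\/code> module. The table markup, its <code>revenue<\/code> id, and the <code>table_to_csv<\/code> helper are all hypothetical. With a real saved report, you would read the markup from a local file instead of a string.<\/p>

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical fragment of a saved report; in practice, read it from a local .html file
REPORT_HTML = """
<table id="revenue">
  <tr><th>Quarter</th><th>Revenue</th></tr>
  <tr><td>Q1</td><td>1,200</td></tr>
  <tr><td>Q2</td><td>1,450</td></tr>
</table>
"""


def table_to_csv(html, table_id):
    """Return the rows of the table with the given id as CSV text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no table with id={table_id!r}")
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in table.find_all("tr"):
        # Header (th) and data (td) cells both become CSV fields
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        writer.writerow(cells)
    return buffer.getvalue()


print(table_to_csv(REPORT_HTML, "revenue"))
```

<p>The sketch writes through <code>csv.writer<\/code> rather than joining strings by hand. That way, values that contain commas, such as <code>1,200<\/code>, are quoted correctly.<\/p>\n\n\n\n<p>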
Once you have parsed the HTML, you can turn the extracted data into a searchable database.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Reports<\/h3>\n\n\n\n<p>Automate the processing of HTML-based emails with BS4, such as extracting all unsubscribe links from a batch of newsletters. Build automated pipelines that generate weekly performance reports, and extract metadata from technical documents.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Research<\/h3>\n\n\n\n<p>With BeautifulSoup, you can mine academic articles and extract data for citation analysis. You can also support digital humanities research and build datasets for natural language processing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Automation<\/h3>\n\n\n\n<p>Automation becomes much simpler with BS4. For example, you can extract daily prices from saved HTML pages of a supplier\u2019s catalog so you\u2019re aware of any price changes. You can also speed up QA testing by parsing the rendered HTML output of an app to verify that the expected elements are present.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized centered\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/pros-and-cons-of-beautifulsoup-1024x536.jpg\" alt=\"Image of a bowl of soup with a checkmark on the left and an X on the right. 
Text above reads &quot;Pros and Cons of BeautifulSoup&quot;\" class=\"wp-image-84219\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/pros-and-cons-of-beautifulsoup-1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/pros-and-cons-of-beautifulsoup-300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/pros-and-cons-of-beautifulsoup-768x402.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/pros-and-cons-of-beautifulsoup-600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/pros-and-cons-of-beautifulsoup.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Pros and Cons of BeautifulSoup<\/h2>\n\n\n\n<p>While it may seem like the perfect tool to start your journey with, it has pros and cons that make it worthwhile for some use cases and a poor fit for others. Let us explore some of these.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What Are the Advantages of BeautifulSoup?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>BS4 is beginner-friendly and easy to learn. 
It has a simple and intuitive API with clear and widely available documentation.<\/li>\n\n\n\n<li>It works with multiple parsers, including html.parser, lxml, and html5lib.<\/li>\n\n\n\n<li>It supports both HTML and XML and can handle broken or poorly formatted code.<\/li>\n\n\n\n<li>It provides multiple ways to explore the parse tree (by tag, attribute, text, or CSS selector) and flexible ways to extract data through text, attributes, parents, siblings, and so on.<\/li>\n\n\n\n<li>It can modify the parse tree by adding, removing, or editing tags and attributes.<\/li>\n\n\n\n<li>It handles different character encodings and converts documents to Unicode.<\/li>\n\n\n\n<li>It works seamlessly with Requests and urllib and integrates with pandas for structured data output.<\/li>\n\n\n\n<li>It works well with Genex, Selenium, Scrapy, and other tools.<\/li>\n\n\n\n<li>It is well suited to parsing HTML and XML documents and useful for cleaning and reformatting inconsistent markup.<\/li>\n\n\n\n<li>It is helpful for automating repetitive parsing tasks like reports, logs, and email, and can be used for text mining, sentiment analysis, and content analysis.<\/li>\n\n\n\n<li>Finally, it is a good tool for understanding parsing and DOM-like navigation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What Are the Disadvantages of BeautifulSoup?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>It is slower than using a parser such as lxml directly.<\/li>\n\n\n\n<li>It is memory-intensive for larger documents since it loads the entire document into memory.<\/li>\n\n\n\n<li>It is not practical for real-time or large-scale processing.<\/li>\n\n\n\n<li>It cannot execute or interpret JavaScript.<\/li>\n\n\n\n<li>It does not have built-in crawling or request handling, so you need a separate library such as Requests to download pages.<\/li>\n\n\n\n<li>BS4 lacks browser-level simulation for CSS rendering, DOM events, or AJAX.<\/li>\n\n\n\n<li>While helpful, the find_all() method can produce overly 
broad results that need extra filtering.<\/li>\n\n\n\n<li>It does not support XPath queries, relying on its own search methods and CSS selectors instead.<\/li>\n\n\n\n<li>It is not updated as frequently as other alternatives.<\/li>\n\n\n\n<li>Its behavior can vary depending on the underlying parser.<\/li>\n\n\n\n<li>It is outperformed by lxml for XML-heavy tasks and by Scrapy for full-featured crawling pipelines.<\/li>\n\n\n\n<li>Lastly, JavaScript-heavy sites, which are increasingly common, require pairing it with a full browser automation tool.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized centered\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/how-to-use-beautifulsoup-1024x536.jpg\" alt=\"Image of a terminal showing how to install BeautifulSoup with a bowl of soup next to it. Text above reads &quot;How to Use BeautifulSoup&quot;\" class=\"wp-image-84218\" style=\"object-fit:cover\" srcset=\"https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/how-to-use-beautifulsoup-1024x536.jpg 1024w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/how-to-use-beautifulsoup-300x157.jpg 300w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/how-to-use-beautifulsoup-768x402.jpg 768w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/how-to-use-beautifulsoup-600x314.jpg 600w, https:\/\/proxidize.com\/wp-content\/uploads\/2025\/09\/how-to-use-beautifulsoup.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">How to Use BeautifulSoup<\/h2>\n\n\n\n<p>Installing BeautifulSoup is as simple and straightforward as using it. 
It is not part of Python\u2019s standard library, but a single pip command installs it, ideally inside a virtual environment. Open your terminal and enter the following command:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>pip install beautifulsoup4<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #EA9A97\">pip<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">install<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">beautifulsoup4<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div 
style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>For something more advanced, such as <a href=\"https:\/\/proxidize.com\/blog\/parse-xml-in-python\/\">parsing XML in Python<\/a>, you will need lxml, which also works as a faster HTML parser. You may also want html5lib, an alternative HTML parser that handles broken markup the same way a web browser does. You can install either or both with the following commands:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>pip install lxml \npip install html5lib<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: 
#EA9A97\">pip<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">install<\/span><span 
style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">lxml<\/span><span style=\"color: #E0DEF4\"> <\/span><\/span>\n<span class=\"line\"><span style=\"color: #EA9A97\">pip<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">install<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">html5lib<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>After installing it, import BS4 by placing this line at the top of your script:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>from bs4 import BeautifulSoup<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 
012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">from<\/span><span style=\"color: #E0DEF4\"> bs4 <\/span><span style=\"color: #3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> BeautifulSoup<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:0px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Parsing HTML<\/h3>\n\n\n\n<p>The first step to using Beautiful Soup is to parse an HTML document. Typically, you fetch the website\u2019s HTML with the Requests library (installed separately with <code>pip install requests<\/code>) and pass the result to <code>BeautifulSoup<\/code>:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>import requests\nfrom bs4 import BeautifulSoup\nurl = \"http:\/\/example.com\"\n# Download the page and raise an error on 4xx\/5xx responses\nresponse = requests.get(url, timeout=10)\nresponse.raise_for_status()\nhtml_content = response.content\n# Parse the raw HTML with the built-in html.parser\nsoup = BeautifulSoup(html_content, \"html.parser\")<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" 
style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> requests<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">from<\/span><span style=\"color: #E0DEF4\"> bs4 <\/span><span style=\"color: #3E8FB0\">import<\/span><span style=\"color: #E0DEF4\"> BeautifulSoup<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">url <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;http:\/\/example.com&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #6E6A86; font-style: italic\"># Download the page and raise an error on 4xx\/5xx responses<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">response <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> requests<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">url<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> timeout<\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #EA9A97\">10<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">response<\/span><span 
style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">raise_for_status<\/span><span style=\"color: #908CAA\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">html_content <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> response<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">content<\/span><\/span>\n<span class=\"line\"><span style=\"color: #6E6A86; font-style: italic\"># Parse the raw HTML with the built-in html.parser<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">soup <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> BeautifulSoup<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">html_content<\/span><span style=\"color: #908CAA\">,<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;html.parser&quot;<\/span><span style=\"color: 
#908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Navigating the Parse Tree<\/h3>\n\n\n\n<p>You can navigate through the parse tree by accessing tags by name. For example, <code>soup.h1<\/code> returns the first <code>h1<\/code> tag on a page:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>h1_tag = soup.h1  # first h1 tag on the page (None if there is none)\nprint(h1_tag.text)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: 
#E0DEF4\">h1_tag <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> soup<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">h1  <\/span><span style=\"color: #6E6A86; font-style: italic\"># first h1 tag on the page (None if there is none)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EB6F92; font-style: italic\">print<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">h1_tag<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">text<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>You can also use <code>find_all<\/code> to search for all instances of a tag. For example, this prints the destination of every link on the page:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>all_links = soup.find_all('a')\nfor link in all_links:\n    print(link.get('href'))  # None if the link has no href<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 
2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">all_links <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> soup<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">find_all<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;a&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> link <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> all_links<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #EB6F92; font-style: italic\">print<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">link<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;href&#39;<\/span><span style=\"color: #908CAA\">))<\/span><span style=\"color: #E0DEF4\">  <\/span><span style=\"color: #6E6A86; font-style: italic\"># None if the link has no href<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Searching for Elements<\/h3>\n\n\n\n<p>By using the parse tree, you can search for items by tag name, attributes, or text content. 
The <code>find()<\/code> method returns the first matching element, or <code>None<\/code> if nothing matches.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>By Tag Name<\/strong>: <code>title_tag = soup.find('title')<\/code>, then <code>print(title_tag.text)<\/code><\/li>\n\n\n\n<li><strong>By Attribute<\/strong>: <code>link = soup.find('a', href='\/example')<\/code>, then <code>print(link.text)<\/code><\/li>\n\n\n\n<li><strong>By Text<\/strong>: <code>paragraph = soup.find('p', string='Specific Text')<\/code>, then <code>print(paragraph)<\/code> (older code may use <code>text=<\/code>, the argument\u2019s former name)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Modifying the Parse Tree<\/h3>\n\n\n\n<p>You can also modify the parse tree by changing the content of a tag:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>h1_tag.string = \"New Title\" \nprint (soup.h1)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" 
stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">h1_tag<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">string <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;New 
Title&quot;<\/span><span style=\"color: #E0DEF4\"> <\/span><\/span>\n<span class=\"line\"><span style=\"color: #EB6F92; font-style: italic\">print<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">soup<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">h1<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>You can also add, remove, and replace tags:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>new_tag = soup.new_tag('p')\nnew_tag.string = \"New Paragraph\"\nsoup.body.append(new_tag)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 
002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #E0DEF4\">new_tag <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> soup<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">new_tag<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;p&#39;<\/span><span style=\"color: #908CAA\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">new_tag<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">string <\/span><span style=\"color: #3E8FB0\">=<\/span><span style=\"color: #E0DEF4\"> <\/span><span style=\"color: #F6C177\">&quot;New Paragraph&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">soup<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">body<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">append<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #E0DEF4\">new_tag<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Extract Data<\/h3>\n\n\n\n<p>Lastly, you can extract data from tags you have found. 
This can be anything from the tag\u2019s text, attributes, or the entire tag itself:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.75rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#e0def4;--cbp-line-number-width:calc(1 * 0.6 * .75rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span role=\"button\" tabindex=\"0\" style=\"color:#e0def4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>for link in all_links:\n    print(link.get('href'), link.text)<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki rose-pine-moon\" style=\"background-color: #232136\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #3E8FB0\">for<\/span><span style=\"color: #E0DEF4\"> link <\/span><span style=\"color: #3E8FB0\">in<\/span><span style=\"color: #E0DEF4\"> all_links<\/span><span style=\"color: #908CAA\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #E0DEF4\">    <\/span><span style=\"color: #EB6F92; font-style: italic\">print<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: 
#E0DEF4\">link<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">get<\/span><span style=\"color: #908CAA\">(<\/span><span style=\"color: #F6C177\">&#39;href&#39;<\/span><span style=\"color: #908CAA\">),<\/span><span style=\"color: #E0DEF4\"> link<\/span><span style=\"color: #908CAA\">.<\/span><span style=\"color: #E0DEF4\">text<\/span><span style=\"color: #908CAA\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>BeautifulSoup is a good first step into programming because it is one of the more approachable libraries in the Python ecosystem. Current releases support Python 3 only. It is also well documented, and plenty of guides walk you through how to scrape almost anything. Proxidize offers guides on <a href=\"https:\/\/proxidize.com\/blog\/scrape-youtube-videos\/\">how to scrape YouTube videos<\/a> and images, <a href=\"https:\/\/proxidize.com\/blog\/scraping-websites-with-login-pages-python\/\">how to scrape websites with login pages<\/a>, and even how to scrape Google results themselves.<\/p>\n\n\n\n<p>Key Takeaways:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Beautiful Soup is not part of Python\u2019s standard library.
It is a third-party package that you install from the terminal with pip (<code>pip install beautifulsoup4<\/code>).<\/li>\n\n\n\n<li>One of the main uses of BS4 is web scraping.<\/li>\n\n\n\n<li>With BS4, you can parse HTML, navigate and modify the parse tree, search for elements, and extract data.<\/li>\n\n\n\n<li>Python\u2019s built-in <code>html.parser<\/code> works out of the box, but for faster or more lenient parsing you may want to install a third-party parser such as lxml or html5lib.<\/li>\n\n\n\n<li>Current BS4 releases require Python 3; the last version to support Python 2 was 4.9.3.<\/li>\n<\/ol>\n\n\n\n<p>If you are new to programming and want to see what all the excitement is about, BeautifulSoup is a safe and easy way to start exploring what programming has to offer. Once you are comfortable with it, you can move on to more advanced languages and tools.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Frequently Asked Questions<\/h2>\n\n\n<div id=\"rank-math-faq\" class=\"rank-math-block\">\n<div class=\"rank-math-list \">\n<div id=\"faq-question-1758895647890\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">What is the use of BeautifulSoup?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>It is used for pulling out data from HTML and XML files.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1758895663344\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Is using BeautifulSoup legal?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Beautiful Soup itself is legal to use. What matters is how you use it, so follow each website\u2019s terms and conditions when it comes to scraping or other matters.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1758895672461\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Is BeautifulSoup better than Selenium?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Both tools have their own strengths and weaknesses.
Which one fits depends on the task: Selenium drives a real web browser, so it can load and interact with pages that rely on JavaScript, while BeautifulSoup only parses HTML you already have, which makes it lighter and faster for static pages.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1758895673098\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Is BeautifulSoup good for web scraping?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Yes. BS4 is well suited to web scraping, especially of static pages. It parses HTML and XML but does not download pages itself, so it is usually paired with an HTTP client that fetches the page first.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1758895674428\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Is Beautiful Soup easy to learn?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Yes, it is very easy to learn and has extensive documentation and tutorials on how to use it for almost anything.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1758895675031\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Is BeautifulSoup free to use?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Yes. It is an open-source library released under the MIT license, so anyone can use it for free.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1758895675731\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Why is it called Beautiful Soup?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>It is named after \u201cBeautiful Soup,\u201d the poem the Mock Turtle sings in Lewis Carroll\u2019s Alice\u2019s Adventures in Wonderland. The name also refers to \u201ctag soup,\u201d a term for poorly structured HTML.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1758895677029\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">Can BeautifulSoup handle broken HTML?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Yes, it can.
This is one of the reasons it was developed in the first place.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1758895677714\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">What is the purpose of the <code>find()<\/code> method in BeautifulSoup?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>The <code>find()<\/code> method locates and returns the first HTML or XML element in a parsed document that matches the criteria you give it, such as a tag name or attributes. If nothing matches, it returns <code>None<\/code>.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"author":2627,"featured_media":84224,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","format":"standard","categories":[266],"tags":[],"class_list":["post-84203","blog","type-blog","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-tutorials-and-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/84203","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/users\/2627"}],"replies":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/comments?post=84203"}],"version-history":[{"count":4,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/84203\/revisions"}],"predecessor-version":[{"id":84733,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/blog\/84203\/revisions\/84733"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/media\/84224"}],"wp:attachment":[{"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/media?parent=84203"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/categories?post=84203"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/proxidize.com\/wp-json\/wp\/v2\/tags?post=84203"}],"curie
s":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}