GoBeautifulSoup is a high-performance HTML/XML parsing library that exposes the BeautifulSoup4 API on top of a parsing engine written in Go. It is designed as a drop-in replacement for BeautifulSoup4: existing code keeps working after an import change, and the benchmarks below report 10x to 48x speedups.
- 🔥 10-50x faster than BeautifulSoup4 for parsing and querying, depending on document size and operation
- 🔄 100% API Compatible - Drop-in replacement for BeautifulSoup4
- ⚡ Go-Powered Backend - Leverages Go's performance for HTML/XML processing
- 🌐 Cross-Platform - Works on Windows, macOS, and Linux (x64/ARM64)
- 💾 Memory Efficient - Optimized memory usage for large documents
- 🛡️ Production Ready - Thoroughly tested with comprehensive benchmarks
In the benchmarks below, GoBeautifulSoup is 10x to 48x faster than BeautifulSoup4 across parsing and query operations:
| Document Size | GoBeautifulSoup | BeautifulSoup4 (html.parser) | BeautifulSoup4 (lxml) | Speed Improvement (vs html.parser) |
|---|---|---|---|---|
| Small (1KB) | 0.044ms | 2.1ms | 1.8ms | 48x faster |
| Medium (100KB) | 5.7ms | 89ms | 76ms | 15x faster |
| Large (1MB) | 154ms | 2,400ms | 1,980ms | 15x faster |
| Operation | GoBeautifulSoup | BeautifulSoup4 | Speed Improvement |
|---|---|---|---|
| find('div') | 0.16ms | 3.2ms | 20x faster |
| find_all('div') | 4.5ms | 45ms | 10x faster |
| select('h3') | 2.5ms | 28ms | 11x faster |
| find(class_='item') | 0.55ms | 8.9ms | 16x faster |
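The Speed Improvement columns are ratios of the BeautifulSoup4 time to the GoBeautifulSoup time. The sketch below shows one way such numbers could be measured and recomputed. `median_ms` and `speedup` are hypothetical helper names, not part of either library, and because this README does not state the benchmark methodology, the median-of-runs approach is an assumption.

```python
import statistics
import time


def median_ms(func, runs=20):
    """Median wall-clock time of func() over `runs` calls, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Ratios recomputed from the 1KB row of the parsing table above.
print(f"{speedup(2.1, 0.044):.1f}x vs html.parser")
print(f"{speedup(1.8, 0.044):.1f}x vs lxml")
```

To time a parser, pass a zero-argument callable such as `lambda: BeautifulSoup(html, 'html.parser')` to `median_ms`, once with each library's `BeautifulSoup`, and feed the two results to `speedup`.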
pip install gobeautifulsoup
GoBeautifulSoup provides the exact same API as BeautifulSoup4:
from gobeautifulsoup import BeautifulSoup
# Parse HTML
html = """
<html>
<head><title>Example</title></head>
<body>
<div class="container">
<p class="highlight">Hello World!</p>
<a href="https://example.com">Link</a>
</div>
</body>
</html>
"""
soup = BeautifulSoup(html, 'html.parser')
# All familiar BeautifulSoup methods work exactly the same
title = soup.find('title').get_text()
print(title) # "Example"
paragraph = soup.find('p', class_='highlight')
print(paragraph.get_text()) # "Hello World!"
links = soup.find_all('a')
for link in links:
    print(link.get('href'))  # "https://example.com"
from gobeautifulsoup import BeautifulSoup
html = """
<html>
<body>
<h1>Welcome</h1>
<p class="intro">This is an introduction.</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
</body>
</html>
"""
soup = BeautifulSoup(html, 'html.parser')
# Find elements
heading = soup.find('h1')
print(f"Heading: {heading.get_text()}")
# Find by class
intro = soup.find('p', class_='intro')
print(f"Introduction: {intro.get_text()}")
# Find all list items
items = soup.find_all('li')
for i, item in enumerate(items, 1):
    print(f"Item {i}: {item.get_text()}")
import requests
from gobeautifulsoup import BeautifulSoup
# Scrape a webpage
url = "https://httpbin.org/html"
response = requests.get(url, timeout=10)
response.raise_for_status()  # Fail early on HTTP errors
soup = BeautifulSoup(response.content, 'html.parser')
# Extract all links
links = soup.find_all('a')
for link in links:
    href = link.get('href')
    text = link.get_text().strip()
    if href:
        print(f"Link: {text} -> {href}")
# Extract all headings
for heading in soup.find_all(['h1', 'h2', 'h3']):
    print(f"{heading.name}: {heading.get_text()}")
from gobeautifulsoup import BeautifulSoup
html = """
<div class="content">
<article id="post-1" class="post featured">
<h2>Featured Post</h2>
<p class="excerpt">This is a featured post excerpt.</p>
</article>
<article id="post-2" class="post">
<h2>Regular Post</h2>
<p class="excerpt">This is a regular post excerpt.</p>
</article>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
# CSS selectors work exactly like BeautifulSoup4
featured_posts = soup.select('.post.featured')
print(f"Featured posts: {len(featured_posts)}")
# Complex selectors
excerpts = soup.select('article p.excerpt')
for excerpt in excerpts:
    print(f"Excerpt: {excerpt.get_text()}")
# ID selectors
specific_post = soup.select_one('#post-1 h2')
print(f"Specific post title: {specific_post.get_text()}")
from gobeautifulsoup import BeautifulSoup
# The XML declaration must be the first thing in the document, so it starts on the opening line
xml_data = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book id="1">
<title>Python Programming</title>
<author>John Doe</author>
<price currency="USD">29.99</price>
</book>
<book id="2">
<title>Web Development</title>
<author>Jane Smith</author>
<price currency="USD">34.99</price>
</book>
</catalog>
"""
soup = BeautifulSoup(xml_data, 'xml')
# Process XML data
books = soup.find_all('book')
for book in books:
    book_id = book.get('id')
    title = book.find('title').get_text()
    author = book.find('author').get_text()
    price = book.find('price')
    print(f"Book {book_id}: {title} by {author}")
    print(f"Price: {price.get('currency')} {price.get_text()}")
    print("-" * 40)
from gobeautifulsoup import BeautifulSoup
import re
html = """
<table class="data-table">
<thead>
<tr>
<th>Product</th>
<th>Price</th>
<th>Stock</th>
</tr>
</thead>
<tbody>
<tr data-product-id="123">
<td class="product-name">Laptop</td>
<td class="price">$999.99</td>
<td class="stock in-stock">Available</td>
</tr>
<tr data-product-id="124">
<td class="product-name">Mouse</td>
<td class="price">$29.99</td>
<td class="stock out-of-stock">Out of Stock</td>
</tr>
</tbody>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')
# Extract structured data
products = []
rows = soup.select('tbody tr')
for row in rows:
    product_id = row.get('data-product-id')
    name = row.select_one('.product-name').get_text()
    price_text = row.select_one('.price').get_text()
    stock_cell = row.select_one('.stock')
    # Extract price using regex
    price_match = re.search(r'\$(\d+\.?\d*)', price_text)
    price = float(price_match.group(1)) if price_match else 0.0
    # Determine stock status
    in_stock = 'in-stock' in stock_cell.get('class', [])
    products.append({
        'id': product_id,
        'name': name,
        'price': price,
        'in_stock': in_stock
    })
# Display extracted data
for product in products:
    status = "✅ Available" if product['in_stock'] else "❌ Out of Stock"
    print(f"{product['name']} (ID: {product['id']})")
    print(f"Price: ${product['price']:.2f} | Status: {status}")
    print("-" * 50)
GoBeautifulSoup is designed as a drop-in replacement. Simply change your import:
# Before
from bs4 import BeautifulSoup
# After
from gobeautifulsoup import BeautifulSoup
# Everything else stays exactly the same!
✅ Full BeautifulSoup4 API Compatibility
- find() and find_all() methods
- CSS selector support with select()
- Tree navigation (parent, children, siblings)
- Attribute access and modification
- Text extraction and manipulation
✅ Parser Support
- HTML parser (html.parser)
- XML parser (xml)
- Automatic encoding detection
✅ Advanced Features
- Regular expression search
- Custom attribute filters
- Tree modification methods
- Pretty printing
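Several of the features listed above (regular expression search, attribute filters, tree navigation) can be tried with a short script. The sketch below uses a guarded import so it runs with GoBeautifulSoup when installed, falls back to BeautifulSoup4, and skips the demo if neither is available. The calls are standard BeautifulSoup4 usage; that each behaves identically in GoBeautifulSoup rests on the compatibility claim above and has not been verified here. `demo`, `HTML`, and `BACKEND` are names made up for this example.

```python
import re

# Prefer GoBeautifulSoup, fall back to BeautifulSoup4, skip if neither is installed.
try:
    from gobeautifulsoup import BeautifulSoup
    BACKEND = "gobeautifulsoup"
except ImportError:
    try:
        from bs4 import BeautifulSoup
        BACKEND = "bs4"
    except ImportError:
        BeautifulSoup = None
        BACKEND = None

HTML = """
<div id="main">
  <h1>Title</h1>
  <h2 data-level="2">Subtitle</h2>
  <p class="note">First</p>
  <p>Second</p>
</div>
"""


def demo(soup_class):
    """Exercise regex search, an attribute filter, and tree navigation."""
    soup = soup_class(HTML, "html.parser")
    # Regular expression search: tag names matching h1..h6.
    headings = [tag.name for tag in soup.find_all(re.compile(r"^h[1-6]$"))]
    # Attribute filter: any tag that carries a data-level attribute.
    flagged = [tag.name for tag in soup.find_all(attrs={"data-level": True})]
    # Tree navigation: parent of the first paragraph and its next <p> sibling.
    first_p = soup.find("p", class_="note")
    parent_id = first_p.parent.get("id")
    sibling_text = first_p.find_next_sibling("p").get_text()
    return headings, flagged, parent_id, sibling_text


if BeautifulSoup is not None:
    print(BACKEND, demo(BeautifulSoup))
else:
    print("No BeautifulSoup-compatible package is installed")
```

The same guarded import is also a low-risk way to adopt the library gradually: code keeps working on machines where only BeautifulSoup4 is installed.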
GoBeautifulSoup consists of two main components:
- Go Core: High-performance HTML/XML parsing engine written in Go
- Python Wrapper: Provides BeautifulSoup4-compatible API
The Go core handles all the heavy lifting (parsing, querying, tree traversal), while the Python wrapper ensures 100% API compatibility.
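The split between a core engine and a thin API layer can be illustrated with a toy sketch. Everything in it is hypothetical: `StdlibCore` is a stand-in built on Python's standard `html.parser` so the example runs without the Go library, and `MiniSoup` is not GoBeautifulSoup's actual wrapper. This README does not describe how the real wrapper communicates with the Go core, so the sketch only shows the delegation pattern.

```python
from html.parser import HTMLParser


class StdlibCore(HTMLParser):
    """Stand-in for the native core: parses markup and records start tags."""

    def __init__(self, markup):
        super().__init__()
        self.tags = []  # (name, attrs dict) pairs in document order
        self.feed(markup)
        self.close()

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

    def query(self, name):
        """Return the attributes of every start tag called `name`."""
        return [attrs for tag, attrs in self.tags if tag == name]


class MiniSoup:
    """Thin wrapper: exposes familiar method names and delegates to the core."""

    def __init__(self, markup, core_factory=StdlibCore):
        self._core = core_factory(markup)

    def find_all(self, name):
        return self._core.query(name)

    def find(self, name):
        matches = self._core.query(name)
        return matches[0] if matches else None


soup = MiniSoup('<a href="/one">1</a><a href="/two">2</a>')
print(soup.find("a"))
print(soup.find_all("a"))
```

Because the wrapper only forwards calls, the core can be swapped (here through the `core_factory` argument) without changing the code that uses `find()` and `find_all()`.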
- Reuse Parser: For multiple documents, reuse the BeautifulSoup instance when possible
- Use Specific Selectors: More specific CSS selectors perform better than broad searches
- Limit Search Scope: Use find() instead of find_all() when you only need one result
- Choose Right Parser: Use 'html.parser' for HTML and 'xml' for XML documents
- API Reference: docs/api.md
- Migration Guide: docs/migration.md
- Performance Guide: docs/performance.md
- Examples: examples/
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Found a bug? Please create an issue on GitHub Issues with:
- Python version
- Operating system
- Minimal code example
- Expected vs actual behavior
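The version details requested above can be collected with a few standard-library calls. This is a minimal sketch; `environment_report` is a hypothetical helper name, not something the project ships.

```python
import platform
from importlib import metadata


def environment_report(package="gobeautifulsoup"):
    """Collect the version details requested in a bug report."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        package: package_version,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting this output into the issue, together with the minimal code example, covers the first two items of the list.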
This project is licensed under the MIT License - see the LICENSE file for details.
- Inspired by the excellent BeautifulSoup library by Leonard Richardson
- Built with Go for maximum performance
- Thanks to all contributors and users
- GitHub: https://github.com/coffeecms/gobeautifulsoup
- PyPI: https://pypi.org/project/gobeautifulsoup/
- Documentation: https://gobeautifulsoup.readthedocs.io/
- Benchmarks: benchmarks/
Ready to supercharge your HTML parsing? Install GoBeautifulSoup today and experience the performance difference!
pip install gobeautifulsoup