feature: addition of a Cloudflare bypass for the anti-bot #577
Open
yeshua-aguilar wants to merge 9 commits into tgbot-collection:master from yeshua-aguilar:master
9 commits
e3a2bae feat: Adding a bypass to avoid Cloudflare bot blocking for websites (yeshua-aguilar)
d188cf6 feat: creation of the Cloudflare bypass test (yeshua-aguilar)
15ed418 Merge pull request #2 from yeshua-aguilar/develop (yeshua-aguilar)
8587f3b feat: Add Cloudflare bypass support for direct downloads (yeshua-aguilar)
9dbb367 feat: add test for Cloudflare bypass functionality (yeshua-aguilar)
ec1ee88 Merge branch 'master' into develop (yeshua-aguilar)
ecb5f33 refactor: switch to ai-cloudscraper, scope bypass to direct.py only (yeshua-aguilar)
96ecef6 Merge branch 'develop' of https://github.com/yeshua-aguilar/ytdlbot i… (yeshua-aguilar)
bb8d28b Merge pull request #4 from yeshua-aguilar/develop (yeshua-aguilar)
@@ -0,0 +1,85 @@

```python
#!/usr/bin/env python3
# coding: utf-8

# ytdlbot - test_cloudflare_bypass.py
# Test script for Cloudflare bypass functionality using ai-cloudscraper

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent))

from utils.http_client import BypassHTTPClient, get_realistic_headers


def test_cloudflare_bypass():
    print("=" * 50)
    print("Testing Cloudflare Bypass (ai-cloudscraper)")
    print("=" * 50)

    test_sites = [
        ("https://nowsecure.nl/", "NowSecure (Cloudflare protected)"),
        ("https://www.google.com/", "Google (no protection)"),
    ]

    client = BypassHTTPClient(bypass_enabled=True)

    for url, description in test_sites:
        print(f"\nTesting: {description}")
        print(f"URL: {url}")

        try:
            resp = client.get(url, timeout=10)
            print(f"Status: {resp.status_code}")
            print(f"Content length: {len(resp.text)} chars")
            print("Result: OK")
        except Exception as e:
            print(f"Error: {e}")
            print("Result: FAILED")

    client.close()
    print("\n" + "=" * 50)
    print("Test completed!")
    print("=" * 50)


def test_headers():
    print("\n" + "=" * 50)
    print("Testing Realistic Headers")
    print("=" * 50)

    headers = get_realistic_headers()

    for key, value in headers.items():
        print(f"{key}: {value}")

    print("\nResult: OK")


def test_direct_download_usage():
    print("\n" + "=" * 50)
    print("Testing DirectDownload Usage Pattern")
    print("=" * 50)

    from utils.http_client import get_http_client

    client = get_http_client(bypass_enabled=True)

    print("\nTesting single instance pattern:")
    print(f"Client type: {type(client).__name__}")

    try:
        resp = client.get("https://httpbin.org/headers", timeout=5)
        print(f"Status: {resp.status_code}")
        print("Result: OK")
    except Exception as e:
        print(f"Error: {e}")
        print("Result: FAILED")

    print("\n" + "=" * 50)


if __name__ == "__main__":
    test_cloudflare_bypass()
    test_headers()
    test_direct_download_usage()
```
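
The test script reports results with `print` and catches every exception, so a test runner would count it as passing even when a request fails, and it needs live network access. A minimal offline sketch of an assertion-based variant follows. `FakeResponse`, `FakeTransport`, and `fetch_with_fallback` are hypothetical stand-ins written for this illustration; they are not part of the PR, and `fetch_with_fallback` only mirrors the bypass-then-fallback flow of `_bypass_get` as it appears in the diff.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical response object exposing only the fields the flow reads."""
    status_code: int
    text: str = ""


class FakeTransport:
    """Hypothetical stand-in for a scraper or session; records requested URLs."""

    def __init__(self, response=None, error=None):
        self.response = response
        self.error = error
        self.calls = []

    def get(self, url, **kwargs):
        self.calls.append(url)
        if self.error is not None:
            raise self.error
        return self.response


def fetch_with_fallback(scraper, session, url, **kwargs):
    """Try the scraper first; use the plain session on an error or a Cloudflare 403."""
    try:
        response = scraper.get(url, **kwargs)
    except Exception:
        return session.get(url, **kwargs)
    if response.status_code == 403 and "cloudflare" in response.text.lower():
        return session.get(url, **kwargs)
    return response


def test_blocked_response_falls_back():
    scraper = FakeTransport(FakeResponse(403, "Attention Required! | Cloudflare"))
    session = FakeTransport(FakeResponse(200, "ok"))
    resp = fetch_with_fallback(scraper, session, "https://example.invalid/")
    assert resp.status_code == 200
    assert session.calls == ["https://example.invalid/"]


def test_scraper_error_falls_back():
    scraper = FakeTransport(error=ConnectionError("boom"))
    session = FakeTransport(FakeResponse(200, "ok"))
    resp = fetch_with_fallback(scraper, session, "https://example.invalid/")
    assert resp.text == "ok"


def test_success_skips_fallback():
    scraper = FakeTransport(FakeResponse(200, "page"))
    session = FakeTransport(FakeResponse(500, "unused"))
    resp = fetch_with_fallback(scraper, session, "https://example.invalid/")
    assert resp.text == "page"
    assert session.calls == []
```

Injecting the two transports keeps the tests deterministic; the same idea could be applied to the real client by patching its scraper and session, though that wiring is not shown here.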
@@ -0,0 +1,153 @@

```python
#!/usr/bin/env python3
# coding: utf-8

# ytdlbot - http_client.py
# HTTP client with Cloudflare bypass support using ai-cloudscraper

__author__ = "yeshua-aguilar"

import logging
from typing import Optional

import cloudscraper
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class BypassHTTPClient:
    """HTTP client that automatically bypasses Cloudflare protection."""

    def __init__(self, bypass_enabled: bool = True, timeout: int = 30):
        self._bypass_enabled = bypass_enabled
        self._timeout = timeout
        self._session: Optional[requests.Session] = None
        self._scraper: Optional[cloudscraper.CloudScraper] = None

        self._user_agent = (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
        )

        self._headers = {
            "User-Agent": self._user_agent,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.5",
            "Accept-Encoding": "gzip, deflate, br",
            "DNT": "1",
            "Connection": "keep-alive",
            "Upgrade-Insecure-Requests": "1",
        }

    def _create_scraper(self) -> cloudscraper.CloudScraper:
        scraper = cloudscraper.create_scraper(
            browser={
                "browser": "chrome",
                "platform": "windows",
                "desktop": True,
            },
            delay=10,
        )
        scraper.headers.update(self._headers)
        return scraper

    def _create_session(self) -> requests.Session:
        session = requests.Session()
        session.headers.update(self._headers)

        retry_strategy = Retry(
            total=3,
            backoff_factor=1,
            status_forcelist=[429, 500, 502, 503, 504],
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        session.mount("http://", adapter)
        session.mount("https://", adapter)

        return session

    def get(self, url: str, **kwargs) -> requests.Response:
        """Make a GET request with Cloudflare bypass if needed."""
        kwargs.setdefault("timeout", self._timeout)

        if self._bypass_enabled:
            return self._bypass_get(url, **kwargs)
        return self._normal_get(url, **kwargs)

    def _bypass_get(self, url: str, **kwargs) -> requests.Response:
        """Try Cloudflare bypass first, fallback to normal request."""
        try:
            if self._scraper is None:
                self._scraper = self._create_scraper()

            logging.debug("Attempting Cloudflare bypass for %s", url)
            response = self._scraper.get(url, **kwargs)

            if response.status_code == 403 and "cloudflare" in response.text.lower():
                logging.warning("Cloudflare bypass failed, trying normal request")
                return self._normal_get(url, **kwargs)

            return response
        except Exception as e:
            logging.warning("Cloudflare bypass error: %s, falling back to normal request", e)
            return self._normal_get(url, **kwargs)

    def _normal_get(self, url: str, **kwargs) -> requests.Response:
        """Make a normal GET request without bypass."""
        if self._session is None:
            self._session = self._create_session()

        return self._session.get(url, **kwargs)

    def close(self):
        """Close all sessions."""
        if self._session:
            self._session.close()
            self._session = None
        if self._scraper:
            self._scraper.close()
            self._scraper = None


_client_instance: Optional[BypassHTTPClient] = None


def get_http_client(bypass_enabled: bool = True) -> BypassHTTPClient:
    """Get or create a shared HTTP client instance."""
    global _client_instance
    if _client_instance is None:
        _client_instance = BypassHTTPClient(bypass_enabled=bypass_enabled)
    return _client_instance


def get_cloudflare_bypass_session() -> cloudscraper.CloudScraper:
    """Get a CloudScraper session for yt-dlp or other libraries."""
    scraper = cloudscraper.create_scraper(
        browser={
            "browser": "chrome",
            "platform": "windows",
            "desktop": True,
        },
        delay=10,
    )
    return scraper


def get_realistic_headers() -> dict:
    """Get realistic browser headers for manual use."""
    return {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate, br",
        "DNT": "1",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
    }
```
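
One detail of `get_http_client` worth noting for review: the `bypass_enabled` argument only takes effect on the first call, because every later call returns the cached instance unchanged. A minimal sketch of one possible alternative, which caches one client per flag value, is shown here. `Client`, `_clients`, and `get_client` are hypothetical names used for illustration; `Client` stands in for `BypassHTTPClient` and is not the PR's class.

```python
from typing import Dict


class Client:
    """Hypothetical stand-in for BypassHTTPClient; it only stores the flag."""

    def __init__(self, bypass_enabled: bool = True):
        self.bypass_enabled = bypass_enabled


# One cached client per bypass setting, instead of a single module-level instance.
_clients: Dict[bool, Client] = {}


def get_client(bypass_enabled: bool = True) -> Client:
    """Return the shared client for this flag value, creating it on first use."""
    if bypass_enabled not in _clients:
        _clients[bypass_enabled] = Client(bypass_enabled=bypass_enabled)
    return _clients[bypass_enabled]


first = get_client(bypass_enabled=True)
second = get_client(bypass_enabled=False)
print(first.bypass_enabled, second.bypass_enabled)  # True False
print(get_client(True) is first)  # True
```

Whether two cached clients are preferable to a single one depends on how callers use the flag; if only `direct.py` requests a client, the single-instance version in the diff may be sufficient.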
What about the aria2 part?
Do you have any suggestions?