feature: addition of a Cloudflare bypass for the anti-bot #577

Open
yeshua-aguilar wants to merge 9 commits into tgbot-collection:master from yeshua-aguilar:master

@yeshua-aguilar

Summary

Adds automatic Cloudflare bypass support to avoid anti-bot detection when downloading videos from protected websites.

Problem

Many websites use Cloudflare protection, which blocks bot requests even for legitimate video downloads. This causes failures when users try to download content from these sites.

Solution
Implemented a new HTTP client module, built on the cloudscraper library, that:

  • Automatically bypasses Cloudflare challenges
  • Uses realistic browser headers (Chrome/Windows)
  • Falls back to normal requests if bypass fails
  • Configurable via BYPASS_CLOUDFLARE env variable
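
The behaviour listed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/utils/http_client.py: the names `fetch`, `make_bypass_session`, and `make_plain_session` are hypothetical, and the session factories are injectable so the fallback path can be exercised without network access. The only library calls assumed are `cloudscraper.create_scraper` (with its `browser` dict) and `requests.Session`.

```python
def make_bypass_session():
    """Create a cloudscraper session that presents itself as Chrome on Windows."""
    import cloudscraper  # imported lazily so this module loads without it

    return cloudscraper.create_scraper(
        browser={"browser": "chrome", "platform": "windows", "mobile": False}
    )


def make_plain_session():
    """Create an ordinary requests session, used as the fallback."""
    import requests

    return requests.Session()


def fetch(url, bypass=True, bypass_factory=make_bypass_session,
          plain_factory=make_plain_session, **kwargs):
    """GET url through the bypass session, falling back to a plain session.

    `bypass` stands in for the BYPASS_CLOUDFLARE setting (hypothetical wiring).
    """
    if bypass:
        try:
            return bypass_factory().get(url, **kwargs)
        except Exception:
            # Any failure on the bypass path (package missing, challenge
            # not solved, network error) falls through to plain requests.
            pass
    return plain_factory().get(url, **kwargs)
```

With both packages installed, `fetch("https://example.com/file.mp4", timeout=30)` would try cloudscraper first. Catching every exception keeps the sketch short; real code would probably log the failure before falling back.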

Files Changed

| File | Change |
| --- | --- |
| pyproject.toml | Added cloudscraper>=1.2.71 |
| requirements.txt | Added cloudscraper>=1.2.71 |
| src/utils/http_client.py | NEW - HTTP client with bypass |
| src/config/config.py | Added BYPASS_CLOUDFLARE setting |
| src/engine/direct.py | Uses new HTTP client |
| src/engine/generic.py | Added realistic headers for yt-dlp |
| src/engine/instagram.py | Uses new HTTP client |
| .env.example | Documented new setting |

Testing

Testing: NowSecure (Cloudflare protected)
URL: https://nowsecure.nl/
Status: 200
Content length: 58817 chars
Result: OK

Testing: Google (no protection)
URL: https://www.google.com/
Status: 200
Content length: 46302 chars
Result: OK
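
A report in the format above could be produced by a small helper along these lines. It is only a sketch: `smoke_check` is a hypothetical name, the `get` argument is assumed to be any callable returning an object with `status_code` and `text` (for example the `get` method of a requests or cloudscraper session), and the `FAILED` label for non-200 responses is an assumption, since the output above only shows successful runs.

```python
def smoke_check(name, url, get):
    """Fetch url with the supplied get callable and format a short report."""
    response = get(url)
    ok = response.status_code == 200
    lines = [
        f"Testing: {name}",
        f"URL: {url}",
        f"Status: {response.status_code}",
        f"Content length: {len(response.text)} chars",
        f"Result: {'OK' if ok else 'FAILED'}",  # 'FAILED' is an assumed label
    ]
    return "\n".join(lines)
```

Note that a 200 status alone does not prove the challenge was passed; checking the body for expected page content would make the check stronger.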

Configuration

.env
BYPASS_CLOUDFLARE=True # Default: True
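
How src/config/config.py reads this value is not shown here, so the following is only a sketch of one way to parse it. The helper name `env_bool` and the accepted spellings are assumptions. The point it illustrates is that environment variables are strings, so a bare `bool(os.getenv(...))` would treat `"False"` as true.

```python
import os

_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off", ""}


def env_bool(name, default, environ=None):
    """Parse a boolean environment variable, returning default when unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"{name} must be a boolean, got {raw!r}")
```

A config module could then define `BYPASS_CLOUDFLARE = env_bool("BYPASS_CLOUDFLARE", default=True)`, matching the documented default.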

Dependencies

  • cloudscraper>=1.2.71 — Automatic Cloudflare bypass

Collaborator

We use a personal API for this, so there is no need for additional workarounds here.

Collaborator

@SanujaNS SanujaNS Feb 22, 2026

yt-dlp handles this part on its own, especially for YT. Why do we need to manually inject headers here?
I need your opinion.

Collaborator

What about the aria2 part?
Do you have any suggestions?

@SanujaNS
Collaborator

Hi @yeshua-aguilar ,

Thank you for your interest in the project and for submitting this PR!

I left some comments on the code, but I had a few general questions:

  • Version: Why go with the older cloudscraper? The same dev seems to have a newer version here.

  • Alternatives: Any thoughts on using ai-cloudscraper instead?

  • Context: Was there a specific issue you ran into that prompted adding this to ytdlbot?

  • Implementation: While yt-dlp likely doesn’t need this, our direct download part could definitely benefit. Why not implement this fully for that section, or do you have other suggestions?

Appreciate the help!

@yeshua-aguilar
Author

Hi @SanujaNS , thanks for the feedback! Let me address each point:

Version
You're right! I wasn't aware of ai-cloudscraper when I created this PR. The original cloudscraper (1.2.71) hasn't been updated since April 2023, while ai-cloudscraper is actively maintained. I'll switch to ai-cloudscraper for this PR.

Alternatives
ai-cloudscraper looks like the best option. It's free (MIT license), open source, and a drop-in replacement: the same import syntax works. No concerns from my side.

Context
I encountered Cloudflare 403 errors when trying to download videos from several sites. The bot was blocked even for legitimate download requests. This happens especially with sites that have aggressive anti-bot settings enabled.
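
For context, a block of this kind can often be recognised from the response itself. The sketch below is a heuristic only and is not part of the PR: `looks_like_cloudflare_block` is a hypothetical helper, and it assumes that Cloudflare-served responses carry a `Server: cloudflare` or `CF-RAY` header and that challenge pages carry `cf-mitigated: challenge`. An ordinary 403 from an origin that merely sits behind Cloudflare would also match, so false positives are possible.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristically decide whether a response was blocked by Cloudflare."""
    # Header names are case-insensitive, so normalise before comparing.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    if lowered.get("cf-mitigated") == "challenge":
        return True
    served_by_cloudflare = (
        "cloudflare" in lowered.get("server", "") or "cf-ray" in lowered
    )
    return served_by_cloudflare and status_code in (403, 503)
```

A check like this could be used to attempt the bypass only after a plain request has actually been blocked, rather than on every download.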

Implementation

Great point. You're right that:

  • yt-dlp already handles Cloudflare well with curl-cffi - no bypass needed in generic.py
  • direct download (direct.py) uses requests directly and would benefit from this

I'll update the PR to:

  1. Switch to ai-cloudscraper
  2. Remove bypass from generic.py (yt-dlp doesn't need it)
  3. Keep it only in direct.py
  4. Remove from instagram.py unless you think it's needed there

I'll make these changes soon. Thanks for the guidance!

@SanujaNS
Collaborator

Hi @yeshua-aguilar ,

Sorry about the delayed response and thank you for the update.

I will go through the changes and get back to you.

@SanujaNS SanujaNS requested a review from BennyThink March 2, 2026 14:10