🫧Python toolkit / library to help retrieve and collect image data from anime image fan art websites


luminolous/moescraper


MoeScraper

Anime picture scraper toolkit. Scrape & download images by tags from multiple sources, with rate limiting, retries, progress bar, and metadata export. You can use it to collect data for model / adapter (LoRA / LoRA+ / etc.) training.


Features

  • Multi-source scraping via simple adapter interface
  • Tag-based search scraping to a target count (client.scrape_images(...))
    • Progress bar
    • Resume support
  • Concurrent downloading (thread pool)
  • Metadata export:
    • JSONL during scraping
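As a rough illustration of the "concurrent downloading (thread pool)" feature, the sketch below runs a fetch function over a list of URLs with Python's standard `ThreadPoolExecutor`. It is not MoeScraper's internal code: `download_all` and `fake_fetch` are hypothetical names, and the stand-in fetch function does no network I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, max_workers=2):
    """Call fetch(url) for every URL on a thread pool.

    Results come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Hypothetical stand-in for a real download; returns the URL length.
    return len(url)


print(download_all(["a", "bb", "ccc"], fake_fetch))  # [1, 2, 3]
```

A small `max_workers` value keeps the number of simultaneous requests low, which fits the polite rate limits the disclaimer asks for.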

How to use it

First, install the package by running this command in your terminal:

pip install "moescraper @ git+https://github.com/luminolous/moescraper.git"

To scrape from Gelbooru, you need the API key and user ID of your Gelbooru account. Set them as environment variables with the code below:

import os
os.environ["GELBOORU_API_KEY"] = "Your_API_Key"
os.environ["GELBOORU_USER_ID"] = "Your_User_ID"
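Before starting a Gelbooru scrape, it can help to confirm that both variables are actually set. The helper below is a minimal sketch for that check. `missing_gelbooru_credentials` is a hypothetical name and not part of MoeScraper; only the two variable names come from the snippet above.

```python
import os

# Variable names taken from the snippet above.
REQUIRED_VARS = ("GELBOORU_API_KEY", "GELBOORU_USER_ID")


def missing_gelbooru_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_gelbooru_credentials({"GELBOORU_API_KEY": "abc"}))
# ['GELBOORU_USER_ID']
```

Calling `missing_gelbooru_credentials()` with no argument checks the real process environment.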

Then copy the code below and adjust the parameters as needed.

The valid website keys are listed in the Supported Sources section.

from moescraper import MoeScraperClient

client = MoeScraperClient()

client.scrape_images(
    source="danbooru",                                # Website key
    tags=["elaina_(majo_no_tabitabi)", "solo"],     # tags to search; tag names vary by website
    n_images=1000,                                    # target number of images
    nsfw_mode="safe",                                 # "safe" | "all" | "nsfw"
    out_dir="moescraper_result/images",
    meta_jsonl="moescraper_result/metadata.jsonl",
    index_db="moescraper_result/index.sqlite",
    state_path="moescraper_result/scrape_state.json",
    limit=200,                                        # per-page fetch size
    max_workers=2,                                    # download concurrency
    allowed_exts=["jpg", "png"],                      # filter file type
    freeze_apng=True,                                 # freeze animation on APNG file
)

client.close()
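The four output paths in the example share one base folder, so they can be derived from a single directory name. The sketch below shows one way to do that. `result_paths` is a hypothetical helper, not part of MoeScraper; the keyword names and file names are copied from the example above.

```python
from pathlib import Path


def result_paths(base_dir):
    """Build the four output-path arguments from one base directory."""
    base = Path(base_dir)
    return {
        "out_dir": (base / "images").as_posix(),
        "meta_jsonl": (base / "metadata.jsonl").as_posix(),
        "index_db": (base / "index.sqlite").as_posix(),
        "state_path": (base / "scrape_state.json").as_posix(),
    }


paths = result_paths("moescraper_result")
print(paths["meta_jsonl"])  # moescraper_result/metadata.jsonl

# Usage with the client from the example above (needs moescraper installed).
# try/finally makes sure close() runs even if scraping raises:
#
# client = MoeScraperClient()
# try:
#     client.scrape_images(
#         source="danbooru", tags=["solo"], n_images=1000, nsfw_mode="safe",
#         limit=200, max_workers=2, allowed_exts=["jpg", "png"],
#         freeze_apng=True, **paths,
#     )
# finally:
#     client.close()
```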

The output folder will look like this:

moescraper_result/
  images/                # downloaded images folder
  metadata.jsonl         # JSONL lines for downloaded posts
  index.sqlite           # SQLite index for deduplication and tracking exported posts
  scrape_state.json      # resume cursor (page pointer)
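Since `metadata.jsonl` is a JSONL file, it can be read back one JSON object per line with the standard library. The sketch below assumes nothing about which fields MoeScraper writes; it only parses each line into a dictionary. `read_jsonl` is a hypothetical helper name.

```python
import json
import os


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Path taken from the example above; skipped if no scrape has run yet.
    path = "moescraper_result/metadata.jsonl"
    if os.path.exists(path):
        records = list(read_jsonl(path))
        print(len(records), "records")
```

Reading lazily with a generator avoids loading a large metadata file into memory at once.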

Example Result

This is an example of the scraping results folder.

[Image: example result]


NSFW Modes

  • safe → keep safe / non-explicit only
  • all → keep everything
  • nsfw → keep NSFW only

Rating quality depends on each source; treat it as best-effort.
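The three modes amount to a simple keep/drop decision per post. The sketch below expresses that decision under the assumption that each post has already been reduced to a single boolean NSFW flag. How MoeScraper derives that flag from each source's ratings is not described here, and `keep_post` is a hypothetical function name.

```python
def keep_post(is_nsfw, nsfw_mode):
    """Decide whether a post passes the given NSFW mode.

    is_nsfw is an assumed, already-normalized flag for the post.
    """
    if nsfw_mode == "all":
        return True
    if nsfw_mode == "safe":
        return not is_nsfw
    if nsfw_mode == "nsfw":
        return is_nsfw
    raise ValueError(f"unknown nsfw_mode: {nsfw_mode!r}")


print(keep_post(is_nsfw=True, nsfw_mode="safe"))   # False
print(keep_post(is_nsfw=False, nsfw_mode="safe"))  # True
```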


Disclaimer

This project is for educational/research purposes. Please respect each website’s Terms of Service and the content licenses. Use polite rate limits.


Supported Sources

| Source | Website Key | Link | Note |
| --- | --- | --- | --- |
| Danbooru | danbooru | https://danbooru.donmai.us | ⚠️ NSFW warning |
| Safebooru | safebooru | https://safebooru.org/ | - |
| Zerochan | zerochan | https://www.zerochan.net/ | - |
| Aibooru | aibooru | https://aibooru.online/ | ⚠️ NSFW warning |
| Safe Aibooru | safe_aibooru | https://safe.aibooru.online/ | - |
| Konachan | konachan | https://konachan.com/ | ⚠️ NSFW warning |
| Yandere | yandere | https://yande.re/ | ⚠️ NSFW warning |
| Gelbooru | gelbooru | https://gelbooru.com/ | ⚠️ NSFW warning |
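To catch a mistyped website key before starting a long scrape, the keys listed above can be copied into a small lookup. This is a sketch built from the list above, not the library's own source registry. `SUPPORTED_SOURCES` and `source_url` are hypothetical names.

```python
# Website keys and links copied from the Supported Sources list above.
SUPPORTED_SOURCES = {
    "danbooru": "https://danbooru.donmai.us",
    "safebooru": "https://safebooru.org/",
    "zerochan": "https://www.zerochan.net/",
    "aibooru": "https://aibooru.online/",
    "safe_aibooru": "https://safe.aibooru.online/",
    "konachan": "https://konachan.com/",
    "yandere": "https://yande.re/",
    "gelbooru": "https://gelbooru.com/",
}


def source_url(key):
    """Return the site URL for a website key, or raise ValueError."""
    try:
        return SUPPORTED_SOURCES[key]
    except KeyError:
        valid = ", ".join(sorted(SUPPORTED_SOURCES))
        raise ValueError(f"unknown source {key!r}; expected one of: {valid}") from None


print(source_url("yandere"))  # https://yande.re/
```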
