A simple command utility to extract information from the YouTube API v3 for scientific purposes.
This Python-based command line utility enables the easy extraction of information from the YouTube API (Version 3). Currently, it supports only a small subset of the API and focuses on extracting related videos from given starting points.
First, make sure you have a recent version of Python 3 installed.
Next, install yt-scraper by using pip:
sudo pip install yt-scraper
Update by adding the --upgrade flag:
sudo pip install --upgrade yt-scraper
Windows users may need to alter the command to:
py -m pip install --upgrade yt-scraper
Mac users may need to alter the command to:
python3 -m pip install --upgrade yt-scraper
In order to use this program, you will need an official YouTube API key.
You can obtain one from this page and
use it with the following examples by appending -k <KEY> to them.
Currently, yt-scraper has two commands: search and config
The first command is used to query the YouTube API, whereas the second is used to configure the default configuration.
The search command starts a video search from one or multiple given starting points. These could be multiple videos originating from a search (using term), a provided list of video IDs (using file), or just one root video (using id or url).
For example, the following command will return the first video when one searches for cat.
$ yt-scraper search term 'cat'
[STATUS] Result:
Depth: 0, Rank: 0, ID: hY7m5jjJ9mM
Title: CATS will make you LAUGH YOUR HEAD OFF - Funny CAT compilation
Related Videos: []
One can also provide a video id or a video url as a starting point,
which is more interesting when combined with the --max-depth option:
$ yt-scraper search id '0A2R27kCeD4' --max-depth 2
Depth: 0, Rank: 0, ID: 0A2R27kCeD4
Title: 🤣 Funniest 🐶 Dogs and 😻 Cats - Awesome Funny Home Animal Videos 😇
Related Videos: ['pc8-8KfIW5c']
Depth: 1, Rank: 0, ID: pc8-8KfIW5c
Title: 🦁 Funniest Animals 🐼 - Try Not To Laugh 🤣 - Funny Domestic And Wild Animals' Life
Related Videos: ['OrJMUNEyZsE']
Depth: 2, Rank: 0, ID: OrJMUNEyZsE
Title: Funniest Videos for Pets to Watch Compilation | Funny Pet Videos
Related Videos: []
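Since a root video can be given either by ID or by URL, it may help to see how the two relate. The following is only a minimal sketch using the Python standard library, not yt-scraper's actual code; the function name `video_id_from_url` is made up for illustration, and it only handles watch URLs that carry the ID in the `v` query parameter.

```python
from urllib.parse import urlparse, parse_qs


def video_id_from_url(url: str) -> str:
    """Hypothetical helper: extract the video ID from a watch URL."""
    parsed = urlparse(url)
    # Watch URLs carry the ID in the `v` query parameter.
    ids = parse_qs(parsed.query).get("v")
    if not ids:
        raise ValueError(f"no video ID found in {url!r}")
    return ids[0]


print(video_id_from_url("https://www.youtube.com/watch?v=0A2R27kCeD4"))
# prints: 0A2R27kCeD4
```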
Additionally, one can specify the number of videos
that should be returned on each level by utilizing the --number option.
For instance, the following command returns two related videos
from a given video (specified by its URL) and
additionally one related video from each sibling:
$ yt-scraper search url 'https://www.youtube.com/watch?v=0A2R27kCeD4' --max-depth 1 --number 2 --number 1
[STATUS] Result:
Depth: 0, Rank: 0, ID: 0A2R27kCeD4
Title: 🤣 Funniest 🐶 Dogs and 😻 Cats - Awesome Funny Home Animal Videos 😇
Related Videos: ['pc8-8KfIW5c', 'tbyAuT50eu4']
Depth: 1, Rank: 0, ID: pc8-8KfIW5c
Title: 🦁 Funniest Animals 🐼 - Try Not To Laugh 🤣 - Funny Domestic And Wild Animals' Life
Related Videos: []
Depth: 1, Rank: 1, ID: tbyAuT50eu4
Title: 😁 Funniest 😻 Cats and 🐶 Dogs - Awesome Funny Pet Animals 😇
Related Videos: []
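To make the interplay of the depth and number options more tangible, here is a rough sketch of a breadth-first walk that would produce the same shape of output. It is not the program's real implementation: `fetch_related` and the `RELATED` table are hypothetical stand-ins for the API call (filled with IDs from the examples in this README), the rank is assumed to be the position among siblings, and reusing the last number for deeper levels is an assumption as well.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for the API: maps a video ID to its related IDs.
# The IDs are taken from the example output in this README.
RELATED: Dict[str, List[str]] = {
    "0A2R27kCeD4": ["pc8-8KfIW5c", "tbyAuT50eu4"],
    "pc8-8KfIW5c": ["OrJMUNEyZsE"],
    "tbyAuT50eu4": [],
}


def fetch_related(video_id: str, limit: int) -> List[str]:
    """Hypothetical helper: return at most `limit` related video IDs."""
    return RELATED.get(video_id, [])[:limit]


def crawl(
    root: str,
    max_depth: int,
    numbers: Sequence[int] = (1,),
    fetch: Callable[[str, int], List[str]] = fetch_related,
) -> List[Tuple[int, int, str, List[str]]]:
    """Breadth-first walk; returns (depth, rank, id, related_ids) tuples."""
    results = []
    queue = deque([(0, 0, root)])
    while queue:
        depth, rank, video_id = queue.popleft()
        related: List[str] = []
        if depth < max_depth:
            # Assumption: with fewer numbers than levels, reuse the last one.
            limit = numbers[min(depth, len(numbers) - 1)]
            related = fetch(video_id, limit)
            for child_rank, child in enumerate(related):
                queue.append((depth + 1, child_rank, child))
        results.append((depth, rank, video_id, related))
    return results


for depth, rank, video_id, related in crawl("0A2R27kCeD4", max_depth=1, numbers=(2, 1)):
    print(f"Depth: {depth}, Rank: {rank}, ID: {video_id}, Related Videos: {related}")
```

Videos on the deepest level are listed but not expanded, which is why their related videos stay empty.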
For the sake of brevity, you can shorten --number to -n and --max-depth to -d.
There are global options, too! Global options are specified in front of the command and alter the behavior of all commands. This may not sound very meaningful given that there are only two commands right now, and you are right! But this is likely to change in the future.
For example, to see more output during the program execution,
specify --verbose or -v right after yt-scraper:
$ yt-scraper -v search id '0A2R27kCeD4' --max-depth 2
There are many more options that you can make use of. All of them are described in the Options section.
Sometimes you may find yourself struggling with all the possible options.
Fortunately, there is the config command for all the lazy typers out there.
Setting a particular default option, such as the encoding to utf-8, is as easy
as typing
$ yt-scraper config set encoding utf-8
Forgetful? Just double-check by typing get instead of set:
$ yt-scraper config get encoding
[STATUS] The value of 'encoding' is set to 'utf-8'.
| Search options | Default | Description |
|---|---|---|
| -n, --number | 1 | Number of the videos fetched per level (can be specified several times) |
| -d, --max-depth | 0 | Number of recursion steps to perform. |
| -k, --api-key | Required | The API key that should be used to query the YouTube API v3. |
| -o, --output-dir | Optional | Path to the directory where output files are saved |
| -f, --output-format | csv | Specifies the file format of output files. |
| -N, --output-name | Optional | Specifies the file name or prefix of output files. |
| -r, --region-code | de | Return only videos which are unrestricted in the given region. |
| -l, --lang-code | de | Return videos mostly relevant to a specified language. |
| -s, --safe-search | none | Filter sensitive or restricted videos. |
| -e, --encoding | utf-8 | Transform fetched text to another encoding. |
| -u, --unique | False | Do not process seen videos again |
| -i, --include | All | Specify field to export (can be used several times) |
| -x, --exclude | None | Specify field not to export (can be used several times) |
| Global options | Default | Description |
|---|---|---|
| -c, --config-path | System-specific | Specifies a configuration file. For details, see configuration. |
| -v, --verbose | False | Shows more output during program execution. |
| -V, --version | Optional | Shows the current program version and exits |
More information can be found by adding the --help option to commands or
reading the YouTube API manual.
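When running many queries from a script, it can be handy to assemble the command line programmatically. The snippet below is just a sketch under that assumption: `build_search_command` is a made-up helper, it only uses options from the tables above, and it places the global --verbose option in front of the search command.

```python
import shlex
from typing import List, Sequence


def build_search_command(
    kind: str,
    value: str,
    api_key: str,
    max_depth: int = 0,
    numbers: Sequence[int] = (1,),
    verbose: bool = False,
) -> List[str]:
    """Hypothetical helper: build an argument list for `yt-scraper search`."""
    argv = ["yt-scraper"]
    if verbose:
        argv.append("--verbose")  # global options go before the command
    argv += ["search", kind, value, "--api-key", api_key]
    argv += ["--max-depth", str(max_depth)]
    for n in numbers:
        argv += ["--number", str(n)]  # --number may be given several times
    return argv


cmd = build_search_command("id", "0A2R27kCeD4", api_key="<KEY>", max_depth=2, verbose=True)
print(shlex.join(cmd))  # shell-quoted form, for logging or copy and paste
```

The resulting list could then be handed to `subprocess.run(cmd)`, which avoids shell quoting issues with search terms.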
Old-fashioned people, who do not like the config command,
can manually configure the program by editing the config.toml file.
It is secretly used and altered when using the config command.
Entered values are used in all future queries as long as
they are not overwritten by actual command line options.
For example, to always use the API key ABCDEFGH and a search depth of 3,
where on each level one video less is returned,
just create the following configuration file:
config.toml
api_key = "ABCDEFGH"
number = [ 4, 3, 2, 1 ]
depth = 3
verbose = true
An example toml is included: config.toml
Then put this file in your standard configuration folder. Typically this folder can be found at the following location:
- Mac OS X: ~/Library/Application Support/YouTube Scraper
- Unix: ~/.config/youtube-scraper
- Windows: C:\Users\<user>\AppData\Roaming\YouTube Scraper
If the folder does not exist, you may need to create it first.
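The following sketch illustrates the two ideas from this section: where the configuration folder typically lives, and how file values act as defaults that command line options override. It is an approximation rather than the program's own logic; `default_config_dir` and `effective_options` are hypothetical names, the Windows path is derived from the home directory, and TOML parsing relies on `tomllib` (Python 3.11+) or, as an assumption, the `tomli` backport.

```python
import sys
from pathlib import Path

try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the `tomli` backport is installed


def default_config_dir(platform: str = sys.platform) -> Path:
    """Hypothetical helper: map a platform name to the folders listed above."""
    home = Path.home()
    if platform == "darwin":
        return home / "Library" / "Application Support" / "YouTube Scraper"
    if platform.startswith("win"):
        return home / "AppData" / "Roaming" / "YouTube Scraper"
    return home / ".config" / "youtube-scraper"


def effective_options(config_text: str, cli_options: dict) -> dict:
    """File values act as defaults; options given on the command line win."""
    options = tomllib.loads(config_text)
    # Only options the user actually passed (not None) override the file.
    options.update({k: v for k, v in cli_options.items() if v is not None})
    return options


CONFIG = """
api_key = "ABCDEFGH"
number = [ 4, 3, 2, 1 ]
depth = 3
verbose = true
"""

print(default_config_dir() / "config.toml")
print(effective_options(CONFIG, {"depth": 1, "verbose": None}))
```

Here the depth from the command line replaces the value from the file, while verbose was not passed and therefore keeps the file's value.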
Google dramatically lowered the maximum number of API requests per day in recent years.
Currently, they impose a daily limit of 10,000 quota points, which corresponds to around 100 videos.
One can create multiple accounts though, and specify their API keys in the config.toml file like this:
api_key = ["<API Key 1>", "<API Key 2>", "..."]
Furthermore, specify the --unique option to avoid duplicate processing of frontier nodes. Keep in mind, however, that the resulting graph will have a maximum degree of n.
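Switching between several API keys could look roughly like the sketch below. This is not the code yt-scraper uses: `QuotaExceeded`, `call_with_key_rotation` and `fake_request` are hypothetical names, and the request is simulated so that the example runs without network access.

```python
from typing import Callable, Sequence


class QuotaExceeded(Exception):
    """Hypothetical error raised when a key has no quota left."""


def call_with_key_rotation(request: Callable[[str], str], api_keys: Sequence[str]) -> str:
    """Try `request(key)` with each key in order until one succeeds."""
    for key in api_keys:
        try:
            return request(key)
        except QuotaExceeded:
            continue  # this key is exhausted, move on to the next one
    raise QuotaExceeded("all API keys are exhausted")


def fake_request(key: str) -> str:
    """Simulated request: the first key is out of quota, the second works."""
    if key == "<API Key 1>":
        raise QuotaExceeded(key)
    return f"response fetched with {key}"


print(call_with_key_rotation(fake_request, ["<API Key 1>", "<API Key 2>"]))
# prints: response fetched with <API Key 2>
```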
- 0.2.X
  - Added UNLICENSE to project
- 0.3.X
  - Uploaded to PyPI
- 0.4.X
  - New command search
- 0.5.X
  - Option --depth renamed to --max-depth
  - Video attributes, such as title, description, channel are fetched
  - More consistent option handling
- 0.6.X
  - New export feature: csv
  - New command: config
  - New API options: region-code, lang-code and safe-search
- 0.7.X
  - New --version option
  - New --encoding option
  - New --export-name option
  - New --unique option
  - New input method by importing a file or reading from stdin
  - Added prompt when encountering an API error
- 0.8.X
  - New --format sql SQLite export
  - New config where command
  - Multiple API keys with automatic key switching is now possible
  - New --include option
  - New --exclude option
  - Renamed input argument to file
  - Piping of urls is now possible
Each of these features is going to be a minor patch:
- Add node video data attributes, such as title and description.
- Add possibility to specify more than one API key to switch seamlessly.
- Add possibility to query more than 50 videos on one level.
- Add youtube-dl integration for downloading subtitles.
- Add a testing suite.
- Add export functionality to CSV and SQLite.
- Add more information about quota to README
If you come across any bugs or have a suggestion, please don't hesitate to file an issue.
Contributions in any form are welcome. I will accept pull requests if they extend yt-scraper's functionality.
To set up the development environment,
please install Poetry and run poetry install inside the project.
A test suite will be added soon.
In general, the contribution process is somewhat like this:
- Fork it ($ git clone https://github.com/rattletat/yt-scraper)
- Create your feature branch ($ git checkout -b feature/fooBar)
- Commit your changes ($ git commit -am 'Add some fooBar')
- Push to the branch ($ git push origin feature/fooBar)
- Create a new Pull Request
Michael Brauweiler
- Twitter: @rattletat
- Email: rattletat@posteo.me
This plugin is free and unencumbered software released into the public domain.
For more information, see the included UNLICENSE file.
