Cookies from browser #29201

Open
wants to merge 25 commits into
base: master

@mbway mbway commented Jun 2, 2021

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

I have added a new command line option, --cookies-from-browser <browser_name>, which extracts cookies directly from an installed browser instead of reading them from a cookies file provided by the user. There are several reasons for this:

  • cookies may become more important to download age restricted youtube videos as the old methods no longer work
  • the existing --cookies option is not very easy for end users because it requires them to find and use a browser extension that saves to just the right format that youtube_dl understands
  • the --cookies argument is a bit of a compromise as I'm sure that in the vast majority of cases users are using un-edited cookies directly from their browsers anyway
  • users may not understand the security implications of someone obtaining their cookies and so may mishandle them

The --cookies-from-browser option will look in the default user data location for the given browser on the current platform. Multiple browser profiles may exist so I have chosen to read from the profile that was most recently written to.
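
The "most recently written to" rule can be sketched as follows. This is a minimal illustration, not the code from the pull request: the function name newest_cookie_db and the default file name cookies.sqlite (the name Firefox uses) are assumptions made for the example.

```python
import os


def newest_cookie_db(search_root, filename='cookies.sqlite'):
    """Return the most recently modified `filename` under search_root.

    Hypothetical helper: walks every profile directory below
    search_root and returns None if no cookie database is found.
    """
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        if filename in filenames:
            candidates.append(os.path.join(dirpath, filename))
    if not candidates:
        return None
    # The file with the largest modification time was written to last.
    return max(candidates, key=os.path.getmtime)
```

Comparing modification times avoids having to know profile directory names in advance, which matters for browsers that generate them randomly.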

Initially I only supported Firefox on Linux (which does not encrypt cookies), and the scope grew from there. I reverse engineered the chromium encryption mechanisms by reading the source code and referring to a few other projects which also aim to decrypt chromium cookies, though I think this is one of the most comprehensive implementations.

This pull request is still a bit 'in progress' as I'm not sure how to handle a few things:

  • logging: I think there should be at least some messages printed but youtube_dl doesn't seem to use a standard logging library so I'm just printing to stdout for now. (Update: sorted now.)
  • third party libraries: it looks like youtube_dl has no third party dependencies so this feature may not be accepted because it relies on two third party libraries. This could be avoided in theory by re-implementing AES in CBC and GCM modes, and implementing all the keyring interfaces (Gnome keyring, KWallet, OSX keyring etc). For now I've wrapped the third party imports in try/except so users can install the additional libraries if they want to use --cookies-from-browser
    • there is also the complication that the keyring package is not required at all on windows and is optional on OSX. The cryptography package is always required however.
    • the cryptography package is dropping support for python 2: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
    • sorted: using pycryptodome for AES-GCM (Windows) but no external dependencies for other cryptographic operations. keyring required on Linux only.
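
The try/except approach to optional dependencies described above might look roughly like this. It is a sketch under stated assumptions: the helper names optional_import and require are hypothetical and do not come from the pull request.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module, feature, package):
    """Fail with an actionable message only when the feature is used."""
    if module is None:
        raise RuntimeError(
            '%s requires the "%s" package; install it with: pip install %s'
            % (feature, package, package))
    return module


# Importing never fails at start-up; the error is deferred until a
# code path that actually needs the library calls require().
keyring = optional_import('keyring')
```

Deferring the error keeps the tool usable without the extra packages for everyone who never passes --cookies-from-browser.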

Tested browsers and platforms

Tested mostly with python 3.9 but I did try 2.7/Linux/Firefox and 2.7/Linux/Chromium

Browser    Linux  OSX  Windows
Firefox    X      X    X
Chrome     X      X    X
Chromium   X      X    X
Safari     N/A    X    N/A
Brave      X      X    X
Opera      X      X    X
Edge       X      X    X
Vivaldi    X      X    X

@Lesmiscore
Contributor

Lesmiscore commented Jun 3, 2021

I have a question, does X in Tested browsers and platforms table mean "already tested"? Thanks

@mbway
Author

mbway commented Jun 3, 2021

@nao20010128nao yes, X means I tried downloading a video that required cookies first without having the browser installed to check that no cookies are found, then again with the browser installed but not logged in (so cookies are found but the video does not download) then again once logged in.

@Lesmiscore
Contributor

@mbway Wow, that's great. The greatest PR I've ever seen on the youtube-dl repo.

@mbway
Author

mbway commented Jun 3, 2021

thanks :) I'm glad you like it

@pukkandan
Contributor

Amazing PR; I have a few suggestions:

Multiple browser profiles may exist so I have chosen to read from the profile that was most recently written to.

I think it is better to give the user full control over the path and profile. One way to do this could be a syntax like --cookies-from-browser BROWSER:PROFILE or similar. If no profile is given, we can just look for the Default profile and throw an error if it can't be found. It would also be useful to allow passing direct paths like --cookies-from-browser BROWSER:/path/to/profile as a way to handle non-standard installations and other chromium/firefox forks. (I use vivaldi, so I'd very much like to see this implemented :D)

logging: I think there should be at least some messages printed but youtube_dl doesn't seem to use a standard logging library so I'm just printing to stdout for now

There is utils.write_string. However, this will not obey the verbosity level (-v, -q). I believe the "correct" method would be to wrap the entire thing in a class and have YoutubeDL initialize it with a reference to self. See downloader.common.FileDownloader for example. Then you will have access to functions like self.ydl.to_screen, and params with self.ydl.params.get('verbose') etc.

This could be avoided in theory by re-implementing AES in CBC and GCM modes,

AES in CBC mode is already implemented. See aes.py. I don't think GCM mode is implemented though. If this is not sufficient, consider using pycryptodome instead of cryptography since the former is already an optional dependency in downloader.hls

For now I've wrapped the third party imports in try/except so users can install the additional libraries if they want to use --cookies-from-browser

This seems to be the correct approach since the above mentioned pycryptodome dependency is handled similarly


Hopefully I was of some help to you

@rautamiekka
Contributor

pycryptodome indeed sounds like the best option even if only for its Python 2.7 and 3.5+ support.

@mbway
Author

mbway commented Jun 3, 2021

@pukkandan
I did think about allowing a profile to be specified, but I was thinking of using multiple command line arguments, which wouldn't have been very nice, so I avoided that. I like the idea of browser:profile_name or browser:profile_path, so I've added that.

So long as a browser behaves like the browser you name, you can pass the profile path of a different browser. You can use this to extract cookies from the beta or dev channels of chrome and edge so I removed those as specific options. For example you can do --cookies-from-browser "chrome:~/.config/google-chrome-beta/Default" instead.
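
One way to split a browser:profile_name or browser:profile_path argument is sketched below. The function name parse_browser_spec and the rule used to tell a path from a profile name are assumptions for illustration; the thread does not say how the pull request makes that distinction.

```python
import os


def parse_browser_spec(spec):
    """Split 'BROWSER[:PROFILE]' into (browser, profile, is_path).

    Hypothetical rule: a profile containing a path separator, or
    starting with '~', is treated as a directory path; anything else
    is treated as a profile name such as 'Profile 1'.
    """
    # Split on the first ':' only, so Windows paths like C:\... survive.
    browser, sep, profile = spec.partition(':')
    browser = browser.lower()
    if not sep or not profile:
        return browser, None, False
    is_path = (os.path.isabs(profile) or profile.startswith('~')
               or os.sep in profile or '/' in profile)
    if is_path:
        profile = os.path.expanduser(profile)
    return browser, profile, is_path
```

With this rule, "chrome:~/.config/google-chrome-beta/Default" resolves to a directory under the user's home, while "chrome:Profile 1" stays a profile name to be looked up in the default data location.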

Of course there was an exception to this. Opera doesn't seem to have profiles at all (unlike the other browsers, Local State and Cookies are stored in the same directory) so you can't specify an alternative install directory for opera. I don't think this is a problem though.

I added support for Vivaldi, it was the next on my list of browsers I would support so I went ahead and added it. I've only tested it on Linux.

The default behaviour is to take the most recently written to profile rather than having a default profile because Firefox seems to name profiles with random directory names like kf6hsid3.default whereas chrome does have predictable names like Default, Profile 1 etc. And as stated above Opera doesn't even have profiles.

I found that hashlib provides an implementation of pbkdf2 so I was able to reduce the requirements to:

  • windows: pycryptodome for AES-GCM
  • mac: nothing (keyring package optional but keyring fallback works fine)
  • linux: keyring
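
For reference, the standard-library PBKDF2 call looks like this. The salt, key length and iteration counts are assumptions taken from Chromium's os_crypt sources rather than from this thread (the example treats 1 iteration as the Linux value and 1003 as the macOS value), and the password is a placeholder, so treat this as a sketch only.

```python
import hashlib


def derive_key(password, iterations, salt=b'saltysalt', key_length=16):
    """Derive an AES key with PBKDF2-HMAC-SHA1 using only hashlib.

    Assumed parameters: 16-byte key and the salt b'saltysalt'.
    The password would normally come from the OS keyring.
    """
    return hashlib.pbkdf2_hmac('sha1', password, salt, iterations, key_length)


linux_key = derive_key(b'example-password', iterations=1)
mac_key = derive_key(b'example-password', iterations=1003)
```

hashlib.pbkdf2_hmac is only available on newer Python versions (3.4 and later in the 3.x series), which is why older interpreters would still need a third-party implementation.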

I added a small wrapper around the YoutubeDL object to make it act like a logger. I think that's the cleanest way to deal with it.
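
A wrapper of that kind could look something like the sketch below. The class name YDLLogger and the message prefixes are hypothetical; the only things assumed about the wrapped object are the to_screen, report_warning and report_error methods and the params dictionary that YoutubeDL exposes.

```python
class YDLLogger(object):
    """Adapt a YoutubeDL-like object to a logger-style interface."""

    def __init__(self, ydl=None):
        # ydl may be None, e.g. in unit tests; then nothing is printed.
        self._ydl = ydl

    def debug(self, message):
        # Only shown when the user passed -v / --verbose.
        if self._ydl and self._ydl.params.get('verbose'):
            self._ydl.to_screen('[debug] ' + message)

    def info(self, message):
        if self._ydl:
            self._ydl.to_screen('[Cookies] ' + message)

    def warning(self, message):
        if self._ydl:
            self._ydl.report_warning(message)

    def error(self, message):
        if self._ydl:
            self._ydl.report_error(message)
```

Routing everything through the YoutubeDL object means the cookie code respects the verbosity settings without knowing about them directly.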

Unfortunately as I was making these changes, something changed with youtube so I no longer need to provide cookies to download the video I was testing with. Hopefully --cookies-from-browser will be more generally useful in future though.

@pukkandan
Contributor

something changed with youtube so I no longer need to provide cookies to download the video I was testing with. Hopefully --cookies-from-browser will be more generally useful in future though.

Cookies are essential both for downloading private playlists on youtube and in general for other services with paid content. So your PR is incredibly useful even if the age-gate is fully bypassable

@mbway
Author

mbway commented Jun 8, 2021

before this gets merged I would like to have a go at adding Safari support as well. Safari seems to do things differently (not sqlite) but others have reverse engineered the format so it should be possible to parse

@mbway
Author

mbway commented Jun 12, 2021

I have added support for Safari which I think means that this feature should support 99% of browsers now.

The .binarycookies format has been documented in the past but there seem to have been a few changes made to the format. The structure isn't quite as described in the documentation but it was close enough that I was able to extract cookies successfully. There are just a few fields where I don't know whether they are padding or hold meaningful values. From my testing these fields are always filled with null bytes but they could just be rarely used settings. Either way, for the purposes of youtube_dl all the useful data can be extracted.
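
As a rough illustration of the file layout, the sketch below reads only the file header. It follows earlier public write-ups of the format (a b'cook' magic value, a big-endian page count, then one big-endian size per page); since the comment above notes that real files deviate from that documentation, this layout is an assumption and the parser in the pull request may differ. The function name is hypothetical.

```python
import struct


def parse_binarycookies_header(data):
    """Return (page_sizes, offset_of_first_page) for a .binarycookies blob.

    Assumed layout: 4-byte magic b'cook', a big-endian uint32 page
    count, then one big-endian uint32 size per page.
    """
    if data[:4] != b'cook':
        raise ValueError('not a binarycookies file')
    (num_pages,) = struct.unpack('>I', data[4:8])
    sizes_end = 8 + 4 * num_pages
    page_sizes = struct.unpack('>%dI' % num_pages, data[8:sizes_end])
    return list(page_sizes), sizes_end
```

Each page would then be sliced out using these sizes and parsed separately; that part is omitted here because the thread does not describe the page contents.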

I found two files: ~/Library/Cookies/Cookies.binarycookies and ~/Library/Cookies/com.apple.Safari.SearchHelper.binarycookies. Starting from a clean install, after visiting a few sites the second file only contained

<Cookie CONSENT=... for .youtube.com/>
<Cookie VISITOR_INFO1_LIVE=... for .youtube.com/>

I can't find anything online about the SearchHelper cookies file. The cookies above can also be found in the Cookies.binarycookies file with different values, and SearchHelper had CONSENT=PENDING and Cookies.binarycookies had CONSENT=YES (which happens after accepting the terms on popup when visiting youtube) so I think the correct approach is to ignore the SearchHelper cookies and just read Cookies.binarycookies.

@mbway
Author

mbway commented Jun 17, 2021

I made the changes necessary to hopefully pass the checks. It's much messier maintaining compatibility with ancient python versions. I can't even easily get hold of these versions to check, but I think the tests should pass now, although for quite a few of the python versions the cookies tests will be skipped unless pycryptodome is available.

The requirements are now:

  • Windows: pycryptodome for AES-GCM and PBKDF2
  • Mac: python>=3.4 or pycryptodome for PBKDF2 (keyring package optional but keyring fallback works fine)
  • Linux: python>=3.4 or pycryptodome for PBKDF2 and keyring

@rautamiekka
Contributor

Using a venv you can install whatever Python version you want. If PyPI hasn't deleted the ancient module versions it should work that way, too.

@mbway
Author

mbway commented Jun 17, 2021

@rautamiekka yeah, so I normally use the venv module, but that copies the python interpreter it's called with, so I installed virtualenv and tried

[youtube-dl]$ virtualenv -p python3.3 
RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.3'
[youtube-dl]$ virtualenv venv --python=python3.3
RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.3'

I also tried conda

[youtube-dl]$ conda create --name testing python=3.3
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - python=3.3

Current channels:

  - https://conda.anaconda.org/conda-forge/linux-64
  - https://conda.anaconda.org/conda-forge/noarch

then I gave up. I'll see what the CI says and fix things if necessary

@rautamiekka
Contributor

Yeah, according to https://anaconda.org/conda-forge/python/files?version=3.4.5 there's nothing between 2.7.x and 3.4.x (and PyPI doesn't seem to have Python as a package), so 3.4.x is indeed the earliest available short of compiling 3.3.x: https://www.python.org/downloads/release/python-336/.

pukkandan pushed a commit to yt-dlp/yt-dlp that referenced this pull request Jul 21, 2021
)

* also adds `--no-cookies-from-browser`

Original PR: ytdl-org/youtube-dl#29201
Authored by: mbway
@mbway
Author

mbway commented Jul 21, 2021

this feature is now merged into yt-dlp: yt-dlp/yt-dlp#488 so head over there if you want to use it. I will back-port the improvements I made for that pull request, although I don't expect this one to be merged any time soon as it is already on page 3 of pull requests

@mbway
Author

mbway commented Jul 23, 2021

edit: for people reading this. The other side of this conversation was deleted by the author. For context, their original comment was:

Note that this option requires SQLite, which is not preinstalled on all platforms. I think a better option is OAuth, as you don't have to go rummaging through a user's browser profiles. And it's better than manual cookies too, as the user doesn't have to figure out how to export cookies and/or a HAR file, or install an addon just to export cookies. Here is a writeup from Google:

https://developers.google.com/identity/sign-in/devices


No offence, but your comments seem misguided.

Note that this option requires SQLite, which is not preinstalled on all platforms

On every platform I have tested on, the python interpreter is compiled with sqlite support. It is possible on source-based Linux distributions such as Gentoo to compile without sqlite support (or if you compile from source manually on any platform of course) (see yt-dlp/yt-dlp#544) but that can be handled and I will backport this workaround.
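
Handling an interpreter built without sqlite can be sketched as below. This is not the workaround from yt-dlp/yt-dlp#544; the function name is hypothetical, and the moz_cookies table and column names are assumptions based on Firefox's cookies.sqlite schema.

```python
try:
    import sqlite3
except ImportError:
    # Python can be compiled without sqlite support.
    sqlite3 = None


def read_firefox_cookies(db_path):
    """Read cookies from a Firefox-style cookies.sqlite database."""
    if sqlite3 is None:
        raise RuntimeError(
            'this Python build has no sqlite3 module, so cookies '
            'cannot be read from browsers that store them in sqlite')
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            'SELECT host, name, value, path, expiry, isSecure '
            'FROM moz_cookies').fetchall()
    finally:
        conn.close()
    return [
        {'host': host, 'name': name, 'value': value, 'path': path,
         'expiry': expiry, 'secure': bool(is_secure)}
        for host, name, value, path, expiry, is_secure in rows
    ]
```

Because the import is guarded, a missing sqlite3 module only matters when the cookie-reading function is actually called.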

I think a better option is OAuth

  • Not every site uses OAuth
  • OAuth is not a substitute for every type of cookie (eg the consent cookie which is set after agreeing to the 'Before you continue to YouTube' popup when visiting youtube. I'm not sure if this cookie is useful, but it's an example)
  • OAuth requires an API key. Google does not want media to be downloaded from youtube and would not allow youtube-dl to obtain an API key
  • OAuth provides authentication for accessing official APIs such as the youtube data api. youtube-dl does not use the official APIs of the sites it downloads from because the official APIs do not offer the information required to download videos, again because the sites do not want to allow this

[OAuth is better because the] user doesn't have to figure out how to export cookies

That's exactly what this feature aims to improve upon. The user can simply point youtube-dl at their browser of choice. From a usability perspective it is even less hassle than OAuth because the user does not have to authenticate and give access to youtube-dl, so no user interaction is required

@mbway
Author

mbway commented Jul 23, 2021

We have a difference of opinion around priorities (sqlite support, accessing browser data, non-youtube support etc). I'm not going to argue.

I mentioned that cookies and OAuth serve different purposes. I probably should have been clearer that I'm not against youtube-dl using OAuth (not that my opinion has any weight here anyway, I'm not a youtube-dl maintainer). Rather than opening an issue you commented on this pull request saying I think a better option is OAuth, implying it's a complete replacement and without providing details. I have given my stance on why I disagree.

It looks like your project uses an API designed for smart TVs, so I stand by my statement that video sites do not sanction use of internal APIs for the purposes of user downloads.

@mbway
Author

mbway commented Jul 23, 2021

Forcing SQLite on the user base

It won't.

Those are "internal APIs" that YouTube-DL is using

Yes, without authorization from google, which is not possible with OAuth

@mbway
Author

mbway commented Jul 23, 2021

first point: see yt-dlp/yt-dlp#544 which I linked to earlier. There is a fix for that problem: yt-dlp/yt-dlp#554. Sqlite support is not required unless you want to use the feature (--cookies-from-browser) with browsers which store cookies in sqlite, which is entirely optional.

Second point: google can revoke an API key. They cannot so easily block the use of cookies, since requests made with them look identical to those of a user browsing the site.

@mbway
Author

mbway commented Jul 23, 2021

edit: for context, the original question was in two parts, this is the second part

Cookies can be revoked as well. You've probably seen it before: different websites have the option of "log me out everywhere", for people who are worried that they may have left themselves signed in on a public computer.


I'm not sure if we are having a good-faith conversation here. Your second point is a valid question though so I'll answer it:

It comes down to how many targets there are. Google can't identify every user who has used their cookies to give to youtube-dl, so it's not an issue.
If every user generated their own API key then this would be hard to prevent as well, but would be a usability nightmare as each user would have to go through the process at https://developers.google.com/ to create a project, generate an API key and assign the correct scopes to it. It might be possible to automate the process though which could be something to look into? (although, thinking about it, you would have to be authenticated in order to create a project for the user's google account so it's a chicken and egg problem)

Alternatively (the normal option) a single key could be generated for the project (youtube-dl) and each user would share the same API key. This would be the single point of failure which could be revoked.

@mbway
Author

mbway commented Jul 23, 2021

thats a two line pull request

the fix would not propagate immediately to users and using a central key may have different legal implications than providing a tool for users to use their own cookies/keys.

This is getting off-topic for this pull request. If you want to discuss OAuth further please open an issue for it.

@mbway
Author

mbway commented Jul 31, 2021

this pull request should now be ready to merge whenever a maintainer has the time to look at it. I'm able to use it on my machine and the tests pass under python 3.9

nixxo pushed a commit to nixxo/yt-dlp that referenced this pull request Nov 22, 2021
…t-dlp#488)

* also adds `--no-cookies-from-browser`

Original PR: ytdl-org/youtube-dl#29201
Authored by: mbway
@porg

porg commented Jun 15, 2022

^ I assume that downloading as a logged-in YouTube/Google user rather than as an anonymous user makes the "throttle penalty" much less likely. So making cookies as user-friendly as possible should be a high priority, and this PR should be expedited.

What's stopping this from being merged?

@porg

porg commented Jun 15, 2022

@mbway some more questions:

  1. Are users who need --cookies-from-browser now advised to use yt-dlp?
  2. Could your solution also be used for wget and curl?
    • They are both pretty old-school in that regard.
    • They could benefit greatly from a --cookies-from-browser <browser_name> option too.

@pukkandan
Contributor

Could your solution be also used for wget and curl?
They both are pretty oldschool in that regard.
They too could benefit very well from a --cookies-from-browser <browser_name> function too.

Assuming someone wants to translate the whole thing into C, it would work...
