[Loom] Add new extractor #28039

Open · wants to merge 12 commits into master

@wongyiuhang commented Feb 1, 2021

Please follow the guide below

  • You will be asked some questions; please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use the Preview tab to see what your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl, each piece of code must be in the public domain or released under the Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

In response to site request #27957, this PR adds a new extractor for loom.com.

Closes #27957

Comment on lines 66 to 78
def _extract_video_info_json(self, webpage, video_id):
    info = self._html_search_regex(
        r'window.loomSSRVideo = (.+?);',
        webpage,
        'info')
    return self._parse_json(info, 'json', js_to_json)

def _get_url_by_id_type(self, video_id, type):
    request = compat_urllib_request.Request(
        self._BASE_URL + 'api/campaigns/sessions/' + video_id + '/' + type,
        {})
    json_doc = self._download_json(request, video_id)
    return (url_or_none(json_doc.get('url')), json_doc.get('part_credentials'))

Collaborator

Inline.

Author

Updated at 34e6a6b
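The `window.loomSSRVideo = (.+?);` pattern in the snippet above stops at the first semicolon, so it would truncate the object whenever a string inside it (a video title, for example) contains `;`. One standard-library alternative is to find the assignment and let `json.JSONDecoder.raw_decode` decide where the value ends. This is a minimal sketch with a hypothetical helper name, assuming the page embeds strict JSON after the assignment; if it embeds looser JavaScript, `js_to_json` would still be needed.

```python
import json
import re


def extract_js_assigned_json(webpage, var_name):
    """Return the JSON value assigned to `var_name` in inline JavaScript, or None.

    Hypothetical helper: assumes the assigned value is strict JSON
    (double-quoted keys and strings), not arbitrary JavaScript.
    """
    # Find "<var_name> =" plus surrounding whitespace; re.escape keeps dots literal.
    m = re.search(re.escape(var_name) + r'\s*=\s*', webpage)
    if m is None:
        return None
    # raw_decode parses exactly one JSON value starting at m.end(), so a ';'
    # inside a string cannot cut the object short.
    value, _end = json.JSONDecoder().raw_decode(webpage, m.end())
    return value


page = ('<script>window.loomSSRVideo = '
        '{"id": "abc", "name": "Demo; part 1"};</script>')
print(extract_js_assigned_json(page, 'window.loomSSRVideo'))
# {'id': 'abc', 'name': 'Demo; part 1'}
```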

Comment on lines 74 to 76
request = compat_urllib_request.Request(
    self._BASE_URL + 'api/campaigns/sessions/' + video_id + '/' + type,
    {})

Collaborator

Move into _download_json.

Author

Updated at 70b8045

Comment on lines 80 to 88
def _get_m3u8_formats(self, url, video_id, credentials):
    format_list = self._extract_m3u8_formats(url, video_id)
    for item in format_list:
        item['protocol'] = 'm3u8_native'
        item['url'] += '?' + credentials
        item['ext'] = 'mp4'
        item['format_id'] = 'hls-' + str(item.get('height', 0))
        item['extra_param_to_segment_url'] = credentials
    return format_list

Collaborator

Inline.

Author

Updated at 34e6a6b
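In `_get_m3u8_formats` above, `item['url'] += '?' + credentials` only produces a valid URL while the manifest URL has no query string of its own; if it already has one, the result contains two `?`. A sketch of a safer merge, assuming `part_credentials` is a flat dict of query parameters (the `sign_url` helper is hypothetical, similar in spirit to youtube-dl's `update_url_query` utility, and the credential names in the example are made up):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sign_url(url, credentials):
    """Merge signed-URL credentials into `url`'s existing query string.

    Hypothetical helper; `credentials` is assumed to be a flat dict such as
    the `part_credentials` object returned by the sessions API.
    """
    parts = urlsplit(url)
    # Keep parameters already on the URL, then add or override the credentials.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(credentials)
    return urlunsplit(parts._replace(query=urlencode(query)))


creds = {'Policy': 'abc', 'Signature': 'def', 'Key-Pair-Id': 'XYZ'}
print(sign_url('https://cdn.example.com/v/720.m3u8', creds))
print(sign_url('https://cdn.example.com/v/720.m3u8?session=1', creds))
```

The second call keeps `session=1` and appends the credentials after it with `&`, instead of adding a second `?`.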

ext = self._search_regex(
    r'\.([a-zA-Z0-9]+)\?',
    url, 'ext', default=None)
if(ext != 'm3u8'):

Collaborator

No parens.

Author

Updated at 34e6a6b

Comment on lines +99 to +101
ext = self._search_regex(
    r'\.([a-zA-Z0-9]+)\?',
    url, 'ext', default=None)

Collaborator

Read coding conventions.

Author

For this part, I need to extract the file extension from a URL.

Would you prefer a relaxed regex \.([^.?]+)\??

Or should I send a HEAD request to the URL and derive the extension from the Content-Type header with mimetype2ext(mt)?
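Both options in the question above can be combined: read the extension from the URL path, which sidesteps the query string entirely, and only fall back to the Content-Type header when the path has none, treating `application/octet-stream` as uninformative. A minimal sketch with hypothetical helper names; youtube-dl already ships `determine_ext` and `mimetype2ext` utilities in the same spirit, and the small MIME table here is an illustrative subset, not their mapping.

```python
import posixpath
from urllib.parse import urlsplit

# Illustrative subset only; youtube-dl's mimetype2ext knows many more types.
MIME_TO_EXT = {
    'video/mp4': 'mp4',
    'video/webm': 'webm',
    'application/x-mpegurl': 'm3u8',
    'application/vnd.apple.mpegurl': 'm3u8',
    'application/dash+xml': 'mpd',
}


def ext_from_url(url):
    """Extension of the URL's last path component, ignoring query and fragment."""
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1][1:]
    return ext.lower() if ext.isalnum() else None


def ext_from_content_type(content_type):
    """Map a Content-Type header value to an extension, if it is informative."""
    mime = (content_type or '').split(';')[0].strip().lower()
    # application/octet-stream (or any unlisted type) yields None here.
    return MIME_TO_EXT.get(mime)


def guess_ext(url, content_type=None):
    # Prefer the URL path; consult the header (e.g. from a HEAD request)
    # only when the path has no usable extension.
    return ext_from_url(url) or ext_from_content_type(content_type)
```

For example, `guess_ext('https://example.com/a.b/playlist.m3u8?x=1.2')` returns `'m3u8'` even though both the directory name and the query contain dots.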

Comment on lines 107 to 108
'width': try_get(info, lambda x: x['video_properties']['width']),
'height': try_get(info, lambda x: x['video_properties']['height'])

Collaborator

int_or_none

Author

Updated at 29c4168
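The point of wrapping the lookup in `int_or_none` is that the embedded JSON may carry dimensions as strings, as `null`, or not at all, and the format dict should end up with either an `int` or `None`. A standalone sketch of that pattern; the two helpers are simplified, hypothetical stand-ins for youtube-dl's `try_get` and `int_or_none`, not their actual implementations.

```python
def try_get_path(obj, *keys):
    """Walk nested dicts; return None if any level is missing or not a dict."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def to_int_or_none(value):
    """Convert to int when possible, otherwise return None."""
    if value is None:
        return None
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


info = {'video_properties': {'width': '1920', 'height': None}}
width = to_int_or_none(try_get_path(info, 'video_properties', 'width'))
height = to_int_or_none(try_get_path(info, 'video_properties', 'height'))
print(width, height)  # 1920 None
```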


return {
    'id': info.get('id'),
    'title': info.get('name'),

Collaborator

Mandatory. Read coding conventions.

Author

@wongyiuhang Feb 24, 2021

For the id, I can provide a fallback value from the URL. However, the title has no fallback source other than the embedded JSON.

Any advice?

Afterthoughts:
Is it okay to use [video_id] or the word Loom as the fallback title?
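One way to keep `title` always present while still preferring the embedded JSON is a chain of fallbacks that ends in something derived from the video ID. A sketch under stated assumptions: `name` is the JSON field the PR already uses, `og_title` stands for a hypothetical second source (such as an og:title meta tag, not confirmed for loom.com), and the `'Loom video <id>'` last resort is a made-up placeholder whose acceptability is for the maintainers to decide.

```python
def build_info(info, url_video_id, og_title=None):
    """Assemble the minimal info dict with fallbacks for id and title.

    `og_title` is a hypothetical second title source; pass None if the
    page does not provide one.
    """
    # Prefer the ID from the embedded JSON, else the one parsed from the URL.
    video_id = info.get('id') or url_video_id
    # Title must never be empty: JSON name, then og_title, then a placeholder.
    title = (info.get('name') or og_title or 'Loom video %s' % video_id).strip()
    return {'id': video_id, 'title': title}
```

For example, `build_info({}, 'b')` yields `{'id': 'b', 'title': 'Loom video b'}`.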


for i in range(len(folder_info['entries'])):
    video_id = folder_info['entries'][i]
    folder_info['entries'][i] = LoomIE(self._downloader)._real_extract(url_or_none(self._BASE_URL + 'share/' + video_id))

Collaborator

url_result.

Author

Updated at 1b2651e
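Instead of instantiating `LoomIE` and calling `_real_extract` for every entry, the folder extractor can return lightweight references and let youtube-dl dispatch each one to the right extractor. The dicts that `url_result` and `playlist_result` produce look roughly like the ones below; this sketch builds them by hand so it runs without youtube-dl. The `'Loom'` IE key follows from the `LoomIE` class name and the `share/` path comes from the snippet above, while the `BASE_URL` value is an assumption.

```python
BASE_URL = 'https://www.loom.com/'  # assumed value of LoomIE._BASE_URL


def video_reference(video_id):
    # Roughly what url_result(url, ie='Loom', video_id=video_id) returns:
    # extraction is deferred to the extractor named by ie_key.
    return {
        '_type': 'url',
        'url': BASE_URL + 'share/' + video_id,
        'ie_key': 'Loom',
        'id': video_id,
    }


def folder_playlist(folder_id, folder_name, video_ids):
    # Roughly what playlist_result(entries, folder_id, folder_name) returns.
    return {
        '_type': 'playlist',
        'id': folder_id,
        'title': folder_name,
        'entries': [video_reference(v) for v in video_ids],
    }
```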

Comment on lines +84 to +108

ext = self._search_regex(
    r'\.([a-zA-Z0-9]+)\?',
    url, 'ext', default=None)
if ext != 'm3u8':
    formats.append({
        'url': url,
        'ext': ext,
        'format_id': type,
        'width': int_or_none(try_get(info, lambda x: x['video_properties']['width'])),
        'height': int_or_none(try_get(info, lambda x: x['video_properties']['height']))
    })
else:
    credentials = compat_urllib_parse_urlencode(part_credentials)
    m3u8_formats = self._extract_m3u8_formats(url, video_id)
    for item in m3u8_formats:
        item['protocol'] = 'm3u8_native'
        item['url'] += '?' + credentials
        item['ext'] = 'mp4'
        item['format_id'] = 'hls-' + str(item.get('height', 0))
        item['extra_param_to_segment_url'] = credentials
    for i in range(len(m3u8_formats)):
        formats.insert(
            (-1, len(formats))[i == len(m3u8_formats) - 1],
            m3u8_formats[i])

Author

octet-stream support required
#27957 (comment)
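In the formats block above, `(-1, len(formats))[i == len(m3u8_formats) - 1]` picks index `-1` for every HLS format except the last one, which goes to the end. The resulting order depends on what `formats` already contains, as this standalone reproduction shows. A simpler alternative, sketched here as a hypothetical `sorted_merge`, is to append everything and sort by an explicit key from worst to best, which is roughly the job youtube-dl's `_sort_formats` already does.

```python
def tuple_index_merge(formats, new_formats):
    """Reproduce the insert logic from the block above on a copy."""
    formats = list(formats)
    for i in range(len(new_formats)):
        formats.insert(
            (-1, len(formats))[i == len(new_formats) - 1],
            new_formats[i])
    return formats


def sorted_merge(formats, new_formats):
    """Append everything, then order worst-to-best by height (missing -> 0)."""
    return sorted(list(formats) + list(new_formats),
                  key=lambda f: f.get('height') or 0)


print(tuple_index_merge(['raw'], ['hls-360', 'hls-720', 'hls-1080']))
# ['hls-360', 'hls-720', 'raw', 'hls-1080']
print(tuple_index_merge([], ['hls-360', 'hls-720', 'hls-1080']))
# ['hls-720', 'hls-360', 'hls-1080']
```

With an empty starting list the trick even reverses the first two HLS formats, so an explicit sort key is easier to reason about.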

@Fred-Vatin

Has this already been merged?

I tried to download a video here. When I try the MPD link I can detect, youtube-dl returns a 403 error.

Are you able to download it?

@wongyiuhang
Author

This PR has not finished code review yet. Also, I have not implemented *.mpd support yet...🙈

@rememberlenny

@dstftw or @wongyiuhang can I help move this along?

@wongyiuhang
Author

Yes, of course. I'm sorry for holding up the pull request. Is there anything I need to do? 👀

@cellarpub

@wongyiuhang since the PR has not been merged yet, I cloned your repo and checked out the 'loom' branch. Then I installed it with 'pip3 install -e .', but downloading the link above does not work.
I am getting the following error:

ERROR: Unsupported URL: https://www.loom.com/share/384b27b953714dc19aba4768643038bd

This is my youtube-dl -v output:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.02.10
[debug] Git HEAD: 360d5f0da
[debug] Python version 3.9.7 (CPython) - Linux-5.15.0-1-MANJARO-x86_64-with-glibc2.33
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4, rtmpdump 2.4
[debug] Proxy map: {}
Usage: youtube-dl [OPTIONS] URL [URL...]

@mysticaltech

Hey folks, you were almost there!

@coolbaluk

I suggest an optimisation: use the transcoded_url if it exists, fall back to the raw_url if not, and then do the HLS/DASH stitching.
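The suggested priority can be expressed as trying each URL type in order and stopping at the first one that yields a URL. A sketch assuming the `api/campaigns/sessions/<id>/<type>` endpoint used in the PR, endpoint type names taken from the wording of the comment above (not verified against the API), and a caller-supplied `fetch_json` callable (hypothetical; inside the extractor this role would be played by a non-fatal `_download_json` call).

```python
BASE_URL = 'https://www.loom.com/'  # assumed value of LoomIE._BASE_URL


def first_available_url(video_id, fetch_json,
                        url_types=('transcoded_url', 'raw_url')):
    """Return (url_type, url, part_credentials) for the first type with a URL.

    `fetch_json(api_url)` is a hypothetical callable returning the decoded
    JSON dict, or None on failure. The type names are assumptions.
    """
    for url_type in url_types:
        doc = fetch_json(BASE_URL + 'api/campaigns/sessions/%s/%s'
                         % (video_id, url_type))
        # Treat a failed request or a response without 'url' as unavailable.
        if doc and doc.get('url'):
            return url_type, doc['url'], doc.get('part_credentials')
    return None
```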

Here's how to handle the *.mpd, @wongyiuhang:

            if ext == 'mpd':
                credentials = compat_urllib_parse_urlencode(part_credentials)
                mpd_formats = self._extract_mpd_formats(
                    url, video_id)
                for item in mpd_formats:
                    for f in item['fragments']:
                        f['path'] += '?' + credentials
                        self.to_screen(f)
                    item['protocol'] = 'http_dash_segments'
                    item['url'] += '?' + credentials
                    item['ext'] = 'mp4'
                    item['format_id'] = 'dash-' + str(item.get('height', 0))
                for i in range(len(mpd_formats)):
                    formats.insert(
                        (-1, len(formats))[i == len(mpd_formats) - 1],
                        mpd_formats[i])
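In the snippet above, `self.to_screen(f)` looks like leftover debugging output, `'?' + credentials` has the same double-`?` risk if a location already carries a query, and `str(item.get('height', 0))` gives `'dash-None'` when `height` is present but `None`. A standalone sketch of the same loop over plain dicts, assuming each fragment has either a `path` or a `url` key (youtube-dl's MPD parser uses one or the other) and that `credentials` is already URL-encoded; the helper names are hypothetical and the `protocol`/`ext` values simply mirror the snippet.

```python
def add_query(location, query_string):
    """Append an already-encoded query string, respecting an existing '?'."""
    return location + ('&' if '?' in location else '?') + query_string


def sign_dash_formats(formats, credentials):
    for fmt in formats:
        # Formats without a segment list may have no 'fragments' key at all.
        for frag in fmt.get('fragments') or []:
            key = 'url' if 'url' in frag else 'path'
            frag[key] = add_query(frag[key], credentials)
        fmt['url'] = add_query(fmt['url'], credentials)
        fmt['protocol'] = 'http_dash_segments'
        fmt['ext'] = 'mp4'
        # 'or 0' also covers a height that is present but None.
        fmt['format_id'] = 'dash-%s' % (fmt.get('height') or 0)
    return formats
```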

I'm happy to lend a hand; what else is needed to get this one through? @dstftw

Is there anyone else we can tag?

@alfonsrv

alfonsrv commented Apr 3, 2022

I checked out @wongyiuhang's code from his loom branch and it doesn't work (anymore): it merely downloads a 4 KB mp4. Somebody on Reddit suggests that the manifest has to be downloaded.

Also, embed URLs are currently rejected as invalid; they look like this: https://www.loom.com/embed/1ae0b5c204b14f5881f0a826cbc7b3b9
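Accepting the embed form is usually a matter of widening the extractor's URL pattern. A sketch of such a pattern, assuming both forms use the same 32-character lowercase hexadecimal ID seen in the share and embed URLs quoted in this thread; the `VALID_URL` pattern and `match_video_id` helper are illustrative, not the ones in the PR.

```python
import re

# Matches both /share/<id> and /embed/<id>; the ID format is an assumption.
VALID_URL = r'https?://(?:www\.)?loom\.com/(?:share|embed)/(?P<id>[\da-f]{32})'


def match_video_id(url):
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None
```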

@upintheairsheep

Hello, make sure not to forget about this.

@ryanhugh

Is anyone available to keep moving this along? Support for loom would be great

@upintheairsheep

It works perfectly

@upintheairsheep

We just need to add a little more metadata from mine and we're done.

@upintheairsheep

wongyiuhang#1

@dirkf
Contributor

dirkf commented Jun 2, 2023

The first test for LoomFolderIE gets a 404 on the JSON download. The API URL with folders/.../by_name seems to be unsupported now, but the folder structure can be traversed with the folders/....
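Traversing the folder structure instead of resolving a folder by name amounts to a depth-first walk over folder IDs. A generic sketch in which `fetch_folder` and its `videos`/`folders` keys are hypothetical placeholders for whatever the `folders/...` endpoint actually returns; the cycle guard is there in case a folder ever lists an ancestor.

```python
def collect_video_ids(root_id, fetch_folder):
    """Depth-first walk over a folder tree, returning video IDs in visit order.

    `fetch_folder(folder_id)` is assumed to return a dict with hypothetical
    'videos' (video IDs) and 'folders' (child folder IDs) keys, or None.
    """
    seen = set()
    video_ids = []
    stack = [root_id]
    while stack:
        folder_id = stack.pop()
        if folder_id in seen:  # guard against cycles or repeated children
            continue
        seen.add(folder_id)
        folder = fetch_folder(folder_id) or {}
        video_ids.extend(folder.get('videos') or [])
        # Push children in reverse so they are visited in listed order.
        stack.extend(reversed(folder.get('folders') or []))
    return video_ids
```

Each collected ID could then be turned into a `url_result` entry rather than extracted inline.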


Successfully merging this pull request may close these issues.

Site request: Support for loom.com