Use the CC file instead of the transcript #42
Comments
Can you provide an example of this? As far as I know, YouTube does not decide how people upload their subtitles; it may be that whoever created the CC wrote them with multiple lines at once.
How can I get the
For now, I would say this would be a huge refactor of the code, and I would prefer not to do it: it appears to be an undocumented API that could become inaccessible at any moment, while the current approach of parsing the HTML can be adapted easily if something changes. It would be interesting to implement the XML and keep the HTML parsing as a fallback, but I would like to see how much work that would be.
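The "XML first, HTML parsing as a fallback" idea could be sketched roughly as below. This is only an illustration, not the extension's code: `getCues` and the two functions passed to it are hypothetical names, and each is assumed to return a promise that resolves to an array of cues.

```javascript
// Sketch only: try the primary source (e.g. the CC XML) and fall back to the
// secondary one (e.g. the HTML transcript) if it throws or returns nothing.
// `primary` and `fallback` are hypothetical async functions supplied by the caller.
async function getCues(primary, fallback) {
  try {
    const cues = await primary();
    if (Array.isArray(cues) && cues.length > 0) {
      return { source: 'primary', cues };
    }
  } catch (err) {
    // The undocumented endpoint may stop working at any time; fall through.
  }
  return { source: 'fallback', cues: await fallback() };
}
```

With this shape, the existing HTML parsing stays untouched and is simply passed in as `fallback`, which keeps the size of the refactor down.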
A working link can be parsed from any YouTube watch page that has CC. I can't provide a working link here because the
Please share a link to one of the videos you mention that lacks breaks at the end of sentences. I will try to get the
https://www.youtube.com/watch?v=V_v5Gcjgv3U
the transcript gives one. I might add that I use the timing to embed the videos directly into Anki. I'm not sure everybody uses it that way.
I see what you mean now. I guess they did it to avoid a long list in the transcript. Your proposal makes a lot of sense; I will check how to implement it when I have some free time.
Cool! Let me know if you need any assistance, for example with testing.
Wow, I'm amazed that you were able to do that so quickly! Are you going to give the user the ability to select which CC language they want to use within the extension?
Yes, I got all the caption links. The user will have to select a language, so that there is only one URL to request the XML from.
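A minimal sketch of that selection step, assuming each caption link is an object with a `languageCode` and a `baseUrl` field (an assumption about the shape of the entries, not something confirmed in this thread). `pickTrack` is a hypothetical helper name.

```javascript
// Sketch only: choose a single caption track for the language the user picked.
// Assumes each track looks like { languageCode: "en", baseUrl: "https://..." }.
function pickTrack(tracks, languageCode) {
  // Prefer an exact match such as "en".
  const exact = tracks.find((t) => t.languageCode === languageCode);
  if (exact) return exact;
  // Otherwise accept a regional variant such as "en-GB" for "en".
  const variant = tracks.find((t) =>
    (t.languageCode || '').startsWith(languageCode + '-')
  );
  return variant || null; // null when the video has no track in that language
}
```

The `baseUrl` of the returned track would then be the one URL to request.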
@tube-CC I was able to make a beta with the functionality you asked for. It takes the captions you mentioned from the script with the
Screen.Recording.2023-01-20.at.16.01.38.mov
If you want to give it a test, you can use this package:
Testing the Beta
Can all of the pulling and parsing be done at the time the extension button is clicked rather than when the page/extension is loaded? I suppose it would seem slower to the end user, but at least it would cause the data to reload on click.
So, I got it to work. Starting on line 72 in popup.js
You can see I added the
This is what it does already. As you pointed out, either the extension reloads the page or I have to find a way to get the data that YouTube queries when another video is loaded. I got access to the
I considered the reload, but I would like to keep it as a last-resort solution.
YouTube recently decided to merge multiple lines of the CC into each single line of the transcript. This makes youtube2Anki much less useful. I found that the CC file can be pulled as XML. You can find the links to the various CC files in the HTML of the video page, in a section that looks like "captions":{"playerCaptionsTracklistRenderer":{"captionTracks":.
After replacing \u0026 with &, the URLs look like this:
https://www.youtube.com/api/timedtext?v=[video_id]&caps=asr&xoaf=5&hl=en&ip=0.0.0.0&ipbits=0&expire=[expire_code]&sparams=ip,ipbits,expire,v,caps,xoaf&signature=[signature_code]&key=yt8&lang=en
Would it be possible to rewrite to use the CC file from those links instead of the transcript for a more granular set of data and timing?
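As a rough illustration of the lookup described above, the sketch below finds the `"captionTracks":` marker in the page HTML, cuts out the JSON array that follows it, and parses it. `extractCaptionTracks` is a hypothetical helper, and the sketch assumes the marker is immediately followed by a well-formed JSON array, which may not always hold. `JSON.parse` turns `\u0026` into `&` by itself, so no manual replacement is needed.

```javascript
// Sketch only: pull the captionTracks array out of a watch page's HTML.
function extractCaptionTracks(html) {
  const marker = '"captionTracks":';
  const start = html.indexOf(marker);
  if (start === -1) return []; // page has no CC section
  const arrayStart = html.indexOf('[', start + marker.length);
  if (arrayStart === -1) return [];

  // Walk forward to the matching closing bracket, ignoring brackets in strings.
  let depth = 0;
  let inString = false;
  for (let i = arrayStart; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (ch === '\\') i++; // skip the character after a backslash
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '[') {
      depth++;
    } else if (ch === ']' && --depth === 0) {
      // JSON.parse decodes \u0026 escapes into "&".
      return JSON.parse(html.slice(arrayStart, i + 1));
    }
  }
  return []; // array was never closed
}
```

In an extension, the HTML could come from the current watch page; each entry of the returned array is expected to carry one of the URLs shown above, though the exact field names are not confirmed here.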
Originally posted by @tube-CC in #40