Add audio/video metadata #72
Comments
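YouTube
We will definitely need an API key for this. Here are the steps, which are very similar to my instructions for setting up the Sheets API token, except you can use the same project and even the same API key. You just need to enable the YouTube API:
1. Open https://console.developers.google.com/apis/dashboard and select your project if needed (click the project in the popup).
2. Click the item shown in the screenshot, type youtube, and click the result with v3 in it.
3. Click ENABLE.
Should be all set. Go back to your Dashboard and confirm it's there, then let me know it's ready.
Archive.org
Question: why don't any of the records use this for audio yet? It looks like ELA has quite a few items there, just wondering why they're not in the dataset.
Anyhoo, no API key required. Here's an example of the metadata for your "Kabardian comparative" item:
https://archive.org/metadata/ela_kabardian_comparative/metadata
Or go straight for the kill one level deeper:
https://archive.org/metadata/ela_kabardian_comparative/metadata/description
That's the info we need, right? Easy on my end, BUT how will those files be embedded as audio? I see a bunch of formats listed, but archive.org's docs (https://archive.org/help/audio.php) also suggest an iframe, which is normally used for videos.
The audio embed I have set up in the code is not going to do anything with an API URL like that, and vice versa. So, will I need to check whether the Audio URL includes archive.org and then process it accordingly to make it work with one of these?
- The webplayer/video thing to use as an iframe embed (I don't know enough about these formats)
- Sift through the full API results (https://archive.org/metadata/ela_kabardian_comparative) for that file to find the WAVE?
I see how to get the full list as well (https://archive.org/download/ela_kabardian_comparative/clips%20Kabardian/), but I assume we'd only need one of them? Here's a full list for another of your audio items as a comparison: https://archive.org/download/mid-2003-06-13c
I'm not sure we'll get away without me doing some kind of check/processing on my end, so I could see this being the flow of code:
1. You provide me with a consistent URL in the Audio column to the wav file or whatever
2. I check Audio to see if it contains archive.org
3. If so, parse it out so I can get just the item ID (I don't like this already)
4. Use the ID to hit their API so that I can get the Description
OR, we treat it as a video/webplayer:
1. You populate the Video field with the embed URL, e.g. https://archive.org/embed/mid-2003-06-13c
2. I follow steps 3-4 above (3 is easier this way)
The obvious drawback of the "video" approach is that you couldn't have audio AND video for the same record.
OR:
1. You give me the metadata API URL and I sift through the results until I find the WAVE format (assuming it's always there).
2. I grab the Description while I'm there.
I'm going to be hitting the API no matter what, so whatever is cleanest for that. Assuming there is enough consistency of results, I'm sure I could find whatever I need there and deliver it to the user in whatever format you want. Personally I would go with the "video" embed since it has a lot more options and could be useful for those languages that have like 100+ wave files. If you've got instances of Video AND Audio though, then it would be a harder sell for the embed approach. 🤔
For Future Jason
Here is the archive's metadata API: https://blog.archive.org/2013/07/04/metadata-api/
Hope that made sense, kind of just learning it as I go, so let me know what you think.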
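For reference, the "parse out the item ID, then hit the API for the Description" steps could look roughly like this. It is only a sketch in Python for illustration (the project's own code isn't shown in this thread). The function names `archive_item_id` and `description_url` are made up, and the list of path prefixes is an assumption based on the URLs quoted in this comment.

```python
from urllib.parse import urlparse

# Path prefixes that are followed by an item identifier in the URLs
# quoted in this thread (assumption: no other shapes need handling).
ITEM_PREFIXES = ("embed", "details", "download", "metadata")


def archive_item_id(url):
    """Return the archive.org item ID from a URL, or None if it is not one."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("archive.org", "www.archive.org"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 2 and parts[0] in ITEM_PREFIXES:
        return parts[1]
    return None


def description_url(item_id):
    """Build the metadata API URL that returns only the Description field."""
    return f"https://archive.org/metadata/{item_id}/metadata/description"


item_id = archive_item_id("https://archive.org/embed/mid-2003-06-13c")
print(description_url(item_id))
```

Anything that is not an archive.org URL comes back as `None`, which is the "check Audio to see if it contains archive.org" step.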
Great, thanks for these instructions. I’ll get on them once we’re ready to get started. Maybe this will be obvious once we’re working on it, but can we choose what metadata gets pulled by the API? Want to clarify this with Dan, whose request this was, but I think Description is the key thing, maybe one or two others.
Re: Archive.org, same deal, and yes, we have some things on there. (Aren’t we using the Neo-Mandaic sample from there?) But we're going to have a TON more (all of our audio) within the next year or so; we just wanted to wait until we have the best samples up there with all the proper metadata. Cool that a full list can be pulled, which is probably more than we need, but I’d want to ask Dan, as it's potentially a neat capability. In principle all the code flows you describe sound sensible, but the ability to have both video and audio for any given language community would be nice.
… On Oct 12, 2020, at 10:36 PM, Jason Lampel ***@***.***> wrote:
@rperlin-ela <https://github.com/rperlin-ela>
YouTube
We will definitely need an API key for this. Here are the steps, which are very similar to my instructions <#18 (comment)> for setting up the Sheets API token except you can use the same project and even the API key, you just need to enable YouTube API:
https://console.developers.google.com/apis/dashboard
Select your project if needed:
<https://user-images.githubusercontent.com/4974087/95803510-85ea2900-0cbd-11eb-8d90-599622bdb61f.png>
If needed, click the project in the popup.
Click this:
<https://user-images.githubusercontent.com/4974087/95803555-a6b27e80-0cbd-11eb-9b4b-d3aef5b20cb1.png>
Type youtube
Click the result with v3 in it:
<https://user-images.githubusercontent.com/4974087/95803375-2be96380-0cbd-11eb-8a6a-165009ca0d40.png>
Click ENABLE
Should be all set. Go back to your Dashboard and confirm it's there then let me know it's ready:
<https://user-images.githubusercontent.com/4974087/95803467-6b17b480-0cbd-11eb-80ec-9f04a732a675.png>
Archive.org
Question: why don't any of the records use this for audio yet? It looks like ELA has quite a few, just wondering why they're not in the dataset.
Anyhoo, no API key required. Here's an example of the meta for your "Kabardian comparative" for example:
https://archive.org/metadata/ela_kabardian_comparative/metadata
Which returns:
<https://user-images.githubusercontent.com/4974087/95805410-94870f00-0cc2-11eb-8a33-1bc76b6c9440.png>
Or go straight for the kill with one level deeper: https://archive.org/metadata/ela_kabardian_comparative/metadata/description
That's the info we need, right? Easy on my end, BUT how will those files be embedded as audio? I see a bunch of formats listed, but also archive.org's docs suggest an iframe <https://archive.org/help/audio.php>, which is normally used for videos.
The audio embed I have set up in the code is not going to do anything with an API URL like that, and vice versa. So, will I need to check whether the Audio URL includes archive.org and then process it accordingly to make it work with one of these?
- The webplayer/video thing to use as an iframe embed (I don't know enough about these formats)
- Sift through the full API results <https://archive.org/metadata/ela_kabardian_comparative> for that file to find the WAVE?
<https://user-images.githubusercontent.com/4974087/95806714-92727f80-0cc5-11eb-9da1-bf928f5cb8d0.png>
I see how to get the full list <https://archive.org/download/ela_kabardian_comparative/clips%20Kabardian/> as well, but I assume we'd only need one of them?
Here's a full list for another of your audio items as a comparison: https://archive.org/download/mid-2003-06-13c
I'm not sure we'll get away without me doing some kind of check/processing on my end, so I could see this being the flow of code:
1. You provide me with a consistent URL in Audio column to the wav file or whatever
2. I check Audio to see if it contains archive.org
3. If so, parse it out so I can get just the item ID (I don't like this already)
4. Use the ID to hit their API so that I can get the Description
OR, we treat it as a video/webplayer:
1. You populate the Video field with the embed URL, e.g. https://archive.org/embed/mid-2003-06-13c
2. I follow steps 3-4 above (3 is easier this way)
The obvious drawback of the "video" approach is that you couldn't have audio AND video for the same record.
OR:
1. You give me the metadata API URL and I sift through the results until I find the WAVE format (assuming it's always there).
2. I grab the Description while I'm there.
I'm going to be hitting the API no matter what so whatever is cleanest for that. Assuming there is enough consistency of results then I'm sure I could find whatever I need there and deliver it to the user in whatever format you want. Personally I would go with the "video" embed since it has a lot more options and could be useful for those languages that have like 100+ wave files:
<https://user-images.githubusercontent.com/4974087/95808796-7b825c00-0cca-11eb-9088-7c44e3219755.png>
If you've got instances of Video AND Audio though, then would be a harder sell for the embed approach. 🤔
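To illustrate the "sift through the results until I find the WAVE format" option: a rough Python sketch, assuming the item JSON has a `files` list whose entries carry `name` and `format` keys. The `sample_item` fragment and its file names are made up for the example, and both function names are hypothetical.

```python
from urllib.parse import quote


def first_file_with_format(item, wanted="WAVE"):
    """Return the first file entry whose format matches, or None."""
    for entry in item.get("files", []):
        if entry.get("format") == wanted:
            return entry
    return None


def download_url(item_id, file_name):
    """Build a direct download URL, percent-encoding spaces in the name."""
    return f"https://archive.org/download/{item_id}/{quote(file_name)}"


# Made-up response fragment, for illustration only (not real API output).
sample_item = {
    "metadata": {"identifier": "ela_kabardian_comparative"},
    "files": [
        {"name": "clips Kabardian/example.mp3", "format": "VBR MP3"},
        {"name": "clips Kabardian/example.wav", "format": "WAVE"},
    ],
}

wave = first_file_with_format(sample_item)
if wave is not None:
    print(download_url(sample_item["metadata"]["identifier"], wave["name"]))
```

The `None` return covers the "assuming it's always there" caveat: if no WAVE entry exists, the caller has to decide on a fallback.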
For Future Jason
here is the archive's metadata API: https://blog.archive.org/2013/07/04/metadata-api/
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#72 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AMNKB5F7OXGP5HIXUPBQJG3SKO4MLANCNFSM4QTE3FWQ>.
|
Feel free to start any time, this might be one of the first things I work on since it's fresh in my brain.
If I'm understanding correctly, yes. Archive.org doesn't have more than what I showed in the screenshot, I think, but as for YouTube there may be other metadata. You can skim through the options here: https://developers.google.com/youtube/v3/docs/videos#properties
Highlights from today's call
Transcript files
Unless these are part of something in the videos or audio files already in existence, I don't think this is an easy option. If you had an external file somewhere, like Dropbox, we'd have to either:
|
@rperlin-ela should I assume that all the archive.org instances will have a playlist? Or maybe a better question is, is this format ok when they don't? I don't think it's harming anything when there's only a single item, and it keeps the UI consistent with the instances which do have playlists. |
Re: additional metadata fields, I think you're also going to want |
Yes, I think what's in there is now just Language, but Title followed by Description would be great. I reckon some archive.org instances will have playlists but not all (as with Youtube). Format from the screenshot seems fine to me and good to keep consistent. I believe the Youtube API is now enabled — let me know if not or if you need anything else from my end. In case it helps I uploaded a new tileset just now with what I think should be the correct embed URL from Archive for Neo-Mandaic. (Sorry if that was jumping the gun—I see for the time being it's giving an error.) |
I think I'm seeing captions on all the videos that have them (check Wakhi, for example). So all seems good. But from other experiences with Youtube I wonder if people who haven't enabled some kind of setting will see it. I followed what you had in the instructions, so I think all I had done was enable the key. But I see in the dashboard it's failing 100% of the 13 times it was tried. Sorry if this was off base, but something I saw made it seem like I needed to "create credentials", specifying a little further info about the use and then getting the opportunity to restrict the key, but that seemed to result in a new API key, which I'm sending you by email. Kind of fumbling around in the dark, but hopefully I can unravel it if necessary.
Yeah all the captions will be there, we might be talking about two different things. You mentioned something about transcripts, and in YouTube the transcript is a textual list of placemarks within the video's captions/subtitles, I think?
Yeah if you click the gear icon it has a cc option. This seems to coincide with but remain independent from cc subtitles etc. Anyhoo, probably not relevant I guess. As for the new API key, sorry if my instructions steered you off course. I may have missed something, as it's hard to know what's on your screen (I have multiple projects in Google so mine might look different). It's probably not a bad idea to have two API keys anyway (one youtube, one sheets), and it looks like the youtube key you just made is working, so we should be set, but I'll let you know if not!
All good if the API is working.
My bad, anyway I hear you on the challenges and it being outside of what we’ve talked about… but this Transcript thing could theoretically be pulled via API right under our metadata, highlighting the relevant sentence as the video is playing? Something like that would be much, much better than a downloadable transcript.
… On Oct 14, 2020, at 9:57 PM, Jason Lampel ***@***.***> wrote:
I think I'm seeing captions on all the videos that have them — check Wakhi for example. So all seems good.
Yeah all the captions will be there, we might be talking about two different things. You mentioned something about transcripts and in YouTube the transcript is a textual list of placemarks within the video's captions/subtitles I think? Like:
<https://user-images.githubusercontent.com/4974087/96067083-2ecb8c00-0e56-11eb-846d-58e38027e849.png>
or
<https://user-images.githubusercontent.com/4974087/96067150-51f63b80-0e56-11eb-9d9c-389da5721e7f.png>
But from other experiences with Youtube I wonder if people who haven't enable some kind of setting will see it.
Yeah if you click the gear icon it has a cc option. This seems to coincide with but remain independent from cc subtitles etc.
<https://user-images.githubusercontent.com/4974087/96067228-7fdb8000-0e56-11eb-88fe-b8fc9a5ac858.png>
Anyhoo probably not relevant I guess.
As for the new API key, sorry if my instructions steered you off course, I may have missed something as it's hard to know what's on your screen (I have multiple projects in Google so mine might look different). It's probably not a bad idea to have two API keys anyway (one youtube, one sheets), and it looks like the youtube key you just made is working so we should be set but I'll let you know if not!
|
Definitely not going to do that but yes I think with an extra week of time something like that could be achieved. It's just a bunch of data in the API, someone would have to create a very very complex component to sync with the video. Shouldn't have brought it up, it's so far beyond everything else. |
Understood, cool to know about.
… On Oct 14, 2020, at 10:38 PM, Jason Lampel ***@***.***> wrote:
Definitely not going to do that but yes I think with an extra week of time something like that could be achieved. It's just a bunch of data in the API, someone would have to create a very very complex component to sync with the video.
Shouldn't have brought it up, it's so far beyond everything else.
|
If that’s an easy switch to flip, yes by all means
… On Oct 14, 2020, at 10:44 PM, Jason Lampel ***@***.***> wrote:
if it's of use, the cc can be forced on:
|
yeah should be, kinda falls along same lines as how i'm playlistifying the Archive videos. Just make sure to continue to leave all your video URLs as bare as possible. I think these should be the only three scenarios?
https://www.youtube.com/embed/VIDEO_ID
https://www.youtube.com/embed/videoseries?list=PLAYLIST_ID
https://archive.org/embed/ela_kabardian_comparative
I can append parameters to the URL as needed. The approach is kind of fragile since it relies on parsing on my end and consistency on yours, not to mention Google's APIs are kind of volatile. But in lieu of a CMS or additional column in the data, it's probably the best we can do for now. As for embed parameters, let me know if there are any others of interest:
YouTube: https://developers.google.com/youtube/player_parameters#cc_lang_pref
Archive: https://archive.org/help/audio.php
None are difficult to add, it's not part of the API, just some extra cruft to append to the embed URLs.
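As a rough illustration of appending parameters to a bare embed URL: a Python sketch, not the project's code. `with_params` is a hypothetical helper. `cc_load_policy` and `cc_lang_pref` are YouTube player parameters from the page linked above, and the values used here are only examples.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_params(embed_url, extra):
    """Return embed_url with the extra query parameters merged in."""
    parsed = urlparse(embed_url)
    query = dict(parse_qsl(parsed.query))  # keep e.g. an existing list=...
    query.update(extra)
    return urlunparse(parsed._replace(query=urlencode(query)))


print(with_params("https://www.youtube.com/embed/VIDEO_ID",
                  {"cc_load_policy": "1", "cc_lang_pref": "en"}))
print(with_params("https://www.youtube.com/embed/videoseries?list=PLAYLIST_ID",
                  {"cc_load_policy": "1"}))
```

Merging into the parsed query, rather than concatenating strings, avoids having to know whether the URL already contains a `?`, which matters for the playlist format.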
I got the single-video API connection wired up, wasn't terribly hard and it definitely looks better than just the language for title. For descrip, you thinking above the video or below? Here it is on laptop: And mobile (whatever that is!): I don't know how long the average descrip is and that might dictate placement a bit, so let me know what you think. |
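For anyone following along, the single-video lookup amounts to something like the sketch below (Python for illustration, not the code that was actually wired up). It assumes the YouTube Data API v3 `videos` endpoint with `part=snippet`, where the title and description sit under `items[0].snippet`. `VIDEO_ID`, `API_KEY`, and both function names are placeholders.

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3"


def video_request_url(video_id, api_key):
    """Build the request URL for one video's snippet (title, description)."""
    params = {"part": "snippet", "id": video_id, "key": api_key}
    return f"{API_BASE}/videos?" + urlencode(params)


def title_and_description(response):
    """Pull (title, description) from a parsed JSON response, or None."""
    items = response.get("items", [])
    if not items:
        return None  # the API found nothing for this ID
    snippet = items[0].get("snippet", {})
    return snippet.get("title", ""), snippet.get("description", "")


print(video_request_url("VIDEO_ID", "API_KEY"))
```

An empty `items` list is how a missing or private video would show up here, so the caller gets `None` instead of an exception.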
Thanks, fixing Wakhi for the next upload — I assume all the others you see are ok then?
I’ll look at parameters and try to let you know if there’s anything ASAP.
… On Oct 14, 2020, at 10:59 PM, Jason Lampel ***@***.***> wrote:
yeah should be, kinda falls along same lines as how i'm playlistifying the Archive videos. Just make sure to continue to leave all your video URLs as bare as possible. I think these should be the only three scenarios?
https://www.youtube.com/embed/VIDEO_ID
https://www.youtube.com/embed/videoseries?list=PLAYLIST_ID
https://archive.org/embed/ela_kabardian_comparative
I can append parameters to the URL as needed.
The approach is kind of fragile since it relies on parsing on my end and consistency on yours, not to mention Google's APIs are kind of volatile. But in lieu of a CMS or additional column in the data, it's probably the best we can do for now.
As for embed parameters, let me know if there are any others of interest:
YouTube: https://developers.google.com/youtube/player_parameters#cc_lang_pref
Archive: https://archive.org/help/audio.php
None are difficult to add, it's not part of the API, just some extra cruft to append to the embed URLs.
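The three URL scenarios listed above could be recognized with something like this. It is a Python sketch under the assumption that URLs are entered exactly in those bare forms; the kind labels and the `classify` name are invented for the example.

```python
import re

# One pattern per supported format; anything else is treated as unsupported.
PATTERNS = [
    ("youtube_playlist",
     re.compile(r"^https://www\.youtube\.com/embed/videoseries\?list=([\w-]+)$")),
    ("youtube_video",
     re.compile(r"^https://www\.youtube\.com/embed/(?!videoseries$)([\w-]+)$")),
    ("archive_embed",
     re.compile(r"^https://archive\.org/embed/([\w.-]+)$")),
]


def classify(url):
    """Return (kind, id) for a supported embed URL, or None otherwise."""
    for kind, pattern in PATTERNS:
        match = pattern.match(url)
        if match:
            return kind, match.group(1)
    return None


for url in (
    "https://www.youtube.com/embed/VIDEO_ID",
    "https://www.youtube.com/embed/videoseries?list=PLAYLIST_ID",
    "https://archive.org/embed/ela_kabardian_comparative",
    "https://youtu.be/VIDEO_ID",
):
    print(url, "->", classify(url))
```

Being strict here is deliberate: a URL with extra parameters or a different host returns `None`, which is the fragility mentioned above, but it also makes bad data easy to spot.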
|
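Ones I've looked at, yes, but that's something you'll want to QC on your end for this iteration and ongoing. If you're taking notes for the Big Comprehensive Manual, include the formats I mentioned above:
https://www.youtube.com/embed/VIDEO_ID
https://www.youtube.com/embed/videoseries?list=PLAYLIST_ID
https://archive.org/embed/ela_kabardian_comparative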
If any other variations are used accidentally, it won't work. If I'm missing any other scenarios, let me know. If Google changes their API, let Google let me know. :) Re: QC in general- I think that's something that could be automated on your end. If you had a separate tab/sheet with formulas pointing to the main source tab, you could use formulas that automatically check for the usual Bad Data suspects:
Just brainstorming but those are several very-real things we've encountered on more than one occasion and it adds time to troubleshooting, communication, and data maintenance, so might be worth pursuing. Could just start with one column at a time to get some practice and see how it goes, and I think it would pay for itself in the long run. |
For description in the video modal, are you thinking of having it above the video or below? See my screenshots a few comments up. |
The way you have it looks good to me: title on top and then description below. Length of Acehnese description is probably representative but it would be good to allow for the possibility of longer ones. Thanks for the QC tips, you're obviously right— I just need someone a little more expert with Google Sheets, will ask. |
(Sorry forgot that playlists and Archive were still to-do!) |
No worries. Have you had a chance to check on those extra parameters yet? YouTube: https://developers.google.com/youtube/player_parameters#cc_lang_pref |
Yeah, didn’t see anything critical — good just forcing on subs.
… On Oct 16, 2020, at 5:11 PM, Jason Lampel ***@***.***> wrote:
No worries. Have you had a chance to check on those extra parameters yet?
YouTube: https://developers.google.com/youtube/player_parameters#cc_lang_pref
Archive: https://archive.org/help/audio.php
|
Sounds good.
Questions
Youtube playlists meta title
The title differs from the video titles, so should there be some indication that it's a playlist? Aside from the playlist icon in the top-right, it's not immediately obvious that the title refers to a playlist. Could be something subtle.
Maybe the ID/format issue?
I'm wondering if maybe this is a byproduct of standardizing the URLs, or a true typo, but this one isn't working (Neo-Aramaic). Your syntax is fine, just like I requested it:
https://www.youtube.com/embed/videoseries?list=PLcXFPx-z7B0pfN0ZhGj1bVpE9foHBR1nC
But that link doesn't work. If I found the correct playlist, then the URL would be this:
https://www.youtube.com/embed/videoseries?list=PL2BF759B18CCE5DD2
I'd like to add some basic error catching for non-existent videos and playlists in case we come across another one, but I'm not sure which scenario to use:
Option 1 could just show "video not found" or something and not even show the player, while Option 2 (which I'm currently using) shows this: followed by this if i try to play the video: If my URL parsing (to extract the playlist/video ID) is accurate for one video and one playlist, then it should be accurate for all of them, so I'm leaning towards assuming that if the API doesn't find anything for the given video/playlist ID, then your URL must be incorrect. On my end (the code side, not what the user sees) I set up a Sentry message to let us know if the YouTube API returned nothing for that URL. This doesn't tell me whether it's my fault or yours, but at least it will silently tell us which URL failed without anything crashing on the user. |
If you want to see what it looks like, here is the deploy: https://deploy-preview-116--languagemapping.netlify.app/details?id=645 archive.org stuff not ready but YT vids and playlists are working. |
All looking good. Error catching sounds good, go with your judgment on it.
Yep, good call, this gives a whole new visibility to playlists, so I will work on filling in all the description for playlists, don't think anything additional is needed.
Fixed now — link was right, but playlist was temporarily/inadvertently private. |
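Error handling
it was naive of me to ask "can we rely on 100% accurate data". the answer to that question in the absence of validation and QC/QA, is always NO when humans are responsible for entering the data. so, with that in mind i took Sentry up a couple notches with the err handling now catching 3 scenarios:
1. Incorrect URL format, which needs to be one of the three formats we discussed: YouTube playlist, embed, or Internet Archive embed. Could be a typo on your end, for example.
2. Format was fine but no video/embed/etc. was found, e.g. your Neo-Aramaic
3. General fetch failure catch-all for the scary unknown scenarios.
Great?
Yes it is. Big softball thrown in your direction to deal with video errors falling in one of those 3 scenarios. I will use the Sentry info I added today ("tags" relevant to the scenarios) to create an alert so you'll get notified if someone opens a bogus video URL. You will definitely want to subscribe to those alerts. Hopefully there's a way in Sentry to pick and choose which alerts (instead of the full mess of error events) but it's worth it either way. If you don't want them to go to your shared ELA acct then we can add another acct as a separate user, but this will save me the trouble of notifying you any time there is a video issue. It's already an automated process so might as well take advantage of it.
Closed captioning
...cannot be forced on for videos or playlists which don't already have it, including those where the cc is auto-generated.
Questions
HTML description support?
i turned it on since i saw a <span> in one of your Internet Archive examples but might be a good idea if i disable it since there's no guarantee it will match our UI.
Next steps
Sentry alerts
Will get these wired up in the UI tomorrow or soon, otherwise kind of a waste of time if I have to forward the emails to you about data stuff. I also have it broken down into production and deploy environments, so you can see if they occurred during our internal testing/PR stuff or live site.
I haven't tested it with much besides local dev so far, there's a lot of moving parts to all the error stuff so fingers crossed. it's working great on my local env though so no reason shouldn't be same in The Outside World.
Also can't really test much without a known error (i was leaning on your broken one to test), meaning i can't test the Sentry stuff including alerts either. might have to do a dummy one just so the setup doesn't get left in the dust until a real error happens.
More things I'm missing when...
...it's not 12:54am? Most likely.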
Ok, sounds good about error handling. No problem about closed captioning. HTML from Archive not important, better to stay consistent.
Sorry if I wasn’t clear — I don’t think we need the word “Playlist” for the playlists. I will make sure the descriptions start with “A collection of…”
I seem to be getting an error now with the deploy when I go to play any (non-playlist) video. Playlists and Archive looking good tho.
… On Oct 20, 2020, at 2:55 AM, Jason Lampel ***@***.***> wrote:
Error handling
it was naive of me to ask "can we rely on 100% accurate data". the answer to that question in the absence of validation and QC/QA, is always NO when humans are responsible for entering the data. so, with that in mind i took Sentry up a couple notches with the err handling now catching 3 scenarios:
1. Incorrect URL format, which needs to be one of the three formats we discussed: YouTube playlist, embed, or Internet Archive embed. Could be a typo on your end, for example.
2. Format was fine but no video/embed/etc. was found, e.g. your Neo-Aramaic
3. General fetch failure catch-all for the scary unknown scenarios.
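Roughly, those three scenarios map onto a check like the one below. This is an illustrative Python sketch only. The tag names and `load_media_meta` are hypothetical, the `fetch` callable stands in for whatever actually calls the YouTube or Archive API, and the prefix check is a cruder stand-in for the real format validation. The actual Sentry call isn't shown in this thread, so it is left out.

```python
SUPPORTED_PREFIXES = (
    "https://www.youtube.com/embed/",
    "https://archive.org/embed/",
)


def load_media_meta(url, fetch):
    """Return (tag, meta); tag names one of the three failure scenarios or "ok".

    fetch is any callable that takes the URL and returns a dict of metadata,
    a falsy value when nothing was found, or raises on network trouble.
    """
    if not url.startswith(SUPPORTED_PREFIXES):
        return "bad_url_format", None   # scenario 1: not a supported format
    try:
        meta = fetch(url)
    except Exception:
        return "fetch_failed", None     # scenario 3: catch-all
    if not meta:
        return "not_found", None        # scenario 2: format fine, no result
    return "ok", meta


print(load_media_meta("https://youtu.be/VIDEO_ID", lambda u: {"title": "x"}))
print(load_media_meta("https://www.youtube.com/embed/VIDEO_ID", lambda u: None))
```

The returned tag is the piece that would be attached to the error event, so alerts can be filtered by scenario instead of by the full stream of errors.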
Great?
Yes it is. Big softball thrown in your direction to deal with video errors falling in one of those 3 scenarios. I will use the Sentry info I added today ("tags" relevant to the scenarios) to create an alert so you'll get notified if someone opens a bogus video URL. You will definitely want to subscribe to those alerts. Hopefully there's a way in Sentry to pick and choose which alerts (instead of the full mess of error events) but it's worth it either way. If you don't want them to go to your shared ELA acct then we can add another acct as a separate user, but this will save me the trouble of notifying you any time there is a video issue. It's already an automated process so might as well take advantage of it.
Closed captioning
...cannot be forced on for videos or playlists which don't already have it, including those where the cc is auto-generated.
Questions
HTML description support?
i turned it on since i saw a <span> in one of your Internet Archive examples but might be a good idea if i disable it since there's no guarantee it will match our UI:
<https://user-images.githubusercontent.com/4974087/96547587-98d69d80-1269-11eb-8317-ca0e98312dac.png>
Next steps
Sentry alerts
Will get these wired up in the UI tomorrow or soon, otherwise kind of a waste of time if I have to forward the emails to you about data stuff. I also have it broken down into production and deploy environments, so you can see if they occurred during our internal testing/PR stuff or live site.
I haven't tested it with much besides local dev so far, there's a lot of moving parts to all the error stuff so fingers crossed. it's working great on my local env though so no reason shouldn't be same in The Outside World.
Also can't really test much without a known error (i was leaning on your broken one to test), meaning i can't test the Sentry stuff including alerts either. might have to do a dummy one just so the setup doesn't get left in the dust until a real error happens.
More things I'm missing when
...it's not 12:54am? Most likely.
|
Ok I will remove it, just make sure to update your Archive descrips otherwise you'll see the HTML as HTML.
No worries, I'll remove it.
Is it like this? If so, it's happening regardless of where it's played, e.g. paste the URL into browser bar vs in the project, but if I remove
I'm wondering if it has something to do with not setting the language via URL? I'm assuming it defaults to How important is it to force the |
Messing with the iframe parameters has nothing to do with the YouTube API or the metadata we are adding, so in order to stay focused on the SOW I'm going to say let's drop the cc efforts for now. If it's important to you then feel free to create a wishlist issue.
Ok, all sounds good.
Yes, that’s the error. Yep, if cc_load_policy is causing any trouble, let’s remove it, no problem, no questions asked. It rings a bell that this is something no longer supported— that’s why we used to tag for this in the past but stopped doing so. A miracle way to do this via API parameter seemed too good to be true.
… On Oct 20, 2020, at 4:54 PM, Jason Lampel ***@***.***> wrote:
Messing with the iframe parameters has nothing to do with the YouTube API or the metadata we are adding, so in order to stay focused on the SOW I'm going to say let's drop the cc efforts for now. If it's important to you then feel free to create a wishlist issue.
|
Sooo guess what. The problem was actually from the cc_load_policy doesn't break anything if I remove that parameter. |
Description
We'd like to import metadata automatically from Youtube and/or Archive.org and show it under the language name above the audio/video embed in the dialog. Probably the most important thing would be what both Youtube and Archive.org call the Description, but maybe a few other fields like rights, attribution. We also have this info in spreadsheet form if that’s easier.
Resolution
@abettermap thinks Youtube may have an API we can use