podcast:transcript - "generator" and "optout" #458
Replies: 14 comments
-
These feel like common-sense suggestions to me and, as the properties all have sane defaults when not present, I don’t see any issues adding them. The only comment would be about ‘optout’ being a unary value. I’ve heard that isn’t great for XML parsers. So we might need it to be something like: optout="true".
-
@daveajones that's right: unlike HTML, XML attributes must have values.
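A minimal Python sketch of the point about valueless attributes, using the standard library's `xml.etree.ElementTree`. The `optout="true"` form is only the suggestion made above, not agreed specification text, and the helper name `parses` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace declaration for the podcast: prefix (assumed here so the
# fragments are parseable on their own).
NS = 'xmlns:podcast="https://podcastindex.org/namespace/1.0"'


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# HTML-style bare attribute: not well-formed XML.
bare = f"<podcast:transcript {NS} optout />"
# Attribute with an explicit value: well-formed.
valued = f'<podcast:transcript {NS} optout="true" />'

print("bare attribute parses:  ", parses(bare))
print("valued attribute parses:", parses(valued))
```

A feed containing the bare form would fail to parse as a whole, which is why an explicit value is the safer shape.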
-
With this proposal, how do I signal that it is OK to make and use a copy of the transcript?
-
I like this proposal, but, like with every new feature, it's important to think about added complexity and how everything fits in with existing standards. A couple of thoughts:
-
Seeing as these changes are being proposed by a transcript company, how are their concerns addressed with this proposal? What is their exact issue? Knowing this may help the conversation.
-
I’m very much in favor of the generator property. That would be great for setting quality expectations in the UI. Also in favor of clear licensing rather than just an opt-out property. Ideally, podcasters could require that any third-party transcripts are made available under a share-alike condition, and the transcriber could use the podcast:events tag to provide the transcript back to the host for inclusion in the RSS feed.
-
@adamc199 wrote: "Seeing as these changes are being proposed by a transcript company" My bad - they're not. They're being proposed by me, after hearing what a transcript company is planning, partially to ensure that the transcript company doesn't produce transcripts for shows that have them already in their feeds.
The default has always been that people can make a transcript of your show and use it however they like. So, don't change anything if you're cool with that. That is the default behaviour. @joksas says: "it's important to think about added complexity and how everything fits in with existing standards" I agree, and hope that this doesn't change the existing standard, and doesn't add complexity unless podcasters want it. These are optional and backwards-compatible.
Not sure that a licence covers the specific use-case of "don't make a transcript for me". You can't licence something that doesn't exist, after all. But if a licence (which adds complexity!) is the right way to indicate how third-party companies might generate a transcript, I'd be keen to work out how that works.
-
Listening to the board meeting last week, perhaps there's a bit of misunderstanding about what I'm trying to do.
tl;dr: I spoke with a company, some of their ideas sound shitty, and I want to at least build something into the standards that lets creators signal that they don't want a part of it.
I'm not asking for a licence in my proposal. The opt-out proposal is to stop a third party doing something with my stuff if I don't want them to. It probably won't win any fans with the transcription company I spoke with: this proposal is designed to stop them messing around with my stuff, or at the very base level, designed to tell them I do not want them to mess with my stuff.
As @adamc199 notes in the board meeting: he allows people to do anything with his content. That's cool. The proposal allows that by default (simply don't opt out).
The proposal also notes that opting out of other people making transcripts for you is not recommended. I would actually hope that podcast apps penalise shows that deliberately do not produce a transcript and stop others from producing one. I would hope that everyone who can't be bothered to make a transcript would not stop a third-party company making a transcript for them.
But, in my world at least, the creator is king, and however a third-party transcription company wants to build a product, I - as the creator and the king - want to be able to tell them to not go anywhere near my stuff, if I want to. (That wouldn't stop me from making a specific agreement with a specific company, of course. Again, it doesn't get in the way of anything a creator wants to do. But it is creator-first, and I have no interest nor wish to do anything which isn't.)
Does that offer some clarification?
-
Question from the above... If a creator only makes an automated transcript, but a third party wishes to make a better, human-edited version of that transcript, the above proposal breaks this. Is that good? Or do we want to make some form of "derivatives OK" marker for the transcript? (Yes, that's a licence. Shudder.) Or is that something that should be opt-in, and therefore kept separate from a specification? (Typically, when you prohibit something, you can also sign a contract with a company to allow them to do that thing - copyright law says I can't broadcast INXS songs on the radio, but I can also get a licence with their rights holders to do that.)
Thinking aloud - perhaps that needs clarifying in the proposal: that this is a default position, and does not stop individual agreements being made.
Or, do we want the creator to be able to signal "if you think you can make this work better, make a derivative work that is better, no need to ask"? In which case, perhaps we need an additional field to make that clear. Maybe we define a default licence - BY-NC-ND - and then allow creators to override that with more permissive licences. (BY - attribution required; NC - no commercial use, which we need to better define; ND - no derivatives.)
-
If you as creator use auto-generated transcripts, I don't think it's likely you would 'opt out' of others doing the same thing. And if you do, and a person wants to put the work into it, that person can probably also put up with the effort of contacting you. Also because they'll probably want to distribute their work somewhere - either in the source feed (they need to contact you anyway) or in their own app/YT channel a) with ulterior motives (which is where a license could come in, but it's not super likely to happen except for truly community-driven shows like No Agenda) or b) with commercial motives (which is where it would get more complicated and the proposed default license wouldn't work). TL;DR – I would personally keep it simple and stay away from the licence. Unless the same is implemented for #177.
-
I wonder if a different approach might be better. In what circumstance would someone add a transcript to their podcast and want an app or service to ignore that transcript and use their own? It seems like it makes the most sense to treat the tag as the authority: if it's present, use this transcript no matter what. And then to address the automated-transcript concern, I wonder if that's something that should be in the
-
The original proposal was
I would not expect that use-case either, and I'm not sure we need to add to the specification for it.
-
I had consulted with a lawyer last year for my transcription service, and his opinion on the issue was that this needs to be "Opt In" rather than "Opt Out". That is, he advises getting the express permission of the copyright holders in order to legally create/publish/sell our own independent transcript of their work. Although it would be understandably tempting to make this an "Opt out" tag, we unfortunately can't just assume our rights into existence where there were none to begin with. Two examples illustrating the potential for legal problems:
I am aware there are now a few podcast apps out there offering to generate transcripts for any podcast that doesn't already have one, and in one case selling those transcripts, but from what I understand, this is legally questionable. If it were clearly legal, I probably would have started doing the same last year with my own platform, but as there do appear to be real legal concerns here, I am taking my time to make sure I do it the right way, where transcripts are first and foremost owned by the copyright holders, and they are the ones who decide whether or not transcripts should be published through some explicit and intentional action on their part. (Of course I'll share what I'm doing once I have details to share.) Some online legal discussions covering this topic:
If there were an opt-in in the specification, its main use case would be to permit automated generation of transcripts by third parties on a large scale. Since automated tools can't interpret the contents of licenses, including the license as part of this tag wouldn't actually be helpful for the use case. One legal interpretation could be that, while this tag isn't itself a license, if a podcast creator "opts in" via this tag, it would be reasonable for automated tools to assume that the value of this tag is at least consistent with the podcast creator's full license. Of course most of podcasting has traditionally followed an implicit license model, and so in that tradition, if a podcast creator doesn't actually publish a license but does explicitly opt in via this tag, I think it would be reasonable for podcast apps to assume a license to do the thing this tag was intended to permit.
-
I think the "generator" attribute would be really useful, not only to know the simple binary of whether it was human or machine generated (because it could be "in between"), but also to indicate the name of the service/generator rather than simply "auto" (because some machine-transcription services can have much higher quality than others). For my use case, the quality of the transcripts is the most important factor that I need to ascertain, and given that 99.99% of all transcripts will be auto generated, it would be a waste to have an attribute that would virtually only be used to encode one value. Alternatively, we could have something akin to a User-Agent string where a version number could also be included in addition to some slug for the name of the service. Applications would be able to do various interesting kinds of logic based on service providers that would also allow different services to interoperate in interesting ways.
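As a rough illustration of the User-Agent-style idea, here is a Python sketch that splits a generator string into a service slug and an optional version. The `slug/version` format, the `Generator` type, the `parse_generator` helper and the sample value `ExampleTranscriber/2.1` are all assumptions made up for this sketch; nothing in the thread defines the actual syntax.

```python
import re
from typing import NamedTuple, Optional


class Generator(NamedTuple):
    name: str                # service slug, lower-cased for comparison
    version: Optional[str]   # None when no version was given


# Hypothetical format: "<slug>" or "<slug>/<version>".
_GENERATOR_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9._-]+)(?:/(?P<version>[A-Za-z0-9._-]+))?$"
)


def parse_generator(value: str) -> Optional[Generator]:
    """Split a generator string into (name, version); None if malformed."""
    match = _GENERATOR_RE.match(value.strip())
    if match is None:
        return None
    return Generator(match.group("name").lower(), match.group("version"))


print(parse_generator("ExampleTranscriber/2.1"))
print(parse_generator("auto"))
```

An app could then key its quality expectations or UI warnings on `name` and, where needed, on `version`.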
-
Having chatted with a transcript company today, I would like to propose a few small changes.
The current specification is (mainly):
I'd like to propose two new tag values, and one general statement.
Creator transcripts may not be edited nor replaced
If any type of podcast transcript is supplied in an RSS feed, an app or a service using that RSS feed MUST NOT produce its own publicly-accessible transcript, nor alter the content of the supplied transcript in any material way (other than reformatting).
An app or service MAY take a non-time-synchronised transcript and synchronise it to the audio using machine learning techniques, provided that the transcript is not otherwise altered in any way.
A transcript is a derivative work and is subject to specific copyright rules. By supplying a transcript, a creator gives notice that this is the official transcript of their creative work, and gives no licence to produce a further derivative version.
This does not preclude an app producing an alternative private transcript for topic extraction or search optimisation, provided that this transcript is not available to the public.
Generator
Automated transcriptions are a good start, but are often wrong. A legally dubious phrase may be unwittingly produced by the AI generator which may place a podcaster or a podcast app into legal trouble. One not very serious example: "Joe really hit the miners hard" is fine in the context of a news story about striking mineworkers, but "Joe really hit the minors hard" could be seen as an accusation of child cruelty. Both sentences sound identical.
This optional value would identify how the transcription was generated.
It has three values:
An app may decide to withhold the display of automated transcripts for legal reasons, or may flag that "this transcript was produced automatically and may be inaccurate".
optout
Transcripts may be generated by a third-party company or a podcast app.
A podcaster may wish to opt out of a third-party transcript for legal or creative reasons.
The preferred way for a podcaster to control their creative work is to produce a transcript of their work themselves, which will be used. However, a podcaster may also choose to opt out of transcript generation entirely.
If a podcast transcript tag only contains an optout value, this is a signal from the creator that an app or service may not produce a transcription of its own.
This is not recommended, and may result in a podcast app not carrying this podcast.
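One way an app might apply the rules above is sketched below in Python. It assumes the `optout="true"` attribute form suggested earlier in the thread and treats any transcript tag with a `url` as a creator-supplied transcript; the function name `may_generate_public_transcript`, the `parse_item` helper and the sample `example.com` URL are hypothetical.

```python
import xml.etree.ElementTree as ET

PODCAST_NS = "https://podcastindex.org/namespace/1.0"
TRANSCRIPT_TAG = f"{{{PODCAST_NS}}}transcript"


def may_generate_public_transcript(item: ET.Element) -> bool:
    """Return False if the creator supplied a transcript or opted out."""
    for tag in item.iter(TRANSCRIPT_TAG):
        if tag.get("url"):
            # A creator transcript exists: use it, do not replace it.
            return False
        if tag.get("optout", "").strip().lower() == "true":
            # Tag carries only an opt-out signal (assumed attribute form).
            return False
    # No transcript tag at all: the default position, generation allowed.
    return True


def parse_item(inner: str) -> ET.Element:
    """Wrap a fragment in an <item> with the podcast namespace declared."""
    return ET.fromstring(f'<item xmlns:podcast="{PODCAST_NS}">{inner}</item>')


supplied = parse_item(
    '<podcast:transcript url="https://example.com/ep1.vtt" type="text/vtt" />'
)
opted_out = parse_item('<podcast:transcript optout="true" />')
silent = parse_item("<title>Episode 1</title>")

for label, item in [("supplied", supplied), ("opted out", opted_out), ("silent", silent)]:
    print(label, "->", may_generate_public_transcript(item))
```

The check only covers publicly accessible transcripts; the proposal explicitly leaves private transcripts for search or topic extraction alone.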