Here's one proposal for indicating it on transcripts: #458.
-
Since what we call the tag doesn't matter, I suggest 🤣
-
With the increased availability of both commercial and open source AI models that are capable of generating text, images, audio and transcripts, I'd like to start a discussion (and later, a concrete proposal) on flags we can introduce at various points in the spec to clearly label content that has been generated via AI.
Two key examples are:
Use cases: Listeners may simply want to know what they're getting and whether it is authentic or generated, or to judge the quality or reliability of the content. Software developers will also want to build tools that can filter on this information. For example, the average listener might want a search engine that can filter out podcasts generated by AI, or a deaf listener may want to filter their search results so that they only get shown news sources that come with high-quality transcripts edited by real humans rather than AI-generated transcripts (which in many non-English languages have atrocious quality).
Other examples of content that may be generated by AI include chapters, titles, descriptions and images. A few of these might be considered inconsequential, but it could still be informative to know if they were generated by an AI.
We might also want to indicate which AI produced the content. For example, whether the transcript was generated by Otter.ai, Whisper, or some other tool, rather than embedding "Transcribed by Otter.ai" into the transcript itself(*).
(*) Which, it has to be said, ruins the transcript. Transcription services are adding the authorship or the generation tool as a regular line in the transcript itself because we currently don't have any other place to put it or any standard way to tag it. Someone searching for marine biology podcasts covering otters probably doesn't want every podcast transcribed by Otter.ai to show up as a match.
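To make the filtering and attribution use cases a bit more concrete, here is a rough sketch of what a consuming tool could do if such flags existed. Everything specific here is an assumption for illustration only: the `aiGenerated` and `generator` attributes are made-up names, not part of any spec or proposal, and the namespace URI and example URLs are placeholders as I understand them.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed here for illustration.
PODCAST_NS = {"podcast": "https://podcastindex.org/namespace/1.0"}

# Hypothetical feed fragment. "aiGenerated" and "generator" are invented
# attribute names for this sketch; they are NOT part of any published spec.
FEED = """\
<rss xmlns:podcast="https://podcastindex.org/namespace/1.0" version="2.0">
  <channel>
    <item>
      <title>Episode 1</title>
      <podcast:transcript url="https://example.com/ep1.vtt" type="text/vtt"
                          aiGenerated="true" generator="Whisper"/>
    </item>
    <item>
      <title>Episode 2</title>
      <podcast:transcript url="https://example.com/ep2.vtt" type="text/vtt"
                          aiGenerated="false"/>
    </item>
    <item>
      <title>Episode 3</title>
      <podcast:transcript url="https://example.com/ep3.vtt" type="text/vtt"/>
    </item>
  </channel>
</rss>
"""


def human_transcribed_titles(feed_xml: str) -> list[str]:
    """Titles of items with a transcript explicitly flagged as not AI-generated."""
    root = ET.fromstring(feed_xml)
    titles = []
    for item in root.iter("item"):
        transcripts = item.findall("podcast:transcript", PODCAST_NS)
        # Unlabelled transcripts are treated as "unknown", not as human-made.
        if any(t.get("aiGenerated") == "false" for t in transcripts):
            titles.append(item.findtext("title", default=""))
    return titles


def transcript_generators(feed_xml: str) -> dict[str, str]:
    """Map item title to the declared generator tool, where one is declared."""
    root = ET.fromstring(feed_xml)
    result = {}
    for item in root.iter("item"):
        for t in item.findall("podcast:transcript", PODCAST_NS):
            tool = t.get("generator")
            if tool is not None:
                result[item.findtext("title", default="")] = tool
    return result


print(human_transcribed_titles(FEED))
print(transcript_generators(FEED))
```

Note that with the tool name carried as metadata, a full-text search over the transcript body would no longer match on the tool's name. The sketch also treats a missing flag as "unknown" rather than "human", which is one of the questions any real proposal would need to settle.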
If you have any thoughts on how this could be done (or whether this should be done) for each different type of content, please share below.