-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backward compatible video transcripts export #138
Conversation
6c94ced
to
095f5cf
Compare
@muhammad-ammar this is ready for your review. Please take a look :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Qubad786 I am done with first pass of the code. I am unable to see where we are creating transcript files with old naming convention?
for video_transcript in video_transcripts: | ||
if video_transcript.language_code not in exported_language_codes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any idea what was the purpose of this if condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just an extra/unneeded safe check.
edxval/exceptions.py
Outdated
@@ -62,3 +62,10 @@ class InvalidTranscriptProvider(ValError): | |||
This error is raised when an transcript provider is not supported | |||
""" | |||
pass | |||
|
|||
|
|||
class TranscriptsGenerationException(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also inherit it from ValError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
edxval/transcript_utils.py
Outdated
|
||
Arguments: | ||
sjson_subs (dict): `sjson` subs. | ||
speed (float): speed of `sjson_subs`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are not using speed
anywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove that.
edxval/transcript_utils.py
Outdated
return sjson_subs | ||
|
||
|
||
class Transcript(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not move the generate_srt_from_sjson
and generate_sjson_from_srt
functions inside the Transcript
and make them class methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just felt right. I can move them into Transcript
.
edxval/transcript_utils.py
Outdated
SJSON = 'sjson' | ||
|
||
@staticmethod | ||
def convert(content, input_format, output_format): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not make this class method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there need for this?
"At the left we can see..." | ||
] | ||
} | ||
""") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add non-english text in sjson_transcript
and srt_transcript
above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
""" | ||
invalid_srt_transcript = 'invalid SubRip file content' | ||
with self.assertRaises(TranscriptsGenerationException): | ||
Transcript.convert(invalid_srt_transcript, 'srt', 'sjson') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we also need to add tests for invalid input and output formats to verify that asserts are raised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
@@ -843,21 +865,26 @@ def create_transcript_file(video_id, language_code, file_format, resource_fs, st | |||
static_dir (str): The Directory to store transcript file. | |||
resource_fs (SubFS): The file system to store transcripts. | |||
""" | |||
transcript_name = u'{video_id}-{language_code}.{file_format}'.format( | |||
transcript_filename = '{video_id}-{language_code}.srt'.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the reason for SRT
always?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we decided to create transcripts with name pattern like we were doing in the old code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From now on transcripts will be exported in SRT format regardless of their original format. These filenames are also going to set on on self.transcripts
which must only conatain SRT transcript filenames. Previously, we decided on not to moving transcript conversion utils into edxval but now bacward comp. cannot be achieved without it moving into edxval.
edxval/tests/test_api.py
Outdated
) | ||
|
||
self.assert_xml_equal(exported_metadata['xml'], expected) | ||
self.assertItemsEqual(exported_metadata['transcripts'], ['en', 'de']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported_metadata['transcripts'] is a dict and here we are comparing it with a list? How this is working? am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it, by default, gets compared with the keys. I will make this explicit as well.
) | ||
|
||
with self.file_system.open(combine(constants.EXPORT_IMPORT_STATIC_DIR, transcript_file_name), 'wb') as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why changes in this test? why we removed the create_file_in_fs
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed a file that is encoded with non utf-8
, while create_file_in_fs
creates the file with utf-8
encoded content.
@muhammad-ammar feedback addressed. |
@muhammad-ammar I have made it simple. we can achieve the purpose without the old naming conventions, we just need to put transcript filename in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
- Transcript files are exported into course OLX in .srt format. - Transcript language to filename maps is returned with xml, so that, it can be used by platform in old metadata fields for backward compatiblilty. - Add/fix tests bump VAL version
c39974f
to
0bbb8af
Compare
EDUCATOR-2914
Export transcripts metadata along with xml …
.SRT
format.