Backward compatible video transcripts export #138

Qubad786 · 2018-05-21T13:40:43Z

EDUCATOR-2914

Export transcripts metadata along with xml …

Transcript files are exported into course OLX in .srt format.
A map of transcript language to filename is also returned along with the XML, so that the platform can use it to update old metadata fields for backward compatibility.
fix tests

Qubad786 · 2018-05-24T11:16:39Z

@muhammad-ammar this is ready for your review. Please take a look :)

muhammad-ammar

@Qubad786 I am done with the first pass of the code. I am unable to see where we are creating transcript files with the old naming convention?

muhammad-ammar · 2018-05-25T07:26:17Z

edxval/api.py

    for video_transcript in video_transcripts:
-        if video_transcript.language_code not in exported_language_codes:


Any idea what the purpose of this if condition was?

It was just an extra/unneeded safe check.

muhammad-ammar · 2018-05-25T07:30:02Z

edxval/exceptions.py

@@ -62,3 +62,10 @@ class InvalidTranscriptProvider(ValError):
    This error is raised when an transcript provider is not supported
    """
    pass
+
+
+class TranscriptsGenerationException(Exception):


Should we also inherit it from ValError?
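A minimal sketch of what the suggestion would buy, assuming `ValError` is a plain `Exception` subclass (the stand-in below is not the real class from edxval/exceptions.py, and `describe_failure` and `failing_conversion` are hypothetical names): callers that already catch `ValError` would also catch transcript generation failures.

```python
class ValError(Exception):
    """Stand-in for edxval's base exception (assumed to subclass Exception)."""


class TranscriptsGenerationException(ValError):
    """Raised when a transcript cannot be converted between formats."""


def describe_failure(operation):
    """Hypothetical caller that handles every VAL error in one place."""
    try:
        operation()
    except ValError as error:
        # Subclasses of ValError land here without a separate except clause.
        return type(error).__name__
    return None


def failing_conversion():
    """Hypothetical operation that fails the way a bad SRT file would."""
    raise TranscriptsGenerationException('invalid SubRip file content')
```

With the exception inheriting directly from `Exception`, as in the diff above, the `except ValError` clause would not catch it.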

muhammad-ammar · 2018-05-25T07:33:17Z

edxval/transcript_utils.py

+
+    Arguments:
+        sjson_subs (dict): `sjson` subs.
+        speed (float): speed of `sjson_subs`.


we are not using speed anywhere

will remove that.

muhammad-ammar · 2018-05-25T07:35:44Z

edxval/transcript_utils.py

+    return sjson_subs
+
+
+class Transcript(object):


why not move the generate_srt_from_sjson and generate_sjson_from_srt functions inside the Transcript and make them class methods?

It just felt right. I can move them into Transcript.

muhammad-ammar · 2018-05-25T07:36:10Z

edxval/transcript_utils.py

+    SJSON = 'sjson'
+
+    @staticmethod
+    def convert(content, input_format, output_format):


why not make this class method?

Is there a need for this?
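For reference, a rough sketch of how the two suggestions on this file could look together: the conversion helpers live on `Transcript` as class methods and `convert` dispatches to them through `cls`. This is not the code in the PR. It assumes the common `sjson` layout of parallel `start`, `end` and `text` lists with times in milliseconds, parses SRT with the standard library only (the real implementation may well use a dedicated SRT library), and the private helpers `_to_ms` and `_to_timestamp` are hypothetical.

```python
import json
import re


class TranscriptsGenerationException(Exception):
    """Raised when transcript content cannot be parsed."""


class Transcript(object):
    SRT = 'srt'
    SJSON = 'sjson'

    # Matches an SRT timing line such as "00:00:00,270 --> 00:00:02,720".
    _TIMING = re.compile(
        r'(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})'
    )

    @staticmethod
    def _to_ms(hours, minutes, seconds, millis):
        return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)

    @staticmethod
    def _to_timestamp(ms):
        seconds, millis = divmod(ms, 1000)
        minutes, seconds = divmod(seconds, 60)
        hours, minutes = divmod(minutes, 60)
        return '{:02d}:{:02d}:{:02d},{:03d}'.format(hours, minutes, seconds, millis)

    @classmethod
    def generate_sjson_from_srt(cls, srt_content):
        """Parse SRT cues into an sjson string (assumed layout, see lead-in)."""
        subs = {'start': [], 'end': [], 'text': []}
        for block in re.split(r'\n\s*\n', srt_content.strip()):
            lines = block.splitlines()
            # A cue needs an index line followed by a timing line.
            match = cls._TIMING.match(lines[1].strip()) if len(lines) >= 2 else None
            if match is None:
                raise TranscriptsGenerationException('Invalid SRT cue: {!r}'.format(block))
            groups = match.groups()
            subs['start'].append(cls._to_ms(*groups[:4]))
            subs['end'].append(cls._to_ms(*groups[4:]))
            subs['text'].append('\n'.join(lines[2:]))
        return json.dumps(subs, ensure_ascii=False)

    @classmethod
    def generate_srt_from_sjson(cls, sjson_content):
        """Render an sjson string as numbered SRT cues."""
        subs = json.loads(sjson_content)
        cues = []
        for index, (start, end, text) in enumerate(
                zip(subs['start'], subs['end'], subs['text']), 1):
            cues.append('{}\n{} --> {}\n{}\n'.format(
                index, cls._to_timestamp(start), cls._to_timestamp(end), text))
        return '\n'.join(cues)

    @classmethod
    def convert(cls, content, input_format, output_format):
        """Convert between the two supported formats."""
        assert input_format in (cls.SRT, cls.SJSON)
        assert output_format in (cls.SRT, cls.SJSON)
        if input_format == output_format:
            return content
        if input_format == cls.SRT:
            return cls.generate_sjson_from_srt(content)
        return cls.generate_srt_from_sjson(content)
```

The practical difference from `@staticmethod` is small: a class method reaches the constants and helpers through `cls`, so a subclass could override them without touching `convert`.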

muhammad-ammar · 2018-05-25T07:38:20Z

edxval/tests/test_transcript_utils.py

+                    "At the left we can see..."
+                ]
+            }
+        """)


Can we add non-english text in sjson_transcript and srt_transcript above?

muhammad-ammar · 2018-05-25T07:40:18Z

edxval/tests/test_transcript_utils.py

+        """
+        invalid_srt_transcript = 'invalid SubRip file content'
+        with self.assertRaises(TranscriptsGenerationException):
+            Transcript.convert(invalid_srt_transcript, 'srt', 'sjson')


We also need to add tests for invalid input and output formats to verify that assertion errors are raised.
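Something along these lines could cover the requested cases. It is only a sketch: the `Transcript` class below is a stub that reproduces the format assertions and nothing else, and the test class and format strings are made up for illustration.

```python
import unittest


class Transcript(object):
    """Stub standing in for edxval.transcript_utils.Transcript (format checks only)."""
    SRT = 'srt'
    SJSON = 'sjson'

    @staticmethod
    def convert(content, input_format, output_format):
        assert input_format in (Transcript.SRT, Transcript.SJSON)
        assert output_format in (Transcript.SRT, Transcript.SJSON)
        return content  # the real conversion is out of scope for this sketch


class TestInvalidFormats(unittest.TestCase):
    """Hypothetical test case for unsupported input and output formats."""

    def test_invalid_formats_raise(self):
        cases = [
            ('invalid-input-format', 'sjson'),
            ('sjson', 'invalid-output-format'),
        ]
        for input_format, output_format in cases:
            with self.subTest(input_format=input_format, output_format=output_format):
                with self.assertRaises(AssertionError):
                    Transcript.convert('content', input_format, output_format)
```

Note that `assert` statements are skipped when Python runs with `-O`, so these tests only hold in a non-optimized interpreter.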

muhammad-ammar · 2018-05-25T07:44:20Z

edxval/api.py

@@ -843,21 +865,26 @@ def create_transcript_file(video_id, language_code, file_format, resource_fs, st
        static_dir (str): The Directory to store transcript file.
        resource_fs (SubFS): The file system to store transcripts.
    """
-    transcript_name = u'{video_id}-{language_code}.{file_format}'.format(
+    transcript_filename = '{video_id}-{language_code}.srt'.format(


what is the reason for SRT always?

I think we decided to create transcripts with the same name pattern as in the old code?

From now on, transcripts will be exported in SRT format regardless of their original format. These filenames are also going to be set on self.transcripts, which must only contain SRT transcript filenames. Previously, we decided not to move the transcript conversion utils into edxval, but backward compatibility cannot be achieved without moving them into edxval.
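To make the naming concrete, here is a small sketch of the filename pattern from the diff and of the language-to-filename map the PR description mentions. Both function names and the sample video id are hypothetical and not taken from edxval.

```python
def transcript_filename(video_id, language_code):
    """Build the export filename; the extension is always .srt."""
    return u'{video_id}-{language_code}.srt'.format(
        video_id=video_id, language_code=language_code
    )


def build_transcripts_map(video_id, language_codes):
    """Hypothetical helper: map each language code to its exported filename."""
    return {code: transcript_filename(video_id, code) for code in language_codes}


# Example with a made-up video id.
transcripts = build_transcripts_map('super-soaker', ['en', 'de'])
```

A map shaped like this is what the platform side could copy into the old metadata fields.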

muhammad-ammar · 2018-05-25T09:18:39Z

edxval/tests/test_api.py

        )

+        self.assert_xml_equal(exported_metadata['xml'], expected)
+        self.assertItemsEqual(exported_metadata['transcripts'], ['en', 'de'])


exported_metadata['transcripts'] is a dict and here we are comparing it with a list? How is this working? Am I missing something?

By default, it gets compared by its keys. I will make this explicit as well.
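A short illustration of why the implicit comparison passes: iterating a dict yields its keys, so an unordered-items assertion only ever sees the language codes. The sketch uses `assertCountEqual`, which is the Python 3 name for Python 2's `assertItemsEqual`; the filenames are made up.

```python
import unittest


class DictKeysComparison(unittest.TestCase):
    """Hypothetical test showing the implicit and explicit forms side by side."""

    def test_implicit_and_explicit(self):
        transcripts = {'en': 'video-en.srt', 'de': 'video-de.srt'}
        # Implicit: the dict is iterated, which yields its keys.
        self.assertCountEqual(transcripts, ['en', 'de'])
        # Explicit: same check, but the intent is visible to the reader.
        self.assertCountEqual(transcripts.keys(), ['en', 'de'])
```

The explicit form also makes it obvious that the filenames themselves are not being checked.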

muhammad-ammar · 2018-05-25T09:23:00Z

edxval/tests/test_api.py

        )

+        with self.file_system.open(combine(constants.EXPORT_IMPORT_STATIC_DIR, transcript_file_name), 'wb') as f:


Why the changes in this test? Why did we remove the create_file_in_fs call?

I needed a file with non-UTF-8 encoded content, whereas create_file_in_fs creates the file with UTF-8 encoded content.
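For context, a sketch of the idea using the built-in `open` in place of the test's `self.file_system.open`: writing pre-encoded bytes in `'wb'` mode gives full control over the encoding. The helper name, the sample text and the choice of Latin-1 are assumptions for illustration; the PR does not say which encoding the test uses.

```python
import os
import tempfile


def write_encoded_file(path, text, encoding):
    """Hypothetical helper: write text to path as bytes in the given encoding."""
    with open(path, 'wb') as f:
        f.write(text.encode(encoding))


# Write a transcript line as Latin-1 and read the raw bytes back.
directory = tempfile.mkdtemp()
path = os.path.join(directory, 'video-de.srt')
write_encoded_file(path, u'Gr\u00fc\u00dfe', 'latin-1')
with open(path, 'rb') as f:
    raw = f.read()
```

Bytes written this way are not valid UTF-8, which is the situation the import code has to cope with.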

Qubad786 · 2018-05-25T10:25:13Z

@muhammad-ammar feedback addressed.

Qubad786 · 2018-05-25T10:30:36Z

I am unable to see where we are creating transcript files with old naming convention?

@muhammad-ammar I have made it simple. We can achieve the purpose without the old naming conventions; we just need to put the transcript filename in video_module.transcripts and it will be picked up from the contentstore.

muhammad-ammar

👍

- Transcript files are exported into course OLX in .srt format.
- A map of transcript language to filename is returned with the XML, so that the platform can use it in old metadata fields for backward compatibility.
- Add/fix tests
bump VAL version

Qubad786 changed the title ~~Backward compatible video transcripts export~~ [WIP] Backward compatible video transcripts export May 21, 2018

Qubad786 force-pushed the mrehan/backward-transcript-export branch from 6c94ced to 095f5cf Compare May 24, 2018 11:10

Qubad786 changed the title ~~[WIP] Backward compatible video transcripts export~~ Backward compatible video transcripts export May 24, 2018

Qubad786 requested a review from muhammad-ammar May 24, 2018 11:17

muhammad-ammar requested changes May 25, 2018

muhammad-ammar reviewed May 25, 2018

Qubad786 force-pushed the mrehan/backward-transcript-export branch from c39974f to 0bbb8af Compare May 25, 2018 11:35

Qubad786 merged commit 2830e2e into master May 25, 2018

Qubad786 deleted the mrehan/backward-transcript-export branch May 25, 2018 11:41
