Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backward compatible video transcripts export #138

Merged
merged 1 commit into from
May 25, 2018

Conversation

Qubad786
Copy link
Contributor

@Qubad786 Qubad786 commented May 21, 2018

EDUCATOR-2914

Export transcripts metadata along with xml …

  • Transcript files are exported into course OLX in .SRT format.
  • Transcript language to filename maps is also returned along with xml, so that, it can be used by platform to update old metadata fields for backward compatibility.
  • fix tests

@Qubad786 Qubad786 changed the title Backward compatible video transcripts export [WIP] Backward compatible video transcripts export May 21, 2018
@Qubad786 Qubad786 force-pushed the mrehan/backward-transcript-export branch from 6c94ced to 095f5cf Compare May 24, 2018 11:10
@Qubad786 Qubad786 changed the title [WIP] Backward compatible video transcripts export Backward compatible video transcripts export May 24, 2018
@Qubad786
Copy link
Contributor Author

@muhammad-ammar this is ready for your review. Please take a look :)

Copy link
Contributor

@muhammad-ammar muhammad-ammar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Qubad786 I am done with first pass of the code. I am unable to see where we are creating transcript files with old naming convention?

for video_transcript in video_transcripts:
if video_transcript.language_code not in exported_language_codes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea what was the purpose of this if condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just an extra/unneeded safe check.

@@ -62,3 +62,10 @@ class InvalidTranscriptProvider(ValError):
This error is raised when an transcript provider is not supported
"""
pass


class TranscriptsGenerationException(Exception):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also inherit it from ValError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.


Arguments:
sjson_subs (dict): `sjson` subs.
speed (float): speed of `sjson_subs`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are not using speed anywhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove that.

return sjson_subs


class Transcript(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not move the generate_srt_from_sjson and generate_sjson_from_srt functions inside the Transcript and make them class methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just felt right. I can move them into Transcript.

SJSON = 'sjson'

@staticmethod
def convert(content, input_format, output_format):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make this class method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there need for this?

"At the left we can see..."
]
}
""")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add non-english text in sjson_transcript and srt_transcript above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

"""
invalid_srt_transcript = 'invalid SubRip file content'
with self.assertRaises(TranscriptsGenerationException):
Transcript.convert(invalid_srt_transcript, 'srt', 'sjson')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need to add tests for invalid input and output formats to verify that asserts are raised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -843,21 +865,26 @@ def create_transcript_file(video_id, language_code, file_format, resource_fs, st
static_dir (str): The Directory to store transcript file.
resource_fs (SubFS): The file system to store transcripts.
"""
transcript_name = u'{video_id}-{language_code}.{file_format}'.format(
transcript_filename = '{video_id}-{language_code}.srt'.format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reason for SRT always?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we decided to create transcripts with name pattern like we were doing in the old code?

Copy link
Contributor Author

@Qubad786 Qubad786 May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From now on transcripts will be exported in SRT format regardless of their original format. These filenames are also going to set on on self.transcripts which must only conatain SRT transcript filenames. Previously, we decided on not to moving transcript conversion utils into edxval but now bacward comp. cannot be achieved without it moving into edxval.

)

self.assert_xml_equal(exported_metadata['xml'], expected)
self.assertItemsEqual(exported_metadata['transcripts'], ['en', 'de'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported_metadata['transcripts'] is a dict and here we are comparing it with a list? How this is working? am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it, by default, gets compared with the keys. I will make this explicit as well.

)

with self.file_system.open(combine(constants.EXPORT_IMPORT_STATIC_DIR, transcript_file_name), 'wb') as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why changes in this test? why we removed the create_file_in_fs call?

Copy link
Contributor Author

@Qubad786 Qubad786 May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed a file that is encoded with non utf-8, while create_file_in_fs creates the file with utf-8 encoded content.

@Qubad786
Copy link
Contributor Author

@muhammad-ammar feedback addressed.

@Qubad786
Copy link
Contributor Author

I am unable to see where we are creating transcript files with old naming convention?

@muhammad-ammar I have made it simple. we can achieve the purpose without the old naming conventions, we just need to put transcript filename in video_module.transcripts and it will be picked from contentstore.

Copy link
Contributor

@muhammad-ammar muhammad-ammar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

 - Transcript files are exported into course OLX in .srt format.
 - Transcript language to filename maps is returned with xml, so that, it can be used by platform in old metadata fields for backward compatiblilty.
 - Add/fix tests

bump VAL version
@Qubad786 Qubad786 force-pushed the mrehan/backward-transcript-export branch from c39974f to 0bbb8af Compare May 25, 2018 11:35
@Qubad786 Qubad786 merged commit 2830e2e into master May 25, 2018
@Qubad786 Qubad786 deleted the mrehan/backward-transcript-export branch May 25, 2018 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants