fix(srt): reader adds newline to multi-line cues #435

Merged

@lideen (Contributor) commented Nov 27, 2024

I noticed that when converting SRT to TTML, multi-line cues get a spurious initial line break (<br/>):

Input:

1
101:00:00,000 --> 101:00:01,000
Hello

2
101:00:01,000 --> 101:00:02,000
Hello
World

Output:

<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="">
    <head>
        <layout>
            <region xml:id="r1" tts:color="#ffffff" tts:displayAlign="after" tts:extent="90% 90%"
                    tts:fontFamily="&quot;Verdana&quot;, &quot;Arial&quot;, &quot;Tiresias&quot;, sansSerif"
                    tts:fontSize="80%" tts:lineHeight="125%" tts:origin="5% 5%" tts:textAlign="center"
                    tts:textOutline="#000000 5%"/>
        </layout>
    </head>
    <body region="r1">
        <div>
            <p begin="101:00:00.000" end="101:00:01.000">
                <span>Hello</span>
            </p>
            <p begin="101:00:01.000" end="101:00:02.000">
                <br/>
                <span>Hello</span>
                <br/>
                <span>World</span>
            </p>
        </div>
    </body>
</tt>

This PR is my attempt at a fix. With these changes, the output for the same file is:

<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="">
    <head>
        <layout>
            <region xml:id="r1" tts:color="#ffffff" tts:displayAlign="after" tts:extent="90% 90%"
                    tts:fontFamily="&quot;Verdana&quot;, &quot;Arial&quot;, &quot;Tiresias&quot;, sansSerif"
                    tts:fontSize="80%" tts:lineHeight="125%" tts:origin="5% 5%" tts:textAlign="center"
                    tts:textOutline="#000000 5%"/>
        </layout>
    </head>
    <body region="r1">
        <div>
            <p begin="101:00:00.000" end="101:00:01.000">
                <span>Hello</span>
            </p>
            <p begin="101:00:01.000" end="101:00:02.000">
                <span>Hello</span>
                <br/>
                <span>World</span>
            </p>
        </div>
    </body>
</tt>

Closes #436
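
The idea behind the fix can be illustrated with a minimal, self-contained sketch. This is not the actual ttconv SRT reader code: `cue_lines_to_tokens` and its tuple-based tokens are hypothetical stand-ins for the model elements, used only to show a break being emitted between lines rather than before each one.

```python
from typing import List, Tuple

# Hypothetical stand-in for model elements: ("span", text) or ("br", "").
# These are NOT ttconv classes; they only illustrate the ordering.
Token = Tuple[str, str]


def cue_lines_to_tokens(lines: List[str]) -> List[Token]:
    """Emit one span per cue line, with a line break between lines only."""
    tokens: List[Token] = []
    for index, line in enumerate(lines):
        if index > 0:
            # A break separates consecutive lines; none precedes the first.
            tokens.append(("br", ""))
        tokens.append(("span", line))
    return tokens


print(cue_lines_to_tokens(["Hello"]))
# [('span', 'Hello')]
print(cue_lines_to_tokens(["Hello", "World"]))
# [('span', 'Hello'), ('br', ''), ('span', 'World')]
```

Emitting the break only when `index > 0` is what keeps a multi-line cue from starting with `<br/>`, while single-line cues are unaffected.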

@palemieux (Contributor)

Can you send the problematic file as a zip (to preserve EOLs)?

@lideen (Contributor, Author) commented Nov 27, 2024

> Can you send the problematic file as a zip (to preserve EOLs)?

Here it is

Interview.srt.zip

doc = to_model(f)

self.assertIsInstance(
    doc.get_body().first_child().first_child().first_child(),

Contributor:

I suggest expanding the test to make sure the two lines contain Hello and World.

Contributor Author:

Thank you, updated test.

doc = to_model(f)

self.assertIsInstance(
    doc.get_body().first_child().first_child().first_child(),

Contributor:

I suggest expanding the test to make sure there is only one line that contains Hello.

Contributor Author:

Thank you, updated test.
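
For illustration, the two review suggestions could be checked roughly as follows. This is only a sketch under stated assumptions: it inspects serialized `<p>` elements shaped like the TTML output shown in the PR description, using the standard library's `xml.etree.ElementTree`, rather than the ttconv model API used by the real tests. The helper `child_summary` is hypothetical.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"


def child_summary(p_xml: str):
    """Return (local tag name, text) for each direct child of a <p> element."""
    p = ET.fromstring(p_xml)
    # ElementTree reports namespaced tags as "{namespace}local"; keep the local part.
    return [(child.tag.split("}")[-1], child.text) for child in p]


# Paragraphs mirroring the fixed output from the PR description.
single = f'<p xmlns="{TTML_NS}"><span>Hello</span></p>'
multi = f'<p xmlns="{TTML_NS}"><span>Hello</span><br/><span>World</span></p>'

# Single-line cue: exactly one line, and it contains Hello.
assert child_summary(single) == [("span", "Hello")]

# Multi-line cue: no leading break, and the two lines contain Hello and World.
children = child_summary(multi)
assert children[0][0] != "br"
assert [text for tag, text in children if tag == "span"] == ["Hello", "World"]
```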

@palemieux (Contributor)

Thanks for the catch. The fix looks good. Just two suggestions on the tests.

@lideen force-pushed the fix/srt-multiline-cues-starts-with-newline branch from 8ee6f47 to 67443e2 on November 28, 2024 07:34
@palemieux merged commit 1a8c6a5 into sandflow:master on Nov 28, 2024
1 check passed
@lideen deleted the fix/srt-multiline-cues-starts-with-newline branch on November 28, 2024 17:14