Handle unknown chars in segmentation_upid #55

Closed
huyen-streamotion opened this issue Sep 23, 2022 · 7 comments
Labels
bug Something isn't working


@huyen-streamotion

Input: /DBKAAAAAAAAAP/wBQb+YtC8/AA0AiZDVUVJAAAD6X/CAAD3W3ACEmJibG5kcHBobkQCAsGDpQIAAAAAAAEKQ1VFSRSAIyowMljRk9c=

Actual behaviour: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 12: invalid start byte at https://github.com/futzu/scte35-threefive/blob/f41c8a614abc1c866c5ddb48caf54368c235145a/threefive/bitn.py#L55

Expected behaviour:

  • Either, having that exception handled, setting segmentation_upid to a default string
  • Or, allowing user to provide encoding charset
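
The failure can be reproduced with the standard library alone. The sketch below is not threefive's parser: it assumes the UPID can be located by its readable prefix `bblndpphnD`, and that the byte just before it is `segmentation_upid_length`. It then attempts the same strict UTF-8 decode that raises in `bitn.py`.

```python
import base64

# Base64 SCTE-35 payload copied from the report above.
payload = base64.b64decode(
    "/DBKAAAAAAAAAP/wBQb+YtC8/AA0AiZDVUVJAAAD6X/CAAD3W3ACEmJibG5kcHBobkQCAsGDpQIAAAAAAAEKQ1VFSRSAIyowMljRk9c="
)

# Assumption: find the UPID by its ASCII prefix instead of parsing the descriptor.
start = payload.find(b"bblndpphnD")
upid_length = payload[start - 1]  # assumed to be segmentation_upid_length
upid = payload[start:start + upid_length]

try:
    upid.decode("utf-8")  # strict decoding, the default error handler
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(upid)
print(error)
```

The UPID mixes printable ASCII with raw binary bytes, so no strict text decoding is guaranteed to succeed on it.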
@futzu
Owner

futzu commented Sep 26, 2022

Let me check it out; I'll get back to you shortly.

@futzu
Owner

futzu commented Sep 26, 2022

Passing a charset in is a can of worms I do not wish to open. However, I will do

    return int.to_bytes(stuff, wide, byteorder="big").decode("utf-8", errors="replace")

and the segmentation_upid will come out like

    "descriptors": [
        {
            "tag": 2,
            "descriptor_length": 38,
            "name": "Segmentation Descriptor",
            "identifier": "CUEI",
            "components": [],
            "segmentation_event_id": "0x3e9",
            "segmentation_event_cancel_indicator": false,
            "program_segmentation_flag": true,
            "segmentation_duration_flag": true,
            "delivery_not_restricted_flag": false,
            "web_delivery_allowed_flag": false,
            "no_regional_blackout_flag": false,
            "archive_allowed_flag": false,
            "device_restrictions": "Restrict Group 2",
            "segmentation_duration": 180.12,
            "segmentation_duration_ticks": 16210800,
            "segmentation_message": "Not Indicated",
            "segmentation_upid_type": 2,
            "segmentation_upid_type_name": "Deprecated",
            "segmentation_upid_length": 18,
            "segmentation_upid": "bblndpphnD\u0002\u0002\ufffd\ufffd\ufffd\u0002\u0000\u0000",
            "segmentation_type_id": 0,
            "segment_num": 0,
            "segments_expected": 0
        },

Can you live with that solution?
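
For readers following along, here is a minimal sketch of what `errors="replace"` does to this UPID. The helper name `int_to_text` is hypothetical and only mirrors the one-liner quoted above; it is not the actual threefive function. The raw bytes are the 18 UPID bytes from the reported input.

```python
def int_to_text(value: int, nbytes: int) -> str:
    """Hypothetical helper mirroring the quoted one-liner.

    Converts an integer to big-endian bytes, then decodes as UTF-8,
    substituting U+FFFD for every byte that is not valid UTF-8.
    """
    return int.to_bytes(value, nbytes, byteorder="big").decode(
        "utf-8", errors="replace"
    )


# The 18 UPID bytes from the reported input.
raw = b"bblndpphnD\x02\x02\xc1\x83\xa5\x02\x00\x00"

# Assumption: the parser has already read the field as one big integer.
as_int = int.from_bytes(raw, byteorder="big")

text = int_to_text(as_int, len(raw))
print(repr(text))
```

The three bytes `0xc1 0x83 0xa5` each become one replacement character, which is where the three `\ufffd` entries in the JSON output come from. The replacement is lossy: the original bytes cannot be recovered from the resulting string.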

@futzu
Owner

futzu commented Sep 26, 2022

pip install --upgrade threefive

>>>> import threefive
>>>> threefive.version()
'2.3.49'

@futzu
Owner

futzu commented Sep 26, 2022

On second thought, a charset might be a good idea.
In the Stream class, I need a charset sometimes for mpegts descriptors,
I'm fairly sure it could be used in other places as well.
Maybe a threefive.charset var or something. I can default it to utf-8,
but it could be set otherwise if needed.
What do you think?

@huyen-streamotion
Author

Thanks @futzu.
That quick fix is good for us.
I also vote for allowing the user to provide the charset.

@futzu
Owner

futzu commented Sep 29, 2022

Here's what I'm testing now.
This would apply to UPID types:

  • 0x01, 0x02 (Deprecated)
  • 0x03 (AdID)
  • 0x07 (TID)
  • 0x09 (ADI)
  • 0x10 (UUID)
  • 0x11 (SCR)
  • 0x0E (ADS Info)
  • 0x0F (URI)
  • 0xFD (Unknown)
>>> from threefive import Cue,upids
>>> i="/DBKAAAAAAAAAP/wBQb+YtC8/AA0AiZDVUVJAAAD6X/CAAD3W3ACEmJibG5kcHBobkQCAsGDpQIAAAAAAAEKQ1VFSRSAIyowMljRk9c="

>>> upids.charset
'ascii'
>>> cue=Cue(i)
>>> cue.decode()
ascii
True
>>> cue.descriptors[0].segmentation_upid
'bblndpphnD\x02\x02���\x02\x00\x00'


>>> upids.charset="big5"
>>> cue.decode()
big5
True
>>> cue.descriptors[0].segmentation_upid
'bblndpphnD\x02\x02���\x02\x00\x00'


>>> upids.charset="utf16"
>>> cue.decode()
utf16
True
>>> cue.descriptors[0].segmentation_upid
'扢湬灤桰䑮Ȃ菁ʥ\x00'
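
The effect of switching the charset can be sketched without threefive installed. `DEFAULT_CHARSET` and `decode_upid` below are hypothetical names for illustration, not the library's API, and the sketch assumes undecodable bytes are replaced rather than raising. It uses `utf-16-le` explicitly because plain `utf16` without a byte order mark falls back to the platform's native byte order.

```python
# Hypothetical module-level default, standing in for a setting like upids.charset.
DEFAULT_CHARSET = "ascii"


def decode_upid(raw: bytes, charset: str = DEFAULT_CHARSET) -> str:
    """Hypothetical helper: decode UPID bytes with a caller-chosen charset.

    Bytes that are invalid in the charset are replaced with U+FFFD.
    """
    return raw.decode(charset, errors="replace")


# The 18 UPID bytes from the reported input.
raw = b"bblndpphnD\x02\x02\xc1\x83\xa5\x02\x00\x00"

ascii_text = decode_upid(raw)
utf16_text = decode_upid(raw, "utf-16-le")

print(repr(ascii_text))
print(repr(utf16_text))
```

Under `ascii` the string keeps 18 characters with three replacements; under `utf-16-le` every byte pair maps to one code unit, giving 9 characters. Which reading is correct depends on how the UPID was encoded upstream, which the bytes alone do not reveal.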

@futzu
Owner

futzu commented Oct 9, 2022

threefive.upids.charset is active as of threefive-2.3.51.

@futzu futzu closed this as completed Oct 9, 2022
@futzu futzu added the bug Something isn't working label Oct 23, 2022