
Support non-UTF-8 ICY metadata #6753

Closed
yattoz opened this issue Dec 11, 2019 · 3 comments


@yattoz

yattoz commented Dec 11, 2019

[REQUIRED] Issue description

I play an audio stream from a URL (a web radio) that carries ICY metadata. The source is somewhat old: it runs Airtime and sends its metadata in a non-UTF-8 encoding. In our case, the songs are mostly French.

Current behaviour:
  • When retrieving metadata, ExoPlayer replaces every accented character (such as é, à, ë, â, ç...) with the single replacement character � (U+FFFD). It is then impossible to tell which character was replaced.
Desired behaviour:
  • Ideally, ExoPlayer should adapt to these sources and decode the correct characters based on a given Locale. At the very least, it should expose the raw data it parsed (for example as a byte array holding each byte read from the ICY metadata) so that developers can handle these characters themselves.

[REQUIRED] Reproduction steps

You can clone and run my radio app: https://github.com/yattoz/Tsumugi-app
All you need to do is press play and wait. Songs with accents are not very frequent, so you may need to wait quite some time.
The metadata code is located in the file RadioService.kt.

[REQUIRED] Link to test content

The audio source used is: https://radio.mahoro-net.org/streams/tsumugi
The URL is in plain text in the app's source, and the stream is accessible from anywhere in the world.

[REQUIRED] A full bug report captured from the device

You'll find attached the bug report from my physical device, a Motorola Moto G5 Plus "potter". I captured it right when a song with an accent was playing (see the logcat extract just below).
bugreport-potter_n-OPS28.85-17-6-2-2019-12-11-22-38-04.zip

In addition, here is what my Log prints for, respectively:

  • the title
  • the title as Int values
  • the Metadata object used in addMetadataOutput
E/fr.forum_thalie.tsumugi: ======RadioService=====onMetadata: Title ----> France Gall - R�siste
E/fr.forum_thalie.tsumugi: [70, 114, 97, 110, 99, 101, 32, 71, 97, 108, 108, 32, 45, 32, 82, 65533, 115, 105, 115, 116, 101]
E/fr.forum_thalie.tsumugi: raw: entries=[ICY: title="France Gall - R�siste", url="null", rawMetadata="StreamTitle='France Gall - R�siste';"]
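The second log line can be reproduced off-device. The sketch below assumes, as the single U+FFFD in place of é suggests, that the server sends one byte per character (for example ISO-8859-1). It encodes the title that way and then decodes the bytes as UTF-8, which is what IcyDecoder does according to the comment further down. `ReplacementCharDemo` is a hypothetical name used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementCharDemo {

    /** Returns the UTF-16 code units of a string as ints, like the log line above. */
    static int[] codeUnits(String s) {
        return s.chars().toArray();
    }

    /** Simulates a Latin-1 server whose bytes are decoded as UTF-8 by the client. */
    static String latin1BytesDecodedAsUtf8(String title) {
        // In ISO-8859-1, 'é' is the single byte 0xE9.
        byte[] wire = title.getBytes(StandardCharsets.ISO_8859_1);
        // new String(byte[], Charset) silently substitutes U+FFFD for malformed input.
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = latin1BytesDecodedAsUtf8("France Gall - R\u00e9siste");
        System.out.println(decoded);
        System.out.println(Arrays.toString(codeUnits(decoded)));
    }
}
```

The printed array should match the second log line: 0xE9 starts a three-byte UTF-8 sequence but is followed by `s` (0x73), which is not a continuation byte, so the decoder emits 65533 and resumes at `s`. Once that has happened, the original é cannot be recovered from the String.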

[REQUIRED] Version of ExoPlayer being used

I am using ExoPlayer 2.11.0 (the latest release at the time of writing).
I saw the same behaviour on 2.10.6.

[REQUIRED] Device(s) and version(s) of Android being used

This has been reproduced on:

  • Android emulators for Android 5, 6, 7.1, 9 and 10.
  • Motorola Moto G5 Plus, Android 8.1
  • Xiaomi Redmi 6A, Android 8.1 then 9 (after update)
  • BlackBerry Q5 (API 18, equivalent to Android 4.3)

Comment:

I haven't tried modifying ExoPlayer myself, but I took a quick look at the files related to ICY metadata parsing: https://github.com/google/ExoPlayer/tree/release-v2/library/core/src/main/java/com/google/android/exoplayer2/metadata/icy

The problem might be that IcyDecoder forces the byte array to be decoded as UTF-8.

This is done with a plain String constructor and a fixed charset:
return new String(bytes, offset, length, Charset.forName(C.UTF8_NAME));

Or, as I said before, if there's no good alternative, it might help to store and expose this byte array so that developers can deal with special characters themselves.
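To illustrate the raw-bytes alternative: if an app could get hold of the undecoded ICY metadata block, it could pick the charset itself. The helper below is hypothetical (it is not an ExoPlayer API) and assumes the `StreamTitle='...';` layout visible in the rawMetadata log line above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawIcyTitle {

    // Assumed layout of an ICY metadata block: StreamTitle='Artist - Title';StreamUrl='...';
    private static final Pattern STREAM_TITLE =
            Pattern.compile("StreamTitle='(.*?)';", Pattern.DOTALL);

    /**
     * Hypothetical helper: decodes a raw ICY metadata block with a caller-chosen
     * charset and returns the StreamTitle value, or null if there is none.
     */
    static String streamTitle(byte[] rawMetadata, Charset charset) {
        String text = new String(rawMetadata, charset);
        Matcher m = STREAM_TITLE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Bytes as a Latin-1 server would send them.
        byte[] raw = "StreamTitle='France Gall - R\u00e9siste';"
                .getBytes(StandardCharsets.ISO_8859_1);
        // Charset chosen by the app: accents survive.
        System.out.println(streamTitle(raw, StandardCharsets.ISO_8859_1));
        // Forced UTF-8: the accent becomes U+FFFD.
        System.out.println(streamTitle(raw, StandardCharsets.UTF_8));
    }
}
```

Decoding first and matching afterwards works for UTF-8 and the common single-byte charsets; a title that itself contains `';` would need byte-level parsing instead.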

Thank you very much for your hard work!

@icbaker
Collaborator

icbaker commented Dec 12, 2019

Thanks for the report! I wasn't able to reproduce this after watching the provided stream for about an hour, but I can see how we're assuming a UTF-8 character encoding without any concrete evidence, and it looks like the encoding isn't strictly defined for ICY.

I'll have a look into how we can best handle this.

I'm going to mark this as an enhancement: since the ICY spec is pretty under-defined, it's hard to really call this a bug in ExoPlayer. We currently do a sensible-ish thing in an ambiguous situation :)

icbaker changed the title from "ICY metadata fails on non-UTF8 sources" to "Support non-UTF-8 ICY metadata" on Dec 12, 2019
@yattoz
Author

yattoz commented Dec 12, 2019

In case it helps, I noticed the following when listening to this stream with Foobar2000 on Windows.

  • If I explicitly set Windows Regional Settings to interpret non-Unicode text with the French locale, Foobar2000 displays these characters correctly.
  • If I set another language in Regional Settings, for example Japanese, Foobar2000 interprets the special characters the way a Japanese legacy encoding would: it combines the special byte with the next one (making a two-byte, 16-bit sequence) and decodes those two bytes as a single Japanese character. So a text like Résiste is displayed as something like R漢iste (notice how the first s has disappeared). (Of course, Japanese is not the right way to decode this stream. But if we imagine a Japanese stream that uses a non-Unicode encoding and relies only on this "legacy" encoding, then ExoPlayer simply won't be able to decode it at all.)

I don't know if that could help, but that's what I noticed.
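The Foobar2000 observation can be mimicked by decoding the same bytes under different charsets. This is a minimal sketch that assumes the stream sends "Résiste" as ISO-8859-1 bytes. Shift_JIS stands in for the Japanese legacy encoding, and the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LegacyCharsetDemo {

    /** Decodes the bytes with the named charset, or returns null if this JVM lacks it. */
    static String decodeAs(byte[] bytes, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null; // e.g. a minimal runtime without extended charsets
        }
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "Résiste" as a Western European (Latin-1) server would send it.
        byte[] wire = "R\u00e9siste".getBytes(StandardCharsets.ISO_8859_1);
        for (String name : new String[] {"ISO-8859-1", "windows-1252", "UTF-8", "Shift_JIS"}) {
            System.out.println(name + " -> " + decodeAs(wire, name));
        }
    }
}
```

ISO-8859-1 and windows-1252 should both give back Résiste, and UTF-8 shows the U+FFFD seen in the log. In Shift_JIS, 0xE9 is a lead byte, so the decoder pairs it with the following s (0x73) and should produce a single double-byte character, which matches the disappearing s described above.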

icbaker added a commit that referenced this issue Dec 13, 2019
Also change IcyInfo.rawMetatadata from String to byte[]

ICY doesn't specify the character encoding, and there are streams
not using UTF-8 (issue:#6753). It seems the default of at least one
server is ISO-8859-1 so let's support that as a fallback:
savonet/liquidsoap#411 (comment)

Also update IcyDecoder to skip strings it doesn't recognise at all
instead of decoding invalid characters.

The feed from issue:#6753 now decodes accents correctly:
EventLogger:   ICY: title="D Pai - Le temps de la rentrée", url="null"
PiperOrigin-RevId: 285388522
@icbaker
Collaborator

icbaker commented Dec 13, 2019

It seems your stream metadata is encoded in ISO-8859-1.

It looks like this is the default for at least one ICY server:
savonet/liquidsoap#411 (comment)

I've updated IcyDecoder to fall back to ISO-8859-1 if UTF-8 decoding fails; accents in your stream are now rendered correctly in LogCat by the demo app.
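The fallback described above can be sketched as a standalone helper: try a strict UTF-8 decode, and if the bytes are not valid UTF-8, decode them as ISO-8859-1. This is an illustration under that assumption, not ExoPlayer's actual IcyDecoder code; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class IcyFallbackDecoder {

    /**
     * Decodes ICY metadata bytes as UTF-8 if they are valid UTF-8, otherwise as
     * ISO-8859-1. Every byte sequence is valid ISO-8859-1, so this never fails.
     */
    static String decode(byte[] bytes) {
        // REPORT is already the default for newDecoder(); set explicitly for clarity.
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: Latin-1 maps each byte to U+0000..U+00FF.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        String title = "D Pai - Le temps de la rentr\u00e9e";
        System.out.println(decode(title.getBytes(StandardCharsets.UTF_8)));
        System.out.println(decode(title.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

Both calls print the same title. The check is a heuristic: a Latin-1 title that happens to form valid UTF-8 (for example the two characters "Ã©", bytes 0xC3 0xA9) would be decoded as "é". It also does not help with other legacy encodings such as Shift_JIS.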

icbaker closed this as completed Dec 13, 2019
ojw28 pushed a commit that referenced this issue Jan 17, 2020
google locked and limited conversation to collaborators Feb 12, 2020