#6449 Extended file name support to include characters from multiple languages, including Cyrillic and Han scripts #8925

christianrowlands · 2024-10-16T14:22:10Z

Type of change

Feature
Bugfix
Technical
Other :

Content

I fixed an issue where sharing a file whose name contained non-Latin characters replaced those characters with underscores. For example, here is a screenshot of what shows up when "forwarding" a file that was sent in Element Android.

Motivation and context

Here is a link to the issue: #6449

It is worth noting that I considered a couple of different approaches to fixing this problem. My first approach was to keep the existing "inclusion" regex and add the Cyrillic and Han scripts to it. However, after realizing that it could get messy to add support for all the different scripts, I switched to an "exclusion" approach where I replace only the known invalid characters with underscores.

For reference, here was the first approach

.replace("[^\\p{sc=Cyrillic}\\p{sc=Han}a-z A-Z0-9\\\\.\\-]".toRegex(), "_")

And version 2

.replace("[\\\\?%*:|\"<>\\s]".toRegex(), "_")
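To make the difference between the two approaches concrete, here is a minimal standalone sketch. Only the two regex literals are copied from above; the function names `sanitizeByInclusion` and `sanitizeByExclusion` and the sample file names are hypothetical and are not taken from the PR diff.

```kotlin
// Version 1 ("inclusion"): anything outside the allowed set becomes "_".
private val inclusionRegex = "[^\\p{sc=Cyrillic}\\p{sc=Han}a-z A-Z0-9\\\\.\\-]".toRegex()

// Version 2 ("exclusion"): only the listed characters and whitespace become "_".
private val exclusionRegex = "[\\\\?%*:|\"<>\\s]".toRegex()

// Hypothetical helper names, used only for this illustration.
fun sanitizeByInclusion(name: String): String = name.replace(inclusionRegex, "_")

fun sanitizeByExclusion(name: String): String = name.replace(exclusionRegex, "_")

fun main() {
    val samples = listOf(
        "Привет 2024.pdf", // Cyrillic with a space
        "報告.txt",         // Han
        "αβγ.txt",         // Greek, a script version 1 does not list
        "a:b?.txt"         // characters targeted by version 2
    )
    for (name in samples) {
        println("$name | v1=${sanitizeByInclusion(name)} | v2=${sanitizeByExclusion(name)}")
    }
}
```

With these inputs, version 1 turns the Greek name into `___.txt` because Greek is not in its allowed set, while version 2 leaves it untouched. One behavioral difference to be aware of: version 1 keeps spaces, whereas version 2 replaces any whitespace with an underscore.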

Tests

I sent a file containing Cyrillic characters in Element Web.
I viewed that message in Element Android.
I clicked the share button for that file.
I verified that the file name in the share UI was not all underscores.

I also wrote unit tests to verify the new regex works as expected (see the code diff).
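The actual unit tests live in the code diff and are not reproduced here. As a rough idea of what such checks could look like, the following sketch exercises the "version 2" pattern with plain `check` calls instead of a test framework. The helper name `sanitizeFileName` and the sample inputs are assumptions made for illustration.

```kotlin
// The "version 2" pattern from the PR description.
private val invalidFileNameChars = "[\\\\?%*:|\"<>\\s]".toRegex()

// Hypothetical helper name; not necessarily the function changed in the PR.
fun sanitizeFileName(name: String): String = name.replace(invalidFileNameChars, "_")

fun main() {
    // Every character the pattern targets should become an underscore.
    for (c in listOf('\\', '?', '%', '*', ':', '|', '"', '<', '>', ' ', '\t')) {
        check(sanitizeFileName("a${c}b") == "a_b") { "expected '$c' to be replaced" }
    }
    // Names written in other scripts should pass through unchanged.
    for (name in listOf("Привет.txt", "報告.pdf", "αβγ.txt")) {
        check(sanitizeFileName(name) == name) { "expected $name to be unchanged" }
    }
    println("all checks passed")
}
```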

Tested devices

Physical
Emulator
OS version(s): Android 15 and Android 5.1

Checklist

Changes have been tested on an Android device or Android emulator with API 21
[-] UI change has been tested on both light and dark themes
Accessibility has been taken into account. See https://github.com/element-hq/element-android/blob/develop/CONTRIBUTING.md#accessibility
Pull request is based on the develop branch
Pull request includes a new file under ./changelog.d. See https://github.com/element-hq/element-android/blob/develop/CONTRIBUTING.md#changelog
Pull request includes screenshots or videos if containing UI changes
Pull request includes a sign off
You've made a self review of your PR
[-] If you have modified the screen flow, or added new screens to the application, you have updated the test UiAllScreensSanityTest.allScreensTest()

Signed-off-by: Christian Rowlands <craxiomdev [at] gmail.com>

christianrowlands · 2024-10-16T14:24:36Z

@bmarty , I have another simple PR for you if you can take a look at it. If you have any objections to the new RegEx, I am happy to update it as necessary, or add more tests to verify different scenarios.

CLAassistant · 2024-11-01T13:37:43Z

All committers have signed the CLA.

bmarty

Thanks for this change!

bmarty · 2024-11-12T15:00:17Z

@christianrowlands can you handle the errors reported by the CI please? Let me know if you need some help.

christianrowlands · 2024-11-12T15:03:33Z

Thanks for reviewing!

I will look into the CI failures.

christianrowlands · 2024-11-12T15:16:38Z

@bmarty , give it another run and see if that resolves some or all of the issues.

bmarty · 2024-11-12T15:18:29Z

Can you run ./gradlew ktlintFormat first? I think it will remove some unused imports.

christianrowlands · 2024-11-12T15:20:28Z

Whoops, I thought I already removed that unused Timber import. I will run it again now.

bmarty · 2024-11-12T15:34:39Z

I think you need to either rebase your PR's branch, or merge develop into your branch so that the code can be compiled. We are now using Java 21.

christianrowlands · 2024-11-12T15:38:57Z

Ok, I merged develop into my branch. Hopefully we are good now 🤞

christianrowlands added 3 commits October 15, 2024 20:00

element-hq#6449 Adds support for additional character scripts in file names

f8b2bc0

element-hq#6449 Switch to removing specific invalid characters instead of including different character scripts for file names

686ca05

element-hq#6449 Remove test logging for file name

11f6987

christianrowlands changed the title ~~#6449~~ #6449 Extended file name support to include characters from multiple languages, including Cyrillic and Han scripts Oct 16, 2024

element-hq deleted a comment from logman12oge Oct 25, 2024

bmarty self-requested a review November 8, 2024 10:10

bmarty approved these changes Nov 12, 2024

element-hq#6449 Use the correct name in the file headers

2cfc230

element-hq#6449 Remove unused imports

36e8b7b

Merge branch 'develop' into bugfix/cmr/extended-character-filename

a608bff

bmarty merged commit 93962d0 into element-hq:develop Nov 12, 2024
11 of 13 checks passed
