#6449 Extended file name support to include characters from multiple languages, including Cyrillic and Han scripts #8925
…d of including different character scripts for file names
@bmarty, I have another simple PR for you if you can take a look at it. If you have any objections to the new regex, I am happy to update it as necessary, or add more tests to verify different scenarios.
Thanks for this change!
@christianrowlands can you handle the errors reported by the CI please? Let me know if you need some help.
Thanks for reviewing! I will look into the CI failures.
@bmarty, give it another run and see if that resolves some or all of the issues.
Can you run |
Whoops, I thought I already removed that unused Timber import. I will run it again now.
I think you need to either rebase your PR's branch, or merge develop into your branch so that the code can be compiled. We are now using Java 21.
Ok, I merged develop into my branch. Hopefully we are good now 🤞
Type of change
Content
I fixed an issue where sharing a file whose name contains non-Latin characters caused those characters to be replaced with underscores. For example, here is a screenshot of what shows up when "forwarding" a file that was sent in Element Android.
Motivation and context
Here is a link to the issue: #6449
It is worth noting that I considered a couple of different approaches to fixing this problem. My first regex kept the existing "inclusion" approach (an allow-list of permitted characters) and added the Cyrillic and Han scripts to it. However, after realizing that it could get messy to add support for every script, I switched to an "exclusion" approach that replaces only known invalid characters.
For reference, here was the first approach:
.replace("[^\\p{sc=Cyrillic}\\p{sc=Han}a-z A-Z0-9\\\\.\\-]".toRegex(), "_")
And version 2:
.replace("[\\\\?%*:|\"<>\\s]".toRegex(), "_")
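To make the difference between the two versions easier to see, here is a minimal sketch that runs both regexes over a few sample names. Only the two regex literals come from this PR; the function names `sanitizeInclusion` and `sanitizeExclusion` and the sample file names are hypothetical and are not the names used in the actual diff.

```kotlin
// Hypothetical helpers for illustration; only the regex literals are taken from the PR description.

// Approach 1 ("inclusion"): replace everything that is NOT explicitly allowed.
val inclusionRegex = "[^\\p{sc=Cyrillic}\\p{sc=Han}a-z A-Z0-9\\\\.\\-]".toRegex()

// Approach 2 ("exclusion"): replace only the characters listed as invalid, plus whitespace.
val exclusionRegex = "[\\\\?%*:|\"<>\\s]".toRegex()

fun sanitizeInclusion(name: String): String = name.replace(inclusionRegex, "_")

fun sanitizeExclusion(name: String): String = name.replace(exclusionRegex, "_")

fun main() {
    // Sample names are made up: Cyrillic, Han, accented Latin, and a name with a space and '?'.
    val samples = listOf(
        "\u043e\u0442\u0447\u0451\u0442.pdf",
        "\u6587\u4ef6.txt",
        "r\u00e9sum\u00e9.doc",
        "a b?.txt",
    )
    for (name in samples) {
        println("$name -> inclusion: ${sanitizeInclusion(name)} | exclusion: ${sanitizeExclusion(name)}")
    }
}
```

With the inclusion regex, a script that was never added to the list (for example the accented Latin letters in the third sample) still gets replaced, which is the maintenance problem described above. One side effect that may be worth a look during review: the first version allowed a plain space, while the second replaces any whitespace with an underscore.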
Tests
I also wrote unit tests to verify the new regex works as expected (see the code diff).
Tested devices
Checklist
Signed-off-by: Christian Rowlands <craxiomdev [at] gmail.com>