-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: use the sorting function StrCmpLogicalW
provided by the win32 API
#798
Conversation
All indents doesn't use Tab. |
Fix formatting. |
234d8e3
to
0de09f8
Compare
StrCmpLogicalW
provided by the win32 API
I assume this will use window's sorting algorithm to sort stuff? What is the performance impact of this compared to our current solution? Because what we have now works pretty well, and I'm not willing to take much of a performance hit for no perceivable difference. Regarding formatting, we already have a |
I think it firstly considers to fix #599 instead of performance. |
Maybe make it optional, users could then choose which sort method to use. |
This is a strict improvement that should always be unconditionally enabled for users using the mpv builds with luajit on Windows. I don't think an option should be added for switches because Lua's sorting algorithm has many issues. |
As @hooke007 mentioned above, the current Lua sorting algorithm cannot properly handle the sorting of non Latin characters, which leads to a bad experience. It is quite difficult to optimize sorting algorithms in Lua to handle non Latin characters correctly. This is different from romanization of search characters, as we only need to romanize the user's local language for searching. But for sorting, we actually need to romanize all non Latin characters otherwise we won't be able to sort correctly. In addition, even the sorting of Latin characters still has problems with special characters. Please refer to jonniek/mpv-playlistmanager#128. The same sorting algorithm is used in the playlistmanager.lua script, so the sorting problem mentioned there also exists in uosc. |
Maybe we'd have to convert the strings to utf-32 (or an array of unicode codepoints? |
Yes, this may be a good idea, we can use Unicode table order for sorting, which will improve the sorting of non Latin characters. |
I've come to the realization that utf-32 won't work. It'll sort the same as utf-8. |
For this reason, I searched for the sorting method of the unicode table. The logogram text included in unicode is sorted by glyph order rather than pronunciation. This means that sorting according to the unicode table is also incorrect, and only after romanization can the correct sorting be obtained.
As I mentioned above, the sorting of logogram text by Windows Explorer and Macos Finder is based on romanized pronunciation rather than code points. |
Good to know. If I understand that correctly that could simply be done by shoving the result from gsub into the char conversion function without having to do anything else. |
Not only that, we also need to place latin characters above romanization conversion characters in sorting. And as I mentioned earlier, romanization sort in lua may not be a good idea.
Additionally, there is a difficulty in distinguishing and sorting romanization characters in different languages, which will have to be handled using Unicode ranges. |
Benchmark performance testing was conducted using the following test code: filenames = {"11", "19", "速20", "度21", "测0", "试1", "9", "10"}
local begin = os.clock()
for i = 1, 10000 do
sort_filenames_windows(filenames) --or sort_filenames_lua(filenames)
end
print(string.format('total time:%.2fms\n', ((os.clock() - begin) * 1000))) test result:
The performance of the win32 API meets expectations, and its impact on performance is relatively small compared to the significant improvement brought by native sorting algorithms. By the way, when introducing a simple romanization process to Lua's sorting algorithm and then conducting the above benchmark testing, the results are |
Since the win32 API solution is only applicable to Windows users, I am thinking about how to optimize sorting algorithms for Linux and MacOS users. After the above discussion, it is difficult to implement romanization sorting in lua and cannot guarantee performance. We need a new alternative solution, as uosc already has a ziggy tool, using Go's sorting algorithm to optimize the sorting experience on Unix systems may be a good idea. Fortunately, Go has a powerful library golang.org/x/text/collate which could help us. I tried using the following code to test the sorting algorithm with collate of Go. package main
import (
"fmt"
"golang.org/x/text/collate"
"golang.org/x/text/language"
)
func main() {
letters := []string{"ä", "å", "ö", "o", "a"}
ec := collate.New(language.English)
ec.SortStrings(letters)
fmt.Printf("English Sorting: %v\n", letters)
sc := collate.New(language.Swedish)
sc.SortStrings(letters)
fmt.Printf("Swedish Sorting: %v\n", letters)
numbers := []string{"ä", "å", "ö", "o", "a", "阿", "不", "0", "11", "01", "2", "3", "23", "050", "50", "-50", "The Woodshed-Episode 5", "The Woodshed-Episode 5_1", "sort&test.mp4", "sort.test19.mp4", "sort_test.mp4", "sort+test.mp4", "sort1test.mp4", "sort07test .mp4", "sorttest(1).mp4", "sort-test.mp4"}
ec.SortStrings(numbers)
fmt.Printf("Alphabetic Sorting: %v\n", numbers)
nc := collate.New(language.Chinese, collate.IgnoreCase, collate.Numeric)
nc.SortStrings(numbers)
fmt.Printf("Numeric Sorting: %v\n", numbers)
} Here are the test results:
We can see that the sorting algorithm of Go can fix #599. Although it cannot handle special characters like the native sorting algorithm provided by the win32 API, and therefore cannot fix the problem mentioned jonniek/mpv-playlistmanager#128, but it can correctly sort non latin characters when the correct @tomasklaen is more familiar with the Go, and it should be up to him to decide how to use and optimize the sorting algorithm of Go correctly. It is not within the range of this pr. |
First of all, how big of an issue is the lua sorting we have right now? It sorts some languages slightly differently than they'd prefer, but it's still in the realm of good enough and requires minimal effort to get used to it. Is it completely nonsensical for asian languages? Doing it in ziggy might be an option, but that depends on how much delay would the data passing and json encode/decode add, and I don't think I'll like the numbers. I already don't like the double in sorting time this solution adds. And it feels like it should be in ziggy, since making this only for windows feels wrong. I'm trying to balance the added value of features to their performance hit and development/maintenance cost going forward, and I'm on the fence here. |
As mentioned above (#798 (comment)), in the current Lua sorting algorithm, all logogram texts have absurd sorting results. They are sorted according to the code point order in the Unicode table rather than the correct pronunciation order. In addition, for accented characters that exist in many languages, it cannot be correctly sorted according to different languages. The current lua sorting algorithm is far from being good enough.
In my opinion, implementing better sorting algorithms in ziggy is meaningful for Linux and Macos platforms, otherwise non Latin users on these platforms would only have a poor sorting experience. We still need this win platform specific solution on Windows. As shown in the results of my testing of the sorting algorithm in Go above (#798 (comment)), although the local language based sorting algorithm in Go is already excellent enough, its handling of special characters is still completely different from the native sorting algorithm in Windows. When we have a way to implement native sorting algorithms on Windows to provide users with a consistent experience, there is no reason not to use it. Although the new win32 api sorting algorithm incurs additional time costs, as demonstrated in performance tests, the differences and delays between the two are still not within human perception even in sorting tens of thousands of files, as their speed is fast enough. And this kind of usage scenario is almost never encountered in mpv. |
Thank @stax76 for providing the code. See-Also mpvnet-player/mpv.net#575 (comment), jonniek/mpv-playlistmanager#128
Fixes #599