Data Import BUG #838
Comments
After digging deeper, I realized that it was the configuration of the MySQL DB. Best regards
Unfortunately, it's not only the DB. I thought I would have to convert the DB to another collation like "utf8mb4_bin", but this has negative side effects.
Hiya Ralph. As we state in https://blog.crm-now.de/doc/berliCRM/installation/Installation_berlicrm.html, the DATABASE COLLATION must be utf8_unicode_ci. The second part of your question is not clear enough to answer; can you try to reword it? Regards
Ahh ... I talked to a colleague who understood your question. Let me give an AI answer ;-) The utf8_unicode_ci collation in MySQL is a case-insensitive collation that supports the UTF-8 character set. It treats accented characters as equivalent to their non-accented counterparts. This behavior is by design and is intended to facilitate searches and comparisons where differences in accents or case should be ignored. In the case of "müller" and "muller", the utf8_unicode_ci collation treats them as equivalent because it ignores the umlaut on the letter 'u'. This can be beneficial in many situations, such as when searching for names or words where accents might be used inconsistently or omitted. If you want accent sensitivity in your searches, you would need to use a different collation that supports it, such as utf8mb4_bin, which is case-sensitive and accent-sensitive. However, it's worth noting that accent-insensitive collations like utf8mb3_unicode_ci or utf8mb4_unicode_ci are often preferred for applications where users might input data inconsistently. Regards Emilio
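To make the difference concrete, here is a minimal Python sketch of the two comparison styles. It is only an approximation: `accent_insensitive_key` and `binary_key` are hypothetical helper names, and stripping combining marks plus case folding merely mimics how an accent- and case-insensitive collation treats these names. It is not MySQL's actual collation algorithm.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Rough stand-in for an accent- and case-insensitive comparison key.

    Assumption: decomposing to NFD, dropping combining marks and case
    folding is close enough to illustrate the behavior discussed above.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def binary_key(text: str) -> bytes:
    """Stand-in for a binary comparison: compare the raw UTF-8 bytes."""
    return text.encode("utf-8")


# Accent-insensitive: "Müller" and "muller" collapse to the same key.
print(accent_insensitive_key("Müller") == accent_insensitive_key("muller"))  # True

# Binary: the umlaut (and the case) make the byte sequences differ.
print(binary_key("Müller") == binary_key("Muller"))  # False
```

The first comparison is what makes searches forgiving; the second is what an exact duplicate check would need.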
Hi Emilio, thank you for your really fast update :) (makes sense to me). Do you know where in the code to look in order to implement a more granular duplicate check in the data import function? What I am trying to implement is a check whether the account name I am importing already exists (a 100% identical match). I could share my code afterwards; it could be interesting for German admin users. Best regards
utf8mb4_unicode_ci is not OK; that is the source of your umlaut problem.
Hi Frank, which one should I use then? Best regards,
As Emilio wrote: utf8_unicode_ci
Hi Ralph, 'utf8_unicode_ci' will not help you with this issue; it treats umlauts the same way 'utf8mb4_unicode_ci' does. What you'd need is either a binary collation like 'utf8_bin' or a typecast to binary for every comparison. Best regards,
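As an illustration of the binary-comparison idea, the following Python sketch runs a duplicate check over the organisation names from the issue report, once with a folded key and once with an exact byte key. This is not berliCRM's import code (which is written in PHP); `find_duplicates`, `folded_key` and `exact_key` are hypothetical names, and the folded key only approximates an accent-insensitive collation.

```python
import unicodedata


def folded_key(name: str) -> str:
    """Approximates an accent- and case-insensitive comparison (assumption)."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()


def exact_key(name: str) -> bytes:
    """Binary comparison: UTF-8 bytes after NFC normalization.

    Normalizing first is a design choice so that the same visible name,
    stored in composed or decomposed form, still counts as identical.
    """
    return unicodedata.normalize("NFC", name).encode("utf-8")


def find_duplicates(names, key):
    """Return (name, first_seen_name) pairs whose keys collide."""
    seen = {}
    duplicates = []
    for name in names:
        k = key(name)
        if k in seen:
            duplicates.append((name, seen[k]))
        else:
            seen[k] = name
    return duplicates


names = ["Müller", "Muller", "Möller", "Moller"]

# Folded comparison reports the false duplicates described in the issue.
print(find_duplicates(names, folded_key))
# [('Muller', 'Müller'), ('Moller', 'Möller')]

# Exact comparison reports none, because all four names differ byte for byte.
print(find_duplicates(names, exact_key))
# []
```

With the exact key, only a genuinely repeated entry (for example a second "Müller" row) would be flagged, which is the "100% identical" check asked for earlier in the thread.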
Dear Team,
When I try to import test data, the import identifies accounts as duplicates although they are not really duplicates.
Example data (CSV content):
Organisation names:
Müller
Muller
Möller
Moller
When I try to import these organisation names as accounts, the import identifies "Müller" as "Muller" -> false duplicates.
So I can't import such data because of the false duplicates.
Best regards
Ralph