Data Import BUG #838

ralphkretzschmar · 2024-03-06T12:09:01Z

Dear Team,
When I try to import test data, the import identifies accounts as duplicates even though they are not really duplicates.

Example data (CSV content):
Organisation names:
Müller
Muller
Möller
Moller
When I try to import these organisation names as accounts, the import identifies "Müller" as "Muller", which is a false duplicate.

So I can't import such data because of false duplicates.

Best regards
Ralph

ralphkretzschmar · 2024-03-06T13:47:16Z

After digging deeper I realized that it was the configuration of the MySQL DB.
Sorry for the false report.

Best regards
Ralph

ralphkretzschmar · 2024-03-06T16:06:45Z

Unfortunately it's not only the DB. I thought I would have to convert the DB to another collation like "utf8mb4_bin", but this has negative side effects.
I guess the comparison of account names has to be done at code level?

urban-thinking · 2024-03-06T16:15:21Z

Hiya Ralph.

As we state in https://blog.crm-now.de/doc/berliCRM/installation/Installation_berlicrm.html, the database collation must be utf8_unicode_ci.

The second part of your question is not clear enough to answer, can you try to reword it?

Regards
Emilio

urban-thinking · 2024-03-06T17:08:26Z

Ahh ... I talked to a colleague who understood your question. Let me give an AI answer ;-)

The utf8_unicode_ci collation in MySQL is a case-insensitive collation that supports the UTF-8 character set. It treats accented characters as equivalent to their non-accented counterparts. This behavior is by design and is intended to facilitate searches and comparisons where differences in accents or case should be ignored.

In the case of "müller" and "muller", the utf8_unicode_ci collation treats them as equivalent because it ignores the difference in the accent on the letter 'u'. This can be beneficial in many situations, such as when searching for names or words where accents might be inconsistently used or omitted.

If you want accent sensitivity in your searches, you would need to use a different collation that supports that, such as utf8mb4_bin, which is case-sensitive and accent-sensitive. However, it's worth noting that using accent-insensitive collations like utf8mb3_unicode_ci or utf8mb4_unicode_ci is often preferred for applications where users might input data inconsistently.
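The difference between the two kinds of comparison can be sketched in a few lines of Python. This is only a rough approximation for illustration: MySQL's `utf8_unicode_ci` follows the Unicode collation rules, whereas the hypothetical helper `fold_accents_and_case` below simply strips combining marks and folds case. The function names are assumptions and are not part of berliCRM or MySQL.

```python
import unicodedata


def fold_accents_and_case(text: str) -> str:
    """Rough stand-in for an accent- and case-insensitive collation key."""
    # NFD splits "ü" into "u" plus a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def equal_insensitive(a: str, b: str) -> bool:
    """Compare the way a *_ci collation roughly would (assumption: approximation)."""
    return fold_accents_and_case(a) == fold_accents_and_case(b)


def equal_binary(a: str, b: str) -> bool:
    """Compare byte for byte, similar in spirit to a *_bin collation."""
    return a.encode("utf-8") == b.encode("utf-8")


if __name__ == "__main__":
    for left, right in [("Müller", "Muller"), ("Möller", "Moller"), ("Müller", "Möller")]:
        print(left, right, equal_insensitive(left, right), equal_binary(left, right))
```

With the names from the report, "Müller" and "Muller" compare equal under the folded comparison and unequal under the byte comparison, which matches the behaviour described above.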

Regards Emilio

ralphkretzschmar · 2024-03-06T18:40:16Z

Hi Emilio,

thank you for your really fast update :) (it makes sense to me)
I think going with utf8mb4_unicode_ci is fine because of the search functions etc.

Do you know where to look in the code to implement a more granular duplicate check in the data import function?

What I am trying to implement is a check whether the account name I try to import already exists (a 100% identical match).
Then I could import accounts like "Muller GmbH" and "Müller GmbH", because they would be treated as different accounts, and still keep the other benefits for inconsistent data input.
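The logic of such a "100% same" check might look roughly like the sketch below. It is written in Python purely for illustration and is not berliCRM code; the thread does not say where the import duplicate check lives, and the names `exact_key` and `split_import` are hypothetical. One assumption is added: names are normalized to NFC first, so that the same "ü" typed as one code point or as "u" plus a combining mark still counts as identical.

```python
import unicodedata


def exact_key(name: str) -> str:
    """Key for exact matching: trim whitespace, unify composed/decomposed forms."""
    # NFC keeps "ü" distinct from "u", but makes "u" + U+0308 equal to "ü".
    return unicodedata.normalize("NFC", name.strip())


def split_import(existing: list[str], incoming: list[str]) -> tuple[list[str], list[str]]:
    """Return (names to import, names rejected as exact duplicates)."""
    seen = {exact_key(name) for name in existing}
    to_import: list[str] = []
    duplicates: list[str] = []
    for name in incoming:
        key = exact_key(name)
        if key in seen:
            duplicates.append(name)
        else:
            # Remember the key so repeats inside the same file are caught too.
            seen.add(key)
            to_import.append(name)
    return to_import, duplicates


if __name__ == "__main__":
    accepted, rejected = split_import(
        ["Muller GmbH"], ["Müller GmbH", "Muller GmbH", "Möller", "Moller"]
    )
    print(accepted, rejected)
```

Here "Müller GmbH" would be accepted next to an existing "Muller GmbH", while a second "Muller GmbH" would still be rejected.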

I could share my code afterwards; it could be interesting for German admin users.

Best regards
Ralph

Archibald111 · 2024-03-06T19:36:45Z

utf8mb4_unicode_ci is not OK; that is the source of your umlaut problem.

ralphkretzschmar · 2024-03-06T19:42:01Z

Hi Frank,

And which one should I use?

Best regards,
Ralph

Archibald111 · 2024-03-06T20:14:32Z

As Emilio wrote: utf8_unicode_ci

AlexKay85 · 2024-03-07T07:43:33Z

Hi Ralph,

'utf8_unicode_ci' will not help you with this issue; it treats umlauts the same way 'utf8mb4_unicode_ci' does.

What you'd need is either a binary collation like 'utf8_bin' or a typecast to binary for every comparison.
We do not support binary collations; they were not tested at all and probably wouldn't work very well.
Unfortunately it's not easy to fix this at the code level either. There are too many places where it would have to be done, and it also opens another can of worms.
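The idea of forcing a binary comparison for a single query, while the column keeps its accent-insensitive default, can be tried out with the small sketch below. It uses Python's built-in `sqlite3` as a stand-in, not MySQL and not berliCRM: the `account` table, the `ACCENT_CI` collation and the helper names are all assumptions made for the demo. In MySQL the analogous per-comparison override would be something like a `COLLATE utf8mb4_bin` clause on the comparison (assuming a utf8mb4 connection), which, as noted above, is untested with berliCRM.

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks and fold case (rough accent-insensitive key)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def accent_ci(a: str, b: str) -> int:
    """Collation callback: negative, zero or positive, like a comparator."""
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)


def demo_connection() -> sqlite3.Connection:
    """In-memory DB whose column defaults to the accent-insensitive collation."""
    conn = sqlite3.connect(":memory:")
    conn.create_collation("ACCENT_CI", accent_ci)
    conn.execute("CREATE TABLE account (accountname TEXT COLLATE ACCENT_CI)")
    conn.executemany("INSERT INTO account VALUES (?)", [("Muller",), ("Moller",)])
    return conn


def count_matches(conn: sqlite3.Connection, name: str, exact: bool) -> int:
    """Count rows equal to `name`; `exact` overrides the column collation."""
    sql = "SELECT COUNT(*) FROM account WHERE accountname = ?"
    if exact:
        # An explicit COLLATE on the comparison wins over the column default.
        sql += " COLLATE BINARY"
    return conn.execute(sql, (name,)).fetchone()[0]


if __name__ == "__main__":
    conn = demo_connection()
    print(count_matches(conn, "Müller", exact=False))
    print(count_matches(conn, "Müller", exact=True))
```

The sketch also hints at why this is awkward in a large code base: every query that should distinguish umlauts needs the override, and any query that is missed keeps the default behaviour.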

Best Regards,
Alex

ralphkretzschmar closed this as completed Mar 6, 2024

ralphkretzschmar reopened this Mar 6, 2024
