fix: handle JSON character encoding for hec (Umlauts) #112

Merged
merged 4 commits into from
Nov 14, 2022

zszia (Contributor) commented Nov 18, 2021

To reproduce:

data = {"field_a": "Üü_Öö_Ää_some_text", "field_b": "some_text_Üü_Öö_Ää"}
event = event_writer.create_event(data, time=time.time(), source="solnlib_demo", sourcetype="solnlib_demo")
event_writer.write_events([event])

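When data containing non-ASCII characters is indexed, those characters are implicitly escaped as JSON \uXXXX sequences (for example, Ü becomes \u00dc) rather than being sent as literal characters.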
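A minimal sketch of how this kind of escaping can arise, assuming the event payload is serialized with Python's `json.dumps` at its default settings. This is an assumption for illustration; the serialization code itself is not shown in this thread.

```python
import json

data = {"field_a": "Üü_Öö_Ää_some_text"}

# With the default ensure_ascii=True, every non-ASCII character is written
# as a \uXXXX escape, so the serialized payload is pure ASCII.
escaped = json.dumps(data)
print(escaped)

# A JSON parser restores the original characters, but a consumer that
# stores the raw text as-is keeps the escape sequences.
assert json.loads(escaped) == data
```

The escaped text is valid JSON and decodes back to the same value, which is why the problem only shows up when searching the raw event text.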

Here is how the raw data looks in Splunk:

image

Searching for field_a="Üü_Öö_Ää_some_text" does not work. The only way to get a match is to use a wildcard (*):

field_a="*_some_text"

which is very inefficient if the first character is an umlaut.

The other workaround would be to search for the escaped form:
field_a="\u00dc\u00fc_\u00d6\u00f6_\u00c4\u00e4_some_text"
which is not practical.
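To show why the escaped form is impractical to type by hand, here is a small sketch that derives it programmatically. The helper name `escaped_search_value` is hypothetical and not part of solnlib or Splunk; it only mirrors what default JSON escaping does to a string.

```python
import json


def escaped_search_value(value: str) -> str:
    """Return the value as it would appear in the escaped raw event (hypothetical helper)."""
    # json.dumps wraps the string in double quotes; strip them to get the bare value.
    return json.dumps(value)[1:-1]


print('field_a="{}"'.format(escaped_search_value("Üü_Öö_Ää_some_text")))
```

Every user would need to run a conversion like this before each search, which is not realistic for interactive use.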

We are using the "Splunk Add-on for Microsoft Cloud Services" app, which uses solnlib:
https://splunkbase.splunk.com/app/3110/
When our security team tries to find names with umlauts (for example, Müller), the search does not work, which is a big problem.
They practically always have to use *, and as you know, the Azure data is pretty big.

If you are using syntax highlighting, the data is displayed identically in both cases. This is very confusing for users, who expect the search for umlauts to work.
image

After the fix, the raw data looks like this and the Splunk searches work correctly:
image
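The diff itself is not reproduced in this thread. As a hedged sketch, one common way to get unescaped output in Python is to serialize with `ensure_ascii=False` and encode the result as UTF-8 explicitly. The helper name `serialize_event` is hypothetical and is not solnlib's API.

```python
import json


def serialize_event(event: dict) -> bytes:
    """Hypothetical helper: JSON-encode an event while keeping non-ASCII characters literal."""
    # ensure_ascii=False leaves characters such as "ü" unescaped in the string;
    # encoding to UTF-8 makes the bytes sent over HTTP explicit.
    return json.dumps(event, ensure_ascii=False).encode("utf-8")


payload = serialize_event({"event": {"field_a": "Üü_Öö_Ää_some_text"}})
print(payload.decode("utf-8"))
```

With this approach the raw event contains the umlauts themselves, so a search for the literal value can match without wildcards.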

@zszia zszia requested a review from a user November 18, 2021 14:28
ghost commented Nov 24, 2021

I passed this issue to the team working on "Splunk Add-on for Microsoft Cloud Services". The next update will be next week.

zugf added a commit to zugf/Splunk-Class-httpevent that referenced this pull request Dec 15, 2021
@artemrys artemrys requested review from artemrys and removed request for a user July 2, 2022 16:47
After applying the fix searching for a field with Umlauts is now working correctly.
artemrys (Member) commented

@zszia thanks for the contribution, and sorry that it took so long to merge. I've just added tests to verify the behaviour in the future.

@artemrys artemrys merged commit 699e387 into splunk:main Nov 14, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Nov 14, 2022
srv-rr-github-token (Contributor) commented

🎉 This PR is included in version 4.8.2 🎉

The release is available on:

Your semantic-release bot 📦🚀
