fix: handle JSON character encoding for hec (Umlauts) #112

zszia · 2021-11-18T14:28:45Z

To reproduce:

import time

# event_writer is assumed to be an already-configured solnlib HEC event writer
data = {"field_a": "Üü_Öö_Ää_some_text", "field_b": "some_text_Üü_Öö_Ää"}
event = event_writer.create_event(data, time=time.time(), source="solnlib_demo", sourcetype="solnlib_demo")
event_writer.write_events([event])

When indexing data with special characters, these are implicitly written as JSON \uXXXX escape sequences instead of literal UTF-8 characters.

Here is how the raw data looks in Splunk:

Searching for field_a="Üü_Öö_Ää_some_text" does not work. The only way is to use a wildcard (*):

field_a="*_some_text"

which is very inefficient if the first character is an umlaut.

The other workaround would be to search for
field_a="\u00dc\u00fc_\u00d6\u00f6_\u00c4\u00e4_some_text"
which is not practicable.
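
The escaped form in the workaround above matches what Python's json.dumps emits by default. The sketch below only illustrates that behaviour, on the assumption that the event is serialized with the standard json module; the thread does not show the serialization code itself, and the variable names are illustrative.

```python
import json

data = {"field_a": "Üü_Öö_Ää_some_text"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)

# ensure_ascii=False keeps the characters; encode to UTF-8 bytes before sending.
literal = json.dumps(data, ensure_ascii=False)
payload = literal.encode("utf-8")

print(escaped)  # {"field_a": "\u00dc\u00fc_\u00d6\u00f6_\u00c4\u00e4_some_text"}
print(literal)  # {"field_a": "Üü_Öö_Ää_some_text"}
```

Both strings decode to the same JSON value; only the raw text that gets indexed differs.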

We are using the "Splunk Add-on for Microsoft Cloud Services" app, which uses solnlib:
https://splunkbase.splunk.com/app/3110/
When our security team tries to find names with umlauts (for example, Müller), the search does not work, which is a big problem.
They practically always have to use *, and as you know, the Azure data is pretty big.

With syntax highlighting enabled, the data is displayed identically in both cases, which is very confusing for users, who expect the search for umlauts to work.
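
A rough way to see why both cases can look the same yet search differently, assuming the highlighted view shows decoded JSON values while the search term is matched against the raw event text (an assumption for illustration, not a statement about Splunk internals):

```python
import json

# Two raw events: escaped (before the fix) and literal (after the fix).
raw_escaped = r'{"field_a": "\u00dc\u00fc_\u00d6\u00f6_\u00c4\u00e4_some_text"}'
raw_literal = '{"field_a": "Üü_Öö_Ää_some_text"}'

# A JSON-aware viewer decodes the escapes, so both events show the same value.
same_when_parsed = json.loads(raw_escaped) == json.loads(raw_literal)

# A plain text match on the raw event only finds the literal form.
term = "Üü_Öö_Ää_some_text"
found_in_escaped = term in raw_escaped
found_in_literal = term in raw_literal

print(same_when_parsed, found_in_escaped, found_in_literal)  # True False True
```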

After the fix the raw data looks like this and the Splunk searches are working correctly:

ghost · 2021-11-24T15:35:03Z

I passed this issue to the team working on the "Splunk Add-on for Microsoft Cloud Services"; the next update will be next week.

artemrys · 2022-11-14T15:17:41Z

@zszia thanks for the contribution, and sorry that it took so long to merge. I've just added tests to verify the behaviour in the future.

srv-rr-github-token · 2022-11-14T15:22:51Z

🎉 This PR is included in version 4.8.2 🎉

The release is available on:

v4.8.2
GitHub release

Your semantic-release bot 📦🚀

zszia requested a review from a user November 18, 2021 14:28

zugf added a commit to zugf/Splunk-Class-httpevent that referenced this pull request Dec 15, 2021

fix: support non-ascii chars in event messages

7b3b4f1

handles same issue as addressed for solnlib (see splunk/addonfactory-solutions-library-python#112)

artemrys requested review from artemrys and removed request for a user July 2, 2022 16:47

fix: handle JSON character encoding for hec (Umlauts)

de7f361

After applying the fix searching for a field with Umlauts is now working correctly.

artemrys requested a review from okashaev-splunk as a code owner November 10, 2022 15:50

artemrys added 3 commits November 10, 2022 16:53

style: pre-commit

e43c151

test: update integration tests to include fixed scenario

a476020

test: sleep before search

9ffddfc

artemrys approved these changes Nov 14, 2022

artemrys merged commit 699e387 into splunk:main Nov 14, 2022

github-actions bot locked and limited conversation to collaborators Nov 14, 2022

srv-rr-github-token added the released label Nov 14, 2022
