UTF-8 Encode/Decode Error Handling #505

Closed
ashurack opened this issue Jan 12, 2023 · 5 comments

@ashurack

ashurack commented Jan 12, 2023

Describe the bug
Custom search commands raise an exception when non-UTF-8 event data is present in the search pipeline.

To Reproduce

  1. Create a custom command
  2. Pass non-UTF-8 field data to the custom command (feel free to use invalid_utf8.csv)

Expected behavior
splunk-sdk-python (and all other potentially impacted SDKs) should handle encoding/decoding in the same manner as Splunk Core.

Logs or Screenshots

Splunk (please complete the following information):

  • Version: 8.2.5
  • OS: Windows 10 Pro 19045.2486
  • Deployment: single-instance

SDK (please complete the following information):

  • Version: 1.7.2
  • Language Runtime Version: Python 3.7
  • OS: Windows 10 Pro 19045.2486

Additional context
My patch - to get my command working ASAP - was to change errors='strict' to errors='replace' here. I chose replace since it mimics the functionality of Splunk. I didn't touch any other instances of errors='strict' and only tested this against StreamingCommand.

This bug is not limited to the inputlookup command but it is the easiest way to reproduce.
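
The difference between the two error handlers can be seen with plain `bytes.decode`, independent of the SDK. This is only a minimal sketch: `decode_body` is a hypothetical helper (not the SDK's function), and the sample bytes are made up for illustration.

```python
# Hypothetical helper for illustration; not part of splunk-sdk-python.
def decode_body(body: bytes, errors: str = "strict") -> str:
    """Decode a chunk body as UTF-8 using the given error handler."""
    return body.decode("utf-8", errors)


# 0x8e is a continuation byte, so it is invalid as the start of a character.
sample = b"field=caf\x8e"

try:
    decode_body(sample)  # errors='strict' raises on the bad byte
except UnicodeDecodeError as exc:
    print("strict failed:", exc.reason)

# errors='replace' substitutes U+FFFD for each undecodable byte.
# ascii() keeps the output printable on any console encoding.
print(ascii(decode_body(sample, errors="replace")))
```

With 'replace' the command keeps running, at the cost of losing the original bytes in the affected field values.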

@ashah-splunk
Contributor

@ashurack, we were unable to reproduce the issue and successfully uploaded the given CSV file. A screenshot is below:
image

Note: we are using a streaming custom search command (CSC) here. No modification is applied to the data read from the CSV file.

Please share your CSC if possible, and let us know if we are missing anything from the steps to reproduce.

@ashurack
Author

Looks like the CSV file I uploaded is 100% valid UTF-8. I'll try to get a sample that will trigger the decoding issue this week. Message me on Splunk Slack in the meantime for more details.
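
For anyone building their own sample in the meantime, one way to confirm that a test file really contains invalid UTF-8 is to check the raw bytes before uploading. The sketch below is an assumption-laden example: `is_valid_utf8` and `build_invalid_csv` are hypothetical names, and the CSV contents are invented (this is not the original invalid_utf8.csv).

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def build_invalid_csv() -> bytes:
    """Return a tiny CSV whose last row holds a raw 0x8e byte (invalid UTF-8)."""
    return b"id,value\n1,ok\n2,bad_\x8e\n"


data = build_invalid_csv()
# Check the bytes first; a file that passes this check will not trigger the bug.
print("valid UTF-8:", is_valid_utf8(data))
```

The bytes would need to be written in binary mode (`open(path, "wb")`) so that no text-mode encoding step silently rewrites them.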

@pabloperezj

Hi @ashah-splunk, same error here (Splunk 9.0.1, Debian GNU/Linux 11, Python 3.7.11). Loading events from CSV doesn't work as expected (non-UTF-8 characters are parsed to UTF-8). I am using the botsv3 dataset and getting the same problem as @ashurack with the vt4splunk streaming command from VT4Splunk. The solution suggested by @ashurack seems to work properly.

  • Event example:
Screenshot 2023-06-14 at 10 38 36
  • Error:
Screenshot 2023-06-14 at 14 36 09
  • search.log:
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr: UnicodeDecodeError at "/opt/splunk/etc/apps/TA-virustotal-app/bin/ta_virustotal_app/aob_py3/splunklib/six.py", line 917 : 'utf-8' codec can't decode byte 0x8e in position 294568: invalid start byte
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr: Traceback:
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:   File "/opt/splunk/etc/apps/TA-virustotal-app/bin/ta_virustotal_app/aob_py3/splunklib/searchcommands/search_command.py", line 780, in _process_protocol_v2
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:     self._execute(ifile, None)
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:   File "/opt/splunk/etc/apps/TA-virustotal-app/bin/ta_virustotal_app/aob_py3/splunklib/searchcommands/streaming_command.py", line 55, in _execute
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:     SearchCommand._execute(self, ifile, self.stream)
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:   File "/opt/splunk/etc/apps/TA-virustotal-app/bin/ta_virustotal_app/aob_py3/splunklib/searchcommands/search_command.py", line 855, in _execute
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:     self._execute_v2(ifile, process)
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:   File "/opt/splunk/etc/apps/TA-virustotal-app/bin/ta_virustotal_app/aob_py3/splunklib/searchcommands/search_command.py", line 948, in _execute_v2
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:     result = self._read_chunk(istream)
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:   File "/opt/splunk/etc/apps/TA-virustotal-app/bin/ta_virustotal_app/aob_py3/splunklib/searchcommands/search_command.py", line 912, in _read_chunk
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:     return metadata, six.ensure_str(body)
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:   File "/opt/splunk/etc/apps/TA-virustotal-app/bin/ta_virustotal_app/aob_py3/splunklib/six.py", line 917, in ensure_str
ERROR ChunkedExternProcessor [1549763 ChunkedExternProcessorStderrLogger] - stderr:     s = s.decode(encoding, errors)
ERROR ChunkedExternProcessor [1549758 localCollectorThread] - EOF while attempting to read transport header read_size=0
ERROR ChunkedExternProcessor [1549758 localCollectorThread] - Error in 'vt4splunk' command: External search command exited unexpectedly with non-zero error code 1.
ERROR LocalCollector [1549758 localCollectorThread] - SearchMessage orig_component=LocalCollector sid= message_key= message=Error in 'vt4splunk' command: External search command exited unexpectedly with non-zero error code 1.
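
The traceback reports only a byte value and an offset ("byte 0x8e in position 294568"). To see what surrounds the offending byte, the attributes on `UnicodeDecodeError` can be used. A rough sketch follows, assuming you have captured the raw chunk body as `bytes`; `describe_decode_error` is a hypothetical helper, not something the SDK provides.

```python
def describe_decode_error(body: bytes, context: int = 8):
    """Return (position, offending byte, nearby bytes), or None if body is valid UTF-8."""
    try:
        body.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        lo = max(exc.start - context, 0)
        hi = exc.start + context
        return exc.start, body[exc.start:exc.start + 1], body[lo:hi]
    return None


# Made-up payload standing in for a captured chunk body.
print(describe_decode_error(b"host=web01 user=jos\x8e action=login"))
```

Looking at the neighbouring bytes usually makes it clear which field and which source encoding (for example a legacy single-byte code page) produced the bad byte.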

@ashah-splunk
Contributor

@pabloperezj, sorry for the delayed response. We were able to reproduce the issue using the botsv3 dataset. During verification we also found that the issue occurs only for certain non-UTF-8 characters. We are validating the change suggested by @ashurack and will update the SDK accordingly.

We will let you know once a new SDK release with the change is available.

@ashah-splunk
Contributor

@ashurack, @pabloperezj: the fix is available in the latest Python SDK, v1.7.4. Please pull the latest SDK release, and re-open this issue if the problem persists. Thanks!
