Skip to content
This repository has been archived by the owner on Oct 5, 2019. It is now read-only.

Fixed invalid number mappings for datetime strings #157

Merged
merged 2 commits into from
Sep 18, 2017

Conversation

leeren
Copy link
Contributor

@leeren leeren commented Sep 14, 2017

This fixes GH-156, where timestamp keys whose values could not be correctly rendered to timestamp strings were ultimately mapped to their original Number types instead of string types. As described in the issue, the problem with this was that when later ingested by analytics frameworks like Elasticsearch, type conflicts could arise since some values were interpreted as longs and others as strings. The fix is to by default return a very early datetime if rendering fails.

@leeren leeren self-assigned this Sep 14, 2017
@@ -355,10 +355,11 @@ def _normalize_val(val, key=None):
:returns: A string
"""
# If the key hints this is a timestamp, try to use some popular formats
if key and any([-1 != key.lower().find(hint) for hint in ['time', 'utc', 'date', 'accessed']]):
if key and any([-1 != key.lower().find(hint) for hint in ['time', 'utc', 'date', 'accessed']]) and not 'times' in key.lower():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas if we could refactor the first part of the if statement to also use the in keyword instead of any + find + in combo?

if ts:
return _datetime_to_string(ts)
if not ts:
ts = datetime.fromtimestamp(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment about what is going on here. It maybe not clear why we are assigning the default datetime here when there is not ts calculated before.

@leeren leeren force-pushed the issue_156_invalid_timestamp_types branch from 9c2e5c1 to 2c1e80e Compare September 15, 2017 20:20
@leeren
Copy link
Contributor Author

leeren commented Sep 15, 2017

Currently the heuristics for checking if a key maps to a timestamp is seeing if the key A contains certain keywords, and B has a value that is within a certain epoch timeframe. Timestamps failing to meet B should thus be given a default timestamp value. These fields must not however be grouped together with non-timestamp keys meeting A, which represent a huge subset.

One solution would be to whitelist all known timestamp keys for timestamp rendering, and giving those whose values fail to be rendered the default timestamp. However, this would be overkill since there are hundreds if not thousands of osxcollector-output timestamp-based keys.

The most cost-effective solution right now in my opinion is to simply check for known timestamp keys failing to meet B . From what I've seen, this list is quite small, and we can for now just check it in the code by means of list filtering as done here. As an extension this could be instead made into a JSON containing all keys with known mapping issues.

Copy link
Contributor

@jjsendor jjsendor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it!

@leeren leeren merged commit 0fe96a8 into master Sep 18, 2017
@leeren leeren deleted the issue_156_invalid_timestamp_types branch September 18, 2017 22:42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Timestamp keys are incorrectly mapped to numbers instead of datetime-based strings when invalid
2 participants