Read multiple large log files at the same time
If you find yourself trying to view log files generated by multiple applications side by side, and struggling to come up with the perfect awk or sed syntax to parse and sort their events, then maybe this will help.
This application makes a lot of assumptions that are not necessarily true. Maybe this will improve over time. The assumptions are:
- Each line in a log file contains a timestamp somewhere, in some format.
- The log files might be structured or unstructured.
- Each entry is a single line (sorry, stack traces will look weird).
- All the log files have approximately the same start time.
- The log files could be very large. Who has heard of logrotate anyway?
How it works:
- Multiple files are opened, and their content is fanned into a single goroutine responsible for sorting.
- The sorting goroutine maintains a heap ordered by timestamp. The size of the heap determines the window for correct sorting: the larger the heap, the more out-of-order log entries it can handle.
- Once the heap has reached capacity, the entries with the earliest timestamps are most likely in the correct order, so those are popped off for display.
- Finally, once all the inputs have finished reading, the heap is drained and the remaining lines are popped off, in order, for display.
Processing files that are not close to each other in time, or files that are much more verbose than others, will result in incorrect sorting. Maybe I'll fix this in the future. This happens because the fan-in reads evenly from all inputs, but some files might have timestamps well into the future. Those entries just hang out in the heap, taking up space, until eventually the heap is full of them and entries from the other files start coming out in the wrong order.
No. Use a log aggregator.