Support Compression of Rolled Log Files #48
Thanks for the suggestion! We don't have any plans to attempt this at present - the sink is heavily text-oriented. It would be interesting to see if there are any extension points we could provide that would enable it, but I suspect they would need to be quite intricate.
If I'm not mistaken, all the actual writing is centralised in

A couple of notes:
An alternative is to have some kind of hook for when files are rolled, allowing the application to do something like ZIP the file. But I'm not sure if there is any 'precedent' for hooks in any of the other Serilog sinks? Personally, in any case I'd favour a streaming implementation for its more predictable CPU and disk usage.
@nblumhardt have you any thoughts on this?
Thanks for the nudge! Personally, and this might not be an informed point of view :-), I don't find compressed-file logging appealing. On Windows at least, it's likely easier and more ergonomic just to mark a folder as "compressed" and let the OS/filesystem deal with it, if you're stuck with low-disk-space local file logging. With disk being at least as cheap to expand as CPU, it seems like adding storage is also probably a better bang-for-the-buck solution.

Adding something like a

There are likely quite a few gory details to figure out, though - it'd require some dedicated time and effort. If anyone's interested in enumerating the use cases, how they'd be achieved, and the rough API/internal design, it'd be a worthwhile discussion.
My particular use case is for logging audit logs straight to an archive - the expectation is that these will be needed only rarely, so it makes sense to compress them from the start. I also need it to work on both Windows and Linux. While storage is cheap, we also expect to log a lot of data, and with excellent compression ratios achievable with almost no impact on CPU or memory, it's hard to think of a reason not to compress files.

Have you ever used Windows compressed folders before? I ask because if you had, you likely wouldn't have suggested it ;) They have unfortunately always had terrible compression ratios and awful performance.

If I get some time, I'll come up with a couple of suggestions, for both the streaming (some kind of
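To make the streaming idea concrete: a minimal sketch of compressing log output as it is written, by wrapping a file stream in a `GZipStream`. This is not Serilog's actual API - `GZipLogWriter` and its behaviour are my invention for illustration; appending produces concatenated gzip members, which standard decompressors handle.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical sketch: wrap the output FileStream in a GZipStream so log
// events are compressed as they are written, rather than after rolling.
class GZipLogWriter : IDisposable
{
    readonly StreamWriter _writer;

    public GZipLogWriter(string path)
    {
        // FileShare.Read chosen for illustration; each append starts a new gzip member.
        var file = File.Open(path, FileMode.Append, FileAccess.Write, FileShare.Read);
        var gzip = new GZipStream(file, CompressionLevel.Fastest);
        _writer = new StreamWriter(gzip, Encoding.UTF8);
    }

    public void WriteEvent(string renderedEvent) => _writer.WriteLine(renderedEvent);

    // Disposing flushes the writer and finalizes the gzip stream.
    public void Dispose() => _writer.Dispose();
}
```

`CompressionLevel.Fastest` reflects the low-CPU-impact argument above; the trade-off against ratio is tunable.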
Don't say I didn't warn you ;-). We use Snappy compression in Seq 👍. I should have further qualified:
It's been a long time since I thought of a local log file as anything more than a temporary destination while logs make their way off to a central archive, where compression is pretty normal. Auditing directly to the archive is an interesting twist on this. Will be interesting to see what you come up with!
I have started to work on resolving this issue with @lajoie for our specific use case. We are basing the compression choice parameters off Java's logback library. As of now I am planning on checking parameters at the end of a roll period for compression, then compressing the log file using .NET's ZipFile to create a zip for each file. Once the compressed version is created, the original file will be deleted. I am working on gzip compression as well. Do you have any suggestions related to this method or things I should consider?
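As a sketch of that compress-after-roll approach (method and names are illustrative, not the actual patch): zip the rolled file with `System.IO.Compression.ZipFile`, then delete the original only once the archive exists.

```csharp
using System.IO;
using System.IO.Compression;

static class RollCompressor
{
    // Compress a rolled log file into a sibling .zip, then remove the original.
    public static void CompressRolledFile(string logFilePath)
    {
        var zipPath = Path.ChangeExtension(logFilePath, ".zip");
        using (var archive = ZipFile.Open(zipPath, ZipArchiveMode.Create))
        {
            // Store the file under its original name inside the archive.
            archive.CreateEntryFromFile(logFilePath, Path.GetFileName(logFilePath));
        }
        File.Delete(logFilePath);
    }
}
```

Deleting only after the archive is closed means a crash mid-compression leaves the original intact, at the cost of possibly leaving a partial .zip behind.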
@AdamUdovich I've just got round to thinking about this again too. I'm interested to see what you do, and was wondering if it's extensible? In particular, it would be nice to easily switch out the compression algorithm - for example by specifying your own implementation of an

@nblumhardt what do you think about this? If we put aside the option of streaming compression and focus on compressing rolled files, these are the options I had been considering:
@cocowalla I do not currently have plans to make the compression system extensible, but I can check with @lajoie to see if it is something we would want to work on. One question you might be able to answer is how unit tests should be structured for testing the rolling compression. Do you have specific testing goals for a basic Zip and GZip system that Serilog would like to see?
@AdamUdovich I think there are 2 parts:
There are existing tests for file rolling, so it makes sense to build on them. Also, just so it's clear: I don't own this repo, I'm just a Serilog user who has written a few sinks, and would like compression to be added :) @nblumhardt, would be great to get your thoughts on compression, especially with regards to the last few comments here?
Ok, thank you @cocowalla! I do not have an option to not delete non-compressed files at the moment. I can check if this is something I will build in, or if a repo admin says it should be included.
@AdamUdovich Linux'
@cocowalla do you think Serilog would like to see functionality based on logrotate for this type of compression functionality? Is this a question more for @nblumhardt on what the repo would like to see done, structure-wise, in general?
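For reference, logrotate's compress-on-rotate behaviour is configured with a policy like the one below (an illustrative example, not a recommendation for this sink). Its `delaycompress` directive is worth noting: it leaves the most recent rotation uncompressed so a process that still has the old file open isn't disturbed - the same synchronization concern raised in this thread.

```
# Illustrative logrotate policy: rotate daily, keep 14, compress old rotations
/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
}
```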
@AdamUdovich yes, I'd consider the option to delete compressed files as essential for any log rolling feature. Would be good though to get some input from @nblumhardt on the questions of extensibility, algorithm support and the like.
@cocowalla 👍 I had in mind that the primary challenge here would be to provide an appropriate hook/set of hooks in the file sink to allow compression (and other odd cases such as perhaps "headers" and "footers") to be plugged in. At least at the outset, I think it'd be better to have different schemes and algorithms provided as additional packages/projects, not under this repo.

The main challenges around doing this (from the File sink's perspective), IIRC, are synchronization (the sink currently has no means of knowing when the last writer has finished with the outgoing file, and we'd ideally not introduce more contention in fixing this) and state management (at startup, maybe after a crash, for example, do we have any outstanding "rolled" hooks to invoke?).

HTH; seems like getting a fork going and experimenting with different approaches might be a way to work through this all more smoothly. Cheers!
@nblumhardt do you mean that the file can still be written to after the call to

@AdamUdovich would be great if you could push your fork to GitHub so we can see how you're approaching some of the challenges?
@nblumhardt as an experiment, I just added support for wrapping the output stream, which enables things like streaming GZIP compression. Turns out this is much simpler than adding a 'rolling hook' 😏

My fork: https://github.com/cocowalla/serilog-sinks-file/tree/feature/streaming-compression

I've only added this to

This is just an experiment, so I've kept it as bare-bones as possible. One thing of note is that the
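To make the stream-wrapping idea concrete, the extension point might look something like this (the interface name and shape are my invention for illustration, not the fork's actual code):

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical extension point: the sink passes the raw file stream to a
// user-supplied wrapper and writes through whatever stream comes back.
public interface ILogStreamWrapper
{
    Stream Wrap(Stream underlying);
}

// Example wrapper enabling streaming GZIP compression of everything written.
public class GZipStreamWrapper : ILogStreamWrapper
{
    public Stream Wrap(Stream underlying) =>
        new GZipStream(underlying, CompressionLevel.Fastest);
}
```

Because the wrapper composes over any `Stream`, the same hook would also cover encryption or buffering without the sink knowing about any specific algorithm.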
@cocowalla I will be able to push my fork in the next couple of days. We are reviewing it internally first. My GZip compression uses a stream like your test; however, I do not use the sink.Emit(event) call. Can you touch on what Emit() does in your test on lines 162-163?
@AdamUdovich you had earlier said you would be implementing compression after rolling, which is why I created a fork in the other direction, using streaming compression.
@nblumhardt have you had any further thoughts about streaming vs rolling compression? If you think streaming compression is a goer, I'm happy to develop my fork a bit further.
No opinions here, currently, @cocowalla :-) Sorry to not be more help right now but watching with interest.
@cocowalla I opened a pull request with my changes to support compression on roll. If I should not open a pull request and should just share the branch, let me know. I have included a few unit tests as well, which check the magic number bytes.
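A magic-number check of the kind mentioned here can be sketched as follows (not the PR's actual test code; `MagicNumberCheck` is invented for illustration). GZIP files begin with the two signature bytes 0x1F 0x8B, so a test can assert the compressed output starts with them.

```csharp
using System.IO;

// Sketch of a magic-number assertion: does the file start with the GZIP signature?
static class MagicNumberCheck
{
    public static bool LooksLikeGZip(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            return stream.ReadByte() == 0x1F && stream.ReadByte() == 0x8B;
        }
    }
}
```

The analogous check for .zip archives would look for the "PK" signature (0x50 0x4B).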
@AdamUdovich I've added a few questions/comments on a few of your most recent commits, but I confess I'm finding your changes a little difficult to follow. Perhaps you could talk through exactly what is happening here:

```csharp
var prevLog = Directory.GetFiles(_roller.LogFileDirectory, _roller.DirectorySearchPattern)
    .Select(Path.GetFileName).LastOrDefault();
var logDirectory = _roller.LogFileDirectory;
CloseFile();

if (_compressionType != CompressionType.None)
{
    Compress(prevLog, logDirectory, _compressionType);
    var directoryFiles = Directory.GetFiles(_roller.LogFileDirectory, _roller.DirectorySearchPattern);
    foreach (var directoryFile in directoryFiles)
    {
        var directoryFileName = Path.GetFileName(directoryFile);
        // Not sure if we can assume log files will always be .txt?
        if (Path.GetExtension(directoryFileName) == ".txt")
        {
            Compress(directoryFileName, logDirectory, _compressionType);
        }
    }
}

// ...

public void ZipCompress(string logFile, string logDirectory)
{
    // Move the file into a scratch directory, so the archive contains just this one file
    var readDirectory = Path.Combine(logDirectory, "new_dir");
    Directory.CreateDirectory(readDirectory);
    File.Move(Path.Combine(logDirectory, logFile), Path.Combine(readDirectory, logFile));

    var zipName = Path.GetFileNameWithoutExtension(logFile);
    var zipPath = Path.Combine(logDirectory, $"{zipName}.zip");
    ZipFile.CreateFromDirectory(readDirectory, zipPath);
    Directory.Delete(readDirectory, true);
}

public void GZipCompress(string logFile, string logDirectory)
{
    var inFile = Path.Combine(logDirectory, logFile);
    var logName = Path.GetFileNameWithoutExtension(logFile);
    var outFile = Path.Combine(logDirectory, $"{logName}.gz");

    using (var outputStream = File.Create(outFile))
    using (var inputStream = File.OpenRead(inFile))
    using (var gzipStream = new GZipStream(outputStream, CompressionMode.Compress))
    {
        const int bufferSize = 1024;
        inputStream.CopyTo(gzipStream, bufferSize);
    }

    File.Delete(inFile);
}
```

Also:
Hi @cocowalla, sorry for the delay. We did not think it was necessary to add an option to keep the original files in an uncompressed state after compressing them. Is this something @nblumhardt would have to comment on, for what the final option should be?

I can adjust the buffer size if you think this would be best. The default buffer looks to be about 80 KB: https://docs.microsoft.com/en-us/dotnet/api/system.io.stream.copyto?view=netframework-4.7.2

I am not sure about addressing any synchronization or state issues. I asked @lajoie about these and we do not believe they are necessary for our project. I can address them if @nblumhardt is able to elaborate on them some more and time permits.

Finally, the
@nblumhardt I'm still thinking that the best way forward here is to add 2 extensibility points that allow for either/both of:
These 2 would enable the creation of a range of new sinks that extend

My fork demonstrated the ease with which the former can be added. Wrapping the output stream supports a variety of use cases, such as streaming compression and encryption (see #71). For the latter, I personally don't think compression should be built-in to

@nblumhardt, you've been unapologetically quiet on this so far 😄! What are your thoughts on the question of either building in specific ZIP compression functionality, or adding extensibility points that enable a range of scenarios? Also, if you like the idea of extensibility points, would you accept a PR for my stream-wrapping extensibility point, if I first ensure it plays nicely with JSON/AppSettings configuration? (At the least, I guess I'd need to switch out the
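The pair of extension points described above might be shaped roughly as follows (a hypothetical sketch; the class and method names are invented here, and the real API is whatever the eventual PR defines):

```csharp
using System.IO;
using System.Text;

// Hypothetical lifecycle hooks: wrap the output stream when a file is opened,
// and get a callback once a rolled file is no longer being written to.
public abstract class FileSinkHooks
{
    // Return the stream the sink should write through (e.g. a compressing wrapper).
    public virtual Stream OnFileOpened(Stream underlying, Encoding encoding) => underlying;

    // Invoked after the sink has finished with a rolled-out file.
    public virtual void OnFileRolled(string path) { }
}
```

An abstract base class with no-op defaults (rather than an interface) would let new hooks be added later without breaking existing implementers, which suits the non-breaking constraint discussed here.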
Hey! Nice to see this rolling along ;-) Regarding zip-specifics, no, I don't think we'd want to deal with compression, encryption, etc. in this repo. Some basic hooks like you've described sound best, as long as they're easy enough to use correctly (i.e. we're clear on what synchronization is required by implementers).

Is the stream wrapping alone enough for your use case, @cocowalla? If so, perhaps we start on a PR that adds the extension point, in a non-breaking way, to

Ideally we'd take the same approach to the post-roll handler - there's more to work out for that one, but perhaps once the first PR is in, we can take a look at adding on top?

Regarding concurrency and so on, @AdamUdovich - coming back around to this now, we could probably figure out most gotchas in the PR process.

Finally, to get it into a stable release, I think we'd need at least one usable implementation of each extension point published somewhere (a pair of extension NuGet packages?). Without some wider usage/feedback it will be hard to be sure we've got it right :-)
@nblumhardt excellent, then on my side I'll keep the ball rolling on the stream extensibility point 😆
@nblumhardt I rolled up my sleeves (OK, these are getting boring now 🙄) and got a PR in for you to review.
#81 relates to this.
#80 is merged and on NuGet, let's see how this goes 😎
It would be nice if the File sink supported compressing files as they're rolled out in order to help save on disk space.