Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid calling decode() and allocating strings for logging if said log lines will no-op anyways. #93

Merged
merged 2 commits into from
Apr 4, 2024

Conversation

judilsteve
Copy link

These lines were throwing in my use case because chunk was not a valid UTF-8 byte sequence.

Even when these lines don't throw exceptions, they're still a performance penalty incurred in cases where the chunk object is large: The f-string is evaluated and allocated, even if the log level means that the debug logs are skipped, because the arguments to the _log.debug() function must be computed before the function can be called.

This PR proposes a simple fix.

@judilsteve
Copy link
Author

judilsteve commented Apr 3, 2024

For anyone interested, the instigator for this was implementing brotli compression:

from typing import Any, AsyncIterable
from sse_starlette.sse import EventSourceResponse, ServerSentEvent
import brotli
import orjson

def compressed_eventsource(events: AsyncIterable[Any]):
    compressor = brotli.Compressor(mode=brotli.MODE_TEXT, quality=7)

    async def compress_event_stream(events: AsyncIterable[Any]):
        async for event in events:
            encoded = ServerSentEvent(**event).encode()
            compressed = compressor.process(encoded) + compressor.flush()
            yield compressed
        yield compressor.finish()

    # Pings must be compressed too; can't mix and match compressed data with plaintext
    def ping():
        ping = ServerSentEvent(comment=f"ping - {datetime.utcnow()}")
        return compressor.process(ping.encode()) + compressor.flush()

    return EventSourceResponse(compress_event_stream(events), headers={'Content-Encoding': 'br'}, ping_message_factory=ping)

app = FastAPI()

@app.get('/test_brotli')
async def test_brotli():
    async def event_stream():
        for i in range(5):
            yield { 'data': f"Hello, world {i}" }
            await asyncio.sleep(timedelta(seconds=5).total_seconds())

    return compressed_eventsource(event_generator())

The above minimal example generates valid brotli-compressed event streams (as far as I can tell; I tested in Firefox 125).

@judilsteve
Copy link
Author

judilsteve commented Apr 3, 2024

Passing f"chunk: {chunk}" to the _log.debug() call instead of f"chunk: {chunk.decode()}" would also potentially be wise: This will cause the chunk to be written out into the logs as a byte-string literal, e.g. b'foobarbaz', with \xFF-style escape sequences for byte sequences that don't represent valid UTF-8 characters (instead of just throwing). But the call to _log.debug() should still be conditional to avoid computing/allocating this (potentially very large) string unless it's actually going to be logged.

@sysid
Copy link
Owner

sysid commented Apr 3, 2024

@judilsteve thanks for your PR. Seems to be an interesting use-case that you have. Could you please provide an example for a chunk which would be better handled by f"chunk: {chunk}"instead of f"chunk: {chunk.decode()}".

Would _log.debug("chunk: %s", chunk) also meet your objective of having a noop in case log-level is higher than debug? I'd prefer this syntax, actually.

@judilsteve
Copy link
Author

Could you please provide an example for a chunk which would be better handled by f"chunk: {chunk}"instead of f"chunk: {chunk.decode()}".

Almost any chunk compressed with brotli will cause chunk.decode() to throw, since .decode() is trying to interpret the byte sequence into a UTF-8 string, but brotli-compressed data is raw binary.

There are certain sequences of bytes that do not map to valid unicode characters in UTF-8. Some decoders will map these unrepresentable sequences to a "default" replacement char, but if calling .decode() with no arguments, it will raise a UnicodeError. The handling of unrecognised byte sequences can be configured via the errors kwarg, but you're still losing information compared to just logging str(chunk), which will print escape sequence literals to show you the actual content of those bytes.

Would _log.debug("chunk: %s", chunk) also meet your objective of having a noop in case log-level is higher than debug? I'd prefer this syntax, actually.

Yes, as far as I know, this syntax also defers any computation of large strings until it is known that the log line will actually be emitted. I only kept the f-string in my initial implementation in case it was part of a deliberate style choice to use f-strings over sprintf-style templating. I will update the PR to use this syntax.

@sysid sysid merged commit ab6f3ef into sysid:main Apr 4, 2024
0 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants