
Logging: support sending structured logs to stackdriver via stdlib 'logging'. #13

Closed
yuvipanda opened this issue Oct 31, 2018 · 20 comments
Labels: api: logging · priority: p2 · type: feature request
@yuvipanda

google-cloud-logging version: 1.8.0

I want to send structured logs to stackdriver from Python using the standard logging module. You can integrate structured logging to the Python logging module with either https://github.com/madzak/python-json-logger or http://www.structlog.org/en/stable/. Would be great to preserve those structures as part of jsonPayload when sent to stackdriver. Currently, these get stringified and sent as jsonPayload.message instead.

@tseaver tseaver changed the title Logging: Can't send structured logs to stackdriver with Python Logger Module Logging: support ending structured logs to stackdriver via stdlib 'logging'. Dec 7, 2018
@parthmishra

What is the status of this request? I know the root issue is that the formatter must know the attributes of the extra arguments passed into the stdlib logger, but is it too much of a hassle, to the point of being a won't-fix? My hesitation with the log_struct() method is its synchronous nature; perhaps an asynchronous transport mechanism for it could be a useful workaround?
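For context, the synchronous path in question is Logger.log_struct(); batching amortizes the round trip but still blocks on commit (a minimal sketch, the log name is just an example):

from google.cloud import logging as google_logging

client = google_logging.Client()
cloud_logger = client.logger("structured-demo")

# one blocking API call per entry
cloud_logger.log_struct({"message": "step 1", "foo": 1}, severity="INFO")

# batching: still synchronous, but one API call for several entries
batch = cloud_logger.batch()
batch.log_struct({"message": "step 2", "bar": 2}, severity="INFO")
batch.log_struct({"message": "step 3", "baz": 3}, severity="WARNING")
batch.commit()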

@tseaver commented Jun 11, 2019

@parthmishra The request as it stands (extract the "extra" fields from stdlib's LogRecord and convert them to log_struct) is unlikely to be implemented in this library. This comment on issue googleapis/google-cloud-python#6488 suggests one possibility for working around it.

To deal just with the synchronous bit, you could interact directly with the "background thread handler" transport. Its main worker method just polls the queue for lists of dicts to be pushed to the API: https://github.com/googleapis/google-cloud-python/blob/a14ffbaa50f7823c2792e91413a37cbc3ce687f5/logging/google/cloud/logging/handlers/transports/background_thread.py#L134-L152
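A rough sketch of that direct interaction, assuming (per the linked lines) the worker drains dicts whose keys match Batch.log_struct's parameters; this pokes at private internals, so treat it as fragile:

from google.cloud import logging as google_logging
from google.cloud.logging.handlers.transports import BackgroundThreadTransport

client = google_logging.Client()
transport = BackgroundThreadTransport(client, "my-log")

# enqueue a structured entry; the background thread batches queued items
# and pushes them to the API without blocking the caller
transport.worker._queue.put_nowait({
    "info": {"message": "hello", "request_id": "abc123"},
    "severity": "INFO",
})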

@crazystick

I wanted to do something similar in order to have errors show up in Stackdriver Error Reporting, which is possible according to this issue I opened. I ended up doing it like this, but I think it would be neater if you could subclass _Worker and inject it into CloudLoggingHandler alongside the transport (a sketch of that follows the snippet below). Anyway, this works:

import logging
from datetime import datetime

import google.cloud.logging
from google.cloud.logging.handlers import CloudLoggingHandler
from google.cloud.logging.handlers.transports.background_thread import BackgroundThreadTransport, _Worker

def my_enqueue(
        self, record, message, resource=None, labels=None, trace=None, span_id=None):
    entry = {
        "info": {
            "message": message, "python_logger": record.name
        },
        "severity": record.levelname,
        "resource": resource,
        "labels": labels,
        "trace": trace,
        "span_id": span_id,
        "timestamp": datetime.utcfromtimestamp(record.created),
    }

    # warnings and above carry the extra fields Error Reporting looks for
    if record.levelno >= logging.WARNING:
        entry["info"]["context"] = {"reportLocation": {"filePath": record.filename,
                                                       "lineNumber": record.lineno,
                                                       "functionName": record.funcName}}
        entry["info"]["serviceContext"] = {"service": "my_service", "version": "1"}
        entry["info"]["@type"] = "type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent"

    self._queue.put_nowait(entry)

# monkeypatch enqueue function
_Worker.enqueue = my_enqueue

client = google.cloud.logging.Client()
cloud_handler = CloudLoggingHandler(client,
                                    transport=BackgroundThreadTransport,
                                    name=...,
                                    resource=...,
                                    labels=...)
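The "neater" variant mentioned above might look something like this (an untested sketch; it assumes _Worker keeps its logger on self._cloud_logger and exposes start()/stop(), as the 1.x source does):

class StructuredWorker(_Worker):
    def enqueue(self, record, message, resource=None, labels=None,
                trace=None, span_id=None):
        # same entry shape as my_enqueue above, without patching the shared class
        self._queue.put_nowait({
            "info": {"message": message, "python_logger": record.name},
            "severity": record.levelname,
            "resource": resource,
            "labels": labels,
            "trace": trace,
            "span_id": span_id,
            "timestamp": datetime.utcfromtimestamp(record.created),
        })

class StructuredTransport(BackgroundThreadTransport):
    def __init__(self, client, name, **kwargs):
        super().__init__(client, name, **kwargs)
        # swap the stock worker for the subclass, reusing its cloud logger
        self.worker.stop()
        self.worker = StructuredWorker(self.worker._cloud_logger)
        self.worker.start()

Passing transport=StructuredTransport to CloudLoggingHandler would then avoid the monkeypatch.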

@tseaver tseaver changed the title Logging: support ending structured logs to stackdriver via stdlib 'logging'. Logging: support sending structured logs to stackdriver via stdlib 'logging'. Jun 19, 2019
@skunkwerk

thanks @crazystick, that almost worked for me... but I'm seeing logs from the background thread itself in my logs (i.e. "submitted 1 logs", etc.). Is there any way to exclude that logger (google.cloud.logging.handlers.transports.background_thread)?

@crazystick

There is a setup_logging function which you're supposed to use, and it excludes some internal loggers. That function messes around with the root logger, so I do it manually like this. I find that adding the NullHandler is necessary, or you end up with an extra StreamHandler which you might not want.

import logging

EXCLUDED_LOGGER_DEFAULTS = ("google.cloud", "google.auth", "google_auth_httplib2")

# don't propagate excluded loggers (and don't send them to stderr either)
for logger_name in EXCLUDED_LOGGER_DEFAULTS:
    logging.getLogger(logger_name).propagate = False
    logging.getLogger(logger_name).addHandler(logging.NullHandler())

# add cloud handler to root logger
logging.getLogger().addHandler(cloud_handler)
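For comparison, the library's helper does roughly the same wiring against the root logger, with the internal loggers excluded by default:

from google.cloud.logging.handlers import setup_logging

# attaches cloud_handler to the root logger and silences the internal loggers
setup_logging(cloud_handler)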

@plamut plamut transferred this issue from googleapis/google-cloud-python Feb 5, 2020
@product-auto-label product-auto-label bot added the api: logging Issues related to the googleapis/python-logging API. label Feb 5, 2020
@plamut plamut added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Feb 5, 2020
@Voyz commented Feb 6, 2020

Based on @crazystick's answer, I use the following bodge to log custom objects as the info payload. It stays closer to the original _Worker.enqueue and supports passing custom fields.

import datetime

from google.cloud.logging import _helpers
from google.cloud.logging.handlers.transports.background_thread import _Worker

def my_enqueue(self, record, message, resource=None, labels=None, trace=None, span_id=None):
    queue_entry = {
        "info": {"message": message, "python_logger": record.name},
        "severity": _helpers._normalize_severity(record.levelno),
        "resource": resource,
        "labels": labels,
        "trace": trace,
        "span_id": span_id,
        "timestamp": datetime.datetime.utcfromtimestamp(record.created),
    }

    # 'extra' kwargs become attributes on the LogRecord, so test with hasattr
    if hasattr(record, 'custom_fields'):
        queue_entry['info']['custom_fields'] = record.custom_fields

    self._queue.put_nowait(queue_entry)

_Worker.enqueue = my_enqueue

Then

import logging

from google.cloud import logging as google_logging
from google.cloud.logging.handlers import CloudLoggingHandler

logger = logging.getLogger('my_log_client')
logger.addHandler(CloudLoggingHandler(google_logging.Client(), 'my_log_client'))

logger.info('hello', extra={'custom_fields': {'foo': 1, 'bar': {'tzar': 3}}})

Resulting in:
[screenshot: the entry in Stackdriver with custom_fields nested under jsonPayload]

This then makes it much easier to filter on these custom_fields.
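For example, an advanced-logs filter along these lines (the path mirrors the jsonPayload shape above):

jsonPayload.custom_fields.bar.tzar = 3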

Naturally, it would be great if this were natively supported - thanks in advance!

@mauricepoirrier

I tried to implement @Voyz's solution in a Cloud Function and I couldn't make it work.
I think there is some strange behaviour with CF and monkeypatching the module.

import datetime
import logging

from google.cloud.logging import _helpers
from google.cloud.logging.handlers.transports.background_thread import _Worker

def my_enqueue(
    self, record, message, resource=None, labels=None, trace=None, span_id=None
):
    queue_entry = {
        "info": {
            "message": message,
            "python_logger": record.name,
            # raises AttributeError if the record has no 'error_fields' extra
            "error_fields": record.error_fields,
        },
        "severity": _helpers._normalize_severity(record.levelno),
        "resource": resource,
        "labels": labels,
        "trace": trace,
        "span_id": span_id,
        "timestamp": datetime.datetime.utcfromtimestamp(record.created),
    }
    self._queue.put_nowait(queue_entry)


_Worker.enqueue = my_enqueue

And the function...

from google.cloud.logging.handlers import CloudLoggingHandler
from google.cloud.logging.handlers.transports import BackgroundThreadTransport

def index(request):
    logger = logging.getLogger("my_log_client")
    # logger_client is a google.cloud.logging.Client created at module level
    handler = CloudLoggingHandler(
        logger_client, name="my_log_client", transport=BackgroundThreadTransport
    )
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logging.info("hello")  # root logger: the cloud handler is not attached here
    logging.info("hello", extra={"custom_one": 1})
    logger.info("hello", extra={"error_fields": {"foo": 1, "bar": {"tzar": 3}}})
    return ("", 202)

I would expect some error if logging.info("hello") does not have the error_fields key.
Is there any update on whether this will be implemented?

No errors :(

@snickell commented May 21, 2020

EDIT TO UPDATE: I've moved the gist into a repo and uploaded it to PyPI:
https://github.com/snickell/google_structlog

pip install google-structlog

ORIGINAL for historical purposes:

I've extended @Voyz's / @crazystick's work and built a pre-canned gist, google_structlog:
https://gist.github.com/snickell/3b664c668b8a94d1ca6e31b0498a8697

  • Uses structlog (https://www.structlog.org/en/stable/)
  • Doesn't care how you use structlog, no particular naming constraints on extended attributes
  • Results in structured + searchable JSON in stackdriver
  • Preconfigured and works out of the box on GCP Debian 10 images with google-fluentd installed as per GCP instructions.

It's used like so:

import google_structlog

google_structlog.setup(log_name="here-is-mylilapp")

# Now you can use structlog to get searchable json details in stackdriver...
import structlog
logger = structlog.get_logger()
logger.error("Uhoh, something bad did", moreinfo="it was bad", years_back_luck=5)

# Of course, you can still use plain ol' logging stdlib to get { "message": ... } objects
import logging
logger = logging.getLogger("yoyo")
logger.error("Regular logging calls will work happily too")

# Now you can search stackdriver with the query:
# logName: 'here-is-mylilapp'

Which will produce a nice structured / searchable log inside GCP stackdriver:
[screenshot: structured, searchable log entry in Stackdriver]

The main difference from what @Voyz had posted is that it's somewhat structlog-specific and doesn't care what sort of extra values you pass. It also includes all the pre-canned bells and whistles to run out of the box, and shows you what to search for in GCP, so you can work backwards from a default that already works to something tailored to you.

If there's interest, I can polish it a little further and put something up on pypi.

@numbsafari

Seems like this would be something obvious and cough idiomatic to include in this library. Stackdriver doesn't look so hot next to Sentry without this capability out of the box.

@snickell

Yeah, it's a little frustrating to have "structured logging" with no simple way to get in on it and inject your own domain-specific fields. That's where structured logging starts paying for its "not as easy to scan" downsides. The system-provided structure fields are sort of 🤷‍♂️ compared to domain-specific info that's super easy to create in code, and frustratingly hard to get into the log analysis tools in GCP.

@numbsafari

In my case I'm deploying to GKE, so I needed to modify the GKE handler. I went ahead and copied the approach used by Sentry for extracting the "extra" dictionary, and incorporated things like the Python file and line number into a separate dict as well. Works for now.

Perhaps @mauricepoirrier, you need to monkeypatch the ContainerEngineHandler like I did, rather than modifying the transport, in order to get it to work with Cloud Functions?
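A guess at what that handler modification might look like (a sketch only, assuming the stock ContainerEngineHandler.format() emits a one-line JSON payload, as in the 1.x source):

import json
import logging

from google.cloud.logging.handlers.container_engine import ContainerEngineHandler

# attributes every LogRecord carries by default; anything else arrived via 'extra'
_RESERVED = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {"message", "asctime"}

class StructuredGKEHandler(ContainerEngineHandler):
    def format(self, record):
        payload = json.loads(super().format(record))
        # fold the record's 'extra' attributes into the structured payload
        for key, value in record.__dict__.items():
            if key not in _RESERVED:
                payload[key] = value
        return json.dumps(payload, default=str)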

@treethought

@numbsafari do you have an example of what you did to modify the GKE handler?

@nimish commented Jun 18, 2020

What's the solution here on Cloud Functions? Using the google_structlog hack doesn't send the trace info for some reason, and sticks the logs under a different name (fine, but confusing). I'm guessing the log agent on Cloud Functions does this automatically?

@snickell

I dropped the gist into a repo here:
https://github.com/snickell/google_structlog

It's uploaded to PyPI, so it's available as:

pip install google-structlog

@snickell

I'm not using Cloud Functions, so it'd be really cool if somebody who is could figure out how to make it work with "least surprise" in that context too. I'm mostly using Python apps running on GKE, for which this works reasonably well for me.

Hopefully somebody at Google will take this off our hands, but until then I think starting with a repo and refining it with issues etc. is probably warranted at this level of complexity.

@simonz130 simonz130 self-assigned this Jun 19, 2020
@nimish commented Jun 19, 2020

> I'm not using Cloud Functions, so it'd be really cool if somebody who is could figure out how to make it work with "least surprise" in that context too. [...]

I took a look. It seems like most things the default stdout logger sends can be provided, except for the trace id. So it's not a huge deal -- the nodejs lib for GCP logging does it -- but it's not complete.
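One way to fill that gap by hand on Cloud Functions is via the special JSON keys the logging agent recognizes on stdout; the header parsing and project env var here are assumptions about the runtime:

import json
import os

def log_with_trace(message, request, severity="INFO", **fields):
    # X-Cloud-Trace-Context looks like "TRACE_ID/SPAN_ID;o=1"
    trace_id = request.headers.get("X-Cloud-Trace-Context", "").split("/")[0]
    project = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
    entry = {
        "message": message,
        "severity": severity,
        # special key the agent promotes to the entry's trace field
        "logging.googleapis.com/trace": f"projects/{project}/traces/{trace_id}",
        **fields,
    }
    print(json.dumps(entry))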

@caiolopes commented Nov 6, 2020

@snickell your solution is not working for me on Google App Engine, runtime python37... Do you have any thoughts on why?

I really enjoy the structlog lib and was hoping I could use your proposed setup!

Some tests I made:

import logging
import google_structlog

google_structlog.setup(setup_google=True)

logger = google_structlog.getLogger()
logger.info("Google setup 1")

logger = logging.getLogger()
logger.info("Google setup 2")

print("Google setup 3")


Doing the setup proposed here https://cloud.google.com/logging/docs/setup/python works, but not for structured data, of course:

import logging

import google.cloud.logging

# Instantiates a client
client = google.cloud.logging.Client()

# Connects the logger to the root logging handler; by default this captures
# all logs at INFO level and higher
client.setup_logging()

logger = logging.getLogger(__name__)
logger.info("Google setup 4")


@simonz130 simonz130 added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Nov 7, 2020
@simonz130 simonz130 removed their assignment Nov 7, 2020
@HynekBlahaDev

Hi, I recently came across the issue with structured logging for Google Cloud.
My initial idea was to use structlog with two different renderers: one for the console (local development) and one for App Engine (prod), inspired by https://github.com/kiwicom/kiwi-structlog-config.

However, when I ran it in App Engine standard, nothing showed up in the logs... 🤨

The official documentation says that everything logged to stdout will be sent to Cloud Logging.

Note: Logs written to stdout and stderr are automatically sent to Cloud Logging for you, without needing to use Cloud Logging library for Python.

I used this example for writing structured logs, but nothing shows up in the Cloud Logging Explorer.

Did something change, and are the docs no longer up-to-date? Any idea what I could be doing wrong?
Is stdout logging no longer supported, and is that why you are using the google.cloud.logging library?

Thanks a lot!
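For reference, the structured-stdout pattern from that example boils down to printing one JSON object per line; severity and message are the documented fields, and other keys should surface under jsonPayload:

import json

entry = {
    "severity": "NOTICE",
    "message": "This is the default display field.",
    # non-special keys should show up as jsonPayload fields
    "component": "arbitrary-property",
}
print(json.dumps(entry))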

@daniel-sanche daniel-sanche added this to the v2.5.0 milestone Feb 4, 2021
@meseta commented Feb 4, 2021

are we getting structured logging!? woohoo!

@daniel-sanche daniel-sanche removed this from the v2 Update 2 milestone Apr 20, 2021
@lorenzznerol

> I tried to implement @Voyz's solution in a Cloud Function and I couldn't make it work. [...] I would expect some error if logging.info("hello") does not have the error_fields key. Is there any update on whether this will be implemented?

The question here is not about a Cloud Function; for those, you have to use google.cloud.logging instead of the stdlib Python logging.

The question here is just about getting it to work in a local Python script, using GOOGLE_APPLICATION_CREDENTIALS to connect to GCP and send the logs there.
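A minimal local-script setup of that kind might look like this (the log name is just an example; credentials come from GOOGLE_APPLICATION_CREDENTIALS):

import logging

import google.cloud.logging
from google.cloud.logging.handlers import CloudLoggingHandler

client = google.cloud.logging.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS
handler = CloudLoggingHandler(client, name="local-script")

logger = logging.getLogger("local-script")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("hello from a local script")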
