
Bug: replace system fields value from seconds parameters #3217

Open

VladimirGorobetsWildix opened this issue Oct 17, 2024 · 10 comments

Labels: bug (Something isn't working) · confirmed (The scope is clear, ready for implementation) · logger (This item relates to the Logger Utility)

@VladimirGorobetsWildix

Expected Behavior

System fields should not be replaced by values from additional parameters.
If duplicate fields occur, a prefix should be added for fields from additional objects.

For example message -> _message

Current Behavior

import {Logger} from '@aws-lambda-powertools/logger';

const logger = new Logger({
    serviceName: 'production-server',
});

const data = {
    timestamp: 'one year ago',
    name: 'user',
    message: 'message from chat',
    level: 'poweredUser',
    service: 'internal chat',
};

logger.info('test message', {user: 'admin'});

logger.info('very important log key', data);

Received result

{"level":"INFO","message":"test message","sampling_rate":0,"service":"production-server","timestamp":"2024-10-17T14:40:49.209Z","user":"admin"}

{"level":"poweredUser","message":"message from chat","sampling_rate":0,"service":"internal chat","timestamp":"one year ago","name":"user"}

The first log is correct, but in the second log all system fields have been replaced by the fields of the second object.

This is very non-obvious behavior that results in the loss of important logs and costs many hours of debugging to understand the cause.
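To illustrate why the second log comes out this way, here is a minimal sketch of a shallow merge; `mergeExtras` and its reserved-attribute object are illustrative stand-ins, not the actual Powertools internals:

```typescript
// Sketch of a shallow merge of extra input keys into a log record.
// Object spread: later sources win, so keys in `extras` silently
// replace same-named reserved attributes in `base`.
function mergeExtras(
  base: Record<string, unknown>,
  extras: Record<string, unknown>,
): Record<string, unknown> {
  return { ...base, ...extras };
}

const base = {
  level: 'INFO',
  message: 'very important log key',
  service: 'production-server',
  timestamp: '2024-10-17T14:40:49.209Z',
};

const data = {
  timestamp: 'one year ago',
  message: 'message from chat',
  level: 'poweredUser',
  service: 'internal chat',
};

const record = mergeExtras(base, data);
// All four reserved attributes now hold the values from `data`,
// reproducing the second log line shown above.
```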

Code snippet

import {Logger} from '@aws-lambda-powertools/logger';

const logger = new Logger({
    serviceName: 'production-server',
});

const data = {
    timestamp: 'one year ago',
    name: 'user',
    message: 'message from chat',
    level: 'poweredUser',
    service: 'internal chat',
};

logger.info('test message', {user: 'admin'});

logger.info('very important log key', data);

Steps to Reproduce

  1. Run the code snippet.
  2. Observe that the system fields (timestamp, message, level, ...) have been replaced.

Possible Solution

If duplicate fields occur, a prefix should be added for fields from additional objects.

For example message -> _message

Powertools for AWS Lambda (TypeScript) version

latest

AWS Lambda function runtime

20.x

Packaging format used

npm

Execution logs

{"level":"INFO","message":"test message","sampling_rate":0,"service":"production-server","timestamp":"2024-10-17T14:40:49.209Z","user":"admin"}

{"level":"poweredUser","message":"message from chat","sampling_rate":0,"service":"internal chat","timestamp":"one year ago","name":"user"}

@VladimirGorobetsWildix added the bug (Something isn't working) and triage (This item has not been triaged by a maintainer, please wait) labels · Oct 17, 2024

boring-cyborg bot commented Oct 17, 2024

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #typescript channel on our Powertools for AWS Lambda Discord: Invite link

@dreamorosi
Contributor

Hi, thank you for opening this issue.

I believe this is intended, according to our documentation here (emphasis mine):

Pass one or multiple objects containing arbitrary data to be logged. Each data object should be placed in an enclosing object as a single property value, you can name this property as you need: { myData: arbitraryObjectToLog }

To avoid the behavior you're seeing, the data parameter should be within an object:

logger.info('very important log key', { data });

which I confirmed produces:

{
    "level": "INFO",
    "message": "very important log key",
    "sampling_rate": 0,
    "service": "production-server",
    "timestamp": "2024-10-17T15:14:59.825Z",
    "data": {
        "timestamp": "one year ago",
        "name": "user",
        "message": "message from chat",
        "level": "poweredUser",
        "service": "internal chat"
    }
}

This is necessary because by default we merge each parameter with the JSON object of the log, so if your parameters have objects with duplicate keys they'll overwrite existing keys.

I understand how the current wording doesn't explicitly state this, and I am sorry to hear it led to data loss. We'll consider improving the section, perhaps with a more explicit disclaimer.

@dreamorosi added the logger (This item relates to the Logger Utility), discussing (The issue needs to be discussed, elaborated, or refined), and not-a-bug (New and existing bug reports incorrectly submitted as bug) labels, and removed the bug and triage labels · Oct 17, 2024
@dreamorosi moved this from Triage to On hold in Powertools for AWS Lambda (TypeScript) · Oct 17, 2024
@dreamorosi added the need-response (This item requires a response from a customer and will be considered stale after 2 weeks) label · Oct 17, 2024
@VladimirGorobetsWildix
Author

A warning in the documentation is very good, but trying to prevent potential developer errors would be even better.

This behavior is very unobvious because TypeScript does not restrict it in any way, so the developer constantly needs to remember this quirk of the logger.

Yesterday, one of our developers lost a lot of time because of this quirk.

It is very difficult for a developer using your logger for the first time to understand why this works

logger.info('very important log key', {info: data});

but this does not:

logger.info('very important log key', {message: data});

@VladimirGorobetsWildix
Author

In our case, the developer followed your recommendation completely, but got unexpected behavior.

const message = await getMessageById(id);

logger.info('received message', {message});

@dreamorosi
Contributor

In our case, the developer followed your recommendation completely, but got unexpected behavior.

const message = await getMessageById(id);

logger.info('received message', {message});

This case is actually the next bullet point in that same section I linked above, and the fourth case in the code example right below it.
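For readers who hit the same shorthand pitfall, a self-contained sketch (no Powertools import; `formatLog` and the `chatMessage` wrapper key are illustrative stand-ins, not library conventions) of why the wrapper key matters:

```typescript
// `formatLog` stands in for the logger's shallow merge of extra input keys
// into the log record.
function formatLog(
  msg: string,
  extras: Record<string, unknown>,
): Record<string, unknown> {
  return { level: 'INFO', message: msg, ...extras };
}

// Stand-in for `await getMessageById(id)`.
const message = { id: 1, text: 'hi' };

// Collides: `message` is also a reserved attribute, so the log text is lost.
const bad = formatLog('received message', { message });

// Safe: any non-reserved wrapper key preserves the reserved attributes.
const good = formatLog('received message', { chatMessage: message });
```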

@dreamorosi
Contributor

Either way, I see where the confusion comes from and there is room for improvement.

I don't think aliasing the fields is the right choice here, but I also don't have a better idea yet.

Please allow me some time to discuss it with the team, and see what comes up.

@dreamorosi added the bug label, and removed the not-a-bug and need-response labels · Oct 17, 2024
@dreamorosi moved this from On hold to Backlog in Powertools for AWS Lambda (TypeScript) · Oct 17, 2024
@leandrodamascena
Contributor

leandrodamascena commented Oct 17, 2024

Hey @dreamorosi and @VladimirGorobetsWildix! I would like to share my impressions about this topic.

In my opinion, we should first distinguish between event fields and mandatory/default fields. The mandatory fields serve a purpose: they keep things organized and follow a specific rule/protocol/RFC for the language or ecosystem. Let's look at the OTEL Logs Data Model as an example: it has fields that must follow a certain pattern. These fields are part of the Logger's data model, not part of the actual event being logged.

I also think that we need to look at this from the perspective of the tools that will retain/consume those logs. Let's imagine we ignore the RFC/Protocol and allow customers to override mandatory fields with any values they want. This could lead to several problems:

  • Log Levels might become inconsistent. Instead of standard levels like "DEBUG", we could end up with custom levels like "MY_LEVEL_DEBUG".
  • Timestamps could be misleading. The event timestamp might be used instead of the logger timestamp, which are often different and serve different purposes.
  • Location information could become unreliable. It might refer to files that don't actually exist in the system.

This could become a significant issue for everyone, as they may no longer trust their logs due to a lack of reliability.

On the other hand, events often have custom fields that customers want to log and query later. This is a valid and common use case. However, if a customer wants to log the entire event with all its fields, regardless of what they are, this information should be included in the msg/message field rather than as separate arbitrary fields.

In Powertools Python you can do that using this code:

from aws_lambda_powertools import Logger

logger = Logger()

event = {
    "timestamp": 'one year ago',
    "name": 'user',
    "message": 'message from chat',
    "level": 'admin',
    "service": 'internal chat',
}
logger.info(msg=event)

And then you'll have this output that can be used to query the information the customer needs:

{"level":"INFO","location":"<module>:12","message":{"timestamp":"one year ago","name":"user","message":"message from chat","level":"admin","service":"internal chat"},"timestamp":"2024-10-17 19:18:40,459+0100","service":"service_undefined"}

Regarding the possible solution:

If duplicate fields occur, a prefix should be added for fields from additional objects.

For example message -> _message

I don't know if I would go down this path; we would be changing the customer's information, and this can raise questions and cause unwanted side effects.

The solution I can think of at the moment is that if we find any extra field that matches a reserved keyword, we issue a warning. I think this is good for the following reasons:

1 - We do not change customer information.
2 - We improve the feedback loop by informing the developer that something is wrong and needs to be fixed.
3 - We do not raise any exceptions, which in observability is an excellent path to follow.

What do you all think?

Thanks

@dreamorosi
Contributor

I think the warning, as well as clearer documentation, would make the most sense.

I agree that we shouldn't overwrite nor rename any field.

@VladimirGorobetsWildix
Author

Thanks @leandrodamascena

I also think that we need to look at this from the perspective of the tools that will retain/consume those logs. Let's imagine we ignore the RFC/Protocol and allow customers to override mandatory fields with any values they want.

This is a very good point. Such non-obvious behavior can break the logic of log processing and we can lose an important part of the data.

Still, I am of the opinion that secondary parameters should not be allowed to change the values of system fields.

@dreamorosi self-assigned this · Nov 8, 2024
@dreamorosi moved this from Backlog to Working on it in Powertools for AWS Lambda (TypeScript) · Nov 8, 2024
@dreamorosi added the confirmed (The scope is clear, ready for implementation) label, and removed the discussing label · Nov 8, 2024
@dreamorosi
Contributor

I have started working on a fix for this bug. We're settling on the option of 1/ emitting a warning when we detect customers trying to overwrite a reserved key, and 2/ dropping that key to preserve the original ones.

I'm trying to minimize the overhead of this check by adding it in some targeted code paths, so it'll take a bit longer. I'll continue working on this into next week and hope to include the fix in the next release.
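A rough sketch of the warn-and-drop behavior described above; the function name and the reserved-key list are illustrative assumptions, not the shipped Powertools implementation:

```typescript
// Keys the log record reserves for itself (illustrative subset).
const RESERVED_KEYS = new Set([
  'level',
  'message',
  'sampling_rate',
  'service',
  'timestamp',
]);

// Merge extra keys into the record, but 1/ warn on and 2/ drop any key
// that would overwrite a reserved attribute.
function mergeDroppingReserved(
  base: Record<string, unknown>,
  extras: Record<string, unknown>,
  warn: (text: string) => void = console.warn,
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...base };
  for (const [key, value] of Object.entries(extras)) {
    if (RESERVED_KEYS.has(key)) {
      // Improve the feedback loop without mutating the reserved attribute.
      warn(`The key "${key}" is reserved and will be dropped`);
      continue;
    }
    out[key] = value;
  }
  return out;
}
```

With this shape, `logger.info('test', data)` from the original report would keep its own level, message, service, and timestamp, emit four warnings, and still attach the non-reserved `name` field.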
