Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Azure Event Hub]- Sending Event Message with Partition Key of type int is DANGEROUS #12353

Closed
aliyevmirzakhan opened this issue Jul 2, 2020 · 17 comments
Assignees
Labels
blocking-release Blocks release Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Event Hubs needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Milestone

Comments

@aliyevmirzakhan
Copy link

Hi, a few days ago I wrote script that sends the real time events to Event Hub. I used Azure Durable Functions to implement parallelism, and in doing so I used partition_key to ensure events with same id is directed to same partition. For some reason I thought the key can be single number and thus I used int types as partition keys, and while sending I never had any error about it. Later I implemented another function App that listens to Event Hub, and triggered when new message is received in Even Hub, and while doing so I was getting the following error at each firing:
Unable to cast object of type 'System.Int32' to type 'System.String

At first I thought may be this is due to encoding of str files that send which took me quite a long time to figure out this was because my usage of int value as partition key due to which Event Hub backend could not generate partition key hashes.

I am wondering why no Exception is raised if int type is passed as partition_key argument at the time of sending message string, and may be this is bug to be considered.

@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jul 2, 2020
@aliyevmirzakhan aliyevmirzakhan changed the title **[Azure Event Hub]** - Sending Event Message with Partition Key of type int is **dangerous** [Azure Event Hub]- Sending Event Message with Partition Key of type int is DANGEROUS Jul 2, 2020
@kaerm kaerm added Client This issue points to a problem in the data-plane of the library. Event Hubs labels Jul 6, 2020
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jul 6, 2020
@kaerm
Copy link
Contributor

kaerm commented Jul 6, 2020

@aliyevmirzakhan thanks for reporting this, @yunhaoling, @YijunXieMS can you take a look at this?

@yunhaoling
Copy link
Contributor

yunhaoling commented Jul 6, 2020

Hello @aliyevmirzakhan ,

May I first confirm which language and what version of the EventHub sdk are you using?

Because from the error information Unable to cast object of type 'System.Int32' to type 'System.String, it looks like you're using EventHub sdk of .NET, while this repo mainly focuses on Python SDKs.

@johanste
Copy link
Member

johanste commented Jul 7, 2020

I believe the issue is more likely somewhere in the azure event hubs binding for functions. I don't know the protocol well enough to definitively state if an integer is a valid partition key or not - but if my understanding is correct, the value gets transmitted all the way to the receiver, which would imply that it is.

@jsquire
Copy link
Member

jsquire commented Jul 7, 2020

Hi @aliyevmirzakhan. Would you be so kind as to help clarify a bit more about your scenario so that we can make sure to get the right folks involved to assist? It would be helpful to understand:

  • How is the event being published to Event Hubs? The .NET SDKs only allow a string to be used as a partition key when publishing events, so it would seem that the event is being published using some other means. Is it the Python SDK being used to publish the event?

  • Are the Azure Functions that you're using both Python, both .NET or a mix? Can you clarify which stack is generating the exception?

@kaerm kaerm added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Jul 7, 2020
@YijunXieMS
Copy link
Contributor

@johanste Yes the protocol supports string, int (and maybe other types) from our test.
We decided to make it a string for all languages back then. Python has type annotation and documentation to indicate it's a string like below.

        :keyword str partition_key: ...

But we don't raise an error when a user passes an int. If we add the validation now, is this regarded as a break change?

@johanste
Copy link
Member

johanste commented Jul 7, 2020

What does the service support?

@jsquire
Copy link
Member

jsquire commented Jul 7, 2020

The service contract is not formally documented, to my knowledge. The closest that I'm aware of is the AMQP 1.0 in Azure Service Bus and Event Hubs protocol guide, which does not contain formal type information. The entry for the PartitionKey annotation simply links back to the Track Zero client property, which is typed as string.

In our design discussions with the service team, my recollection is that they requested string be used for both Partition Key and Partition Id. I'll ask @ramya-rao-a to confirm that is her understanding as well. For what it's worth, the rest of the languages for both Track One and Track Two are allowing only string.

@ramya-rao-a
Copy link
Contributor

What does the service support

Partition Key is set in the message annotations part of the AMQP message which is essentially a map i.e key-value pair. The type of the value is retained. Therefore, if user passes a non string value for partitionKey and the client passes it along as is, the service will receive the message with partition key as a non string value.

The service does not seem to have any issues with such a message.
We can follow up with the service team on exactly what they do here.

The error we are seeing here (and in Azure/azure-event-hubs-spark#424) is on the receiving side where the client expects a string, but finds a non string value. We read again from the same message annotations part of the AMQP message which I believe does not get mutated and has the same value that the sending client had set.

This is a problem with interop across different clients.

While we can do our due diligence when sending to always ensure that partitionKey is set as string (JS SDK does this, we cast whatever user provides as partitionKey to string before setting it in the message annotations), we are not responsible for third party AMQP implementations out in the wild.

Since the service does not seem to have a problem reading and using a non string partitionKey, it would be interesting to see what we can do at the receiving end to solve this.

@aliyevmirzakhan
Copy link
Author

Hi I am using Python runtime for both sending messages to Event Hub, and receiving them in a Function App that is triggered by Event Hub.

@ghost ghost added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Jul 8, 2020
@jsquire
Copy link
Member

jsquire commented Jul 8, 2020

Thank you, @aliyevmirzakhan.

Based on that, I think @johanste's initial assessment is likely. The Azure Functions infrastructure is based on .NET and the bindings are running in the context of that infrastructure using the .NET Event Hubs client rather than running in the context of the Python language worker and using the Python SDK.

The root cause would appear to be that the partition key is enforced as string by the Track One and Track Two client libraries for all languages other than Python. The question, in my opinion, comes down to "should we guarantee that events published from a language can be read by the other languages and across versions of the SDK?"

Assuming so, I believe the choice comes down to:

  • Do we investigate whether there is a way to be looser with type checking and coerce non-string values to string in the non-Python libraries?
  • Do we introduce a breaking change to the Python library to validate the type matches the documentation?
  • Do we leave the Python behavior as-is and consider a stronger warning in documentation about cross-language compatibility?
  • Do we introduce breaking changes to the other libraries to allow a generic object value and, if so, does that include the track one libraries as well?

As @ramya-rao-a mentions, so long as the service team does not have a service-specific reason to disallow, the first is probably our most promising starting point. The tricky part of that will be that the Azure Functions bindings are still using the track one Event Hubs client.

@ramya-rao-a
Copy link
Contributor

cc @serkantkaraca for input from the service team

@serkantkaraca
Copy link
Member

Both track-0 and track-1 .NET clients explicitly cast partition key value from message-annotations-map into EventData.PartitionKey. Service doesn't do any validation on the delivered type and assumes sender library already conducted sanitary checks. This applies to every entry in the annotations map as far as I know.

Unfortunately the messaging AMQP guide that @jsquire pointed doesn't mandate types. We should have mentioned expected types there.

@YijunXieMS
Copy link
Contributor

@serkantkaraca how is the partition_key used in the service to decide which partition an event is sent to when it's an int and when it's a str?

@serkantkaraca
Copy link
Member

@YijunXieMS service ignores partition-key if it is not string type.

@samuelkoppes
Copy link

We are investigating this issue, and will report back with an update by 11/4/2020.

@ramya-rao-a
Copy link
Contributor

Thanks for reviving this topic @samuelkoppes

Addressing @jsquire's options below

Do we investigate whether there is a way to be looser with type checking and coerce non-string values to string in the non-Python libraries?

On the receiving side, we can either coerce non string values to string, or ignore the non string values altogether. I would prefer the ignoring approach to be in sync with what the service does and not mislead the user to think that their non string partition key was actually used by the service. We can also check if the service would consider doing this instead of the client i.e. remove the partition key from the message annotation since it is invalid.

Do we introduce a breaking change to the Python library to validate the type matches the documentation?

No, breaking change is not an option.

Do we leave the Python behavior as-is and consider a stronger warning in documentation about cross-language compatibility?

Yes, we should.

Do we introduce breaking changes to the other libraries to allow a generic object value and, if so, does that include the track one libraries as well?

No, breaking change is not an option.

openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-python that referenced this issue Jan 13, 2021
Swagger changes for Stable version(2020-09-01) for Microsoft.DataBoxEdge (Azure#12353)
@yunhaoling yunhaoling added this to the [2021] March milestone Feb 9, 2021
@yunhaoling yunhaoling added the blocking-release Blocks release label Feb 11, 2021
@yunhaoling
Copy link
Contributor

yunhaoling commented Feb 24, 2021

We have added a strong warning in the docstring for the partition_key parameter saying that

         WARNING: Setting partition_key of non-string value on the events to be sent is discouraged
         as the partition_key will be ignored by the Event Hub service and events will be assigned
         to all partitions using round-robin. Furthermore, there are SDKs for consuming events which expect
         partition_key to only be string type, they might fail to parse the non-string value.

as well as logging.info to warn the usage of non-string partition-key.

PR is merged, I'm closing this issue now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
blocking-release Blocks release Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Event Hubs needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Projects
None yet
Development

No branches or pull requests

10 participants