[Azure Event Hub]- Sending Event Message with Partition Key of type int is DANGEROUS #12353
Comments
@aliyevmirzakhan thanks for reporting this. @yunhaoling, @YijunXieMS, can you take a look at this? |
Hello @aliyevmirzakhan, may I first confirm which language and which version of the Event Hubs SDK you are using? Because from the error information |
I believe the issue is more likely somewhere in the azure event hubs binding for functions. I don't know the protocol well enough to definitively state if an integer is a valid partition key or not - but if my understanding is correct, the value gets transmitted all the way to the receiver, which would imply that it is. |
Hi @aliyevmirzakhan. Would you be so kind as to help clarify a bit more about your scenario so that we can make sure to get the right folks involved to assist? It would be helpful to understand:
|
@johanste Yes, from our test the protocol supports string, int (and maybe other types). :keyword str partition_key: ... But we don't raise an error when a user passes an int. If we add the validation now, would it be regarded as a breaking change? |
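One way to add the validation discussed above without breaking existing callers might be to warn instead of raising. The following is a minimal sketch only; `check_partition_key` is a hypothetical helper for illustration and is not part of the azure-eventhub SDK.

```python
import warnings


def check_partition_key(partition_key):
    """Hypothetical helper: warn (rather than raise) on a non-str partition key.

    The key is returned unchanged so that existing callers keep working.
    """
    if partition_key is not None and not isinstance(partition_key, str):
        warnings.warn(
            "partition_key should be a str; got %s" % type(partition_key).__name__,
            UserWarning,
            stacklevel=2,
        )
    return partition_key
```

Raising `TypeError` here instead would surface the problem earlier, but callers that currently pass an int would start failing, which is the breaking-change concern raised in the comment.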
What does the service support? |
The service contract is not formally documented, to my knowledge. The closest that I'm aware of is the AMQP 1.0 in Azure Service Bus and Event Hubs protocol guide, which does not contain formal type information. The entry for the PartitionKey annotation simply links back to the Track Zero client property, which is typed as In our design discussions with the service team, my recollection is that they requested |
The partition key is set in the message annotations part of the AMQP message, which is essentially a map, i.e. a set of key-value pairs. The type of the value is retained. Therefore, if the user passes a non-string value for partitionKey and the client passes it along as is, the service will receive the message with a non-string partition key. The service does not seem to have any issues with such a message.
The error we are seeing here (and in Azure/azure-event-hubs-spark#424) is on the receiving side, where the client expects a string but finds a non-string value. We read from the same message annotations part of the AMQP message, which I believe does not get mutated and has the same value that the sending client set.
This is a problem with interop across different clients. While we can do our due diligence when sending to always ensure that partitionKey is set as a string (the JS SDK does this: we cast whatever the user provides as partitionKey to a string before setting it in the message annotations), we are not responsible for third-party AMQP implementations out in the wild. Since the service does not seem to have a problem reading and using a non-string partitionKey, it would be interesting to see what we can do at the receiving end to solve this. |
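The send-side coercion described above for the JS SDK could be sketched in Python roughly as follows. This is an illustration under stated assumptions: a plain dict stands in for the AMQP message annotations map, and `set_partition_key` is a hypothetical helper, not an SDK function.

```python
# Annotation name that carries the partition key in Event Hubs AMQP messages.
PARTITION_KEY_ANNOTATION = "x-opt-partition-key"


def set_partition_key(annotations, partition_key):
    """Hypothetical helper: always store the partition key as a string.

    `annotations` is a plain dict standing in for the AMQP message
    annotations map (an assumption made for illustration).
    """
    if partition_key is not None:
        annotations[PARTITION_KEY_ANNOTATION] = str(partition_key)
    return annotations


# Without coercion, the map keeps whatever type the sender used ...
raw = {PARTITION_KEY_ANNOTATION: 42}
# ... with coercion, a receiver always finds a string.
coerced = set_partition_key({}, 42)
```

This only helps for senders that go through such a helper; a third-party AMQP client can still put a non-string value into the map.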
Hi, I am using the Python runtime both for sending messages to Event Hub and for receiving them in a Function App that is triggered by Event Hub. |
Thank you, @aliyevmirzakhan. Based on that, I think @johanste's initial assessment is likely. The Azure Functions infrastructure is based on .NET and the bindings are running in the context of that infrastructure using the .NET Event Hubs client rather than running in the context of the Python language worker and using the Python SDK. The root cause would appear to be that the partition key is enforced as Assuming so, I believe the choice comes down to:
As @ramya-rao-a mentions, so long as the service team does not have a service-specific reason to disallow, the first is probably our most promising starting point. The tricky part of that will be that the Azure Functions bindings are still using the track one Event Hubs client. |
cc @serkantkaraca for input from the service team |
Both the track-0 and track-1 .NET clients explicitly cast the partition key value from the message annotations map into EventData.PartitionKey. The service doesn't do any validation on the delivered type and assumes the sender library has already conducted sanity checks. This applies to every entry in the annotations map as far as I know. Unfortunately, the messaging AMQP guide that @jsquire pointed to doesn't mandate types. We should have mentioned the expected types there. |
@serkantkaraca how is the partition_key used in the service to decide which partition an event is sent to when it's an int and when it's a str? |
@YijunXieMS The service ignores the partition key if it is not of string type. |
We are investigating this issue, and will report back with an update by 11/4/2020. |
Thanks for reviving this topic, @samuelkoppes. Addressing @jsquire's options below:
On the receiving side, we can either coerce non-string values to string or ignore the non-string values altogether. I would prefer the ignoring approach, to be in sync with what the service does and not mislead the user into thinking that their non-string partition key was actually used by the service. We can also check whether the service would consider doing this instead of the client, i.e. removing the partition key from the message annotations since it is invalid.
No, breaking change is not an option.
Yes, we should.
No, breaking change is not an option. |
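The "ignore non-string values" option for the receiving side could look something like the sketch below. It assumes a plain dict with string keys in place of the real AMQP annotations map (an actual client would read from its AMQP library's map, whose key and value types may differ), and `read_partition_key` is a hypothetical name.

```python
# Annotation name that carries the partition key in Event Hubs AMQP messages.
PARTITION_KEY_ANNOTATION = "x-opt-partition-key"


def read_partition_key(annotations):
    """Hypothetical receive-side helper.

    Returns the partition key only when it is a str; any other type is
    treated as absent, matching the service behavior described in the
    thread (a non-string partition key is ignored).
    """
    value = annotations.get(PARTITION_KEY_ANNOTATION)
    return value if isinstance(value, str) else None
```

Returning `None` rather than `str(value)` avoids suggesting to the receiver that the integer key influenced partition assignment.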
We have added a strong warning in the docstring for the
as well as PR is merged, I'm closing this issue now. |
Hi, a few days ago I wrote a script that sends real-time events to Event Hub. I used Azure Durable Functions to implement parallelism, and in doing so I used partition_key to ensure that events with the same id are directed to the same partition. For some reason I thought the key could be a single number, so I used int values as partition keys, and while sending I never got any error about it. Later I implemented another Function App that listens to the Event Hub and is triggered when a new message is received, and I was getting the following error at each firing:
Unable to cast object of type 'System.Int32' to type 'System.String'.
At first I thought this might be due to the encoding of the strings I was sending. It took me quite a long time to figure out that the cause was my use of int values as the partition key, due to which the Event Hub backend could not generate partition key hashes.
I am wondering why no exception is raised when an int is passed as the partition_key argument at the time of sending a message. Maybe this is a bug to be considered.