Skip to content

Latest commit

 

History

History
350 lines (248 loc) · 12.9 KB

kafka-protocol-binding.md

File metadata and controls

350 lines (248 loc) · 12.9 KB

Kafka Protocol Binding for CloudEvents - Version 1.0.3-wip

Abstract

The Kafka Protocol Binding for CloudEvents defines how events are mapped to Kafka messages.

Table of Contents

  1. Introduction
  1. Use of CloudEvents Attributes
  1. Kafka Message Mapping
  1. References

1. Introduction

CloudEvents is a standardized and protocol-agnostic definition of the structure and metadata description of events. This specification defines how the elements defined in the CloudEvents specification are to be used in the Kafka protocol as Kafka messages (aka Kafka records).

1.1. Conformance

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119.

1.2. Relation to Kafka

This specification does not prescribe rules constraining transfer or settlement of event messages with Kafka; it solely defines how CloudEvents are expressed in the Kafka protocol as Kafka messages.

The Kafka documentation uses "message" and "record" somewhat interchangeably and therefore the terms are to be considered synonyms in this specification as well.

Conceptually, Kafka is a log-oriented store for records, each holding a singular key/value pair. The store is commonly partitioned, and the partition for a record is typically chosen based on the key's value. Kafka clients accomplish this by using a hash function.

This binding specification defines how attributes and data of a CloudEvent is mapped to the value and headers sections of a Kafka record.

Generally, the user SHOULD configure the key and/or the partition of the Kafka record in a way that makes more sense for his/her use case (e.g. streaming applications), in order to co-partition values, define relationships between events, etc. This spec provides an OPTIONAL definition to map the key section of the Kafka record, without constraining the user to implement it nor use it. An example use case of this definition is when the sink of the event is a Kafka topic, but the source is another transport (e.g. HTTP), and the user needs a way to key the record. As a counter example, it doesn't make sense to use it when the sink and source are Kafka topics, because this might cause the re-keying of the records.

1.3. Content Modes

The CloudEvents specification defines three content modes for transferring events: structured, binary and batch. The Kafka protocol binding does not currently support the batch content mode. Every compliant implementation SHOULD support both structured and binary modes.

The specification defines three content modes for transferring events: structured, binary and batch. The Kafka protocol binding does not currently support the batch content mode.

In the structured content mode, event metadata attributes and event data are placed into the Kafka message value section using an event format.

In the binary content mode, the value of the event data MUST be placed into the Kafka message's value section as-is, with the content-type header value declaring its media type; all other event attributes MUST be mapped to the Kafka message's header section.

Implementations that use Kafka 0.11.0.0 and above MAY use either binary or structured modes. Implementations that use Kafka 0.10.x.x and below MUST only use structured mode. This is because older versions of Kafka lacked support for message level headers.

1.4. Event Formats

Event formats, used with the structured content mode, define how an event is expressed in a particular data format. All implementations of this specification that support the structured content mode MUST support the JSON event format.

1.5. Security

This specification does not introduce any new security features for Kafka, or mandate specific existing features to be used.

2. Use of CloudEvents Attributes

This specification does not further define any of the CloudEvents event attributes.

2.1. data

data is assumed to contain opaque application data that is encoded as declared by the datacontenttype attribute.

An application is free to hold the information in any in-memory representation of its choosing, but as the value is transposed into Kafka as defined in this specification, core Kafka provides data available as a sequence of bytes.

For instance, if the declared datacontenttype is application/json;charset=utf-8, the expectation is that the data value is made available as UTF-8 encoded JSON text.

3. Kafka Message Mapping

With Kafka 0.11.0.0 and above, the content mode is chosen by the sender of the event. Protocol usage patterns that might allow solicitation of events using a particular content mode might be defined by an application, but are not defined here.

The receiver of the event can distinguish between the two content modes by inspecting the content-type Header of the Kafka message. If the header is present and its value is prefixed with the CloudEvents media type application/cloudevents, indicating the use of a known event format, the receiver uses structured mode, otherwise it defaults to binary mode.

If a receiver finds a CloudEvents media type as per the above rule, but with an event format that it cannot handle, for instance application/cloudevents+avro, it MAY still treat the event as binary and forward it to another party as-is.

When the content-type header value is not prefixed with the CloudEvents media type, knowing when the message ought to be parsed as a CloudEvent can be a challenge. While this specification can not mandate that senders do not include any of the CloudEvents headers when the message is not a CloudEvent, it would be reasonable for a receiver to assume that if the message has all of the mandatory CloudEvents attributes as headers then it's probably a CloudEvent. However, as with all CloudEvent messages, if it does not adhere to all of the normative language of this specification then it is not a valid CloudEvent.

3.1. Key Mapping

Every implementation MUST, by default, map the user provided record key to the Kafka record key.

The 'key' of the Kafka message MAY be populated by a "Key Mapper" function, which might map the key directly from one of the CloudEvent's attributes, but might also use information from the application environment, from the CloudEvent's data or other sources.

The shape and configuration of the "Key Mapper" function is implementation specific.

Every implementation SHOULD provide an opt-in "Key Mapper" implementation that maps the Partitioning partitionkey attribute value to the 'key' of the Kafka message as-is, if present.

A mapping function MUST NOT modify the CloudEvent. This means that the aforementioned partitionkey attribute MUST still be included with the transmitted event, if present. It also means that a mapping function that uses key information from an out-of-band source, like a parameter or configuration setting, MUST NOT add an attribute to the CloudEvent.

3.2. Binary Content Mode

The binary content mode accommodates any shape of event data, and allows for efficient transfer and without transcoding effort.

3.2.1. Content Type

For the binary mode, the header content-type property MUST be mapped directly to the CloudEvents datacontenttype attribute.

3.2.2. Event Data Encoding

The data byte-sequence MUST be used as the value of the Kafka message.

In binary mode, the Kafka representation of a CloudEvent with no data is a Kafka message with no value. In a topic with log compaction enabled, any such message will represent a tombstone record, as described in the Kafka compaction documentation.

3.2.3. Metadata Headers

All CloudEvents attributes and CloudEvent Attributes Extensions with exception of data MUST be individually mapped to and from the Header fields in the Kafka message. Both header keys and header values MUST be encoded as UTF-8 strings.

3.2.3.1 Property Names

CloudEvent attributes are prefixed with ce_ for use in the message-headers section.

Examples:

* `time` maps to `ce_time`
* `id` maps to `ce_id`
* `specversion` maps to `ce_specversion`
3.2.4.2 Property Values

The value for each Kafka header is constructed from the respective header's Kafka representation, compliant with the Kafka message format specification.

3.2.5 Example

This example shows the binary mode mapping of an event into the Kafka message. All other CloudEvents attributes are mapped to Kafka Header fields with prefix ce_.

Mind that ce_ here does refer to the event data content carried in the payload.

------------------ Message -------------------

Topic Name: mytopic

------------------- key ----------------------

Key: mykey

------------------ headers -------------------

ce_specversion: "1.0"
ce_type: "com.example.someevent"
ce_source: "/mycontext/subcontext"
ce_id: "1234-1234-1234"
ce_time: "2018-04-05T03:56:24Z"
content-type: application/avro
       .... further attributes ...

------------------- value --------------------

            ... application data encoded in Avro ...

-----------------------------------------------

3.3. Structured Content Mode

The structured content mode keeps event metadata and data together in the payload, allowing simple forwarding of the same event across multiple routing hops, and across multiple protocols.

3.3.1. Kafka Content-Type

If present, the Kafka message header property content-type MUST be set to the media type of an event format.

Example for the JSON format:

content-type: application/cloudevents+json; charset=UTF-8

3.3.2. Event Data Encoding

The chosen event format defines how all attributes, and data, are represented.

The event metadata and data are then rendered in accordance with the event format specification and the resulting data becomes the Kafka application data section.

In structured mode, the Kafka representation of a CloudEvent with no data is a Kafka message which still has a data section (containing the attributes of the CloudEvent). Such a message does not represent a tombstone record in a topic with log compaction enabled, unlike the representation in binary mode.

3.3.3. Metadata Headers

Implementations MAY include the same Kafka headers as defined for the binary mode.

3.3.4 Example

This example shows a JSON event format encoded event:

------------------ Message -------------------

Topic Name: mytopic

------------------- key ----------------------

Key: mykey

------------------ headers -------------------

content-type: application/cloudevents+json; charset=UTF-8

------------------- value --------------------

{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext/subcontext",
    "id" : "1234-1234-1234",
    "time" : "2018-04-05T03:56:24Z",
    "datacontenttype" : "application/xml",

    ... further attributes omitted ...

    "data" : {
        ... application data encoded in XML ...
    }
}

-----------------------------------------------

4. References

  • Kafka The distributed stream platform
  • Kafka-Message-Format The Kafka format message
  • RFC2046 Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types
  • RFC2119 Key words for use in RFCs to Indicate Requirement Levels
  • RFC3629 UTF-8, a transformation format of ISO 10646
  • RFC7159 The JavaScript Object Notation (JSON) Data Interchange Format