Releases: divolte/divolte-collector
divolte-collector-0.9.0
This release contains the following changes relative to 0.8.0:
- Mappings can now hash data in various ways. Basic and keyed hashing are supported using the built-in JDK algorithms. (A sketch follows this list.)
- Events published to Kafka now have their timestamps set to the time the event arrived on the Divolte server.
- Events published to Google Pub/Sub have additional metadata to help with downstream processing:
  - To help with event ordering, `timestamp` is the time the event arrived at the server.
  - To help with event deduplication, `eventIdentifier` is the event ID (as generated by the client).
- A bug was fixed that affected mapping custom parameter values to Avro enumerations.
- A bug has been fixed where tildes (`~`) weren't properly handled inside custom event parameters.
A bug has been fixed where the version of
avro-tools
that we shipped didn't work properly due to a missing file. -
- The usual dependency updates, of which the most notable is upgrading from Hadoop 2.9 to 3.1.
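As a sketch, a hashing mapping might look like the following. The exact hashing DSL isn't shown in these notes, so the `hash(...)` step and the `hashedEmail` field are hypothetical illustrations; consult the mapping documentation for the real syntax:

```groovy
// Hypothetical syntax: hash a custom event parameter before storing it.
// 'SHA-256' is one of the hash algorithms built into the JDK.
map eventParameters().value('emailAddress').hash('SHA-256') onto 'hashedEmail'
```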
divolte-collector-0.8.0
This release contains the following changes relative to 0.7.0:
- Improved shutdown/load-balancer integration. A new setting (`divolte.global.server.shutdown_delay`) allows for a grace period when shutting down. During this period the health check will fail but the server continues processing requests normally. This should prompt load balancers to remove the endpoint before requests start failing. (A configuration sketch follows this list.)
- A bug fix for the processing pool configuration for Google Cloud Storage. (Previously the configured buffer size and thread count were ignored and the values for HDFS used instead.)
- Improvements to the way values from headers can be extracted during mapping. In particular these should make it easier to map the client's IP address when multiple load-balancer layers are in place. (See the mapping sketch after this list.) Improvements include:
  - Header values are now normalised: multiple headers with the same name and/or comma-separated values are assembled into a single unified list for mapping.
  - In addition to the existing `.first()` and `.last()` methods, a new `.get(x)` method can be used to obtain the value at a specific index. A negative index can be used to retrieve the value relative to the end of the list.
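As a sketch, the new grace period might be configured as follows; the setting path is as named above, while the 10-second value is illustrative:

```hocon
divolte {
  global {
    server {
      // Fail the health check for 10 seconds before shutting down, while
      // continuing to process requests normally so load balancers can
      // drain the endpoint.
      shutdown_delay = 10 seconds
    }
  }
}
```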
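And a sketch of the header-access methods in a mapping script. The `.first()`, `.last()` and `.get(x)` methods are described above; the `header(...)` accessor and the Avro field names are assumptions for illustration:

```groovy
// With two load-balancer layers in front of Divolte, X-Forwarded-For
// typically ends with the first balancer's address, so the client's
// address is the second value from the end of the normalised list.
map header('X-Forwarded-For').get(-2) onto 'remoteAddress'

// Positional access from either end of the list:
map header('X-Forwarded-For').first() onto 'originalForwardedFor'
map header('X-Forwarded-For').last() onto 'nearestProxy'
```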
divolte-collector-0.7.0
The main changes in this release relative to 0.6.0 are:
- Support for Google Pub/Sub as a real-time sink.
- Reliability improvements for the Google Cloud Storage sink. Writes that fail due to a transient error are now retried.
- Support for Confluent-compatible messages on Kafka sinks. Kafka sinks can now be configured with the pre-registered identifier for the schema, and messages will be formatted to include the Confluent header. For now we still require that all mappings for a Kafka sink use the same schema. (A configuration sketch follows this list.)
- Some minor fixes to ensure Divolte runs on the Java 9 JVM.
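As a sketch, a Confluent-compatible Kafka sink might be configured along these lines. The notes above only say that the pre-registered schema identifier can be configured, so the `mode` and `schema_id` keys are assumptions; check the sink documentation for the real names:

```hocon
divolte {
  sinks {
    kafka_sink {
      type = kafka
      topic = clickstream
      // Assumed keys: frame messages with the Confluent header (magic byte
      // plus schema ID), using the identifier the schema was registered under.
      mode = confluent
      schema_id = 1
    }
  }
}
```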
divolte-collector-0.5.0
This release includes the following changes relative to 0.4.1:
- Experimental support for saving events to Google Cloud Storage. (Thanks @friso.)
- The JavaScript API now supports a `whenCommitted()` call to register a callback that will be invoked when it is safe to leave the page without dropping pending events that have been signalled but not delivered to the server. This is intended to make it easier to signal events for click-throughs and click-outs. (A sketch follows this list.)
- The way `pageView` events are implicitly signalled when navigating forwards and backwards through browser history is now consistent across all the browsers that we test against. Previously the events did not always fire, and page-view identifiers were sometimes reused.
- The signal queue in the browser no longer stalls if an error is encountered delivering an event to the server. Instead the queue proceeds. In addition, if there is no confirmation (either way) the queue continues to drain after a (configurable) timeout.
- Custom events with parameters that contain undefined values are now handled correctly. Previously this triggered an exception in the browser.
- A potential (and very rare) deadlock in the server has been fixed that could be triggered in the handling of POSTed JSON events if they were of a specific size.
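As a sketch, `whenCommitted()` might be used for a click-out like this. The `divolte` global and `signal()` come from the standard tag; the element ID and the `clickOut` event type are illustrative:

```javascript
// Intercept the click, signal the event, then navigate once it is safe
// to leave the page without dropping the pending event.
document.getElementById('partner-link').addEventListener('click', function (event) {
  event.preventDefault();
  var destination = this.href;
  divolte.signal('clickOut', { target: destination });
  divolte.whenCommitted(function () {
    window.location = destination;
  });
});
```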
Behind the scenes a lot of work went into improving the stability of the automated browser testing that we do. In addition to making the tests more stable we expanded the set of browsers that we test against.
divolte-collector-0.4.1
This is a bug-fix release that includes the following fixes relative to 0.4.0:
- The `javascript.name` property for browser sources was implemented incorrectly; only the default value worked. (A sketch of the property follows this list.)
- Various RPM packaging fixes.
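For reference, a sketch of the property in question; the source name and file name here are illustrative, with `divolte.js` being the default served name:

```hocon
divolte {
  sources {
    browser {
      type = browser
      javascript {
        // Serve the tracking script as tracking.js instead of the default.
        name = "tracking.js"
      }
    }
  }
}
```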
divolte-collector-0.4.0
The main changes in this release relative to 0.3.0 are:
- Divolte can now be configured with multiple endpoints for collecting events, as well as multiple mappings and destinations. More on this below.
- JSON-based event collection is now supported. This is intended to support mobile and server applications.
- We're now using Kafka's new producer API.
This release also introduces a new configuration format:
- There are now 4 main sections (a skeleton of the new layout follows this list):
  - `global`: for settings that affect the entire server instance. This includes server binding settings, ip2geo configuration, HDFS and Kafka configuration, and thread settings for the various phases of event processing.
  - `sources`: the browser and JSON endpoints that events can be received on.
  - `sinks`: which HDFS directories and Kafka topics Avro data should be written to.
  - `mappings`: which sources should be connected to which sinks, and how received events should be converted to Avro records.
- Sources are now more configurable; in particular, the endpoint paths can now be customised.
- Kafka now requires different settings because we're using the new producer instead of the old one. The biggest change is that `bootstrap.servers` should be used instead of `metadata.broker.list`; see the Kafka documentation for more details.
- The HDFS session-binning strategy for writing files has been removed.
- The maximum 'pause' time for an internal thread to wait when queuing an event for the next stage of processing has been removed. (This used to be the `max_enqueue_delay` setting.) Now we drop messages immediately. In practice queues are either full or empty, and a full queue means there's a problem that delaying isn't going to help with. In fact, it turned out that being full and waiting leads to cascading failures and problems such as thread starvation in the HTTP server.
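As a sketch, the new four-section layout might look like this, with the Kafka producer configured via `bootstrap.servers`. The source, sink, and mapping names, the topic, and keys such as `schema_file` and `mapping_script_file` are illustrative assumptions; consult the configuration documentation for the authoritative set:

```hocon
divolte {
  global {
    kafka {
      enabled = true
      producer = {
        // New producer API: bootstrap.servers replaces metadata.broker.list.
        bootstrap.servers = "kafka-1:9092,kafka-2:9092"
      }
    }
  }
  sources {
    browser_source { type = browser }
    json_source { type = json }
  }
  sinks {
    kafka_sink {
      type = kafka
      topic = clickstream
    }
  }
  mappings {
    main_mapping {
      sources = [browser_source, json_source]
      sinks = [kafka_sink]
      // Illustrative keys for the Avro schema and the mapping script:
      schema_file = "/etc/divolte/MyEventRecord.avsc"
      mapping_script_file = "/etc/divolte/mapping.groovy"
    }
  }
}
```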