All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- Avoid recompiling Rust crates when only arrangement debug info changes.
- Fix jump-to-record functionality in HTML profiles.
- Fix row highlighting in HTML profiles.
- Fix broken command recording in the `DDlogDynamic` API.
- More efficient joins
- Bug fixes
- Add anchors to the profile, so one can create hyperlinks to individual profile entries.
- Feature-gate the `AnyDeserialize` impl. The DDlog compiler generates an implementation of the `AnyDeserialize` trait, which allows clients to deserialize instances of `ddlog_std::Any`, provided they know the relation id of the value being deserialized. This feature is not used by most applications, but can cause significant code bloat and slow down compilation. We feature-gate this impl so that only users who require this functionality have to pay the price.
The self-profiler remains a useful tool for troubleshooting DDlog performance issues. It runs with low overhead, allows enabling/disabling CPU and change profiling at runtime to only instrument parts of the program, and, because it is integrated into the DDlog runtime, it can precisely match each DD operator to the corresponding DDlog operator. DDShow does not currently have these features.
In this release we revamp the self-profiler to improve its ergonomics. The new self-profiler has the following features:
- Produces profiles in the form of interactive HTML tables. Each row in the table represents a DD operator and contains the operator description, e.g., "Arrange relation 'Rel1' by 'x,y,z'", along with links to one or more source code locations that this operator corresponds to (e.g., all locations where this specific arrangement of 'Rel1' is used).
- The new `dump_profile` API dumps the profile into an HTML file on disk instead of returning it as a string. All profiles generated by the same process are stored in the same folder, even if the process creates multiple instances of DDlog. This folder also contains a complete snapshot of all DDlog code in the program, so that the profiler can show program sources even when the program does not run on the same system where it was compiled.
- Internally, the self-profiler represents profiles using a well-defined JSON format. New APIs were added to extract each of the four profiles currently supported by DDlog (arrangement size profile, peak arrangement size profile, change profile, and CPU profile) in the JSON format, for automatic processing by third-party tools.
`internment.dl`: optimized the implementation of `Intern::default()` to avoid excessive heap allocations and contention.
`ddlog_clone()`: C and Java API to clone a `ddlog_record`.
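The `Intern::default()` entry above does not say how the allocations and contention were avoided. As a minimal sketch of one common approach (an assumption, not DDlog's actual implementation), the default interned value can be created once, cached in a process-wide static, and handed out afterwards as a cheap reference-count bump. `Interned` is a hypothetical stand-in for `Intern<T>`, specialized to strings.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for an interned string handle (not DDlog's `Intern<T>`).
#[derive(Clone, Debug)]
pub struct Interned(Arc<str>);

impl Interned {
    /// Interns `s` through a global, mutex-protected table.
    pub fn new(s: &str) -> Interned {
        static TABLE: OnceLock<Mutex<HashSet<Arc<str>>>> = OnceLock::new();
        let mut table = TABLE.get_or_init(|| Mutex::new(HashSet::new())).lock().unwrap();
        if let Some(existing) = table.get(s) {
            return Interned(existing.clone());
        }
        let arc: Arc<str> = Arc::from(s);
        table.insert(arc.clone());
        Interned(arc)
    }
}

impl PartialEq for Interned {
    // Each distinct string is stored once, so pointer equality is enough.
    fn eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}

impl Default for Interned {
    // The default value is interned once and cached: later calls neither
    // take the table lock nor allocate; they only clone an `Arc`.
    fn default() -> Self {
        static DEFAULT: OnceLock<Interned> = OnceLock::new();
        DEFAULT.get_or_init(|| Interned::new("")).clone()
    }
}
```

The design choice here is to move the cost of `default()` out of the hot path: only the first call touches the shared table.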
- Fixed a parser bug caused by a grammar ambiguity whereby `index` could be interpreted as part of an index declaration or as a regular identifier.
- Rust compilation option `checked_weights` for the code generated by DDlog, which makes DDlog programs crash at runtime if they overflow the weights attached to data values. This may be preferable to generating incorrect results.
- Add `queryIndex` Java API (#1093)
- Closures that depend on generic types generate invalid Rust (#1072).
- Upgrade to FlatBuffers v2.0.0. The previous version of FlatBuffers used in DDlog is not compatible with recent OS X releases.
- Add support for identity views for input tables in DDlogJooqProvider (#1094)
- Translate vector-type fields properly in DDlogJooqProvider (#1089)
- Expose DDlogJooqProvider DSLContext (#1086)
- Windows and joins work when columns have aliases (#1087)
- Fixed a bug in the implementation of `Intern<>` that could lead to incorrect behavior of comparison operators and `hashXXX()` functions.
- Fixed compilation speed regression in v0.48.0.
- Bug fixes and improvements in SQL-to-DDlog compiler.
- Enable the SQL-to-DDlog compiler to translate `array_length` function calls, which appear in SQL dialects such as H2 and Postgres.
- Added change profiling support to the DDlog self-profiler. Unlike arrangement size profiling, which tracks the number of records in each arrangement, the change profile shows the amount of churn. For example, adding one record and deleting one record will show up as two changes in the change profile, but will cancel out in the size profile. The self-profiler now supports the `profile change on/off` commands (also available through the API), which enable change profiling for selected transactions. When change profiling is disabled, recording stops, but the previously accumulated profile is preserved. By selectively enabling change profiling for a subset of transactions, the user can focus their analysis on specific parts of the program.
- Limited support for dynamic typing. We introduce a new type `Any` to the standard library, which can represent any DDlog value, along with two library functions, `to_any()` and `from_any()`, that convert values to and from this type. This feature can be used to, e.g., store a mix of values of different types in a set or map.
- Experimental features to support implementing parts of the D3log runtime in DDlog. See #1065 for details.
- The semantics of the `group_by` operator changed in a subtle way. A group now contains exactly one occurrence of each value. See #1070 for details.
- Removed the `ddlog_std::count(Group)` and `ddlog_std::group_count(Group)` methods to avoid changing their behavior in a non-backwards-compatible way. Added `count_distinct(Group)` instead, which returns the number of distinct values in the group.
- Removed `ddlog_std::group_sum(Group)`. Added `function sum_of(g: Group<'K, 'V>, f: function('V): 'N): 'N` instead.
- Added the `--intern-strings` option, which causes all strings in generated OVSDB tables to be emitted as `istring`. This reduces memory use and can improve performance in programs that use strings heavily. See PR #1056.
- Speed up hashing of interned objects (fixes a performance regression in 0.43.0; see #1053 for details).
- Speed up serialization of the `json::JsonValue` type (see #1052).
- Added partial support for the Calcite dialect of SQL (see #1044) to the SQL-to-DDlog translator.
- Optimize code generation for `match` and `?` expressions.
- Optimize code generation for interpolated strings.
- Introduce the `#[by_val]` attribute to pass function arguments by value.
- Use the new attribute to optimize a bunch of library functions. This should not break any existing DDlog code, but it will affect Rust code that calls DDlog libraries directly.
- Compile string literals into lazy statics to avoid dynamic allocation every time a string literal is used.
- Bug fixes in the SQL-to-DDlog compiler
- Bug in the `#[derive(Mutator)]` macro: mutations that change the constructor of a type failed (#1041).
- Improved infrastructure for implementing the `FromRecord` and `Mutator` traits (#1029):
  - Automatically handle `Record::Serialized()` in `FromRecord` and `Mutator` implementations.
  - Allow modifying, and not just overwriting, `Map` values.
- Fixed a bug in type inference: #1022.
- Fixed non-deterministic behavior in `internment.dl`: e0be732061e2556b0bbbfaceb0ab04a76f573ec8
- New `ddlog_std` library functions:
  - `extern function to_string_debug(x: 'T): string`: converts any DDlog type into a string in a programmer-facing, debugging context. Implemented by calling the `Debug::fmt()` method of the underlying Rust type.
  - `function reverse(v: mut Vec<'X>)`
  - `function reverse_imm(v: Vec<'X>): Vec<'X>`
- New library functions:
  - `ddlog_std.dl`:
    - `function values(m: Map<'K, 'V>): Vec<'V>`
    - `function nth_value(m: Map<'K, 'V>, n: usize): Option<'V>`
    - `function nth_key(m: Map<'K, 'V>, n: usize): Option<'K>`
  - `map.dl`:
    - `function find(m: Map<'K, 'V>, f: function('V): bool): Option<'V>`
    - `function any(m: Map<'K, 'V>, f: function('V): bool): bool`
- Bug fixes:
  - Fixed scrambled self-profiler output.
  - Fixed compilation speed regression introduced in 0.42.0.
- New feature in the OVSDB-to-DDlog compiler: `multiset-table` option to force an output-only table to be declared as a `multiset`.
- DDShow is a timely/differential dataflow profiler developed by @Kixiron. DDlog now supports DDShow as an alternative to its built-in profiler. The built-in profiler is still preferable in production environments, due to its low overhead, but DDShow is a better (and constantly improving!) option for development-time profiling. See the Profiling tutorial for details on how to enable DDShow-based profiling via CLI switches, as well as via Rust/C/Java APIs.
- DDlog now supports several configuration parameters: (1) number of worker threads, (2) idle merge effort, (3) profiling configuration, (4) debug regions. We introduce a new startup API `<your_program>_ddlog::run_with_config()` that allows the user to configure these parameters before instantiating a DDlog program. This API is exposed to C via the `ddlog_run_with_config()` function and to Java via the `DDlogConfig` class.
- Rust API refactoring: Moved the `HDDlog` type (handle to a running DDlog program) from the main auto-generated crate to the `differential_datalog` crate. The auto-generated crate now exports two public functions that create an `HDDlog` instance: `run` and `run_with_config`.
- Profiling is disabled by default. Previously, DDlog's self-profiler was always enabled by default. This is no longer the case. When starting the program via `<your_program>_ddlog::run()` (`ddlog_run()` in C), the profiler will be disabled. To enable the profiler, use the `run_with_config()` API described above and explicitly set the profiling mode to either `SelfProfiling` (to enable the internal profiler) or `TimelyProfiling` (to use DDShow).
- Support for the "original" annotation on relations to record the original name (if the name is generated by code).
- Link ovsdb2ddlog statically, making it self-contained for better portability across different Linux distros.
- Addressed some warnings from rustc 1.52+.
- New release process: we now use GitHub actions instead of Travis to create binary DDlog releases. This should not have any effect on users.
- Made all reference counting (internal and through `ddlog_std::Ref`) use reference counters without weak counts in an effort to reduce memory usage.
- Fixed compilation error in Go bindings [#993]
- Support building ddlog from a source tarball outside of a git repo [#986]
- ovsdb2ddlog: Support negative values in OVS schemas [#985]
- More string conversions from utf-8 and utf-16
- print/debug functions
- `time.dl`: Improved support for times and dates. The library now uses the `chrono` crate (instead of `time`) internally, which, in particular, supports time zones.
- Segfault in the Java API in `transactionCommitDumpChanges`.
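For the change-profiling entry earlier in this file, the difference between the size profile and the change profile can be illustrated with two counters. This is a minimal sketch with a hypothetical `ArrangementStats` type (not part of the DDlog API) that tracks a single arrangement; an update's weight is +1 for an insertion and -1 for a deletion.

```rust
/// Hypothetical per-arrangement counters illustrating size vs. change profiling.
#[derive(Default, Debug)]
pub struct ArrangementStats {
    /// Net number of records currently stored (size profile).
    pub size: i64,
    /// Total number of insertions and deletions observed (change profile).
    pub changes: u64,
    /// Whether change profiling is currently enabled.
    pub change_profiling: bool,
}

impl ArrangementStats {
    /// Applies one update with the given weight (+1 insert, -1 delete).
    pub fn apply(&mut self, weight: i64) {
        // The size profile always tracks the net effect.
        self.size += weight;
        if self.change_profiling {
            // Churn is counted regardless of sign, so an insert followed
            // by a delete adds 2 here while cancelling out in `size`.
            self.changes += weight.unsigned_abs();
        }
    }
}
```

Turning `change_profiling` off stops recording without resetting `changes`, mirroring the described behavior of `profile change off`.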
`ddlog_std.dl`: More string conversions from utf-8 and utf-16.
`print.dl`: print/debug functions.
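For the `checked_weights` option listed earlier in this file: each DDlog record carries an integer weight, and the sketch below contrasts wrapping arithmetic, which silently yields a wrong weight on overflow, with checked arithmetic that panics instead. The 32-bit `Weight` alias and both function names are assumptions for illustration, not DDlog's generated code.

```rust
/// Weight type assumed for this sketch.
pub type Weight = i32;

/// Adds two weights with two's-complement wrapping: on overflow the
/// result is silently wrong (e.g., a huge positive count turns negative).
pub fn add_wrapping(a: Weight, b: Weight) -> Weight {
    a.wrapping_add(b)
}

/// Adds two weights and panics on overflow, trading a crash for the
/// guarantee that no incorrect result is produced.
pub fn add_checked(a: Weight, b: Weight) -> Weight {
    a.checked_add(b)
        .unwrap_or_else(|| panic!("weight overflow: {} + {}", a, b))
}
```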
- Rust compilation error due to missing parens in generated code.
- Upgraded the timely dataflow and differential dataflow dependencies to v0.12.
- Worked on improving the debuggability of DDlog dataflow graphs
- Experimental compiler support for D3log (wip).
- We now support rules that don't start with a positive literal, e.g., `R() :- not R2(...).` and `R() :- var x = 5.`
- Delete old Rust files in the generated project. This prevents compilation errors when upgrading to a new version of DDlog.
There are some breaking API changes in this release:
- Rust API:
  - Factored the Rust API into several traits declared in the `differential_datalog` crate:
    - `trait DDlogDynamic`: works with data represented as records.
    - `trait DDlog`: extends `DDlogDynamic` to work with strongly typed values wrapped in `DDValue`.
    - `trait DDlogProfiling`: profiling API.
    - `trait DDlogDump`: dump tables and indexes.
    - `DDlogInventory`: convert between relation/index names and numeric ids.
  - Renamed `apply_valupdates` -> `apply_updates`, `apply_updates` -> `apply_updates_dynamic`.
  - Changed method signatures to eliminate any generics. This way we will be able to implement dynamic dispatch for the DDlog API (i.e., pass references to a DDlog program as `&dyn DDlogXXX`) in the future.
- C, Java, Go API: the `ddlog_get_table_id` and `ddlog_get_index_id` methods now require a DDlog instance, e.g.:
  - old signature: `extern table_id ddlog_get_table_id(const char* tname);`
  - new signature: `extern table_id ddlog_get_table_id(ddlog_prog hprog, const char* tname);`
- Functional HashSets, aka immutable hashsets (`lib/hashset.dl`). At the API level, functional hashsets behave just like regular hashsets; however, their internal implementation supports cloning a hashset in time `O(1)` by sharing the entire internal state between the clone and the parent. Modifying the clone updates only the affected state in a copy-on-write fashion, with the rest of the state still shared with the parent.
  Example use case: computing the set of all unique ids that appear in a stream. At every iteration, we add all newly observed ids to the set of ids computed so far. This would normally amount to cloning and modifying a potentially large set in time `O(n)`, where `n` is the size of the set. With functional sets, the cost is `O(1)`.
  Functional data types are generally a great match for working with immutable collections, e.g., collections stored in DDlog relations. We therefore plan to introduce more functional data types in the future, possibly even replacing the standard collections (`Set`, `Map`, `Vec`) with functional versions.
- Added the `--intern-table` flag to the compiler to declare input tables coming from OVSDB as `Intern<...>`. This is useful for tables whose records are copied around as a whole and can therefore benefit from interning performance- and memory-wise. In the past, we had to create a separate table and copy records from the original input table to it while wrapping them in `Intern<>`. With this change, we avoid the extra copy and intern records as we ingest them for selected tables.
- Internal refactoring to improve DDlog's scalability with multiple worker threads.
- C API tutorial, kindly contributed by @smadaminov.
- Compiler crashed when differentiation or delay operators were applied to relations declared outside of the main module of the program.
- Enable pattern matching in `for` loops and `FlatMap`. One can now write `for ((k,v) in map) {}` instead of `for (kv in map) { var k = kv.0 }`, and `(var x, var y) = FlatMap(expr)` instead of `var xy = FlatMap(expr), var x = xy.0`.
- Remove the hardwired knowledge about iterable types from the compiler. Until now, the compiler only knew how to iterate over `Vec`, `Set`, `Map`, `Group`, and `TinySet` types. Instead, we now allow the programmer to label any extern type as iterable, meaning that it implements `iter()` and `into_iter()` methods that return Rust iterators, using one of two attributes: `#[iterate_by_ref=iter:<type>]` or `#[iterate_by_val=iter:<type>]`, where `type` is the type yielded by the `next()` method of the iterator. The former indicates that the `iter()` method returns a by-reference iterator; the latter indicates that `iter()` returns a by-value iterator.
  As a side effect of this change, the compiler no longer distinguishes maps from other containers that iterate over 2-tuples. Therefore, maps are represented as lists of tuples in the Flatbuf-based Java API.
- Groups now iterate with weight. Internally, all DDlog relations are multisets, where each element has a weight associated with it. We change the semantics of groups to expose these weights during iteration. Specifically, when iterating over a group in a for-loop or flattening it with `FlatMap`, each value in the iterator is a `(v, w)` tuple, where `w` is the weight of element `v`.
  We keep the semantics of all existing library aggregates unchanged, i.e., they ignore weights during iteration. This means that most existing user code is not affected (custom aggregates are not that common).
- The `group_by` operator now works on streams and produces a stream of groups, one for each individual transaction. (Tutorial section)
- Two new DDlog operators: delay and differentiation. The former refers to the contents of a relation from `N` transactions ago, where `N` is a positive integer constant. The latter converts a stream into a relation that contains new values added to the stream by the last transaction. (Tutorial section)
- Support `insert_or_update` and `delete_by_key` in the flatbuf API, including in Java.
- Introduced a benchmarking framework for DDlog (see rust/ddlog_benches/README.md).
- Print compiler error messages so that emacs can parse the error location.
- Use the `internment` crate instead of `arc-interner` to make `Intern<T>` more scalable.
- Support for ephemeral streams
- Fix linker problem on MacOS.
- A heuristic to make type inference errors easier to understand.
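The functional hashsets entry above describes `O(1)` cloning with copy-on-write updates. The following is a minimal sketch of the underlying idea using a persistent, path-copying binary search tree over `Rc` nodes; it is not the data structure used by `lib/hashset.dl`, and `FunctionalSet` is a hypothetical name. Cloning copies one pointer, and an insert copies only the nodes on the path from the root to the new element, sharing every other node with the original set.

```rust
use std::cmp::Ordering;
use std::rc::Rc;

/// A tree node; subtrees are shared between set versions via `Rc`.
struct Node<T> {
    value: T,
    left: Option<Rc<Node<T>>>,
    right: Option<Rc<Node<T>>>,
}

/// Hypothetical persistent set (unbalanced BST) illustrating structural sharing.
pub struct FunctionalSet<T> {
    root: Option<Rc<Node<T>>>,
}

impl<T> Clone for FunctionalSet<T> {
    // O(1): only the root's reference count is incremented.
    fn clone(&self) -> Self {
        FunctionalSet { root: self.root.clone() }
    }
}

impl<T: Ord + Clone> FunctionalSet<T> {
    pub fn new() -> Self {
        FunctionalSet { root: None }
    }

    pub fn contains(&self, value: &T) -> bool {
        let mut cur = &self.root;
        while let Some(node) = cur {
            match value.cmp(&node.value) {
                Ordering::Less => cur = &node.left,
                Ordering::Greater => cur = &node.right,
                Ordering::Equal => return true,
            }
        }
        false
    }

    /// Returns a new set that also contains `value`; `self` is unchanged.
    pub fn insert(&self, value: T) -> Self {
        FunctionalSet { root: Some(Self::insert_node(&self.root, value)) }
    }

    // Copies the nodes along the search path and reuses all other subtrees.
    fn insert_node(node: &Option<Rc<Node<T>>>, value: T) -> Rc<Node<T>> {
        match node {
            None => Rc::new(Node { value, left: None, right: None }),
            Some(n) => match value.cmp(&n.value) {
                Ordering::Less => Rc::new(Node {
                    value: n.value.clone(),
                    left: Some(Self::insert_node(&n.left, value)),
                    right: n.right.clone(),
                }),
                Ordering::Greater => Rc::new(Node {
                    value: n.value.clone(),
                    left: n.left.clone(),
                    right: Some(Self::insert_node(&n.right, value)),
                }),
                // Already present: share the existing subtree unchanged.
                Ordering::Equal => Rc::clone(n),
            },
        }
    }
}
```

Because this tree is unbalanced, sorted input degrades each operation to `O(n)`; a practical implementation would use a balanced tree or a hash array mapped trie, but the sharing principle is the same.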
New functions in `internment.dl`:
```
function parse_dec_u64(s: istring): Option<bit<64>>
function parse_dec_i64(s: istring): Option<signed<64>>
```
- A change in serde caused DDlog-generated Rust code to stop compiling, affecting all recent DDlog releases.
- Removed the callback argument from `HDDlog::run`, `ddlog_run`, and the Go/Java language bindings based on `ddlog_run`. This optional callback, invoked by DD workers on each update to an output collection, complicated the API and was tricky to use correctly. Most importantly, it is superseded by the `commit_dump_changes` API.
- Added the `ddlog_derive` crate that provides derive macros for the `FromRecord`, `IntoRecord`, and `Mutator` traits.
- Added the `Record::positional_struct_fields()` method to allow fetching positional fields from records.
- Added the `Record::get_struct_field()` method to allow getting a struct record's field by name.
0.33.0 - Dec 24, 2020
- An optimized implementation of the `distinct` operator may save memory and CPU for recursive relations.
- Added support for regex sets to `lib/regex.dl`.
- Added the `Vec::pop()` function to `lib/ddlog_std.dl`.
- Upgrade to the latest versions of timely and differential dataflow crates.
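For the `distinct` entry above: DDlog relations are multisets with integer weights, and `distinct` reduces a collection to one occurrence of each record. The sketch below pins down only these semantics on a single batch of `(record, weight)` updates (each record whose total weight is nonzero comes out once, with weight 1); it says nothing about the optimized incremental implementation, and the `distinct` function here is a hypothetical helper, not a DDlog or differential dataflow API.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: consolidates a batch of (record, weight) updates and
/// returns each record with a nonzero total weight exactly once, with weight 1.
pub fn distinct<T: Ord + Clone>(updates: &[(T, i64)]) -> Vec<(T, i64)> {
    // Sum the weights of identical records.
    let mut totals: BTreeMap<T, i64> = BTreeMap::new();
    for (record, weight) in updates {
        *totals.entry(record.clone()).or_insert(0) += weight;
    }
    // Drop records whose updates cancelled out; clamp the rest to weight 1.
    totals
        .into_iter()
        .filter(|(_, w)| *w != 0)
        .map(|(record, _)| (record, 1))
        .collect()
}
```

Recomputing this from scratch on every transaction would be expensive for recursive relations, which is why the operator is maintained incrementally in practice.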
0.32.1 - Dec 22, 2020
- Sped up the compiler: eliminated several performance bottlenecks, most notably in the type inference algorithm. This yields a 10x speedup on large DDlog projects.
- Fixed regressions introduced in 0.32.0: #859, #860.