The type evolution mechanism comes into play when the data stored in an existing type hierarchy has to be transformed into another type hierarchy. In keeping with the philosophy of static typing, moving a type from one hierarchy into another is itself a fundamental operation in Current.
As long as backward compatibility of data matters, use cases for type evolution appear all the time. Extending the underlying type of a `Stream`, or adding a new field to the `Storage`, are two situations where type evolution inevitably comes into play.
From the data consistency point of view, changing the type of an object can often be done with little, if any, extra work. For example, apart from updating the top-level `TypeID`-s, none of the following requires any modification to the underlying data itself: adding a new `Optional<>` field, converting an existing required field into an `Optional<>` one, or adding a new case to a `Variant<>`.
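As a minimal illustration, the sketch below uses plain C++17, with `std::optional` and `std::variant` standing in for Current's `Optional<>` and `Variant<>`; all struct and function names are hypothetical. Every old value has an obvious image in the new schema, so no stored data has to be rewritten.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// Old schema (hypothetical types).
struct OldUser {
  std::string name;
  int age;
};
struct Created { int id; };
struct Deleted { int id; };
using OldEvent = std::variant<Created, Deleted>;

// New schema: `age` relaxed to optional, a new optional field, a new variant case.
struct Renamed { int id; };
struct NewUser {
  std::string name;
  std::optional<int> age;            // Was required, now optional.
  std::optional<std::string> email;  // Newly added field.
};
using NewEvent = std::variant<Created, Deleted, Renamed>;

// Fields are copied as is; the newly added optional field simply stays empty.
NewUser Upgrade(const OldUser& from) {
  return NewUser{from.name, from.age, std::nullopt};
}

// Every old case is also a legal new case, so the stored value carries over unchanged.
NewEvent Upgrade(const OldEvent& from) {
  return std::visit([](const auto& event) -> NewEvent { return event; }, from);
}
```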
For a case study, consider altering an inner-level type. Our hypothetical user wants to change the type of `Location` from `Location = { double latitude, double longitude }` into `NewLocation = { double latitude, double longitude, Optional<Country> country }`.
To focus on data integrity, assume the logic to perform the conversion itself already exists.
Without type evolution, the user would have to write custom code to:

- traverse the object of the old type,
- leave all but the `Location` fields unchanged, and
- apply the transformation function to all `Location`-s to make them `NewLocation`-s instead.
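The sketch below shows what such hand-written code looks like for a single hypothetical container type, `Trip`, in plain C++17. `std::optional` stands in for `Optional<>`, and `CountryFromLatLong` is a stub for the conversion logic assumed to exist. Note that every type embedding a `Location` needs a twin type and a conversion function like `ConvertTrip`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

struct Country { std::string code; };
struct Location { double latitude; double longitude; };
struct NewLocation { double latitude; double longitude; std::optional<Country> country; };

// Hypothetical outer types: each type containing a Location needs a "new" twin.
struct Trip { std::string name; std::vector<Location> stops; std::optional<Location> home; };
struct NewTrip { std::string name; std::vector<NewLocation> stops; std::optional<NewLocation> home; };

// Hypothetical stub for the conversion logic the text assumes already exists.
std::optional<Country> CountryFromLatLong(const Location&) { return std::nullopt; }

NewLocation ConvertLocation(const Location& from) {
  return NewLocation{from.latitude, from.longitude, CountryFromLatLong(from)};
}

// The hand-written traversal: copy unrelated fields verbatim, convert every Location.
NewTrip ConvertTrip(const Trip& from) {
  NewTrip into;
  into.name = from.name;  // Unchanged field, yet it still has to be spelled out.
  for (const Location& stop : from.stops) {
    into.stops.push_back(ConvertLocation(stop));
  }
  if (from.home) {
    into.home = ConvertLocation(*from.home);
  }
  return into;
}
```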
Going down the rabbit hole, in the C++ world of static typing, as soon as the inner `Location` type is changed, every other value of or using this type has its own type changed as well. The alternative is to leave the realm of static typing temporarily, and implement the conversion as external-to-Current code that parses the JSON in the old format and converts it into the new one.
Needless to say, neither is a plausible long-term solution. The former approach requires carrying over a lot of error-prone write-only code (keep in mind, `Location` data fields can be part of `Storage` fields, contained deeply in a top-level `Transaction` type). The external-to-Current method, effectively a sophistication of the dreaded `sed -i` approach, is about as error-prone, and would also inevitably be slower to crunch the data and require some downtime of the service.
The solution implemented in Current is to extend the automatically exported C++ schema to also contain the generic boilerplate code to convert data from this schema to a different, external schema.
The "external schema" is passed in as a C++ template parameter, enabling seamless evolution into a type that is a superset of the original type. The evolution logic applies itself recursively, covering per-field types, per-variant-case types, base structure types, `Optional<>` fields, and types inner to STL containers or `std::pair<>`-s.
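A minimal sketch of the idea in plain C++17, not Current's actual implementation: the hypothetical `Evolver<INTO, T>` trait takes the destination "namespace" as a template parameter, recurses into `std::vector`, `std::optional` and `std::pair`, and relies on per-struct specializations that stand in for the autogenerated boilerplate. The `Destination` struct and the `Trip` type are illustrative assumptions.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Primary template: "identity evolution" for primitive types.
// INTO is the destination "namespace", passed in as a template parameter.
template <typename INTO, typename T>
struct Evolver {
  static_assert(std::is_arithmetic_v<T> || std::is_same_v<T, std::string>,
                "No evolver is defined for this type.");
  using result_t = T;
  static result_t Go(const T& from) { return from; }
};

// Containers and wrappers: recurse into the inner types.
template <typename INTO, typename T>
struct Evolver<INTO, std::vector<T>> {
  using result_t = std::vector<typename Evolver<INTO, T>::result_t>;
  static result_t Go(const std::vector<T>& from) {
    result_t into;
    into.reserve(from.size());
    for (const T& element : from) {
      into.push_back(Evolver<INTO, T>::Go(element));
    }
    return into;
  }
};

template <typename INTO, typename T>
struct Evolver<INTO, std::optional<T>> {
  using result_t = std::optional<typename Evolver<INTO, T>::result_t>;
  static result_t Go(const std::optional<T>& from) {
    if (!from) {
      return std::nullopt;
    }
    return Evolver<INTO, T>::Go(*from);
  }
};

template <typename INTO, typename A, typename B>
struct Evolver<INTO, std::pair<A, B>> {
  using result_t = std::pair<typename Evolver<INTO, A>::result_t, typename Evolver<INTO, B>::result_t>;
  static result_t Go(const std::pair<A, B>& from) {
    return result_t(Evolver<INTO, A>::Go(from.first), Evolver<INTO, B>::Go(from.second));
  }
};

// Source schema.
struct Location { double latitude; double longitude; };
struct Trip { std::string name; std::vector<Location> stops; std::optional<Location> home; };

// Destination schema, grouped in a struct so that it can be a template parameter.
struct Destination {
  struct Location { double latitude; double longitude; std::optional<std::string> country; };
  struct Trip { std::string name; std::vector<Location> stops; std::optional<Location> home; };
};

// Per-struct evolvers: the field-by-field code a schema exporter could generate.
template <typename INTO>
struct Evolver<INTO, Location> {
  using result_t = typename INTO::Location;
  static result_t Go(const Location& from) {
    result_t into{};  // Fields absent in the source (here `country`) keep their defaults.
    into.latitude = Evolver<INTO, double>::Go(from.latitude);
    into.longitude = Evolver<INTO, double>::Go(from.longitude);
    return into;
  }
};

template <typename INTO>
struct Evolver<INTO, Trip> {
  using result_t = typename INTO::Trip;
  static result_t Go(const Trip& from) {
    result_t into{};
    into.name = Evolver<INTO, std::string>::Go(from.name);
    into.stops = Evolver<INTO, std::vector<Location>>::Go(from.stops);
    into.home = Evolver<INTO, std::optional<Location>>::Go(from.home);
    return into;
  }
};
```

Missing rules surface at compile time: the `static_assert` in the primary template fires for any type that has no evolver.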
The autogenerated boilerplate can be extended to account only for the change the user has to make. For instance, if the above `Location` => `NewLocation` change is the only modification to apply to the whole mutation log of a `Storage`, providing a user-defined type evolver for `struct Location` would turn all inner `Location`-s, including the ones contained in `Optional<>`-s, `Variant<>`-s, `std::vector<>`-s, etc., into `NewLocation`-s.
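The propagation can be sketched with a hypothetical `Evolve<T>` trait in plain C++17 (again, not Current's code): the generic rules for `std::vector`, `std::optional` and `std::variant` are written once, the default rule is plain identity, and the user supplies a single specialization for `Location`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

struct Location { double latitude; double longitude; };
struct NewLocation { double latitude; double longitude; std::optional<std::string> country; };

// Default rule in this sketch: identity.
template <typename T>
struct Evolve {
  using result_t = T;
  static result_t Go(const T& from) { return from; }
};

// Generic rules that push the evolution into the contained types.
template <typename T>
struct Evolve<std::vector<T>> {
  using result_t = std::vector<typename Evolve<T>::result_t>;
  static result_t Go(const std::vector<T>& from) {
    result_t into;
    for (const T& element : from) {
      into.push_back(Evolve<T>::Go(element));
    }
    return into;
  }
};

template <typename T>
struct Evolve<std::optional<T>> {
  using result_t = std::optional<typename Evolve<T>::result_t>;
  static result_t Go(const std::optional<T>& from) {
    if (!from) {
      return std::nullopt;
    }
    return Evolve<T>::Go(*from);
  }
};

template <typename... TS>
struct Evolve<std::variant<TS...>> {
  using result_t = std::variant<typename Evolve<TS>::result_t...>;
  static result_t Go(const std::variant<TS...>& from) {
    return std::visit(
        [](const auto& value) -> result_t {
          return result_t(Evolve<std::decay_t<decltype(value)>>::Go(value));
        },
        from);
  }
};

// The only user-written rule: a Location becomes a NewLocation.
template <>
struct Evolve<Location> {
  using result_t = NewLocation;
  static result_t Go(const Location& from) {
    return NewLocation{from.latitude, from.longitude, std::nullopt};
  }
};
```

If two source cases evolved into the same destination type, the resulting `std::variant` would contain a duplicate alternative and constructing it would fail to compile, which mirrors the requirement that evolution be unambiguous.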
To implement the above neatly, we introduce the notions of `CURRENT_NAMESPACE` and `CURRENT_TYPE_EVOLVER`.
`CURRENT_NAMESPACE` is a C++ `struct` disguised as a C++ `namespace`. The syntax is as follows:

```cpp
CURRENT_NAMESPACE(NewSchema) {
  CURRENT_NAMESPACE_TYPE(Location, new_schema::Location);
  // ...
};
```
Using a macro-defined `struct` instead of a `namespace` accomplishes three goals:

- A declared `CURRENT_NAMESPACE` can be used as a template parameter,
- Natural extension into `CURRENT_DERIVED_NAMESPACE` is straightforward, allowing extending an existing namespace and/or aliasing types within it, and
- The `CURRENT_[DERIVED_]NAMESPACE` macro generates a `struct` which can be reflected upon to get the name of the "namespace" at runtime, even if the "namespace" has been passed as a template parameter.
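To make the three points concrete, here is a plain C++17 sketch with hypothetical `MY_NAMESPACE`, `MY_DERIVED_NAMESPACE` and `MY_NAMESPACE_TYPE` macros. Current's real macros are defined differently; these only illustrate how a macro-generated `struct` can serve as a template parameter, be extended, and report its own name at runtime.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical macros: a "namespace" is a struct that knows its own name
// and exposes its types as member aliases.
#define MY_NAMESPACE(ns)                      \
  struct ns##_Name {                          \
    static const char* Name() { return #ns; } \
  };                                          \
  struct ns : ns##_Name
#define MY_DERIVED_NAMESPACE(ns, base)        \
  struct ns##_Name : base {                   \
    static const char* Name() { return #ns; } \
  };                                          \
  struct ns : ns##_Name
#define MY_NAMESPACE_TYPE(alias, type) using alias = type

namespace schema_v1 {
struct Location { double latitude; double longitude; };
}  // namespace schema_v1
namespace schema_v2 {
struct Country { std::string code; };
}  // namespace schema_v2

MY_NAMESPACE(OldSchema) {
  MY_NAMESPACE_TYPE(Location, schema_v1::Location);
};

// Extends OldSchema: inherits its types and adds one more.
MY_DERIVED_NAMESPACE(NewSchema, OldSchema) {
  MY_NAMESPACE_TYPE(Country, schema_v2::Country);
};

// Because the "namespace" is a type, it can be a template parameter,
// and its name is still available at runtime.
template <typename NS>
std::string QualifiedName(const std::string& type_name) {
  return std::string(NS::Name()) + "::" + type_name;
}
```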
Two downsides of using a `struct` over a `namespace` are:

- All the types belonging to a `CURRENT_NAMESPACE` must be defined in one place, and
- When referring to inner types of a "namespace" that is a template parameter, the `typename` keyword must be placed in front of the name of the type inner to the "namespace".
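A short illustration of the second downside, using a hypothetical `Schema` struct: outside templates, inner types are referred to just as with a real namespace, but inside a template that takes the "namespace" as a parameter, every inner type is a dependent name.

```cpp
#include <cassert>

// A "namespace" modeled as a struct (illustration only).
struct Schema {
  struct Location { double latitude; double longitude; };
};

// Outside templates, `Schema::Location` works exactly as with a real namespace.
Schema::Location home{52.5, 13.4};

// With the "namespace" as a template parameter, `NS::Location` is a dependent
// name, so each mention of it needs `typename` in front.
template <typename NS>
typename NS::Location MakeOrigin() {
  typename NS::Location origin{0.0, 0.0};
  return origin;
}
```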
When exporting the data schema in the C++ ("Current") format, the `CURRENT_NAMESPACE` containing all the relevant types is exported as well. If the "exposed namespace name" is provided, the `CURRENT_NAMESPACE` gets that specific name. Coupled with "exposed namespace types" with user-provided names, this makes the autogenerated C++ schema file 100% ready to be `#include`-d and used from user code.
`CURRENT_TYPE_EVOLVER` is a macro that simplifies adding user-defined type evolvers. It takes four parameters:

- The C++ symbol name of the evolver to define or extend.
- The name of the `CURRENT_NAMESPACE` from which the evolution is being performed.
- The name of the type from this `CURRENT_NAMESPACE` to which this implementation should apply.
- The actual data transformation code, a single statement or a C++ block.
The syntax is straightforward as well:

```cpp
CURRENT_NAMESPACE(From) {
  CURRENT_NAMESPACE_TYPE(Location, ...);
};

CURRENT_TYPE_EVOLVER(CustomEvolver, From, Location, {
  into.latitude = from.latitude;            // Boilerplate, autogenerated & copy-pasted.
  into.longitude = from.longitude;          // Boilerplate, autogenerated & copy-pasted.
  into.country = CountryFromLatLong(from);  // Added manually by the user.
});
```
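For intuition only, the sketch below shows one way a macro like this could be built in plain C++17. `MY_TYPE_EVOLVER`, the `Evolve` trait, the `From`/`Into` structs and the `CountryFromLatLong` stub are hypothetical, and Current's actual expansion differs; the point is that the macro turns the user's block into the body of a template specialization in which `from` and `into` are in scope.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Primary template: one specialization per (evolver name, source namespace, source type).
template <typename EVOLVER, typename FROM_NS, typename FROM>
struct Evolve;

// Hypothetical macro. The body is taken as __VA_ARGS__ so that
// top-level commas inside it cannot break the macro call.
#define MY_TYPE_EVOLVER(evolver, from_ns, type_name, ...)                  \
  template <>                                                               \
  struct Evolve<evolver, from_ns, from_ns::type_name> {                     \
    template <typename INTO>                                                \
    static void Go(const from_ns::type_name& from, INTO& into) __VA_ARGS__ \
  }

struct Country { std::string code; };

// Source and destination "namespaces", modeled as structs.
struct From {
  struct Location { double latitude; double longitude; };
};
struct Into {
  struct Location { double latitude; double longitude; std::optional<Country> country; };
};

struct CustomEvolver {};  // The evolver name is just a tag type here.

// Hypothetical stub standing in for a real reverse-geocoding lookup.
std::optional<Country> CountryFromLatLong(const From::Location&) {
  return Country{"XX"};
}

MY_TYPE_EVOLVER(CustomEvolver, From, Location, {
  into.latitude = from.latitude;            // Boilerplate.
  into.longitude = from.longitude;          // Boilerplate.
  into.country = CountryFromLatLong(from);  // Added by hand.
});
```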
In the underlying C++, evolvers are defined as partial template specializations, and the order of their appearance in the code is irrelevant.
Specialization selection is generic and supports evolution from a type belonging to a `CURRENT_NAMESPACE` derived from the `CURRENT_NAMESPACE` that originally contained the type to evolve from. In other words, `From` does not have to be the exact source `CURRENT_NAMESPACE` for the `CustomEvolver` defined in the code snippet above to kick in and operate on the `Location` C++ type.
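The selection rule can be sketched with `std::enable_if_t` and `std::is_base_of_v` in plain C++17: the hypothetical specialization below applies whenever the source "namespace" is `From` or derives from it, and a generic function template defined before it still picks it up, because the specialization is only looked for when that function is instantiated. Current's real selection mechanism may differ, and the `std::string` destination is a toy.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct From {
  struct Location { double latitude; double longitude; };
};
// A "derived namespace": inherits every type declared in From.
struct DerivedFrom : From {};

struct CustomEvolver {};

// Primary template, deliberately left undefined: asking for an evolver
// nobody provided is a compile-time error.
template <typename EVOLVER, typename FROM_NS, typename T, typename = void>
struct Evolve;

// Generic code written *before* the specialization it ends up using.
template <typename EVOLVER, typename FROM_NS, typename T>
std::string EvolveToString(const T& value) {
  return Evolve<EVOLVER, FROM_NS, T>::Go(value);
}

// Matches `From::Location` coming from From itself or from any namespace derived from it.
template <typename NS>
struct Evolve<CustomEvolver, NS, From::Location,
              std::enable_if_t<std::is_base_of_v<From, NS>>> {
  static std::string Go(const From::Location& from) {
    return std::to_string(from.latitude) + "," + std::to_string(from.longitude);
  }
};
```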
To complete the picture, along with the schema itself, the exported C++ "Current" schema contains two more sections: the natural evolver and boilerplate evolvers.
Type evolution, both natural and user-defined, is applied hierarchically. If only some inner-level type has been changed, the user only has to provide custom code to evolve that inner-level type. For any explicit or implicit use of the type (as a field, as a case of a `Variant`, as a base class, as the type contained in a container), the corresponding evolver will be selected and used.
Needless to say, the code will not compile if some types, fields or variant cases cannot be evolved, or cannot be evolved unambiguously.
The natural evolution is designed to eliminate the need to write redundant code. When type evolution rules can be inferred, the natural evolver does the job just fine, requiring no extra code at all for the evolution of the following types:

- A `CURRENT_STRUCT` evolving into a `CURRENT_STRUCT`, where the destination struct contains all the fields the source one does.
- A `Variant<...>` or `CURRENT_VARIANT` evolving into a `Variant<...>` or `CURRENT_VARIANT`, where all possible cases of the source variant are legal cases of the destination variant.
- An `Optional<T>`, which naturally proxies the evolution down the chain to evolve `T`.
- A C++ type supported in the Current TypeSystem, where "identity evolution" rules are applied.
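The variant and `Optional<T>` rules can be sketched generically in plain C++17, with `std::variant`/`std::optional` standing in for Current's types and all names hypothetical. The struct rule is left out, since checking field-by-field containment needs the kind of schema reflection Current provides.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Is T one of the alternatives of VARIANT?
template <typename T, typename VARIANT>
struct IsCaseOf : std::false_type {};
template <typename T, typename... TS>
struct IsCaseOf<T, std::variant<TS...>>
    : std::bool_constant<(std::is_same_v<T, TS> || ...)> {};

// Variant => variant: compiles only if every source case is also a destination case.
template <typename INTO, typename... FROM_CASES>
INTO NaturalEvolve(const std::variant<FROM_CASES...>& from) {
  static_assert((IsCaseOf<FROM_CASES, INTO>::value && ...),
                "Every source case must be a legal destination case.");
  return std::visit(
      [](const auto& value) -> INTO {
        // Construct exactly the same alternative, never a merely convertible one.
        return INTO(std::in_place_type<std::decay_t<decltype(value)>>, value);
      },
      from);
}

// Optional<T> => Optional<U>: proxies the evolution down to the contained value.
template <typename INTO, typename FROM>
std::optional<INTO> NaturalEvolve(const std::optional<FROM>& from) {
  if (!from) {
    return std::nullopt;
  }
  return NaturalEvolve<INTO>(*from);
}

struct Created { int id; };
struct Deleted { int id; };
struct Renamed { int id; std::string name; };
using OldEvent = std::variant<Created, Deleted>;
using NewEvent = std::variant<Created, Deleted, Renamed>;
```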
The natural evolver is shared by all named evolvers: custom evolution logic defined under a specific name via the `CURRENT_TYPE_EVOLVER` macro overrides only the types it explicitly covers.

In other words, if `CURRENT_TYPE_EVOLVER(Foo)` changes `Location` into `NewLocation`, and `CURRENT_TYPE_EVOLVER(Bar)` changes `Location` into an `std::string`, the default evolution would still be applied to all the other types from the `CURRENT_NAMESPACE` that originally contained `CURRENT_STRUCT(Location)`, unless `Foo` or `Bar` specifically define a specialization for the evolution of types other than `Location`.
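In a hypothetical tag-based sketch (plain C++17, not Current's code), each evolver name is a tag type, the primary template plays the natural evolver (identity here), and `Foo` and `Bar` override only `Location`. Every other type, such as the illustrative `Point`, falls back to the natural rule under either name.

```cpp
#include <cassert>
#include <string>

struct Location { double latitude; double longitude; };
struct NewLocation { double latitude; double longitude; };
struct Point { int x; int y; };

// Named evolvers are plain tag types in this sketch.
struct Foo {};
struct Bar {};

// The natural evolver: applies to every type under every evolver name,
// unless a more specific specialization exists.
template <typename EVOLVER, typename T>
struct Evolve {
  static T Go(const T& from) { return from; }
};

// Foo turns Location into NewLocation.
template <>
struct Evolve<Foo, Location> {
  static NewLocation Go(const Location& from) {
    return NewLocation{from.latitude, from.longitude};
  }
};

// Bar turns Location into a string.
template <>
struct Evolve<Bar, Location> {
  static std::string Go(const Location& from) {
    return std::to_string(from.latitude) + " " + std::to_string(from.longitude);
  }
};
```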
Natural evolvers help when the source type is convertible into the destination type by being either identical to or a subset of the latter. In other cases, custom code is required.
To help write this custom code, the boilerplate evolver is generated alongside the natural evolver and the schema itself.
Two types of boilerplate evolvers are generated: boilerplate evolvers for `CURRENT_STRUCT`-s and for `Variant<>`/`CURRENT_VARIANT`-s.
The boilerplate evolver for a `CURRENT_STRUCT` lists all the fields of this struct. Its purpose is to simplify altering the evolution of just a few fields of a large structure. The boilerplate looks like this (example from `examples/TypeEvolution/golden`):

```cpp
CURRENT_TYPE_EVOLVER(CustomEvolver, From, FullName, {
  CURRENT_NATURAL_EVOLVE(From, CustomDestinationNamespace, from.first_name, into.first_name);
  CURRENT_NATURAL_EVOLVE(From, CustomDestinationNamespace, from.last_name, into.last_name);
});
```
The boilerplate evolver for `Variant<>`/`CURRENT_VARIANT` lists all variant cases. Its purpose is to simplify altering the evolution when just a few cases of a `Variant<>` have to be converted into other cases, or ignored altogether. The boilerplate looks like this (example from `examples/TypeEvolution/golden`):

```cpp
CURRENT_TYPE_EVOLVER_VARIANT(CustomEvolver, From, ShrinkingVariant, CustomDestinationNamespace) {
  CURRENT_TYPE_EVOLVER_NATURAL_VARIANT_CASE(CustomTypeA, CURRENT_NATURAL_EVOLVE(From, CustomDestinationNamespace, from, into));
  CURRENT_TYPE_EVOLVER_NATURAL_VARIANT_CASE(CustomTypeB, CURRENT_NATURAL_EVOLVE(From, CustomDestinationNamespace, from, into));
  CURRENT_TYPE_EVOLVER_NATURAL_VARIANT_CASE(CustomTypeC, CURRENT_NATURAL_EVOLVE(From, CustomDestinationNamespace, from, into));
};
```
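What a user might do after copy-pasting such a boilerplate can be sketched in plain C++17 with `std::variant`: here the destination drops `CustomTypeC`, so that one case is either converted into `CustomTypeB` or ignored, while the other cases evolve unchanged. The `EvolveShrinking` function, the `drop_c` switch and the struct fields are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <variant>

struct CustomTypeA { int a; };
struct CustomTypeB { int b; };
struct CustomTypeC { int c; };  // Not present in the destination schema.

using ShrinkingVariant = std::variant<CustomTypeA, CustomTypeB, CustomTypeC>;
using ShrunkVariant = std::variant<CustomTypeA, CustomTypeB>;

// One branch per source case, mirroring the per-case boilerplate:
// A and B evolve naturally, C is converted into B or dropped.
std::optional<ShrunkVariant> EvolveShrinking(const ShrinkingVariant& from, bool drop_c) {
  return std::visit(
      [drop_c](const auto& value) -> std::optional<ShrunkVariant> {
        using case_t = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<case_t, CustomTypeC>) {
          if (drop_c) {
            return std::nullopt;  // Ignore this case altogether.
          }
          return ShrunkVariant(CustomTypeB{value.c});  // Convert into another case.
        } else {
          return ShrunkVariant(value);  // Same case exists in the destination.
        }
      },
      from);
}
```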
While the natural evolver is compiled upon `#include`-ing the autogenerated exported C++ "Current" schema, the autogenerated boilerplate evolver section is deliberately commented out with `#if 0`/`#endif`. The user is responsible for copy-pasting the relevant section of the boilerplate evolver and extending it.
As the destination type for the type evolver must belong to a `CURRENT_NAMESPACE`, evolving one `Storage` into another `Storage` is problematic because the inner `Storage` types are autogenerated.
We're weighing the option of having the `CURRENT_STORAGE` macro expose its own `CURRENT_NAMESPACE`, but the chances are slim. In the meantime, the best practical solution is to export and `#include` the schemas of both the old and the new `Storage`, perform the data conversion from the old autogenerated `CURRENT_NAMESPACE` into the new one, and then use `ParseJSON<desired_t>(JSON<original_t>(object))` to convert the result into the exact C++ type the new `Storage` expects.
For more details, please refer to the `examples/TypeEvolution` test. Its `golden/` directory contains the exported schema, including the boilerplate type evolver, which is extended in the code.