Type Evolution

Objective

The type evolution mechanism comes into play when the data stored in an existing type hierarchy has to be transformed into another type hierarchy. To adhere to the philosophy of static typing, the transformation of a type from belonging to one hierarchy into belonging to a different one is by itself a fundamental operation in Current.

As long as backwards compatibility of data is important, use cases for type evolution appear all the time. Extending the underlying type of a Stream, or adding a new field to the Storage are two situations where type evolution inevitably comes into play.

From the data consistency point of view, changing the type of an object can often be performed with very little, if any, extra work. For example, apart from updating the top-level TypeID-s, adding a new Optional<> field, converting an existing required field into an Optional<> one, or adding a new case to a Variant<> does not require any modification to the underlying data itself.

Problem

For a case study, consider altering an inner-level type. Our hypothetical user wants to change the type of Location from Location = { double latitude, double longitude } into NewLocation = { double latitude, double longitude, Optional<Country> country }.

To focus on data integrity, assume the logic to perform the conversion itself already exists.

Without type evolution, the user would have to write custom code to:

  • traverse the object of the old type,
  • leave all but Location fields unchanged, and
  • apply the transformation function to all Location-s to make them NewLocation-s instead.

Going down the rabbit hole: in the statically typed world of C++, as soon as the inner Location type changes, the type of every value that is of this type or uses it changes as well. The alternative is to leave the realm of static typing temporarily, and implement the conversion as external-to-Current code that parses the JSON in the old format and converts it into the new format.

Needless to say, neither is a plausible long-term solution. The former approach requires carrying over a lot of error-prone write-only code (keep in mind that Location data fields can be part of Storage fields, contained deep within a top-level Transaction type). The external-to-Current method, effectively a sophistication of the dreaded sed -i approach, is about as error-prone, would inevitably be slower to crunch the data, and would require some downtime of the service.

Solution

The solution implemented in Current is to extend the automatically exported C++ schema to also contain the generic boilerplate code to convert data from this schema to a different, external schema.

The "external schema" is passed in as a C++ template parameter, enabling seamless evolution into the type which is a superset of the original type. The evolution logic applies itself recursively, covering per-field types, per-variant-case types, base structure types, Optional<> fields, and types inner to STL containers or std::pair<>-s.

The autogenerated boilerplate can be extended to only account for the change the user has to make. For instance, if the above Location => NewLocation change is the only modification to apply to the whole mutation log of a Storage, providing a user-defined type evolver for struct Location would result in all inner Location-s, including the ones contained in Optional<>-s, Variant<>-s, std::vector<>-s, etc. to become NewLocation-s.
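To make the recursion concrete, here is a minimal standalone sketch in plain C++17. It does not use Current at all: `Evolve`, `EvolveField`, the `old_schema`/`new_schema` namespaces, the `Trip` struct, and the toy `CountryFromLatLong` rule are all hypothetical names introduced for illustration, and `std::optional` stands in for `Optional<>`. The idea it shows is the one described above: a single user-written specialization for `Location` is picked up wherever `Location` appears, including inside a `std::vector<>` and an optional field.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain-C++ stand-ins for the old and the new schema.
namespace old_schema {
struct Location { double latitude; double longitude; };
struct Trip { std::string name; std::vector<Location> stops; std::optional<Location> home; };
}  // namespace old_schema

namespace new_schema {
struct Location { double latitude; double longitude; std::optional<std::string> country; };
struct Trip { std::string name; std::vector<Location> stops; std::optional<Location> home; };
}  // namespace new_schema

// Primary template: "identity" evolution. It fails to compile if `Into` is not assignable from `From`.
template <typename From, typename Into>
struct Evolve {
  static void Go(const From& from, Into& into) { into = from; }
};

// Convenience wrapper that deduces both types.
template <typename From, typename Into>
void EvolveField(const From& from, Into& into) { Evolve<From, Into>::Go(from, into); }

// Containers forward the evolution to their element type.
template <typename From, typename Into>
struct Evolve<std::vector<From>, std::vector<Into>> {
  static void Go(const std::vector<From>& from, std::vector<Into>& into) {
    into.clear();
    into.resize(from.size());
    for (std::size_t i = 0; i < from.size(); ++i) Evolve<From, Into>::Go(from[i], into[i]);
  }
};

// Optionals forward the evolution to the wrapped type, if a value is present.
template <typename From, typename Into>
struct Evolve<std::optional<From>, std::optional<Into>> {
  static void Go(const std::optional<From>& from, std::optional<Into>& into) {
    if (from) {
      Into value{};
      Evolve<From, Into>::Go(*from, value);
      into = std::move(value);
    } else {
      into.reset();
    }
  }
};

// Toy placeholder for the real lookup logic, which is assumed to exist elsewhere.
inline std::string CountryFromLatLong(const old_schema::Location& location) {
  assert(location.latitude >= -90.0 && location.latitude <= 90.0);
  return location.latitude >= 0 ? "Northland" : "Southland";
}

// The only user-written piece. In this simplified sketch it must be declared before its first use.
template <>
struct Evolve<old_schema::Location, new_schema::Location> {
  static void Go(const old_schema::Location& from, new_schema::Location& into) {
    into.latitude = from.latitude;
    into.longitude = from.longitude;
    into.country = CountryFromLatLong(from);
  }
};

// The per-struct part that a schema exporter could generate: evolve field by field.
template <>
struct Evolve<old_schema::Trip, new_schema::Trip> {
  static void Go(const old_schema::Trip& from, new_schema::Trip& into) {
    EvolveField(from.name, into.name);
    EvolveField(from.stops, into.stops);
    EvolveField(from.home, into.home);
  }
};
```

Calling `EvolveField(old_trip, new_trip)` on an `old_schema::Trip` fills in `country` for every stop and for `home`, without any code that mentions `Location` inside the `Trip` evolver.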

Implementation

Concepts

To implement the above neatly, we introduce the notions of CURRENT_NAMESPACE and CURRENT_TYPE_EVOLVER.

CURRENT_NAMESPACE

CURRENT_NAMESPACE is a C++ struct disguised as a C++ namespace. The syntax is as follows:

CURRENT_NAMESPACE(NewSchema) {
  CURRENT_NAMESPACE_TYPE(Location, new_schema::Location);
  // ...
};

Using a macro-defined struct instead of a namespace accomplishes three goals:

  1. A declared CURRENT_NAMESPACE can be used as a template parameter,
  2. Natural extension into CURRENT_DERIVED_NAMESPACE is straightforward, allowing extending an existing namespace and/or aliasing types within it, and
  3. The CURRENT_[DERIVED_]NAMESPACE macro generates a struct which can be reflected upon to get the name of the "namespace" at runtime, even if the "namespace" has been passed as the template parameter.

Two downsides of using a struct over a namespace are:

  1. All the types belonging to a CURRENT_NAMESPACE should be defined in one place, and
  2. When referring to inner types from template code, the typename keyword must be prepended to the name of the type inner to the "namespace".
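The trade-offs listed above can be seen in a small plain-C++ sketch that does not depend on Current. The structs `SchemaV1`, `SchemaV2`, `SchemaV2Extended` and the helpers `MakeLocation` and `QualifiedLocationName` are hypothetical; they only mimic what a macro-generated "namespace" struct might provide: usable as a template parameter, extensible through inheritance, and carrying its own name, at the cost of needing `typename` for inner types.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct LocationV1 { double latitude; double longitude; };
struct LocationV2 { double latitude; double longitude; std::string country; };

// A struct playing the role of a namespace: it holds type aliases and knows its own name.
struct SchemaV1 {
  using Location = LocationV1;
  static const char* Name() { return "SchemaV1"; }
};

struct SchemaV2 {
  using Location = LocationV2;
  static const char* Name() { return "SchemaV2"; }
};

// A "derived namespace": inherits all the aliases of SchemaV2 and adds one more.
struct SchemaV2Extended : SchemaV2 {
  using Point = Location;
  static const char* Name() { return "SchemaV2Extended"; }
};

static_assert(std::is_same<SchemaV2Extended::Point, LocationV2>::value,
              "The alias in the derived struct refers to the inherited type.");

// The "namespace" is passed as a template parameter.
// `typename` marks the dependent name `NAMESPACE::Location` as a type.
template <typename NAMESPACE>
typename NAMESPACE::Location MakeLocation(double latitude, double longitude) {
  assert(latitude >= -90.0 && latitude <= 90.0);
  typename NAMESPACE::Location location{};
  location.latitude = latitude;
  location.longitude = longitude;
  return location;
}

// The name of the "namespace" is still available after it has been passed as a template parameter.
template <typename NAMESPACE>
std::string QualifiedLocationName() {
  return std::string(NAMESPACE::Name()) + "::Location";
}
```

With a real C++ namespace none of this would compile, since a namespace cannot be a template argument.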

When the data schema is exported in the C++ ("Current") format, the CURRENT_NAMESPACE containing all the relevant types is exported as well. If an "exposed namespace name" is provided, the CURRENT_NAMESPACE is given that name. Coupled with "exposed namespace types" with user-provided names, this makes the autogenerated C++ schema file ready to be #include-d and used from user code as is.

CURRENT_TYPE_EVOLVER

CURRENT_TYPE_EVOLVER is the macro to simplify adding user-defined type evolvers. The macro takes four parameters:

  1. The C++ symbol name of the evolver to define or extend.
  2. The name of the CURRENT_NAMESPACE from which the evolution is being performed.
  3. The name of the type from this CURRENT_NAMESPACE to which this implementation should apply.
  4. The actual data transformation code, a single statement or a C++ block.

The syntax is straightforward as well:

CURRENT_NAMESPACE(From) {
  CURRENT_NAMESPACE_TYPE(Location, ...);
};

CURRENT_TYPE_EVOLVER(CustomEvolver, From, Location, {
  into.latitude = from.latitude;            // Boilerplate, autogenerated & copy-pasted.
  into.longitude = from.longitude;          // Boilerplate, autogenerated & copy-pasted.
  into.country = CountryFromLatLong(from);  // Added manually by the user.
});

In the underlying C++, evolvers are defined as partial template specializations, and the order of their appearance in the code is irrelevant.

Specialization selection is generic and supports evolution from a type belonging to a CURRENT_NAMESPACE derived from the CURRENT_NAMESPACE that originally contained the type to evolve from. In other words, no direct match of From as the source CURRENT_NAMESPACE is necessary for the CustomEvolver defined in the code snippet above to kick in and operate on the Location C++ type.

Implementation

To complete the picture, along with the schema itself, the exported C++ "Current" schema contains two more sections: the natural evolver and boilerplate evolvers.

Type evolution, both natural and user-defined, is applied hierarchically. If only some inner-level type has been changed, the user has to only provide custom code to evolve that inner-level type. For any explicit or implicit use of the type (as a field, as a case of a Variant, as a base class, as the type contained in a container), the corresponding evolver will be selected and used.

Needless to say, the code would not compile if some types, fields, or variant cases cannot be evolved, or cannot be evolved unambiguously.

Natural Evolver

The natural evolution is designed to eliminate the need to write redundant code. When type evolution rules can be inferred, the natural evolver does the job just fine, requiring no extra code at all for the evolution of the following types:

  • A CURRENT_STRUCT evolving into a CURRENT_STRUCT, where the destination struct contains all the fields the source one does.
  • A Variant<...> or CURRENT_VARIANT evolving into a Variant<...> or CURRENT_VARIANT, where every possible case of the source variant is a legal case of the destination variant.
  • An Optional<T>, which naturally proxies the evolution down the chain to evolve T.
  • A C++ type supported in the Current TypeSystem, where "identity evolution" rules are applied.
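The variant rule from the list above can be sketched with `std::variant` in plain C++17, as a rough analogy rather than Current's own `Variant<>` implementation. `Cat`, `Dog`, `Fish`, `OldPet`, `NewPet` and `WidenVariant` are hypothetical names. The sketch simply copies the active case, while the real natural evolver would also evolve the case type itself; what it does show is that the conversion compiles only when every source case is a legal destination case.

```cpp
#include <cassert>
#include <string>
#include <variant>

struct Cat { std::string name; };
struct Dog { std::string name; };
struct Fish { std::string name; };

using OldPet = std::variant<Cat, Dog>;        // Source: two cases.
using NewPet = std::variant<Cat, Dog, Fish>;  // Destination: a superset of the source cases.

// Converts the active case of `from` into the wider variant `Into`.
// If some case of `From` is not a case of `Into`, the lambda body does not compile.
template <typename Into, typename From>
Into WidenVariant(const From& from) {
  assert(!from.valueless_by_exception());
  return std::visit([](const auto& value) { return Into(value); }, from);
}
```

Going the other way, `WidenVariant<OldPet>(new_pet)` is rejected at compile time because `Fish` has no counterpart in `OldPet`; that is the situation where a custom evolver becomes necessary.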

The natural evolver is global with respect to custom evolution logic defined under a specific name via the CURRENT_TYPE_EVOLVER macro.

In other words, if CURRENT_TYPE_EVOLVER(Foo) changes Location into NewLocation, and CURRENT_TYPE_EVOLVER(Bar) changes Location into an std::string, the default evolution would still be applied for all the types from the CURRENT_NAMESPACE that originally contained CURRENT_STRUCT(Location), unless Foo or Bar specifically define a specialization for the evolution of types other than Location.

Boilerplate Evolver

Natural evolvers help when the source type is convertible into the destination type by being either identical or a subset of the latter. In other cases, custom code is required.

To help with writing this custom code, the boilerplate evolver is generated alongside the natural evolver and the schema itself.

Two kinds of boilerplate evolvers are generated: one for CURRENT_STRUCT-s and one for Variant<>/CURRENT_VARIANT-s.

The boilerplate evolver for a CURRENT_STRUCT lists all the fields of this struct. Its purpose is to simplify altering the evolution of just a few fields of a large structure. The boilerplate looks like this (example from examples/TypeEvolution/golden):

CURRENT_TYPE_EVOLVER(CustomEvolver, From, FullName, {
  CURRENT_NATURAL_EVOLVE(From, CustomDestinationNamespace, from.first_name, into.first_name);
  CURRENT_NATURAL_EVOLVE(From, CustomDestinationNamespace, from.last_name, into.last_name);
});

The boilerplate evolver for Variant<>/CURRENT_VARIANT lists all variant cases. Its purpose is to simplify altering the evolution when just a few cases of a Variant<> have to be converted into other cases, or ignored altogether. The boilerplate looks like this (example from examples/TypeEvolution/golden):

CURRENT_TYPE_EVOLVER_VARIANT(CustomEvolver, From, ShrinkingVariant, CustomDestinationNamespace) {
  CURRENT_TYPE_EVOLVER_NATURAL_VARIANT_CASE(CustomTypeA, CURRENT_NATURAL_EVOLVE(From, CustomDestinationNamespace, from, into));
  CURRENT_TYPE_EVOLVER_NATURAL_VARIANT_CASE(CustomTypeB, CURRENT_NATURAL_EVOLVE(From, CustomDestinationNamespace, from, into));
  CURRENT_TYPE_EVOLVER_NATURAL_VARIANT_CASE(CustomTypeC, CURRENT_NATURAL_EVOLVE(From, CustomDestinationNamespace, from, into));
};

While the natural evolver is the code which would be compiled upon #include-ing the autogenerated exported C++ "Current" schema, the autogenerated boilerplate evolver section is deliberately commented out with #if 0/#endif. The user is responsible for copy-pasting the relevant section of the boilerplate evolver and extending it.

Footnote

As the destination type for the type evolver must belong to a CURRENT_NAMESPACE, evolving one Storage into another Storage is problematic due to inner Storage types being autogenerated.

We're weighing the option of having the CURRENT_STORAGE macro expose its own CURRENT_NAMESPACE, but the chances are slim. In the meantime, the best practical solution is to export and #include the schemas of both the old and the new Storage-s, convert the data from the old autogenerated CURRENT_NAMESPACE into the new one, and then use ParseJSON<desired_t>(JSON<original_t>(object)) to give the C++ compiler the type it needs.

For more details, please refer to the examples/TypeEvolution test. Its golden/ directory contains the exported schema, including the boilerplate type evolver, which is extended in the code.