Builder Interface Reference

Introduction

We assume a separate read-only file and add extensions to this with support from a builder library and a builder object.

The underlying builder library supports two modes of operation that mix together: create, which sends data directly to the target buffer (emitter object), and a stack driven start/end approach, which allocates objects and vectors on the stack. The code generator chooses the most efficient approach given the circumstances.

Unlike most FlatBuffer language interfaces, tables and vectors are not created back to front: They are either created completely in one operation, or they are constructed on a stack front to back until they can be emitted. The final buffer is still constructed back to front. For big-endian platforms this may require temporary stack allocation of complete vectors, whereas little endian platforms can emit directly.

Tables and vectors stored in other tables or vectors must be completed before they can be stored, but unlike most language interfaces they can be constructed while a parent is also being constructed, as long as nesting remains balanced. While this occasionally may require more stack, it may also avoid external temporary allocation.

A builder object is required to start buffer construction. The builder must be initialized first and can be reset and reused between buffers, reusing stack allocation. The builder can have a customized emitter object but here we use the default. Finalizing the buffer depends on the emitter, and we can use a default finalizer only because we use the default emitter - it allocates and populates a linear buffer from a paged emitter ring buffer.

Note that in most cases flatcc_builder_finalize_buffer is sufficient, but to be strictly portable, use flatcc_builder_finalize_aligned_buffer and aligned_free. aligned_free is often implemented as free in flatcc/portable but not on all platforms. As of flatcc version 0.5.0 flatcc_builder_aligned_free is provided to add robustness in case the application's aligned_free implementation might differ from the library version due to changes in compile time flags.

Generally we use the monster example with various extensions, but to show a simple complete example we use a very simple schema (myschema.fbs):

table mytable { myfield1: int; myfield2: int; }

#include <assert.h>
#include "myschema_builder.h"

void testfun(void) {

    void *buffer;
    size_t size;
    flatcc_builder_t builder, *B;
    mytable_table_t mt;
    B = &builder;
    flatcc_builder_init(B);

    /* Construct a buffer specific to schema. */
    mytable_create_as_root(B, 1, 2);

    /* Retrieve buffer - see also `flatcc_builder_get_direct_buffer`. */
    /* buffer = flatcc_builder_finalize_buffer(B, &size); */
    buffer = flatcc_builder_finalize_aligned_buffer(B, &size);

    /* This is read-only buffer access. */
    mt = mytable_as_root(buffer);
    assert(mytable_myfield1(mt) == 1);
    assert(mytable_myfield2(mt) == 2);

    /* free(buffer); */
    flatcc_builder_aligned_free(buffer);

    /*
     * Reset, but keep allocated stack etc.,
     * or optionally reduce memory using `flatcc_builder_custom_reset`.
     */
    flatcc_builder_reset(B);

    /* ... construct another buffer */

    /* Reclaim all memory. */
    flatcc_builder_clear(B);
}

Note that a compiled schema generates a myschema_reader.h file and optionally a myschema_builder.h file and some common support files. When building a buffer, myschema_builder.h must be used, but when only reading, the myschema_reader.h file should be used instead. Here we are only concerned with building. When building, it is necessary to link with the libflatccrt.a runtime library, but when reading, all necessary code is contained in the generated header files.

The builder object only manages a stack of currently active objects and does not store objects once they are complete. Instead it calls an emitter object with the partial data ready for emission, similar to a write function. A default emitter is provided which implements a ring buffer, and the result may be written to a file, copied to a buffer, or finalized to an allocated buffer. The builder supports these methods directly, but only for the default emitter, because emitters are otherwise defined by a single simple emit function - see emit_test.c for a simple example of a custom emitter. A custom allocator may be useful when working with small buffers in a constrained environment - the allocator handles temporary stacks, virtual table caches etc. but not the emitter.

The allocator and emitter interface is documented in the builder library header flatcc_builder.h and the default implementation in flatcc_emitter.h. The default allocator is implemented as part of the flatcc_builder source.

The builder can be reused between buffers using the reset operation. The default emitter can also be reused and will automatically reset when the buffer is. For custom emitters, any reset operation must be called manually. The same applies to clear. The reset operations maintain allocated memory but also heuristically reduce memory consumption across multiple resets.

Size Prefixed Buffers

Buffers can be created with a size prefix of type uoffset_t. When doing this, the buffer is aligned relative to the size prefix such that buffers can be stacked in a file and for example be accessed via memory mapping.

The usual create_as_root and start_as_root calls have variants called create_as_root_with_size and start_as_root_with_size.

To read a buffer with a size prefix use:

size_t size;
buffer = flatbuffers_read_size_prefix(rawbuffer, &size);

The size is the size of the buffer excluding the size prefix. When verifying buffers, the buffer and size arguments should be used. See also monster_test.c for an example.

Note that the size prefix ensures internal alignment but does not guarantee that the next buffer in a file can be appended directly, because the next buffer's alignment is unknown and because it potentially wastes padding bytes. The buffer size at offset 0 can be increased to the needed alignment as long as endianness is handled and the size of the size field is subtracted, and zeroes are appended as necessary.

Namespaces

The generated code is typically wrapped in a custom namespace, and functions and definitions that are library specific are usually mapped into the namespace. We often use an empty namespace for custom types and flatbuffers_ for library names, but a foo_ prefix could also be used in both cases, where foo is a custom namespace.

Note that the name flatcc_emitter is only used with the default emitter and the name flatcc_builder is only used for buffer management but not for constructing content. Once a valid buffer is ready, the common (flatbuffers) namespace and the schema specific (or empty) namespace are used with schema specific operations.

All schema specific content is prefixed with a namespace to avoid conflicts - although the namespace is empty if the schema doesn't specify any. Note that the same schema can have multiple namespaces. An example of a namespace prefixed operation:

MyGame_Example_Monster_create_as_root(B, ... lots of args);

To simplify this we can use a macro to prefix a namespace. The use of the name ns is arbitrary and we can choose different names for different namespaces.

#undef ns
#define ns(x) MyGame_Example_ ## x

But the above doesn't work with nested calls to ns such as

ns(Monster_color_add(B, ns(Color_Green)));

it would have to be:

ns(Monster_color_add)(B, ns(Color_Green));

Therefore we have a helper macro that does allow nesting:

#undef ns
#define ns(x) FLATBUFFERS_WRAP_NAMESPACE(MyGame_Example, x)

The common namespace can also be wrapped for a more consistent appearance:

#undef nsc
#define nsc(x) FLATBUFFERS_WRAP_NAMESPACE(flatbuffers, x)

nsc(string_ref_t) s;
s = nsc(string_create_str(B, "hello, world!"));

instead of

flatbuffers_string_ref_t s;
s = flatbuffers_string_create_str(B, "hello, world!");

Error Codes

Function return values can be grouped roughly into 4 groups: functions returning pointers, references, size_t lengths, and int status codes. Pointers and references return 0 on error. Sizes do not return errors. Status codes return 0 on success or an error code that is usually -1. Status codes may be checked with flatbuffers_failed(...).

Endianess

The function flatbuffers_is_native_pe() provides an efficient runtime check for endianness. Since FlatBuffers are little endian, the function returns true when the native endianness matches the protocol endianness, which for FlatBuffers is little endian. We do not hardcode little endian because it enables us to support other protocols in the future - for example the struct conversions may be very useful for big endian network protocols.

As of flatcc 0.4.0 it is possible to compile flatcc with native big-endian support, which has been tested on AIX. More details are in the Endianness section of the README.

By testing is_native_pe, dependencies on specific compile time flags, which are fragile, can be avoided.

During build, vectors and structs behave differently from tables. A table updates one field at a time, doing endian conversion along the way. A struct is either placed in a table, where it is converted by the table specific operation, or it is placed in a vector. A vector only does the endian conversion when the vector is finished, so when a vector is not created atomically with a single create call, the elements are placed on a stack. By default the elements are in native format, but the user may choose to place buffer encoded structs or scalars in the vector and call vec_end_pe. The same push operation can be used to place a natively encoded struct or a buffer encoded struct in the vector because it does no conversion at that point. Therefore there is no push_pe method, which would mean pushing an unconverted element onto the stack. The pe variants only make sense for tables and entire vectors. If a vector needs to hold a buffer encoded struct when it is otherwise constructed in native encoding, or vice versa, the vector may be extended with an empty element which is then assigned using any of the assign, assign_from_pe or assign_to_pe calls.

We did not mention that a struct can also be a standalone object as a buffer root, and for that it has an end_pe call that essentially works like a single element vector without a length prefix.

The clone operation is a more user-friendly pe operation which takes an object or a vector from an existing buffer and places it in a new buffer without endian conversion.

Deprecated

NOTE: FLATBUFFERS_LITTLEENDIAN is deprecated and will be removed in a future version. It just complicates endian handling.

The header files try to define FLATBUFFERS_LITTLEENDIAN to 0 or 1 based on system definitions but otherwise leave the flag undefined. Simply testing for

#if FLATBUFFERS_LITTLEENDIAN
...
#endif

will not fail if the endianness is undetected but rather give the impression that the system is big endian, which is not necessarily true. flatbuffers_is_native_pe() instead relies on the detected or system provided conversion functions, such as those from a suitable endian.h file when the header could not detect endianness on its own (e.g. le16toh(1) == 1). Therefore, it is better to use flatbuffers_is_native_pe() in most cases. It also avoids making assumptions on whether the protocol is little or big endian.

Buffers

A buffer can most simply be created with the create_as_root call for a table or a struct, as seen earlier. The as_root part is just a thin wrapper around the buffer start and end calls, and using these directly allows for more flexibility. The as_root variants also automatically use the defined file identifier, if any.

The build process begins with starting a buffer. The buffer may contain a struct or table, so one of these should be constructed subsequently. Structs are generally created inline in tables; only at the buffer level is a struct created independently. The API actually permits other formats, but the result will then not be a valid FlatBuffer.

flatcc_builder_ref_t root;
flatcc_builder_init(B);
/* 0 indicates no file identifier. */
flatcc_builder_buffer_start(B, 0);
root = /* ... construct a table or a struct */
flatcc_builder_buffer_end(B, root);

buffer_start takes a file identifier as second argument. If null or a string with null characters, the identifier is not stored in the buffer.

Regardless of whether a struct or table is declared as root in the schema or not, there are methods to automatically start both the buffer and struct or buffer and table such as Monster_start/end_as_root. This is also valid for nested buffers. If the schema has a file identifier, it is used as identifier for the created object. The alternative create_as_root_with_identifier allows for explicitly setting an id or explicitly dropping an id by providing a null argument. The corresponding reader function Monster_as_root(buffer) also has a Monster_as_root_with_identifier(buffer, id). Here the id is ignored if the id is null, and otherwise the operation returns null if the id does not match. For the most part ids are handled transparently by these defaults.

The buffer can be started with block alignment and/or a custom identifier using the flatcc_builder_buffer_start_aligned:

flatcc_builder_buffer_start_aligned(B, "myid", 16);
...
flatcc_builder_buffer_end(B, root);

The alignment can be 0, in which case the minimum required alignment is used, derived from the operations between start/end. The alignment argument is called block_align and is useful if the emitter operates on blocks such as encryption, cache line isolation, or compression blocks where the final buffer should align with the blocks used during construction. This can lead to significant zero padding just after the block header, depending on block size.

The schema specified identifier is given as:

flatbuffers_identifier

and defaults to null. The schema specified extension is given as:

flatbuffers_extension

and defaults to null. Note that flatbuffers_ is replaced by whatever namespace is chosen. Each specific schema type also has a named file identifier reflecting the identifier active when the type was defined, for example:

MyGame_Example_Monster_file_identifier

This define is used when create_as_root automatically sets a file identifier.

NOTE: before flatcc 0.6.1, the identifier was named

MyGame_Example_Monster_identifier (DEPRECATED)

but that would conflict with a table field named identifier which happened often enough to be a problem. This naming is now removed on conflict and will be completely removed in a future version.

When the buffer is ended, nothing special happens, but only at this point does it really make sense to access the resulting buffer. The default emitter provides a copy method and a direct buffer access method. These are made available in the builder interface and will return null for other emitters. See also flatcc_builder.h and the default emitter in flatcc_emitter.h.

Tables

Adding Fields

If Monster is a table, we can create a Monster buffer (after builder init) as follows:

Monster_start(B);
Monster_hp_add(B, 80);
...
flatcc_builder_buffer_create(B, Monster_end(B));

All scalars and enums are added similarly to the Monster_hp_add call. We will subsequently see how to deal with other types.

A table can also be created in a single operation using create:

Monster_ref_t m;
m = Monster_create(B, 80, ...);

The create arguments are those taken by the individual field add operations, which are either a scalar, an enum, or a reference returned by another create or end call. Note that unlike the C++ interface, unions only take a single argument that is also accepted by the add operation of a union field. Deprecated fields are not included in the argument list.

As of v0.5.3 the arguments are given in field id order, which is usually the same as the schema listed order, except when id attributes are given explicitly. Using id order ensures version stability. Note that since deprecated fields are omitted, deprecating a field can still break existing code.

BREAKING: Prior to flatcc v0.5.3 the create call would use the schema order also when fields have id attributes specifying a different order. This could break code across versions and did not match the C++ behavior. It was also documented that the original_order attribute affected create argument order, but that was incorrect.

NOTE: If the original_order attribute is set on a table, the create implementation adds fields to the table in schema listed order, otherwise it adds fields in order of decreasing size to reduce alignment overhead. Generally there should be no need to use the original_order attribute. This doesn't affect the call argument order, although that was incorrectly documented prior to v0.5.3.

NOTE: the create and create_as_root operations are not guaranteed to be available when the number of fields is sufficiently large because it might break some compilers. Currently there are no such restrictions.

Scalars and enums do not store the value if it matches the default value, which is 0 unless otherwise defined in the schema. To override this behavior, use force_add. In the monster example, health points default to 100 (percent), so if we wish to force store it in the buffer we could use:

Monster_hp_force_add(B, 100);

Only scalar fields and enums have a force_add operation since only these types have a default value, and other types have a meaningful interpretation of null. (It is not quite clear if empty tables separate from null/absent are valid in all implementations).

force_add may be useful when roundtripping data from a database where it is relevant to distinguish between any valid value and null. Most readers will not be able to tell the difference, but it is possible to inspect a flatbuffer to see if a table field is present, present and default, or absent, meaning null.

NOTE: As of mid 2020, FlatBuffers added optional scalar table fields with support in flatcc 0.6.1. These fields automatically imply force_add to represent null values when a field is absent and therefore these fields do not have a force_add method and these fields also do not have a default value other than null, i.e. null if not added.

If Monster is declared as root, the above may also be called as:

Monster_start_as_root(B);
Monster_hp_add(B, 80);
...
Monster_end_as_root(B);

(Calling Monster_end instead would require buffer_end call subsequently, and is basically a violation of nesting).

Nested Tables

Tables can be nested, for example the mini field may have type Monster table again (a recursive type):

flatcc_builder_buffer_start(B, 0);
Monster_start(B);
Monster_hp_add(B, 80);
Monster_start(B);
Monster_hp_add(B, 81);
...
Monster_mini_add(B, Monster_end(B));
...
flatcc_builder_buffer_end(B, Monster_end(B));

The child Monster table may be created before the parent or, as above, between the parent table's start and end. If created before, the reference must be stored until it can be added. The only requirement is that start and end are balanced, that the sub-table is ended before the parent, and that both are created in the same buffer (nested buffers can be created while the parent buffer is still being created, similar to sub-tables, so it is possible to mess this up):

Monster_ref_t root, mini;

flatcc_builder_buffer_start(B, 0);
Monster_start(B);
Monster_hp_add(B, 81);
mini = Monster_end(B);

Monster_start(B);
Monster_hp_add(B, 80);
Monster_mini_add(B, mini);
root = Monster_end(B);

flatcc_builder_buffer_end(B, root);

Rather than adding a child table explicitly, it can be started and ended as an operation on the field name, here with Monster_mini_start/end:

Monster_ref_t root;

Monster_start(B);
Monster_hp_add(B, 80);
Monster_mini_start(B);
Monster_hp_add(B, 81);
Monster_mini_end(B);
root = Monster_end(B);

flatcc_builder_buffer_end(B, root);

We can repeat the table nesting as deep as we like, provided our builder is willing to allocate enough stack space.

Warning: It is possible to use the wrong table type operations between start/end - don't do that. It is a tradeoff between usability and type safety.

Note that vectors, strings and structs map several standard operations to a field name, for example mytable_myfield_push(B, x). This is not the case with table fields which only map start/end/create in part because it would never terminate for recursive types and in part because each table is different making a generic mapping rather complex and with very long names.

A table may be created with a constructor, but it requires all non-scalar objects to be references or pointers. Struct fields must be pointers to zero padded structs, and strings, vectors and tables must be references. The constructors are probably most useful for simple tables with mostly scalar values (here we use the original Monster fields and leave out any we have invented for the sake of illustration):

IMPORTANT: objects can generally only be created within a buffer context, i.e. after buffer_start. For example calling flatbuffers_uint8_vec_create before Monster_create_as_root technically violates this rule because the create call also starts the buffer. It is, however, allowed at the top level. For nested buffers (see later) this must be avoided because the vector would end up in the wrong buffer.

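Monster_ref_t m;
uint8_t invdata[4] = { 1, 2, 3, 4 };
Vec3_t vec;
flatbuffers_string_ref_t name;
flatbuffers_uint8_vec_ref_t inventory;

/* Struct arguments are passed as pointers to zero padded structs. */
Vec3_clear(&vec);
Vec3_assign(&vec, 1, 2, 3);
name = flatbuffers_string_create_str(B, "MyMonster");
inventory = flatbuffers_uint8_vec_create(B, invdata, 4);
m = Monster_create(B, &vec, 150, 80, name, inventory,
    Color_Red, Any_as_NONE());
flatcc_builder_buffer_create(B, m);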

or

Monster_create_as_root(B, &vec, 150, 80, name, inventory,
    Color_Red, Any_as_NONE());

Packing tables

By reordering the fields, the table may be packed better, or be better able to reuse an existing vtable. The create call already does this unless the attribute original_order has been set. Unions present a special problem since they are two fields treated as one, and the type field will generally waste padding space if stored in order.

To help pack unions better, these can be added with the type separate from the value reference using add_type(B, test.type) and add_value(B, test), where the value is only added if the type is not NONE. The add_type call should be made last since it is the smallest type.

The same field should not be added more than once. Internal reservations that track offset fields may overflow otherwise. An assertion will fail in debug builds.

Required table fields will be asserted in debug builds as part of the end/create call. Only offset fields can have a required attribute.

The generated monster_test_builder.h from monster_test.fbs shows how the default packing takes place in generated create calls; see for example the typealias test. Note that, for example, vectors are stored together with integers like uint32 because references to vectors have the same size as uint32.

Strings

Strings can be added to tables with zero terminated strings as source:

Monster_start(B);
...
Monster_name_create_str(B, "Mega Monster");
Monster_end(B);

or strings potentially containing zeroes:

#define MONSTER "Mega\0Monster"
Monster_start(B);
...
/* Includes embedded zero. */
Monster_name_create(B, MONSTER, sizeof(MONSTER));
Monster_end(B);

or zero terminated source up to at most max_len characters.

#define MONSTER "Mega\0Monster"
Monster_start(B);
...
/* "Mega" */
Monster_name_create_strn(B, MONSTER, 12);
Monster_end(B);

The create_str and create_strn versions find the string length via strlen and strnlen respectively. append_string also has _str/_strn versions.

A string can also be created from an existing flatbuffer string in which case the length is expected to be stored 4 bytes before the pointer in little endian format, and aligned properly:

Monster_name_clone(B, mybufferstring);

or, create a string at most 4 characters long starting at 0-based index 10, if present:

Monster_name_slice(B, mybufferstring, 10, 4);

If index or index + len goes beyond the source, the result is truncated accordingly, possibly resulting in an empty string.

A string can also be created independently. The above are just shortcuts for that:

flatbuffers_string_ref_t monster_name;
monster_name = flatbuffers_string_create_str(B, "Mega Monster");
Monster_name_add(B, monster_name);

Strings are generally expected to be UTF-8, but any binary data will be stored. Zero termination or embedded control codes are included as is. The string always gets a final zero termination, not counted in the string length (in compliance with the FlatBuffers format).

A string can also be constructed from a more elaborate sequence of operations. A string can be extended, appended to, or truncated and reappended to, but it cannot be edited after other calls including calls to update the same string. This may be useful if stripping escape codes or parsed delimiters, etc., but here we just create the same "Mega Monster" string in a more convoluted way:

flatbuffers_string_ref_t name;
char *s;
#define N 20
Monster_start(B);
...
flatbuffers_string_start(B);
flatbuffers_string_append(B, "Mega", 4);
flatbuffers_string_append(B, " ", 1);
s = flatbuffers_string_extend(B, N);
strncpy(s, "Monster", N);
flatbuffers_string_truncate(B, N - strlen(s));
name = flatbuffers_string_end(B);
Monster_name_add(B, name);
...
Monster_end(B);

flatbuffers_string_create... calls are also available when creating the string separate from adding it to a table, for example:

flatbuffers_string_ref_t name;
name = flatbuffers_string_create_str(B, "Mini Monster");

It is guaranteed that any returned string buffer is zero filled and has an extra zero after the requested length such that strlen can be called on the content, but only the requested bytes may be updated.

Every call only returns the substring being added to the string in that operation. It is also possible to call flatbuffers_string_edit to get a modifiable pointer to the start of the string.

flatbuffers_string_reserved_len(B) returns the current string length including any embedded zeroes, but excluding final zero termination. It is only valid until string_end is called.

See flatcc_builder.h for detailed documentation. Essentially extend reserves zeroed space on the stack and returns a buffer to the new space, and truncate reduces the overall size again, and the string is then given the final length and a zero termination at the end.

There is no endian conversion (except internally for the string length), because UTF-8 strings are not sensitive to endianness.

Like tables, the string may be created while a parent container is being constructed, or before.

Strings can also be used as vector elements, but we will get to that when discussing vectors.

Structs

Structs in tables can be added as:

Monster_pos_create(B, 1, 2, 3);

The above essentially does the following:

Vec3_t *v;
v = Monster_pos_start(B);
Vec3_assign(v, 1, 2, 3);
Monster_pos_end(B);

Some versions of the monster schema have extra test fields - these would break the assign approach above because there would be extra arguments. Instead we can rely on the zero initialization and assign known fields.

Vec3_t *v;
v = Monster_pos_start(B);
v->x = 1, v->y = 2, v->z = -3.2;
Monster_pos_end(B);

Monster_pos_end_pe(B) can be used when the struct is known to be little endian (pe for protocol endian, meaning no conversion is necessary), for example copied from an existing buffer, but then clone is a better choice:

Monster_pos_clone(B, &v);

When the struct is created alone for use as root:

Vec3_ref_t root;
root = Vec3_create(B, 1, 2, 3);
flatcc_builder_buffer_create(B, root);

An existing struct can be added as:

Vec3_t v;
Vec3_assign(&v, 1, 2, 3);
/* v does not have to be zero padded. */
Monster_pos_add(B, &v);

When adding a struct that is already little endian, presumably from an existing buffer, it can be cloned using:

Monster_pos_clone(B, &v);

Clone assumes the source struct is both little endian and that its padding is already zeroed. It is equivalent to the following, where end_pe does no conversion (the example ignores error handling):

*Monster_pos_start(B) = v;
Monster_pos_end_pe(B);

There are several assignment types that convert between host (native) endianness and buffer endianness. We use pe to indicate protocol_endian rather than just le for little endian because it allows us to change endianness to big endian in the future and it more clearly states the intention. While big endian is not allowed in FlatBuffers, big endian structs may be useful in other network protocols - but this is not currently supported because it would force little endian platforms to support byte-swapping. The operations are:

assign_from_pe, assign_to_pe, copy, copy_from_pe, copy_to_pe, to_pe and from_pe.

All the copy operations take a const pointer as source, and to/from_pe is just copy with the same source and destination:

Vec3_t v, v2;
Vec3_assign_to_pe(&v2, 1, 2, 3);
Vec3_copy_from_pe(Vec3_clear(&v), &v2);
Vec3_to_pe(&v);

from_pe means from little endian to native endian, and to_pe is the opposite. On little endian platforms all copy operations behave the same and only move fields, not padding. to/from_pe conversion will leave deprecated fields either as they were, or zero them, because the operation may be skipped entirely on protocol endian native platforms.

While struct fields cannot be deprecated officially, they are supported if the schema compiler is flagged to accept them. The struct fields are renamed and assigned 0 when using assign or copy, and assign / create has no argument for them.

Because padding can carry noise and unintended information, structs should be cleared before assignment - but when a struct is used as a copy source, its padding is not copied, so only the destination needs to be zeroed.

If a struct is nested, the assign operation includes all fields as if the struct was flattened:

typedef struct Plane Plane_t;
struct Plane {
    Vec3_t direction;
    Vec3_t normal;
};
Plane_t plane;
Plane_clear(&plane);
Plane_assign(&plane, 1, 2, 3, 7, 8, 9);

Structs can also be created standalone, similar to tables and vectors, but FlatBuffers only supports this when the struct is used as root.

Assuming Vec3 is declared as root, a buffer only holding a Vec3 struct can be created using:

Vec3_create_as_root(B, 1, 2, 3);

Important: do not store the above as a nested buffer - it would be missing the vector size field. If Monster_playground is a ubyte vector with nested_flatbuffer attribute, then Monster_playground_start/end_as_root may be used.

Structs also support start/end_as_root. In this case start returns the struct pointer, and end_pe_as_root is supported:

Vec3_t *v;
v = Vec3_start_as_root(B);
v->x = 1, v->y = 2, v->z = 3;
Vec3_end_as_root(B);

(Be careful with the different result codes, since a table's start_as_root returns an integer result code where 0 is success, while a struct's start_as_root returns a pointer that is null on failure.)

The following also creates a buffer at top-level, but it may also be added as a nested buffer because the stack frame detects the nesting:

Vec3_t *v;
flatcc_builder_buffer_start(B);
v = Vec3_start(B);
v->x = 1, v->y = 2, v->z = 3;
flatcc_builder_buffer_end(B, Vec3_end(B));

or

flatcc_builder_buffer_start(B);
...
Monster_start(B);
flatcc_builder_buffer_start(B);
v = Vec3_start(B);
v->x = 1, v->y = 2, v->z = 3;
Monster_playground_add(B, flatcc_builder_buffer_end(B, Vec3_end(B)));
flatcc_builder_buffer_end(B, Monster_end(B));

or

flatcc_builder_buffer_ref_t nested_root;
flatcc_builder_buffer_start(B);
nested_root = Vec3_create_as_root(B, 1, 2, 3);
Monster_start(B);
Monster_playground_add(B, nested_root);
flatcc_builder_buffer_end(B, Monster_end(B));

A buffer_ref_t can be used as uint8_vec_ref_t when the buffer is nested, and otherwise the reference cannot be used for anything other than testing for failure. The buffer content should match the type declared in a nested_flatbuffer attribute but it isn't enforced, and a root can be stored in any field of [ubyte] type.

When Monster_playground is declared as nested:

...
Monster_start(B);
Monster_playground_create_as_root(B, 1, 2, 3);
flatcc_builder_buffer_end(B, Monster_end(B));
...

Be aware that Vec3_t is for native updates, while Vec3_struct_t is a const pointer to an endian encoded struct used in the reader interface and also as the source type in the clone operation.

Fixed Length Arrays in Structs

As of flatcc 0.6.0 it is possible to have fixed length arrays as struct members. A fixed length array is equivalent to having a struct field repeated one or more times. The schema syntax is name : [type:count]; similar to an ordinary struct field name : type;. The type can be any valid struct field type, including enums and nested structs. The size cannot be 0. The overall size is limited by the maximum size of the struct that contains the array, which is typically 65535 (2^16-1).

For example, given the schema:

struct MyStruct {
  counters:[int:3];
  // char is only valid as a fixed length array type
  name:[char:6];
}
table MyTable {
  mystruct:MyStruct;
}

The table can be created with:

ns(MyStruct_t) *x;
ns(MyTable_start_as_root(B));
x = ns(MyTable_mystruct_start(B));
x->counters[0] = 1;
x->counters[1] = 2;
x->counters[2] = 3;
strncpy(x->name, "Kermit", sizeof(x->name));
ns(MyTable_mystruct_end(B));
ns(MyTable_end_as_root(B));

Note that char arrays are not necessarily zero terminated, but they are zero padded. This makes strncpy exactly the right operation for assigning to char arrays, at least when the content does not contain embedded nulls (which are valid). Char arrays are expected to be ASCII or UTF-8, but an application may use other encodings if this is clear to all users.
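The following is a small standalone sketch in plain C, with no flatcc dependency. It shows why strncpy matches the zero padded encoding: strncpy fills unused bytes with zeros and writes no terminator when the source fills the array. The type demo_struct_t and both helpers are hypothetical. They only mimic the name:[char:6] member from the schema above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the generated MyStruct: only the char array. */
typedef struct {
    char name[6];
} demo_struct_t;

/* strncpy zero pads short strings and truncates long ones without adding a
 * terminating zero, which matches a zero padded fixed length char array. */
void demo_set_name(demo_struct_t *s, const char *src)
{
    strncpy(s->name, src, sizeof(s->name));
}

/* Length of the stored name: stop at the first zero or at the array size,
 * because a completely filled array has no terminator. */
size_t demo_name_len(const demo_struct_t *s)
{
    size_t n = 0;
    while (n < sizeof(s->name) && s->name[n] != '\0') {
        ++n;
    }
    return n;
}
```

For example, "Kermit" fills all six bytes with no terminator. "Ann" leaves three trailing zero bytes.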

With assignment:

int data[3] = { 1, 2, 3 };
ns(MyStruct_t) *x;
ns(MyTable_start_as_root(B));
x = ns(MyTable_mystruct_start(B));
// Careful: the name argument does not use strncpy internally
// so the source must be at least the expected length
// like other array arguments. Strings can have embedded nulls.
ns(MyStruct_assign(x, data, "Kermit"));
ns(MyTable_mystruct_end(B));
ns(MyTable_end_as_root(B));

To read a struct, the pointer to the struct is retrieved first:

int sum;
size_t i;
const char *name;
size_t name_len;
ns(MyTable_table_t) t;
ns(MyStruct_struct_t) x;

t = ns(MyTable_as_root(buf));
x = ns(MyTable_mystruct_get(t));
for (sum = 0, i = 0; i < ns(MyStruct_counters_get_len()); ++i) {
  sum += ns(MyStruct_counters_get(x, i));
}
// char arrays are endian neutral, so we can use pointer access.
name = ns(MyStruct_name_get_ptr(x));
name_len = strnlen(name, ns(MyStruct_name_get_len()));
printf("Added counters from %.*s\n", (int)name_len, name);
// char arrays can also be accessed like other arrays:
// ns(MyStruct_name_get(x, i));

An alternative to strnlen is to strip trailing zeroes. This allows char arrays to contain embedded zeroes, but there is no direct support for it. The JSON printer uses this approach to shorten the printed char array string.
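Since there is no direct support for stripping trailing zeroes, here is a minimal sketch of how an application might compute the length that way. The function name is hypothetical. This is not the JSON printer's actual code.

```c
#include <stddef.h>

/* Length of a zero padded char array after removing trailing zero bytes.
 * Unlike a strnlen style scan, zeros that appear before the last non-zero
 * byte are kept as part of the content. */
size_t padded_len_strip_trailing(const char *arr, size_t size)
{
    while (size > 0 && arr[size - 1] == '\0') {
        --size;
    }
    return size;
}
```

With this rule, { 'a', 0, 'b', 0, 0, 0 } has length 3, while strnlen would report 1.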

The _get suffix can be omitted in the above if flatcc was not invoked with -g (an option that reduces the risk of name conflicts), but it cannot be omitted for _get_len and _get_ptr.

Note that it is not possible to have fixed length arrays as part of a table but it is possible to wrap such data in a struct, and it is also possible to have vectors of structs that contain fixed length arrays.

Nested Buffers

Nested buffers are discussed under the Structs and Tables sections, but note that they can also be added as pe ubyte vectors, which was probably the original intention with nested buffers. When doing so, however, it can be difficult to ensure that the buffer is correctly aligned. The untyped flatcc_builder has various options to deal with this. With generated code it is better to create a nested buffer inline when suitable, with nested buffer_start/end or mytable_myfield_create_as_root. An example is a message wrapper with a union of tables holding a buffer for a specific message type. In other cases the buffer may truly be created independently of the current buffer. It can then be added with controlled alignment using either the flatcc_builder api for full control, or the nest operation on nested table and struct fields:

To create and add a ubyte vector with a higher alignment than the single byte alignment of ubyte, the following operation is available on a nested buffer field:

Monster_playground_nest(B, void *data, size_t size, uint16_t align);

If alignment is unknown, it can be set to 0, and it will default to 8 for nested table types, and to the struct alignment for struct buffers.
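The default alignment rule just described can be written as a small function. The sketch below uses hypothetical helper names that are not part of the flatcc API. An alignment of 0 falls back to 8 for nested tables, or to the struct's own alignment for struct buffers. An offset is then rounded up to that power-of-two alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Effective alignment for a nested buffer: 0 means "unknown" and falls back
 * to 8 for table roots or to the struct alignment for struct roots. */
uint16_t nest_effective_align(uint16_t align, int is_struct_root,
                              uint16_t struct_align)
{
    if (align != 0) {
        return align;
    }
    return is_struct_root ? struct_align : 8;
}

/* Round offset up to a multiple of align (align must be a power of two). */
size_t align_up(size_t offset, uint16_t align)
{
    return (offset + (size_t)align - 1) & ~((size_t)align - 1);
}
```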

Block alignment is inherited from the parent buffer so the child buffer ends up in its own set of blocks, if block alignment is being used. If the nested buffer needs a different block alignment, the flatcc_builder api must be used.

All structs and tables have start/end/create_as_root operations, even if they are not referenced by any nested_flatbuffer field, and these create [ubyte] vectors containing a nested buffer. Only [ubyte] fields with the nested_flatbuffer attribute have dedicated start/end/create_as_root operations on the field name. Structs also have end_pe_as_root.

Scalars and Enums

Scalars keep their original type names uint8_t, double, etc, but they get some operations similar to structs. These are contained in a namespace which by default is flatbuffers_, for example:

uint16_t *flatbuffers_uint16_to_pe(uint16_t *p);
uint16_t *flatbuffers_uint16_from_pe(uint16_t *p);
flatbuffers_bool_t *flatbuffers_bool_to_pe(flatbuffers_bool_t *p);
flatbuffers_bool_t *flatbuffers_bool_from_pe(flatbuffers_bool_t *p);

These may be used freely, but are primarily present as an interface to the vector operations also defined for structs.
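The FlatBuffers wire format is little endian. The following portable sketch shows what a to_pe / from_pe pair does for a uint16_t, written without flatcc. The names demo_uint16_to_pe and demo_uint16_from_pe are hypothetical. The generated functions may be implemented differently, for example as no-ops on little endian hosts.

```c
#include <stdint.h>
#include <string.h>

/* Convert *p in place from native byte order to little endian and return p,
 * following the pointer-in, pointer-out shape shown above. Writing the bytes
 * explicitly gives the same result on little and big endian hosts. */
uint16_t *demo_uint16_to_pe(uint16_t *p)
{
    uint16_t v = *p;
    unsigned char b[2];
    b[0] = (unsigned char)(v & 0xff);
    b[1] = (unsigned char)(v >> 8);
    memcpy(p, b, sizeof(b));
    return p;
}

/* Inverse: read two little endian bytes and store the native value. */
uint16_t *demo_uint16_from_pe(uint16_t *p)
{
    unsigned char b[2];
    memcpy(b, p, sizeof(b));
    *p = (uint16_t)(b[0] | (b[1] << 8));
    return p;
}
```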

Enums have similar definitions which may be used to convert endianness without being concerned with the underlying integer type, for example:

Color_enum_t *Color_to_pe(Color_enum_t *p);

Vectors

Vectors can be created independently, or directly when updating a table - the end result is the same. Builder vector operations always reference element values by pointer, or by reference for offset types like tables and strings.

uint8_t v;
Monster_inventory_start(B);
v = 1;
flatbuffers_uint8_vec_push(B, &v);
v = 2;
flatbuffers_uint8_vec_push(B, &v);
v = 3;
flatbuffers_uint8_vec_push(B, &v);
Monster_inventory_end(B);

or

flatbuffers_uint8_vec_ref_t inv;
uint8_t v;
flatbuffers_uint8_vec_start(B);
v = 1;
flatbuffers_uint8_vec_push(B, &v);
v = 2;
flatbuffers_uint8_vec_push(B, &v);
v = 3;
flatbuffers_uint8_vec_push(B, &v);
inv = flatbuffers_uint8_vec_end(B);
Monster_inventory_add(B, inv);

Because it can be tedious and error-prone to recall the exact field type, and because the operations are not type safe (any kind of push would be accepted), some vector operations are also mapped to the field name:

uint8_t v;
Monster_inventory_start(B);
v = 1;
Monster_inventory_push(B, &v);
v = 2;
Monster_inventory_push(B, &v);
v = 3;
Monster_inventory_push(B, &v);
Monster_inventory_end(B);

Note: vector operations on a type use the _vec_<operation> syntax, for example flatbuffers_uint8_vec_push or Monster_vec_push. Operations that are mapped onto table field names of vector type do not use the _vec infix, because a field name is not a type name, for example Monster_inventory_push.

A slightly faster operation preallocates the vector:

uint8_t *v;
Monster_inventory_start(B);
v = Monster_inventory_extend(B, 3);
v[0] = 1, v[1] = 2, v[2] = 3;
v = Monster_inventory_extend(B, 2);
v[0] = 4, v[1] = 5;
Monster_inventory_end(B);

Push just extends the vector by one element at a time. Note that extend returns a pointer to the extended vector segment. The full vector can be accessed with edit and reserved_len between start and end (recalling that pointers cannot be reused across builder calls):

uint8_t *v;
size_t i;
uint8_t data[] = { 1, 2 };
Monster_inventory_start(B);
Monster_inventory_push(B, &data[0]);
Monster_inventory_push(B, &data[1]);
v = Monster_inventory_edit(B);
for (i = 1; i < Monster_inventory_reserved_len(B); ++i) {
    v[i] = v[i - 1] + v[i];
}
Monster_inventory_end(B);

Note that the name reserved_len is used to avoid confusion with the _vec_len read operation. It also indicates that this is not the final size, since it may change with truncate/extend.
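The rule that pointers cannot be reused across builder calls can be illustrated with a hypothetical growable vector. This is a simplified stand-in for the builder's internal stack, not the flatcc implementation. Here extend may move the whole buffer with realloc, so a pointer obtained earlier from extend or edit may become stale after a later extend.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable byte vector standing in for the builder stack. */
typedef struct {
    uint8_t *data;
    size_t len;
    size_t cap;
} demo_vec_t;

/* Grow by n elements and return a pointer to the new segment, or NULL on
 * allocation failure. Growth may relocate data, invalidating old pointers. */
uint8_t *demo_vec_extend(demo_vec_t *v, size_t n)
{
    if (v->len + n > v->cap) {
        size_t cap = v->cap ? v->cap : 4;
        uint8_t *p;
        while (cap < v->len + n) {
            cap *= 2;
        }
        p = realloc(v->data, cap);
        if (p == NULL) {
            return NULL;
        }
        v->data = p;
        v->cap = cap;
    }
    v->len += n;
    return v->data + v->len - n;
}

/* Pointer to the first element: valid only until the next extend. */
uint8_t *demo_vec_edit(demo_vec_t *v) { return v->data; }

/* Current number of elements; may still change with further extends. */
size_t demo_vec_reserved_len(const demo_vec_t *v) { return v->len; }
```

This mirrors the extend example above: fill one segment, extend again, and only then fetch the full vector with edit.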

A vector can also contain structs. Let us extend the Monster example with a vector of positions, so we can have a breadcrumb trail:

Monster_breadcrumbs_start(B);
Vec3_vec_push_create(B, 1, 2, 3);
Vec3_vec_push_create(B, 3, 4, 5);
Monster_breadcrumbs_end(B);

or

Monster_breadcrumbs_start(B);
Monster_breadcrumbs_push_create(B, 1, 2, 3);
Monster_breadcrumbs_push_create(B, 3, 4, 5);
Monster_breadcrumbs_end(B);

or

Vec3_t *trails;
Monster_breadcrumbs_start(B);
trails = Monster_breadcrumbs_extend(B, 2);
Vec3_assign(&trails[0], 1, 2, 3);
Vec3_assign(&trails[1], 4, 5, 6);
Monster_breadcrumbs_end(B);

The vec_start/extend/end/end_pe/create/create_pe/clone/slice operations are translated into similar calls prefixed with the field name instead of vec. Except for start, these calls also add the vector to the table if successful, for example:

uint8_t data[] = { 1, 2, 3 };
Monster_inventory_create(B, data, 3);
Monster_breadcrumbs_slice(B, some_other_breadcrumbs, 0, 10);

Vector operations that are allowed between vec_start and vec_end(_pe) are also mapped. These are vec_extend/append/truncate/edit/reserved_len, and push/push_create/push_copy. push_copy ensures only valid fields are copied, not zero padding (or the unofficial deprecated fields).

A struct push_clone is the same as a push_copy operation because structs are stored inline in vectors - with the exception of union vectors which have push_clone that does the right thing.

The add call adds a vector created independently from the table field, and this is what is going on under the surface in the other calls:

Vec3_t x;
Vec3_vec_ref_t inv;

/* Clear any padding in `x` because it is not allocated by builder. */
Vec3_assign(Vec3_clear(&x), 3, 4, 5);
Vec3_vec_start(B);
Vec3_vec_push_create(B, 1, 2, 3);
Vec3_vec_push(B, &x);
inv = Vec3_vec_end(B);

Monster_breadcrumbs_add(B, inv);

As always, a reference such as inv may only be used at most once, and should be used once to avoid garbage.

Note that Vec3_start would create an independent struct instead of a vector of structs. Also note that vec_ref_t is a builder specific temporary type, while vec_t is intended as a const pointer to the first element in an existing buffer in little endian encoding with a size prefix (to be used with clone, for example).

An existing Vec3 struct can also be pushed with Vec3_vec_push(B, &v). The argument must be zero padded. Because vectors are converted at the end, there is no push_pe, but a struct may be pushed in little endian format on all platforms if vec_end_pe is used at the end.

A vector may also be created from an existing array:

uint8_t data[] = { 1, 2, 3 };
Monster_inventory_add(B, flatbuffers_uint8_vec_create(B, data, 3));

This also applies to arrays of structs as long as they are properly zero padded. create_pe is similar but does not do any endian conversion. It is also similar to clone, except that the source has no size prefix.

Likewise an existing vector with proper zero padding may be appended using the extend operation. The format must be native or little endian depending on whether vec_end or vec_end_pe is called at the end.

All vectors are converted to little endian when the end command is called. end_pe prevents this from happening.

clone and slice can be used to copy an entire or a partial array from an existing buffer. The pointer must point to the first vector element in little endian format, and the vector must have a size prefix and be aligned (like any flatbuffer vector). slice takes a base-0 index and a vector length, and the result is truncated if the source is not large enough.
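The truncation rule for slice can be stated as a small function. In this sketch the helper name is hypothetical. Returning an empty slice for an index past the end is an assumption made here, not documented flatcc behavior.

```c
#include <stddef.h>

/* Number of elements a slice of `len` elements starting at base-0 `index`
 * copies from a source vector of `src_len` elements. The count is truncated
 * at the end of the source; an index past the end yields 0 (assumption). */
size_t slice_count(size_t src_len, size_t index, size_t len)
{
    if (index >= src_len) {
        return 0;
    }
    if (len > src_len - index) {
        len = src_len - index;
    }
    return len;
}
```

For example, slicing (2, 4) from a 5 element source copies only 3 elements.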

Monster_inventory_clone(B, v);

or

Monster_inventory_add(B, flatbuffers_uint8_vec_clone(B, v));

or

Monster_inventory_add(B, flatbuffers_uint8_vec_slice(B, v, 2, 4));

or

Monster_inventory_slice(B, v, 2, 4);

A vector of strings can be constructed as follows (friends is a string vector field that we just invented for the occasion):

flatbuffers_string_ref_t friend, *p;
Monster_friends_start(B);
  friend = flatbuffers_string_create_str(B, "Peter Pan");
  Monster_friends_push_create_str(B, "Shrek");
  Monster_friends_push_create_str(B, "Pinocchio");
  Monster_friends_push_create_str(B, "Pinocchio");
  Monster_friends_push_create(B, "Hector", 6);
  Monster_friends_push(B, friend);
  /* Create the string before extend so p is not held across another call. */
  friend = flatbuffers_string_create_str(B, "Cinderella");
  p = Monster_friends_extend(B, 1);
  *p = friend;
  Monster_friends_push_start(B);
    flatbuffers_string_append(B, "The Little ", 11);
    flatbuffers_string_append(B, "Mermaid", 7);
  Monster_friends_push_end(B);
Monster_friends_end(B);

Vectors and strings have a second argument to start, see also the spawn example below.

Finally, vectors can contain tables. Table vectors are offset vectors just like string vectors. push_start pushes a new table and allows for updates until push_end. If we have a spawn vector of monsters in the Monster table, we can populate it like this:

Monster_spawn_start(B);
  Monster_vec_push_start(B);
    Monster_Hp_add(B, 27);
  Monster_vec_push_end(B);
  Monster_vec_push_create(B,
    /* Approximate argument list for illustration only. */
    &vec, 150, 80, name, inventory, Color_Red, Any_as_NONE());
Monster_spawn_end(B);

The push operation has constructors push_start/end/create for table, struct, and string elements. String elements also have push_create_str/create_strn/clone/slice. Structs also have push_copy. Between push_start and push_end, the operations valid for the given table or string element can be used (typically add for tables and append for strings).

Instead of Monster_vec_push_start we can also use Monster_spawn_push_start etc. In this case the child type is the same as the parent, but using the field specific push_start ensures we get the right table element type.

Monster_spawn_push_start(B) takes no length argument because it starts a table element, whereas Monster_friends_push_start(B) starts a string element, which is built like a vector.

Monster_spawn_start(B) should normally just be followed by push operations rather than by Monster_spawn_extend(B, n), because the latter risks dangling references that can lead to crashes. Handled carefully, though, it is possible:

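Monster_ref_t m0, m1, *mvec;
Monster_spawn_start(B);
/* Create the tables first: the pointer returned by extend must not be held across other builder calls. */
m0 = Monster_create(B, ...);
m1 = Monster_create(B, ...);
mvec = Monster_spawn_extend(B, 2);
mvec[0] = m0;
mvec[1] = m1;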
Monster_spawn_end(B);

We can also push a reference to an independently created monster table, as seen before with strings.

As of flatcc version 0.5.2 it is also possible to clone tables. Therefore we also have push_clone on vectors of tables.

While extend and truncate can be used with vectors of strings and tables, they should be used with care, because the elements are references and will just end up as garbage if truncated. On the other hand, unused elements should be truncated, because zero (null) elements are not valid in an offset vector.

A vector of tables or strings can be created using an externally built array of references, for example:

Monster_ref_t monsters[20];
Monster_vec_ref_t mvec;
monsters[0] = Monster_create(B, ...);
...
mvec = Monster_vec_create(B, monsters, 20);

By convention, create calls bypass the internal stack when the endian format is otherwise compatible, and thus feed the emitter directly. This is not possible with table and string vectors because the references in the source vectors must be translated into offsets. Therefore these create calls are similar to start, append, end calls. There is an internal, but unexposed flatcc_builder version create_offset_vector_direct which destroys the source vector instead of allocating a stack copy.
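The following explains why references cannot be copied verbatim into a table or string vector. In the FlatBuffers format, each 32-bit element of an offset vector stores the distance from that element's own position to its target. That distance must be positive because uoffset_t is unsigned. The sketch below computes those values from absolute buffer positions. The function and its position-based inputs are illustrative assumptions, not how flatcc represents references internally.

```c
#include <stddef.h>
#include <stdint.h>

/* elems_pos: buffer position of element 0 (just after the length prefix).
 * targets:   buffer positions of the referenced tables or strings.
 * out:       the uoffset_t value to store in each element.
 * Returns 0 on success, or -1 if a target does not follow its slot. */
int refs_to_offsets(uint32_t elems_pos, const uint32_t *targets,
                    uint32_t *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        uint32_t slot = elems_pos + (uint32_t)(4 * i);
        if (targets[i] <= slot) {
            return -1; /* uoffset_t is unsigned: target must come after */
        }
        out[i] = targets[i] - slot;
    }
    return 0;
}
```

The same target shared by two elements gets a different stored offset in each slot, so every element needs its own translation.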

Unions

Unlike the C++ Flatbuffers library, we do not expose a separate union type field except via a small struct with a union of typed references and a type field. This struct is given to the create argument, and above it is zero initialized meaning default None.

Unions can be created with value specific start/end/create calls. The add call is not specialized since it takes a union reference:

Monster_test_Weapon_start(B);
Weapon_rounds_add(B, 50);
Monster_test_Weapon_end(B);

or

Monster_test_Weapon_create(B, 50);

or

Monster_test_Weapon_add(B, Weapon_create(B, 50));

or

Monster_test_Pickup_start(B);
Pickup_location_create(B, 0, 0, 17);
Pickup_hint_create_str(B, "Jump High!");
Monster_test_Pickup_end(B);

or

Pickup_ref_t test;
Pickup_start(B);
Pickup_location_create(B, 0, 0, 17);
test = Pickup_end(B);
Monster_test_add(B, Any_as_Pickup(test));

or

Any_union_ref_t test;
Pickup_start(B);
Pickup_location_create(B, 0, 0, 17);
/* test.Pickup = Pickup_end(B); no longer possible as of v0.5.0 */
test.value = Pickup_end(B); /* As of v0.5.1. */
test.type = Any_Pickup;
Monster_test_add(B, test);

The following is valid and will not return an error, but also has no effect:

Monster_test_add(B, Any_as_NONE());

Note: the union structure has been changed for v0.5.0, and v0.5.1. Both unions and union vectors are now represented by a struct with the fields { type, value } in the low level interfaces. Before 0.5.0 only unions of tables were supported.

Union Vectors

The monster_test.fbs schema has a field named manyany in the Monster table. It is a vector of unions of type Any.

We can create a vector using

Any_union_vec_ref_t anyvec_ref;

Any_vec_start(B);
Any_vec_push(B, Any_as_TestSimpleTableWithEnum(TestSimpleTableWithEnum_create(B, ...)));
anyvec_ref = Any_vec_end(B);
Monster_manyany_add(B, anyvec_ref);

A union vector can also be populated with type specific _push or _push_create operations:

Monster_manyany_start(B);
Monster_manyany_push(B, Any_as_TestSimpleTableWithEnum(ref));
Monster_manyany_end(B);

Monster_manyany_start(B);
Monster_manyany_TestSimpleTableWithEnum_push(B, ref);
Monster_manyany_end(B);

Monster_manyany_start(B);
Monster_manyany_TestSimpleTableWithEnum_push_create(B, args);
Monster_manyany_end(B);

and other similar operations, much like other vectors.

Note that internally anyvec_ref is really two references: one to a type vector and one to a table vector. The vector is constructed as a single vector of unions and later split into two before final storage. If it is necessary to create a union vector from a vector of tables and a vector of types, the low level builder interface has a direct call to do this.
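The split step can be pictured with a hypothetical element layout. While the union vector is under construction, each element holds both the type and the value reference. Before storage, the elements are separated into a type vector and a value vector. The struct and function names below are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element layout while a union vector is being built. */
typedef struct {
    uint8_t type;    /* union type tag, 0 meaning NONE */
    int32_t value;   /* reference to the member value, 0 if none */
} demo_union_ref_t;

/* Split the combined vector into the two vectors stored in the buffer. */
void demo_union_vec_split(const demo_union_ref_t *src, size_t n,
                          uint8_t *types, int32_t *values)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        types[i] = src[i].type;
        values[i] = src[i].value;
    }
}
```

On a typical ABI, padding makes each combined element 8 bytes, while the split form uses 5 bytes per element. This is one reason construction needs more temporary space.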

Union vectors generally use more temporary stack space during construction, because each element is a struct of type and reference, which does not pack as densely as two separate vectors. In addition, the separated type and value vectors must be constructed temporarily. The final buffer is reasonably compact, since the type vector does not use much space. Union vectors are also somewhat slower to construct, but not unreasonably so.

Unions of Strings and Structs

Note: as of v0.5.0 unions can also contain strings and structs in addition to tables. Support for these types in other languages may vary, but C++ does support them too.

All union values are stored by reference. Structs that are not in unions are stored inline in tables and cannot be shared, but union members of struct type are stored by reference and can be shared. A union value is therefore always a reference. This is mostly transparent, because the generated table field methods have create/start/end calls for each union value type in addition to add.

To illustrate these variations we use the Movie table from monster_test.fbs:

namespace Fantasy;

table Attacker {
    sword_attack_damage: int;
}

struct Rapunzel {
    hair_length: uint16;
}

struct BookReader {
    books_read: int;
}

union Character {
    MuLan: Attacker = 2,  // Can have name be different from type.
    Rapunzel = 8,         // Or just both the same, as before.
    Belle: Fantasy.BookReader,
    BookFan: BookReader,
    Other: string,
    Unused: string = 255
}

table Movie {
    main_character: Character;
    antagonist: Character;
    side_kick: Character;
    cameo: Character;
    characters: [Character];
}

and the mixed type test case from monster_test.c:

nsf(Character_union_ref_t) ut;
nsf(Rapunzel_ref_t) cameo_ref;
nsf(Attacker_ref_t) attacker_ref;
nsf(BookReader_ref_t) br_ref;
nsf(BookReader_t *) pbr;
nsf(Movie_table_t) mov;

nsf(Movie_start_as_root(B));
br_ref = nsf(BookReader_create(B, 10));
cameo_ref = nsf(Rapunzel_create(B, 22));
ut = nsf(Character_as_Rapunzel(cameo_ref));
nsf(Movie_main_character_Rapunzel_create(B, 19));
nsf(Movie_cameo_Rapunzel_add(B, cameo_ref));
attacker_ref = nsf(Attacker_create(B, 42));
nsf(Movie_antagonist_MuLan_add(B, attacker_ref));
nsf(Movie_side_kick_Other_create_str(B, "Nemo"));
nsf(Movie_characters_start(B));
nsf(Movie_characters_push(B, ut));
nsf(Movie_characters_MuLan_push(B, attacker_ref));
nsf(Movie_characters_MuLan_push_create(B, 1));
nsf(Character_vec_push(B, nsf(Character_as_Other(nsc(string_create_str(B, "other"))))));
nsf(Movie_characters_Belle_push(B, br_ref));
pbr = nsf(Movie_characters_Belle_push_start(B));
pbr->books_read = 3;
nsf(Movie_characters_Belle_push_end(B));
nsf(Movie_characters_Belle_push(B, nsf(BookReader_create(B, 1))));
nsf(Movie_characters_Belle_push_create(B, 2));
nsf(Movie_characters_Other_push(B, nsc(string_create_str(B, "another"))));
nsf(Movie_characters_Other_push_create_str(B, "yet another"));
nsf(Movie_characters_end(B));
nsf(Movie_end_as_root(B));

Note that reading a union of string type requires a cast which can be seen in the full test case in monster_test.c.

Error Handling

The API generally expects all error codes to be checked, but the following table and vector operations accept an error (a null reference) as input and return an error:

  • add null reference to table, vector, or string.
  • push null reference to table or string.
  • buffer_end/create null reference to root.

This can simplify pushing or adding atomically created objects, for example by adding a cloned vector to table field.
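The propagation pattern can be shown with a hypothetical miniature builder with a fixed pool, unrelated to the real flatcc types. A failed create returns the null reference 0. The add operation accepts that null reference and reports an error instead of crashing, so the failure only needs to be checked once, on the add.

```c
/* Hypothetical reference type: 0 is the null (failed) reference. */
typedef int demo_ref_t;

typedef struct {
    int used;        /* objects created so far */
    int capacity;    /* fixed pool size */
    int field_count; /* fields successfully added */
} demo_builder_t;

/* Create an object, or return the null reference when the pool is full. */
demo_ref_t demo_create(demo_builder_t *b)
{
    if (b->used >= b->capacity) {
        return 0;
    }
    return ++b->used;
}

/* Add a reference to the open table; a null reference propagates as -1. */
int demo_add(demo_builder_t *b, demo_ref_t ref)
{
    if (ref == 0) {
        return -1;
    }
    ++b->field_count;
    return 0;
}
```

A call such as demo_add(&b, demo_create(&b)) then needs only one check.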

It is especially important to check start operations because the builder will not be in the expected stack frame context after failure and will not have reserved necessary internal memory, for example when adding a table field.

Consider a server with a reasonable amount of memory and the default allocator, using an emitter that will not return errors. If inputs can be expected not to exceed the size constraints of the flatbuffer data types, and the api is used correctly, there is no reason for failure and error handling may be skipped. However, servers sometimes need to restrict a single client's memory usage, and then errors are very likely unless the source data is already limited. As an opposite example, an embedded device sending small network packages from a fixed but large enough allocation pool is in total control and need not be concerned with any errors.

Type System Overview

The generated methods for building buffers may look the same but have different semantics. For example, _clone on a table field such as Monster_enemy_clone creates a table based on the content of a table in another buffer, then adds that table to the currently open table. Monster_clone, by contrast, creates a clone and just returns a reference without adding it to any table. There is also push_clone, which adds an element to an open vector. The same applies to many other operations.

Basically there are the following different types of methods:

  • Methods on native flatbuffer types, such as flatbuffers_string_start.
  • Methods on generated types, such as Monster_start.
  • Methods on field members, such as Monster_enemy_start.
  • Methods on vectors of the above, such as flatbuffers_string_vec_start, Monster_vec_start, Monster_inventory_vec_start.
  • Slight adaptions for buffer roots and nested buffer roots.

For unions and union vectors the story is more complex, and the api might need to be cleaned up further. Generally there are union type fields, union value fields, union fields representing both, and vectors of each. In addition there are pseudo fields for each union member, because create on a union does not make sense. Monster_myvariant_MyTable_create, for example, creates a MyTable table and assigns it with the correct type to the fields Monster_myvariant_type and Monster_myvariant.

Cloning

As of flatcc v0.5.2 it is also possible to clone tables, unions, vectors of tables, vectors of strings, and vectors of unions. Previously many operations had a clone or a push_clone operator, but these were all raw byte copies. Table cloning and union cloning are significantly more involved. A simple copy will not work because of stored references and possible sharing of references, and because the required alignment of a table is hard to reason about without building a new table. Unions and union vectors are even more difficult.

That said, cloning is now implemented for all relevant data types.

All clone operations expect the content to originate from another finalized buffer. For scalars and structs there are copy operations that are almost the same as clone - they both avoid endian conversion.

Structs have a special case with clone and copy. Whenever a struct is stored inline in the destination buffer, clone behaves like copy. Whenever the destination is a buffer root or a union member, the result is a reference to an independent memory block. When calling clone on a struct type, the destination is unknown, so an independent reference is created. If this is not the intention, a copy operation can be used. When using field methods, the destination type is known and the right thing will happen.

Cloning a table will, by default, expand any shared references in the source into separate copies. This is also true when cloning string vectors, or any other data that holds references. Worst case this can blow up memory (which is also true when printing JSON from a buffer).

It is possible to preserve the exact DAG structure when cloning. It may not be worthwhile for simple use cases, but it goes as follows:

The builder has a pointer to a flatcc_refmap_t object. This is a fairly small stack allocated object that implements a hashtable. By default this pointer is null, and we have the above mentioned expansion. If it is not null, each newly cloned object will have its reference stored in the refmap. The next time the same object is cloned, the existing reference will be taken from the refmap instead. See source comments in flatcc_refmap.h and flatcc_builder.h, and monster_test.c clone tests.
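The following sketch illustrates the refmap idea. It is not the flatcc_refmap_t implementation. A small fixed-size open addressing table maps the address of a source object to the reference of its clone, and a clone helper consults it first. The table size, hash, and demo_fake_clone stand-in are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_REFMAP_SIZE 64 /* must be a power of two */

/* Hypothetical refmap: source object address -> clone reference. */
typedef struct {
    const void *src[DEMO_REFMAP_SIZE];
    int32_t ref[DEMO_REFMAP_SIZE];
    size_t count;
} demo_refmap_t;

static size_t demo_hash(const void *p)
{
    uintptr_t x = (uintptr_t)p;
    x ^= x >> 7;
    return (size_t)(x * 2654435761u) & (DEMO_REFMAP_SIZE - 1);
}

/* Return the stored clone reference for src, or 0 if src is unknown. */
int32_t demo_refmap_find(const demo_refmap_t *m, const void *src)
{
    size_t i = demo_hash(src), n;
    for (n = 0; n < DEMO_REFMAP_SIZE; ++n, i = (i + 1) & (DEMO_REFMAP_SIZE - 1)) {
        if (m->src[i] == NULL) return 0;
        if (m->src[i] == src) return m->ref[i];
    }
    return 0;
}

/* Insert or update src -> ref using linear probing; -1 if the map is full. */
int demo_refmap_insert(demo_refmap_t *m, const void *src, int32_t ref)
{
    size_t i = demo_hash(src), n;
    for (n = 0; n < DEMO_REFMAP_SIZE; ++n, i = (i + 1) & (DEMO_REFMAP_SIZE - 1)) {
        if (m->src[i] == NULL || m->src[i] == src) {
            if (m->src[i] == NULL) ++m->count;
            m->src[i] = src;
            m->ref[i] = ref;
            return 0;
        }
    }
    return -1;
}

/* Stand-in clone operation for the example: hands out new references. */
static int32_t demo_next_ref = 0;
int32_t demo_fake_clone(const void *src) { (void)src; return ++demo_next_ref; }

/* Clone src, reusing an earlier clone if the same object was seen before. */
int32_t demo_clone_shared(demo_refmap_t *m, const void *src,
                          int32_t (*clone_fn)(const void *))
{
    int32_t ref = demo_refmap_find(m, src);
    if (ref == 0) {
        ref = clone_fn(src);
        if (ref != 0 && demo_refmap_insert(m, src, ref) != 0) {
            return 0;
        }
    }
    return ref;
}
```

Cloning the same source object twice through one map yields the same reference, which is what preserves sharing in the copy.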

Note that it might be relevant, for example, to preserve DAG structure when cloning one object with all its sub-objects, yet still get a new copy when the object is cloned a second time, while preserving the inner DAG structure. This can be done by working with multiple refmaps and simply swapping them via flatcc_builder_set_refmap. It is also possible to add references manually to a refmap before cloning.

Warning: the refmap MUST not hold any foreign references when starting a nested root clone or when cloning inside a nested buffer that has been started but not ended because it is invalid to share references between buffers and there are no safety checks for this.

Picking

Picking is related to clone and was also introduced with flatcc 0.5.2. A pick method is only defined on a table field or a struct field. Instead of taking a read reference of the same type as the field, it takes a reference to the same container type (table or struct). Essentially pick means: find myself in the other table, clone me, and add me to the new table which is currently open. So clone takes an entire table, where pick takes a single field. Table cloning is implemented as a sequence of pick methods, one for each field, as can be seen in the generated builder source. A pick operation does nothing if the field is not set. Pick also works with refmaps because it does an internal clone operation. In the generated code only clone on types uses the refmap, but other clone and pick operations depend on these type clone methods.
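The relationship between pick and clone can be sketched with a hypothetical decoded table in which a presence flag stands in for an absent vtable entry. Each pick copies one field only if it is set, and clone is one pick per field. None of these names are generated flatcc code.

```c
#include <stdint.h>

/* Hypothetical decoded table with two optional fields. */
typedef struct {
    int has_hp;   int16_t hp;
    int has_mana; int16_t mana;
} demo_table_t;

/* pick: copy a single field from src into the open table dst; no-op if unset. */
void demo_pick_hp(demo_table_t *dst, const demo_table_t *src)
{
    if (src->has_hp) { dst->has_hp = 1; dst->hp = src->hp; }
}

void demo_pick_mana(demo_table_t *dst, const demo_table_t *src)
{
    if (src->has_mana) { dst->has_mana = 1; dst->mana = src->mana; }
}

/* clone expressed as one pick per field. */
void demo_clone(demo_table_t *dst, const demo_table_t *src)
{
    demo_pick_hp(dst, src);
    demo_pick_mana(dst, src);
}
```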

Sorting Vectors

Vectors can be sorted, but not by the primary builder interface because:

String and table elements cannot be accessed after they have been emitted. The emitter can do all sorts of async operations other than actually building a buffer, for example encrypting blocks and/or sending partial buffers over the network. Scalars could be sorted, but the most efficient way of emitting vectors does not create a temporary vector: it emits the source directly when endianness allows it. Less significantly, the buffer producer is likely busy processing content and/or running on a resource constrained device. Altogether, it is much simpler not to support sorting at this interface level.

To understand how sorting is implemented, let's first look at how an already sorted vector can be searched:

Every vector of string, scalar, or enum element type has a find operation in the reader interface that performs a binary search. Every vector of table or struct elements has a find_by_<field_name> operation iff there is a key attribute on at least one top-level scalar, enum, or string field. FlatBuffers do not officially allow multiple key attributes, but if enabled, there will be a find_by operation for every keyed element field. In addition there is a find operation that maps to the first keyed field.

The read interface returns a vector type, which is a const pointer, when accessing a table field of vector type. The find operation takes such a vector as first argument, and a key as second. Strings have variations to allow for keys with a given length (similar to strcmp vs strncmp).
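The sketch below is plain C and not the generated find code. It shows a binary search with the documented result semantics: the lowest index of any matching key, or a not-found value larger than any valid index. Defining that value as (size_t)-1 is an assumption standing in for flatbuffers_not_found.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for flatbuffers_not_found: larger than any valid index. */
#define DEMO_NOT_FOUND ((size_t)-1)

/* Lower bound binary search on an ascending vector: returns the lowest index
 * whose element equals key, or DEMO_NOT_FOUND if there is none. */
size_t demo_find_u8(const uint8_t *vec, size_t len, uint8_t key)
{
    size_t lo = 0, hi = len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (vec[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return (lo < len && vec[lo] == key) ? lo : DEMO_NOT_FOUND;
}
```

With duplicates such as { 1, 3, 3, 3, 7, 9 }, searching for 3 returns index 1.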

This leads us to the sort interface:

Every find and find_by operation has a matching sort and sort_by operation, and table and struct vectors map sort to the first keyed sort_by operation. The sort operation takes a non-const vector, which has the type name suffix _mutable_vec_t. These vectors are not available via the reader interface and must be cast explicitly from _vec_t to _mutable_vec_t. Once this is done, the vector can be sorted in place in the buffer without any memory allocation and without any recursion.

If the namespace is flatbuffers, a string vector is sorted by:

flatbuffers_string_vec_t vec;
vec = ...;
flatbuffers_string_vec_sort((flatbuffers_string_mutable_vec_t)vec);

Scalar and enum vectors have similar inline sort operations, for example:

flatbuffers_uint8_vec_sort(flatbuffers_uint8_mutable_vec_t vec);

For vectors of tables or structs the sort function is named by the key field. Assuming the Monster table has a key attribute on the Hp field, the following sort operation is available:

MyGame_Example_Monster_vec_t monsters;
monsters = ...;
MyGame_Example_Monster_vec_sort_by_Hp(
    (MyGame_Example_Monster_mutable_vec_t)monsters);

Note: this is the reader interface. The ref_t types used by the builder do not apply here. (Advanced: if an emitter builds a buffer, the ref type can be used to find the actual vector pointer. The vector can then be sorted by casting the pointer to a vector, even if the buffer isn't finished.)

Multiple keys per table or struct is an optional feature. Each key will have its own sort and find function similar to the above. The first key also has the shortcut:

MyGame_Example_Monster_vec_sort(m);

The current implementation uses heap sort. Heap sort is nearly as fast as quicksort and has a compact implementation that needs neither recursion nor external memory. Its worst case of O(n log n) also makes it robust against DOS attacks. It is, however, not a stable sort. The sort assumes that structs have a reasonable size, so that swap operations can be done efficiently. For large structs, a dedicated sort operation building an external index vector would be better, but this is not supported.
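For reference, here is a generic in-place heap sort with an iterative sift-down. It uses no recursion and no extra memory. It operates on native uint16_t values for simplicity, whereas the real sort works on the little endian buffer contents. It is not flatcc's code, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Restore the max-heap property below index root in a[0..n), iteratively. */
static void demo_sift_down(uint16_t *a, size_t root, size_t n)
{
    for (;;) {
        size_t child = 2 * root + 1;
        uint16_t tmp;
        if (child >= n) return;
        if (child + 1 < n && a[child + 1] > a[child]) ++child;
        if (a[root] >= a[child]) return;
        tmp = a[root]; a[root] = a[child]; a[child] = tmp;
        root = child;
    }
}

/* In-place heap sort: O(n log n) worst case, no allocation, not stable. */
void demo_heap_sort_u16(uint16_t *a, size_t n)
{
    size_t i;
    if (n < 2) return;
    for (i = n / 2; i-- > 0; ) {      /* build the heap bottom-up */
        demo_sift_down(a, i, n);
    }
    for (i = n - 1; i > 0; --i) {     /* move the max to the end, shrink heap */
        uint16_t tmp = a[0]; a[0] = a[i]; a[i] = tmp;
        demo_sift_down(a, 0, i);
    }
}
```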

Note that a DAG is valid so there can be multiple vectors referring to the same table elements, and each can be sorted by a different key.

The find operations are stable, meaning they always return the lowest index of any matching key, or flatbuffers_not_found, which is larger than any valid index.
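One way to get the "lowest matching index" behavior is a lower-bound binary search followed by an equality check. The sketch below shows this on a plain sorted int32_t array; find_lowest and NOT_FOUND are hypothetical names, with NOT_FOUND merely playing the role of flatbuffers_not_found (a value larger than any valid index).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: larger than any valid index. */
#define NOT_FOUND ((size_t)-1)

/* Return the lowest index i with v[i] == key in an ascending vector, or NOT_FOUND. */
static size_t find_lowest(const int32_t *v, size_t n, int32_t key)
{
    size_t lo = 0, hi = n;  /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (v[mid] < key) {
            lo = mid + 1;   /* match, if any, is strictly to the right */
        } else {
            hi = mid;       /* keep mid: it may be the first match */
        }
    }
    /* lo is the first position whose element is >= key, or n. */
    return (lo < n && v[lo] == key) ? lo : NOT_FOUND;
}
```

Because lo never leaves [0, n], an unsorted input can produce a wrong answer but never an out-of-bounds access.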

Dangers of Sorting

If a buffer was received over, say, an untrusted network, the buffer should be verified before being accessed. But verification only makes it safe to read a buffer, not to modify it, because, for example, two vectors can be crafted to overlap each other without breaking any verification rules.

Thus, sorting is intended to be done shortly after the buffer is constructed while it can still be trusted.

Using find on a buffer that is supposed to be sorted, but isn't, can yield unexpected search results, but the result will always be an element of the vector being searched, never a buffer overrun.

Scanning

Some vectors can be sorted by different keys depending on which version of _sort_by is used. Obviously, _find_by must match the key the vector was sorted by.

If we need to search for a key that is not sorted, or if we simply do not want to sort the vector, it is possible to use scanning operations instead by using _scan or _scan_by. Scanning is similar to find except that it does a linear search and it supports scanning from a given position.
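Scanning is a linear search, and starting from a given position makes it easy to visit every match in turn. The sketch below shows that idea on a plain int32_t array; scan_from and NOT_FOUND are hypothetical names and do not reproduce the generated _scan signatures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: larger than any valid index. */
#define NOT_FOUND ((size_t)-1)

/* Linear search for key starting at index pos; works on unsorted vectors. */
static size_t scan_from(const int32_t *v, size_t n, size_t pos, int32_t key)
{
    size_t i;
    for (i = pos; i < n; ++i) {
        if (v[i] == key) return i;
    }
    return NOT_FOUND;
}
```

To visit all matches, restart just past the previous hit: `for (i = scan_from(v, n, 0, k); i != NOT_FOUND; i = scan_from(v, n, i + 1, k))`.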

More information on scanning can be found in the README file and in the monster_test.c test file.

Example of different interface type users

A resource-constrained microcontroller builds flatbuffers from sensor data using an emitter that sends UDP packets of the flatbuffer as soon as enough data is ready. A server reassembles the packets, or discards them if any UDP packet was lost. Once the buffer is assembled, the server sorts specific vectors, such as temperature levels, in the buffer before sending the buffer upstream to a storage service over a TCP/IP connection. The analyzers perform tasks such as detecting abnormal temperature readings based on the sorted vector data.

In the above example, the original sensor devices are not interested in the reader interface or the sort interface. While the sort and find operations may be available, they are dead inline code that does not inflate the binary code image size, but the libflatccrt library is linked in. The collecting server is not interested in the builder interface and does not link with the libflatccrt library, but uses both the inline functions of the reader interface and the sort interface. The upstream data storage service uses no interface at all since it treats the buffers as binary blobs in a database indexed by device and time. The end users only use the read-only interface to visualize and analyze, and have no need for the builder or the sort interface.

Special Emitters

An emitter only needs to implement one function to replace or wrap the default emitter. See flatcc_builder_emit_fun in flatcc_builder.h for details, emit_test.c for a very simple custom emitter that just prints debug messages, and flatcc_emitter.h for the default emitter.

When adding padding, flatcc_builder_padding_base is used as the base pointer in iov entries, so an emitter may detect this pointer and assume the entire content is zero bytes. Padding is usually small by its very nature, so the benefit of handling this is also limited, but this pointer, or a similar user-provided constant, can be used for similar purposes:

When creating a vector in a single operation from an external C array, no copying takes place on the internal builder stack. Therefore it is valid to provide a null pointer, or a valid array such as flatcc_builder_padding_base that is too small for the given length, provided that the emitter is aware of it. This in turn can be used to allocate space in the emitter's internal data structure so the vector can be filled after the fact if so desired. Pointer tagging may be another way to communicate special intent. Be aware that only create calls support this: any append, start/end, or other dynamic operation requires valid input and will stack-allocate temporary space.
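The sketch below shows how an emitter might classify iov entries by their base pointer: a known padding pointer means zero bytes that need not be read, and a null pointer means space the application will fill later. It does not include flatcc headers. demo_iovec_t, demo_padding_base, and demo_emit are hypothetical local mirrors; the parameter list is modeled on flatcc_builder_emit_fun (context, iov array, count, offset, length) with int32_t standing in for flatcc_builder_ref_t, and a return value of 0 is assumed to mean success. Check flatcc_builder.h for the exact definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of an iov entry (assumed shape; see flatcc_iovec_t). */
typedef struct {
    const void *iov_base;
    size_t iov_len;
} demo_iovec_t;

/* Hypothetical stand-in for flatcc_builder_padding_base. */
static const uint8_t demo_padding_base[16] = {0};

/* Emitter context: only counts bytes in this sketch. */
typedef struct {
    size_t copied;    /* bytes carrying real content */
    size_t zeroed;    /* bytes known to be zero padding */
    size_t reserved;  /* bytes reserved for later fill (null base) */
} demo_emitter_t;

/* Shape modeled on flatcc_builder_emit_fun; 0 is assumed to mean success. */
static int demo_emit(void *emit_context, const demo_iovec_t *iov,
        int iov_count, int32_t offset, size_t len)
{
    demo_emitter_t *E = emit_context;
    int i;
    (void)offset;
    (void)len;
    for (i = 0; i < iov_count; ++i) {
        if (iov[i].iov_base == demo_padding_base) {
            /* Padding: the bytes are known to be zero, no need to read them. */
            E->zeroed += iov[i].iov_len;
        } else if (iov[i].iov_base == NULL) {
            /* Agreed convention: reserve space to be filled after the fact. */
            E->reserved += iov[i].iov_len;
        } else {
            /* Real content: a real emitter would copy it into its pages here. */
            E->copied += iov[i].iov_len;
        }
    }
    return 0;
}
```

Comparing pointers by identity is what makes this cheap: the emitter never has to scan the padding bytes to know they are zero.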

Emitters always receive a small table of iov entries that together form a single object, including necessary headers and padding: for example a vector, a string, a nested buffer header, or a vtable. This is guaranteed by the API, but to keep the interface simple and fast, the emitter is not told which kind of call produced the object. If this is desired, the user must hint the emitter out of band before calling the relevant build operation. This can also be done indirectly by setting user_state in the emitter and having the emitter inspect this setting.
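An out-of-band hint can be as simple as a field in the emitter context that the application sets before a build call and clears afterwards. Since each emit call carries exactly one complete object, the hint applies to the whole call. The enum, context struct, and hint_emit below are hypothetical, and the parameter list only mirrors the general shape of flatcc_builder_emit_fun as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of an iov entry (assumed shape). */
typedef struct {
    const void *iov_base;
    size_t iov_len;
} hint_iovec_t;

/* Hypothetical hint values set by the application between build calls. */
enum emit_hint { HINT_NONE = 0, HINT_SENSOR_VECTOR };

typedef struct {
    enum emit_hint hint;         /* user state, set out of band */
    size_t sensor_vector_bytes;  /* bytes emitted while the hint was set */
    size_t other_bytes;          /* all other emitted bytes */
} hint_emitter_t;

static int hint_emit(void *emit_context, const hint_iovec_t *iov,
        int iov_count, int32_t offset, size_t len)
{
    hint_emitter_t *E = emit_context;
    (void)iov;
    (void)iov_count;
    (void)offset;
    /* One call is one complete object, so len covers the whole hinted object. */
    if (E->hint == HINT_SENSOR_VECTOR) {
        E->sensor_vector_bytes += len;
    } else {
        E->other_bytes += len;
    }
    return 0;
}
```

The application would set `E.hint = HINT_SENSOR_VECTOR` just before the create call it wants to tag and reset it to `HINT_NONE` right after, so unrelated objects are never misattributed.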

When adding vectors piecemeal using append or similar operations, as opposed to the zero-copy or less-than-zero-copy approach above, the memory cost is obviously higher, but unless the individual objects grow large, the stack will operate in hot CPU cache, so the bandwidth from main memory to CPU and back will not necessarily double. If the stack grows large, it may also be worthwhile to trim the stack with a custom allocator and a custom builder reset between buffers to reduce stack size and initialization overhead.