- Introduction
- Size Prefixed Buffers
- Namespaces
- Error Codes
- Endianess
- Buffers
- Tables
- Packing tables
- Strings
- Structs
- Nested Buffers
- Scalars and Enums
- Vectors
- Unions
- Error Handling
- Type System Overview
- Cloning
- Picking
- Sorting Vectors
- Example of different interface type users
- Special Emitters
We assume a separate read-only file and add extensions to this with support from a builder library and a builder object.
The underlying builder library supports two modes of operation that mix
together: create
which sends data directly to the target buffer
(emitter object) and a stack driven start/end
approach which allocates
objects and vectors on the stack. The code generator chooses the most
efficient approach given the circumstances.
Unlike most FlatBuffer language interfaces, tables and vectors are not created back to front: They are either created completely in one operation, or they are constructed on a stack front to back until they can be emitted. The final buffer is still constructed back to front. For big-endian platforms this may require temporary stack allocation of complete vectors where little endian platforms can emit directly.
Tables and vectors stored in other tables or vectors must be completed before the can be stored, but unlike must language interfaces they can be constructed while a parent is also being constructed as long as nesting remains balanced. While this occasionally may require more stack, it may also avoid external temporary allocation.
A builder object is required to start buffer construction. The builder must be initialized first and can be reset and reused between buffers, reusing stack allocation. The builder can have a customized emitter object but here we use the default. Finalizing the buffer depends the emitter and we can use a default finalizer only because we use the default emitter - it allocates and populates a linear buffer from a paged emitter ring buffer.
Note that in most cases flatcc_builder_finalize_buffer
is sufficient,
but to be strictly portable, use
flatcc_builder_finalize_aligned_buffer
and aligned_free
.
aligned_free
is often implemented as free
in flatcc/portable
but
not on all platforms. As of flatcc version 0.5.0
flatcc_builder_aligned_free
is provided to add robustness in case the
applications aligned_free
implementation might differ from the library
version due to changes in compile time flags.
Generally we use the monster example with various extensions, but to
show a simple complete example we use a very simple schema (myschema.fbs
):
table mytable { myfield1: int; myfield2: int; }
#include "myschema_builder.h"
void testfun() {
void *buffer;
size_t size;
flatcc_builder_t builder, *B;
mytable_table_t mt;
B = &builder;
flatcc_builder_init(B);
/* Construct a buffer specific to schema. */
mytable_create_as_root(B, 1, 2);
/* Retrieve buffer - see also `flatcc_builder_get_direct_buffer`. */
/* buffer = flatcc_builder_finalize_buffer(B, &size); */
buffer = flatcc_builder_finalize_aligned_buffer(B, &size);
/* This is read-only buffer access. */
mt = mytable_as_root(buffer);
assert(mytable_myfield1(mt) == 1);
assert(mytable_myfield2(mt) == 2);
/* free(buffer); */
flatcc_builder_aligned_free(buffer);
/*
* Reset, but keep allocated stack etc.,
* or optionally reduce memory using `flatcc_builder_custom_reset`.
*/
flatcc_builder_reset(B);
/* ... construct another a buffer */
/* Reclaim all memory. */
flatcc_builder_clear(B);
}
Note that a compiled schema generates a myschema_reader.h
file and
optionally a myschema_builder.h
and some common support files. When
building a buffer the myschema_builder.h
must be used but when only
reading then the myschema_reader.h
file should be used instead. Here
we are only concerned with building. When building, it is necessary to
link with libflatccrt.a
runtime library but when reading, all
nesessary code is contained in the generated header files.
The builder object only manages a stack of currently active objects and
does not store an object that is complete. Instead it calls an emitter
object with the partial data ready for emission, similar to a write
function. A default emitter is provided which implements a ring buffer
and the result may be written to a file, copied to a buffer or a
finalized to an allocated buffer. The builder supports these methods
directly for default emitter, and only the default emitter because
emitters are otherwise defined by only one simple emit function - see
emit_test.c
for a simple example of a custom emitter.
A custom allocator may be useful when working with small buffers in a
constrained environment - the allocator handles temporary stacks,
virtual table caches etc. but not the emitter.
The allocator and emitter interface is documented in the builder library header pflatcc_builder.h] and the default implementation in flatcc_emitter.h. The default allocator is implemented as part of the flatcc_builder source.
The builder can be reused between buffers using the reset
operation.
The default emitter can also be reused and will automaticallhy reset
when the buffer is. For custom emitters, any reset operation must be
called manually. The same applies to clear. The reset operations
maintain allocated memory by also reduce memory consumption across
multiple resets heuristically.
Buffers can be created with a size prefix of type uoffset_t
. When
doing this, the buffer is aligned relative to the size prefix such that
buffers can be stacked in a file and for example be accessed via memory
mapping.
The usual create_as_root
and start_as_root
has a variant called
create_as_root_with_size
and start_as_root_with_size
.
To read a buffer with a size prefix use:
size_t size;
buffer = flatbuffers_read_size_prefix(rawbuffer, &size);
The size the size of the buffer excluding the size prefix. When verifying buffers the buffer and size arguments should be used. See also monster_test.c for an example.
Note that the size prefix ensures internal alignment but does not guarantee that the next buffer in a file can be appended directly because the next buffers alignment is unknown and because it potentially wastes padding bytes. The buffer size at offset 0 can increased to the needed alignment as long as endianness is handled and the size of the size field is subtracted, and zeroes are appended as necesary.
The generated code is typically wrapped in a custom namespace and
functions and definitions that are library specific are usually mapped
into the namespace. We often use an empty namespace for custom types and
flatbuffers_
for library names, but usually a foo_
prefix could also
be used on both cases, where foo
is a custom namespace.
Note that the name flatcc_emitter
is only used with the default emitter
and the name [flatcc_builder] is only used for buffer management but not
for constructing content. Once a valid buffer is ready the common and
namespace (flatbuffers
) and schema specific (or empty) namespace is used
with schema specific operations.
All schema specific content is prefixed with a namespace to avoid conflicts - although the namespace is empty if the schema doesn't specify any. Note that the same schema can have multiple namespaces. An example of a namespace prefixed operation:
MyGame_Example_Monster_create_as_root(B, ... lots of args);
To simplify this we can use a macro to prefix a namespace. The use
of the name ns
is arbitrary and we can choose different names for
different namespaces.
#undef ns
#define ns(x) MyGame_Example_ ## x
But the above doesn't work with nested calls to ns such as
ns(Monster_color_add(B, ns(Color_Green));
it would have to be:
ns(Monster_color_add)(B, ns(Color_Green);
Therefore we have a helper macro the does allow nesting:
#undef ns
#define ns(x) FLATBUFFERS_WRAP_NAMESPACE(MyGame_Example, x)
The common namespace can also be wrapped for a more consistent appearance:
#undef nsc
#define nsc(x) FLATBUFFERS_WRAP_NAMESPACE(flatbuffers, x)
nsc(string_ref_t) s;
s = nsc(string_create_str(B, "hello, world!"));
instead of
flatbuffers_string_ref_t s;
s = flatbuffers_string_create_str(B, "hellow, world!);
Functions return values can be grouped roughly into 4 groups: functions
returning pointer, references, size_t
lengths, and int
status codes.
Pointers and references return 0 on error. Sizes do not return error.
Status codes return 0 on success or an error code that is usually -1.
Status codes may be checked with flatbuffers_failed(...)
.
The function flatbuffers_is_native_pe()
provide an efficient runtime
check for endianness. Since FlatBuffers are little endian, the function
returns true when the native endianness matches the protocol endianness
which for FlatBuffers is little endian. We do not hardcode little endian
because it enables us to support other protocols in the future - for
example the struct conversions may be very useful for big endian network
protocols.
As of flatcc 0.4.0 it is possible to compile flatcc with native big-endian support which has been tested on AIX. More details in README Endianness
By testing is_native_pe
dependencies on speficic compile time flags
can be avoided, and these are fragile:
During build, vectors and structs behave differently from tables: A
table updates one field at a time, doing endian conversion along the
way. A struct is either placed in a table, and is converted by the table
specific operation, or it is placed in a vector. A vector only does the
endian conversion when the vector is finished, so when a vector is not
created atomically with a single create
call, the elements are placed on a
stack. By default this is in native format, but the user may choose to
place buffer encoded structs or scalars in the vector and call
vec_end_pe
. The same push
operation can be used to place a
natively encoded struct and a buffer encoded struct in the vector
because it does no conversion at that point. Therefore there is also no
push_pe
method that would mean to push an unconverted element unto
the stack. Only for tables and entire vectors does the pe command make
sense. If a vector wishes to push a buffer encoded struct when the
vector is otherwise constructed in native encoding or vice versa, the
vector may be extended empty and then assigned using any of the
assign
, assign_from_pe
or assign_to_pe
calls.
We did not mention that a struct can also be a standalone object
as a buffer root, and for that it has a end_pe
call that essentially
works like a single element vector without a length prefix.
The clone
operation is a more userfriendly pe
operation which takes
an object or a vector from an existing buffer and places it in a new
buffer without endian conversion.
NOTE: FLATBUFFERS_LITTLEENDIAN
is deprecated and will be removed in
a future version. It just complicates endina handling.
The header files tries to define FLATBUFFERS_LITTLEENDIAN
to 0 or 1
based on system definitions but otherwise leaves the flag undefined.
Simply testing for
#if FLATBUFFERS_LITTLEENDIAN
...
#endif
will not fail if the endianness is undetected but rather give the
impression that the system is big endian, which is not necessarily true.
The flatbuffers_is_native_pe()
relates to the detected or system
provided conversion functions if a suitable endian.h
file after the
header file gave up on its own detection (e.g. le16toh(1) == 1
).
Therefore, it is better to use flatbuffers_is_native_pe()
in most
cases. It also avoids making assumptions on whether the protocol is
little or big endian.
A buffer can most simply be created with the create_as_root
call for
a table or a struct as seen ealier. The as_root
part is just a thin
wrapper around buffer start and stop calls and using these allows for
more flexibility. the as_root
also automatically uses the defined file
identifier if any.
The build process begins with starting a buffer. The buffer may contain a struct or table, so one of these should be constructed subsequently. Structs are generally created inline in tables, only at the buffer level is a struct created independently. The api actually permits other formats, but it will not be valid flatbuffers then.
flatcc_builder_ref_t root;
flatcc_builder_init(B);
/* 0 indicates no file identifier. */
flatcc_builder_buffer_start(B, 0);
root = /* ... construct a table or a struct */
flatcc_builder_buffer_end(B, root);
buffer_start
takes a file identifier as second argument. If null or a
string with null characters, the identifier is not stored in the buffer.
Regardless of whether a struct or table is declared as root in the schema or
not, there are methods to automatically start both the buffer and struct or buffer
and table such as Monster_start/end_as_root
. This is also valid for
nested buffers. If the schema has a file identifier, it is used as
identifier for the created object. The alternative
create_as_root_with_identifier
allows for explicitly setting an id or
explicitly dropping an id by providing a null argument. The
corresponding reader function Monster_as_root(buffer)
also has a
Monster_as_root_with_identifier(buffer, id)
. Here the id is ignored if the id
is null, and otherwise the operation returns null if the id does not match.
For the most part ids are handled transparently by these defaults.
The buffer can be started with block alignment and/or a custom
identifier using the flatcc_builder_buffer_start_aligned
:
flatcc_builder_buffer_start_aligned(B, "myid", 16);
...
flatcc_builder_buffer_end(B, root);
The alignment can be 0 using the minimum required alignment, which is
derived from the operations between start/end
. The alignment argument
is called block_align
and is useful if the emitter operates on blocks
such as encryption, cache line isolation, or compression blocks where
the final buffer should align with the blocks used during construction.
This can lead to significant zero padding just after the block header,
depending on block size.
The schema specified identifier is given as:
flatbuffers_identifier
and defaults to null. The schema specified extension is given as:
flatbuffers_extension
and defaults to null. Note that flatbuffers_
is replaced by whatever
namespace is chosen. Each specific schema type also has a named file
exntension reflection the extension active when the type was defined,
for example:
MyGame_Example_Monster_file_identifier
This define is used when create_as_root
automatically sets a file
identifier.
NOTE: before flatcc 0.6.1, the identifier was named
MyGame_Example_Monster_identifier (DEPRECATED)
but that would conflict with a table field named identifier
which
happened often enough to be a problem. This naming is now removed on
conflict and will be completely removed in a future version.
When the buffer is ended, nothing special happens but only at this point
does it really makes sense to access the resulting buffer. The default
emitter provides a copy method and a direct buffer access method. These
are made available in the builder interface and will return null for
other emitters. See also flatcc_builder.h and the default emitter in
flatcc_emitter.h
.
If Monster
is a table, we can create a Monster buffer (after
builder init) as follows:
Monster_start(B);
Monster_Hp_add(B, 80);
...
flatcc_builder_buffer_create(B, Monster_end(B));
All scalar and enums are added similar to the Monster_add_Hp
call. We
will subsequently see how to deal with other types.
A table can also be created in a single operation using create
:
Monster_ref_t m;
m = Monster_create(B, 80, ...);
The create arguments are those taken by the individual fields add
operations which is either an scalar, enum, or a reference returned by
another create or end call. Note that unlike the C++ interface, unions
only take a single argument that is also accepted by the add
operation
of a union field. Deprecated fields are not included in the argument
list.
As of v0.5.3 the arguments are given in field id order which is usually the same as the schema listed order, except with id attributes are given explicitly. Using id order ensures version stability. Note that since deprecated fields are omitted, deprecated fields can still break existing code.
BREAKING: Prior to flatcc v0.5.3 the create call would use the schema order
also when fields have id attributes specifying a different order. This
could break code across versions and did not match the C++ behavior.
It was also document that the original_order
attribute affected create
argument order, but that was incorrect.
NOTE: If the original_order
attribute is set on a table, the create
implementation adds fields to the table in schema listed order,
otherwise it adds fields in order of decreasing size to reduce alignment
overhead. Generally there should be no need to use the original_order
attribute. This doesn't affect the call argument order although that
was incorrectly document prior to v 0.5.3.
NOTE: the create
and create_as_root
operations are not guaranteed to
be available when the number of fields is sufficiently large because it
might break some compilers. Currently there are no such restrictions.
Scalars and enums do not store the value if it it matches the default
value which is by default 0 and otherwise defined in the schema. To
override this behavior, use force_add
. In the monster example, health
points default to 100 (percent), so if we wish to force store it in the
buffer we could use:
Monster_hp_force_add(B, 100);
Only scalar fields and enums have a force_add
operation since only these
types have a default value, and other types have a meaningful
interpretation of null. (It is not quite clear if empty tables separate
from null/absent are valid in all implementations).
force_add
may be useful when roundtripping data from a database where it is
relevant to distinguish between any valid value and null. Most readers will not
be able to tell the difference, but it is possible to inspect a flatbuffer to
see if a table field is present, present and default, or absent, meaning null.
NOTE: As of mid 2020, FlatBuffers added optional scalar table fields with support in flatcc 0.6.1. These fields automatically imply force_add
to represent null values when a field is absent and therefore these fields do not have a force_add
method and these fields also do not have a default value other than null
, i.e. null if not added.
If Monster is declared as root, the above may also be called as:
Monster_start_as_root(B);
Monster_add_hp(B, 80);
...
Monster_end_as_root(B);
(Calling Monster_end
instead would require buffer_end
call
subsequently, and is basically a violation of nesting).
Tables can be nested, for example the Mini field may have type Monster table again (a recursive type):
buffer_start(B);
Monster_start(B);
Monster_add_Hp(B, 80);
Monster_start(B);
Monster_hp_add(B, 81);
...
Monster_mini_add(Monster_end(B));
...
flatcc_builder_buffer_end(B, Monster_end(B));
The child Monster table may be created before the parent or as above between the tables start and end. If created before, reference must be stored until it can be added. The only requirement is that start and end are balanced, that the sub-table is ended before the parent, and that both are created in the same buffer (nested buffers can be created while the parent buffer is still being created, similar to sub-tables, so it is possible to mess this up):
Monster_ref_t root, mini;
buffer_start(B);
Monster_start(B);
Monster_hp_add(B, 81);
mini = Monster_end(B);
Monster_start(B);
Monster_hp_add(B, 80);
Monster_mini_add(B, mini);
root = Monster_end(B);
flatcc_builder_buffer_end(B, root)
Rather than adding a child table explicitly, it can be started and ended
as an operation on the field name, here with Monster_Mini_start/end
:
Monster_ref_t root;
Monster_start(B);
Monster_add_Hp(B, 80);
Monster_mini_start(B);
Monster_hp_add(B, 81);
Monster_mini_end(B);
root = Monster_end(B);
flatcc_builder_buffer_end(B, root);
We can repeat the the table nesting as deep as we like, provided our builder is willing to allocate enough stack space.
Warning: It is possible to use the wrong table type operations
between start/end
- don't do that. It is a tradeoff between usability
and type safety.
Note that vectors, strings and structs map several standard operations
to a field name, for example mytable_myfield_push(B, x)
. This is not the
case with table fields which only map start/end/create
in part because it
would never terminate for recursive types and in part because each table
is different making a generic mapping rather complex and with very long
names.
A table may be created with a constructor, but it requires all non-scalar objects to be references or pointers. Struct fields must be pointers to zero padded structs, and strings, vectors and tables must be references. The constructors are probably most useful for simple tables with mostly scalar values (here we use the original Monster fields and leaves out any we have invented for the sake of illustration):
IMPORTANT: objects can generally only be created within a buffer
context, i.e. after buffer_start
. For example calling
flatbuffers_uint8_vec_create
before Monster_create_as_root
technically violates this rule because the create call also starts the
buffer. It is, however, allowed at the top level. For nested buffers
(see later) this must be avoided because the vector would end up in the
wrong buffer.
Monster_ref_t m;
uint8_t invdata[4] = { 1, 2, 3, 4 };
Vec3_t vec;
flatbuffers_uint8_vec_ref_t inventory =
flatbuffers_uint8_vec_create(B, invdata, 4);
m = Monster_create(B, &vec, 150, 80, name, inventory,
Color_Red, Any_as_NONE());
flatcc_builder_buffer_create(m);
or
Monster_create_as_root(B, &vec, 150, 80, name, inventory,
Color_Red, Any_as_NONE());
By reordering the fields, the table may be packed better, or be better
able to reuse an existing vtable. The create
call already does this
unless the attribute original_order
has been set. Unions present a
special problem since it is two fields treated as one and the type field
will generally waste padding space if stored in order:
To help pack unions better these can be added with the type
seperate from the value reference using add_type(B, test.type)
,
add_value(B, test)
where the value is only added if the type is
not NONE
. The add_type
should be called last since it is the
smallest type.
The same field should not be added more than at most once. Internal reservations that track offset fields may overflow otherwise. An assertion will fail in debug builds.
Required table fields will be asserted in debug builds as part of the
end/create
call. Only offset fields can have a required attribute.
The generated monster_test_reader.h
from monster_test.fbs shows how
the default packing takes place in generated create
calls, see for
example the typealias test. Note that for example vectors are stored
together with integers like uint32
because references to vectors have
the same size as uint32
.
Strings can be added to tables with zero terminated strings as source
Monster_start(B);
...
Monster_name_create_str(B, "Mega Monster");
Monster_end(B);
or strings potententially containing zeroes:
#define MONSTER "Mega\0Monster"
Monster_start(B);
...
/* Includes embedded zero. */
Monster_name_create(B, MONSTER, sizeof(MONSTER));
Monster_end(B);
or zero terminated source up to at most max_len
characters.
#define MONSTER "Mega\0Monster"
Monster_start(B);
...
/* "Mega" */
Monster_name_create_strn(B, MONSTER, 12);
Monster_end(B);
The create_str
and create_strn
versions finds the string length via strlen
and strnlen respectively. append_string
also has _str/_strn
versions.
A string can also be created from an existing flatbuffer string in which case the length is expected to be stored 4 bytes before the pointer in little endian format, and aligned properly:
Monster_name_clone(B, mybufferstring);
or, create a string at most 4 characters long starting at 0-based index 10, if present:
Monster_name_slice(B, mybufferstring, 10, 4);
If index or index + len goes beyond the source, the result is truncated accordingly, possibly resulting in an empty string.
A string can also be create independently. The above is just shortcuts for that:
flatbuffers_string_ref_t monster_name;
monster_name = flatbuffers_string_create_str("Mega Monster");
Monster_name_add(B, monster_name);
Strings are generally expected to be utf-8, but any binary data will be stored. Zero termination or embedded control codes are includes as is. The string gets a final zero temination regardless, not counted in the string length (in compliance with the FlatBuffers format).
A string can also be constructed from a more elaborate sequence of operations. A string can be extended, appended to, or truncated and reappended to, but it cannot be edited after other calls including calls to update the same string. This may be useful if stripping escape codes or parsed delimiters, etc., but here we just create the same "Mega Monster" string in a more convoluted way:
flatbuffers_string_ref_t name;
char *s;
#define N 20
Monster_start(B);
...
flatbuffers_string_start(B);
flatbuffers_string_append(B, "Mega", 4);
flatbuffers_string_append(B, " ", 1);
s = flatbuffers_string_extend(B, N);
strncpy(s, "Monster", N);
flatbuffers_string_truncate(B, N - strlen(s));
name = flatbuffers_string_end(B);
Monster_name_add(B, name);
...
Monster_end(B);
flatbuffers_string_create...
calls are also available when creating
the string separate from adding it to a table, for example:
flatbuffers_string_h name;
name = flatbuffers_string_create_str(B, "Mini Monster");
It is guaranteed that any returned the string buffer is zero filled and has an extra zero after the requested length such that strlen can be called on the content, but only the requested bytes may be updated.
Every call only returns the substring being added to the string in that
operation. It is also possible to call flatbuffers_string_edit
to get a
modifiable pointer to the start of the string.
flatbuffers_string_reserved_len(B)
returns the current string length
including any embedded zeroes, but excluding final zero termination. It
is only valid until string_end
is called.
See flatcc_builder.h for detailed documentation. Essentially extend
reserves zeroed space on the stack and returns a buffer to the new
space, and truncate reduces the overall size again, and the string is
then given the final length and a zero termination at the end.
There is no endian conversion (except internally for the string length), because UTF-8 strings are not sensitive to endianness.
Like tables, the string may be created while a parent container is being constructed, or before.
Strings can also be used as vector elements, but we will get that when discussing vectors.
Structs in tables can be added as:
Monster_pos_create(B, 1, 2, 3);
The above essentially does the following:
Vec3_t *v;
v = Monster_pos_start(B);
Vec3_assign(v, 1, 2, -3.2);
Monster_pos_end(B);
Some versions of the monster schema has extra test fields - these would break the assign approach above because there would be extra arguments. Instead we can rely on the zero intialization and assign known fields.
Vec3_t *v;
v = Monster_pos_start(B);
v->x = 1, v->y = 2, v->z = -3.2;
Monster_pos_end(B);
Monster_pos_end_pe(B)
can be used when the struct is known to be
little endian (pe for protocol endian, meaning no conversion is necessary),
for example copied from an existing buffer, but then clone
is a better
choice:
Monster_pos_clone(B, &v);
When the struct is created alone for use as root:
Vec3_ref_t root;
root = Vec3_create(B, 1, 2, 3)
flatcc_builder_buffer_create(B, root);
An existing struct can be added as:
Vec3_t v;
Vec3_assign(&v, 1, 2, 3);
/* v does not have to be zero padded. */
Monster_pos_add(B, &v);
When adding a struct that is already little endian, presumably from an existing buffer, it can be cloned using:
Monster_pos_clone(B, &v);
Clone assumes the source struct is both little endian and that padding
is already zeroed (example ignores error handling), and end_pe
does nothing.
*Monster_pos_start(B) = v;
Monster_pos_end_pe(B);
There are several assignment types that convert between host (native)
endianness and buffer endiannes. We use pe
to indicate
protocol_endian
rather than just le
for little endian
because it
allows us to change endianness to big endian in the the future and it
more clearly states the intention. While big endian is not allowed in
FlatBuffers, big endian structs may be useful in other network
protocols - but it is not currently supported because it would force
little endian platforms to support byte-swapping. The operations are:
assign_from_pe
, assign_to_pe
, copy
, copy_from_pe
,
copy_to_pe
, to_pe
and from_pe
.
All the copy operations takes a const pointer as source, and
to/from_pe
is just copy with same source and destination:
Vec3_t v, v2;
Vec3_assign_to_pe(&v2, 1, 2, 3);
Vec3_copy_from_pe(Vec3_clear(&v), &v2);
Vec3_to_pe(&v);
from_pe
means from little endian to native endian, end to_pe
is the opposite. On little endian platforms all copy operations behave
the same and only move fields, not padding. to/from_pe
conversion
will leave deprecated fields either as they were, or zero them because
the operation may be skipped entirely on protocol endian native platforms.
While struct fields cannot be deprecated officially, they are supported if the schema compiler is flagged to accept then. The struct fields are renamed and assigned 0 when using assign or copy, and assign / create has no argument for them.
Because padding can carry noise and unintended information, structs should be cleared before assignment - but if used as a source to copy the padding is not copied so only the destation need to be zeroed.
If a struct is nested, the assign operation includes all fields as if the struct was flattened:
typedef struct Plane Plane_t;
struct Plane {
Vec3_t direction;
Vec3_t normal;
};
Plane_t plane;
Plane_clear(&plane);
Plane_assign(&plane, 1, 2, 3, 7, 8, 9);
Structs can also be created standalone, similar to tables and vectors, but FlatBuffers only support this when the struct is used as root.
Assuming Vec3 is declared as root, a buffer only holding a Vec3 struct can be created using:
Vec3_create_as_root(B, 1, 2, 3);
Important: do not store the above as a nested buffer - it would be
missing the vector size field. If Monster_playground
is a ubyte vector
with nested_flatbuffer
attribute, then
Monster_playground_start/end_as_root
may be used.
Structs also support start/end_as_root
. In this case start
returns
the struct pointer, and end_pe_as_root
is supported:
Vec3_t *v;
v = Vec3_start_as_root(B);
v->x = 1, v->y = 2, v->z = 3;
Vec3_end_as_root(B);
(Be careful with the different result codes since a tables start_as_root
returns an integer result code where 0 is success while a struct returns
a pointer that is null on failure.)
The following also creates a buffer at top-level, but it may also be added as a nested buffer because the stack frame detects the nesting:
Vec3_t *v;
flatcc_builder_buffer_start(B);
v = Vec3_start(B);
v->x = 1, v->y = 2, v->z = 3;
flatcc_builder_buffer_end(B, Vec3_end(B));
or flatcc_builder_buffer_start(B); ... Monster_start(B); flatcc_builder_buffer_start(B); v = Vec3_start(B); v->x = 1, v->y = 2, v->z = 3; Monster_playground_add(B, flatcc_builder_buffer_end(B, Vec3_end(B))); flatcc_builder_buffer_end(B, Monster_end(B));
or
flatcc_builder_buffer_ref_t nested_root;
flatcc_builder_buffer_start(B);
nested_root = Vec3_create_as_root(B, 1, 2, 3);
Monster_start(B);
Monster_playground_add(B, nested_root);
flatcc_builder_buffer_end(B, Monster_end(B));
A buffer_ref_t
can be used as uint8_vec_ref_t
when the
buffer is nested, and otherwise the reference cannot be used
for anything other than testing for failure. The buffer content
should match the type declared in a nested_flatbuffers
attribute
but it isn't enforced, and a root can be stored in any field of
[ubyte] type.
When Monster_playground
is declared as nested:
...
Monster_start(B);
Monster_playground_create_as_root(B, 1, 2, 3);
flatcc_builder_buffer_end(B, Monster_end(B));
...
Be aware that Vec3_t
is for native updates while Vec3_struct_t
is a const
pointer to an endian encoded struct used in the reader interface, and actually
also as source type in the clone operation.
As of flatcc 0.6.0 it is possible to have fixed length arrays as structs
members. A fixed length array is equivalent to having a struct field repeated
one or more times. The schema syntax is name : [type:count];
similar to an
ordinary struct field name : type;
. The type is any type that can ba valid
struct field type including enums and nested structs. The size cannot be 0 and
the overall size is limited by the maximum struct size the array is contained
within which is typically 65535 (2^16-1).
For example, given the schema:
struct MyStruct {
counters:[int:3];
// char is only valid as a fixed length array type
name:[char:6];
}
table MyTable {
mystruct:MyStruct;
}
The table can be created with:
ns(MyStruct_t) *x;
ns(MyTable_start_as_root(B));
x = ns(MyTable_mystruct_start(B));
x->counters[0] = 1;
x->counters[1] = 2;
x->counters[2] = 3;
strncpy(x->name, "Kermit", sizeof(x->name));
ns(MyTable_mystruct_end(B));
ns(MyTable_end_as_root(B));
Note that char arrays are not zero terminated but they are zero padded, so strncpy is exactly the right operation to use when assigning to char arrays, at least when they do not contain embedded nulls which is valid. Char arrays are expected to be ASCII or UTF-8, but an application may use other encodings if this is clear to all users.
With assignment:
int data[3] = { 1, 2, 3 };
ns(MyStruct_t) *x;
ns(MyTable_start_as_root(B));
x = ns(MyTable_mystruct_start(B));
// Careful: the name argument does not use strncpy internally
// so the source must be at least the expected length
// like other array arguments. Strings can have embedded nulls.
ns(MyStruct_assign(x, data, "Kermit");
ns(MyTable_mystruct_end(B));
ns(MyTable_end_as_root(B));
To read a struct the pointer to the struct is retrieved first
int sum;
int i;
const char *name;
size_t name_len;
ns(MyTable_table_t) t;
ns(MyStruct_struct_t) x;
t = ns(MyTable_as_root(buf));
x = ns(MyTable_mystruct_get(t));
for (sum = 0, i = 0; i < ns(MyStruct_counters_get_len()); ++i) {
sum += ns(MyStruct_counters_get(x, i)) +
// char arrays are endian neutral, so we can use pointer access.
name = ns(MyStruct_name_get_ptr(x);
name_len = strnlen(name, ns(MyStruct_name_get_len()));
printf("Added counters from %.*s", name_len, name);
// char arrays can be accessed like other arrays:
// ns(MyStruct_name_get(x, i);
}
An alternative to strnlen
is strip trailing zeroes which will allow for
char arrays embedded zeroes, but there is no direct support for this. The JSON
printer uses this approach to shorten the printed char array string.
The _get
suffix can be ommitted in the above if the flatcc -g
has not
supplied to reduce the risk of name conflicts, but not for _get_len
and
_get_ptr
.
Note that it is not possible to have fixed length arrays as part of a table but it is possible to wrap such data in a struct, and it is also possible to have vectors of structs that contain fixed length arrays.
These are discussed under Structs and Table sections but it is worth
noting that a nested buffers can also be added as pe ubyte vectors
which is probably the original intention with nested buffers. However,
when doing so it can be difficult to ensure the buffer is correctly
aligned. The untyped flatcc_builder
has various options to deal with
this, but with generated code it is better to create a nested buffer
inline when suitable (with nested buffer_start/end
or
mytable_myfield_create_as_root
) - for example a message wrapper with
a union of tables holding buffer for a specific message type. In other
cases the buffer may truly be created independently of the current
buffer and then it can be added with controlled alignment using either
the flatcc_builder
api for full control, or the nest
operation on
nested table and struct fields:
To create and add a ubyte vector with a higher alignment than ubytes single byte alignment, the following operation is available as an operation on a nested buffer field:
Monster_playground_nest(B, void *data, size_t size, uint16_t align);
If alignment is unknown, it can be set to 0, and it will default to 8 for nested table types, and to the struct alignment for struct buffers.
Block alignment is inherited from the parent buffer so the child buffer
ends up in its own set of blocks, if block alignment is being used. If
the nested buffer needs a different block alignment, the flatcc_builder
api must be used.
All structs and tables have an start/end/create_as_root
even if they
are not referenced by any nested_flatbuffers
field and they will
create [ubyte] vectors containing a nested buffer but only [ubyte]
fields with nested_flatbuffers
attribute will dedicated
start/end/create_as_root
on the field name. Structs also have
end_pe_as_root
.
Scalars keep their original type names uint8_t
, double
, etc, but
they get some operations similar to structs. These are contained in a
namespace which by default is flatbuffers_
, for example:
uint16_t *flatbuffers_uint16_to_pe(uint16_t *p);
uint16_t *flatbuffers_uint16_from_pe(uint16_t *p);
flatbuffers_bool_t *flatbuffers_bool_to_pe(flatbuffers_bool_t *p);
flatbuffers_bool_t *flatbuffers_bool_from_pe(flatbuffers_bool_t *p);
These may be used freely, but are primarily present as an interface to the vector operations also defined for structs.
Enums have similar definitions which may be used to convert endianness without being concerned with the underlying integer type, for example:
Color_enum_t *Color_to_pe(Color_enum_t *p);
Vectors can be created independently, or directly when updating a table - the end result is the same. Builder vector operations always reference element values by pointer, or by reference for offset types like tables and strings.
uint8_t v;
Monster_inventory_start(B);
v = 1;
flatbuffers_uint8_vec_push(B, &v);
v = 2;
flatbuffers_uint8_vec_push(B, &v);
v = 3;
flatbuffers_uint8_vec_push(B, &v);
Monster_inventory_end(B);
or
flatbuffers_uint8_vec_ref_t inv;
uint8_t v;
flatbuffers_uint8_vec_start(B);
v = 1;
flatbuffers_uint8_vec_push(B, &v);
v = 2;
flatbuffers_uint8_vec_push(B, &v);
v = 3;
flatbuffers_uint8_vec_push(B, &v);
inv = flatbuffers_uint8_vec_end(B);
Monster_inventory_add(B, inv);
Because it can be tedious and error-prone to recall the exact field type, and because the operations are not type safe (any kind of push would be accepted), some vector operations are also mapped to the field name:
uint8_t v;
Monster_inventory_start(B);
v = 1;
Monster_inventory_push(B, &v);
v = 2;
Monster_inventory_push(B, &v);
v = 3;
Monster_inventory_push(B, &v);
Monster_inventory_end(B);
Note: vector operations on a type uses the _vec_<operation>
syntax, for
example uint8_vec_push
or Monster_vec_push
while operations that are mapped
onto table field names of vector type do not use the _vec
infix because it is
not a type name, for example Monster_inventory_push
.
A slightly faster operation preallocates the vector:
uint8_t *v;
Monster_inventory_start(B);
v = Monster_inventory_extend(B, 3);
v[0] = 1, v[1] = 2, v[2] = 3;
v = Monster_inventory_extend(B, 2);
v[0] = 4, v[1] = 5;
Monster_inventory_end(B);
Push just extends one element at time. Note that extend
returns the
pointer to the extended vector segment. The full vector can be accessed
with edit
and reserved_len
between start/end
(recalling that pointers
cannot be reused across buffer calls):
uint8_t *v, i;
uint8_t data[] = { 1, 2 };
Monster_inventory_start(B);
Monster_inventory_push(B, &data[0]);
Monster_inventory_push(B, &data[1]);
v = Monster_inventory_edit(B);
for (i = 1; i < Monster_inventory_reserved_len(B); ++i) {
v[i] = v[i - 1] + v[i];
}
Monster_inventory_end(B);
Note that the name reserved_len
is to avoid confusion with
_vec_len
read operation. It also indicates that it is not the final
size since it may change with truncate/extend
.
A vector can also contain structs. Let us extend the Monster example with a vector of positions, so we can have a breadcrumb trail:
Monster_breadcrumbs_start(B);
Vec3_vec_push_create(B, 1, 2, 3);
Vec3_vec_push_create(B, 3, 4, 5);
Monster_breadcrumbs_end(B);
or
Monster_breadcrumbs_start(B);
Monster_breadcrumbs_push_create(B, 1, 2, 3);
Monster_breadcrumbs_push_create(B, 3, 4, 5);
Monster_breadcrumbs_end(B);
or
Vec3_t *trails[2];
Monster_breadcrumbs_start(B);
trails = Monster_breadcrumbs_extend(B, 2);
Vec3_create(&trails[0], 1, 2, 3);
Vec3_create(&trails[1], 4, 5, 6);
Monster_breadcrumbs_end(B);
The vec_start/exttend/end/end_pe/create/create_pe/clone/slice
are
translated into similar calls prefixed with the field name instead of
vector
and except for start
, the calls also add the vector to the
table if successful, for example:
uint8_t data[] = { 1, 2, 3 };
Monster_inventory_create(B, data, 3);
Monster_breadcrumbs_slice(B, some_other_breadcrumbs, 0, 10);
Vector operations that are allowed between vec_start
and
vec_end(_pe)
are also mapped. These are
vec_extend/append/truncate/edit/reserved_len
, and push/push_create/push_copy
.
push_copy
ensures only valid fields are copied, not zero padding (or
the unofficial deprecated fields).
A struct push_clone
is the same as a push_copy
operation
because structs are stored inline in vectors - with the
exception of union vectors which have push_clone
that does the
right thing.
The add
call adds a vector created independently from the table field,
and this is what is going on under the surface in the other calls:
Vec3_t x;
Vec3_vec_ref_t inv;
/* Clear any padding in `x` because it is not allocated by builder. */
Vec3_assign(Vec3_clear(&x), 3, 4, 5);
Vec3_vec_start(B);
Vec3_vec_push_create(B, 1, 2, 3);
Vec3_vec_push(B, &v);
inv = Vec3_vec_end(B);
Monster_breadcrumbs_add(B, inv);
As always, a reference such as inv
may only be used at most once, and
should be used once to avoid garbage.
Note that Vec3_vec_start
would create an independent struct instead of a
vector of structs. Also note that vec_ref_t
is a builder specific
temporary type while vec_t
is intended as a const pointer to the first
element in an existing buffer in little endian encoding with a size
prefix (to be used with clone, for example).
An existing Vec3 struct can also be pushed with Vec3_push(B, &v)
. The
argument must be zero padded. Because vectors are converted at the end,
there is no push_pe
, but a struct may be in little endian using push
on all platforms if vec_end_pe
is used at the end.
A vector may also be created from an existing array:
uint8_t data[] = { 1, 2, 3 };
Monster_inventory_add(B, flatbuffers_uint8_vec_create(B, data, 3));
This also applies to arrays of structs as long as they are properly zero
padded. create_pe
is similar but does not do any endian conversion,
and is similar to clone
except there are no header prefix.
Likewise an existing vector with proper zero padding may be appended
using the extend
operation. The format must be native or little endian
depending on whether vec_end
or vec_end_pe
is called at the end.
All vectors are converted to little endian when the end
command is
called. end_pe
prevents this from happening.
clone
and slice
and can be used to copy an entire, or a partial
array from an existing buffer. The pointer must be to the first vector
element in little endian format, and it must have a size prefix and be
aligned (like any flatbuffer vector). slice
takes a base-0 index and
a vector length where the result is truncated if the source is not
large enough.
Monster_inventory_clone(B, v);
or
Monster_inventory_add(flatbuffers_int8_clone(B, v);
or
Monster_inventory_add(flatbuffers_int8_slice(B, v, 2, 4);
or
Monster_inventory_slice(B, v, 2, 4);
A vector of strings an be constructed as (friends
is a string
vector field that we just invented for the occasion):
flatbuffers_string_ref_t friend, *p;
Monster_friends_start(B);
friend = flatbuffer_string_create_str(B, "Peter Pan");
Monster_friends_push_create_str(B, "Shrek");
Monster_friends_push_create_str(B, "Pinnochio");
Monster_friends_push_create_str(B, "Pinnochio");
Monster_friends_push_create(B, "Hector", 6);
Monster_friends_push(friend);
p = Monster_friends_extend(B, 1);
*p = flatbuffers_string_create_str("Cindarella");
Monster_friends_push_start(B);
flatbuffers_string_append("The Little");
flatbuffers_string_append("Mermaid");
Monster_friends_push_end(B);
Monster_friends_end(B);
Vectors and strings have a second argument to start, see also the spawn
example
below.
Finally, vectors can contain tables. Table vectors are offset
vectors just like string vectors. push_start
pushes a new table and
allows for updates until push_end
. If we have a spawn vector of monsters in
the Monster table, we can populate it like this:
Monster_spawn_start(B);
Monster_vec_push_start(B);
Monster_Hp_add(B, 27);
Monster_vec_push_end(B);
Monster_vec_push_create(B,
/* Approximate argument list for illustration only. */
&vec, 150, 80, name, inventory, Color_Red, Any_as_None());
Monster_spawn_end(B);
The push operation has constructors push_start/end/create
for both tables
struct, and string elements. String elements also have
push_create_str/create_strn/clone/slice
. Structs also have
push_copy
. Between push_start
and
push_end
the operations valid for the given table or string element can be
used (typically add
for tables, and append
for strings).
Instead of Monster_vec_push_start
we can also uses
Monster_spawn_push_start
etc. - in this case the child type is the
same as the parent, but using the field specific push_start
ensures we
get the right table element type.
Monster_spawn_push_start(B)
takes no length argument because it is a
table element, while Monster_friends_push_start(B)
because it is a
string element (similar to a vector).
Monster_spawn_start(B)
should just be followed by push operations
rather than following up with Monster_spawn_extend(B, n)
because we
risk loose references that can lead to crashes. But handled carefully
it is possible:
Monster_vec_ref_t mvec;
Monster_spawn_start(B);
mvec = Monster_spawn_extend(B, 2);
mvec[0] = Monster_create(B, ...);
mvec[1] = Monster_create(B, ...);
Monster_spawn_end(B);
We can also push a reference to an independently create monster table, all as seen before with strings.
As of flatcc version 0.5.2 it is also possible to clone tables.
Therefore we also have push_clone
on vectors of tables.
While the use of extend
and truncate
is possible with vectors of
strings and tables, they should be used with care because the elements
are references and will just end up as garbage if truncated. On the
other hand, unused elements should be truncated as 0 elements in an
offset vector is not valid.
A vector of tables or strings can be created using an externally built array of references, for example:
Monster_ref_t monsters[20];
Monster_vec_ref_t mvec;
monsters[0] = Monster_create(B, ...);
...
mvec = Monster_vec_create(B, monsters, 20);
By convention, create calls bypass the internal stack when the endian
format is otherwise compatible, and thus feed the emitter directly.
This is not possible with table and string vectors because the
references in the source vectors must be translated into offsets.
Therefore these create calls are similar to start, append, end calls.
There is an internal, but unexposed flatcc_builder
version
create_offset_vector_direct
which destroys the source vector instead
of allocating a stack copy.
Unlike the C++ Flatbuffers library, we do not expose a separate union type field except via a small struct with a union of typed references and a type field. This struct is given to the create argument, and above it is zero initialized meaning default None.
Unions can be created with value specific start/end/create
calls. The add
call is not specialized since it takes a union reference:
Monster_test_Weapon_start(B);
Weapon_rounds_add(B, 50);
Monster_test_Weapon_end(B);
or
Monster_test_Weapon_create(B, 50);
or
Monster_test_Weapon_add(B, Weapon_create(B, 50));
or
Monster_test_Pickup_start(B);
Pickup_location_create(B, 0, 0, 17);
Pickup_hint_create_str(B, "Jump High!");
Monster_test_Pickup_end(B);
or
Pickup_ref_t test;
Pickup_start(B);
Pickup_location_create(B, 0, 0, 17);
test = Pickup_end(B);
Monster_test_add(B, Any_as_Pickup(test));
or
Any_union_ref_t test;
Pickup_start(B);
Pickup_location_create(B, 0, 0, 17);
/* test.Pickup = Pickup_end(B); no longer possible as of v0.5.0 */
test.value = Pickup_end(B); /* As of v0.5.1. */
test.type = Any_Pickup;
Monster_test_add(B, test);
The following is valid and will not return an error, but also has no effect:
Monster_test_add(B, Any_as_NONE());
Note: the union structure has been changed for v0.5.0, and v0.5.1. Both unions and union vectors are now represented by a struct with the fields { type, value } in the low level interfaces. Before 0.5.0 only unions of tables were supported.
The monster_test.fbs
schema has a field named manyany in the Monster
table. It is vector of unions of type Any.
We can create a vector using
Any_union_vec_ref_t anyvec_ref;
Any_vec_start(B);
Any_vec_push(TestSimpleTableWithEnum_create(B));
anyvec_ref = Any_vec_end(B);
Monster_manyany_add(anyvec_ref);
A union can be constructed with type specific _push
or _push_create
operations:
Monster_manyany_start(B);
Monster_manyany_push(B, Any_as_TestSimpleTableWithEnum(ref));
Monster_manyany_end(B);
Monster_manyany_start(B);
Monster_manyany_TestSimpleTableWithEnum_push(B, ref);
Monster_manyany_end(B);
Monster_manyany_start(B);
Monster_manyany_TestSimpleTableWithEnum_push_create(B, args);
Monster_manyany_end(B);
and other similar operations, much like other vectors.
Note that internally anyvec_ref
is really two references, one to type
vector and one to a table vector. The vector is constructed a single
vector of unions and later split into two before final storage. If it is
necessary to create a union vector from a vector of tables and types,
the low level builder interface has a direct
call to do this.
Union vectos generally use more temporary stack space because during construction because each element as a struct of type and reference which don't back as densely as a two separate tables. In addition the separated type and table vectors must be constructed temporarily. The finaly buffer result is resonably compatct since the type vector does not use much space. Unions will also be somewhat slower to construct, but not unreasonably so.
Note: as of v0.5.0 unions can also contain strings and structs in addition to tables. Support for these types in other languages may vary, but C++ does support them too.
All union values are stored by reference. Structs that are not unions
are stored inline in tables and cannot be shared but unions of struct
type are stored by reference and can be shared. A union value is
therefore always a reference. This is mostly transparent because the
generated table field methods has create/start/end
calls for each union
value type and addition to add
.
To illustrate the use of these variation we use the Movie table from monster_test.fbs:
namespace Fantasy;
table Attacker {
sword_attack_damage: int;
}
struct Rapunzel {
hair_length: uint16;
}
struct BookReader {
books_read: int;
}
union Character {
MuLan: Attacker = 2, // Can have name be different from type.
Rapunzel = 8, // Or just both the same, as before.
Belle: Fantasy.BookReader,
BookFan: BookReader,
Other: string,
Unused: string = 255
}
table Movie {
main_character: Character;
antagonist: Character;
side_kick: Character;
cameo: Character;
characters: [Character];
}
and the mixed type test case from monster_test.c:
nsf(Character_union_ref_t) ut;
nsf(Rapunzel_ref_t) cameo_ref;
nsf(Attacker_ref_t) attacker_ref;
nsf(BookReader_ref_t) br_ref;
nsf(BookReader_t *) pbr;
nsf(Movie_table_t) mov;
nsf(Movie_start_as_root(B));
br_ref = nsf(BookReader_create(B, 10));
cameo_ref = nsf(Rapunzel_create(B, 22));
ut = nsf(Character_as_Rapunzel(cameo_ref));
nsf(Movie_main_character_Rapunzel_create(B, 19));
nsf(Movie_cameo_Rapunzel_add(B, cameo_ref));
attacker_ref = nsf(Attacker_create(B, 42));
nsf(Movie_antagonist_MuLan_add(B, attacker_ref));
nsf(Movie_side_kick_Other_create_str(B, "Nemo"));
nsf(Movie_characters_start(B));
nsf(Movie_characters_push(B, ut));
nsf(Movie_characters_MuLan_push(B, attacker_ref));
nsf(Movie_characters_MuLan_push_create(B, 1));
nsf(Character_vec_push(B, nsf(Character_as_Other(nsc(string_create_str(B, "other"))))));
nsf(Movie_characters_Belle_push(B, br_ref));
pbr = nsf(Movie_characters_Belle_push_start(B));
pbr->books_read = 3;
nsf(Movie_characters_Belle_push_end(B));
nsf(Movie_characters_Belle_push(B, nsf(BookReader_create(B, 1))));
nsf(Movie_characters_Belle_push_create(B, 2));
nsf(Movie_characters_Other_push(B, nsc(string_create_str(B, "another"))));
nsf(Movie_characters_Other_push_create_str(B, "yet another"));
nsf(Movie_characters_end(B));
nsf(Movie_end_as_root(B));
Note that reading a union of string type requires a cast which can be seen in the full test case in monster_test.c.
The API generally expects all error codes to be checked but the following table and vector operations will accept and return an error:
add
null reference to table, vector, or string.push
null reference to table or string.buffer_end/create
null reference to root.
This can simplify pushing or adding atomically created objects, for example by adding a cloned vector to table field.
It is especially important to check start operations because the builder will not be in the expected stack frame context after failure and will not have reserved necessary internal memory, for example when adding a table field.
On a server with reasonable amount of memory using the default allocator, and with an emitter that will not return errors, and when it can be expected that inputs will not exceed the size contraints of the flatbuffer data types, and if the api is being used correctly, then there are no reason for failure and error handling may be skipped. However, it is sometimes desireable for servers to restrict a single clients memory usage, and then errors are very likely unless the source data is already limited. As an opposite example, an embedded device sending small network packages using a fixed but large enough allocation pool, would be in total control and need not be concerned with any errors.
The generated methods for building buffers may look the same but
have different semantics. For example _clone
on a table field
such as Monster_enemy_clone
will actually create a table based
on the content of a table in a another buffer, then add that
table to the currently open table. But Monster_clone
will
create clone and just return a reference without adding the
reference to any table. There is also push_clone
which adds
an element to an open vector. The same applies to many other
operations.
Basically there are the following different types of methods:
- Methods on native flatbuffer types, such as
flatbuffer_string_start
. - Methods on generated types types such as
Monster_start
- Methods on field members such as as
Monster_emeny_start
- Methods on vectors on vectors of the above such as
flatbuffers_string_vec_start
,Monster_vec_start
.Monster_inventory_vec_start
. - Slight adaptions for buffer roots and nested buffer roots.
For unions and union vectors the story is more complex - and the
api might need to be cleaned up further, but generally there are
both union type fields, union value fields, and union fields
representing both, and vectors of the same. In additional there
are pseudo fields for each union member because create
on a
union does not make sense, but
Monster_myvariant_MyTable_create
does create and MyTable
table and assigns it with the correct type to the field
Monster_myvariant_type
and `Monster_myvariant.
As of flatcc v0.5.2 it is also possible to clone tables, unions,
vectors of tables, vectors of strings, and vectors of unions.
Previously many operations did have a clone or a push_clone
operator, but these were all raw byte copies. Table cloning and
union cloning is signficantly more involved as it a simple copy
will not work due to stored references, possible sharing of
references and because the required alignment of table is hard
to reason about without building a new table. Unions and union
vectors are even more difficult.
That said, cloning is now implemented for all relevant data types.
All clone operations expect the content to originate from another finalized buffer. For scalars and structs there are copy operations that are almost the same as clone - they both avoid endian conversion.
Structs have a special case with clone and copy: Whenever a
struct is stored inline in the desitination buffer, it behaves
like copy. Whenever the destination is a buffer root, or a union
member, the result is a reference to an independent memory
block. When calling clone on a struct type the destination is
unknown and a indendpendent reference is created. If this is not
the intention a copy
operation can be used. When used field
methods the destination type is known at the right thing will
happen.
Cloning a table will, by default, expand any shared references in the source into separate copies. This is also true when cloning string vectors, or any other data that holds references. Worst case this can blow up memory (which is also true when printing JSON from a buffer).
It is possible to preserve the exact DAG structure when cloning. It may not worthwhile for simple use cases but it goes as follows:
The builder has a pointer to a flatcc_refmap_t
object. This is
a fairly small stack allocated object that implements a
hashtable. By default this pointer is null, and we have the
above mentioned expansion. If it is not null, each newly cloned
object will have its reference stored in the refmap. The next
time the same object is cloned, the existing reference will be
taken from the refmap instead. See source comments in
flatcc_refmap.h
and flatcc_builder.h
, and monster_test.c
clone tests.
Note that, for example, it might be relevant to preserve DAG
structure when cloning one object with all its sub-objects, but
if it is cloned a second time, a new copy is desired still while
preseving the inner DAG structure. This can be done by working
with multiple refmaps and simple swapping them out via
flatcc_builder_set_refmap
. It is also possible to add
references manually to a refmap before cloning.
Warning: the refmap MUST not hold any foreign references when starting a nested root clone or when cloning inside a nested buffer that has been started but not ended because it is invalid to share references between buffers and there are no safety checks for this.
Picking is a method that is related to clone and also introduced with flatcc 0.5.2. A pick method is only defined on a table field or a struct field. Instead of taking an a read reference of same type as the field, it takes a reference to to the same container type (table or struct). Essentially pick means: find myself in the other table, clone me, and and me to the new table which is currently open. So clone takes an entire table where pick takes a single field. Table cloning is implemented as a sequence of pick method, one for each field as can be seen in the generated builder source. A pick operation does nothting if the field is not set. Pick also works with refmaps because it does an internal clone operation. In the generated code, only clone on types will use the refmap but other clone and pick operations do depend on these type clone methods.
Vectors can be sorted, but not by the primary builder interface because:
String and table elements cannot be accessed after they have been emitted. The emitter can do all sorts of async operations other than actually building a buffer, for example encrypting blocks and / or send partial buffers over the network. Scalars could be sorted, but the most efficient way of emitting vectors does not create a temporary vector but emits the source directly when endianess allows for it. Less significant, the buffer producer is likely busy processing content and / or on a resource constrained device. Altogether, it is much simpler to not support sorting at this interface level.
To understand how sorting is implemented, lets first look at how an already sorted vector can be searched:
Every vector of string, scalar and enum element types have a find
operation in the reader interface that performs a binary seach. Every
vector of table and struct elements have a find_by_<field_name>
iff
there is a key attribute on at least one top-level scalar, enum or
string field type. FlatBuffers do not officially allow for multiple key
attributes, but if enabled, there will by a find_by
operation for
every keyed element field. In addition there is a find
operation that
maps to the first keyed field.
The read interface returns a vector type, which is a const pointer, when accessing a table field of vector type. The find operation takes such a vector as first argument, and a key as second. Strings have variations to allow for keys with a given length (similar to strcmp vs strncmp).
This leads us to the sort interface:
Every find
and find_by
operation has a matching sort
and sort_by
operation table and struct vectors maps sort
to the first keyed
sort_by
operation. The sort operation takes a non-const vector which
has the type name suffix _mutable_vec_t
. These
vectors are not available via the reader interface and must be cast
explicitly from _vec_t
to _mutable_vec_t
. When this is done, the
vector can be sorted in-place in the buffer without any memory
allocation and without any recursion.
If the namespace is
flatbuffers
, a string vector is sorted by:
flatbuffers_string_vec_t vec;
vec = ...;
`flatbuffers_string_vec_sort((flatbuffers_string_mutable_vec_t)vec)`
Scalar and enum vectors have similar inline sort operations, for example:
flatbuffers_uint8_vec_sort(flatbuffer_uint8_mutable_vec_t vec);
For vectors of tables or structs the sort function is named by the key
field. Assuming the Monster table has a key attribute on the Hp
field,
the following sort operation is available:
MyGame_Example_Monster_vec_t monsters;
monsters = ...;
MyGame_Example_Monster_vec_sort_by_Hp(
(MyGame_Example_Monster_mutable_vec_t)monsters);
Note: this is the reader interface. Any kind of ref_t
type used by the
builder do not apply here. (Advanced: if an emitter builds a buffer, the
ref type can be used to find the actual vector pointer and then it can
be sorted by casting the pointer to a vector, even if the buffer isn't
finished).
Multiple keys per table or struct is an optional feature. Each key will have its own sort and find function similar to the above. The first key also has the shortcut:
MyGame_Example_Monster_vec_sort(m);
The current implementation uses heap sort which is nearly as fast as quicksort and has a compact implementation that does not require recursion or external memory and is robust against DOS attacks by having worst case O(n log n). It is, however, not a stable sort. The sort assumes struct have a reasonable size so swap operations can be done efficiently. For large structs a decicated sort operation building an external index vector would be better, but this is not supported.
Note that a DAG is valid so there can be multiple vectors referring to the same table elements, and each can be sorted by a different key.
The find operations are stable meaning they always return the lowest
index of any matching key or flatbuffers_not_found
which is larger
than any other index.
If a buffer was received over, say, an untrusted network the buffer should be verified before being accessed. But verification only makes it safe to read a buffer, not to modify a buffer because for example two vectors can be crafted to overlap each other without breaking any verification rules.
Thus, sorting is intended to be done shortly after the buffer is constructed while it can still be trusted.
Using find on a buffer that is supposed to be sorted, but isn't, can yield unexpected search results, but the result will always be a one element in the vector being searched, not a buffer overrun.
Some vectors can be sorted by different keys depending on which version
version of _sort_by
is being used. Obviously _find_by
must match the
sorted key.
If we need to search for a key that is not sorted, or if we simply do
not want to sort the vector, it is possible to use scanning operations
instead by using _scan
or _scan_by
. Scanning is similar to find
except that it does a linear search and it supports scanning from a
given position.
More information on scanning in the README file, and in the monster_test.c test file.
A resource constrained microcontroller is building flatbuffers from sensor data using an emitter that sends UDP packages of the flatbuffer as soon as enough data is ready. A server reassembles the packages or discards them if any UDP package was lost. One the package is assembled, the server sorts specific vectors such as temparture levels in the buffer before it sends the buffer upstream to a storage service through a TCP/IP connection. The analyzers perform taks such as detecting abnormal temparature readings based on the sorted vector data.
In the above example, the original sensor devices are not interested in
the reader interface nor the sort interface. While the sort and find
operations may be available, it is dead inline code that does not
inflate the binary codes image size, but the libflatccrt library is
linked in. The collecting server is not interested in the builder
interface and does not link with the libflatccrt
library but uses
both the inline functions of the reader intrface and the sort interface.
The upstream data storage service uses no interface at all since it
treats the buffers as binary blobs in a database indexed by device and
time. The end users only use the read only interface to visualize and
analyze and has no need for the builder or the sort interface.
An emitter only need to implement one function to replace or wrap the
default emitter. See flatcc_builder.h on flatcc_builder_emit_fun
for
details, and also emit_test.c
for a very simple custom emitter that
just prints debug messages, and flatcc_emitter.h.
When adding padding flatcc_builder_padding_base
is used as base in iov
entries and an emitter may detect this pointer and assume the entire
content is just nulls. Usually padding is of limited size by its very
nature so the benefit of handling this is also limited, but it, or a
similar user provided constants can be used for similar purposes:
When creating a vector in a single operation from an external C-array,
no copying takes place on the internal builder stack. Therefore it is
valid to provide a null pointer or a valid array such as
flatcc_builder_padding_base
that is is too small for the given length,
provided that the emitter is aware of it. This in turn can be used to
allocate space in the emitters internal datastructure so the vector can
be filled after the fact if so desired. Pointer tagging may be another
way to communicate special intent. Be aware that only create
calls
support this - any append
, start/end
or other dynamic operation will
require valid inpout and will stack allocate temporary space.
Emitters always receive a small table of iov entries that together form
a single object including necessary headers and padding, for example a
vector, a string, a nested buffer header, or a vtable. This is
guaranteed by the api, but there is no coordination to provide details
about which call is in order to keep the interface simple and fast. If
this is desired the user must hint the emitter out of band before
calling the relevant build operation. This can also be one indirectly by
setting user_state
in the emitter and have the emitter inspect this
setting.
When adding vectors piecemeal using append
or similar as opposed to
zero or less than zero copy approach above, the memory cost is obviously
higher, but unless the individual objects grow large, the stack will
operate in hot cpu cache so the bandwidth from main memory to cpu and
back will not necessarily double. If the stack grows large it may also
be worthwhile trimming the stack with a custom allocator and custom
builder reset between buffers to reduce stack size and initialization
overhead.