protobuf

This is a pure Nim implementation of protobuf, meaning that it doesn't rely on the protoc compiler. The entire implementation is based on a macro that takes in either a string or a file containing the proto3 format as specified at https://developers.google.com/protocol-buffers/docs/proto3. It then produces procedures to read, write, and calculate the length of a message, along with types to hold the data in your Nim program. In contrast to some protobuf implementations, the generated data types are intended to be as close as possible to what you would normally use in Nim, making them feel very natural to use in your program. Proto3, however, makes all fields optional, which means the generated types have a little bit of special sauce going on behind the scenes; this is explained in a later section. The entire read/write structure is built on top of the Stream interface from the streams module, meaning it can be used directly with anything that uses streams.

Example

To whet your appetite, the following example shows how this protobuf macro can be used to generate the required code and to read and write protobuf messages. The example can also be found in the examples folder. Note that it is also possible to read the protobuf specification from a file.

import protobuf, streams

# Define our protobuf specification and generate Nim code to use it
const protoSpec = """
syntax = "proto3";

message ExampleMessage {
  int32 number = 1;
  string text = 2;
  SubMessage nested = 3;
  message SubMessage {
    int32 a_field = 1;
  }
}
"""
parseProto(protoSpec)

# Create our message
var msg = new ExampleMessage
msg.number = 10
msg.text = "Hello world"
msg.nested = initExampleMessage_SubMessage(aField = 100)

# Write it to a stream
var stream = newStringStream()
stream.write msg

# Read the message from the stream and output the data, if it's all present
stream.setPosition(0)
var readMsg = stream.readExampleMessage()
if readMsg.has(number, text, nested) and readMsg.nested.has(aField):
  echo readMsg.number
  echo readMsg.text
  echo readMsg.nested.aField

Generated code

Since all the code is generated by the macro at compile time and not stored anywhere, the generated code is made to be deterministic and easy to understand. If you would like to see the code, however, you can pass the -d:echoProtobuf switch at compile time and the macro will output the generated code.

Optional fields

As mentioned earlier, protobuf 3 makes all fields optional, meaning that each field can either exist or not exist in a message. In many other protobuf implementations you notice this by having to use special getter or setter procs for field access. In Nim, however, we have strong meta-programming powers which can hide much of this complexity. As can be seen in the above example it looks just like normal Nim code except for one thing: the call to has. Whenever a field is set to something, its presence is registered in the object. When you then access the field, Nim first checks whether it is present, throwing a ValueError at runtime if it isn't set. If you want to remove a value already set in an object you simply call reset with the name of the field, as seen in example 3. To check whether a value exists you can call has on it, as seen in the above example; since it's a varargs call you can check all the fields you require in a single call.

In the below sections we will have a look at what the protobuf macro outputs. Since the actual field names are hidden behind this abstraction, the following sections show what the objects "feel" like they are defined as. Note also that since the fields don't actually have these names, a regular object initialiser won't work; instead you have to use the generated "init" procs, as seen in the above example.
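
Before looking at those definitions, here is a minimal, self-contained sketch of the presence machinery described above. It reuses the shape of the first example; the exact form of the reset call isn't shown in this document, so the call below assumes it mirrors has and takes the field name directly (see example 3 in the repository for the canonical usage).

import protobuf, streams

const presenceSpec = """
syntax = "proto3";
message ExampleMessage {
  int32 number = 1;
  string text = 2;
}
"""
parseProto(presenceSpec)

var msg = new ExampleMessage
msg.number = 10                # setting a field registers its presence

echo msg.has(number)           # true, number was just set
echo msg.has(number, text)     # false, text has never been set

msg.reset(number)              # assumed call form: clear the field again
try:
  echo msg.number              # accessing an unset field raises ValueError
except ValueError:
  echo "number is not set"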

Messages

The types generated are named after the path of the message, but with dots replaced by underscores. So if the protobuf specification contains a package name it starts with that, then the name of the message. If the message is nested then the parent message is put between the package and the message. As an example we can look at a protobuf message defined like this:

syntax = "proto3"; // The only syntax supported
package our.package;
message ExampleMessage {
    int32 simpleField = 1;
}

The type generated for this message would be named our_package_ExampleMessage. Since Nim is case and underscore insensitive you can of course write this with any style you desire, be it camel-case, snake-case, or a mix as seen above. For this specific instance the type would appear to be:

type
  our_package_ExampleMessage = ref object
    simpleField: int32

Messages also generate a reader, writer, and length procedure to read, write, and get the length of a message on the wire, respectively.

All write procs are simply named write and are only differentiated by their types. The write procedure takes two arguments plus an optional third: the Stream to write to, an instance of the message type to write, and a boolean telling it whether to prepend the message with a varint of its length. This boolean is used for internal purposes, but might also come in handy if you want to stream multiple messages as described in https://developers.google.com/protocol-buffers/docs/techniques#streaming.

The read procedure is named similarly to the readers in the streams module, simply "read" followed by the name of the type. So for the above message the reader would be named read_our_package_ExampleMessage. Notice again how you can write it in different styles in Nim if you'd like, and one could of course create an alias for this name should it prove too verbose. Analogously to the write procedure, the reader takes an optional maxSize argument giving the maximum number of bytes to read for the message before returning. If the size is set to 0 the stream is read until atEnd returns true.

The len procedure is slightly simpler: it takes an instance of the message type and returns the size this message would take on the wire, in bytes. This is used internally, but might have other applications elsewhere as well. Notice that this size can vary from one instance of the type to another, as varints can have multiple sizes, repeated fields can have different numbers of elements, and oneofs can hold different choices, to name a few.
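
A small sketch tying these three procedures together, assuming the message defined above and that the init proc follows the same naming pattern as in the first example:

import protobuf, streams

const lengthSpec = """
syntax = "proto3";
package our.package;
message ExampleMessage {
  int32 simpleField = 1;
}
"""
parseProto(lengthSpec)

# The init proc name is assumed to follow the pattern from the first example
var msg = initOur_package_ExampleMessage(simpleField = 42)
echo len(msg)                  # encoded size in bytes, varies with the field values

var stream = newStringStream()
stream.write(msg)              # pass true as the extra argument to prepend a varint length

stream.setPosition(0)
# The reader takes an optional maxSize; as described above, 0 means read until atEnd
var readBack = stream.readOur_package_ExampleMessage()
if readBack.has(simpleField):
  echo readBack.simpleField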

Enums

Enums are named the same way as messages, and are always declared as pure. So an enum defined like this:

syntax = "proto3"; // The only syntax supported
package our.package;
enum Langs {
  UNIVERSAL = 0;
  NIM = 1;
  C = 2;
}

Would end up with a type like this:

type
  our_package_Langs {.pure.} = enum
    UNIVERSAL = 0, NIM = 1, C = 2

For internal use, enums also generate a reader and a writer procedure. These are basically wrappers around the varint reader and writer that convert to and from the enum type, so using them by themselves is seldom useful.
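
The generated enum type itself is used like any ordinary pure Nim enum, which means its members are accessed through the type name. A minimal sketch, assuming the Langs definition above:

import protobuf, streams

const enumSpec = """
syntax = "proto3";
package our.package;
enum Langs {
  UNIVERSAL = 0;
  NIM = 1;
  C = 2;
}
"""
parseProto(enumSpec)

# Members of a pure enum are qualified with the type name
var lang = our_package_Langs.NIM
echo lang                      # NIM
echo ord(lang)                 # 1, the value from the specification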

OneOfs

In order for oneofs to work with Nim's type system they generate their own type; this might change in the future. Oneofs are named the same way as their parent message, but with the name of the oneof field and _OneOf appended. All oneofs contain a field named option, a ranged integer from 0 to one less than the number of options, which is used as the discriminator of an object variant with a branch for each of the fields in the oneof. So a oneof defined like this:

syntax = "proto3"; // The only syntax supported
package our.package;
message ExampleMessage {
  oneof choice {
    int32 firstField = 1;
    string secondField = 2;
  }
}

Will generate the following message and oneof type:

type
  our_package_ExampleMessage_choice_OneOf = object
    case option: range[0 .. 1]
    of 0: firstField: int32
    of 1: secondField: string
  our_package_ExampleMessage = ref object
    choice: our_package_ExampleMessage_choice_OneOf
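
Since the oneof value is an object variant, reading it typically means branching on the option discriminator. The sketch below assumes the accessors mirror the apparent definition above (like message fields, the real fields are hidden behind the macro's accessors) and doesn't cover how the oneof is populated in the first place:

import protobuf, streams

const oneofSpec = """
syntax = "proto3";
package our.package;
message ExampleMessage {
  oneof choice {
    int32 firstField = 1;
    string secondField = 2;
  }
}
"""
parseProto(oneofSpec)

proc describe(msg: our_package_ExampleMessage) =
  # option 0 corresponds to firstField, 1 to secondField
  if msg.choice.option == 0:
    echo "firstField: ", msg.choice.firstField
  else:
    echo "secondField: ", msg.choice.secondField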

Exporting message definitions

If you want to re-use the same message definitions in multiple places in your code, it's a good idea to create a module for your definitions. This can also be useful if you want to rename some of the fields protobuf declares, hide particular messages, or add extra functionality. Since protobuf uses a little bit of magic under the hood, a special exportMessage macro exists that creates the export statements needed to export a message definition from the module that reads the protobuf specification to any module that imports it. Note however that it doesn't export sub-messages or any dependent types, so be sure to export those manually. Anything that's not a message (such as an enum) should be exported by the normal export statement.
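
A sketch of what such a module could look like; the file name is hypothetical, and the exact invocation form of exportMessage (here called with the generated type name) is an assumption based on the description above:

# messages.nim -- hypothetical module wrapping a protobuf specification
import protobuf, streams

const spec = """
syntax = "proto3";
package our.package;
message ExampleMessage {
  int32 simpleField = 1;
  SubMessage nested = 2;
  message SubMessage {
    int32 a_field = 1;
  }
}
enum Langs {
  UNIVERSAL = 0;
  NIM = 1;
}
"""
parseProto(spec)

# exportMessage does not export sub-messages or dependent types,
# so the nested message is exported explicitly as well
exportMessage(our_package_ExampleMessage)
exportMessage(our_package_ExampleMessage_SubMessage)

# Anything that isn't a message, such as the enum, uses a normal export statement
export our_package_Langs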

Limitations

This library is still in an early phase and has some limitations compared to the official version of protobuf. Notably it only supports the "proto3" syntax, so there are no optional or required fields. It also doesn't currently support maps, but you can use the official workaround found here: https://developers.google.com/protocol-buffers/docs/proto3#maps. Map support is planned for the future. It also doesn't support options, meaning you can't set default values for enums and can't control packing options; that being said, it follows the proto3 specification and will pack all scalar fields. Finally, it doesn't support services.

These limitations apply to the parser as well, so if you are using an existing protobuf specification you must remove these constructs before this library can parse it.

If you find yourself in need of these features, I'd suggest heading over to https://github.com/oswjk/nimpb, which uses the official protoc compiler with an extension to parse the protobuf file.

Rationale

Some might be wondering why I've decided to create this library. After all, the protobuf compiler is extensible, and there are other attempts at using protobuf within Nim that build on it. The reason is three-fold.

First off, no-one likes to add an extra step to their compilation process. Running protoc before compiling isn't a big issue, but it's an extra compile-time dependency and it's more work. With a regular Nim macro this becomes just another part of the normal compilation; the only requirements are Nim and this library, meaning tools that use protobuf can be installed automatically through nimble. It also means that all of Nim's targets are supported, so sending data between code compiled to C and to JavaScript should be a breeze, sharing the exact same code for generating the messages. This is not yet tested, but any issues that arise should be easy enough to fix.

Secondly, the programmatic protobuf interfaces created for some languages are not the best. Python, for example, has some rather awkward and unnatural patterns in its protobuf library. By using a Nim macro the code can be tailored much better to Nim, with the potential for really native-feeling code and a very nice interface.

And finally, this has been an interesting project in terms of pushing the macro system to do something most languages would simply be incapable of. It's not only a showcase of how much work the Nim compiler is able to do for you through its meta-programming, but it has also been highly entertaining to work on.

This file is automatically generated from the documentation found in protobuf.nim. Use nim doc2 protobuf.nim to get the full documentation.
