A general overview of what is needed to add support for a new target language to the reference KS compiler.
Things to check out and familiarize oneself with:
- general .ksy structure, metadata
- sequences and instances
- primitive types
- repetitions
- user-defined types
- enums
- processing algorithms
Clone the central KS repo, learn how to build the compiler from source, install the prerequisites, then try to build the compiler and run the tests. Take a look at our CI.
The best way to get some inspiration on how it’s done is probably to take one or two of the existing target languages and experiment a little, seeing which KS concepts result in what generated code.
The core element that KS deals with is a "type". Basically, everything starts with a "type", i.e. a "file" is a "type". "Type" can include:
- a collection of fields parsed sequentially (seq)
- a number of fields parsed randomly or calculated (instances)
- definition of nested types (types)
- lists of integer constants (enums)
Most often, the concept of a "type" maps to a "class" or "object" in the target language, but it can also be a standalone data structure with a couple of generated methods to work with it, or possibly something completely different.
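As a rough illustration of the class-based mapping, here is a minimal hand-written Python sketch. The `Example` type, its field names, and the byte layout are all made up for this illustration and do not come from any real .ksy file; generated code for an actual target language will look different.

```python
import enum
import io
import struct


class Example:
    """Hypothetical hand-written counterpart of a type with all four parts."""

    class Kind(enum.Enum):          # enums: a list of integer constants
        TEXT = 1
        BINARY = 2

    class Header:                   # types: a nested type with its own seq
        def __init__(self, stream):
            self.length = struct.unpack("<H", stream.read(2))[0]

    def __init__(self, stream):
        self._stream = stream
        self._checksum = None
        # seq: fields parsed one after another, in declaration order
        self.kind = Example.Kind(stream.read(1)[0])
        self.header = Example.Header(stream)

    @property
    def checksum(self):             # instances: parsed at a given offset, on demand
        if self._checksum is None:
            saved = self._stream.tell()
            self._stream.seek(3)
            self._checksum = self._stream.read(1)[0]
            self._stream.seek(saved)
        return self._checksum


parsed = Example(io.BytesIO(bytes([2, 0x10, 0x00, 0xAB])))
```

The point of the sketch is only the correspondence: seq becomes eager reads in the constructor, instances become on-demand accessors, and nested types and enums become nested declarations.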
KS has to parse the data from somewhere. Decide which of the target language’s stream abstractions fits best (it’s probably good to support reading both from disk files, if they’re available, and from arbitrary byte buffers). One would probably need to implement a KaitaiStream-style wrapper for these stream(s), as they might lack some useful functions, like bit-level reading, read-until-the-terminator, etc.
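To make the idea of such a wrapper concrete, here is a small Python sketch built on the standard `io` module. The class name `MiniStream` and its method names are assumptions chosen for this example; they are not the API of any existing KS runtime library.

```python
import io


class MiniStream:
    """Hypothetical KaitaiStream-style wrapper; not the real runtime API."""

    def __init__(self, source):
        # Accept either raw bytes or any binary file object.
        if isinstance(source, (bytes, bytearray)):
            source = io.BytesIO(source)
        self._io = source
        self._bits = 0        # bits left over from the last partially used byte
        self._bits_left = 0

    def read_bytes(self, n):
        data = self._io.read(n)
        if len(data) < n:
            raise EOFError(f"requested {n} bytes, got {len(data)}")
        return data

    def read_u1(self):
        return self.read_bytes(1)[0]

    def read_bytes_term(self, term):
        """Read up to the terminator byte; consume it but do not return it."""
        out = bytearray()
        while True:
            b = self.read_u1()
            if b == term:
                return bytes(out)
            out.append(b)

    def read_bits_be(self, n):
        """Read n bits as an unsigned integer, most significant bit first."""
        while self._bits_left < n:
            self._bits = (self._bits << 8) | self.read_u1()
            self._bits_left += 8
        self._bits_left -= n
        value = self._bits >> self._bits_left
        self._bits &= (1 << self._bits_left) - 1
        return value


stream = MiniStream(b"abc\x00\xa5")
name = stream.read_bytes_term(0)
high = stream.read_bits_be(4)
low = stream.read_bits_be(4)
```

The same wrapper works for disk files by passing an object returned by `open(path, "rb")` instead of a byte buffer.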
KS deals with a few primitive and complex types:
- integer numbers
- floating point numbers
- byte arrays
- strings
- "generic" arrays of everything
Decide how to map all of these types onto the native types of the target language. Pay extra attention to:
- signed vs unsigned support
- whether the target language has any platform-dependent types
- encoding support for strings
- substitutions / workarounds for everything that the target language does not support
- nulls / undefined state of variables: these can be particularly useful for implementing lazy parsing of instances; if a language does not support such a state for variables, we’ll have to introduce extra "flag" variables to store that information
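The difference between the two approaches to lazy instances can be sketched as follows. This is a Python illustration with made-up class names; Python does have a null-like value, so the second class only imitates what a language without one would need.

```python
class LazyWithNone:
    """Null-like value available: use it as the 'not parsed yet' marker."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None

    @property
    def value(self):
        if self._value is None:
            self._value = self._compute()
        return self._value


class LazyWithFlag:
    """No usable null state: keep a separate boolean flag next to the value."""

    def __init__(self, compute):
        self._compute = compute
        self._value = 0            # placeholder, meaningless until the flag is set
        self._has_value = False

    @property
    def value(self):
        if not self._has_value:
            self._value = self._compute()
            self._has_value = True
        return self._value


calls = []


def parse_instance():
    # Stands in for an expensive seek-and-read; records each invocation.
    calls.append("parsed")
    return 42


lazy = LazyWithFlag(parse_instance)
first = lazy.value
second = lazy.value
```

Both accesses return the same value while `parse_instance` runs only once. The flag variant also stays correct when a legitimately parsed value happens to coincide with the null marker.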
Carefully check out the coding practices of the target language. If there is an official or semi-official standard or recommendation, try to follow it. Things to pay particular attention to:
- naming standards (e.g. UpperCamelCase, lowerCamelCase, under_score_case, something else) for various parts of generated code (names of source files, classes, methods, properties, etc)
- code formatting style and guidelines (indent size & practices)
- docstrings layout and general documentation standards
- private/protected/public access restriction traditions
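Since identifiers in a .ksy file are written in under_score_case, adapting to a target language's naming standard usually means converting them. A minimal Python sketch of such a conversion, with helper names invented for this example:

```python
def to_upper_camel(name):
    """Convert an under_score_case identifier to UpperCamelCase."""
    return "".join(part.capitalize() for part in name.split("_"))


def to_lower_camel(name):
    """Convert an under_score_case identifier to lowerCamelCase."""
    upper = to_upper_camel(name)
    return upper[:1].lower() + upper[1:]


class_name = to_upper_camel("hello_world")
method_name = to_lower_camel("num_items")
```

Which convention applies to which part of the generated code (files, classes, methods) depends on the target language's own guidelines.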
Staying close to our standard API is strongly recommended, unless there’s a good reason to do it differently. In particular, try to stick to the proposed method names; it will make life much easier.
Take the simplest test, hello_world.ksy, and translate it manually into source code in the new target language, as if it had been compiled by the KS compiler. Inspiration can be drawn from any other already supported language.
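As an illustration of what such a manual translation might look like, here is a Python sketch. It assumes that hello_world.ksy declares a single sequential field named `one` of type u1; check the actual file before relying on that. The class layout and the `from_bytes` / `from_file` helpers are likewise choices made for this example, not a statement of what the compiler emits.

```python
import io
import struct


class HelloWorld:
    """Hand-written stand-in for what a compiler could emit for hello_world.ksy."""

    def __init__(self, stream):
        self._io = stream
        self._read()

    def _read(self):
        # Assumed seq: a single field `one` of type u1 (unsigned 1-byte integer).
        (self.one,) = struct.unpack("B", self._io.read(1))

    @classmethod
    def from_bytes(cls, buf):
        return cls(io.BytesIO(buf))

    @classmethod
    def from_file(cls, path):
        with open(path, "rb") as f:
            return cls(io.BytesIO(f.read()))


parsed = HelloWorld.from_bytes(b"\x50\x41")
```

Writing this by hand first gives a concrete target to compare against once the compiler starts producing output for the new language.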
Choose a testing framework for the target language. Ideally, we should be able to run it with few extra dependencies installed in our Travis CI configuration. At the bare minimum, we’ll need a command-line test runner that outputs a report file we can aggregate in our CI later (JUnit XML reports are usually good candidates).
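If the chosen framework has no JUnit XML output of its own, a report in the commonly used testsuite/testcase shape can be produced with little effort. The Python sketch below is a rough illustration; the function name, the input format, and the test names are made up, and the exact set of attributes that the CI aggregation expects is not specified here.

```python
import xml.etree.ElementTree as ET


def junit_report(suite_name, results):
    """Build a JUnit-style XML string from (test_name, error_or_None) pairs."""
    failures = [r for r in results if r[1] is not None]
    suite = ET.Element("testsuite", name=suite_name,
                       tests=str(len(results)), failures=str(len(failures)))
    for test_name, error in results:
        case = ET.SubElement(suite, "testcase",
                             classname=suite_name, name=test_name)
        if error is not None:
            # A failed test case carries a <failure> child element.
            ET.SubElement(case, "failure", message=error)
    return ET.tostring(suite, encoding="unicode")


report = junit_report("spec", [
    ("hello_world", None),
    ("some_other_test", "expected 1, got 2"),
])
```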
Create the testing infrastructure in the tests repo; usually it boils down to:
- spec/$LANG - a project or just a bunch of files with test specs; it’s heavily recommended to do 1 test .ksy format = 1 test case = 1 file, if possible
- run-$LANG - a single script to launch the test runner with all the tests
Port the spec for the HelloWorld test to the new target language and make it work with the manually compiled hello_world.ksy.
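A ported spec can be very small. The following Python sketch uses the standard `unittest` module; the inline `HelloWorld` class repeats a hand-written parser only to keep the example self-contained, and the input bytes and expected value are made up (a real spec would read a binary sample from the tests repo).

```python
import io
import unittest


class HelloWorld:
    """Minimal hand-written parser, assuming a single u1 field named `one`."""

    def __init__(self, stream):
        self.one = stream.read(1)[0]

    @classmethod
    def from_bytes(cls, buf):
        return cls(io.BytesIO(buf))


class TestHelloWorld(unittest.TestCase):
    def test_hello_world(self):
        # Made-up input; a real spec would load a sample file instead.
        r = HelloWorld.from_bytes(b"\x50\x41\x43\x4b")
        self.assertEqual(r.one, 0x50)
```

Saved as a file of its own, this follows the 1 test .ksy format = 1 test case = 1 file layout and can be run with `python -m unittest`.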
As explained in [developers_intro.adoc], when developing support for a new language you can either inherit from ClassCompiler (which assumes you are building an object-oriented class structure) or AbstractCompiler (where nearly all implementation details are left up to you).
When implementing your new language you will need to create at least two new files:
- kaitai_struct_compiler\shared\src\main\scala\io\kaitai\struct\{YourLang}ClassCompiler.scala
- kaitai_struct_compiler\shared\src\main\scala\io\kaitai\struct\languages\{YourLang}Compiler.scala
{YourLang}ClassCompiler.scala dictates the high-level function calls that print your code with the CompileClass method.
{YourLang}Compiler.scala contains functions that generate the header and function definition files - these functions are specific to your language.
Note that {YourLang}ClassCompiler.scala typically calls methods that are defined in ConstructClassCompiler.scala. The methods defined in ConstructClassCompiler.scala then reference lang.method(), and these methods are defined as override methods in {YourLang}Compiler.scala.
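The division of labour described above, where a language-agnostic driver calls lang.method() and the language-specific file overrides those methods, can be pictured with a toy model. The real compiler is written in Scala; the Python sketch below is only an analogy of the pattern, and every class and method name in it is invented for the example.

```python
class LanguageCompiler:
    """Language-specific hooks; plays the role of {YourLang}Compiler.scala."""

    def class_header(self, name):
        raise NotImplementedError

    def attribute_declaration(self, name):
        raise NotImplementedError

    def class_footer(self, name):
        raise NotImplementedError


class ClassCompiler:
    """Language-agnostic driver: decides what to emit and in which order."""

    def __init__(self, lang):
        self.lang = lang

    def compile_class(self, name, attributes):
        lines = [self.lang.class_header(name)]
        lines += [self.lang.attribute_declaration(a) for a in attributes]
        lines.append(self.lang.class_footer(name))
        return "\n".join(line for line in lines if line)


class ToyPythonCompiler(LanguageCompiler):
    """Overrides that know the concrete syntax of one target language."""

    def class_header(self, name):
        return f"class {name}:"

    def attribute_declaration(self, name):
        return f"    {name} = None"

    def class_footer(self, name):
        return ""  # nothing closes a class body in this toy target


generated = ClassCompiler(ToyPythonCompiler()).compile_class("HelloWorld", ["one"])
```

The driver never contains target-language syntax, so supporting another language in this toy model means writing only another set of overrides.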