This document provides an overview of Kaitai Struct project from a developer’s perspective: general architecture, infrastructure, etc.
1. Compiler’s architecture
The heart of Kaitai Struct project is obviously a reference compiler. As all compilers, it "compiles" or "translates" input files (Kaitai Struct YAML, .ksy) into output files (source code in target programming languages, like C++, Java, etc).
In order to do this translation, compiler performs several major steps:
Loading of YAML files and parsing them into in-memory tree of objects
"Pre-compilation" - a set of preparatory actions (such as type inferring, resolving names, compile-time sanity checks), which are the same for all target languages
"Compilation" - traversal of the KSY object tree in certain order, rendering source code in target language by application of certain "templates"
2. Entry point
Before these 3 keys steps are performed, there is always some entry point to get us started. There are multiple, platform-dependent entry points in the compiler:
JVM: io.kaitai.struct.JavaMain does:
command-line argument parsing, which results in a
runs JavaMain.run with it
runs JavaMain.compileOneInput for every .ksy source file specified, doing lots of wrapping to properly handle regular output and exceptions
JS: io.kaitai.struct.MainJs has the main
3. 3-step compilation process
Current implementation bundles steps 1 and 2 into one invocation:
For JVM, it is implemented in
For JS, it is implemented in
3.1. Loading and parsing of YAML files
The aim of this stage is, given a list of file names to load
(typically, these would be .ksy files) to load them, parse them as
YAML, and convert them to all into elements of
io.kaitai.struct.format - typically various
*Spec things, like
AttributeSpec, etc. End result is a single
object, which incorporates one or many
ClassSpec which define user
3.1.1. Loading YAML files
Loading YAML files is currently, unfortunately, done by external library in platform-specific manner:
For JVM, it calls SnakeYAML. Everything related to this step is incapsulated in
For JS, it calls: TODO
We’re working on bringing pure Scala YAML parser, but it’s a relatively distant goal: see #229.
Conversion from YAML objects to Spec-objects is typically performed by
3.1.3. Loading imports
After we’ve parsed YAML, we can recursively load all files mentioned in imports. The biggest catch in this process is that it is effectively reading more disk files + running more YAML parsing, i.e. it’s a step back, again into platform-dependent territory.
The aim of this stage is to do preparatory language-independent
actions. The whole step is invoked from
individual substeps are implemented by classes in
io.kaitai.struct.precompile. Please refer to per-class documentation
(if it exists) to every particular step.
Precompilation modifies (enriches) existing ClassSpecs object. Alternatively, it might throw an exception if some of the validaton checks failed (TODO: exception structure).
Compilation is a final step, which converts enriched ClassSpecs into source code in target language.
This task is obviously dependent on target language, thus it is performed by language-specific class.
There are 2 main variations of implementing these:
Classes which inherit
AbstractCompilerdirectly, such as
GraphVizCompiler, do everything from scratch and organize generation flow in some arbitrary manner.
Most traditional languages (such as Java, Ruby, Python, C++, etc), which has something in common, use ready-made
ClassCompiler, which is a simple skeleton for compiling something resembling typical understanding of a "class" with members/methods. To introduce language-specific behavior, one can:
Provide implementation of
LanguageCompiler, which is acts like a template with many different simple methods like "start new file", "start new class", "finish a class", etc.
GoClassCompiler), overriding some of the control flow.
During compilation process, we occasionally need to do translation of
KS expression language (which is target language-agnostic) into actual
target language snippets. "Translators" are per-language classes
io.kaitai.struct.translators which implement that
translation. The most important method they provide is
which gets KS expression and is expected to return a string in target
TODO: explain about translators which do not generated only a string (i.e. Go).